Sample Data and Code to Accompany "An Intuitive Markov Chain Lesson From Baseball"

Summary

The files below contain Matlab code and a sample data set (the 2001 Atlanta Braves) that students and instructors can use as a supplement to the Markov chain lesson described in the paper "An Intuitive Markov Chain Lesson From Baseball" (Sokol, INFORMS Transactions on Education, 2004). The code and data set use a simple event model so that users can easily create and analyze their own batting order data sets.

Files

League data: This file contains league total data that can be used to calculate transition probabilities, situational values, event values, etc. Several leagues' totals are included. Note that only basic data is included here, so that students do not get bogged down in the details of baseball. Results using very detailed data (errors, baserunner advancement probabilities, steals, etc.) are not very different from those obtained with the basic data. The columns in this data file follow the same format as described below in "Creating New Data Sets".

Matlab code: The Matlab files in this zip archive contain the Markov chain calculations necessary to evaluate batting orders.

Data file: This file contains the input data for the 2001 Atlanta Braves. The instructions below describe how users can easily create their own data sets.

Usage

Installing and Running the Program

  1. Extract all of the files from the zip archive into the directory you plan to use for this application.
  2. Copy the data file (or create your own data file) into the same directory.
  3. Open Matlab from this directory.
  4. Start the program by giving Matlab the command battingorder datafilename, where "datafilename" is the name of the file containing the data set you wish to use.
  5. Each time you are prompted for a batting order, enter it in the form 123456789, where each digit identifies a player in your data set. (So, 123456789 has the first player in your data set batting first, the second player in your data set batting second, etc. 421356789 has the fourth player in your data set batting first, etc.) Note that this allows you to duplicate players; in fact, by entering an order like 555555555 you can answer the question "how many runs would my team score if everyone were like the fifth player in my data set?"
  6. In addition to appearing on your screen, all output will be saved to the file datafilename.out, where "datafilename" is the same as above.
  7. To end the program, hit enter (without any numbers) at the batting order prompt.
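The batting-order format in step 5 can be illustrated with a short sketch. This is written in Python for illustration only and is not the Matlab code from the archive; the function name parse_batting_order is hypothetical, and the validation rules (exactly nine digits, each from 1 to 9) are assumptions based on the description above.

```python
def parse_batting_order(text):
    """Turn an entry such as "421356789" into zero-based data-set rows.

    Returns None for an empty entry, which step 7 uses to end the program.
    Hypothetical helper: the validation here is an assumption, not the
    behavior of the distributed Matlab program.
    """
    text = text.strip()
    if text == "":
        return None
    # Assumed rule: nine characters, each a digit 1-9 naming a data-set row.
    if len(text) != 9 or any(ch not in "123456789" for ch in text):
        raise ValueError("expected exactly nine digits from 1 to 9")
    # Position in the string is the lineup slot; the digit is the player.
    return [int(ch) - 1 for ch in text]


if __name__ == "__main__":
    # Fourth player bats first, second player second, first player third...
    print(parse_batting_order("421356789"))
    # Repeats are allowed: a lineup made entirely of the fifth player.
    print(parse_batting_order("555555555"))
```

Because the position in the string is the lineup slot and the digit is the player, repeating a digit simply reuses that player's row, which is why an entry like 555555555 is valid.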

Creating New Data Sets

Data sets should have 9 rows (one for each player) of 7 columns each:

  1. Column 1: home runs
  2. Column 2: triples
  3. Column 3: doubles
  4. Column 4: singles
  5. Column 5: walks
  6. Column 6: at-bats
  7. Column 7: player name (or other comments)

As mentioned above, the model used in the Matlab code is simplified to include only these basic events, so that users can easily create their own data sets from current or historical data.
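As a rough illustration of how one row in this format might be turned into event probabilities, here is a minimal Python sketch. It is not the Matlab code from the archive. It assumes whitespace-separated columns (the delimiter is not specified here), treats plate appearances as at-bats plus walks, and counts every at-bat that is not a hit as an out. The function names and the sample row are made up for illustration and are not real statistics.

```python
def parse_player_row(line):
    """Split one row into the six counts and the trailing name/comment.

    Assumption: columns are separated by whitespace.
    """
    fields = line.split(None, 6)  # at most 7 pieces; the name may contain spaces
    if len(fields) < 6:
        raise ValueError("expected six numeric columns before the name")
    keys = ["home_run", "triple", "double", "single", "walk", "at_bats"]
    row = {key: int(value) for key, value in zip(keys, fields[:6])}
    row["name"] = fields[6].strip() if len(fields) > 6 else ""
    return row


def event_probabilities(row):
    """Estimate per-plate-appearance probabilities for the basic events.

    Assumptions: plate appearances = at-bats + walks, and any at-bat
    that is not a hit is an out.
    """
    hit_keys = ["home_run", "triple", "double", "single"]
    hits = sum(row[key] for key in hit_keys)
    outs = row["at_bats"] - hits
    plate_appearances = row["at_bats"] + row["walk"]
    probs = {key: row[key] / plate_appearances for key in hit_keys + ["walk"]}
    probs["out"] = outs / plate_appearances
    return probs


if __name__ == "__main__":
    # Made-up row: HR, 3B, 2B, 1B, BB, AB, name.
    sample = "20 3 30 97 50 500 Example Player"
    for event, p in event_probabilities(parse_player_row(sample)).items():
        print(f"{event:9s} {p:.3f}")
```

Under these assumptions the six probabilities sum to 1 for each player, which is the property a row of transition probabilities needs.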

Comments

Please send comments, bug reports, etc. to Joel Sokol.