The Machine Repair Problem is a simple abstraction of broader issues that arise when attempting to regulate a stochastic process whose exact state is not known to the decision-maker. It belongs to the more general class of partially observable Markov decision processes. The specific problem to be solved here is loosely adapted from lecture notes on the MIT OpenCourseWare website.
A machine is to be operated over several production periods. At the start of the first period, it is in good operating condition, but during any operating period, it may break down. At the start of each period, the decision-maker has the option of performing a repair, which is guaranteed to return the machine to its initial operating state.
The decision-maker inspects the machine at the end of each period. The inspection yields an observation that the machine is probably operational or probably broken, but the observations are not perfect. The decision whether to repair is therefore based on the history of past observations and decisions, which can occasionally be misleading.
A machine can be in one of two states: OPERATIONAL or FAILED. It is initially OPERATIONAL. During each period, the machine fails with probability 1/3.
At the start of each period, a decision is made either to CONTINUE using the machine in its current state or to perform a REPAIR. The cost of a repair is 1, regardless of the state the machine turns out to have been in. If the decision is to CONTINUE and the machine has already failed, a cost of 2 is incurred for that period.
At the end of each period, the machine is inspected, with two possible outcomes: GOOD (the machine appears to be OPERATIONAL) or BAD (the machine appears to have FAILED). The probability of incorrectly reporting the machine's state is 1/4.
The machine failures and observation errors are statistically independent. The decision-maker cannot directly observe the machine state, and is not informed of the costs incurred from operating the machine in the failed state.
In our first attempt to solve the problem, we will consider three production periods.
In Hava, these problem parameters are expressed as:
token OPERATIONAL, CONTINUE, REPAIR, FAILED, GOOD, BAD;
pf = 1/3; // Probability machine fails in a period
pe = 1/4; // Probability of observation error in an inspection
g(machineState,action) =
  if (action==REPAIR) {1}
  else if (machineState==FAILED) {2}
  else {0};
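For readers unfamiliar with Hava, the same one-period cost function can be sketched in Python (the names here are our own choices, not part of the Hava source):

```python
def g(machine_state, action):
    # One-period cost: a repair costs 1 whatever the true state turns out
    # to have been; otherwise running a FAILED machine costs 2, and an
    # OPERATIONAL one costs nothing.
    if action == "REPAIR":
        return 1
    if machine_state == "FAILED":
        return 2
    return 0
```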
The basis for decision-making, also known as the decision-maker's state, is the current period number and the history of past decisions and observations. We declare this as a Hava structure:
struct State(period, history);
history will contain observations only, and its length will be no more than period, and possibly less.
The dynamics are made precise by three functions of the state. The first returns the observer's state at the end of a period that starts in state state, experiences action action, and observes observation. The period of the new state is simply one plus the period of the old state. If the decision is to CONTINUE, then the new history is simply the concatenation of the old history with the new observation. If the decision is to REPAIR, then the new history contains only the new observation. This is written as:
nextState(state, action, observation) =
  State(
    state.period+1,
    join(
      if (action==CONTINUE) {state.history}
      else {()},
      observation
    )
  );
The second function returns the previous state. It is undefined if the current history is empty. Otherwise, it decrements the period, and removes the final observation from the history. It is written as:
prevState(state) =
  State(
    state.period-1,
    join(period=1 to state.history.listSize-1)
      {state.history[period]}
  );
The last function simply extracts the most recent observation in the history. It, too, is undefined if the list is empty:
lastObservation(state) = state.history[state.history.listSize];
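These three state functions translate almost line for line into Python, with the history kept as a tuple of observations (a sketch with names of our own choosing, not part of the Hava source):

```python
from typing import NamedTuple, Tuple

class State(NamedTuple):
    period: int
    history: Tuple[str, ...]  # observations only

def next_state(state, action, observation):
    # REPAIR discards the old history; CONTINUE extends it.
    base = state.history if action == "CONTINUE" else ()
    return State(state.period + 1, base + (observation,))

def prev_state(state):
    # Undefined (here: an error) when the history is empty.
    if not state.history:
        raise ValueError("prev_state is undefined for an empty history")
    return State(state.period - 1, state.history[:-1])

def last_observation(state):
    # Also undefined when the history is empty.
    if not state.history:
        raise ValueError("last_observation is undefined for an empty history")
    return state.history[-1]
```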
It is well known that whatever we need to determine from the state can also be determined from the following function of the state:
p(s) = Prob{the machine is in state OPERATIONAL | the observer is in state s}
If the state history is empty, then p(s) = 1. If the state history is not empty, we can apply Bayes' law to obtain this value from the value of the previous state (see the Wikipedia article cited at the top of this page):
p(state) =
  if (state.history==()) {1}
  else if (lastObservation(state)==GOOD)
    {(1-pe)*(1-pf)*pp(state)/((1-pe)*(1-pf)*pp(state) + pe*(1-(1-pf)*pp(state)))}
  else
    {pe*(1-pf)*pp(state)/(pe*(1-pf)*pp(state) + (1-pe)*(1-(1-pf)*pp(state)))};
pp(state) = p(prevState(state));
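As a quick check of the arithmetic, the same Bayes update can be written as a standalone Python function (only pf and pe come from the model; the names are ours):

```python
PF, PE = 1/3, 1/4   # failure and observation-error probabilities

def bayes_update(prev_p, observation):
    # Posterior Prob{OPERATIONAL} given the probability prev_p at the
    # start of the period and the end-of-period inspection result.
    ok = (1 - PF) * prev_p              # operational when inspected
    if observation == "GOOD":
        return (1 - PE) * ok / ((1 - PE) * ok + PE * (1 - ok))
    return PE * ok / (PE * ok + (1 - PE) * (1 - ok))

print(bayes_update(1.0, "GOOD"))  # 6/7: a GOOD report from a fresh machine
print(bayes_update(1.0, "BAD"))   # 2/5: even a BAD report leaves some doubt
```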
We will need a couple more service functions, but we are far enough along now to write the Bellman Equation. This constitutes the heart of the formulation. It is expressed as three lines of Hava:
optAction(state) = argmin(action in (CONTINUE,REPAIR)) {v(state,action)};
v(state) = if (state.period < N) {v(state,optAction(state))} else {0};
v(state,action) =
  g_(state,action) +
  sum(observation in (GOOD,BAD))
    {p(state,action,observation)*v(nextState(state,action,observation))};
These lines contain three identifiers we have yet to define: the expected one-period cost g_, the observation probability p(state,action,observation), and the horizon N.
The first function is actually straightforward (it happens!):
g_(state,action) = p(state)*g(OPERATIONAL, action) + (1-p(state))*g(FAILED,action);
The second is a bit more complicated. It is constructed recursively, so that, when action==REPAIR, it simply starts over, using an empty history.
p(state,action,observation) =
  if (action==REPAIR)
    {p(State(state.period, ()), CONTINUE, observation)}
  else if (observation==GOOD)
    {(1-pe)*(1-pf)*p(state) + pe*(1-(1-pf)*p(state))}
  else
    {pe*(1-pf)*p(state) + (1-pe)*(1-(1-pf)*p(state))};
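The observation probabilities can likewise be checked with a few lines of Python (again a sketch; the function name is ours):

```python
PF, PE = 1/3, 1/4   # failure and observation-error probabilities

def prob_observation(p, action, observation):
    # Probability the inspection reports `observation`, given the machine
    # is OPERATIONAL with probability p at the start of the period.
    if action == "REPAIR":
        p = 1.0                       # a repair restores the machine first
    ok = (1 - PF) * p                 # still operational at inspection time
    good = (1 - PE) * ok + PE * (1 - ok)
    return good if observation == "GOOD" else 1.0 - good

print(prob_observation(1.0, "CONTINUE", "GOOD"))  # 7/12
```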
The following lines complete the Hava formulation of this problem:
N = 3;
SOLUTION = v(State(0, ()));
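For readers who want to check the numbers outside of Hava, the whole formulation can be mirrored in a short Python sketch that recurses on the observation history exactly as the Hava code does (all names here are our own; only pf, pe, and N come from the model):

```python
from functools import lru_cache

PF, PE, N = 1/3, 1/4, 3   # failure prob., observation-error prob., horizon

@lru_cache(maxsize=None)
def p(history):
    # Prob{machine OPERATIONAL at the start of the period | history}.
    if not history:
        return 1.0
    prev = p(history[:-1])
    ok = (1 - PF) * prev              # operational at the last inspection
    if history[-1] == "GOOD":
        return (1 - PE) * ok / ((1 - PE) * ok + PE * (1 - ok))
    return PE * ok / (PE * ok + (1 - PE) * (1 - ok))

def g_(history, action):
    # Expected one-period cost under the belief implied by the history.
    return 1.0 if action == "REPAIR" else 2.0 * (1.0 - p(history))

def prob_obs(history, action, observation):
    ph = 1.0 if action == "REPAIR" else p(history)
    ok = (1 - PF) * ph
    good = (1 - PE) * ok + PE * (1 - ok)
    return good if observation == "GOOD" else 1.0 - good

def next_history(history, action, observation):
    base = history if action == "CONTINUE" else ()
    return base + (observation,)

def q(period, history, action):
    # Bellman equation: one-period cost plus expected cost-to-go.
    return g_(history, action) + sum(
        prob_obs(history, action, obs) * v(period + 1, next_history(history, action, obs))
        for obs in ("GOOD", "BAD"))

def v(period, history):
    if period >= N:
        return 0.0
    return min(q(period, history, a) for a in ("CONTINUE", "REPAIR"))

print(v(0, ()))   # 11/9 ≈ 1.2222, the expected total cost over N = 3 periods
```

Working the recursion by hand gives the same value: from the initial belief p = 1, CONTINUE is free and leads to beliefs 6/7 (after GOOD) or 2/5 (after BAD), and the optimal total cost comes to 11/9.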
Click here to view the complete solution, using observation history as the decision-maker's state.
Notice that the optimal policy for this problem is quite simple: So long as the machine looks GOOD, keep running it (CONTINUE). When it looks BAD, repair it (REPAIR). For different values of the parameters, the result will not be quite so simple. Try, for example, solving this problem when the observations are less reliable (pe=0.4), or the machine is less likely to fail during a period (pf=0.1).
Another intuitive property of the solution is that one decides to CONTINUE when p(state) is high, and REPAIR when p(state) is low. In other words, for some threshold p*, optAction(state) = CONTINUE if p(state) > p* and optAction(state) = REPAIR if p(state) < p*. Such simple threshold rules no longer hold for generalizations of this problem, for example, if the machine can fail in more than one way.
Another way to approach this problem is to substitute the sufficient statistic for the history of observations in the decision-maker's state. While this introduces some difficulties in the formal analysis, it is actually easier to do in Hava! The trick is to use the granulation feature. Although the sufficient statistic is, in theory, any number between zero and one, we will retain only three significant digits, so that it can take on only about one thousand distinct values. This introduces a small numerical error into the calculations, but not enough to seriously affect the outcome.
Click here to view the complete solution, using the sufficient statistic as the decision-maker's state.