An actor emits output signals stochastically according to probabilities it calculates. The actor collects statistics on the observed consequences of emitting various output signals in various action choice states. It treats the observed consequences as dependent only on a specific action choice state and a specific output signal emitted in that state. The event history segment over which the actor observes the consequences is a cycle: the segment between an occurrence of that action choice state in the event history and the next occurrence of that action choice state. The actor uses the collected statistics to calculate the probabilities of emitting allowed output signals in a current action choice state.
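The splitting of an event history into cycles can be illustrated by the following sketch. The list-of-pairs history representation and the function name are purely illustrative; they are not part of the QSMM API.

```python
def cycles_for_state(history, state):
    """Yield (emitted_signal, segment) for each cycle of `state`.

    `history` is a list of (state, output_signal) events; a cycle is
    the segment between one occurrence of `state` and the next, and
    its consequences are attributed to the signal emitted at the
    earlier occurrence.
    """
    start = None
    for i, (s, _signal) in enumerate(history):
        if s == state:
            if start is not None:
                yield history[start][1], history[start + 1:i]
            start = i

# Two cycles of state "A": one after emitting signal 18,
# one after emitting signal 19.
history = [("A", 18), ("B", 3), ("C", 4), ("A", 19), ("B", 5), ("A", 18)]
cycles = list(cycles_for_state(history, "A"))
```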
For every occurrence of an action choice state in the event history, the actor updates statistics on a cycle type: the pair composed of that action choice state and the output signal emitted at the previous occurrence of that action choice state in the event history. The actor updates the statistics with the following parameters:
The actor calculates the probabilities of output signals to emit in a current action choice state using statistics collected for the pairs composed of that action choice state and each allowed output signal. Such statistics include the following parameters for event history segments between an occurrence of that action choice state at which the actor emitted the output signal and the next occurrence of that action choice state:
Using the sum of spur increments over the segments and the total time length of those segments, the actor can calculate the mean velocity of spur increment over the segments. The higher the mean spur increment velocity for segments specified by the pair composed of an action choice state and an output signal, the greater the probability of emitting that output signal in the action choice state can be.
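The bookkeeping described above reduces to keeping two running sums per pair and dividing them. The following sketch shows that accumulation; the class and method names are hypothetical and do not reflect the QSMM API.

```python
from collections import defaultdict

class CycleStats:
    """Running sums per (action choice state, output signal) pair."""

    def __init__(self):
        # (state, signal) -> [sum of spur increments, sum of time lengths]
        self.sums = defaultdict(lambda: [0.0, 0.0])

    def record_cycle(self, state, signal, spur_increment, time_length):
        entry = self.sums[(state, signal)]
        entry[0] += spur_increment
        entry[1] += time_length

    def mean_velocity(self, state, signal):
        spur_sum, time_sum = self.sums[(state, signal)]
        return spur_sum / time_sum if time_sum > 0 else 0.0

stats = CycleStats()
stats.record_cycle("A", 18, 7.0, 40.0)
stats.record_cycle("A", 18, 3.0, 20.0)
velocity = stats.mean_velocity("A", 18)  # (7 + 3) / (40 + 20)
```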
Let us consider a simplistic example. Suppose the actor has recorded that emitting output signal 18 in an action choice state concluded with spur increment +7 over a period of 40 time units until the action choice state occurred the next time in the event history. Suppose the actor has also recorded that emitting output signal 19 at a different occurrence of this action choice state concluded with spur increment +6 over a period of 30 time units until the action choice state occurred again in the event history. Therefore, when emitting the next output signal in this action choice state, the actor selects signal 19 with higher probability than signal 18, because the spur increment velocity 6/30 for signal 19 is greater than the spur increment velocity 7/40 for signal 18. When the actor emits the same output signal in the same action choice state more than once, it accumulates the statistics and uses the mean spur increment velocity to calculate the probability of emitting the output signal.
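Under the deliberately naive assumption that selection probabilities are simply proportional to mean spur increment velocities (the real probability functions are more elaborate), the numbers in this example work out as follows:

```python
# Mean spur increment velocities from the example above.
velocities = {18: 7 / 40,   # 0.175
              19: 6 / 30}   # 0.2

# Naive normalization: probability proportional to velocity.
total = sum(velocities.values())
probabilities = {sig: v / total for sig, v in velocities.items()}
# Signal 19 gets a higher selection probability than signal 18.
```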
In QSMM, the actor uses a more complex method of selecting an output signal, so the above example is a simplification that does not fully agree with practice. See Customizing the Relative Probability Function, for more information about available and user-defined functions utilizing various kinds of statistics to calculate the probability of emitting an output signal.
To react faster to the latest tendencies in the event history, the actor may remember statistics only for time periods shorter than the entire event history.
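One way to bound the remembered history, shown here only as an assumed illustration and not as the mechanism QSMM actually uses, is to keep just the most recent N cycles per pair, so that older cycles stop influencing the mean velocity:

```python
from collections import defaultdict, deque

class WindowedStats:
    """Statistics over at most `window` recent cycles per pair."""

    def __init__(self, window=100):
        self.cycles = defaultdict(lambda: deque(maxlen=window))

    def record_cycle(self, state, signal, spur_increment, time_length):
        # Appending beyond `window` silently drops the oldest cycle.
        self.cycles[(state, signal)].append((spur_increment, time_length))

    def mean_velocity(self, state, signal):
        recent = self.cycles[(state, signal)]
        time_sum = sum(t for _, t in recent)
        spur_sum = sum(s for s, _ in recent)
        return spur_sum / time_sum if time_sum > 0 else 0.0

w = WindowedStats(window=2)
w.record_cycle("A", 18, 10.0, 10.0)  # oldest; dropped after two more cycles
w.record_cycle("A", 18, 1.0, 10.0)
w.record_cycle("A", 18, 1.0, 10.0)
velocity = w.mean_velocity("A", 18)  # only the last two cycles count
```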