An actor emits output signals stochastically according to probabilities the actor calculates. It collects statistics on the observed consequences of emitting various output signals in various action choice states. The actor treats the observed consequences as dependent only on a specific action choice state and a specific output signal emitted in that action choice state. An event history segment for observing the consequences is a cycle—a segment between an occurrence of the action choice state in the event history and the next occurrence of that action choice state. The actor uses the collected statistics to calculate the probabilities of emitting output signals allowed in a current action choice state.
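To make the notion of a cycle concrete, here is a minimal sketch (not the QSMM API; the event representation and function name are hypothetical) that splits an event history into cycles for one action choice state, pairing each cycle with the output signal emitted at its opening occurrence:

```python
# Hypothetical illustration: each event is a (state, signal) tuple.
# A cycle runs from one occurrence of the action choice state to the
# next occurrence of the same state in the event history.

def cycles(history, state):
    """Yield (emitted_signal, segment) for each completed cycle."""
    start = None
    for i, (s, _sig) in enumerate(history):
        if s == state:
            if start is not None:
                # signal emitted at the cycle's opening occurrence,
                # plus the events observed until the state recurred
                yield history[start][1], history[start + 1:i]
            start = i

history = [("A", 18), ("B", 3), ("A", 19), ("C", 4), ("B", 5), ("A", 18)]
for sig, seg in cycles(history, "A"):
    print(sig, seg)
```

Each yielded pair corresponds to one cycle type occurrence: the state `"A"` together with the signal emitted at the cycle's start.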
For every occurrence of an action choice state in the event history, the actor updates statistics on a cycle type—a pair comprised of the action choice state and the output signal emitted at the previous occurrence of that action choice state in the event history.
The actor calculates the probabilities of output signals to emit in a current action choice state using statistics collected for pairs comprised of the action choice state and each allowed output signal. For event history segments between an occurrence of the action choice state with emitting the output signal and the next occurrence of the action choice state, these statistics include the sum of spur increments over the segments and the total time length of the segments.
Using the sum of spur increments over the segments and the total time length of the segments, the actor can calculate the mean velocity of spur increment over the segments. The higher the mean spur increment velocity is for segments specified by the pair comprised of an action choice state and output signal (i.e. specified by a cycle type), the greater the probability of emitting the output signal in the action choice state can be.
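The accumulation and velocity calculation described above can be sketched as follows. This is a hypothetical illustration, not the QSMM API: the table layout and function names are invented, and the softmax mapping from velocities to probabilities is only one simple monotone choice (the manual notes later that QSMM's actual relative probability function is more elaborate):

```python
import math
from collections import defaultdict

# Per cycle type (state, signal): [sum of spur increments,
#                                  total time length of segments]
stats = defaultdict(lambda: [0.0, 0.0])

def record_cycle(state, signal, spur_increment, time_length):
    """Accumulate statistics for one completed cycle of this cycle type."""
    s = stats[(state, signal)]
    s[0] += spur_increment
    s[1] += time_length

def emission_probabilities(state, allowed_signals):
    """Map mean spur increment velocities to emission probabilities
    with a softmax; higher mean velocity -> higher probability."""
    v = {sig: (stats[(state, sig)][0] / stats[(state, sig)][1]
               if stats[(state, sig)][1] > 0 else 0.0)
         for sig in allowed_signals}
    z = sum(math.exp(x) for x in v.values())
    return {sig: math.exp(x) / z for sig, x in v.items()}

record_cycle("A", 18, +7, 40)   # mean velocity 7/40 = 0.175
record_cycle("A", 19, +6, 30)   # mean velocity 6/30 = 0.2
p = emission_probabilities("A", [18, 19])
```

Because signal 19's mean velocity exceeds signal 18's, `p[19]` comes out greater than `p[18]`.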
Let us consider a simplistic example. Suppose an actor has recorded that emitting output signal 18 in an action choice state concluded with spur increment +7 over a period of 40 time units until the action choice state occurred the next time in the event history. Suppose the actor has also recorded that emitting output signal 19 at another occurrence of the action choice state concluded with spur increment +6 over a period of 30 time units until the action choice state occurred again. Therefore, when emitting the next output signal in the action choice state, the actor will select signal 19 with higher probability than signal 18, because spur increment velocity 6/30 = 0.2 for signal 19 is greater than spur increment velocity 7/40 = 0.175 for signal 18. When an actor emits the same output signal in the same action choice state more than once, the actor accumulates the statistics and uses the mean spur increment velocity to calculate the probability of emitting the output signal.
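The accumulation mentioned in the last sentence can flip the ordering of the example. Suppose, with hypothetical numbers, signal 18 is later emitted a second time in the same action choice state, yielding spur increment +9 over 20 time units:

```python
# Accumulated statistics for signal 18 over its two cycles
# (the second cycle's numbers, +9 over 20 time units, are hypothetical):
spur_sum_18, time_sum_18 = 7 + 9, 40 + 20
mean_v18 = spur_sum_18 / time_sum_18   # 16/60, about 0.267

v19 = 6 / 30                           # 0.2, unchanged

# Signal 18's mean spur increment velocity now exceeds signal 19's,
# so signal 18 would now be emitted with the higher probability.
```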
In QSMM, the actor uses a more complex method of selecting an output signal, so the above example is a simplification that does not fully agree with practice. See Customizing the Relative Probability Function, for more information about available and user-defined functions utilizing various kinds of statistics to calculate the probability of emitting an output signal.
To improve reaction to the latest tendencies in the event history, the actor may memorize statistics for time periods shorter than the entire event history.
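One simple way to bound the statistics to recent history, sketched below, is to keep only the last N cycles per cycle type and compute the mean spur increment velocity over that window. This is a hypothetical illustration of the idea, not QSMM's actual mechanism:

```python
from collections import defaultdict, deque

N = 2  # number of most recent cycles remembered per cycle type
recent = defaultdict(lambda: deque(maxlen=N))  # (state, signal) -> cycles

def record_cycle(state, signal, spur_increment, time_length):
    """Remember a cycle; the oldest cycle drops out past N entries."""
    recent[(state, signal)].append((spur_increment, time_length))

def mean_velocity(state, signal):
    """Mean spur increment velocity over the remembered window."""
    window = recent[(state, signal)]
    spur = sum(s for s, _ in window)
    time = sum(t for _, t in window)
    return spur / time if time else 0.0

record_cycle("A", 18, +7, 40)
record_cycle("A", 18, -5, 10)
record_cycle("A", 18, +9, 20)   # the oldest cycle (+7, 40) is forgotten
```

With the window of two cycles, the mean velocity reflects only the last two observations, (-5 + 9) / (10 + 20), rather than the whole history.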