Figure 2.1 shows an example event history. Filled dots denote input signals, and unfilled dots denote output signals. A signal identifier is shown above each dot. The example event history follows the pattern “input signal–input signal–input signal–output signal”, but your programs can use other patterns. Values below the time line indicate spur increments that take place between receiving input signals.
To make the description of actor operating principles clearer, the example uses a single spur type. The spur type has a normal way of spur perception, weight 1, and a continuous type of time used to compute spur increment velocity. For that spur type, the goal of the actor is to emit output signals that increase the value of the spur as quickly as possible.
An action choice state is an actor state that is represented by a fixed-length list of signal identifiers and that serves as the basis for choosing an output signal to emit. As a rule, when in an action choice state, an actor produces an action by emitting an output signal. The action needs to be optimal in a certain sense. Usually, an action choice state is a known current state of a system or environment that requires an optimal action.
In the example, an action choice state is the list of identifiers of input signals that the actor received between emitting output signals. The action choice state marked in the figure by a horizontal square bracket is represented by list <1, 2, 5>. In some cases, it may be necessary to include the last emitted output signal in an action choice state. In that case, the example action choice state would be represented by list <7, 1, 2, 5>.
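The way an action choice state is formed from an event history can be illustrated with a small sketch. It is not QSMM code: the function name `action_choice_states`, the `("in", id)` / `("out", id)` event encoding, and the sample history are all assumptions made for illustration. The sketch assumes that the state is the list of the last three input signal identifiers received before an output signal, optionally prefixed by the previously emitted output signal.

```python
from collections import deque


def action_choice_states(history, n_inputs=3, include_last_output=False):
    """Yield (state, emitted_output) pairs from a list of (kind, signal_id) events.

    Hypothetical helper for illustration only; it is not part of the QSMM API.
    """
    recent_inputs = deque(maxlen=n_inputs)  # fixed-length window of input ids
    last_output = None                      # previously emitted output signal
    for kind, signal_id in history:
        if kind == "in":
            recent_inputs.append(signal_id)
        elif kind == "out":
            # A state exists only when the fixed-length list can be filled.
            complete = len(recent_inputs) == n_inputs
            if include_last_output and last_output is None:
                complete = False
            if complete:
                state = tuple(recent_inputs)
                if include_last_output:
                    state = (last_output,) + state
                yield state, signal_id
            last_output = signal_id
            recent_inputs.clear()
        else:
            raise ValueError(f"unknown event kind: {kind!r}")


# Assumed fragment of an event history: output 7, inputs 1, 2, 5, output 9.
history = [("out", 7), ("in", 1), ("in", 2), ("in", 5), ("out", 9)]
plain_states = list(action_choice_states(history))
extended_states = list(action_choice_states(history, include_last_output=True))
```

With this assumed history, `plain_states` contains the state `(1, 2, 5)` paired with output signal 9, and `extended_states` contains the state `(7, 1, 2, 5)` paired with the same output signal.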
An actor generates output signals stochastically. It calculates the probability of emitting a specific output signal in a specific action choice state using statistics collected on the event history. The actor keeps track of the output signals it has emitted in the action choice states encountered since the beginning of its operation and monitors changes in spur. Simply put, in a given action choice state the actor more often emits output signals that have resulted in higher spur increment velocities. Each velocity is calculated over the period between a moment when the actor encountered the action choice state and the next moment when it encountered the same action choice state.
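The bookkeeping described above might be sketched as follows. This is a simplified illustration, not the framework's implementation: the function `mean_velocities`, the record layout `(time, spur, state, output)`, and the sample numbers are assumptions, and the mean is taken here as the arithmetic mean of per-interval velocities.

```python
from collections import defaultdict


def mean_velocities(decisions):
    """Return {(state, output): mean spur increment velocity}.

    `decisions` is a time-ordered list of (time, spur, state, output) records,
    one per emitted output signal, with strictly increasing times.
    Hypothetical helper for illustration only; it is not part of the QSMM API.
    """
    pending = {}                 # state -> (time, spur, output) of last visit
    samples = defaultdict(list)  # (state, output) -> list of velocities
    for time, spur, state, output in decisions:
        if state in pending:
            t0, s0, out0 = pending[state]
            # Velocity over the period between two visits of the same state,
            # credited to the output signal emitted at the earlier visit.
            samples[(state, out0)].append((spur - s0) / (time - t0))
        pending[state] = (time, spur, output)
    return {key: sum(values) / len(values) for key, values in samples.items()}


# Assumed records: the same state is visited four times.
state = (1, 2, 5)
decisions = [
    (0.0, 0.0, state, 9),
    (20.0, 4.0, state, 8),
    (30.0, 5.0, state, 9),
    (50.0, 9.0, state, 9),
]
stats = mean_velocities(decisions)
```

In this assumed run, output signal 9 is credited with two intervals (4 spur units over 20 time units each), and output signal 8 with one interval (1 spur unit over 10 time units), so signal 9 ends up with the higher mean velocity.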
Let us consider a simplified example. Suppose the actor has recorded that emitting output signal 9 in action choice state <1, 2, 5> resulted in spur increment +7 over a period of 40 time units until the actor next encountered action choice state <1, 2, 5> in the event history. Suppose the actor has also recorded that emitting output signal 8 in action choice state <1, 2, 5> resulted in spur increment +6 over a period of 30 time units until the actor encountered action choice state <1, 2, 5> in the event history again. Then, when generating the next output signal in action choice state <1, 2, 5>, the actor will choose signal 8 with higher probability than signal 9, because spur increment velocity 6/30 = 0.2 for signal 8 is greater than spur increment velocity 7/40 = 0.175 for signal 9. When the actor emits the same output signal in the same action choice state more than once, it accumulates the statistics and uses the mean spur increment velocity to calculate the probability of output signal choice.
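The arithmetic of this example can be checked with a short sketch. The text only states that signal 8 is chosen with higher probability than signal 9; making the probability directly proportional to the velocity is an assumption made here for illustration, and it works only when all velocities are positive. The function names `choice_probabilities` and `choose_output` are hypothetical and are not part of the QSMM API.

```python
import random
from fractions import Fraction


def choice_probabilities(velocities):
    """Map {output: velocity} to {output: probability}.

    Assumption: probability is proportional to velocity (illustration only).
    """
    if any(v <= 0 for v in velocities.values()):
        raise ValueError("this simplified rule needs positive velocities")
    total = sum(velocities.values())
    return {output: v / total for output, v in velocities.items()}


def choose_output(probabilities, rng=random):
    """Draw one output signal at random according to the probabilities."""
    outputs = list(probabilities)
    weights = [float(probabilities[o]) for o in outputs]
    return rng.choices(outputs, weights=weights)[0]


# Velocities from the example for action choice state <1, 2, 5>.
velocities = {9: Fraction(7, 40), 8: Fraction(6, 30)}
probs = choice_probabilities(velocities)
chosen = choose_output(probs, random.Random(0))
```

Under the proportionality assumption, signal 8 gets probability 8/15 and signal 9 gets probability 7/15, so signal 8 is favored but signal 9 is still emitted fairly often.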
The QSMM framework actually uses more complex formulas, so the above explanation of the principle of output signal choice is a simplification that does not fully agree with practice but makes the algorithm easier to understand. For example, the actual formula that yields the probability of choosing a specific output signal in a specific action choice state can use other kinds of statistics on the event history. Such statistics can include the number of occurrences of that action choice state immediately followed by emitting the output signal. They can also include the total number of signals encountered between occurrences of that action choice state immediately followed by emitting the output signal and the next occurrences of that action choice state. See Customizing the Relative Probability Function for more information about available and user-defined functions for calculating the probability of emitting an output signal.
The specific moments at which an actor emits output signals are determined by the application program that uses the actor. For example, an Actor API function might inform the application program of the actor’s decision that it is optimal to emit output signal 8 in the current action choice state <1, 2, 5>. However, the particular moment at which the actor emits that output signal must be chosen by the application program.