7.4.5 Assembler Program Structure

An assembler program that parses a sequence of terminal symbols consists of blocks. Every block is either a primitive block or a complex block. The program can begin with a prod instruction and end with a ret instruction.

A primitive block is an rd, scn, wr, or call instruction.

A complex block corresponds to a ‘?’ or ‘*’ quantifier or to a set of alternatives separated by ‘|’. Every complex block begins with a look-ahead analysis sub-block. The expression under a ‘?’ or ‘*’ quantifier, and each alternative separated by ‘|’, is itself a sequence of blocks.

Look-Ahead Analysis Sub-Block

A look-ahead analysis sub-block branches on a look-ahead terminal symbol. The sub-block begins with a peek instruction, which fetches the look-ahead terminal symbol. The series of joe instructions that follows transfers control for particular look-ahead terminal symbols. A jmp or abrt instruction ends the series.

When the set of expected look-ahead terminal symbols equals the set of all possible terminal symbols, a jmp instruction ends the series and transfers control for a remaining look-ahead terminal symbol that the preceding joe instructions did not test. Otherwise, an abrt instruction ends the series and aborts parsing of the terminal symbol sequence when the look-ahead terminal symbol is unexpected.

Example

        ; FIRST: [ "a" "b" "c" ]
        peek    1
        joe     0, t1_0 ; "a"
        joe     1, t1_1 ; "b"
        joe     2, a3   ; "c"
        abrt

This assembler program fragment can analyze a look-ahead terminal symbol for the regular expression ‘"a" | [ "a" "b" ] | [ "a" "b" "c" ]’.
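
The dispatch performed by the fragment above can be modeled in a few lines of Python. This is only an illustrative sketch, not part of the assembler: the names look_ahead_dispatch, ParseAbort, and JOE_TABLE are hypothetical, and the terminal symbol codes 0, 1, and 2 are copied from the joe operands in the example.

```python
class ParseAbort(Exception):
    """Raised where the assembler program would execute abrt."""


def look_ahead_dispatch(lookahead, joe_table, jmp_target=None):
    """Return the label that receives control for a look-ahead symbol.

    joe_table  -- ordered (terminal_code, label) pairs, one per joe
    jmp_target -- label of the closing jmp, or None when the series
                  ends with abrt instead
    """
    for terminal_code, label in joe_table:
        if lookahead == terminal_code:   # joe terminal_code, label
            return label
    if jmp_target is not None:           # closing jmp
        return jmp_target
    raise ParseAbort(f"unexpected terminal symbol {lookahead!r}")  # abrt


# Table mirroring the three joe instructions of the example.
JOE_TABLE = [(0, "t1_0"), (1, "t1_1"), (2, "a3")]
```

With this table, look_ahead_dispatch(1, JOE_TABLE) returns "t1_1", and any code other than 0, 1, or 2 raises ParseAbort because the series ends with abrt.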

Jump targets of the joe instructions and an optional jmp instruction following them can be:

Example

The following code fragment defines the labels t1_0 and t1_1 that the joe instructions in the previous example refer to:

t1_0:
        ; "a"
        choice
        case    3.333333333333333E-01, a1
        case    3.333333333333333E-01, a2
        end     choice
        jmp     a3
t1_1:
        ; "b"
        jprob   0.5, a2
        jmp     a3
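
One plausible reading of the fragment above is that choice/case and jprob pick a jump target at random: each case is taken with the stated probability, and control otherwise falls through to the jmp that follows. The Python sketch below models that reading with a single cumulative draw. This interpretation is an assumption inferred from the example, not the documented semantics of the instructions; the function name choose and the explicit draw u are hypothetical.

```python
def choose(u, cases, fallthrough):
    """Pick a label from (probability, label) pairs.

    u           -- a draw from [0, 1), passed in so the result is
                   reproducible (assumption: one draw per choice block)
    cases       -- pairs modeled on 'case p, label' or 'jprob p, label'
    fallthrough -- target of the jmp that follows the block
    """
    cumulative = 0.0
    for probability, label in cases:
        cumulative += probability
        if u < cumulative:
            return label
    return fallthrough


# t1_0: two cases of probability 1/3 each, then 'jmp a3'.
T1_0 = [(3.333333333333333E-01, "a1"), (3.333333333333333E-01, "a2")]
# t1_1: 'jprob 0.5, a2', then 'jmp a3'.
T1_1 = [(0.5, "a2")]
```

Under this reading, the terminal symbol "a" reaches a1, a2, and a3 with roughly equal probability, and "b" reaches a2 or a3 with probability 0.5 each.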

‘|’ Alternatives

A block for a set of alternatives separated by ‘|’ has the following structure:

        A look-ahead analysis sub-block transferring control to one
        of the labels aI0, aI1, ..., aIn

aI0:    prod    NONT_QUOTED, 0  ; increments the frequency of a
                                ; production for alternative I0

        Code for alternative I0

        jmp     eJ

aI1:    prod    NONT_QUOTED, 1  ; increments the frequency of a
                                ; production for alternative I1

        Code for alternative I1

        jmp     eJ

        ...

aIn:    prod    NONT_QUOTED, n  ; increments the frequency of a
                                ; production for alternative In

        Code for alternative In

eJ:     Code for an expression following the set of
        alternatives separated by `|'
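
The layout above is regular enough to be produced mechanically. The following Python sketch emits it for given labels; it is an illustration only, not the generator that produces these programs. The function name emit_alternatives is hypothetical, the look-ahead analysis sub-block is left out, and the alternative bodies are opaque placeholder strings.

```python
def emit_alternatives(labels, end_label, bodies, nonterminal="NONT_QUOTED"):
    """Return assembler text lines for a set of alternatives.

    labels    -- one label per alternative (for example "a1", "a2")
    end_label -- label of the code that follows the alternatives
    bodies    -- one list of instruction strings per alternative
    """
    if len(labels) != len(bodies):
        raise ValueError("need exactly one label per alternative")
    lines = []
    last = len(bodies) - 1
    for index, (label, body) in enumerate(zip(labels, bodies)):
        head = label + ":"
        # Each alternative starts by counting its production.
        lines.append(f"{head:<8}prod    {nonterminal}, {index}")
        lines.extend(f"        {instruction}" for instruction in body)
        # Every alternative except the last jumps past the remaining
        # ones; the last one falls through to end_label.
        if index != last:
            lines.append(f"        jmp     {end_label}")
    lines.append(f"{end_label}:")
    return lines
```

Calling print("\n".join(emit_alternatives(["a1", "a2"], "e1", [["; code for alternative 0"], ["; code for alternative 1"]]))) prints a two-alternative block in which only the first alternative ends with jmp e1.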

‘?’ Quantifier

A block for the ‘?’ quantifier has the following structure:

        A look-ahead analysis sub-block transferring control to
        the label sI or bI

sI:     prod    NONT_QUOTED, 0  ; increments the frequency of the empty
                                ; production when the quantified
                                ; expression is skipped
        jmp     eJ

bI:     prod    NONT_QUOTED, 1  ; increments the frequency of a
                                ; production corresponding to the
                                ; execution of the quantified
                                ; expression

        Code for the quantified expression

eJ:     Code for an expression following the `?' quantifier
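
The control flow of the ‘?’ block can be restated as a small Python function. This is a sketch under stated assumptions: parse_optional, ParseAbort, the first and follow sets, and the freq counter are hypothetical names, end of input is modeled as None, and parse_body stands in for the code of the quantified expression.

```python
from collections import Counter


class ParseAbort(Exception):
    """Raised where the look-ahead sub-block would execute abrt."""


def parse_optional(tokens, pos, first, follow, parse_body, freq, name="opt"):
    """Model of the '?' block; returns the position after the block."""
    lookahead = tokens[pos] if pos < len(tokens) else None  # peek
    if lookahead in first:
        freq[(name, 1)] += 1            # bI: prod NONT_QUOTED, 1
        pos = parse_body(tokens, pos)   # code for the quantified expression
    elif lookahead in follow:
        freq[(name, 0)] += 1            # sI: prod NONT_QUOTED, 0 ; jmp eJ
    else:
        raise ParseAbort(f"unexpected terminal symbol {lookahead!r}")
    return pos                          # eJ: code after the quantifier


# Usage: the body consumes exactly one terminal symbol.
freq = Counter()
after = parse_optional(["x", "y"], 0, {"x"}, {"y"}, lambda t, p: p + 1, freq)
```

Exactly one of the two counters is incremented per visit, which matches the two prod instructions in the layout.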

‘*’ Quantifier

A block for the ‘*’ quantifier has the following structure:

rJ:     A look-ahead analysis sub-block transferring control to
        the label bI or sI

bI:     prod    NONT_QUOTED, 1  ; increments the frequency of a
                                ; production corresponding to a repeat
                                ; of the quantified expression

        Code for the quantified expression

        jmp     rJ

sI:     prod    NONT_QUOTED, 0  ; increments the frequency of the empty
                                ; production when repetition of the
                                ; quantified expression ends

        Code for an expression following the `*' quantifier
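
The loop formed by the rJ label and the jmp rJ instruction can be restated in Python as follows. This is a sketch, not the assembler's implementation: parse_star, ParseAbort, the first and follow sets, and the freq counter are hypothetical names, end of input is modeled as None, and parse_body is assumed to consume at least one terminal symbol so that the loop terminates.

```python
from collections import Counter


class ParseAbort(Exception):
    """Raised where the look-ahead sub-block would execute abrt."""


def parse_star(tokens, pos, first, follow, parse_body, freq, name="star"):
    """Model of the '*' block; returns the position after the block."""
    while True:                                   # rJ:
        lookahead = tokens[pos] if pos < len(tokens) else None  # peek
        if lookahead in first:
            freq[(name, 1)] += 1                  # bI: prod NONT_QUOTED, 1
            pos = parse_body(tokens, pos)         # quantified expression
            continue                              # jmp rJ
        if lookahead in follow:
            freq[(name, 0)] += 1                  # sI: prod NONT_QUOTED, 0
            return pos                            # code after the quantifier
        raise ParseAbort(f"unexpected terminal symbol {lookahead!r}")


# Usage: two repetitions of "x", then "y" ends the loop.
freq = Counter()
after = parse_star(["x", "x", "y"], 0, {"x"}, {"y"}, lambda t, p: p + 1, freq)
```

The repeat production is counted once per iteration and the empty production exactly once, when the loop exits.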