rege-asm (QSMM Programmer Manual)

9.4 `rege-asm`

This program is for debugging the generation of an assembler program and a context-free grammar for a regular expression, possibly one located on the right-hand side of a production of a regular expression grammar.

The instruction set of a generated assembler program includes instructions for analyzing look-ahead terminal symbols, consuming terminal symbols, incrementing frequencies of productions of a context-free grammar, transferring control to subroutines for parsing nonterminal symbols, and returning control from the subroutines. For more information, see Assembler Instruction Set.

A sequence of subexpressions in a regular expression results in a sequence of code blocks for the subexpressions in a generated assembler program. Assembler code blocks for the ‘?’ and ‘*’ quantifiers and for sets of alternatives separated by ‘|’ have a specific structure. See Assembler Program Structure for more information.

Run the program using one of the following command line formats.

Dumping an assembler program containing simplified instructions for a regular expression:
```
qsmm-example-rege-asm [ --nterm-min=INT ] REGEX
```

Dumping an assembler program containing normal instructions for a regular expression:

qsmm-example-rege-asm [ --nterm-min=INT ] [ --eos-marker ] --dump-asm=extended REGEX

Dumping a context-free grammar for a regular expression:

qsmm-example-rege-asm --dump-gram[=specific|dot|replace] [ --recurs=left|right ] REGEX

Dumping statistics on a regular expression:

qsmm-example-rege-asm --dump-stats REGEX

The argument REGEX is a regular expression. Refer to Productions, for the regular expression syntax.
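
Because REGEX usually contains spaces and double quotes, it has to reach the program as a single argument. The following Python sketch shows one way to assemble and quote such a command line. The helper name `rege_asm_command` is hypothetical, and the sketch uses only options documented in this section.

```python
import shlex

def rege_asm_command(regex, *options):
    """Build the argument list: program name, options, then REGEX as one argument."""
    return ["qsmm-example-rege-asm", *options, regex]

argv = rege_asm_command('([ "a" "b" ] D D .)* [ "b" "c" ]', "--dump-asm=extended")

# shlex.join quotes every argument for a POSIX shell, so the regular
# expression stays a single word when the line is pasted into a terminal.
command_line = shlex.join(argv)
print(command_line)
```

The list form can also be passed to `subprocess.run` unchanged, which avoids shell quoting altogether.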

Example

$ qsmm-example-rege-asm --dump-asm=extended '([ "a" "b" ] D D .)* [ "b" "c" ]'
        ; BEG: ([ "a" "b" ] D D .)* [ "b" "c" ]
        prod    "E", 0
        ; BEG: ([ "a" "b" ] D D .)*
r1:
        ; FIRST: [ "a" "b" "c" ]
        peek    1
        joe     0, b1   ; "a"
        joe     1, t1_1 ; "b"
        jmp     s1      ; "c"
t1_1:
        ; "b"
        jprob   0.5, b1
        jmp     s1
b1:
        prod    "_E_1A", 1
                        ; stochastic on "b"
        ; BEG: [ "a" "b" ] D D .
        rd      "_E_1A", 1, 1, 1
                        ; [ "a" "b" ]
        call    D, 4
        call    D, 5
        rd      "_E_1A", 1, 4, 1
                        ; .
        ; END: [ "a" "b" ] D D .
        jmp     r1
s1:
        prod    "_E_1A", 0
                        ; stochastic on "b"
        ; END: ([ "a" "b" ] D D .)*
        rd      "E", 0, 1, 1
                        ; [ "b" "c" ]
        ; END: ([ "a" "b" ] D D .)* [ "b" "c" ]
        ret
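
The decision made at label `r1` in the listing above can be easier to follow as ordinary code. The Python sketch below is only an approximation under stated assumptions: it assumes, going by the comments in the listing, that `peek 1` inspects the next terminal symbol, that `joe` jumps when the look-ahead matches the given terminal symbol index, and that `jprob 0.5, b1` jumps to `b1` with probability 0.5. The function name and the return values "body" and "stop" are hypothetical.

```python
import random

def star_decision(lookahead, rng=random):
    """Choose between the loop body (label b1) and the loop exit (label s1)."""
    if lookahead == "a":
        # joe 0, b1: only the loop body can start with "a".
        return "body"
    if lookahead == "b":
        # joe 1, t1_1: "b" can start both the loop body and the trailing
        # [ "b" "c" ], so the choice is stochastic (jprob 0.5, b1).
        return "body" if rng.random() < 0.5 else "stop"
    # jmp s1: any other look-ahead ("c" in the listing) leaves the loop.
    return "stop"

print(star_decision("a"))
print(star_decision("c"))
```

In the listing, the "body" branch corresponds to `prod "_E_1A", 1` and the "stop" branch to `prod "_E_1A", 0`.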

The program rege-asm supports the following command line options:

--dump-asm[=simple|extended]

Dump an assembler program for probabilistic parsing of a terminal symbol sequence according to the regular expression:

‘simple’: Dump an assembler program containing simplified instructions. The regular expression cannot contain terminal symbol classes or specific terminal symbols, but it can contain ‘.’.
‘extended’: Dump an assembler program containing normal instructions and instructions for setting up a correspondence between parts of the assembler program and the productions of a context-free grammar for the regular expression. The regular expression can contain terminal symbol classes and specific terminal symbols.

On omitting the option or its argument, the program rege-asm uses --dump-asm=simple.

--dump-gram[=specific|dot|replace]

Dump a context-free grammar for the regular expression:

‘specific’: Dump the grammar where auxiliary nonterminal symbols _E_iT and _E_iTj replace terminal symbol sequences (groups) containing at least one terminal symbol class or ‘.’.
‘dot’: Dump the grammar where auxiliary nonterminal symbols _E_iT and _E_iTj replace terminal symbol sequences (groups) containing at least one terminal symbol class.
‘replace’: Dump the grammar where auxiliary nonterminal symbols _E_iT and _E_iTj replace any terminal symbol sequence.

In _E_iT and _E_iTj, i is the ordinal number of an auxiliary nonterminal symbol, and j is the length of the replaced sequence, appended only if the length is greater than 1.
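
The naming rule for these auxiliary nonterminal symbols can be illustrated with a short Python sketch. It follows only the description above; the function name `aux_nonterminal` and the sample ordinals are hypothetical.

```python
def aux_nonterminal(i, length=1):
    """Return _E_iT for a single replaced symbol, _E_iTj for a longer sequence."""
    if i < 0 or length < 1:
        raise ValueError("ordinal must be non-negative and length at least 1")
    name = f"_E_{i}T"
    # The length suffix j appears only when the sequence is longer than 1.
    return name if length == 1 else f"{name}{length}"

print(aux_nonterminal(1))      # one replaced terminal symbol
print(aux_nonterminal(2, 3))   # a replaced sequence of three terminal symbols
```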

On omitting the option argument, the program uses --dump-gram=dot. On omitting the option, the program does not dump a context-free grammar.

--dump-stats

Dump statistics on the regular expression.

--eos-marker

Enable the use of the end-of-stream marker $$ in the regular expression. The end-of-stream marker becomes an extra element of a set of known terminal symbols.

--nterm-min=INT

Set the minimum number of terminal symbols. With --dump-asm=extended, the program rege-asm generates an assembler program that references the terminal symbols contained in the regular expression and, if the option --eos-marker is passed, the end-of-stream marker $$. Pass the option --nterm-min=INT to generate an assembler program for a larger set of terminal symbols with --dump-asm=extended, or for a specified number of terminal symbols with --dump-asm=simple. The default minimum number of terminal symbols for generated assembler programs is 2.

--recurs=left|right

Recursion type (left or right) for the productions of a context-free grammar dumped on passing the option --dump-gram[=specific|dot|replace]. By default, the program generates left-recursive productions.
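
The difference between the two recursion types is easiest to see for a ‘*’ quantifier. The Python sketch below uses the textbook construction for a repeated subexpression. It is not taken from the program's source, and the exact shape of the productions that rege-asm dumps may differ. The function name `star_productions` is hypothetical, and the ordering (empty production first, loop production second) is an assumption.

```python
def star_productions(aux, body, recurs="left"):
    """Return two productions (lhs, rhs) that derive zero or more copies of body."""
    if recurs not in ("left", "right"):
        raise ValueError("recurs must be 'left' or 'right'")
    empty = (aux, [])                      # stop repeating
    if recurs == "left":
        loop = (aux, [aux, *body])         # aux -> aux body
    else:
        loop = (aux, [*body, aux])         # aux -> body aux
    return [empty, loop]

for lhs, rhs in star_productions("_E_1A", ["D", "D"], recurs="right"):
    print(lhs, "->", " ".join(rhs) if rhs else "(empty)")
```

Both variants derive the same terminal symbol sequences; they differ only in whether the recursive occurrence of the auxiliary nonterminal symbol comes first or last on the right-hand side.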
