9.5 rege-test

This program helps debug the parsing of a regular expression grammar. It can dump a parsed grammar (optionally printing its FIRST sets), simplify a regular expression grammar, and dump assembler programs representing the nonterminal symbols of a regular expression grammar.

Run the program using one of the following command line formats.

  1. Dumping a parsed regular expression grammar, optionally after removing unreachable productions from it:
    qsmm-example-rege-test [ --nont-class ] [ --eos-marker ]
                           [ --simplify=reachable ] [ --dump-ord ]
                           [ --terse ] ( --retain=NONT )*
                           REGEX_GRAMMAR_FILE
    
  2. Partially simplifying a regular expression grammar:
    qsmm-example-rege-test --simplify [ --eos-marker ]
                           ( --retain=NONT )* REGEX_GRAMMAR_FILE
    
  3. Dumping the FIRST sets of a regular expression grammar:
    qsmm-example-rege-test --dump-first [ --dump-ord ]
                           [ --eos-marker ] REGEX_GRAMMAR_FILE
    
  4. Checking a regular expression grammar for correctness: the program returns zero exit status if the grammar is correct, or dumps a list of errors and returns non-zero exit status if the grammar has a syntax error:
    qsmm-example-rege-test --quiet [ --eos-marker ] REGEX_GRAMMAR_FILE
    
  5. Dumping assembler programs for the nonterminal symbols of a regular expression grammar:
    qsmm-example-rege-test --dump-asm [ --eos-marker ]
                           [ --term-prefix=STR ] REGEX_GRAMMAR_FILE
    

The argument REGEX_GRAMMAR_FILE specifies the name of a file containing a regular expression grammar. If that argument is ‘-’, the program reads the regular expression grammar from stdin. See Top-Down Template Grammar and Bottom-Up Template Grammar for the regular expression grammar format.
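As a minimal sketch of the ‘-’ convention, the following shell function feeds a grammar to the program on stdin and only checks it. The function name check_grammar_stdin and the file name grammar.txt are hypothetical, and the sketch assumes that qsmm-example-rege-test is installed and on PATH.

```shell
# Hypothetical helper: check a grammar supplied on stdin.
# Assumes qsmm-example-rege-test is installed and on PATH.
check_grammar_stdin() {
    # '-' makes the program read the grammar from stdin;
    # --quiet suppresses the dump, so the result is the exit status.
    qsmm-example-rege-test --quiet -
}

# Usage (grammar.txt is a placeholder file name):
#   check_grammar_stdin < grammar.txt && echo "grammar is correct"
```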

The program rege-test supports the following command line options:

--dump-asm

Dump assembler programs for the nonterminal symbols of the regular expression grammar.

--dump-first

Include FIRST sets in a dumped regular expression grammar as comments. By default, dump a regular expression grammar without FIRST sets.

--dump-ord

Include ordinal numbers of AST nodes in a dumped regular expression grammar as comments. By default, dump a regular expression grammar without ordinal numbers of AST nodes.

--eos-marker

Enable the use of the end-of-stream marker $$ in the regular expression grammar. The end-of-stream marker becomes an extra element of the set of known terminal symbols.

--nont-class

Enable the use of nonterminal symbol classes in the regular expression grammar. See Nonterminal Symbol Classes for more information. This mode is incompatible with the options --dump-asm, --dump-first, and --simplify[=all].

--simplify[=reachable|all]

Simplify the regular expression grammar before processing:

reachable

Retain only reachable productions in the grammar.

all

Remove unreachable productions and partially simplify the remaining grammar.

If the option argument is omitted, the program uses --simplify=all. If the option itself is omitted, the program does not simplify the parsed grammar.

-S, --retain=NONT

Retain the specified nonterminal symbol in the regular expression grammar when simplifying it using the option --simplify[=reachable|all]. The option -S, --retain=NONT can occur multiple times on the command line. If this option is omitted, the program may remove any nonterminal symbol except start nonterminal symbols when simplifying a parsed grammar.
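To illustrate repeating --retain, here is a hedged sketch of a shell function that dumps a grammar with unreachable productions removed while retaining every nonterminal symbol named on its command line. The function name dump_reachable and the symbol names in the usage comment are hypothetical. The sketch assumes that qsmm-example-rege-test is on PATH.

```shell
# Hypothetical helper: dump a grammar with unreachable productions removed,
# retaining the listed nonterminal symbols.
# Usage: dump_reachable GRAMMAR_FILE [NONT...]
dump_reachable() {
    file=$1
    shift
    # Replace each remaining argument NONT with the option --retain=NONT.
    for nont in "$@"; do
        set -- "$@" "--retain=$nont"
        shift
    done
    qsmm-example-rege-test --simplify=reachable "$@" "$file"
}

# Usage (file and symbol names are placeholders):
#   dump_reachable grammar.txt expr term
# runs:
#   qsmm-example-rege-test --simplify=reachable --retain=expr --retain=term grammar.txt
```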

-q, --quiet

Do not dump a parsed regular expression grammar. Only return zero exit status if the regular expression grammar is correct, or print error messages and return non-zero exit status if the grammar is incorrect. If this option is omitted, the program can dump a regular expression grammar, provided that the option --dump-asm is absent on the command line.
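Because --quiet reports the result through the exit status, the check is easy to use from a script. The following is a minimal sketch under that assumption. The function name grammar_ok and the messages it prints are hypothetical, and qsmm-example-rege-test is assumed to be on PATH.

```shell
# Hypothetical helper: report whether a grammar file is correct.
# Usage: grammar_ok GRAMMAR_FILE
grammar_ok() {
    if qsmm-example-rege-test --quiet "$1"; then
        echo "$1: grammar is correct"
    else
        # In the else branch, $? still holds the program's exit status.
        status=$?
        echo "$1: grammar has errors (exit status $status)" >&2
        return "$status"
    fi
}

# Usage (grammar.txt is a placeholder file name):
#   grammar_ok grammar.txt || exit 1
```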

--term-prefix=STR

Specify a prefix for names of virtual terminal symbols. If the name of a terminal symbol begins with the prefix, the terminal symbol is a virtual terminal symbol; otherwise, it is not. Currently, this option only affects whether or not wr instructions (see wr Instruction) in a generated assembler program have the second argument.
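As an illustration of the fifth command line format, the sketch below wraps the assembler dump in a shell function that passes --term-prefix only when a prefix is supplied. The function name dump_asm and the prefix v_ in the usage comment are hypothetical. The sketch assumes that qsmm-example-rege-test is on PATH.

```shell
# Hypothetical helper: dump assembler programs for a grammar.
# Usage: dump_asm GRAMMAR_FILE [TERM_PREFIX]
dump_asm() {
    if [ -n "${2:-}" ]; then
        # A prefix was given: pass it as --term-prefix=STR.
        qsmm-example-rege-test --dump-asm --term-prefix="$2" "$1"
    else
        qsmm-example-rege-test --dump-asm "$1"
    fi
}

# Usage (file name and prefix are placeholders):
#   dump_asm grammar.txt v_
```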

--terse

Dump regular expressions in the productions of a parsed regular expression grammar in condensed format. By default, dump the regular expressions in indented format.