rege-bottom-up
This tool converts a bottom-up template grammar to a top-down template grammar.
$ cat sample-rege-bottom-up.rg
S: A B
| A C
;
A: "a"
| "a" "a"
;
B: "b"
| "b" "b"
;
C: "c"
| "c" "c"
;
$ rege-markup-cfg sample-rege-bottom-up.rg |
rege-bottom-up --cfg-markup --er-vnont-scope=grammar -
:c0: = [ "_S" ]
;
:c1: = [ "_A" ]
;
:c2: = [ "_B" "_C" ]
;
:n: = [ "__ER" "__START" :c0: :c1: :c2: ]
;
START: C "__START"
;
C: C0 ( "_S" < "__START" >
| "__ER" [^ :n: ] #pt-__er_consume < "__START" >
)
;
C0: C1 ( "_A" C2 ( "_B" #pn-S-0 < "_S" >
| "_C" #pn-S-1 < "_S" >
| "__ER" #pn-__ER-0 < "__ER" >
)
| "__ER" #pn-__ER-0 < "__ER" >
)
;
C1: "%a" ( "%a" ( #pt-_A_2T2 [ "%b" "%c" ]~ #pn-A-1 < "_A" >
| #pt-__er [ $$ "%a" ]~ #pn-__ER-0 < "__ER" >
)
| #pt-_A_1T [ "%b" "%c" ]~ #pn-A-0 < "_A" >
| #pt-__er $$~ #pn-__ER-0 < "__ER" >
)
| [^ "%a" :n: ]~ #pn-__ER-0 < "__ER" >
;
C2: "%b" ( "%b" ( #pt-_B_2T2 [ $$ "%a" ]~ #pn-B-1 < "_B" >
| #pt-__er [ "%b" "%c" ]~ #pn-__ER-0 < "__ER" >
)
| #pt-_B_1T [ $$ "%a" ]~ #pn-B-0 < "_B" >
| #pt-__er "%c"~ #pn-__ER-0 < "__ER" >
)
| "%c" ( "%c" ( #pt-_C_2T2 [ $$ "%a" ]~ #pn-C-1 < "_C" >
| #pt-__er [ "%b" "%c" ]~ #pn-__ER-0 < "__ER" >
)
| #pt-_C_1T [ $$ "%a" ]~ #pn-C-0 < "_C" >
| #pt-__er "%b"~ #pn-__ER-0 < "__ER" >
)
| [ $$ "%a" ]~ #pn-__ER-0 < "__ER" >
;
In typical cases, use the following command-line format:
rege-bottom-up [ -o OUTPUT_TD_GRAMMAR_FILE ] --cfg-markup \
[ --er-vnont-scope=grammar|nont ] [ --template ] \
INPUT_BU_GRAMMAR_FILE
where INPUT_BU_GRAMMAR_FILE is the name of a file containing an input bottom-up template grammar (see Bottom-Up Template Grammar); the filename ‘-’ means stdin.
As required by the option --cfg-markup, the bottom-up template grammar should contain source PCFG markup inserted by the rege-markup-cfg tool (see rege-markup-cfg).
After the source PCFG markup has been inserted, the bottom-up grammar may optionally have been converted to a factored context-free grammar by the rege-vit tool (see rege-vit).
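As a variation on the sample session, the following sketch writes the top-down grammar to a file with -o and selects the ‘nont’ scope. It uses only options that appear in the command-line format above. The output filename sample-rege-top-down.rg and the variable names are hypothetical, and the guard simply skips the conversion when the tools or the input file are not available.

```shell
in_file=sample-rege-bottom-up.rg    # the sample grammar shown earlier
out_file=sample-rege-top-down.rg    # hypothetical output filename

if command -v rege-markup-cfg >/dev/null 2>&1 &&
   command -v rege-bottom-up >/dev/null 2>&1 &&
   [ -f "$in_file" ]; then
  # Insert source PCFG markup, then convert; '-' makes rege-bottom-up read stdin.
  if rege-markup-cfg "$in_file" |
     rege-bottom-up -o "$out_file" --cfg-markup --er-vnont-scope=nont -; then
    status=converted
  else
    status=failed
  fi
else
  echo "rege tools or $in_file not found; nothing converted" >&2
  status=skipped
fi

echo "status: $status"
```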
The rege-bottom-up tool supports the following command line options:
--cfg-markup
Fetch source PCFG markup from a bottom-up (input) template grammar and insert #pt… and #pn… specifiers for postfix production markup into a top-down (output) template grammar.
If this option is omitted, the bottom-up template grammar must not contain source PCFG markup.
Indicate expected sets of terminal and nonterminal symbols in comments in an intermediate grammar dumped by the option --out-gram-interm=FILE.
--er-vnont-scope=grammar|nont|leaf
The scope of uniqueness of error virtual nonterminal symbols ‘__ER…’:
‘grammar’: One error virtual nonterminal symbol ‘__ER’ for the entire top-down template grammar.
‘nont’: One error virtual nonterminal symbol ‘__ERidx’ per nonterminal symbol of a top-down template grammar, where idx is a nonterminal symbol index.
‘leaf’: One error virtual nonterminal symbol ‘__ERidx’ per leaf of a regular expression for a nonterminal symbol of a top-down template grammar, where idx is the ordinal number of an error virtual nonterminal symbol.
The scopes ‘grammar’ and ‘nont’ enable the use of recursive nonterminal symbols in a bottom-up (input) template grammar.
The default value is ‘leaf’.
Extend the set of known terminal symbols extracted from a bottom-up (input) template grammar with the terminal symbols contained in a terminal symbol sequence in a specified file. The set of known terminal symbols affects the interpretation of terminal symbol placeholders ‘.’ and exclusive terminal symbol classes. Consequently, it may affect the number of branches generated for parsing a set of terminal symbol sequences that begin with overlapping terminal symbol classes in a bottom-up template grammar.
-o OUTPUT_TD_GRAMMAR_FILE
Dump a top-down (output) template grammar to a specified file. By default, dump the grammar to stdout.
--out-gram-interm=FILE
Dump to a file an intermediate grammar converted from a bottom-up (input) template grammar. The filename ‘-’ means stdout.
Where possible, remove put-back (lookahead) terminal symbols from a top-down (output) template grammar by introducing branches selectable nondeterministically.
Replace all virtual terminal symbols and virtual terminal symbol classes in a top-down (output) template grammar with a generic virtual terminal symbol class denoted by [ :t: ].
Get rid of error virtual nonterminal symbols ‘__ER…’.
Use condensed format to dump the productions of a top-down (output) template grammar. By default, dump the productions in indented format.