rege-vit
This tool generates a factored context-free grammar for processing entire parse units of up to a specified length (counted in terminal symbols) according to a regular expression grammar. Each nonterminal symbol of the context-free grammar carries a length specification after the character ‘/’. The length specification gives the exact length of the terminal symbol sequence consumed by that nonterminal symbol.
This example uses a regular expression grammar from rege-markup-cfg.
$ rege-markup-cfg sample-rege-markup-cfg.rg | rege-vit --cfg-markup 5 -
S/2 #l-S: A/1 B/1 #r0
;
S/3 #l-S: A/1 B/2 #r0
| A/2 B/1 #r0
;
S/4 #l-S: A/1 B/3 #r0
| A/2 B/2 #r0
| A/3 B/1 #r0
;
S/5 #l-S: A/1 B/4 #r0
| A/2 B/3 #r0
| A/3 B/2 #r0
| A/4 B/1 #r0
;
A/1 #l-A: _A_1T: . #r0
;
A/2 #l-A: _A_2T2: . . #r1
;
A/3 #l-A: _A_3T3: . . . #r2
;
A/4 #l-A: _A_4T4: . . . . #r3
;
B/1 #l-B: _B_1T: . #r0
;
B/2 #l-B: _B_2T2: . . #r1
;
B/3 #l-B: _B_3T3: . . . #r2
;
B/4 #l-B: _B_4T4: . . . . #r3
;
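The pattern in the rules for S above can be reproduced with a short sketch. It assumes, as the listing suggests, that S consists of A followed by B and that each of them consumes at least one terminal symbol. The function names `split_lengths` and `factored_rules` are hypothetical, the `#l`/`#r` markup is not reproduced, and this is only an illustration of the length factoring, not the algorithm rege-vit actually uses.

```python
def split_lengths(total, parts=2):
    """Yield every way to split `total` terminals into `parts` positive lengths."""
    if parts == 1:
        yield (total,)
        return
    # Leave at least one terminal symbol for each remaining part.
    for first in range(1, total - parts + 2):
        for rest in split_lengths(total - first, parts - 1):
            yield (first,) + rest


def factored_rules(lhs, rhs, max_length):
    """Map each 'lhs/n' to its alternatives, each a list of 'symbol/length' names."""
    rules = {}
    # The shortest parse unit gives one terminal symbol to every right-hand symbol.
    for total in range(len(rhs), max_length + 1):
        rules[f"{lhs}/{total}"] = [
            [f"{sym}/{n}" for sym, n in zip(rhs, lengths)]
            for lengths in split_lengths(total, len(rhs))
        ]
    return rules


# Assumed source rule: S -> A B, with a maximum parse unit length of 5.
rules = factored_rules("S", ["A", "B"], 5)
for name, alternatives in rules.items():
    print(name, "->", " | ".join(" ".join(alt) for alt in alternatives))
```

Each alternative of S/n pairs lengths that sum to n, which is why S/5 has four alternatives and why A and B are needed only up to length 4.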
In typical cases, use the following command line format:
rege-vit [ -o OUTPUT_FACTORED_CFG_FILE ] --cfg-markup \
MAX_PARSE_UNIT_LENGTH INPUT_BU_GRAMMAR_FILE
where:
MAX_PARSE_UNIT_LENGTH
    Maximum supported length of a parse unit, counted in terminal symbols.
INPUT_BU_GRAMMAR_FILE
    The name of a file containing an input bottom-up template grammar (see Bottom-Up Template Grammar). The filename ‘-’ means stdin.
As required by the option --cfg-markup, the bottom-up template grammar should contain source PCFG markup inserted by the rege-markup-cfg tool (see rege-markup-cfg).
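When the tool is driven from a script, the command line format above can be assembled programmatically. The following is a minimal sketch that uses only the arguments documented in this section. The helper name `rege_vit_argv` and the output filename `factored.cfg` are hypothetical.

```python
import shlex


def rege_vit_argv(max_length, grammar_file="-", output_file=None):
    """Build an argument list following the command line format above."""
    argv = ["rege-vit"]
    if output_file is not None:
        # -o is optional; without it the grammar goes to stdout.
        argv += ["-o", output_file]
    # --cfg-markup expects markup inserted by rege-markup-cfg in the input.
    argv += ["--cfg-markup", str(max_length), grammar_file]
    return argv


# Hypothetical invocation: read the grammar from stdin, write to factored.cfg.
argv = rege_vit_argv(5, "-", output_file="factored.cfg")
print(shlex.join(argv))
```

The resulting list can be passed to `subprocess.run(argv, check=True)`, with the output of rege-markup-cfg supplied on stdin when the grammar file is ‘-’.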
The rege-vit tool supports the following command line options:
--cfg-markup
    Fetch source PCFG markup from an input regular expression grammar and generate a factored context-free grammar containing the corresponding markup. If this option is omitted, the input regular expression grammar must not contain source PCFG markup.
Indicate expected sets of terminal and nonterminal symbols in comments in an intermediate grammar dumped by the option --out-gram-interm=FILE.
-o OUTPUT_FACTORED_CFG_FILE
    Dump the factored context-free grammar to the specified file. By default, the grammar is dumped to stdout.
--out-gram-interm=FILE
    Dump the intermediate grammar converted from the input regular expression grammar to a file. The filename ‘-’ means stdout.
Replace all terminal symbols and terminal symbol classes in a factored context-free grammar with the terminal symbol placeholder ‘.’.