rege-markup-cfg
This tool marks up the productions of a source PCFG in a bottom-up template grammar. A production can have either terminal symbols or nonterminal symbols at the right-hand side.
For source PCFG productions with terminal symbols at the right-hand side, the tool adds terminal symbol segment names (see Named Segments), which act as the nonterminal symbols at the left-hand side of those productions.
Example
If a production right-hand side contains the expression
( . . ) ( . . . )
the rege-markup-cfg tool may convert it to the expression
_S_1T2: . . _S_2T3: . . .
where the terminal symbol segment names _S_1T2 and _S_2T3 act as nonterminal symbols of a source PCFG.
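The naming pattern in this example can be sketched in Python. This is an illustration only, not the tool's implementation: the function names segment_names and mark_groups are hypothetical, and the pattern _<LHS>_<index>T<count> (with the count dropped for a single terminal) is inferred from the examples on this page rather than taken from a specification.

```python
def segment_names(lhs, terminal_counts):
    """Return one inferred segment name per terminal group.

    lhs             -- left-hand-side nonterminal of the production, e.g. "S"
    terminal_counts -- number of terminal symbols in each group, in order
    """
    names = []
    for index, count in enumerate(terminal_counts, start=1):
        # Assumption: the count suffix is omitted for a single terminal,
        # as in "_A_1T" from the sample session on this page.
        suffix = "" if count == 1 else str(count)
        names.append(f"_{lhs}_{index}T{suffix}")
    return names


def mark_groups(lhs, terminal_counts):
    """Render groups of '.' terminals, each prefixed by its inferred name."""
    parts = []
    for name, count in zip(segment_names(lhs, terminal_counts), terminal_counts):
        terminals = " ".join("." * count)  # e.g. count 3 gives ". . ."
        parts.append(f"{name}: {terminals}")
    return " ".join(parts)


print(mark_groups("S", [2, 3]))
# prints: _S_1T2: . . _S_2T3: . . .
```

The sketch numbers only terminal groups. In the tool's output, other kinds of markers in the same production appear to share the counter, which this sketch does not model.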
For source PCFG productions with nonterminal symbols at the right-hand side, rege-markup-cfg adds the markers #l… (see Marking a Left-Hand Side) and #r… (see Marking a Right-Hand Side).
Example
The rege-markup-cfg tool converts the production
S: A ( B | C | D ) E F | G (H I)* J ;
to the production
S #l-S: A #l-_S_1C ( B #r0 | C #r1 | D #r2 ) E F #r0 | G (#l-_S_2A H I #r0)* J #r1 ;
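The numbering of the #r… markers in this example can be sketched in Python. This is a hedged illustration, not the tool's algorithm: the function mark_alternatives is hypothetical, and the #l… names (_S_1C, _S_2A) are copied from the example above, not derived.

```python
def mark_alternatives(alternatives):
    """Append '#r<i>' to each alternative and join them with ' | '.

    The index i is the zero-based position of the alternative
    within its own alternation.
    """
    return " | ".join(f"{alt} #r{i}" for i, alt in enumerate(alternatives))


# Inner alternation ( B | C | D ); its #l name is taken from the example.
choice = "#l-_S_1C ( " + mark_alternatives(["B", "C", "D"]) + " )"

# Body of the '*' group: a single alternative, so it gets only #r0.
star = "(#l-_S_2A " + mark_alternatives(["H I"]) + ")*"

# Top-level alternation of the production S.
rhs = mark_alternatives([f"A {choice} E F", f"G {star} J"])
print(f"S #l-S: {rhs} ;")
```

Each alternation restarts its numbering at #r0, which is why #r0 appears three times in the marked-up production.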
To mark up source PCFG productions for the ‘?’ quantifier, rege-markup-cfg rewrites the quantified expression as two alternatives separated by ‘|’: the first alternative is empty, and the second alternative is the subexpression under the ‘?’ quantifier.
Example
The rege-markup-cfg tool converts the production
S: A (B C)? D ;
to the production
S #l-S: A #l-_S_1Q ( #r0 | B C #r1 ) D #r0 ;
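The rewrite of the ‘?’ quantifier can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function mark_optional is hypothetical, and the label _S_1Q is copied from the example above, not computed.

```python
def mark_optional(subexpression, label):
    """Rewrite 'subexpression?' as an empty alternative and a non-empty one.

    The empty alternative carries #r0, the subexpression carries #r1,
    and the whole group is preceded by the '#l-<label>' marker.
    """
    return f"#l-{label} ( #r0 | {subexpression} #r1 )"


optional = mark_optional("B C", "_S_1Q")
production = f"S #l-S: A {optional} D #r0 ;"
print(production)
# prints: S #l-S: A #l-_S_1Q ( #r0 | B C #r1 ) D #r0 ;
```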
$ cat sample-rege-markup-cfg.rg
S: A B
;
A: .
| . .
| . . .
| . . . .
;
B = A ;
$ rege-markup-cfg sample-rege-markup-cfg.rg
S #l-S: A B #r0
;
A #l-A: _A_1T: . #r0
| _A_2T2: . . #r1
| _A_3T3: . . . #r2
| _A_4T4: . . . . #r3
;
B #l-B: _B_1T: . #r0
| _B_2T2: . . #r1
| _B_3T3: . . . #r2
| _B_4T4: . . . . #r3
;
For usual cases, use the command line format:
rege-markup-cfg [ -o OUTPUT_BU_GRAMMAR_FILE ] [ --nont-class ] INPUT_BU_GRAMMAR_FILE
where INPUT_BU_GRAMMAR_FILE is the name of a file containing an input bottom-up template grammar (see Bottom-Up Template Grammar); the filename ‘-’ means stdin.
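When the tool is driven from a script, the command line format above can be assembled as an argument list. The Python sketch below only builds and prints the command and does not run the tool. The helper build_command and the output file name marked.rg are hypothetical; only the option -o, the option --nont-class, and the ‘-’ convention for stdin come from this page.

```python
import shlex


def build_command(input_file="-", output_file=None, nont_class=False):
    """Build an argument list following the command line format above.

    input_file  -- input bottom-up template grammar; '-' means stdin
    output_file -- value for -o, or None to leave the default (stdout)
    nont_class  -- whether to pass --nont-class
    """
    argv = ["rege-markup-cfg"]
    if output_file is not None:
        argv += ["-o", output_file]
    if nont_class:
        argv.append("--nont-class")
    argv.append(input_file)
    return argv


# 'marked.rg' is a made-up output name used only for illustration.
command = build_command("sample-rege-markup-cfg.rg", output_file="marked.rg")
print(shlex.join(command))
# prints: rege-markup-cfg -o marked.rg sample-rege-markup-cfg.rg
```

The resulting list can be passed to subprocess.run as is, which avoids shell quoting problems in file names.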
The rege-markup-cfg tool supports the following command line options:
Allow using the error nonterminal symbol in a regular expression grammar.
See Error Nonterminal Symbol, for more information.
--nont-class
Allow using nonterminal symbol classes in a regular expression grammar. See Nonterminal Symbol Classes, for more information.
--out-cfg=FILE
Dump a context-free grammar to a file. The filename ‘-’ means stdout. The context-free grammar is the source PCFG without production probabilities and without the productions that have terminal symbols at the right-hand side. The markup added to the regular expression grammar corresponds to this source PCFG. This option conflicts with the option --nont-class. By default, the context-free grammar is not dumped.
-o OUTPUT_BU_GRAMMAR_FILE
Dump a regular expression grammar containing the markup of source PCFG productions to a file. By default, dump the regular expression grammar to stdout.
Recursion type for a context-free grammar dumped using the option --out-cfg=FILE: left or right. The default value is ‘left’.