QSMM includes the adaptive top-down parser atd-parser and the adaptive bottom-up parser abu-parser.
They learn a deterministic regular expression grammar from a nondeterministic template grammar, with the goal of parsing training terminal symbol sequences with maximum probability.
Both parsers use a top-down template grammar, that is, a template regular expression grammar intended for parsing a terminal symbol sequence in the top-down direction.
For the adaptive bottom-up parser, the top-down template grammar must be obtained from a bottom-up template grammar, that is, a template regular expression grammar intended for parsing a terminal symbol sequence in the bottom-up direction.
To convert a bottom-up template grammar to a top-down template grammar, use two tools in sequence: rege-markup-cfg marks up the source productions of a probabilistic context-free grammar (PCFG) in the bottom-up template grammar, and rege-bottom-up converts the marked-up bottom-up template grammar to the top-down template grammar.
Designing a bottom-up template grammar for parsing entire parse units can be non-trivial, because parsing a root nonterminal symbol might terminate before the end of a parse unit.
The rege-vit tool helps cope with this problem by creating a factored PCFG for parsing terminal symbol sequences up to a specified length as full parse units.
To obtain a factored top-down template grammar, take a bottom-up template grammar with a markup of source PCFG productions generated by rege-markup-cfg, process the grammar with rege-vit, and pass the output to rege-bottom-up.
The pcfg-generate-seq tool generates a pseudo-random terminal symbol sequence according to a specified PCFG.
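To illustrate what generating a sequence according to a PCFG means, here is a minimal Python sketch. It does not reproduce pcfg-generate-seq, its input format, or its options. The toy grammar, the dictionary representation, and the names TOY_PCFG and generate_sequence are assumptions made up for this example.

```python
import random

# Hypothetical toy PCFG (not a QSMM grammar file):
# nonterminal -> list of (probability, right-hand side).
# Symbols that are not keys of the dictionary are treated as terminals.
TOY_PCFG = {
    "S": [(0.6, ["a", "S", "b"]), (0.4, ["c"])],
}


def generate_sequence(grammar, start, rng, max_depth=50):
    """Expand `start` top-down, picking each production by its probability."""

    def expand(symbol, depth):
        if symbol not in grammar:
            return [symbol]  # terminal symbol: emit as is
        if depth >= max_depth:
            raise RecursionError("derivation exceeded max_depth")
        productions = grammar[symbol]
        weights = [prob for prob, _ in productions]
        _, rhs = rng.choices(productions, weights=weights, k=1)[0]
        out = []
        for sym in rhs:
            out.extend(expand(sym, depth + 1))
        return out

    return expand(start, 0)


# A fixed seed makes the pseudo-random sequence reproducible.
rng = random.Random(12345)
sequence = generate_sequence(TOY_PCFG, "S", rng)
print(" ".join(sequence))
```

With this toy grammar every generated sequence consists of some number of "a" symbols, a single "c", and the same number of "b" symbols.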
The pcfg-predict-eval tool can help evaluate how well symbols are predicted in a terminal symbol sequence generated according to a specified PCFG.
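The text does not state which measure pcfg-predict-eval reports, so the following Python sketch is not a description of that tool. It shows one simple way to score symbol prediction: the fraction of symbols correctly guessed from the preceding symbol. The bigram predictor, the sample sequences, and the function names are assumptions chosen for the example.

```python
from collections import Counter, defaultdict


def train_bigram_predictor(training):
    """Map each symbol to the symbol that most often followed it in `training`."""
    followers = defaultdict(Counter)
    for prev, nxt in zip(training, training[1:]):
        followers[prev][nxt] += 1
    return {prev: counts.most_common(1)[0][0]
            for prev, counts in followers.items()}


def prediction_accuracy(predictor, sequence):
    """Fraction of symbols correctly predicted from the preceding symbol."""
    pairs = list(zip(sequence, sequence[1:]))
    if not pairs:
        return 0.0
    hits = sum(1 for prev, nxt in pairs if predictor.get(prev) == nxt)
    return hits / len(pairs)


# Hypothetical sequences; in practice they would come from a PCFG generator.
training = "a b c a b c a b c".split()
test_sequence = "a b c a b a".split()

predictor = train_bigram_predictor(training)
accuracy = prediction_accuracy(predictor, test_sequence)
print(f"accuracy = {accuracy:.2f}")  # prints "accuracy = 0.80"
```

Four of the five transitions in the test sequence follow the learned pattern, and only the final "b a" transition is mispredicted.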