7 Tools

QSMM includes the adaptive top-down parser atd-parser and the adaptive bottom-up parser abu-parser. They learn a deterministic regular expression grammar from a nondeterministic template grammar, with the goal of parsing training terminal symbol sequences with maximum probability.

Both the adaptive top-down parser and the adaptive bottom-up parser use a top-down template grammar: a template regular expression grammar intended for parsing a terminal symbol sequence in the top-down direction. The adaptive bottom-up parser requires obtaining the top-down template grammar from a bottom-up template grammar, that is, a template regular expression grammar intended for parsing a terminal symbol sequence in the bottom-up direction. The conversion takes two steps: first use the rege-markup-cfg tool to mark up source PCFG productions in the bottom-up template grammar, then use the rege-bottom-up tool to convert the marked-up bottom-up template grammar to the top-down template grammar.

Designing a bottom-up template grammar for parsing entire parse units can be a non-trivial task, because parsing a root nonterminal symbol might terminate before the end of a parse unit. The rege-vit tool helps cope with this problem by creating a factored PCFG for parsing terminal symbol sequences up to a specified length as full parse units. To obtain a factored top-down template grammar, take a bottom-up template grammar with a markup of source PCFG productions generated by rege-markup-cfg, process the grammar with rege-vit, and pass the output to rege-bottom-up.

The pcfg-generate-seq tool generates a pseudo-random terminal symbol sequence according to a specified PCFG. The pcfg-predict-eval tool might help evaluate the quality of prediction of symbols in a terminal symbol sequence generated according to a specified PCFG.
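
To illustrate what generating a terminal symbol sequence according to a PCFG involves, here is a minimal Python sketch of the general technique. It is not the implementation of pcfg-generate-seq and does not use that tool's grammar file format. The dictionary representation of the grammar, the toy grammar TOY_PCFG, the function name generate_sequence, and the max_depth guard are all assumptions made for this example.

```python
import random

# Hypothetical toy PCFG (an assumption for illustration): each nonterminal
# maps to a list of (probability, right-hand side) pairs. Symbols that are
# not keys of the dictionary are treated as terminal symbols.
TOY_PCFG = {
    "S": [(0.6, ["a", "S", "b"]), (0.4, ["c"])],
}


def generate_sequence(grammar, start, rng, max_depth=50):
    """Expand `start` top-down, choosing productions by their probabilities."""
    output = []
    # An explicit stack of (symbol, depth) pairs avoids deep recursion.
    stack = [(start, 0)]
    while stack:
        symbol, depth = stack.pop()
        if symbol not in grammar:
            output.append(symbol)  # terminal symbol: emit it
            continue
        if depth >= max_depth:
            raise RuntimeError("expansion exceeded max_depth")
        productions = grammar[symbol]
        weights = [probability for probability, _ in productions]
        _, rhs = rng.choices(productions, weights=weights, k=1)[0]
        # Push in reverse order so the leftmost symbol is expanded first.
        for child in reversed(rhs):
            stack.append((child, depth + 1))
    return output


# A seeded generator makes the pseudo-random sequence reproducible.
rng = random.Random(7)
sequence = generate_sequence(TOY_PCFG, "S", rng)
print(" ".join(sequence))
```

With this toy grammar, every generated sequence consists of some number of "a" symbols, a single "c", and the same number of "b" symbols. Reusing the same seed reproduces the same sequence, which is convenient when the sequence serves as repeatable input for comparing runs.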