7.1.1.5 Named Segments

A named terminal symbol segment is a name followed by ‘:’ followed by a sequence of terminal symbols, terminal symbol placeholders, and terminal symbol classes. The name can contain digits, English letters, and ‘_’ and must begin with an English letter or ‘_’.

Example:

response_a: . . . .
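
The naming rule above can be checked mechanically. The following is a minimal sketch in Python, not part of the grammar tooling itself; the function name is_valid_segment_name is hypothetical and only illustrates the rule as stated (digits, English letters, and ‘_’, beginning with an English letter or ‘_’).

```python
import re

# Hypothetical helper: encodes the naming rule from the text.
# First character: English letter or '_'; remaining characters:
# English letters, digits, or '_'.
NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_segment_name(name: str) -> bool:
    """Return True if `name` satisfies the naming rule for named segments."""
    return NAME_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["response_a", "_tmp1", "1st", "a-b", ""]:
        print(repr(candidate), is_valid_segment_name(candidate))
```

The character classes are spelled out explicitly rather than written as \w, because \w would also accept non-English letters.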

The name can serve for descriptive purposes or for the identification of a sequence of terminal symbols, terminal symbol placeholders, and terminal symbol classes in a grammar.

Alternatively, the name can serve as the left-hand side of PCFG productions generated from the terminal symbol sequences that named terminal symbol segments consumed during parsing. Using the name as the left-hand side makes sense when multiple named terminal symbol segments have the same name. Multiple named terminal symbol segments can have the same name if they have the same length, counted in terminal symbols.
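
The same-length condition can be illustrated with a short check. This is a sketch under stated assumptions, not the tool's own validation: segments are assumed to be already split into elements, and each element (terminal symbol, placeholder, or class) is assumed to account for exactly one terminal symbol. The function name find_length_conflicts and the input layout are hypothetical.

```python
from collections import defaultdict


def find_length_conflicts(segments):
    """Report names that are shared by segments of different lengths.

    `segments` is an iterable of (name, elements) pairs, where `elements`
    is the list of items after the ':' (hypothetical representation).
    Assumption: every element counts as one terminal symbol.
    """
    lengths = defaultdict(set)
    for name, elements in segments:
        lengths[name].add(len(elements))
    # Keep only the names that occur with more than one distinct length.
    return {name: sorted(found) for name, found in lengths.items() if len(found) > 1}


if __name__ == "__main__":
    ok = [
        ("position", ['"above"', ".", "."]),
        ("position", ['"below"', ".", "."]),
    ]
    bad = ok + [("position", ['"left"', "."])]
    print(find_length_conflicts(ok))
    print(find_length_conflicts(bad))
```

An empty result means that every shared name is used only by segments of equal length.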

Example:

A grammar contains the named terminal symbol segment

position: "above" . .

in one production and the named terminal symbol segment

position: "below" . .

in another production.

Suppose that while parsing a training terminal symbol sequence, the first named terminal symbol segment consumed the two terminal symbol sequences

above the line
above the circle

and the second named terminal symbol segment consumed the two terminal symbol sequences

below the plane
below the sphere

A PCFG generated for the parsed terminal symbol sequence would contain the following definition of the nonterminal symbol position:

position: "above" "the" "line"
        | "above" "the" "circle"
        | "below" "the" "plane"
        | "below" "the" "sphere"
;
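
The way the consumed sequences of same-named segments end up as alternatives of one nonterminal symbol can be sketched as follows. This is an illustration only, not the generator's actual algorithm: the function name render_definition is hypothetical, terminal symbols are assumed to be separated by whitespace, repeated sequences are assumed to be listed once, and any probabilities the PCFG may carry are left out because the definition above does not show them.

```python
def render_definition(name, sequences):
    """Render one nonterminal definition from consumed terminal symbol sequences.

    `sequences` holds the sequences consumed by all segments named `name`,
    in the order they were read. Assumption: symbols are whitespace-separated.
    """
    alternatives = []
    for sequence in sequences:
        alternative = " ".join(f'"{symbol}"' for symbol in sequence.split())
        if alternative not in alternatives:  # list each alternative once
            alternatives.append(alternative)
    if not alternatives:
        raise ValueError("at least one consumed sequence is required")
    indent = " " * len(name)  # aligns '|' under ':'
    lines = [f"{name}: {alternatives[0]}"]
    lines += [f"{indent}| {alternative}" for alternative in alternatives[1:]]
    lines.append(";")
    return "\n".join(lines)


if __name__ == "__main__":
    consumed = [
        "above the line",
        "above the circle",
        "below the plane",
        "below the sphere",
    ]
    print(render_definition("position", consumed))
```

Run on the four sequences from the example, the sketch prints the definition of position in the layout shown above.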

Note: a bottom-up template grammar uses named terminal symbol segments to associate nonterminal symbols of a source PCFG with sequences of terminal symbols, terminal symbol placeholders, and terminal symbol classes. The productions of the source PCFG that have these nonterminal symbols on the left-hand side have terminal symbol sequences on the right-hand side.