Parsing Expression Grammar - part 2
At the end of part1 I had a parser that will parse a PEG, but isn't that useful because its output is simply a recursive data structure of the matched elements.
The next step is to add the ability to process each matching production rule. I do this by associating a function with each rule, the function takes two arguments - the matching string and the parsed elements - and returns a parse result. The parse result type is extended to allow passing an arbitrary type as a parse result…
type ParseResult<'a> =
| Parsed of 'a
| TerminalSymbol of string
| Production of ParseResult<'a> list
| EmptyMatch
| Unmatched
This adds the Parsed of 'a
type to the discriminated union.
Because each grammar element now requires three pieces of information (name, expression, function) I define a record type to make handling this easier. The function declaration of parseExpression
changes as expected.
type GrammarRule<'a>(prod:Expression, func:(string -> ParseResult<'a> -> ParseResult<'a>)) =
member this.Prod = prod
member this.Func = func
let parseExpression (grammar:Map<string,GrammarRule<'a>>) start (input:string) =
The changes needed to parseExpression are small…
| NonTerminal x ->
let rule = grammar.[x]
match parse offset rule.Prod with
| (Unmatched, _) as y -> y
| (parsed, endOffset) -> (rule.Func input.[offset..endOffset - 1] parsed, endOffset)
// …
parse 0 <| NonTerminal start
The original pattern match for NonTerminal
simply recursively called parse
with the definition for the NonTerminal
name, now the same recursive call is made but, if the result is a successful match, the function associated with the match is called and the result of that function is passed as the result of the match.
Grammar example – 1
Taking the same grammar I used in part 1, I now need to add a function to each rule.
In this example, I am evaluating the expression as it is being parsed, this is the simplest approach, later on I will show a two stage approach where the expression is converted to an AST that can be evaluated later.
// bool <- "true" / "false"
let boolRule = GrammarRule<bool>(Choice [Terminal "true"; Terminal "false"], (fun s _ -> Parsed (s = "true")))
// paren <- "(" expr ")"
let parseParen _ = function
| Production [TerminalSymbol "("; x; TerminalSymbol ")"] -> x
| x -> unexpected x
let parenRule = GrammarRule<bool>(Sequence [Terminal "("; NonTerminal "expr"; Terminal ")"], parseParen)
// not <- "!" atom
let parseNot _ = function
| Production [TerminalSymbol "!"; Parsed x] -> Parsed <| not x
| x -> unexpected x
let notRule = GrammarRule<bool>(Sequence [Terminal "!"; NonTerminal "atom"], parseNot)
// atom <- bool / paren / not
let atomRule = GrammarRule<bool>(Choice [NonTerminal "bool"; NonTerminal "paren"; NonTerminal "not"], (fun _ x -> x))
// and <- atom ("&" and)?
// or <- and ("|" or)?
let parseBin _ = function
| Production [x; EmptyMatch] -> x
| Production [Parsed x; Production [TerminalSymbol "&"; Parsed y]] -> Parsed (x && y)
| Production [Parsed x; Production [TerminalSymbol "|"; Parsed y]] -> Parsed (x || y)
| x -> unexpected x
let andRule = GrammarRule<bool>(Sequence [NonTerminal "atom"; Optional (Sequence [Terminal "&"; NonTerminal "and"])], parseBin)
let orRule = GrammarRule<bool>(Sequence [NonTerminal "and"; Optional (Sequence [Terminal "|"; NonTerminal "or"])], parseBin)
// expr <- or
let exprRule = GrammarRule<bool>(NonTerminal "or", (fun _ x -> x))
// start <- expr <epsilon>
let parseStart _ = function
| Production [x; EmptyMatch] -> x
| x -> unexpected x
let startRule = GrammarRule<bool>(Sequence [NonTerminal "expr"; Epsilon], parseStart)
Hopefully each of these is self explanatory, the same function can be used to
parse both &
and |
because they have equivalent structure; the first pattern
in parseBin
is taken when the Optional
second element of the Sequence
is
not matched.
let g = Map.ofList [("bool", boolRule); ("paren", parenRule); ("not", notRule);
("atom", atomRule); ("and", andRule); ("or", orRule);
("expr", exprRule); ("start", startRule)]
This code can now evaluate simple boolean expressions…
> parseExpression g "start" "true&false";;
val it : ParseResult<bool> * int = (Parsed false, 10)
> parseExpression g "start" "true|true&false";;
val it : ParseResult<bool> * int = (Parsed true, 15)
> parseExpression g "start" "(true|true)&false";;
val it : ParseResult<bool> * int = (Parsed false, 17)
Grammar example – 2
In this example the same grammar (with some minor changes, such as allowing whitespace) demonstrates parsing into an AST (the BooleanExpr
type) which is then evaluated in a second stage.
type BooleanExpr =
| Atom of bool
| Binary of BooleanExpr * string * BooleanExpr
| Not of BooleanExpr
let rec evalBool = function
| Atom x -> x
| Binary (left, "&", right) -> if evalBool left then evalBool right else false
| Binary (left, "|", right) -> if evalBool left then true else evalBool right
| Binary (_, x, _) -> failwith <| "Unexpected binary operator: " + x
| Not x -> not <| evalBool x
let boolGrammar =
// bool <- "true" / "false"
let boolRule = GrammarRule<BooleanExpr>(Choice [Terminal "true"; Terminal "false"], (fun s _ -> Parsed (Atom (s = "true"))))
// space <- " "*
let spaceRule = GrammarRule<BooleanExpr>(ZeroOrMore (Terminal " "), (fun _ _ -> EmptyMatch))
// paren <- "(" space expr space ")"
let boolParen _ = function
| Production [TerminalSymbol "("; EmptyMatch; x; EmptyMatch; TerminalSymbol ")"] -> x
| x -> unexpected x
let parenRule = GrammarRule<BooleanExpr>(Sequence [Terminal "("; NonTerminal "space"; NonTerminal "expr"; NonTerminal "space"; Terminal ")"], boolParen)
// not <- "!" atom
let boolNot _ = function
| Production [TerminalSymbol "!"; Parsed x] -> Parsed (Not x)
| x -> unexpected x
let notRule = GrammarRule<BooleanExpr>(Sequence [Terminal "!"; NonTerminal "atom"], boolNot)
// atom <- "bool" / "paren" / "not"
let atomRule = GrammarRule<BooleanExpr>(Choice [NonTerminal "bool"; NonTerminal "paren"; NonTerminal "not"], (fun _ x -> x))
// or <- atom space ("|" space or)?
// and <- or space ("&" space and)?
let boolBinary _ = function
| Production [x; EmptyMatch; EmptyMatch] -> x
| Production [Parsed x; EmptyMatch; Production [TerminalSymbol op; EmptyMatch; Parsed y]] -> Parsed (Binary (x, op, y))
| x -> unexpected x
let orRule = GrammarRule<BooleanExpr>(Sequence [NonTerminal "atom"; NonTerminal "space"; Optional (Sequence [Terminal "|"; NonTerminal "space"; NonTerminal "or"])], boolBinary)
let andRule = GrammarRule<BooleanExpr>(Sequence [NonTerminal "or"; NonTerminal "space"; Optional (Sequence [Terminal "&"; NonTerminal "space"; NonTerminal "and"])], boolBinary)
// expr <- and
let exprRule = GrammarRule<BooleanExpr>(NonTerminal "and", (fun _ x -> x))
// start <- space expr space <epsilon>
let boolStart _ = function
| Production [EmptyMatch; x; EmptyMatch; EmptyMatch] -> x
| x -> unexpected x
let startRule = GrammarRule<BooleanExpr>(Sequence [NonTerminal "space"; NonTerminal "expr"; NonTerminal "space"; Epsilon], boolStart)
Map.ofList [("bool", boolRule); ("space", spaceRule); ("paren", parenRule); ("not", notRule);
("atom", atomRule); ("or", orRule); ("and", andRule); ("expr", exprRule); ("start", startRule)]
let parseAndEvalBool expr =
match parseExpression boolGrammar "start" expr with
| (Parsed b, _) -> printfn "%s = %A" expr (evalBool b)
| x -> unexpected x
The code for this artice is on github.
In part 3 I make things a little more practical, expressing the grammar as literal objects is very clumsy — I build a parser that parses textual PEG expressions…