Parsing Expression Grammar - part 2
At the end of part 1 I had a parser that will parse a PEG, but isn't that useful because its output is simply a recursive data structure of the matched elements.
The next step is to add the ability to process each matching production rule. I do this by associating a function with each rule, the function takes two arguments - the matching string and the parsed elements - and returns a parse result. The parse result type is extended to allow passing an arbitrary type as a parse result…
type ParseResult<'a> =
| Parsed of 'a
| TerminalSymbol of string
| Production of ParseResult<'a> list
| EmptyMatch
| UnmatchedThis adds the Parsed of 'a type to the discriminated union.
Because each grammar element now requires three pieces of information (name, expression, function) I define a record type to make handling this easier. The function declaration of parseExpression changes as expected.
type GrammarRule<'a>(prod:Expression, func:(string -> ParseResult<'a> -> ParseResult<'a>)) =
member this.Prod = prod
member this.Func = func
let parseExpression (grammar:Map<string,GrammarRule<'a>>) start (input:string) =The changes needed to parseExpression are small…
| NonTerminal x ->
let rule = grammar.[x]
match parse offset rule.Prod with
| (Unmatched, _) as y -> y
| (parsed, endOffset) -> (rule.Func input.[offset..endOffset - 1] parsed, endOffset)
// …
parse 0 <| NonTerminal startThe original pattern match for NonTerminal simply recursively called parse with the definition for the NonTerminal name, now the same recursive call is made but, if the result is a successful match, the function associated with the match is called and the result of that function is passed as the result of the match.
Grammar example – 1
Taking the same grammar I used in part 1, I now need to add a function to each rule.
In this example, I am evaluating the expression as it is being parsed, this is the simplest approach, later on I will show a two stage approach where the expression is converted to an AST that can be evaluated later.
// bool <- "true" / "false"
let boolRule = GrammarRule<bool>(Choice [Terminal "true"; Terminal "false"], (fun s _ -> Parsed (s = "true")))
// paren <- "(" expr ")"
let parseParen _ = function
| Production [TerminalSymbol "("; x; TerminalSymbol ")"] -> x
| x -> unexpected x
let parenRule = GrammarRule<bool>(Sequence [Terminal "("; NonTerminal "expr"; Terminal ")"], parseParen)
// not <- "!" atom
let parseNot _ = function
| Production [TerminalSymbol "!"; Parsed x] -> Parsed <| not x
| x -> unexpected x
let notRule = GrammarRule<bool>(Sequence [Terminal "!"; NonTerminal "atom"], parseNot)
// atom <- bool / paren / not
let atomRule = GrammarRule<bool>(Choice [NonTerminal "bool"; NonTerminal "paren"; NonTerminal "not"], (fun _ x -> x))
// and <- atom ("&" and)?
// or <- and ("|" or)?
let parseBin _ = function
| Production [x; EmptyMatch] -> x
| Production [Parsed x; Production [TerminalSymbol "&"; Parsed y]] -> Parsed (x && y)
| Production [Parsed x; Production [TerminalSymbol "|"; Parsed y]] -> Parsed (x || y)
| x -> unexpected x
let andRule = GrammarRule<bool>(Sequence [NonTerminal "atom"; Optional (Sequence [Terminal "&"; NonTerminal "and"])], parseBin)
let orRule = GrammarRule<bool>(Sequence [NonTerminal "and"; Optional (Sequence [Terminal "|"; NonTerminal "or"])], parseBin)
// expr <- or
let exprRule = GrammarRule<bool>(NonTerminal "or", (fun _ x -> x))
// start <- expr <epsilon>
let parseStart _ = function
| Production [x; EmptyMatch] -> x
| x -> unexpected x
let startRule = GrammarRule<bool>(Sequence [NonTerminal "expr"; Epsilon], parseStart)Hopefully each of these is self explanatory, the same function can be used to parse both & and | because they have equivalent structure; the first pattern in parseBin is taken when the Optional second element of the Sequence is not matched.
let g = Map.ofList [("bool", boolRule); ("paren", parenRule); ("not", notRule);
("atom", atomRule); ("and", andRule); ("or", orRule);
("expr", exprRule); ("start", startRule)]This code can now evaluate simple boolean expressions…
> parseExpression g "start" "true&false";;
val it : ParseResult<bool> * int = (Parsed false, 10)
> parseExpression g "start" "true|true&false";;
val it : ParseResult<bool> * int = (Parsed true, 15)
> parseExpression g "start" "(true|true)&false";;
val it : ParseResult<bool> * int = (Parsed false, 17)Grammar example – 2
In this example the same grammar (with some minor changes, such as allowing whitespace) demonstrates parsing into an AST (the BooleanExpr type) which is then evaluated in a second stage.
type BooleanExpr =
| Atom of bool
| Binary of BooleanExpr * string * BooleanExpr
| Not of BooleanExpr
let rec evalBool = function
| Atom x -> x
| Binary (left, "&", right) -> if evalBool left then evalBool right else false
| Binary (left, "|", right) -> if evalBool left then true else evalBool right
| Binary (_, x, _) -> failwith <| "Unexpected binary operator: " + x
| Not x -> not <| evalBool x
let boolGrammar =
// bool <- "true" / "false"
let boolRule = GrammarRule<BooleanExpr>(Choice [Terminal "true"; Terminal "false"], (fun s _ -> Parsed (Atom (s = "true"))))
// space <- " "*
let spaceRule = GrammarRule<BooleanExpr>(ZeroOrMore (Terminal " "), (fun _ _ -> EmptyMatch))
// paren <- "(" space expr space ")"
let boolParen _ = function
| Production [TerminalSymbol "("; EmptyMatch; x; EmptyMatch; TerminalSymbol ")"] -> x
| x -> unexpected x
let parenRule = GrammarRule<BooleanExpr>(Sequence [Terminal "("; NonTerminal "space"; NonTerminal "expr"; NonTerminal "space"; Terminal ")"], boolParen)
// not <- "!" atom
let boolNot _ = function
| Production [TerminalSymbol "!"; Parsed x] -> Parsed (Not x)
| x -> unexpected x
let notRule = GrammarRule<BooleanExpr>(Sequence [Terminal "!"; NonTerminal "atom"], boolNot)
// atom <- "bool" / "paren" / "not"
let atomRule = GrammarRule<BooleanExpr>(Choice [NonTerminal "bool"; NonTerminal "paren"; NonTerminal "not"], (fun _ x -> x))
// or <- atom space ("|" space or)?
// and <- or space ("&" space and)?
let boolBinary _ = function
| Production [x; EmptyMatch; EmptyMatch] -> x
| Production [Parsed x; EmptyMatch; Production [TerminalSymbol op; EmptyMatch; Parsed y]] -> Parsed (Binary (x, op, y))
| x -> unexpected x
let orRule = GrammarRule<BooleanExpr>(Sequence [NonTerminal "atom"; NonTerminal "space"; Optional (Sequence [Terminal "|"; NonTerminal "space"; NonTerminal "or"])], boolBinary)
let andRule = GrammarRule<BooleanExpr>(Sequence [NonTerminal "or"; NonTerminal "space"; Optional (Sequence [Terminal "&"; NonTerminal "space"; NonTerminal "and"])], boolBinary)
// expr <- and
let exprRule = GrammarRule<BooleanExpr>(NonTerminal "and", (fun _ x -> x))
// start <- space expr space <epsilon>
let boolStart _ = function
| Production [EmptyMatch; x; EmptyMatch; EmptyMatch] -> x
| x -> unexpected x
let startRule = GrammarRule<BooleanExpr>(Sequence [NonTerminal "space"; NonTerminal "expr"; NonTerminal "space"; Epsilon], boolStart)
Map.ofList [("bool", boolRule); ("space", spaceRule); ("paren", parenRule); ("not", notRule);
("atom", atomRule); ("or", orRule); ("and", andRule); ("expr", exprRule); ("start", startRule)]
let parseAndEvalBool expr =
match parseExpression boolGrammar "start" expr with
| (Parsed b, _) -> printfn "%s = %A" expr (evalBool b)
| x -> unexpected xThe code for this artice is on github.
In part 3 I make things a little more practical, expressing the grammar as literal objects is very clumsy — I build a parser that parses textual PEG expressions…