The ParserComb structure provides parser combinators over
character readers. This structure is modeled after the Haskell
combinators of Hutton and Meijer. The main difference is that
they return a single result, instead of a list of results.
This fact means that the or combinator is a committed choice;
once one branch succeeds, the others will not be enabled. While
this property is somewhat limiting, for many applications it
will not be a problem.
Synopsis
signature PARSER_COMB
structure ParserComb : PARSER_COMB
Interface
type ('a, 'strm) parser =
(char, 'strm) StringCvt.reader -> ('a, 'strm) StringCvt.reader
val result : 'a -> ('a, 'strm) parser
val failure : ('a, 'strm) parser
val wrap : (('a, 'strm) parser * ('a -> 'b)) -> ('b, 'strm) parser
val seq : (('a, 'strm) parser * ('b, 'strm) parser) -> (('a * 'b), 'strm) parser
val seqWith : (('a * 'b) -> 'c)
-> (('a, 'strm) parser * ('b, 'strm) parser)
-> ('c, 'strm) parser
val bind : (('a, 'strm) parser * ('a -> ('b, 'strm) parser))
-> ('b, 'strm) parser
val eatChar : (char -> bool) -> (char, 'strm) parser
val char : char -> (char, 'strm) parser
val string : string -> (string, 'strm) parser
val skipBefore : (char -> bool) -> ('a, 'strm) parser -> ('a, 'strm) parser
val or : (('a, 'strm) parser * ('a, 'strm) parser) -> ('a, 'strm) parser
val or' : ('a, 'strm) parser list -> ('a, 'strm) parser
val zeroOrMore : ('a, 'strm) parser -> ('a list, 'strm) parser
val oneOrMore : ('a, 'strm) parser -> ('a list, 'strm) parser
val option : ('a, 'strm) parser -> ('a option, 'strm) parser
val join : ('a option, 'strm) parser -> ('a, 'strm) parser
val token : (char -> bool) -> (string, 'strm) parser
Description
type ('a, 'strm) parser = (char, 'strm) StringCvt.reader -> ('a, 'strm) StringCvt.reader-
A parser is a function that takes a character reader and returns reader for the type of values the parser accepts.
val result : 'a -> ('a, 'strm) parser-
result v getc strmreturnsSOME(v, strm); i.e.,result vlifts the valuevto a parser that returnsvwithout consuming any input. val failure : ('a, 'strm) parser-
failure getc strmreturnsNONE; i.e. it is the parser that does not accept any input. val wrap : 'a, 'strm) parser * ('a -> 'b -> ('b, 'strm) parser-
wrap parser fcomposes the functionfwithparser val seq : (('a, 'strm) parser * ('b, 'strm) parser) -> (('a * 'b), 'strm) parser-
seq (parser1, parser2)is the sequential combination of the two parsers; i.e., a parser that will first parse a valuev1from the input usingparser1and then parse a valuev2usingparser2yielding the pair(v1, v2). val seqWith : (('a * 'b) -> 'c) -> (('a, 'strm) parser * ('b, 'strm) parser) -> ('c, 'strm) parser-
seqWith f (parser1, parser2)is the sequential combination of the two parsers composed with the functionf; i.e., a parser that will first parse a valuev1from the input usingparser1and then parse a valuev2usingparser2yielding the result off(v1, v2). This expression is equivalent towrap (seq (parser1, parser2), f) val bind : 'a, 'strm) parser * ('a -> ('b, 'strm) parser -> ('b, 'strm) parser-
bind parser freturns a parser that first usesparserto parse a valuevfrom the input and then continues using the parser that results fromf v. val eatChar : (char -> bool) -> (char, 'strm) parser-
eatChar predreturns a parser that parses one charactercfor whichpred creturnstrue. val char : char -> (char, 'strm) parser-
char creturns a parser that parses the characterc. val string : string -> (string, 'strm) parser-
string s`returns a parser that parses the string `s. val skipBefore : (char -> bool) -> ('a, 'strm) parser -> ('a, 'strm) parser-
skipBefore pred parserreturns a parser that first skips any prefix of characters that satisfy the predicatepredand then appliesparserto the input. val or : (('a, 'strm) parser * ('a, 'strm) parser) -> ('a, 'strm) parser-
or (parser1, parser2)returns the ordered choice of the two parsers; i.e., it returns a parser that first attempts to parse the input usingparser1; ifparser1fails on the input, then it usesparser2. val or' : ('a, 'strm) parser list -> ('a, 'strm) parser-
or' parsersreturns the ordered choice of a list of parsers. This expression is equivalent toList.foldr or failure parsers val zeroOrMore : ('a, 'strm) parser -> ('a list, 'strm) parser-
zeroOrMore parserreturns a parser that parses a list of zero or more items usingparser. val oneOrMore : ('a, 'strm) parser -> ('a list, 'strm) parser-
oneOrMore parserreturns a parser that parses a list of one or more items usingparser. val option : ('a, 'strm) parser -> ('a option, 'strm) parser-
option parserreturns a parser that parses an optional item (i.e., zero or one occurrences) usingparser. val join : ('a option, 'strm) parser -> ('a, 'strm) parser-
join parserreturns a parser that requires the optional item parsed byparserto be present. val token : (char -> bool) -> (string, 'strm) parser-
token predreturns a parser for a string of characters, where every character satisfies the predicate functionpred.
Examples
As noted above, the parser type and combinators are
designed around the
StringCvt.reader
representation of input streams.
Thus, the scan functions defined in the {basis-lib-url}/index.html[Basis Library]
are compatible with the parser type defined here. For example,
val boolParser : (bool, 'strm) parser = Bool.scan
val intParser : (int, 'strm) parser = Int.scan StringCvt.DEC
Let us define the abstract syntax of a small expression language with addition, numbers, and let-bound variables.
datatype exp
= VAR of string
| NUM of int
| ADD of exp * exp
| LET of string * exp * exp
We can use parser combinators to implement a simple parser for this language as follows.
We start by defining a few utility definitions:
structure P = ParserComb
val +> = P.seq
infixr 3 +>
fun skipWS getc = P.skipBefore Char.isSpace getc
We can then define parsers for the atomic expressions (numbers and variables):
fun numParser getc = P.wrap (Int.scan StringCvt.DEC, NUM) getc
fun idParser getc = P.seqWith
(fn (a, SOME b) => a ^ b | (a, NONE) => a)
(P.wrap (P.eatChar Char.isAlpha, str),
P.option (P.token Char.isAlphaNum))
getc
fun varParser getc = P.wrap(idParser, VAR) getc
We need the separate idParser to parse let-bound identifiers.
We then define three, mutually-recursive, functions to parse expressions.
fun letParser getc = P.wrap (
P.string "let" +> skipWS(idParser) +> skipWS(P.char #"=") +> expParser
+> skipWS(P.string "in") +> expParser,
fn (_, (x, (_, (e1, (_, e2))))) => LET(x, e1, e2)) getc
and expParser getc = P.wrap (
skipWS (P.seq (
P.or' [letParser, numParser, varParser],
addParser)),
fn (e, es) => List.foldl (fn (a, b) => ADD(b, a)) e es) getc
and addParser getc =
P.zeroOrMore (skipWS (P.wrap (P.char #"+" +> expParser, #2))) getc
Note that the letParser must appear before the varParser in the
list of parsers combined by or' to avoid treating the string "let"
as a variable. Another detail is that we use
List.foldl with a
function that swaps the order of its arguments in order
that addition is left associative.
If we evaluate the expression
StringCvt.scanString expParser " let x = 1+2 in x + x ";
we get the expected result
SOME (LET ("x", ADD (NUM 1, NUM 2), ADD (VAR "x", VAR "x")))