peg/lib.rs
1//! `rust-peg` is a simple yet flexible parser generator that makes it easy to
2//! write robust parsers. Based on the [Parsing Expression
3//! Grammar][wikipedia-peg] formalism, it provides a Rust macro that builds a
4//! recursive descent parser from a concise definition of the grammar.
5//!
6//! [wikipedia-peg]: https://en.wikipedia.org/wiki/Parsing_expression_grammar
7//!
8//! ## Features
9//!
10//! * Parse input from `&str`, `&[u8]`, `&[T]` or custom types implementing
11//! traits
12//! * Customizable reporting of parse errors
13//! * Rules can accept arguments to create reusable rule templates
14//! * Precedence climbing for prefix/postfix/infix expressions
15//! * Helpful `rustc` error messages for errors in the grammar definition or the
16//! Rust code embedded within it
17//! * Rule-level tracing to debug grammars
18//!
19//! ## Overview
20//!
21//! The `peg::parser!{}` macro encloses a `grammar NAME() for INPUT_TYPE { ...
22//! }` definition containing a set of rules which match components of your
23//! language.
24//!
//! Rules are defined with `rule NAME(PARAMETERS) -> RETURN_TYPE = PEG_EXPR`.
//! The body of the rule, following the `=`, is a PEG expression defining how
//! the input is matched to produce a value.
28//!
29//! PEG expressions are evaluated at a particular position of the input. When an
30//! expression matches, it advances the position and optionally returns a value.
//! The expression syntax and behavior are [documented
//! below](#expression-reference).
33//!
34//! The macro expands to a Rust `mod` containing a function for each rule marked
35//! `pub` in the grammar. To parse an input sequence, call one of these
36//! functions. The call returns a `Result<T, ParseError>` carrying either the
37//! successfully parsed value returned by the rule, or a `ParseError` containing
38//! the failure position and the set of tokens expected there.
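//!
//! As a minimal sketch of calling a generated function and inspecting a failure: the grammar
//! name `digits` is illustrative, and the `location.line` / `location.column` fields assume the
//! line/column position type that `ParseError` uses for `str` input in peg 0.8.
//!
//! ```rust
//! peg::parser!{
//!     grammar digits() for str {
//!         pub rule number() -> u32
//!             = n:$(['0'..='9']+) {? n.parse().or(Err("u32")) }
//!     }
//! }
//!
//! fn main() {
//!     // Success: the value returned by the rule is wrapped in `Ok`.
//!     assert_eq!(digits::number("42"), Ok(42));
//!
//!     // Failure: the error records where parsing could not continue.
//!     let err = digits::number("4x").unwrap_err();
//!     assert_eq!((err.location.line, err.location.column), (1, 2));
//!     println!("{}", err);
//! }
//! ```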
39//!
40//! ## Example
41//!
42//! Parse a comma-separated list of numbers surrounded by brackets into a `Vec<u32>`:
43//!
//! ```rust
//! peg::parser!{
//!     grammar list_parser() for str {
//!         rule number() -> u32
//!             = n:$(['0'..='9']+) {? n.parse().or(Err("u32")) }
//!
//!         pub rule list() -> Vec<u32>
//!             = "[" l:(number() ** ",") "]" { l }
//!     }
//! }
//!
//! pub fn main() {
//!     assert_eq!(list_parser::list("[1,1,2,3,5,8]"), Ok(vec![1, 1, 2, 3, 5, 8]));
//! }
//! ```
59//!
60//! ## Expression Reference
61//!
62//! ### Atoms
63//!
64//! * `"keyword"` - _Literal:_ match a literal string.
65//! * `['0'..='9']` - _Pattern:_ match a single element that matches a Rust `match`-style
66//! pattern. [(details)](#pattern-expressions)
67//! * `[^ '0'..='9']` - _Inverted pattern:_ match a single element that does not match a Rust `match`-style
68//! pattern. [(details)](#pattern-expressions)
69//! * `some_rule()` - _Rule:_ match a rule defined elsewhere in the grammar and return its
70//! result. Arguments in the parentheses are Rust expressions.
71//! * `_` or `__` or `___` - _Rule (underscore):_ As a special case, rule names
72//! consisting of underscores can be defined and invoked without parentheses. These are
73//! conventionally used to match whitespace between tokens.
74//! * `(e)` - _Parentheses:_ wrap an expression into a group to override
75//! normal precedence. Returns the same value as the inner expression. (Use
76//! an _Action_ block to set the return value for a sequence).
77//!
78//! ### Combining
79//!
80//! * `e1 e2 e3` - _Sequence:_ match expressions in sequence (`e1` followed by `e2` followed by
81//! `e3`), ignoring the return values.
82//! * `a:e1 e2 b:e3 c:e4 { rust }` - _Action:_ match `e1`, `e2`, `e3`, `e4` in
83//! sequence, like above. If they match successfully, run the Rust code in
84//! the block and return its return value. The variable names before the
85//! colons in the sequence are bound to the results of the
86//! corresponding expressions. It is important that the Rust code embedded
87//! in the grammar is deterministic and free of side effects, as it may be
88//! called multiple times.
89//! * `a:e1 b:e2 c:e3 {? rust }` - _Conditional action:_ Like above, but the
90//! Rust block returns a `Result<T, &str>` instead of a value directly. On
91//! `Ok(v)`, it matches successfully and returns `v`. On `Err(e)`, the match
92//! of the entire expression fails and it tries alternatives or reports a
93//! parse failure with the `&str` `e`.
94//! * `e1 / e2 / e3` - _Ordered choice:_ try to match `e1`. If the match succeeds, return its
95//! result, otherwise try `e2`, and so on.
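//!
//! The combining forms above can be sketched together as follows. The grammar `kv` and its rule
//! names are hypothetical, chosen only for illustration.
//!
//! ```rust
//! peg::parser!{
//!     grammar kv() for str {
//!         // The slice expression already returns a value, so no action block is needed.
//!         rule key() -> &'input str = $(['a'..='z']+)
//!
//!         // Ordered choice: each alternative carries its own action.
//!         rule value() -> bool
//!             = "true" { true }
//!             / "false" { false }
//!
//!         // Action: `k` and `v` are bound to the results of `key()` and `value()`.
//!         pub rule pair() -> (&'input str, bool)
//!             = k:key() "=" v:value() { (k, v) }
//!     }
//! }
//!
//! fn main() {
//!     assert_eq!(kv::pair("debug=true"), Ok(("debug", true)));
//!     assert_eq!(kv::pair("verbose=false"), Ok(("verbose", false)));
//!     assert!(kv::pair("debug=maybe").is_err());
//! }
//! ```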
96//!
97//! ### Repetition
98//! * `expression?` - _Optional:_ match zero or one repetitions of `expression`. Returns an
99//! `Option`.
100//! * `expression*` - _Repeat:_ match zero or more repetitions of `expression` and return the
101//! results as a `Vec`.
102//! * `expression+` - _One-or-more:_ match one or more repetitions of `expression` and return the
103//! results as a `Vec`.
//! * `expression*<n,m>` - _Range repeat:_ match between `n` and `m` repetitions of `expression`
//!   and return the results as a `Vec`. [(details)](#repeat-ranges)
106//! * `expression ** delim` - _Delimited repeat:_ match zero or more repetitions of `expression`
107//! delimited with `delim` and return the results as a `Vec`.
108//! * `expression **<n,m> delim` - _Delimited repeat (range):_ match between `n` and `m` repetitions of `expression`
109//! delimited with `delim` and return the results as a `Vec`. [(details)](#repeat-ranges)
110//! * `expression ++ delim` - _Delimited repeat (one or more):_ match one or more repetitions of `expression`
111//! delimited with `delim` and return the results as a `Vec`.
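//!
//! A short sketch of the return types of several repetition operators. The grammar `reps` and
//! its rules are illustrative names, not part of the library.
//!
//! ```rust
//! peg::parser!{
//!     grammar reps() for str {
//!         rule digit() -> char = ['0'..='9']
//!
//!         // `?` yields an `Option`, `+` yields a `Vec`.
//!         pub rule signed() -> (Option<char>, Vec<char>)
//!             = s:['+' | '-']? d:digit()+ { (s, d) }
//!
//!         // Between 4 and 6 digits, inclusive.
//!         pub rule pin() -> Vec<char> = digit()*<4,6>
//!
//!         // One or more digits separated by commas.
//!         pub rule csv() -> Vec<char> = digit() ++ ","
//!     }
//! }
//!
//! fn main() {
//!     assert_eq!(reps::signed("-12"), Ok((Some('-'), vec!['1', '2'])));
//!     assert_eq!(reps::signed("7"), Ok((None, vec!['7'])));
//!     assert_eq!(reps::pin("1234"), Ok(vec!['1', '2', '3', '4']));
//!     assert!(reps::pin("123").is_err());
//!     assert_eq!(reps::csv("1,2,3"), Ok(vec!['1', '2', '3']));
//!     assert!(reps::csv("").is_err());
//! }
//! ```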
112//!
113//! ### Special
114//! * `$(e)` - _Slice:_ match the expression `e`, and return the slice of the input
115//! corresponding to the match.
116//! * `&e` - _Positive lookahead:_ Match only if `e` matches at this position,
117//! without consuming any characters.
118//! * `!e` - _Negative lookahead:_ Match only if `e` does not match at this
119//! position, without consuming any characters.
120//! * `position!()` - return a `usize` representing the current offset into
121//! the input without consuming anything.
122//! * `quiet!{ e }` - match the expression `e`, but don't report literals within it as "expected" in
123//! error messages.
124//! * `expected!("something")` - fail to match, and report the specified string as expected
125//! at the current location.
126//! * `precedence!{ ... }` - Parse infix, prefix, or postfix expressions by precedence climbing.
127//! [(details)](#precedence-climbing)
128//! * `#{|input, pos| ... }` - _Custom:_ The provided closure is passed the full input and current
129//! parse position, and returns a [`RuleResult`].
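//!
//! The slice, lookahead, and `position!()` forms might be combined as in this sketch, where the
//! grammar `special` and its rule names are hypothetical:
//!
//! ```rust
//! peg::parser!{
//!     grammar special() for str {
//!         // `$()` returns the matched text; `position!()` records the offsets around it.
//!         pub rule spanned_word() -> (usize, &'input str, usize)
//!             = " "* start:position!() w:$(['a'..='z']+) end:position!() { (start, w, end) }
//!
//!         // The keyword `let`, provided it is not the prefix of a longer word.
//!         rule kw_let() = "let" !['a'..='z']
//!
//!         // Any lowercase word that is not the keyword `let`.
//!         pub rule ident() -> &'input str
//!             = !kw_let() w:$(['a'..='z']+) { w }
//!     }
//! }
//!
//! fn main() {
//!     assert_eq!(special::spanned_word("  abc"), Ok((2, "abc", 5)));
//!     assert_eq!(special::ident("letter"), Ok("letter"));
//!     assert!(special::ident("let").is_err());
//! }
//! ```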
130//!
131//! ## Expression details
132//!
133//! ### Pattern expressions
134//!
135//! The `[pat]` syntax expands into a [Rust `match`
136//! pattern](https://doc.rust-lang.org/book/ch18-03-pattern-syntax.html) against the next character
137//! (or element) of the input.
138//!
139//! When the pattern begins with `^`, the matching behavior is inverted:
140//! the expression succeeds only if the pattern does *not* match.
141//! `[^' ']` matches any character other than a space.
142//!
//! To match sets of characters, use Rust's `..=` inclusive range pattern
//! syntax and `|` to match multiple patterns. For example, `['a'..='z' | 'A'..='Z']` matches an
//! upper- or lower-case ASCII letter.
146//!
147//! If your input type is a slice of an enum type, a pattern could match an enum variant like
148//! `[Token::Operator('+')]`.
149//!
150//! Variables captured by the pattern are accessible in a subsequent action
151//! block: `[Token::Integer(i)] { i }`.
152//!
153//! The pattern expression also evaluates to the matched element, which can be
154//! captured into a variable or used as the return value of a rule: `c:['+'|'-']`.
155//!
156//! Like Rust `match`, pattern expressions support guard expressions:
157//! `[c if c.is_ascii_digit()]`.
158//!
159//! `[_]` matches any single element. As this always matches except at end-of-file, combining it
160//! with negative lookahead as `![_]` is the idiom for matching EOF in PEG.
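//!
//! As a sketch of patterns over a token slice, the following uses a hypothetical `Token` enum
//! and grammar `sum`; neither is part of `peg`. It derives `Clone` and `Copy` on the assumption
//! that the built-in `[T]` input implementation hands elements to the pattern by value.
//!
//! ```rust
//! #[derive(Clone, Copy, Debug, PartialEq)]
//! pub enum Token {
//!     Integer(i64),
//!     Operator(char),
//! }
//!
//! peg::parser!{
//!     grammar sum() for [Token] {
//!         // Capture the payload of a variant from the pattern.
//!         rule integer() -> i64 = [Token::Integer(i)] { i }
//!
//!         // Guard expression: accept only additive operators.
//!         rule sign() -> i64
//!             = [Token::Operator(c) if c == '+' || c == '-'] { if c == '+' { 1 } else { -1 } }
//!
//!         pub rule total() -> i64
//!             = first:integer() rest:(s:sign() i:integer() { s * i })* {
//!                 first + rest.iter().sum::<i64>()
//!             }
//!     }
//! }
//!
//! fn main() {
//!     let tokens = [
//!         Token::Integer(5),
//!         Token::Operator('-'),
//!         Token::Integer(2),
//!         Token::Operator('+'),
//!         Token::Integer(4),
//!     ];
//!     assert_eq!(sum::total(&tokens), Ok(7));
//!
//!     // `*` is rejected by the guard, so the input is not fully consumed.
//!     let bad = [Token::Integer(1), Token::Operator('*'), Token::Integer(2)];
//!     assert!(sum::total(&bad).is_err());
//! }
//! ```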
161//!
162//! ### Repeat ranges
163//!
164//! The repeat operators `*` and `**` can be followed by an optional range specification of the
165//! form `<n>` (exact), `<n,>` (min-inclusive), `<,m>` (max-inclusive) or `<n,m>` (range-inclusive), where `n` and `m` are either
166//! integers, or a Rust `usize` expression enclosed in `{}`.
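//!
//! A sketch of both kinds of bound, using a hypothetical grammar `netstring`. It assumes that a
//! value bound earlier in the same sequence (`n` here) can be used inside the `{}` expression.
//!
//! ```rust
//! peg::parser!{
//!     grammar netstring() for str {
//!         rule length() -> usize
//!             = n:$(['0'..='9']+) {? n.parse().or(Err("length")) }
//!
//!         // `*<{n}>` repeats exactly `n` times, where `n` was parsed just before.
//!         pub rule payload() -> &'input str
//!             = n:length() ":" s:$([_]*<{n}>) "," { s }
//!
//!         // Integer bounds: two to four hexadecimal digits.
//!         pub rule hex() -> &'input str
//!             = $(['0'..='9' | 'a'..='f']*<2,4>)
//!     }
//! }
//!
//! fn main() {
//!     assert_eq!(netstring::payload("5:hello,"), Ok("hello"));
//!     assert!(netstring::payload("5:hi,").is_err());
//!     assert_eq!(netstring::hex("beef"), Ok("beef"));
//!     assert!(netstring::hex("a").is_err());
//! }
//! ```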
167//!
168//! ### Precedence climbing
169//!
170//! `precedence!{ rules... }` provides a convenient way to parse infix, prefix, and postfix
171//! operators using the [precedence
172//! climbing](http://eli.thegreenplace.net/2012/08/02/parsing-expressions-by-precedence-climbing)
173//! algorithm.
174//!
//! ```rust,no_run
//! # peg::parser!{grammar doc() for str {
//! # pub rule number() -> i64 = "..." { 0 }
//! pub rule arithmetic() -> i64 = precedence!{
//!     x:(@) "+" y:@ { x + y }
//!     x:(@) "-" y:@ { x - y }
//!     --
//!     x:(@) "*" y:@ { x * y }
//!     x:(@) "/" y:@ { x / y }
//!     --
//!     x:@ "^" y:(@) { x.pow(y as u32) }
//!     --
//!     n:number() { n }
//!     "(" e:arithmetic() ")" { e }
//! }
//! # }}
//! # fn main() {}
//! ```
193//!
//! Each `--` introduces a new precedence level that binds more tightly than previous precedence
//! levels. Each level consists of one or more operator rules, each followed by a Rust action
//! expression.
//!
//! The `(@)` and `@` are the operands, and the parentheses indicate associativity: `(@)` on the
//! left operand makes an infix operator left-associative, and `(@)` on the right operand makes it
//! right-associative. An operator rule beginning and ending with `@` is an infix expression. A
//! prefix rule has a single `@` at the end, a postfix rule has a single `@` at the beginning, and
//! atoms do not include `@`.
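//!
//! To see the precedence levels and associativity in action, here is a self-contained sketch of
//! the same rule with a working `number()` rule. The grammar name `calc` is illustrative.
//!
//! ```rust
//! peg::parser!{
//!     grammar calc() for str {
//!         rule number() -> i64
//!             = n:$(['0'..='9']+) {? n.parse().or(Err("i64")) }
//!
//!         pub rule arithmetic() -> i64 = precedence!{
//!             x:(@) "+" y:@ { x + y }
//!             x:(@) "-" y:@ { x - y }
//!             --
//!             x:(@) "*" y:@ { x * y }
//!             x:(@) "/" y:@ { x / y }
//!             --
//!             x:@ "^" y:(@) { x.pow(y as u32) }
//!             --
//!             n:number() { n }
//!             "(" e:arithmetic() ")" { e }
//!         }
//!     }
//! }
//!
//! fn main() {
//!     // `*` binds more tightly than `+`.
//!     assert_eq!(calc::arithmetic("1+2*3"), Ok(7));
//!     // `-` is left-associative: (10 - 4) - 3.
//!     assert_eq!(calc::arithmetic("10-4-3"), Ok(3));
//!     // `^` is right-associative: 2 ^ (3 ^ 2).
//!     assert_eq!(calc::arithmetic("2^3^2"), Ok(512));
//!     assert_eq!(calc::arithmetic("(1+2)*3"), Ok(9));
//! }
//! ```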
201//!
202//! ## Input types
203//!
204//! The first line of the grammar declares an input type. This is normally
205//! `str`, but `rust-peg` handles input types through a series of traits. The
//! library comes with implementations for `str`, `[u8]`, and `[T]`. Implement the
//! traits below to use your own types as input to `peg` grammars:
208//!
209//! * [`Parse`] is the base trait required for all inputs. The others are only required to use the
210//! corresponding expressions.
211//! * [`ParseElem`] implements the `[_]` pattern operator, with a method returning the next item of
212//! the input to match.
213//! * [`ParseLiteral`] implements matching against a `"string"` literal.
214//! * [`ParseSlice`] implements the `$()` operator, returning a slice from a span of indexes.
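//!
//! Before writing custom implementations, it may help to see one of the built-in input types
//! other than `str`. This sketch parses `[u8]`; the grammar `header` is hypothetical, and the
//! example assumes that string literals are compared byte-wise against `[u8]` input.
//!
//! ```rust
//! peg::parser!{
//!     grammar header() for [u8] {
//!         // Patterns match single `u8` elements; `$()` returns a `&[u8]` slice.
//!         pub rule version() -> (u8, &'input [u8])
//!             = "v" major:[b'0'..=b'9'] "." minor:$([b'0'..=b'9']+) { (major - b'0', minor) }
//!     }
//! }
//!
//! fn main() {
//!     assert_eq!(header::version(b"v1.25"), Ok((1, &b"25"[..])));
//!     assert!(header::version(b"v1").is_err());
//! }
//! ```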
215//!
216//! As a more complex example, the body of the `peg::parser!{}` macro itself is
217//! parsed with `peg`, using a [definition of these traits][gh-flat-token-tree]
218//! for a type that wraps Rust's `TokenTree`.
219//!
220//! [gh-flat-token-tree]: https://github.com/kevinmehall/rust-peg/blob/master/peg-macros/tokens.rs
221//!
222//! ## End-of-file handling
223//!
224//! Normally, parsers report an error if the top-level rule matches without consuming all the input.
225//! To allow matching a prefix of the input, add the `#[no_eof]` attribute before `pub rule`.
//! Take care if such a rule ends with an `x()*` repeat expression: a malformed `x` at the last
//! position simply ends the repetition, so the rule still succeeds and the error goes unreported.
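//!
//! The difference can be sketched with two otherwise identical rules. The grammar `prefix` and
//! the rule names `strict` and `lenient` are illustrative only.
//!
//! ```rust
//! peg::parser!{
//!     grammar prefix() for str {
//!         rule number() -> u32
//!             = n:$(['0'..='9']+) {? n.parse().or(Err("u32")) }
//!
//!         // Without `#[no_eof]`, trailing input is an error.
//!         pub rule strict() -> Vec<u32> = number() ** ","
//!
//!         // With `#[no_eof]`, the rule may stop before the end of the input.
//!         #[no_eof]
//!         pub rule lenient() -> Vec<u32> = number() ** ","
//!     }
//! }
//!
//! fn main() {
//!     assert!(prefix::strict("1,2,x").is_err());
//!     // The malformed `x` ends the repetition and is never reported.
//!     assert_eq!(prefix::lenient("1,2,x"), Ok(vec![1, 2]));
//! }
//! ```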
228//!
229//! ## Rule parameters
230//!
231//! Rules can be parameterized with types, lifetimes, and values, just like Rust functions.
232//!
233//! In addition to Rust values, rules can also accept PEG expression fragments as arguments by using
234//! `rule<R>` as a parameter type. When calling such a rule, use `<>` around a PEG expression in the
235//! argument list to capture the expression and pass it to the rule.
236//!
237//! For example:
238//!
239//! ```rust,no_run
240//! # peg::parser!{grammar doc() for str {
//! rule num_radix(radix: u32) -> u32
//!     = n:$(['0'..='9']+) {? u32::from_str_radix(n, radix).or(Err("number")) }
243//!
244//! rule list<T>(x: rule<T>) -> Vec<T> = "[" v:(x() ** ",") ","? "]" {v}
245//!
246//! pub rule octal_list() -> Vec<u32> = list(<num_radix(8)>)
247//! # }}
248//! # fn main() {}
249//! ```
250//!
251//! ## Failure reporting
252//!
253//! When a match fails, position information is automatically recorded to report a set of
254//! "expected" tokens that would have allowed the parser to advance further.
255//!
//! Some rules should never appear in error messages, and can be suppressed with `quiet!{e}`:
//!
257//! ```rust,no_run
258//! # peg::parser!{grammar doc() for str {
259//! rule whitespace() = quiet!{[' ' | '\n' | '\t']+}
260//! # }}
261//! # fn main() {}
262//! ```
263//!
264//! If you want the "expected" set to contain a more helpful string instead of character sets, you
265//! can use `quiet!{}` and `expected!()` together:
266//!
267//! ```rust,no_run
268//! # peg::parser!{grammar doc() for str {
//! rule identifier()
//!     = quiet!{[ 'a'..='z' | 'A'..='Z']['a'..='z' | 'A'..='Z' | '0'..='9' ]*}
//!     / expected!("identifier")
272//! # }}
273//! # fn main() {}
274//! ```
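//!
//! A sketch of how this shows up in the returned error, using a hypothetical grammar `names`.
//! It assumes the `tokens()` iterator on the error's `expected` set, as found in peg 0.8.
//!
//! ```rust
//! peg::parser!{
//!     grammar names() for str {
//!         pub rule identifier() -> &'input str
//!             = quiet!{ $(['a'..='z' | 'A'..='Z'] ['a'..='z' | 'A'..='Z' | '0'..='9']*) }
//!             / expected!("identifier")
//!     }
//! }
//!
//! fn main() {
//!     assert_eq!(names::identifier("x1"), Ok("x1"));
//!
//!     // The character sets inside `quiet!{}` are not reported; only "identifier" is.
//!     let err = names::identifier("1x").unwrap_err();
//!     let expected: Vec<&str> = err.expected.tokens().collect();
//!     assert_eq!(expected, vec!["identifier"]);
//! }
//! ```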
275//!
276//! ## Imports
277//!
278//! ```rust,no_run
//! mod ast {
//!     pub struct Expr;
//! }
//!
//! peg::parser!{grammar doc() for str {
//!     use self::ast::Expr;
//! }}
286//! # fn main() {}
287//! ```
288//!
289//! The grammar may begin with a series of `use` declarations, just like in Rust, which are
290//! included in the generated module. Unlike normal `mod {}` blocks, `use super::*` is inserted by
291//! default, so you don't have to deal with this most of the time.
292//!
293//! ## Rustdoc comments
294//!
295//! `rustdoc` comments with `///` before a `grammar` or `pub rule` are propagated to the resulting
296//! module or function:
297//!
298//! ```rust,no_run
299//! # peg::parser!{grammar doc() for str {
300//! /// Parse an array expression.
301//! pub rule array() -> Vec<i32> = "[...]" { vec![] }
302//! # }}
303//! # fn main() {}
304//! ```
305//!
//! As with all procedural macros, non-doc comments are ignored by the lexer and can be used as
//! in any other Rust code.
308//!
309//! ## Caching and left recursion
310//!
311//! A `rule` without parameters can be prefixed with `#[cache]` if it is likely
312//! to be checked repeatedly in the same position. This memoizes the rule result
313//! as a function of input position, in the style of a [packrat
314//! parser][wp-peg-packrat].
315//!
316//! [wp-peg-packrat]: https://en.wikipedia.org/wiki/Parsing_expression_grammar#Implementing_parsers_from_parsing_expression_grammars
317//!
//! However, idiomatic code avoids structures that parse the same input
//! repeatedly, so the use of `#[cache]` is often not a performance win. For simple
//! rules, re-matching may also be cheaper than the hash table lookup and insert
//! that caching adds.
322//!
323//! For example, a complex rule called `expr` might benefit from caching if used
324//! like `expr() "x" / expr() "y" / expr() "z"`, but this could be rewritten to
325//! `expr() ("x" / "y" / "z")` which would be even faster.
326//!
327//! `#[cache_left_rec]` extends the `#[cache]` mechanism with the ability to resolve
328//! left-recursive rules, which are otherwise an error.
329//!
330//! The `precedence!{}` syntax is another way to handle nested operators and avoid
331//! repeatedly matching an expression rule.
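//!
//! As a sketch of `#[cache_left_rec]`, the rule below refers to itself in the leftmost position
//! of its first two alternatives. The grammar name `leftrec` is illustrative, and the attribute
//! is assumed to be available in the version of `peg` you are using.
//!
//! ```rust
//! peg::parser!{
//!     grammar leftrec() for str {
//!         rule number() -> i64
//!             = n:$(['0'..='9']+) {? n.parse().or(Err("i64")) }
//!
//!         #[cache_left_rec]
//!         pub rule expr() -> i64
//!             = l:expr() "+" r:number() { l + r }
//!             / l:expr() "-" r:number() { l - r }
//!             / number()
//!     }
//! }
//!
//! fn main() {
//!     // Left recursion yields left-associative results: (10 - 4) - 3.
//!     assert_eq!(leftrec::expr("10-4-3"), Ok(3));
//!     assert_eq!(leftrec::expr("1+2+3"), Ok(6));
//! }
//! ```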
332//!
333//! ## Tracing
334//!
335//! If you pass the `peg/trace` feature to Cargo when building your project, a
336//! trace of the rules attempted and matched will be printed to stdout when
//! parsing. For example:
//!
338//! ```sh
339//! $ cargo run --features peg/trace
340//! ...
341//! [PEG_TRACE] Matched rule type at 8:5
342//! [PEG_TRACE] Attempting to match rule ident at 8:12
343//! [PEG_TRACE] Attempting to match rule letter at 8:12
344//! [PEG_TRACE] Failed to match rule letter at 8:12
345//! ...
346//! ```

extern crate peg_macros;
extern crate peg_runtime as runtime;

pub use peg_macros::parser;
pub use runtime::*;