peg/
lib.rs

//! `rust-peg` is a simple yet flexible parser generator that makes it easy to
//! write robust parsers. Based on the [Parsing Expression
//! Grammar][wikipedia-peg] formalism, it provides a Rust macro that builds a
//! recursive descent parser from a concise definition of the grammar.
//!
//! [wikipedia-peg]: https://en.wikipedia.org/wiki/Parsing_expression_grammar
//!
//! ## Features
//!
//! * Parse input from `&str`, `&[u8]`, `&[T]` or custom types implementing
//!   the [input traits](#input-types)
//! * Customizable reporting of parse errors
//! * Rules can accept arguments to create reusable rule templates
//! * Precedence climbing for prefix/postfix/infix expressions
//! * Helpful `rustc` error messages for errors in the grammar definition or the
//!   Rust code embedded within it
//! * Rule-level tracing to debug grammars
//!
//! ## Overview
//!
//! The `peg::parser!{}` macro encloses a `grammar NAME() for INPUT_TYPE { ...
//! }` definition containing a set of rules which match components of your
//! language.
//!
//! Rules are defined with `rule NAME(PARAMETERS) -> RETURN_TYPE = PEG_EXPR`.
//! The body of the rule, following the `=`, is a PEG expression, defining how
//! the input is matched to produce a value.
//!
//! PEG expressions are evaluated at a particular position of the input. When an
//! expression matches, it advances the position and optionally returns a value.
//! The expression syntax and behavior are [documented
//! below](#expression-reference).
//!
//! The macro expands to a Rust `mod` containing a function for each rule marked
//! `pub` in the grammar. To parse an input sequence, call one of these
//! functions. The call returns a `Result<T, ParseError>` carrying either the
//! successfully parsed value returned by the rule, or a `ParseError` containing
//! the failure position and the set of tokens expected there.
//!
//! ## Example
//!
//! Parse a comma-separated list of numbers surrounded by brackets into a `Vec<u32>`:
//!
//! ```rust
//! peg::parser!{
//!   grammar list_parser() for str {
//!     rule number() -> u32
//!       = n:$(['0'..='9']+) {? n.parse().or(Err("u32")) }
//!
//!     pub rule list() -> Vec<u32>
//!       = "[" l:(number() ** ",") "]" { l }
//!   }
//! }
//!
//! pub fn main() {
//!     assert_eq!(list_parser::list("[1,1,2,3,5,8]"), Ok(vec![1, 1, 2, 3, 5, 8]));
//! }
//! ```
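//!
//! When the input does not match, the same call returns the `Err` side. The
//! sketch below reuses the grammar above and inspects the error. It assumes the
//! public `location` and `expected` fields of `ParseError`, the `line`,
//! `column`, and `offset` fields of the `str` location type, and the
//! `tokens()` iterator on the expected set, so check these names against the
//! version of the runtime crate you use.
//!
//! ```rust
//! peg::parser!{
//!   grammar list_parser() for str {
//!     rule number() -> u32
//!       = n:$(['0'..='9']+) {? n.parse().or(Err("u32")) }
//!
//!     pub rule list() -> Vec<u32>
//!       = "[" l:(number() ** ",") "]" { l }
//!   }
//! }
//!
//! pub fn main() {
//!     // The `x` after the comma is not a digit, so parsing fails there.
//!     let err = list_parser::list("[1,x]").unwrap_err();
//!
//!     // For `str` input the location carries a 1-based line and column and a
//!     // 0-based byte offset.
//!     assert_eq!(err.location.line, 1);
//!     assert_eq!(err.location.column, 4);
//!     assert_eq!(err.location.offset, 3);
//!
//!     // `expected` holds the tokens that would have let the parser advance.
//!     assert!(err.expected.tokens().count() > 0);
//! }
//! ```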
//!
//! ## Expression Reference
//!
//! ### Atoms
//!
//!   * `"keyword"` - _Literal:_ match a literal string.
//!   * `['0'..='9']` - _Pattern:_ match a single element that matches a Rust `match`-style
//!     pattern. [(details)](#pattern-expressions)
//!   * `[^ '0'..='9']` - _Inverted pattern:_ match a single element that does not match a Rust `match`-style
//!     pattern. [(details)](#pattern-expressions)
//!   * `some_rule()` - _Rule:_ match a rule defined elsewhere in the grammar and return its
//!     result. Arguments in the parentheses are Rust expressions.
//!   * `_` or `__` or `___` - _Rule (underscore):_ As a special case, rule names
//!     consisting of underscores can be defined and invoked without parentheses. These are
//!     conventionally used to match whitespace between tokens.
//!   * `(e)` - _Parentheses:_ wrap an expression into a group to override
//!     normal precedence. Returns the same value as the inner expression. (Use
//!     an _Action_ block to set the return value for a sequence).
//!
//! ### Combining
//!
//!   * `e1 e2 e3` - _Sequence:_ match expressions in sequence (`e1` followed by `e2` followed by
//!     `e3`), ignoring the return values.
//!   * `a:e1 e2 b:e3 c:e4 { rust }` - _Action:_ match `e1`, `e2`, `e3`, `e4` in
//!     sequence, like above. If they match successfully, run the Rust code in
//!     the block and return its return value. The variable names before the
//!     colons in the sequence are bound to the results of the
//!     corresponding expressions. It is important that the Rust code embedded
//!     in the grammar is deterministic and free of side effects, as it may be
//!     called multiple times.
//!   * `a:e1 b:e2 c:e3 {? rust }` - _Conditional action:_ Like above, but the
//!     Rust block returns a `Result<T, &str>` instead of a value directly. On
//!     `Ok(v)`, it matches successfully and returns `v`. On `Err(e)`, the match
//!     of the entire expression fails and it tries alternatives or reports a
//!     parse failure with the `&str` `e`.
//!   * `e1 / e2 / e3` - _Ordered choice:_ try to match `e1`. If the match succeeds, return its
//!     result, otherwise try `e2`, and so on.
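//!
//! A short sketch of how the atoms and combinators above fit together. The
//! grammar name `config` and its rules are hypothetical, chosen only for
//! illustration, and the `'input` lifetime is assumed to be the name the macro
//! provides for borrowing from the input.
//!
//! ```rust
//! peg::parser!{
//!   grammar config() for str {
//!     // Underscore rule: optional whitespace, invoked without parentheses.
//!     rule _ = [' ' | '\t']*
//!
//!     rule key() -> &'input str = $(['a'..='z']+)
//!
//!     // Conditional action: fail the match if the digits do not fit in a `u8`.
//!     rule value() -> u8
//!       = n:$(['0'..='9']+) {? n.parse().or(Err("u8")) }
//!
//!     // Action: bind two results of the sequence and build a tuple from them.
//!     pub rule pair() -> (&'input str, u8)
//!       = k:key() _ "=" _ v:value() { (k, v) }
//!
//!     // Ordered choice: alternatives are tried from left to right.
//!     pub rule boolean() -> bool
//!       = "true" { true } / "false" { false }
//!   }
//! }
//!
//! pub fn main() {
//!     assert_eq!(config::pair("port = 80"), Ok(("port", 80)));
//!     // 300 does not fit in a `u8`, so the conditional action rejects it.
//!     assert!(config::pair("port = 300").is_err());
//!     assert_eq!(config::boolean("false"), Ok(false));
//! }
//! ```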
//!
//! ### Repetition
//!
//!   * `expression?` - _Optional:_ match zero or one repetitions of `expression`. Returns an
//!     `Option`.
//!   * `expression*` - _Repeat:_ match zero or more repetitions of `expression` and return the
//!     results as a `Vec`.
//!   * `expression+` - _One-or-more:_ match one or more repetitions of `expression` and return the
//!     results as a `Vec`.
//!   * `expression*<n,m>` - _Range repeat:_ match between `n` and `m` repetitions of `expression`
//!     and return the results as a `Vec`. [(details)](#repeat-ranges)
//!   * `expression ** delim` - _Delimited repeat:_ match zero or more repetitions of `expression`
//!     delimited with `delim` and return the results as a `Vec`.
//!   * `expression **<n,m> delim` - _Delimited repeat (range):_ match between `n` and `m` repetitions of `expression`
//!     delimited with `delim` and return the results as a `Vec`. [(details)](#repeat-ranges)
//!   * `expression ++ delim` - _Delimited repeat (one or more):_ match one or more repetitions of `expression`
//!     delimited with `delim` and return the results as a `Vec`.
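//!
//! The return types of the repetition operators can be seen in a small
//! example. The grammar `repeats` and its rules are hypothetical names used
//! only for this sketch.
//!
//! ```rust
//! peg::parser!{
//!   grammar repeats() for str {
//!     rule digit() -> char = ['0'..='9']
//!     rule word() -> &'input str = $(['a'..='z']+)
//!
//!     // `?` yields an `Option`, `+` yields a `Vec`.
//!     pub rule signed() -> (Option<char>, Vec<char>)
//!       = s:['+' | '-']? d:digit()+ { (s, d) }
//!
//!     // `**` accepts an empty list; `++` requires at least one element.
//!     pub rule any_words() -> Vec<&'input str> = w:(word() ** ",") { w }
//!     pub rule some_words() -> Vec<&'input str> = w:(word() ++ ",") { w }
//!   }
//! }
//!
//! pub fn main() {
//!     assert_eq!(repeats::signed("-42"), Ok((Some('-'), vec!['4', '2'])));
//!     assert_eq!(repeats::signed("7"), Ok((None, vec!['7'])));
//!     assert_eq!(repeats::any_words(""), Ok(vec![]));
//!     assert!(repeats::some_words("").is_err());
//!     assert_eq!(repeats::some_words("a,bc"), Ok(vec!["a", "bc"]));
//! }
//! ```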
//!
//! ### Special
//!
//!   * `$(e)` - _Slice:_ match the expression `e`, and return the slice of the input
//!     corresponding to the match.
//!   * `&e` - _Positive lookahead:_ Match only if `e` matches at this position,
//!     without consuming any characters.
//!   * `!e` - _Negative lookahead:_ Match only if `e` does not match at this
//!     position, without consuming any characters.
//!   * `position!()` - return a `usize` representing the current offset into
//!     the input without consuming anything.
//!   * `quiet!{ e }` - match the expression `e`, but don't report literals within it as "expected" in
//!     error messages.
//!   * `expected!("something")` - fail to match, and report the specified string as expected
//!     at the current location.
//!   * `precedence!{ ... }` - Parse infix, prefix, or postfix expressions by precedence climbing.
//!     [(details)](#precedence-climbing)
//!   * `#{|input, pos| ... }` - _Custom:_ The provided closure is passed the full input and current
//!     parse position, and returns a [`RuleResult`].
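//!
//! As an illustration of slices, negative lookahead, and `position!()`, the
//! following sketch rejects keywords as identifiers and records the span of a
//! word. The grammar `words` and its rules are hypothetical names.
//!
//! ```rust
//! peg::parser!{
//!   grammar words() for str {
//!     // A keyword is `let` or `fn` not followed by another letter.
//!     rule keyword() = ("let" / "fn") !['a'..='z']
//!
//!     // Negative lookahead rejects keywords; `$()` returns the matched text.
//!     pub rule ident() -> &'input str
//!       = !keyword() s:$(['a'..='z']+) { s }
//!
//!     // `position!()` records offsets without consuming input.
//!     pub rule spanned() -> (usize, &'input str, usize)
//!       = " "* start:position!() s:$(['a'..='z']+) end:position!() { (start, s, end) }
//!   }
//! }
//!
//! pub fn main() {
//!     // `letter` merely starts with `let`, so it is still an identifier.
//!     assert_eq!(words::ident("letter"), Ok("letter"));
//!     assert!(words::ident("let").is_err());
//!     assert_eq!(words::spanned("  abc"), Ok((2, "abc", 5)));
//! }
//! ```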
//!
//! ## Expression details
//!
//! ### Pattern expressions
//!
//! The `[pat]` syntax expands into a [Rust `match`
//! pattern](https://doc.rust-lang.org/book/ch18-03-pattern-syntax.html) against the next character
//! (or element) of the input.
//!
//! When the pattern begins with `^`, the matching behavior is inverted:
//! the expression succeeds only if the pattern does *not* match.
//! `[^' ']` matches any character other than a space.
//!
//! To match sets of characters, use Rust's `..=` inclusive range pattern
//! syntax and `|` to match multiple patterns. For example `['a'..='z' | 'A'..='Z']` matches an
//! uppercase or lowercase ASCII letter.
//!
//! If your input type is a slice of an enum type, a pattern could match an enum variant like
//! `[Token::Operator('+')]`.
//!
//! Variables captured by the pattern are accessible in a subsequent action
//! block: `[Token::Integer(i)] { i }`.
//!
//! The pattern expression also evaluates to the matched element, which can be
//! captured into a variable or used as the return value of a rule: `c:['+'|'-']`.
//!
//! Like Rust `match`, pattern expressions support guard expressions:
//! `[c if c.is_ascii_digit()]`.
//!
//! `[_]` matches any single element. As this always matches except at end-of-file, combining it
//! with negative lookahead as `![_]` is the idiom for matching EOF in PEG.
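//!
//! The enum patterns mentioned above can be sketched with a grammar over a
//! token slice. The `Token` type and the grammar `tokens` are hypothetical.
//! The sketch assumes that the built-in `[T]` input hands elements to the
//! pattern by value, which is why `Token` derives `Copy`.
//!
//! ```rust
//! #[derive(Clone, Copy, Debug, PartialEq)]
//! pub enum Token {
//!     Integer(i64),
//!     Operator(char),
//! }
//!
//! peg::parser!{
//!   grammar tokens() for [Token] {
//!     // Variables bound by the pattern are visible in the action.
//!     rule integer() -> i64 = [Token::Integer(i)] { i }
//!
//!     // A guard restricts which operators are accepted as a sign.
//!     rule sign() -> char = [Token::Operator(c) if c == '+' || c == '-'] { c }
//!
//!     pub rule signed() -> i64
//!       = s:sign()? i:integer() { if s == Some('-') { -i } else { i } }
//!   }
//! }
//!
//! pub fn main() {
//!     let input = [Token::Operator('-'), Token::Integer(5)];
//!     assert_eq!(tokens::signed(&input), Ok(-5));
//!     assert_eq!(tokens::signed(&[Token::Integer(7)]), Ok(7));
//!     // `*` is an operator but fails the guard in `sign`.
//!     assert!(tokens::signed(&[Token::Operator('*'), Token::Integer(1)]).is_err());
//! }
//! ```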
//!
//! ### Repeat ranges
//!
//! The repeat operators `*` and `**` can be followed by an optional range specification of the
//! form `<n>` (exact), `<n,>` (min-inclusive), `<,m>` (max-inclusive) or `<n,m>`
//! (range-inclusive), where `n` and `m` are either integers, or a Rust `usize` expression
//! enclosed in `{}`.
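//!
//! A sketch of the range forms in use. The grammar `ranges` and its rules are
//! hypothetical names, and `at_least` is assumed to receive its `min` argument
//! after the input when called from Rust.
//!
//! ```rust
//! peg::parser!{
//!   grammar ranges() for str {
//!     rule hex() -> char = ['0'..='9' | 'a'..='f']
//!
//!     // Exactly six hex digits after the `#`.
//!     pub rule color() -> &'input str = "#" h:$(hex()*<6>) { h }
//!
//!     // One to three digits per octet, exactly four octets separated by dots.
//!     rule octet() -> u8
//!       = n:$(['0'..='9']*<1,3>) {? n.parse().or(Err("octet")) }
//!     pub rule address() -> Vec<u8> = o:(octet() **<4> ".") { o }
//!
//!     // The bound can be a Rust expression in braces, here a rule argument.
//!     pub rule at_least(min: usize) -> Vec<char> = h:hex()*<{min},> { h }
//!   }
//! }
//!
//! pub fn main() {
//!     assert_eq!(ranges::color("#ff8800"), Ok("ff8800"));
//!     assert!(ranges::color("#ff88").is_err());
//!     assert_eq!(ranges::address("127.0.0.1"), Ok(vec![127, 0, 0, 1]));
//!     assert!(ranges::address("1.2.3").is_err());
//!     assert_eq!(ranges::at_least("abc", 2), Ok(vec!['a', 'b', 'c']));
//!     assert!(ranges::at_least("a", 2).is_err());
//! }
//! ```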
//!
//! ### Precedence climbing
//!
//! `precedence!{ rules... }` provides a convenient way to parse infix, prefix, and postfix
//! operators using the [precedence
//! climbing](http://eli.thegreenplace.net/2012/08/02/parsing-expressions-by-precedence-climbing)
//! algorithm.
//!
//! ```rust,no_run
//! # peg::parser!{grammar doc() for str {
//! # pub rule number() -> i64 = "..." { 0 }
//! pub rule arithmetic() -> i64 = precedence!{
//!   x:(@) "+" y:@ { x + y }
//!   x:(@) "-" y:@ { x - y }
//!   --
//!   x:(@) "*" y:@ { x * y }
//!   x:(@) "/" y:@ { x / y }
//!   --
//!   x:@ "^" y:(@) { x.pow(y as u32) }
//!   --
//!   n:number() { n }
//!   "(" e:arithmetic() ")" { e }
//! }
//! # }}
//! # fn main() {}
//! ```
//!
//! Each `--` introduces a new precedence level that binds more tightly than previous precedence
//! levels. The levels consist of one or more operator rules each followed by a Rust action
//! expression.
//!
//! The `(@)` and `@` are the operands, and the parentheses indicate associativity. An operator
//! rule beginning and ending with `@` is an infix expression. Prefix and postfix rules have one
//! `@` at the beginning or end, and atoms do not include `@`.
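//!
//! A runnable variant of the grammar above makes the effect of levels and
//! associativity visible. The grammar name `calc`, the real `number` rule, and
//! the prefix `-` level are additions for this sketch.
//!
//! ```rust
//! peg::parser!{
//!   grammar calc() for str {
//!     rule number() -> i64
//!       = n:$(['0'..='9']+) {? n.parse().or(Err("number")) }
//!
//!     pub rule arithmetic() -> i64 = precedence!{
//!       x:(@) "+" y:@ { x + y }
//!       x:(@) "-" y:@ { x - y }
//!       --
//!       x:(@) "*" y:@ { x * y }
//!       --
//!       x:@ "^" y:(@) { x.pow(y as u32) }
//!       --
//!       // Prefix rule: a single `@` at the end.
//!       "-" x:@ { -x }
//!       --
//!       n:number() { n }
//!       "(" e:arithmetic() ")" { e }
//!     }
//!   }
//! }
//!
//! pub fn main() {
//!     // `*` binds more tightly than `+`.
//!     assert_eq!(calc::arithmetic("1+2*3"), Ok(7));
//!     // `-` is left-associative: (10-4)-3.
//!     assert_eq!(calc::arithmetic("10-4-3"), Ok(3));
//!     // `^` is right-associative: 2^(3^2).
//!     assert_eq!(calc::arithmetic("2^3^2"), Ok(512));
//!     assert_eq!(calc::arithmetic("-(1+2)*3"), Ok(-9));
//! }
//! ```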
//!
//! ## Input types
//!
//! The first line of the grammar declares an input type. This is normally
//! `str`, but `rust-peg` handles input types through a series of traits. The
//! library comes with implementations for `str`, `[u8]`, and `[T]`. Implement the
//! traits below to use your own types as input to `peg` grammars:
//!
//!   * [`Parse`] is the base trait required for all inputs. The others are only required to use the
//!     corresponding expressions.
//!   * [`ParseElem`] implements the `[_]` pattern operator, with a method returning the next item of
//!     the input to match.
//!   * [`ParseLiteral`] implements matching against a `"string"` literal.
//!   * [`ParseSlice`] implements the `$()` operator, returning a slice from a span of indexes.
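//!
//! For the built-in `[u8]` input, a grammar might look like the sketch below.
//! The grammar `bytes_parser` and its rules are hypothetical names. The sketch
//! assumes that patterns over `[u8]` bind a `u8` and that string literals are
//! compared against the raw bytes of the input.
//!
//! ```rust
//! peg::parser!{
//!   grammar bytes_parser() for [u8] {
//!     // Patterns match single bytes; convert an ASCII digit to its value.
//!     rule digit() -> u8 = d:[b'0'..=b'9'] { d - b'0' }
//!
//!     // Literals are written as strings even though the input is bytes.
//!     pub rule version() -> (u8, u8)
//!       = "v" major:digit() "." minor:digit() { (major, minor) }
//!   }
//! }
//!
//! pub fn main() {
//!     assert_eq!(bytes_parser::version(b"v1.2"), Ok((1, 2)));
//!     assert!(bytes_parser::version(b"v1").is_err());
//! }
//! ```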
//!
//! As a more complex example, the body of the `peg::parser!{}` macro itself is
//! parsed with `peg`, using a [definition of these traits][gh-flat-token-tree]
//! for a type that wraps Rust's `TokenTree`.
//!
//! [gh-flat-token-tree]: https://github.com/kevinmehall/rust-peg/blob/master/peg-macros/tokens.rs
//!
//! ## End-of-file handling
//!
//! Normally, parsers report an error if the top-level rule matches without consuming all the input.
//! To allow matching a prefix of the input, add the `#[no_eof]` attribute before `pub rule`.
//! Take care when such a rule ends with an `x()*` repeat expression: a malformed `x` at the last
//! position silently ends the repetition instead of being reported as an error.
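//!
//! The difference, including the pitfall with a trailing repeat, can be
//! sketched as follows. The grammar `prefix` and its rules are hypothetical
//! names.
//!
//! ```rust
//! peg::parser!{
//!   grammar prefix() for str {
//!     rule number() -> u32
//!       = n:$(['0'..='9']+) {? n.parse().or(Err("u32")) }
//!
//!     // Must consume the whole input.
//!     pub rule whole() -> u32 = number()
//!
//!     // May stop early; the rest of the input is left unparsed.
//!     #[no_eof]
//!     pub rule leading() -> u32 = number()
//!
//!     // Ends with a repeat, so a malformed last item is dropped silently.
//!     #[no_eof]
//!     pub rule items() -> Vec<u32> = l:(number() ** ",") { l }
//!   }
//! }
//!
//! pub fn main() {
//!     assert!(prefix::whole("42abc").is_err());
//!     assert_eq!(prefix::leading("42abc"), Ok(42));
//!     // The malformed `x` is not reported; the list just ends before it.
//!     assert_eq!(prefix::items("1,2,x"), Ok(vec![1, 2]));
//! }
//! ```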
//!
//! ## Rule parameters
//!
//! Rules can be parameterized with types, lifetimes, and values, just like Rust functions.
//!
//! In addition to Rust values, rules can also accept PEG expression fragments as arguments by using
//! `rule<R>` as a parameter type. When calling such a rule, use `<>` around a PEG expression in the
//! argument list to capture the expression and pass it to the rule.
//!
//! For example:
//!
//! ```rust,no_run
//! # peg::parser!{grammar doc() for str {
//! rule num_radix(radix: u32) -> u32
//!   = n:$(['0'..='9']+) {? u32::from_str_radix(n, radix).or(Err("number")) }
//!
//! rule list<T>(x: rule<T>) -> Vec<T> = "[" v:(x() ** ",") ","? "]" {v}
//!
//! pub rule octal_list() -> Vec<u32> = list(<num_radix(8)>)
//! # }}
//! # fn main() {}
//! ```
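//!
//! The same rules can be exercised in a complete grammar. The grammar name
//! `lists` and the `decimal_list` rule are additions for this sketch; the
//! other rules are taken from the example above.
//!
//! ```rust
//! peg::parser!{
//!   grammar lists() for str {
//!     rule num_radix(radix: u32) -> u32
//!       = n:$(['0'..='9']+) {? u32::from_str_radix(n, radix).or(Err("number")) }
//!
//!     rule list<T>(x: rule<T>) -> Vec<T> = "[" v:(x() ** ",") ","? "]" {v}
//!
//!     // One template, instantiated with two different element parsers.
//!     pub rule octal_list() -> Vec<u32> = list(<num_radix(8)>)
//!     pub rule decimal_list() -> Vec<u32> = list(<num_radix(10)>)
//!   }
//! }
//!
//! pub fn main() {
//!     // Octal 17 and 20 are decimal 15 and 16.
//!     assert_eq!(lists::octal_list("[17,20]"), Ok(vec![15, 16]));
//!     // The template allows an optional trailing comma.
//!     assert_eq!(lists::decimal_list("[17,20,]"), Ok(vec![17, 20]));
//!     // `9` is not an octal digit, so the conditional action fails.
//!     assert!(lists::octal_list("[19]").is_err());
//! }
//! ```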
//!
//! ## Failure reporting
//!
//! When a match fails, position information is automatically recorded to report a set of
//! "expected" tokens that would have allowed the parser to advance further.
//!
//! Some rules should never appear in error messages, and can be suppressed with `quiet!{e}`:
//!
//! ```rust,no_run
//! # peg::parser!{grammar doc() for str {
//! rule whitespace() = quiet!{[' ' | '\n' | '\t']+}
//! # }}
//! # fn main() {}
//! ```
//!
//! If you want the "expected" set to contain a more helpful string instead of character sets, you
//! can use `quiet!{}` and `expected!()` together:
//!
//! ```rust,no_run
//! # peg::parser!{grammar doc() for str {
//! rule identifier()
//!   = quiet!{[ 'a'..='z' | 'A'..='Z']['a'..='z' | 'A'..='Z' | '0'..='9' ]*}
//!   / expected!("identifier")
//! # }}
//! # fn main() {}
//! ```
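//!
//! The effect on the reported error can be checked in a small program. The
//! grammar name `names` is hypothetical, and the sketch assumes the
//! `expected.tokens()` iterator and the `location.offset` field of the error
//! type returned for `str` input.
//!
//! ```rust
//! peg::parser!{
//!   grammar names() for str {
//!     pub rule identifier()
//!       = quiet!{[ 'a'..='z' | 'A'..='Z']['a'..='z' | 'A'..='Z' | '0'..='9' ]*}
//!       / expected!("identifier")
//!   }
//! }
//!
//! pub fn main() {
//!     assert!(names::identifier("abc1").is_ok());
//!
//!     // A leading digit fails inside `quiet!{}`, so only the string given to
//!     // `expected!()` is reported.
//!     let err = names::identifier("1abc").unwrap_err();
//!     let expected: Vec<&str> = err.expected.tokens().collect();
//!     assert_eq!(expected, vec!["identifier"]);
//!     assert_eq!(err.location.offset, 0);
//! }
//! ```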
//!
//! ## Imports
//!
//! ```rust,no_run
//! mod ast {
//!    pub struct Expr;
//! }
//!
//! peg::parser!{grammar doc() for str {
//!     use self::ast::Expr;
//! }}
//! # fn main() {}
//! ```
//!
//! The grammar may begin with a series of `use` declarations, just like in Rust, which are
//! included in the generated module. Unlike normal `mod {}` blocks, `use super::*` is inserted by
//! default, so you don't have to deal with this most of the time.
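//!
//! For instance, a type defined next to the grammar can be named in a rule
//! without an explicit `use`. The `Ident` type and the grammar `idents` are
//! hypothetical names for this sketch.
//!
//! ```rust
//! #[derive(Debug, PartialEq)]
//! pub struct Ident(pub String);
//!
//! peg::parser!{
//!   grammar idents() for str {
//!     // `Ident` resolves through the implicit `use super::*`.
//!     pub rule ident() -> Ident
//!       = s:$(['a'..='z']+) { Ident(s.to_string()) }
//!   }
//! }
//!
//! pub fn main() {
//!     assert_eq!(idents::ident("abc"), Ok(Ident("abc".to_string())));
//! }
//! ```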
//!
//! ## Rustdoc comments
//!
//! `rustdoc` comments with `///` before a `grammar` or `pub rule` are propagated to the resulting
//! module or function:
//!
//! ```rust,no_run
//! # peg::parser!{grammar doc() for str {
//! /// Parse an array expression.
//! pub rule array() -> Vec<i32> = "[...]" { vec![] }
//! # }}
//! # fn main() {}
//! ```
//!
//! As with all procedural macros, non-doc comments are ignored by the lexer and can be used as
//! in any other Rust code.
//!
//! ## Caching and left recursion
//!
//! A `rule` without parameters can be prefixed with `#[cache]` if it is likely
//! to be checked repeatedly in the same position. This memoizes the rule result
//! as a function of input position, in the style of a [packrat
//! parser][wp-peg-packrat].
//!
//! [wp-peg-packrat]: https://en.wikipedia.org/wiki/Parsing_expression_grammar#Implementing_parsers_from_parsing_expression_grammars
//!
//! However, idiomatic code avoids structures that parse the same input
//! repeatedly, so the use of `#[cache]` is often not a performance win.
//! Re-matching a simple rule may also be faster than the hash table lookup
//! and insert that caching adds.
//!
//! For example, a complex rule called `expr` might benefit from caching if used
//! like `expr() "x" / expr() "y" / expr() "z"`, but this could be rewritten to
//! `expr() ("x" / "y" / "z")` which would be even faster.
//!
//! `#[cache_left_rec]` extends the `#[cache]` mechanism with the ability to resolve
//! left-recursive rules, which are otherwise an error.
//!
//! The `precedence!{}` syntax is another way to handle nested operators and avoid
//! repeatedly matching an expression rule.
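//!
//! A minimal sketch of a left-recursive rule, assuming that `#[cache_left_rec]`
//! grows the match from the non-recursive alternative so that the result is
//! left-associative. The grammar `subtract` and its rules are hypothetical
//! names.
//!
//! ```rust
//! peg::parser!{
//!   grammar subtract() for str {
//!     rule number() -> i64
//!       = n:$(['0'..='9']+) {? n.parse().or(Err("number")) }
//!
//!     // The first alternative starts with a call to `difference` itself.
//!     #[cache_left_rec]
//!     rule difference() -> i64
//!       = l:difference() "-" r:number() { l - r }
//!       / number()
//!
//!     pub rule expr() -> i64 = difference()
//!   }
//! }
//!
//! pub fn main() {
//!     // Grouped as (10-3)-2.
//!     assert_eq!(subtract::expr("10-3-2"), Ok(5));
//!     assert_eq!(subtract::expr("7"), Ok(7));
//! }
//! ```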
//!
//! ## Tracing
//!
//! If you pass the `peg/trace` feature to Cargo when building your project, a
//! trace of the rules attempted and matched will be printed to stdout when
//! parsing. For example,
//!
//! ```sh
//! $ cargo run --features peg/trace
//! ...
//! [PEG_TRACE] Matched rule type at 8:5
//! [PEG_TRACE] Attempting to match rule ident at 8:12
//! [PEG_TRACE] Attempting to match rule letter at 8:12
//! [PEG_TRACE] Failed to match rule letter at 8:12
//! ...
//! ```

extern crate peg_macros;
extern crate peg_runtime as runtime;

pub use peg_macros::parser;
pub use runtime::*;