pom


PEG parser combinators created using operator overloading without macros.


What is PEG?

PEG stands for parsing expression grammar, a type of analytic formal grammar: it describes a formal language in terms of a set of rules for recognizing strings in the language. Unlike CFGs (context-free grammars), PEGs cannot be ambiguous; if a string parses, it has exactly one valid parse tree. Each parsing function conceptually takes an input string as its argument and yields one of the following results:

  • success, in which the function may optionally move forward or consume one or more characters of the input string supplied to it, or
  • failure, in which case no input is consumed.

Read more on Wikipedia.
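The success/failure contract above can be illustrated without pom. This is a minimal sketch under stated assumptions, not pom's implementation: a parsing function takes the input and a start position and returns Some(new position) on success or None on failure, so a failed attempt leaves the caller's position where it was. The names literal, rule_a and rule_b are hypothetical. PEG ordered choice (written / in grammar notation) commits to the first alternative that succeeds, which is why each rule has at most one result.

```rust
/// Outcome of a PEG parsing function (illustrative, not pom's type):
/// Some(position after the consumed input) on success, None on failure.
/// On failure the caller simply keeps its old position, so nothing is consumed.
type ParseResult = Option<usize>;

/// Match the literal `lit` at `pos`.
fn literal(input: &[u8], pos: usize, lit: &[u8]) -> ParseResult {
    if input.get(pos..)?.starts_with(lit) {
        Some(pos + lit.len())
    } else {
        None
    }
}

/// A <- "ab" / "a"   (ordered choice: try "ab" first, then "a")
fn rule_a(input: &[u8], pos: usize) -> ParseResult {
    literal(input, pos, b"ab").or_else(|| literal(input, pos, b"a"))
}

/// B <- "a" / "ab"   (same alternatives, opposite order)
fn rule_b(input: &[u8], pos: usize) -> ParseResult {
    literal(input, pos, b"a").or_else(|| literal(input, pos, b"ab"))
}

fn main() {
    println!("{:?}", rule_a(b"ab", 0)); // Some(2): "ab" is tried first and wins
    println!("{:?}", rule_b(b"ab", 0)); // Some(1): "a" wins, "ab" is never tried
    println!("{:?}", rule_a(b"xyz", 0)); // None: no input consumed
}
```

Swapping the order of the alternatives changes what is matched, but each rule still produces a single, deterministic outcome.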

What is a parser combinator?

A parser combinator is a higher-order function that accepts several parsers as input and returns a new parser as its output. Parser combinators enable a recursive descent parsing strategy that facilitates modular piecewise construction and testing.
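A rough, self-contained sketch of this idea in Rust follows. It is not pom's implementation: the Parser alias and the functions byte, pair, either, many and run are hypothetical, loosely corresponding to sym, +, | and repeat(0..) in pom. Each combinator takes parsers and returns a new parser, so larger rules are built from smaller, separately testable ones.

```rust
/// A parser takes the input and a start position and returns the parsed
/// value plus the new position, or None on failure.
/// (Hypothetical type for illustration; not pom's Parser.)
type Parser<T> = Box<dyn Fn(&[u8], usize) -> Option<(T, usize)>>;

/// Basic parser: match the single byte `b`.
fn byte(b: u8) -> Parser<u8> {
    Box::new(move |input: &[u8], pos: usize| match input.get(pos) {
        Some(&c) if c == b => Some((c, pos + 1)),
        _ => None,
    })
}

/// Combinator: run `p`, then `q`; succeed with both results.
fn pair<A: 'static, B: 'static>(p: Parser<A>, q: Parser<B>) -> Parser<(A, B)> {
    Box::new(move |input: &[u8], pos: usize| {
        let (a, pos) = p(input, pos)?;
        let (b, pos) = q(input, pos)?;
        Some(((a, b), pos))
    })
}

/// Combinator: ordered choice; try `p`, and only if it fails try `q`
/// from the same position.
fn either<T: 'static>(p: Parser<T>, q: Parser<T>) -> Parser<T> {
    Box::new(move |input: &[u8], pos: usize| p(input, pos).or_else(|| q(input, pos)))
}

/// Combinator: apply `p` zero or more times, collecting the results.
fn many<T: 'static>(p: Parser<T>) -> Parser<Vec<T>> {
    Box::new(move |input: &[u8], mut pos: usize| {
        let mut out = Vec::new();
        while let Some((v, next)) = p(input, pos) {
            out.push(v);
            if next == pos {
                break; // stop if `p` succeeded without consuming input
            }
            pos = next;
        }
        Some((out, pos))
    })
}

/// Run a parser from the start of `input`.
fn run<T>(p: &Parser<T>, input: &[u8]) -> Option<(T, usize)> {
    p(input, 0)
}

fn main() {
    // bits <- ("0" / "1")*
    let bits = many(either(byte(b'0'), byte(b'1')));
    println!("{:?}", run(&bits, b"0110x")); // Some(([48, 49, 49, 48], 4))

    // ab <- "a" "b"*
    let ab = pair(byte(b'a'), many(byte(b'b')));
    println!("{:?}", run(&ab, b"abbc")); // Some(((97, [98, 98]), 3))
    println!("{:?}", run(&ab, b"xbb")); // None
}
```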

Parsers built using combinators are straightforward to construct, readable, modular, well-structured, and easily maintainable. With operator overloading, a parser combinator can take the form of an infix operator, used to glue different parsers together into a complete rule. Parser combinators thereby allow parsers to be defined in an embedded style, in code whose structure mirrors the rules of the formal grammar, and that code is easier to debug than macros.

The main advantage is that you don't need to go through any kind of code generation step; you're always using the vanilla language underneath. Aside from build issues (and the usual issues around error messages and debuggability, which in fairness are about as bad with macros as with code generation), it's usually easier to freely intermix grammar expressions and plain code.

List of predefined parsers and combinators

Basic Parsers        Description
empty()              Always succeeds, consumes no input.
end()                Matches end of input.
sym(t)               Matches a single terminal symbol t.
seq(s)               Matches a sequence of symbols.
list(p, s)           Matches a list of p, separated by s.
one_of(set)          Succeeds when the current input symbol is one of the set.
none_of(set)         Succeeds when the current input symbol is none of the set.
is_a(predicate)      Succeeds when predicate returns true on the current input symbol.
not_a(predicate)     Succeeds when predicate returns false on the current input symbol.
take(n)              Reads n symbols.
skip(n)              Skips n symbols.
call(pf)             Calls a parser factory; can be used to create recursive parsers.

Parser Combinators   Description
p | q                Matches p or q; returns the result of the first success.
p + q                Matches p and q; if both succeed, returns a pair of results.
p - q                Matches p and q; if both succeed, returns the result of p.
p * q                Matches p and q; if both succeed, returns the result of q.
p >> q               Parses p to get result P, then parses and returns the result of q(P).
-p                   Succeeds when p succeeds; doesn't consume input.
!p                   Succeeds when p fails; doesn't consume input.
p.opt()              Makes the parser optional. Returns an Option.
p.repeat(m..n)       p.repeat(0..)   repeats p zero or more times
                     p.repeat(1..)   repeats p one or more times
                     p.repeat(1..4)  matches p at least 1 and at most 3 times
                     p.repeat(5)     repeats p exactly 5 times
p.map(f)             Converts the parser result to the desired value.
p.convert(f)         Converts the parser result to the desired value; fails on a conversion error.
p.pos()              Gets the input position after matching p.
p.collect()          Collects all matched input symbols.
p.discard()          Discards the parser output.
p.name(_)            Gives the parser a name to identify parsing errors.
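The repeat ranges follow Rust's range syntax, so an exclusive upper bound such as 1..4 allows at most 3 matches. The sketch below shows greedy PEG repetition of a single byte driven by std::ops::RangeBounds; it is an illustration under assumptions, not pom's repeat. The helper names allows_more and repeat_byte are hypothetical, and the sketch takes 5..=5 where the table writes repeat(5).

```rust
use std::ops::{Bound, RangeBounds};

/// True if a repetition range still permits one more match after `count`.
fn allows_more<R: RangeBounds<usize>>(range: &R, count: usize) -> bool {
    match range.end_bound() {
        Bound::Included(&max) => count < max,
        Bound::Excluded(&max) => count + 1 < max,
        Bound::Unbounded => true,
    }
}

/// Greedily match `byte` at the start of `input` as many times as `range`
/// allows, then succeed with the match count only if it lies in `range`.
/// (Hypothetical helper for illustration; not pom's repeat.)
fn repeat_byte<R: RangeBounds<usize>>(input: &[u8], byte: u8, range: R) -> Option<usize> {
    let mut count = 0;
    while allows_more(&range, count) && input.get(count) == Some(&byte) {
        count += 1;
    }
    if range.contains(&count) {
        Some(count)
    } else {
        None
    }
}

fn main() {
    println!("{:?}", repeat_byte(b"aaaaa", b'a', 0..)); // Some(5)
    println!("{:?}", repeat_byte(b"bbb", b'a', 1..)); // None
    println!("{:?}", repeat_byte(b"aaaaa", b'a', 1..4)); // Some(3): 4 is excluded
    println!("{:?}", repeat_byte(b"aaaa", b'a', 5..=5)); // None: only 4 available
}
```

Because repetition is greedy, 1..4 on "aaaaa" stops after three matches and leaves the remaining input for whatever parser comes next.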

The operators were chosen for their precedence, arity, and "meaning". Use * at the start of an expression to ignore the result of the first operand; + and - cover the needs of the rest of the expression.

For example, A * B * C - D + E - F will return the results of C and E as a pair.
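The grouping in this example follows from Rust's operator rules: * binds tighter than + and -, and all three are left-associative, so the expression groups as ((((A * B) * C) - D) + E) - F. The toy below checks this by overloading the same operators on a hypothetical P<T> type with hypothetical helpers byte and both; it only mimics the keep-left, keep-right and pair semantics described above and is not pom's code.

```rust
use std::ops::{Add, Mul, Sub};

/// Toy parser: returns (value, new position) or None on failure.
/// Hypothetical type for illustration; pom's own Parser is different.
struct P<T>(Box<dyn Fn(&[u8], usize) -> Option<(T, usize)>>);

impl<T> P<T> {
    /// Run the parser from the start of `input`, discarding the end position.
    fn parse(&self, input: &[u8]) -> Option<T> {
        (self.0)(input, 0).map(|(value, _)| value)
    }
}

/// Match one byte and return it.
fn byte(b: u8) -> P<u8> {
    P(Box::new(move |input: &[u8], pos: usize| match input.get(pos) {
        Some(&c) if c == b => Some((c, pos + 1)),
        _ => None,
    }))
}

/// Run `a` then `b`, and combine their results with `f`.
fn both<A: 'static, B: 'static, C: 'static>(a: P<A>, b: P<B>, f: fn(A, B) -> C) -> P<C> {
    P(Box::new(move |input: &[u8], pos: usize| {
        let (x, pos) = (a.0)(input, pos)?;
        let (y, pos) = (b.0)(input, pos)?;
        Some((f(x, y), pos))
    }))
}

/// p + q: keep both results as a pair.
impl<A: 'static, B: 'static> Add<P<B>> for P<A> {
    type Output = P<(A, B)>;
    fn add(self, rhs: P<B>) -> P<(A, B)> {
        both(self, rhs, |a, b| (a, b))
    }
}

/// p - q: keep only the left result.
impl<A: 'static, B: 'static> Sub<P<B>> for P<A> {
    type Output = P<A>;
    fn sub(self, rhs: P<B>) -> P<A> {
        both(self, rhs, |a, _| a)
    }
}

/// p * q: keep only the right result.
impl<A: 'static, B: 'static> Mul<P<B>> for P<A> {
    type Output = P<B>;
    fn mul(self, rhs: P<B>) -> P<B> {
        both(self, rhs, |_, b| b)
    }
}

fn main() {
    // A * B * C - D + E - F, where each letter matches itself.
    let p = byte(b'a') * byte(b'b') * byte(b'c') - byte(b'd') + byte(b'e') - byte(b'f');
    println!("{:?}", p.parse(b"abcdef")); // Some((99, 101)), i.e. (b'c', b'e')
}
```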

Example code

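extern crate pom;
use pom::DataInput;
use pom::parser::*;

fn main() {
    let mut input = DataInput::new(b"abcde");
    // 'a' is matched and dropped (*), 'b' is kept, 'c' is matched and
    // dropped (-), and "de" is kept, so the result is the pair ('b', "de").
    let parser = sym(b'a') * none_of(b"AB") - sym(b'c') + seq(b"de");
    let output = parser.parse(&mut input);
    assert_eq!(output, Ok( (b'b', vec![b'd', b'e']) ) );
}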

Example JSON parser

extern crate pom;
use pom::{Parser, DataInput};
use pom::parser::*;

use std::str::FromStr;
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
pub enum JsonValue {
    Null,
    Bool(bool),
    Str(String),
    Num(f64),
    Array(Vec<JsonValue>),
    Object(HashMap<String,JsonValue>)
}

fn space() -> Parser<u8, ()> {
    one_of(b" \t\r\n").repeat(0..).discard()
}

fn number() -> Parser<u8, f64> {
    // '0' alone, or a nonzero digit followed by any digits (no leading zeros)
    let integer = one_of(b"123456789") - one_of(b"0123456789").repeat(0..) | sym(b'0');
    // optional fraction and exponent parts
    let frac = sym(b'.') + one_of(b"0123456789").repeat(1..);
    let exp = one_of(b"eE") + one_of(b"+-").opt() + one_of(b"0123456789").repeat(1..);
    let number = sym(b'-').opt() + integer + frac.opt() + exp.opt();
    // collect the matched bytes, then convert them to a String and then an f64
    number.collect().convert(String::from_utf8).convert(|s|f64::from_str(&s))
}

fn string() -> Parser<u8, String> {
    // the byte after a backslash, mapped to the byte it stands for
    // (\uXXXX escapes are not handled by this example)
    let special_char = sym(b'\\') | sym(b'/') | sym(b'"')
        | sym(b'b').map(|_|b'\x08') | sym(b'f').map(|_|b'\x0C')
        | sym(b'n').map(|_|b'\n') | sym(b'r').map(|_|b'\r') | sym(b't').map(|_|b'\t');
    let escape_sequence = sym(b'\\') * special_char;
    // a quote, then plain bytes or escape sequences, then a closing quote
    let string = sym(b'"') * (none_of(b"\\\"") | escape_sequence).repeat(0..) - sym(b'"');
    string.convert(String::from_utf8)
}

fn array() -> Parser<u8, Vec<JsonValue>> {
    let elems = list(call(value), sym(b',') * space());
    sym(b'[') * space() * elems - sym(b']')
}

fn object() -> Parser<u8, HashMap<String, JsonValue>> {
    let member = string() - space() - sym(b':') - space() + call(value);
    let members = list(member, sym(b',') * space());
    let obj = sym(b'{') * space() * members - sym(b'}');
    obj.map(|members|members.into_iter().collect::<HashMap<_,_>>())
}

fn value() -> Parser<u8, JsonValue> {
    ( seq(b"null").map(|_|JsonValue::Null)
    | seq(b"true").map(|_|JsonValue::Bool(true))
    | seq(b"false").map(|_|JsonValue::Bool(false))
    | number().map(|num|JsonValue::Num(num))
    | string().map(|text|JsonValue::Str(text))
    | array().map(|arr|JsonValue::Array(arr))
    | object().map(|obj|JsonValue::Object(obj))
    ) - space()
}

// Entry point: leading whitespace, one value, then the end of input.
pub fn json() -> Parser<u8, JsonValue> {
    space() * value() - end()
}

fn main() {
    let test = br#"
 {
        "Image": {
            "Width":  800,
            "Height": 600,
            "Title":  "View from 15th Floor",
            "Thumbnail": {
                "Url":    "http://www.example.com/image/481989943",
                "Height": 125,
                "Width":  100
            },
            "Animated" : false,
            "IDs": [116, 943, 234, 38793]
        }
    }"#;

    let mut input = DataInput::new(test);
    println!("{:?}", json().parse(&mut input));
}

You can run this example with the following command:

cargo run --example json
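To show what the number() rule in the JSON example accepts, here is a hand-written equivalent that uses only the standard library. It is a sketch, not part of pom; the function names digits and number are hypothetical, and it assumes the usual PEG behavior that an optional part which fails to match (for example a '.' with no digits after it) consumes no input.

```rust
use std::str::FromStr;

/// Number of consecutive ASCII digits in `input` starting at `pos`.
fn digits(input: &[u8], pos: usize) -> usize {
    input[pos..].iter().take_while(|b| b.is_ascii_digit()).count()
}

/// Hand-written sketch of the grammar behind the number() parser:
///   '-'? ('0' | [1-9] [0-9]*) ('.' [0-9]+)? ([eE] [+-]? [0-9]+)?
/// Returns the value and the number of bytes consumed, or None.
fn number(input: &[u8]) -> Option<(f64, usize)> {
    let mut pos = 0;
    // optional minus sign
    if input.get(pos) == Some(&b'-') {
        pos += 1;
    }
    // integer part: a lone '0', or a nonzero digit followed by any digits
    match input.get(pos).copied() {
        Some(b'0') => pos += 1,
        Some(b'1'..=b'9') => pos += 1 + digits(input, pos + 1),
        _ => return None,
    }
    // optional fraction: consumed only if at least one digit follows '.'
    if input.get(pos) == Some(&b'.') {
        let n = digits(input, pos + 1);
        if n > 0 {
            pos += 1 + n;
        }
    }
    // optional exponent: consumed only if at least one digit follows
    if matches!(input.get(pos).copied(), Some(b'e' | b'E')) {
        let mut p = pos + 1;
        if matches!(input.get(p).copied(), Some(b'+' | b'-')) {
            p += 1;
        }
        let n = digits(input, p);
        if n > 0 {
            pos = p + n;
        }
    }
    // the matched bytes are ASCII, so the UTF-8 conversion cannot fail here
    let text = std::str::from_utf8(&input[..pos]).ok()?;
    Some((f64::from_str(text).ok()?, pos))
}

fn main() {
    println!("{:?}", number(b"800,")); // Some((800.0, 3))
    println!("{:?}", number(b"-1.5e3]")); // Some((-1500.0, 6))
    println!("{:?}", number(b"012")); // Some((0.0, 1)): no leading zeros
    println!("{:?}", number(b"1.}")); // Some((1.0, 1)): '.' left unconsumed
}
```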

Benchmark

Parser             Time to parse the same JSON file
pom: json_byte     621,319 ns/iter (+/- 20,318)
pom: json_char     627,110 ns/iter (+/- 11,463)
pest: json_char     13,359 ns/iter (+/- 811)

Releases

If you need to stay on stable Rust, use pom 1.0.0; otherwise, you can try pom 2.0.0-alpha.


