How to parse a boost::spirit grammar into a std::set?

Question:

How do I parse the result of a boost::spirit grammar into a std::set?

As an exercise to learn how to use boost::spirit, I am designing a parser for X.500/LDAP Distinguished Names. The grammar can be found in BNF format in RFC 1779.

I "unrolled" it and translated it into boost::spirit rules. That's the first step. Basically, a DN is a set of RDNs (Relative Distinguished Names), which are themselves tuples of (Key, Value) pairs.

I thought about using

typedef std::unordered_map<std::string, std::string> rdn_type;

to represent each RDN. The RDNs are then gathered into a std::set<rdn_type>.

My issue is that, going through the (good) documentation of boost::spirit, I could not find out how to populate the set.

My current code can be found on GitHub, and I'm refining it whenever I can.

Starting a satanic dance to summon SO's most popular polar bear :p

To keep the question all in one place, I am adding a copy of the code here. It's a bit long, so I put it at the end :)

namespace qi = boost::spirit::qi;
namespace ascii = boost::spirit::ascii;
namespace phoenix = boost::phoenix;

typedef std::unordered_map<std::string, std::string> dn_key_value_map;

template <typename Iterator>
struct dn_grammar_common : public qi::grammar<Iterator, std::multiset<dn_key_value_map>(), ascii::space_type> {
  struct dn_reserved_chars_ : public qi::symbols<char, char> {
    dn_reserved_chars_() {
      add
        ("\\", "\\")
        ("=" , "=")
        ("+" , "+")
        ("," , ",")
        (";" , ";")
        ("#" , "#")
        ("<" , "<")
        (">" , ">")
        ("\"", "\"")
        ("%" , "%");
    }
  } dn_reserved_chars;
  dn_grammar_common() : dn_grammar_common::base_type(dn) {
    // Useful using directives
    using namespace qi::labels;

    // Low level rules
    // Key can only contain alphanumerical characters and dashes
    key = ascii::no_case[qi::lexeme[(*qi::alnum) >> (*(qi::char_('-') >> qi::alnum))]];
    escaped_hex_char = qi::lexeme[(&qi::char_("\\")) >> qi::repeat(2)[qi::char_("0-9a-fA-F")]];
    escaped_sequence = escaped_hex_char |
                      qi::lexeme[(&qi::char_("\\")) >> dn_reserved_chars];
    // Rule for a fully escaped string (used as Attribute Value) => "..."
    quote_string = qi::lexeme[qi::lit('"') >>
      *(escaped_sequence | (qi::char_ - qi::char_("\\\""))) >>
      qi::lit('"')
    ];
    // Rule for an hexa string (used as Attribute Value) => #23AD5D...
    hex_string = (&qi::char_("#")) >> *qi::lexeme[(qi::repeat(2)[qi::char_("0-9a-fA-F")])];

    // Value is either:
    // - A regular string (that can contain escaped sequences)
    // - A fully escaped string (that can also contain escaped sequences)
    // - An hexadecimal string
    value = (qi::lexeme[*((qi::char_ - dn_reserved_chars) | escaped_sequence)]) |
            quote_string |
            hex_string;

    // Higher level rules
    rdn_pair = key >> '=' >> value;
    // A relative distinguished name consists of a sequence of pairs (Attribute = AttributeValue)
    // Separated with a +
    rdn = rdn_pair % qi::char_("+");
    // The DN is a set of RDNs separated by either a "," or a ";".
    // The two separators can coexist in a given DN, though it is not
    // recommended practice.
    dn = rdn % (qi::char_(",;"));
  }
  qi::rule<Iterator, std::set<dn_key_value_map>(), ascii::space_type> dn;
  qi::rule<Iterator, dn_key_value_map(), ascii::space_type> rdn;
  qi::rule<Iterator, std::pair<std::string, std::string>(), ascii::space_type> rdn_pair;
  qi::rule<Iterator, std::string(), ascii::space_type> key, value, hex_string, quote_string;
  qi::rule<Iterator, std::string(), ascii::space_type> escaped_hex_char, escaped_sequence;
};

Answer

I suspect you just need fusion/adapted/std_pair.hpp.

Let me try to make it compile.

OK:

Your start rule was incompatible:

 qi::rule<Iterator, std::multiset<dn_key_value_map>(), ascii::space_type> dn;

The symbol table should map to string, not char:

struct dn_reserved_chars_ : public qi::symbols<char, std::string> {

Or you should change the mapped values to char literals.

Why are you using this instead of char_("\\=+,;#<>\"%")?

Update

I have completed my review of the grammar (purely from the implementation point of view, so I haven't actually read the RFC to check the assumptions).

I created a pull request here: https://github.com/Rerito/pkistore/pull/1

General Notes

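Unordered maps are not sortable (std::set needs an ordering on its elements), so I used map<string, string>.
Technically, the outer set is not a set in the RFC (?), so I made it a vector (this also makes the output order of the RDNs correspond more closely to the input order).
Removed superstitious includes (Fusion set/map are completely unrelated to std::set/map); only std_pair.hpp is needed for maps to work.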
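The first two notes can be illustrated with the standard library alone. This is a small sketch under assumed names (rdn_t, sample_dn and as_set are hypothetical and do not appear in the pull request): std::map provides operator<, so it can be an element of a std::set, while a std::vector keeps the RDNs in input order.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical alias for illustration; the answer's code calls this ast::rdn.
typedef std::map<std::string, std::string> rdn_t;

// A DN as a vector keeps the RDNs in the order they appeared in the input.
inline std::vector<rdn_t> sample_dn() {
    std::vector<rdn_t> dn(3);
    dn[0]["OU"] = "Sales";
    dn[0]["CN"] = "J. Smith";
    dn[1]["O"]  = "Widget Inc.";
    dn[2]["C"]  = "US";
    return dn;
}

// std::set needs a strict weak ordering on its elements. std::map supplies
// operator< (lexicographic comparison of its entries); std::unordered_map
// does not, so std::set<std::unordered_map<...>> fails to compile.
inline std::set<rdn_t> as_set(std::vector<rdn_t> const& dn) {
    return std::set<rdn_t>(dn.begin(), dn.end());
}
```

Converting to a set reorders the RDNs by their contents, which is why the vector is the closer match when the input order matters.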

Grammar Rules:

symbols<char,char> requires char values (not "." but '.')
Many simplifications:

Removed the &char_(...) instances (they don't consume anything; they are just assertions).
Removed the impotent no_case[].
Removed unnecessary lexeme[] directives; most of them became redundant once the skipper was dropped from the rule declarations.
Removed some rule declarations altogether (the rule definitions aren't complex enough to warrant the overhead incurred), e.g. hex_string.
Made key require at least one character (I have not checked the specs). Note how

key = ascii::no_case[qi::lexeme[(*qi::alnum) >> (*(qi::char_('-') >> qi::alnum))]];

becomes

key = raw[ alnum >> *(alnum | '-') ];

raw means that the matched input sequence is exposed verbatim as the attribute (instead of building a copy character by character).

Reordered the branches of value (not checked, but I wager unquoted strings would basically eat everything else).

Tests

Added a test program test.cpp, based on the Examples section of the RFC (section 3).

Added some more complicated examples of my own devising.

Loose Ends

To do: review the specs for actual rules and requirements on

escaping special characters
inclusion of whitespace (incl. newline characters) inside the various string flavours:

hex #xxxx strings might allow newlines (makes sense to me)
unquoted strings might as well (idem)
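To make the open question about hex strings tangible, here is a standard-library sketch of decoding a #xxxx value pair by pair. The function decode_hex_value and its allow_whitespace flag are hypothetical; whether whitespace is actually permitted is exactly what still has to be checked against the RFC, so the flag defaults to off.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: numeric value of one hexadecimal digit.
inline int hex_digit_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    throw std::invalid_argument("not a hex digit");
}

// Decodes "#53616c6573" into "Sales". When allow_whitespace is true,
// whitespace between digits is skipped (an assumption, not RFC-checked).
inline std::string decode_hex_value(std::string const& text,
                                    bool allow_whitespace = false) {
    if (text.empty() || text[0] != '#')
        throw std::invalid_argument("hex value must start with '#'");

    std::string digits;
    for (std::size_t i = 1; i < text.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(text[i]);
        if (allow_whitespace && std::isspace(c))
            continue;
        digits += text[i];
    }
    if (digits.size() % 2 != 0)
        throw std::invalid_argument("hex value needs pairs of digits");

    std::string out;
    for (std::size_t i = 0; i < digits.size(); i += 2)
        out += static_cast<char>(hex_digit_value(digits[i]) * 16 +
                                 hex_digit_value(digits[i + 1]));
    return out;
}
```

In the grammar itself the same pairing is done by qi::int_parser<char, 16, 2, 2>; this sketch only spells out the arithmetic and where a whitespace policy would plug in.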

Also enabled optional BOOST_SPIRIT_DEBUG.

Also made the skipper internal to the grammar (security!).

Also made a convenience free function that makes the parser usable without leaking implementation details (Qi).

Live Demo

Live On Coliru

//#include "dn_parser.hpp"
//#define BOOST_SPIRIT_DEBUG
#include <boost/fusion/adapted/std_pair.hpp>
#include <boost/spirit/include/qi.hpp>
#include <iostream>
#include <map>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

namespace pkistore {
    namespace parsing {

    namespace qi      = boost::spirit::qi;
    namespace ascii   = boost::spirit::ascii;

    namespace ast {
        typedef std::map<std::string, std::string> rdn;
        typedef std::vector<rdn> dn;
    }

    template <typename Iterator>
    struct dn_grammar_common : public qi::grammar<Iterator, ast::dn()> {
        dn_grammar_common() : dn_grammar_common::base_type(start) {
            using namespace qi;

            // syntax as defined in rfc1779
            key          = raw[ alnum >> *(alnum | '-') ];

            char_escape  = '\\' >> (hexchar | dn_reserved_chars);
            quote_string = '"' >> *(char_escape | (char_ - dn_reserved_chars)) >> '"' ;

            value        =  quote_string 
                         | '#' >> *hexchar
                         | *(char_escape | (char_ - dn_reserved_chars))
                         ;

            rdn_pair     = key >> '=' >> value;

            rdn          = rdn_pair % qi::char_("+");
            dn           = rdn % qi::char_(",;");

            start        = skip(qi::ascii::space) [ dn ];

            BOOST_SPIRIT_DEBUG_NODES((start)(dn)(rdn)(rdn_pair)(key)(value)(quote_string)(char_escape))
        }

    private:
        qi::int_parser<char, 16, 2, 2> hexchar;

        qi::rule<Iterator, ast::dn()> start;

        qi::rule<Iterator, ast::dn(), ascii::space_type> dn;
        qi::rule<Iterator, ast::rdn(), ascii::space_type> rdn;
        qi::rule<Iterator, std::pair<std::string, std::string>(), ascii::space_type> rdn_pair;

        qi::rule<Iterator, std::string()> key, value, quote_string;
        qi::rule<Iterator, char()>        char_escape;

        struct dn_reserved_chars_ : public qi::symbols<char, char> {
            dn_reserved_chars_() {
                add ("\\", '\\') ("\"", '"')
                    ("=" , '=')  ("+" , '+')
                    ("," , ',')  (";" , ';')
                    ("#" , '#')  ("%" , '%')
                    ("<" , '<')  (">" , '>')
                    ;
            }
        } dn_reserved_chars;
    };

    } // namespace parsing

    static parsing::ast::dn parse(std::string const& input) {
        using It = std::string::const_iterator;

        pkistore::parsing::dn_grammar_common<It> const g;

        It f = input.begin(), l = input.end();
        pkistore::parsing::ast::dn parsed;

        bool ok = boost::spirit::qi::parse(f, l, g, parsed);

        if (!ok || (f!=l))
            throw std::runtime_error("dn_parse failure");

        return parsed;
    }
} // namespace pkistore

int main() {
    for (std::string const input : {
            "OU=Sales + CN=J. Smith, O=Widget Inc., C=US",
            "OU=#53616c6573",
            "OU=Sa\\+les + CN=J. Smi\\%th, O=Wid\\,\\;get In\\3bc., C=US",
            //"CN=Marshall T. Rose, O=Dover Beach Consulting, L=Santa Clara,\nST=California, C=US",
            //"CN=FTAM Service, CN=Bells, OU=Computer Science,\nO=University College London, C=GB",
            //"CN=Markus Kuhn, O=University of Erlangen, C=DE",
            //"CN=Steve Kille,\nO=ISODE Consortium,\nC=GB",
            //"CN=Steve Kille ,\n\nO =   ISODE Consortium,\nC=GB",
            //"CN=Steve Kille, O=ISODE Consortium, C=GB\n",
        })
    {
        auto parsed = pkistore::parse(input);

        std::cout << "===========\n" << input << "\n";
        for(auto const& dn : parsed) {
            std::cout << "-----------\n";
            for (auto const& kv : dn) {
                std::cout << "\t" << kv.first << "\t->\t" << kv.second << "\n";
            }
        }
    }
}

Prints:

===========
OU=Sales + CN=J. Smith, O=Widget Inc., C=US
-----------
    CN  ->  J. Smith
    OU  ->  Sales 
-----------
    O   ->  Widget Inc.
-----------
    C   ->  US
===========
OU=#53616c6573
-----------
    OU  ->  Sales
===========
OU=Sa\+les + CN=J. Smi\%th, O=Wid\,\;get In\3bc., C=US
-----------
    CN  ->  J. Smi%th
    OU  ->  Sa+les 
-----------
    O   ->  Wid,;get In;c.
-----------
    C   ->  US
