使用Parsec解析非二进制运算符

使用Parsec解析非二进制运算符

问题描述:

传统上,算术运算符被认为是二进制运算符(左或右关联),因此大多数工具仅处理二进制运算符.

Traditionally, arithmetic operators are considered to be binary (left or right associative), thus most tools are dealing only with binary operators.

有没有一种简便的方法可以使用Parsec解析算术运算符,Parsec可以具有任意数量的参数?

Is there an easy way to parse arithmetic operators with Parsec, which can have an arbitrary number of arguments?

例如,以下表达式应解析到树中

For example, the following expression should be parsed into the tree

(a + b) + c + d * e + f

是的!关键是首先要解决一个更简单的问题,那就是将 + * 建模为只有两个孩子的树节点.要添加四件事,我们将使用 + 三次.

Yes! The key is to first solve a simpler problem, which is to model + and * as tree nodes with only two children. To add four things, we'll just use + three times.

这是一个很重要的问题,因为有一个 Text.Parsec.Expr 模块可以解决此问题.您的示例实际上可以通过示例代码中的示例进行解析文档.我在这里做了一些简化:

This is a great problem to solve since there's a Text.Parsec.Expr module for just this problem. Your example is actually parseable by the example code in the documentation. I've slightly simplified it here:

module Lib where

import Text.Parsec
import Text.Parsec.Language
import qualified Text.Parsec.Expr as Expr
import qualified Text.Parsec.Token as Tokens

data Expr =
    Identifier String
  | Multiply Expr Expr
  | Add Expr Expr

instance Show Expr where
  show (Identifier s) = s
  show (Multiply l r) = "(* " ++ (show l) ++ " " ++ (show r) ++ ")"
  show (Add l r) = "(+ " ++ (show l) ++ " " ++ (show r) ++ ")"

-- Some sane parser combinators that we can plagiarize from the Haskell parser.
parens = Tokens.parens haskell
identifier = Tokens.identifier haskell
reserved = Tokens.reservedOp haskell

-- Infix parser.
infix_ operator func =
  Expr.Infix (reserved operator >> return func) Expr.AssocLeft

parser =
  Expr.buildExpressionParser table term <?> "expression"
  where
    table = [[infix_ "*" Multiply], [infix_ "+" Add]]

term =
  parens parser
  <|> (Identifier <$> identifier)
  <?> "term"

在GHCi中运行此程序

Running this in GHCi:

λ> runParser parser () "" "(a + b) + c + d * e + f"
Right (+ (+ (+ (+ a b) c) (* d e)) f)

有很多方法可以将此树转换为所需的形式.这是一个骇人听闻的速度缓慢的东西:

There are lots of ways of converting this tree to the desired form. Here's a hacky gross slow one:

data Expr' =
    Identifier' String
  | Add' [Expr']
  | Multiply' [Expr']
  deriving (Show)

collect :: Expr -> (Expr -> Bool) -> [Expr]
collect e f | (f e == False) = [e]
collect e@(Add l r) f =
  collect l f ++ collect r f
collect e@(Multiply l r) f =
  collect l f ++ collect r f

isAdd :: Expr -> Bool
isAdd (Add _ _) = True
isAdd _ = False

isMultiply :: Expr -> Bool
isMultiply (Multiply _ _) = True
isMultiply _ = False

optimize :: Expr -> Expr'
optimize (Identifier s) = Identifier' s
optimize e@(Add _ _) = Add' (map optimize (collect e isAdd))
optimize e@(Multiply _ _) = Multiply' (map optimize (collect e isMultiply))

但是,我要指出的是,出于解析器或编译器的目的,几乎总是 Expr 是Good Enough™.

I will note, however, that almost always Expr is Good Enough™ for the purposes of a parser or compiler.