How can I validate a SQL expression (in a user front end) without executing it?
I want to validate whether a spark-sql query is syntactically correct without actually running the query on the cluster.
The actual use case is that I am developing a user interface that accepts a spark-sql query from the user, and I should be able to verify whether the provided query is syntactically correct. It would also be ideal if, after parsing the query, I could offer recommendations on the query with respect to Spark best practices.
SparkSqlParser
Spark SQL uses SparkSqlParser as the parser for Spark SQL expressions.
You can access SparkSqlParser using SparkSession (and SessionState) as follows:
val spark: SparkSession = ...
val parser = spark.sessionState.sqlParser
scala> parser.parseExpression("select * from table")
res1: org.apache.spark.sql.catalyst.expressions.Expression = ('select * 'from) AS table#0
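For syntax-only validation of a whole query (rather than a single expression), the same parser's parsePlan method can be wrapped in a Try so your UI can show the parser's error message instead of letting the exception propagate. This is a minimal sketch assuming Spark 2.x; the validateSyntax helper is illustrative and not part of any Spark API:

import scala.util.{Try, Success, Failure}
import org.apache.spark.sql.SparkSession

// Illustrative helper: returns None if the query parses, or the parser's
// error message if it does not. Only parsing happens here; no job is run.
def validateSyntax(spark: SparkSession, sqlText: String): Option[String] =
  Try(spark.sessionState.sqlParser.parsePlan(sqlText)) match {
    case Success(_) => None                // syntactically valid
    case Failure(e) => Some(e.getMessage)  // e.g. ParseException with line/position info
  }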
TIP: Enable INFO logging level for the org.apache.spark.sql.execution.SparkSqlParser logger to see what happens inside.
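One way to do that, assuming the stock conf/log4j.properties layout that ships with Spark 2.x, is to add a logger entry for that class:

# conf/log4j.properties — raise the parser's logger to INFO
log4j.logger.org.apache.spark.sql.execution.SparkSqlParser=INFO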
That alone won't give you the most bullet-proof shield against incorrect SQL expressions, though, and I think the sql method is a better fit.
sql(sqlText: String): DataFrame
Executes a SQL query using Spark, returning the result as a DataFrame. The dialect that is used for SQL parsing can be configured with 'spark.sql.dialect'.
See also the following:
scala> parser.parseExpression("hello world")
res5: org.apache.spark.sql.catalyst.expressions.Expression = 'hello AS world#2
scala> spark.sql("hello world")
org.apache.spark.sql.catalyst.parser.ParseException:
mismatched input 'hello' expecting {'(', 'SELECT', 'FROM', 'ADD', 'DESC', 'WITH', 'VALUES', 'CREATE', 'TABLE', 'INSERT', 'DELETE', 'DESCRIBE', 'EXPLAIN', 'SHOW', 'USE', 'DROP', 'ALTER', 'MAP', 'SET', 'RESET', 'START', 'COMMIT', 'ROLLBACK', 'REDUCE', 'REFRESH', 'CLEAR', 'CACHE', 'UNCACHE', 'DFS', 'TRUNCATE', 'ANALYZE', 'LIST', 'REVOKE', 'GRANT', 'LOCK', 'UNLOCK', 'MSCK', 'EXPORT', 'IMPORT', 'LOAD'}(line 1, pos 0)
== SQL ==
hello world
^^^
at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:217)
at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:114)
at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:48)
at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:68)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:638)
... 49 elided
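To surface that kind of feedback in the front end without ever running the query on the cluster, you can wrap spark.sql the same way: building the DataFrame triggers parsing and analysis (so it also catches problems such as missing tables or columns), while nothing executes until an action such as collect or show is called on it. A sketch, with the checkQuery name being illustrative only:

import scala.util.{Try, Success, Failure}
import org.apache.spark.sql.SparkSession

// Illustrative helper: spark.sql only builds (parses and analyzes) the DataFrame;
// no Spark job runs until an action is invoked on the result.
def checkQuery(spark: SparkSession, sqlText: String): Option[String] =
  Try(spark.sql(sqlText)) match {
    case Success(_) => None                // parsed and analyzed successfully
    case Failure(e) => Some(e.getMessage)  // ParseException or AnalysisException details
  }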