How can I validate a SQL expression (in a user front end) without executing it?
I want to validate whether a spark-sql query is syntactically correct without actually running the query on the cluster.
The actual use case is that I am developing a user interface that accepts a spark-sql query from the user, and I should be able to verify whether the provided query is syntactically correct. It would also be ideal if, after parsing the query, I could offer recommendations on the query with respect to Spark best practices.
SparkSqlParser
Spark SQL uses SparkSqlParser as the parser for Spark SQL expressions.
You can access SparkSqlParser using SparkSession (and SessionState) as follows:
val spark: SparkSession = ...
val parser = spark.sessionState.sqlParser
scala> parser.parseExpression("select * from table")
res1: org.apache.spark.sql.catalyst.expressions.Expression = ('select * 'from) AS table#0
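For syntax-only validation of a whole query (rather than a single expression), the same parser's parsePlan method can be wrapped in a Try so your UI can show the parser's error message instead of letting the exception propagate. This is a minimal sketch assuming Spark 2.x; the validateSyntax helper is illustrative and not part of any Spark API:

import scala.util.{Try, Success, Failure}
import org.apache.spark.sql.SparkSession

// Illustrative helper: returns None if the query parses, or the parser's
// error message if it does not. Only parsing happens here; no job is run.
def validateSyntax(spark: SparkSession, sqlText: String): Option[String] =
  Try(spark.sessionState.sqlParser.parsePlan(sqlText)) match {
    case Success(_) => None                // syntactically valid
    case Failure(e) => Some(e.getMessage)  // e.g. ParseException with line/position info
  }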
TIP: Enable INFO logging level for the org.apache.spark.sql.execution.SparkSqlParser logger to see what happens inside.
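One way to do that, assuming the stock conf/log4j.properties layout that ships with Spark 2.x, is to add a logger entry for that class:

# conf/log4j.properties — raise the parser's logger to INFO
log4j.logger.org.apache.spark.sql.execution.SparkSqlParser=INFO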
That alone won't give you the most bullet-proof shield against incorrect SQL expressions, though, and I think the sql method is a better fit.
sql(sqlText: String): DataFrame
Executes a SQL query using Spark, returning the result as a DataFrame. The dialect that is used for SQL parsing can be configured with 'spark.sql.dialect'.
See also the following:
scala> parser.parseExpression("hello world")
res5: org.apache.spark.sql.catalyst.expressions.Expression = 'hello AS world#2
scala> spark.sql("hello world")
org.apache.spark.sql.catalyst.parser.ParseException:
mismatched input 'hello' expecting {'(', 'SELECT', 'FROM', 'ADD', 'DESC', 'WITH', 'VALUES', 'CREATE', 'TABLE', 'INSERT', 'DELETE', 'DESCRIBE', 'EXPLAIN', 'SHOW', 'USE', 'DROP', 'ALTER', 'MAP', 'SET', 'RESET', 'START', 'COMMIT', 'ROLLBACK', 'REDUCE', 'REFRESH', 'CLEAR', 'CACHE', 'UNCACHE', 'DFS', 'TRUNCATE', 'ANALYZE', 'LIST', 'REVOKE', 'GRANT', 'LOCK', 'UNLOCK', 'MSCK', 'EXPORT', 'IMPORT', 'LOAD'}(line 1, pos 0)
== SQL ==
hello world
^^^
at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:217)
at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:114)
at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:48)
at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:68)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:638)
... 49 elided
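To surface that kind of feedback in the front end without ever running the query on the cluster, you can wrap spark.sql the same way: building the DataFrame triggers parsing and analysis (so it also catches problems such as missing tables or columns), while nothing executes until an action such as collect or show is called on it. A sketch, with the checkQuery name being illustrative only:

import scala.util.{Try, Success, Failure}
import org.apache.spark.sql.SparkSession

// Illustrative helper: spark.sql only builds (parses and analyzes) the DataFrame;
// no Spark job runs until an action is invoked on the result.
def checkQuery(spark: SparkSession, sqlText: String): Option[String] =
  Try(spark.sql(sqlText)) match {
    case Success(_) => None                // parsed and analyzed successfully
    case Failure(e) => Some(e.getMessage)  // ParseException or AnalysisException details
  }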