How to test Spark SQL queries without Scala

Problem description:

I am trying to figure out how to test Spark SQL queries against a Cassandra database -- kind of like you would in SQL Server Management Studio. Currently I have to open the Spark console and type Scala commands, which is really tedious and error-prone.

Something like:

scala> val query = csc.sql("select * from users")
scala> query.collect().foreach(println)

Especially with longer queries, this can be a real pain.

This seems like a terribly inefficient way to test whether your query is correct and what data you will get back. The other issue is that when your query is wrong, you get back a mile-long error message and have to scroll up the console to find it. How do I test my Spark queries without using the console or writing my own application?

You could use bin/spark-sql to avoid constructing a Scala program and just write SQL.
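For example, a session might look like the sketch below. The jar path, package name, and connection host are illustrative assumptions, not values from the question; spark-sql accepts the same command-line options as spark-submit, so Cassandra connectivity would typically be supplied this way via the spark-cassandra-connector.

```shell
# Start the interactive SQL shell from the Spark home directory.
# The jar path and host below are placeholders -- adjust them to your
# installation and spark-cassandra-connector version.
./bin/spark-sql \
  --jars /path/to/spark-cassandra-connector-assembly.jar \
  --conf spark.cassandra.connection.host=127.0.0.1

# At the prompt you then type plain SQL instead of Scala, e.g.:
#   spark-sql> select * from users;
```

Errors for a bad query are reported at the prompt, so there is no Scala boilerplate to retype for each attempt.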

In order to use bin/spark-sql, you may need to rebuild Spark with the -Phive and -Phive-thriftserver profiles.
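A minimal build invocation with those profiles enabled might look like the following, run from the Spark source root; any additional Hadoop or YARN profiles depend on your environment and are not shown here.

```shell
# Rebuild Spark with Hive support and the JDBC/Thrift server enabled.
# -DskipTests shortens the build; add your Hadoop profile if you need one.
mvn -Phive -Phive-thriftserver -DskipTests clean package
```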

More information on Building Spark. Note: do not build against Scala 2.11; the Thrift server dependencies do not seem to be ready at the moment.