enableHiveSupport throws an error in Java Spark code
I have a very simple application that is trying to read an ORC file from /src/main/resources using Spark. I keep getting this error:
Unable to instantiate SparkSession with Hive support because Hive classes are not found.
I have tried adding the dependency
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive_2.11</artifactId>
    <version>2.0.0</version>
</dependency>
as suggested here: Unable to instantiate SparkSession with Hive support because Hive classes are not found
However, no matter what I add, I still get this error.
I am running this on my local Windows machine through the NetBeans IDE.
My code:
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.*;

public class Main {
    public static void main(String[] args) {
        SparkSession spark = SparkSession
                .builder()
                .enableHiveSupport()
                .appName("Java Spark SQL basic example")
                .getOrCreate();

        Dataset<Row> df = spark.read().orc("/src/main/resources/testdir");
        spark.close();
    }
}
If you are running in an IDE, I recommend using .master("local") on your SparkSession builder, as in the sketch below.
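For illustration, a minimal sketch of the builder from the question with only that change added (the app name and the rest of the chain are taken from the question):

SparkSession spark = SparkSession
        .builder()
        .master("local")                          // run Spark locally when launched from the IDE
        .appName("Java Spark SQL basic example")
        .enableHiveSupport()                      // still requires spark-hive on the classpath
        .getOrCreate();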
The next important point is that the version of spark-hive should match the spark-core and spark-sql versions. To be safe, you can define the dependencies as:
<properties>
    <spark.version>2.0.0</spark.version>
</properties>

<dependencies>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.11</artifactId>
        <version>${spark.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.11</artifactId>
        <version>${spark.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-hive_2.11</artifactId>
        <version>${spark.version}</version>
    </dependency>
</dependencies>