Error when importing data from Hive into ClickHouse with the Waterdrop tool. Help needed.
sh /data01/software/waterdrop/waterdrop-1.5.1/bin/start-waterdrop.sh --config /data01/test/hive_to_ck_data.conf -e client -m 'local[4]' -i dt=2020-03-09
[INFO] spark conf: --conf "spark.sql.catalogImplementation=hive" --conf "spark.executor.memory=2g" --conf "spark.executor.instances=2" --conf "spark.app.name=Waterdrop" --conf "spark.executor.cores=2"
Warning: Ignoring non-Spark config property: "spark.executor.memory
Warning: Ignoring non-Spark config property: "spark.app.name
Warning: Ignoring non-Spark config property: "spark.executor.instances
Warning: Ignoring non-Spark config property: "spark.executor.cores
Warning: Ignoring non-Spark config property: "spark.sql.catalogImplementation
21/01/30 17:58:09 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[INFO] Loading config file: /data01/test/hive_to_ck_data.conf
[INFO] parsed config file: {
"spark" : {
"spark.sql.catalogImplementation" : "hive",
"spark.app.name" : "Waterdrop",
"spark.executor.instances" : 2,
"spark.executor.cores" : 2,
"spark.executor.memory" : "2g"
},
"input" : [
{
"pre_sql" : "select t.* from(select dt,id,order_id,buyer_id,receiver,contact_phone,id_card,province_code,province_name,city_code,city_name,area_code,area_name,street,address,zip_code,email,longitude,latitude,deleted,created_time,revised_time from test01.t_order_address_mall a where dt = '2020-03-09') t",
"plugin_name" : "hive",
"table_name" : "access_log"
}
],
"filter" : [],
"output" : [
{
"database" : "hd_marketing_db",
"password" : "123qwe",
"host" : "192.168.1.60:8123",
"fields" : [
"dt",
"id",
"order_id",
"buyer_id",
"receiver",
"contact_phone",
"id_card",
"province_code",
"province_name",
"city_code",
"city_name",
"area_code",
"area_name",
"street",
"address",
"zip_code",
"email",
"longitude",
"latitude",
"deleted",
"created_time",
"revised_time"
],
"plugin_name" : "clickhouse",
"table" : "t_order_address_mall_test",
"username" : "chuser"
}
]
}
[INFO] loading SparkConf:
spark.executor.extraJavaOptions => -Ddt=2020-03-09
spark.jars => file:/data01/software/waterdrop/waterdrop-1.5.1/lib/Waterdrop-1.5.1-2.11.8.jar
spark.executor.memory => 2g
spark.app.name => Waterdrop
spark.master => local[4]
spark.sql.catalogImplementation => hive
spark.executor.instances => 2
spark.submit.deployMode => client
spark.driver.extraJavaOptions => -Ddt=2020-03-09
spark.executor.cores => 2
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
21/01/30 17:58:09 INFO SparkContext: Running Spark version 2.4.5
21/01/30 17:58:09 INFO SparkContext: Submitted application: Waterdrop
21/01/30 17:58:09 INFO SecurityManager: Changing view acls to: root
21/01/30 17:58:09 INFO SecurityManager: Changing modify acls to: root
21/01/30 17:58:09 INFO SecurityManager: Changing view acls groups to:
21/01/30 17:58:09 INFO SecurityManager: Changing modify acls groups to:
21/01/30 17:58:09 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
21/01/30 17:58:09 INFO Utils: Successfully started service 'sparkDriver' on port 34449.
21/01/30 17:58:09 INFO SparkEnv: Registering MapOutputTracker
21/01/30 17:58:09 INFO SparkEnv: Registering BlockManagerMaster
21/01/30 17:58:09 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
21/01/30 17:58:09 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
21/01/30 17:58:09 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-aed6dfdf-604f-4a8e-8716-4d3a551c7053
21/01/30 17:58:09 INFO MemoryStore: MemoryStore started with capacity 366.3 MB
21/01/30 17:58:09 INFO SparkEnv: Registering OutputCommitCoordinator
21/01/30 17:58:10 INFO Utils: Successfully started service 'SparkUI' on port 4040.
21/01/30 17:58:10 INFO SparkUI: Bound SparkUI to eeds-bd01, and started at http://eeds-bd01:4040
21/01/30 17:58:10 INFO SparkContext: Added JAR file:/data01/software/waterdrop/waterdrop-1.5.1/lib/Waterdrop-1.5.1-2.11.8.jar at spark://eeds-bd01:34449/jars/Waterdrop-1.5.1-2.11.8.jar with timestamp 1612000690190
21/01/30 17:58:10 INFO Executor: Starting executor ID driver on host localhost
21/01/30 17:58:10 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 43175.
21/01/30 17:58:10 INFO NettyBlockTransferService: Server created on eeds-bd01:43175
21/01/30 17:58:10 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
21/01/30 17:58:10 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, eeds-bd01, 43175, None)
21/01/30 17:58:10 INFO BlockManagerMasterEndpoint: Registering block manager eeds-bd01:43175 with 366.3 MB RAM, BlockManagerId(driver, eeds-bd01, 43175, None)
21/01/30 17:58:10 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, eeds-bd01, 43175, None)
21/01/30 17:58:10 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, eeds-bd01, 43175, None)
find and register UDFs & UDAFs
21/01/30 17:58:11 INFO SharedState: loading hive config file: file:/data01/software/waterdrop/spark-2.4.5-bin-hadoop2.7/conf/hive-site.xml
21/01/30 17:58:11 INFO SharedState: spark.sql.warehouse.dir is not set, but hive.metastore.warehouse.dir is set. Setting spark.sql.warehouse.dir to the value of hive.metastore.warehouse.dir ('/user/hive/warehouse').
21/01/30 17:58:11 INFO SharedState: Warehouse path is '/user/hive/warehouse'.
21/01/30 17:58:11 INFO StateStoreCoordinatorRef: Registered StateStoreCoordinator endpoint
found and registered UDFs count[2], UDAFs count[0]
21/01/30 17:58:12 INFO ClickHouseDriver: Driver registered
21/01/30 17:58:12 INFO HiveUtils: Initializing HiveMetastoreConnection version 1.2.1 using Spark classes.
21/01/30 17:58:13 INFO metastore: Trying to connect to metastore with URI thrift://eeds-bd01:9083
21/01/30 17:58:13 INFO metastore: Connected to metastore.
21/01/30 17:58:13 INFO SessionState: Created local directory: /tmp/923442e6-2630-463a-8241-59160cbdc1ea_resources
21/01/30 17:58:13 INFO SessionState: Created HDFS directory: /tmp/hive/root/923442e6-2630-463a-8241-59160cbdc1ea
21/01/30 17:58:13 INFO SessionState: Created local directory: /tmp/root/923442e6-2630-463a-8241-59160cbdc1ea
21/01/30 17:58:13 INFO SessionState: Created HDFS directory: /tmp/hive/root/923442e6-2630-463a-8241-59160cbdc1ea/_tmp_space.db
21/01/30 17:58:13 INFO HiveClientImpl: Warehouse location for Hive client (version 1.2.2) is /user/hive/warehouse
[Waterdrop ASCII-art startup banner]
21/01/30 17:58:13 INFO Clickhouse: insert into t_order_address_mall_test (`dt`,`id`,`order_id`,`buyer_id`,`receiver`,`contact_phone`,`id_card`,`province_code`,`province_name`,`city_code`,`city_name`,`area_code`,`area_name`,`street`,`address`,`zip_code`,`email`,`longitude`,`latitude`,`deleted`,`created_time`,`revised_time`) values (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)
21/01/30 17:58:14 WARN Utils: Truncated the string representation of a plan since it was too large. This behavior can be adjusted by setting 'spark.debug.maxToStringFields' in SparkEnv.conf.
21/01/30 17:58:14 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 308.6 KB, free 366.0 MB)
21/01/30 17:58:14 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 26.2 KB, free 366.0 MB)
21/01/30 17:58:14 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on eeds-bd01:43175 (size: 26.2 KB, free: 366.3 MB)
21/01/30 17:58:14 INFO SparkContext: Created broadcast 0 from
21/01/30 17:58:15 INFO FileInputFormat: Total input paths to process : 1
21/01/30 17:58:15 INFO SparkContext: Starting job: foreachPartition at Clickhouse.scala:162
21/01/30 17:58:15 INFO DAGScheduler: Got job 0 (foreachPartition at Clickhouse.scala:162) with 1 output partitions
21/01/30 17:58:15 INFO DAGScheduler: Final stage: ResultStage 0 (foreachPartition at Clickhouse.scala:162)
21/01/30 17:58:15 INFO DAGScheduler: Parents of final stage: List()
21/01/30 17:58:15 INFO DAGScheduler: Missing parents: List()
21/01/30 17:58:15 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[7] at foreachPartition at Clickhouse.scala:162), which has no missing parents
21/01/30 17:58:15 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 25.9 KB, free 365.9 MB)
21/01/30 17:58:15 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 10.9 KB, free 365.9 MB)
21/01/30 17:58:15 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on eeds-bd01:43175 (size: 10.9 KB, free: 366.3 MB)
21/01/30 17:58:15 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1163
21/01/30 17:58:15 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[7] at foreachPartition at Clickhouse.scala:162) (first 15 tasks are for partitions Vector(0))
21/01/30 17:58:15 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
21/01/30 17:58:15 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, executor driver, partition 0, ANY, 8077 bytes)
21/01/30 17:58:15 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
21/01/30 17:58:15 INFO Executor: Fetching spark://eeds-bd01:34449/jars/Waterdrop-1.5.1-2.11.8.jar with timestamp 1612000690190
21/01/30 17:58:15 INFO TransportClientFactory: Successfully created connection to eeds-bd01/192.168.1.155:34449 after 38 ms (0 ms spent in bootstraps)
21/01/30 17:58:15 INFO Utils: Fetching spark://eeds-bd01:34449/jars/Waterdrop-1.5.1-2.11.8.jar to /tmp/spark-066dfbf7-d276-4414-81d8-91e51d10e76d/userFiles-76a6ce6e-1d5e-484b-ad92-3a95b575aa93/fetchFileTemp2005876359851025212.tmp
21/01/30 17:58:15 INFO Executor: Adding file:/tmp/spark-066dfbf7-d276-4414-81d8-91e51d10e76d/userFiles-76a6ce6e-1d5e-484b-ad92-3a95b575aa93/Waterdrop-1.5.1-2.11.8.jar to class loader
21/01/30 17:58:16 INFO HadoopRDD: Input split: hdfs://eeds-bd01:8020/data01/hive/warehouse/test01.db/t_order_address_mall/dt=2020-03-09/part-m-00000.snappy:0+4820
21/01/30 17:58:16 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.RuntimeException: native snappy library not available: this version of libhadoop was built without snappy support.
at org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:65)
at org.apache.hadoop.io.compress.SnappyCodec.getDecompressorType(SnappyCodec.java:193)
at org.apache.hadoop.io.compress.CodecPool.getDecompressor(CodecPool.java:178)
at org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:111)
at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67)
at org.apache.spark.rdd.HadoopRDD$$anon$1.liftedTree1$1(HadoopRDD.scala:267)
at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:266)
at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:224)
at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:95)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:105)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
at org.apache.spark.sql.execution.SQLExecutionRDD.compute(SQLExecutionRDD.scala:55)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:123)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
21/01/30 17:58:16 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, localhost, executor driver): java.lang.RuntimeException: native snappy library not available: this version of libhadoop was built without snappy support.
at org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:65)
at org.apache.hadoop.io.compress.SnappyCodec.getDecompressorType(SnappyCodec.java:193)
at org.apache.hadoop.io.compress.CodecPool.getDecompressor(CodecPool.java:178)
at org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:111)
at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67)
at org.apache.spark.rdd.HadoopRDD$$anon$1.liftedTree1$1(HadoopRDD.scala:267)
at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:266)
at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:224)
at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:95)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:105)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
at org.apache.spark.sql.execution.SQLExecutionRDD.compute(SQLExecutionRDD.scala:55)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:123)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
21/01/30 17:58:16 ERROR TaskSetManager: Task 0 in stage 0.0 failed 1 times; aborting job
21/01/30 17:58:16 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
21/01/30 17:58:16 INFO TaskSchedulerImpl: Cancelling stage 0
21/01/30 17:58:16 INFO TaskSchedulerImpl: Killing all running tasks in stage 0: Stage cancelled
21/01/30 17:58:16 INFO DAGScheduler: ResultStage 0 (foreachPartition at Clickhouse.scala:162) failed in 0.572 s due to Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost, executor driver): java.lang.RuntimeException: native snappy library not available: this version of libhadoop was built without snappy support.
at org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:65)
at org.apache.hadoop.io.compress.SnappyCodec.getDecompressorType(SnappyCodec.java:193)
at org.apache.hadoop.io.compress.CodecPool.getDecompressor(CodecPool.java:178)
at org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:111)
at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67)
at org.apache.spark.rdd.HadoopRDD$$anon$1.liftedTree1$1(HadoopRDD.scala:267)
at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:266)
at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:224)
at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:95)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:105)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
at org.apache.spark.sql.execution.SQLExecutionRDD.compute(SQLExecutionRDD.scala:55)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:123)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Driver stacktrace:
21/01/30 17:58:16 INFO DAGScheduler: Job 0 failed: foreachPartition at Clickhouse.scala:162, took 0.628301 s
Exception in thread "main" java.lang.Exception: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost, executor driver): java.lang.RuntimeException: native snappy library not available: this version of libhadoop was built without snappy support.
at org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:65)
at org.apache.hadoop.io.compress.SnappyCodec.getDecompressorType(SnappyCodec.java:193)
at org.apache.hadoop.io.compress.CodecPool.getDecompressor(CodecPool.java:178)
at org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:111)
at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67)
at org.apache.spark.rdd.HadoopRDD$$anon$1.liftedTree1$1(HadoopRDD.scala:267)
at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:266)
at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:224)
at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:95)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:105)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
at org.apache.spark.sql.execution.SQLExecutionRDD.compute(SQLExecutionRDD.scala:55)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:123)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Driver stacktrace:
at io.github.interestinglab.waterdrop.Waterdrop$.main(Waterdrop.scala:43)
at io.github.interestinglab.waterdrop.Waterdrop.main(Waterdrop.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost, executor driver): java.lang.RuntimeException: native snappy library not available: this version of libhadoop was built without snappy support.
at org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:65)
at org.apache.hadoop.io.compress.SnappyCodec.getDecompressorType(SnappyCodec.java:193)
at org.apache.hadoop.io.compress.CodecPool.getDecompressor(CodecPool.java:178)
at org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:111)
at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67)
at org.apache.spark.rdd.HadoopRDD$$anon$1.liftedTree1$1(HadoopRDD.scala:267)
at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:266)
at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:224)
at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:95)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:105)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
at org.apache.spark.sql.execution.SQLExecutionRDD.compute(SQLExecutionRDD.scala:55)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:123)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1891)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1879)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1878)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1878)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:927)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:927)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:927)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2112)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2061)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2050)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:738)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2061)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2082)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2101)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2126)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.apply(RDD.scala:980)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.apply(RDD.scala:978)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:385)
at org.apache.spark.rdd.RDD.foreachPartition(RDD.scala:978)
at org.apache.spark.sql.Dataset$$anonfun$foreachPartition$1.apply$mcV$sp(Dataset.scala:2741)
at org.apache.spark.sql.Dataset$$anonfun$foreachPartition$1.apply(Dataset.scala:2741)
at org.apache.spark.sql.Dataset$$anonfun$foreachPartition$1.apply(Dataset.scala:2741)
at org.apache.spark.sql.Dataset$$anonfun$withNewRDDExecutionId$1.apply(Dataset.scala:3355)
at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:80)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:127)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:75)
at org.apache.spark.sql.Dataset.withNewRDDExecutionId(Dataset.scala:3351)
at org.apache.spark.sql.Dataset.foreachPartition(Dataset.scala:2740)
at io.github.interestinglab.waterdrop.output.batch.Clickhouse.process(Clickhouse.scala:162)
at io.github.interestinglab.waterdrop.Waterdrop$.outputProcess(Waterdrop.scala:251)
at io.github.interestinglab.waterdrop.Waterdrop$$anonfun$batchProcessing$2.apply(Waterdrop.scala:215)
at io.github.interestinglab.waterdrop.Waterdrop$$anonfun$batchProcessing$2.apply(Waterdrop.scala:214)
at scala.collection.immutable.List.foreach(List.scala:392)
at io.github.interestinglab.waterdrop.Waterdrop$.batchProcessing(Waterdrop.scala:214)
at io.github.interestinglab.waterdrop.Waterdrop$.io$github$interestinglab$waterdrop$Waterdrop$$entrypoint(Waterdrop.scala:120)
at io.github.interestinglab.waterdrop.Waterdrop$$anonfun$1.apply$mcV$sp(Waterdrop.scala:38)
at io.github.interestinglab.waterdrop.Waterdrop$$anonfun$1.apply(Waterdrop.scala:38)
at io.github.interestinglab.waterdrop.Waterdrop$$anonfun$1.apply(Waterdrop.scala:38)
at scala.util.Try$.apply(Try.scala:192)
at io.github.interestinglab.waterdrop.Waterdrop$.main(Waterdrop.scala:38)
... 13 more
Caused by: java.lang.RuntimeException: native snappy library not available: this version of libhadoop was built without snappy support.
at org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:65)
at org.apache.hadoop.io.compress.SnappyCodec.getDecompressorType(SnappyCodec.java:193)
at org.apache.hadoop.io.compress.CodecPool.getDecompressor(CodecPool.java:178)
at org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:111)
at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67)
at org.apache.spark.rdd.HadoopRDD$$anon$1.liftedTree1$1(HadoopRDD.scala:267)
at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:266)
at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:224)
at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:95)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:105)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
at org.apache.spark.sql.execution.SQLExecutionRDD.compute(SQLExecutionRDD.scala:55)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:123)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
21/01/30 17:58:16 INFO SparkContext: Invoking stop() from shutdown hook
21/01/30 17:58:16 INFO SparkUI: Stopped Spark web UI at http://eeds-bd01:4040
21/01/30 17:58:16 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
21/01/30 17:58:16 INFO MemoryStore: MemoryStore cleared
21/01/30 17:58:16 INFO BlockManager: BlockManager stopped
21/01/30 17:58:16 INFO BlockManagerMaster: BlockManagerMaster stopped
21/01/30 17:58:16 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
21/01/30 17:58:16 INFO SparkContext: Successfully stopped SparkContext
21/01/30 17:58:16 INFO ShutdownHookManager: Shutdown hook called
21/01/30 17:58:16 INFO ShutdownHookManager: Deleting directory /tmp/spark-d3ff91dd-da13-424c-adae-d53bae836d97
21/01/30 17:58:16 INFO ShutdownHookManager: Deleting directory /tmp/spark-066dfbf7-d276-4414-81d8-91e51d10e76d
In an interactive session (spark-shell), change the setting temporarily:
scala> spark.conf.set("spark.debug.maxToStringFields","100")
===========
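Permanent change
On each node of the cluster, add or update the spark.debug.maxToStringFields property in conf/spark-defaults.conf (Spark properties are set there or passed with --conf; conf/spark-env.sh only holds environment variables).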
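One way to make the property permanent is to write it into Spark's defaults file. The following is a minimal sketch, not an official Spark tool: the helper name set_spark_default is made up for illustration, and it assumes the file uses the whitespace-separated "key value" layout of conf/spark-defaults.conf.

```shell
#!/usr/bin/env bash
# Hypothetical helper: add or update one property in a Spark properties file.
# Assumption: each line has the form "key value" (whitespace separated).
set_spark_default() {
  local file="$1" key="$2" value="$3"
  touch "$file"
  # Copy every line whose first field is not the key, then append the new value.
  awk -v k="$key" '$1 != k' "$file" > "${file}.tmp"
  printf '%s %s\n' "$key" "$value" >> "${file}.tmp"
  mv "${file}.tmp" "$file"
}

# Example call (the path is an assumption; adjust it to your installation):
# set_spark_default "$SPARK_HOME/conf/spark-defaults.conf" spark.debug.maxToStringFields 100
```

Replacing an existing line instead of blindly appending keeps the file free of duplicate entries for the same key.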
In the Hadoop installation directory, open etc/hadoop/hadoop-env.sh and add the following setting after line 52 of that file:
export HADOOP_OPTS="-Djava.library.path=${HADOOP_HOME}/lib/native"
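Also check whether the JAVA_LIBRARY_PATH environment variable is set. It should point at Hadoop's native library directory, for example:
export JAVA_LIBRARY_PATH='/usr/local/sinasrv2/hadoop/hadoop-2.6.4/lib/native'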
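To check what the native directory actually contains, a small script along these lines may help. It is only a sketch: check_native_dir is a hypothetical helper, and it looks at file names only, so it cannot tell whether libhadoop itself was compiled with snappy support.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether a Hadoop native-library directory
# contains the shared libraries that the snappy codec depends on.
# Only file names are checked, not how the libraries were built.
check_native_dir() {
  local dir="$1" lib
  for lib in libhadoop libsnappy; do
    if ls "$dir"/"$lib".so* >/dev/null 2>&1; then
      echo "$lib: found"
    else
      echo "$lib: missing"
    fi
  done
}

# Example call; falls back to $HADOOP_HOME/lib/native when JAVA_LIBRARY_PATH is unset:
# check_native_dir "${JAVA_LIBRARY_PATH:-$HADOOP_HOME/lib/native}"
```

Hadoop also ships its own check, hadoop checknative -a, which reports for each native library (snappy included) whether it could be loaded.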
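One point deserves special attention: check your own Hadoop and JDK versions carefully, because the values in these environment variables (the paths in particular) must match your installation exactly.
Some people set these variables in /etc/profile; in that case, make the change in that file instead. I made mine in ~/.bashrc.
Run source ~/.bashrc for the change to take effect.
Run the job again and the warning above no longer appears.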