Errors when using the Waterdrop tool to import data from Hive into ClickHouse. Asking for help.

Problem description:

sh /data01/software/waterdrop/waterdrop-1.5.1/bin/start-waterdrop.sh --config /data01/test/hive_to_ck_data.conf -e client -m 'local[4]' -i dt=2020-03-09

[INFO] spark conf: --conf "spark.sql.catalogImplementation=hive" --conf "spark.executor.memory=2g" --conf "spark.executor.instances=2" --conf "spark.app.name=Waterdrop" --conf "spark.executor.cores=2"
Warning: Ignoring non-Spark config property: "spark.executor.memory
Warning: Ignoring non-Spark config property: "spark.app.name
Warning: Ignoring non-Spark config property: "spark.executor.instances
Warning: Ignoring non-Spark config property: "spark.executor.cores
Warning: Ignoring non-Spark config property: "spark.sql.catalogImplementation
21/01/30 17:58:09 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[INFO] Loading config file: /data01/test/hive_to_ck_data.conf
[INFO] parsed config file: {
    "spark" : {
        "spark.sql.catalogImplementation" : "hive",
        "spark.app.name" : "Waterdrop",
        "spark.executor.instances" : 2,
        "spark.executor.cores" : 2,
        "spark.executor.memory" : "2g"
    },
    "input" : [
        {
            "pre_sql" : "select t.* from(select dt,id,order_id,buyer_id,receiver,contact_phone,id_card,province_code,province_name,city_code,city_name,area_code,area_name,street,address,zip_code,email,longitude,latitude,deleted,created_time,revised_time from test01.t_order_address_mall a where dt = '2020-03-09') t",
            "plugin_name" : "hive",
            "table_name" : "access_log"
        }
    ],
    "filter" : [],
    "output" : [
        {
            "database" : "hd_marketing_db",
            "password" : "123qwe",
            "host" : "192.168.1.60:8123",
            "fields" : [
                "dt",
                "id",
                "order_id",
                "buyer_id",
                "receiver",
                "contact_phone",
                "id_card",
                "province_code",
                "province_name",
                "city_code",
                "city_name",
                "area_code",
                "area_name",
                "street",
                "address",
                "zip_code",
                "email",
                "longitude",
                "latitude",
                "deleted",
                "created_time",
                "revised_time"
            ],
            "plugin_name" : "clickhouse",
            "table" : "t_order_address_mall_test",
            "username" : "chuser"
        }
    ]
}

[INFO] loading SparkConf: 
    spark.executor.extraJavaOptions => -Ddt=2020-03-09
    spark.jars => file:/data01/software/waterdrop/waterdrop-1.5.1/lib/Waterdrop-1.5.1-2.11.8.jar
    spark.executor.memory => 2g
    spark.app.name => Waterdrop
    spark.master => local[4]
    spark.sql.catalogImplementation => hive
    spark.executor.instances => 2
    spark.submit.deployMode => client
    spark.driver.extraJavaOptions => -Ddt=2020-03-09
    spark.executor.cores => 2
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
21/01/30 17:58:09 INFO SparkContext: Running Spark version 2.4.5
21/01/30 17:58:09 INFO SparkContext: Submitted application: Waterdrop
21/01/30 17:58:09 INFO SecurityManager: Changing view acls to: root
21/01/30 17:58:09 INFO SecurityManager: Changing modify acls to: root
21/01/30 17:58:09 INFO SecurityManager: Changing view acls groups to: 
21/01/30 17:58:09 INFO SecurityManager: Changing modify acls groups to: 
21/01/30 17:58:09 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(root); groups with view permissions: Set(); users  with modify permissions: Set(root); groups with modify permissions: Set()
21/01/30 17:58:09 INFO Utils: Successfully started service 'sparkDriver' on port 34449.
21/01/30 17:58:09 INFO SparkEnv: Registering MapOutputTracker
21/01/30 17:58:09 INFO SparkEnv: Registering BlockManagerMaster
21/01/30 17:58:09 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
21/01/30 17:58:09 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
21/01/30 17:58:09 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-aed6dfdf-604f-4a8e-8716-4d3a551c7053
21/01/30 17:58:09 INFO MemoryStore: MemoryStore started with capacity 366.3 MB
21/01/30 17:58:09 INFO SparkEnv: Registering OutputCommitCoordinator
21/01/30 17:58:10 INFO Utils: Successfully started service 'SparkUI' on port 4040.
21/01/30 17:58:10 INFO SparkUI: Bound SparkUI to eeds-bd01, and started at http://eeds-bd01:4040
21/01/30 17:58:10 INFO SparkContext: Added JAR file:/data01/software/waterdrop/waterdrop-1.5.1/lib/Waterdrop-1.5.1-2.11.8.jar at spark://eeds-bd01:34449/jars/Waterdrop-1.5.1-2.11.8.jar with timestamp 1612000690190
21/01/30 17:58:10 INFO Executor: Starting executor ID driver on host localhost
21/01/30 17:58:10 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 43175.
21/01/30 17:58:10 INFO NettyBlockTransferService: Server created on eeds-bd01:43175
21/01/30 17:58:10 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
21/01/30 17:58:10 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, eeds-bd01, 43175, None)
21/01/30 17:58:10 INFO BlockManagerMasterEndpoint: Registering block manager eeds-bd01:43175 with 366.3 MB RAM, BlockManagerId(driver, eeds-bd01, 43175, None)
21/01/30 17:58:10 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, eeds-bd01, 43175, None)
21/01/30 17:58:10 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, eeds-bd01, 43175, None)
find and register UDFs & UDAFs
21/01/30 17:58:11 INFO SharedState: loading hive config file: file:/data01/software/waterdrop/spark-2.4.5-bin-hadoop2.7/conf/hive-site.xml
21/01/30 17:58:11 INFO SharedState: spark.sql.warehouse.dir is not set, but hive.metastore.warehouse.dir is set. Setting spark.sql.warehouse.dir to the value of hive.metastore.warehouse.dir ('/user/hive/warehouse').
21/01/30 17:58:11 INFO SharedState: Warehouse path is '/user/hive/warehouse'.
21/01/30 17:58:11 INFO StateStoreCoordinatorRef: Registered StateStoreCoordinator endpoint
found and registered UDFs count[2], UDAFs count[0]
21/01/30 17:58:12 INFO ClickHouseDriver: Driver registered
21/01/30 17:58:12 INFO HiveUtils: Initializing HiveMetastoreConnection version 1.2.1 using Spark classes.
21/01/30 17:58:13 INFO metastore: Trying to connect to metastore with URI thrift://eeds-bd01:9083
21/01/30 17:58:13 INFO metastore: Connected to metastore.
21/01/30 17:58:13 INFO SessionState: Created local directory: /tmp/923442e6-2630-463a-8241-59160cbdc1ea_resources
21/01/30 17:58:13 INFO SessionState: Created HDFS directory: /tmp/hive/root/923442e6-2630-463a-8241-59160cbdc1ea
21/01/30 17:58:13 INFO SessionState: Created local directory: /tmp/root/923442e6-2630-463a-8241-59160cbdc1ea
21/01/30 17:58:13 INFO SessionState: Created HDFS directory: /tmp/hive/root/923442e6-2630-463a-8241-59160cbdc1ea/_tmp_space.db
21/01/30 17:58:13 INFO HiveClientImpl: Warehouse location for Hive client (version 1.2.2) is /user/hive/warehouse
(Waterdrop ASCII-art banner omitted)
21/01/30 17:58:13 INFO Clickhouse: insert into t_order_address_mall_test (`dt`,`id`,`order_id`,`buyer_id`,`receiver`,`contact_phone`,`id_card`,`province_code`,`province_name`,`city_code`,`city_name`,`area_code`,`area_name`,`street`,`address`,`zip_code`,`email`,`longitude`,`latitude`,`deleted`,`created_time`,`revised_time`) values (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)
21/01/30 17:58:14 WARN Utils: Truncated the string representation of a plan since it was too large. This behavior can be adjusted by setting 'spark.debug.maxToStringFields' in SparkEnv.conf.
21/01/30 17:58:14 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 308.6 KB, free 366.0 MB)
21/01/30 17:58:14 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 26.2 KB, free 366.0 MB)
21/01/30 17:58:14 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on eeds-bd01:43175 (size: 26.2 KB, free: 366.3 MB)
21/01/30 17:58:14 INFO SparkContext: Created broadcast 0 from 
21/01/30 17:58:15 INFO FileInputFormat: Total input paths to process : 1
21/01/30 17:58:15 INFO SparkContext: Starting job: foreachPartition at Clickhouse.scala:162
21/01/30 17:58:15 INFO DAGScheduler: Got job 0 (foreachPartition at Clickhouse.scala:162) with 1 output partitions
21/01/30 17:58:15 INFO DAGScheduler: Final stage: ResultStage 0 (foreachPartition at Clickhouse.scala:162)
21/01/30 17:58:15 INFO DAGScheduler: Parents of final stage: List()
21/01/30 17:58:15 INFO DAGScheduler: Missing parents: List()
21/01/30 17:58:15 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[7] at foreachPartition at Clickhouse.scala:162), which has no missing parents
21/01/30 17:58:15 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 25.9 KB, free 365.9 MB)
21/01/30 17:58:15 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 10.9 KB, free 365.9 MB)
21/01/30 17:58:15 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on eeds-bd01:43175 (size: 10.9 KB, free: 366.3 MB)
21/01/30 17:58:15 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1163
21/01/30 17:58:15 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[7] at foreachPartition at Clickhouse.scala:162) (first 15 tasks are for partitions Vector(0))
21/01/30 17:58:15 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
21/01/30 17:58:15 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, executor driver, partition 0, ANY, 8077 bytes)
21/01/30 17:58:15 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
21/01/30 17:58:15 INFO Executor: Fetching spark://eeds-bd01:34449/jars/Waterdrop-1.5.1-2.11.8.jar with timestamp 1612000690190
21/01/30 17:58:15 INFO TransportClientFactory: Successfully created connection to eeds-bd01/192.168.1.155:34449 after 38 ms (0 ms spent in bootstraps)
21/01/30 17:58:15 INFO Utils: Fetching spark://eeds-bd01:34449/jars/Waterdrop-1.5.1-2.11.8.jar to /tmp/spark-066dfbf7-d276-4414-81d8-91e51d10e76d/userFiles-76a6ce6e-1d5e-484b-ad92-3a95b575aa93/fetchFileTemp2005876359851025212.tmp
21/01/30 17:58:15 INFO Executor: Adding file:/tmp/spark-066dfbf7-d276-4414-81d8-91e51d10e76d/userFiles-76a6ce6e-1d5e-484b-ad92-3a95b575aa93/Waterdrop-1.5.1-2.11.8.jar to class loader
21/01/30 17:58:16 INFO HadoopRDD: Input split: hdfs://eeds-bd01:8020/data01/hive/warehouse/test01.db/t_order_address_mall/dt=2020-03-09/part-m-00000.snappy:0+4820
21/01/30 17:58:16 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.RuntimeException: native snappy library not available: this version of libhadoop was built without snappy support.
    at org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:65)
    at org.apache.hadoop.io.compress.SnappyCodec.getDecompressorType(SnappyCodec.java:193)
    at org.apache.hadoop.io.compress.CodecPool.getDecompressor(CodecPool.java:178)
    at org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:111)
    at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67)
    at org.apache.spark.rdd.HadoopRDD$$anon$1.liftedTree1$1(HadoopRDD.scala:267)
    at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:266)
    at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:224)
    at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:95)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
    at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:105)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
    at org.apache.spark.sql.execution.SQLExecutionRDD.compute(SQLExecutionRDD.scala:55)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.run(Task.scala:123)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
21/01/30 17:58:16 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, localhost, executor driver): java.lang.RuntimeException: native snappy library not available: this version of libhadoop was built without snappy support.
    at org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:65)
    at org.apache.hadoop.io.compress.SnappyCodec.getDecompressorType(SnappyCodec.java:193)
    at org.apache.hadoop.io.compress.CodecPool.getDecompressor(CodecPool.java:178)
    at org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:111)
    at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67)
    at org.apache.spark.rdd.HadoopRDD$$anon$1.liftedTree1$1(HadoopRDD.scala:267)
    at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:266)
    at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:224)
    at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:95)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
    at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:105)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
    at org.apache.spark.sql.execution.SQLExecutionRDD.compute(SQLExecutionRDD.scala:55)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.run(Task.scala:123)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

21/01/30 17:58:16 ERROR TaskSetManager: Task 0 in stage 0.0 failed 1 times; aborting job
21/01/30 17:58:16 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 
21/01/30 17:58:16 INFO TaskSchedulerImpl: Cancelling stage 0
21/01/30 17:58:16 INFO TaskSchedulerImpl: Killing all running tasks in stage 0: Stage cancelled
21/01/30 17:58:16 INFO DAGScheduler: ResultStage 0 (foreachPartition at Clickhouse.scala:162) failed in 0.572 s due to Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost, executor driver): java.lang.RuntimeException: native snappy library not available: this version of libhadoop was built without snappy support.
    at org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:65)
    at org.apache.hadoop.io.compress.SnappyCodec.getDecompressorType(SnappyCodec.java:193)
    at org.apache.hadoop.io.compress.CodecPool.getDecompressor(CodecPool.java:178)
    at org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:111)
    at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67)
    at org.apache.spark.rdd.HadoopRDD$$anon$1.liftedTree1$1(HadoopRDD.scala:267)
    at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:266)
    at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:224)
    at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:95)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
    at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:105)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
    at org.apache.spark.sql.execution.SQLExecutionRDD.compute(SQLExecutionRDD.scala:55)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.run(Task.scala:123)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

Driver stacktrace:
21/01/30 17:58:16 INFO DAGScheduler: Job 0 failed: foreachPartition at Clickhouse.scala:162, took 0.628301 s
Exception in thread "main" java.lang.Exception: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost, executor driver): java.lang.RuntimeException: native snappy library not available: this version of libhadoop was built without snappy support.
    at org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:65)
    at org.apache.hadoop.io.compress.SnappyCodec.getDecompressorType(SnappyCodec.java:193)
    at org.apache.hadoop.io.compress.CodecPool.getDecompressor(CodecPool.java:178)
    at org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:111)
    at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67)
    at org.apache.spark.rdd.HadoopRDD$$anon$1.liftedTree1$1(HadoopRDD.scala:267)
    at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:266)
    at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:224)
    at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:95)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
    at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:105)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
    at org.apache.spark.sql.execution.SQLExecutionRDD.compute(SQLExecutionRDD.scala:55)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.run(Task.scala:123)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

Driver stacktrace:
    at io.github.interestinglab.waterdrop.Waterdrop$.main(Waterdrop.scala:43)
    at io.github.interestinglab.waterdrop.Waterdrop.main(Waterdrop.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost, executor driver): java.lang.RuntimeException: native snappy library not available: this version of libhadoop was built without snappy support.
    at org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:65)
    at org.apache.hadoop.io.compress.SnappyCodec.getDecompressorType(SnappyCodec.java:193)
    at org.apache.hadoop.io.compress.CodecPool.getDecompressor(CodecPool.java:178)
    at org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:111)
    at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67)
    at org.apache.spark.rdd.HadoopRDD$$anon$1.liftedTree1$1(HadoopRDD.scala:267)
    at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:266)
    at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:224)
    at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:95)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
    at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:105)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
    at org.apache.spark.sql.execution.SQLExecutionRDD.compute(SQLExecutionRDD.scala:55)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.run(Task.scala:123)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1891)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1879)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1878)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1878)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:927)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:927)
    at scala.Option.foreach(Option.scala:257)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:927)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2112)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2061)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2050)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
    at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:738)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2061)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2082)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2101)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2126)
    at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.apply(RDD.scala:980)
    at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.apply(RDD.scala:978)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:385)
    at org.apache.spark.rdd.RDD.foreachPartition(RDD.scala:978)
    at org.apache.spark.sql.Dataset$$anonfun$foreachPartition$1.apply$mcV$sp(Dataset.scala:2741)
    at org.apache.spark.sql.Dataset$$anonfun$foreachPartition$1.apply(Dataset.scala:2741)
    at org.apache.spark.sql.Dataset$$anonfun$foreachPartition$1.apply(Dataset.scala:2741)
    at org.apache.spark.sql.Dataset$$anonfun$withNewRDDExecutionId$1.apply(Dataset.scala:3355)
    at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:80)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:127)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:75)
    at org.apache.spark.sql.Dataset.withNewRDDExecutionId(Dataset.scala:3351)
    at org.apache.spark.sql.Dataset.foreachPartition(Dataset.scala:2740)
    at io.github.interestinglab.waterdrop.output.batch.Clickhouse.process(Clickhouse.scala:162)
    at io.github.interestinglab.waterdrop.Waterdrop$.outputProcess(Waterdrop.scala:251)
    at io.github.interestinglab.waterdrop.Waterdrop$$anonfun$batchProcessing$2.apply(Waterdrop.scala:215)
    at io.github.interestinglab.waterdrop.Waterdrop$$anonfun$batchProcessing$2.apply(Waterdrop.scala:214)
    at scala.collection.immutable.List.foreach(List.scala:392)
    at io.github.interestinglab.waterdrop.Waterdrop$.batchProcessing(Waterdrop.scala:214)
    at io.github.interestinglab.waterdrop.Waterdrop$.io$github$interestinglab$waterdrop$Waterdrop$$entrypoint(Waterdrop.scala:120)
    at io.github.interestinglab.waterdrop.Waterdrop$$anonfun$1.apply$mcV$sp(Waterdrop.scala:38)
    at io.github.interestinglab.waterdrop.Waterdrop$$anonfun$1.apply(Waterdrop.scala:38)
    at io.github.interestinglab.waterdrop.Waterdrop$$anonfun$1.apply(Waterdrop.scala:38)
    at scala.util.Try$.apply(Try.scala:192)
    at io.github.interestinglab.waterdrop.Waterdrop$.main(Waterdrop.scala:38)
    ... 13 more
Caused by: java.lang.RuntimeException: native snappy library not available: this version of libhadoop was built without snappy support.
    at org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:65)
    at org.apache.hadoop.io.compress.SnappyCodec.getDecompressorType(SnappyCodec.java:193)
    at org.apache.hadoop.io.compress.CodecPool.getDecompressor(CodecPool.java:178)
    at org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:111)
    at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67)
    at org.apache.spark.rdd.HadoopRDD$$anon$1.liftedTree1$1(HadoopRDD.scala:267)
    at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:266)
    at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:224)
    at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:95)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
    at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:105)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
    at org.apache.spark.sql.execution.SQLExecutionRDD.compute(SQLExecutionRDD.scala:55)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.run(Task.scala:123)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
21/01/30 17:58:16 INFO SparkContext: Invoking stop() from shutdown hook
21/01/30 17:58:16 INFO SparkUI: Stopped Spark web UI at http://eeds-bd01:4040
21/01/30 17:58:16 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
21/01/30 17:58:16 INFO MemoryStore: MemoryStore cleared
21/01/30 17:58:16 INFO BlockManager: BlockManager stopped
21/01/30 17:58:16 INFO BlockManagerMaster: BlockManagerMaster stopped
21/01/30 17:58:16 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
21/01/30 17:58:16 INFO SparkContext: Successfully stopped SparkContext
21/01/30 17:58:16 INFO ShutdownHookManager: Shutdown hook called
21/01/30 17:58:16 INFO ShutdownHookManager: Deleting directory /tmp/spark-d3ff91dd-da13-424c-adae-d53bae836d97
21/01/30 17:58:16 INFO ShutdownHookManager: Deleting directory /tmp/spark-066dfbf7-d276-4414-81d8-91e51d10e76d
 

For the "Truncated the string representation of a plan" warning, the spark.debug.maxToStringFields setting can be raised. In an interactive session, a temporary change:

scala> spark.conf.set("spark.debug.maxToStringFields","100")

=========== 

Permanent change:

Modify or add the spark.debug.maxToStringFields variable in spark-env.sh on the cluster node(s).
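
As a sketch of one alternative (an assumption on my part: the spark block of hive_to_ck_data.conf passes arbitrary spark.* properties through to Spark, as the parsed config above suggests), the property can also be added directly to the job config:

    spark {
        spark.sql.catalogImplementation = "hive"
        spark.app.name = "Waterdrop"
        spark.executor.instances = 2
        spark.executor.cores = 2
        spark.executor.memory = "2g"
        # only silences the "Truncated the string representation of a plan" warning;
        # it does not fix the snappy failure below
        spark.debug.maxToStringFields = 100
    }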

The job failure itself ("native snappy library not available: this version of libhadoop was built without snappy support") means Hadoop did not load a native libhadoop with Snappy support. Under the Hadoop installation, in etc/hadoop/hadoop-env.sh, add the following configuration (after line 52 in my file):

export HADOOP_OPTS="-Djava.library.path=${HADOOP_HOME}/lib/native"

Also check the JAVA_LIBRARY_PATH environment variable:

export JAVA_LIBRARY_PATH='/usr/local/sinasrv2/hadoop/hadoop-2.6.4/lib/native'

Check whether it is already set.
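
To verify that the native libraries (including Snappy) are actually visible after these changes, Hadoop's checknative command can be used (a verification sketch, assuming Hadoop 2.4 or later):

    echo $JAVA_LIBRARY_PATH
    hadoop checknative -a    # output should show "snappy: true" with the path to libsnappy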

A special note here: pay close attention to your Hadoop and JDK versions; the environment variables must be configured without the slightest mistake.
Another point: some people configure these variables in /etc/profile, in which case the change goes in that file; I made mine in ~/.bashrc.
Run source ~/.bashrc to make the changes take effect.
Run the job again and the warning messages above no longer appear.
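
If the snappy error still appears when the job is launched through Spark, one further option (an assumption about this setup, not something confirmed by the log above) is to point Spark's driver and executors at the Hadoop native library directory via spark-defaults.conf; adjust the path to the actual installation:

    # spark-defaults.conf (sketch; replace /path/to/hadoop with the real HADOOP_HOME)
    spark.driver.extraLibraryPath     /path/to/hadoop/lib/native
    spark.executor.extraLibraryPath   /path/to/hadoop/lib/native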