The Most Detailed Guide to Importing Data into Hive with Sqoop (2)
1) Import data from a relational database into a Hive table
Table 1. Common arguments
Argument | Description
--connect <jdbc-uri> | Specify JDBC connect string
--connection-manager <class-name> | Specify connection manager class to use
--driver <class-name> | Manually specify JDBC driver class to use
--hadoop-home <dir> | Override $HADOOP_HOME
--help | Print usage instructions
-P | Read password from console
--password <password> | Set authentication password
--username <username> | Set authentication username
--verbose | Print more information while working
--connection-param-file <filename> | Optional properties file that provides connection parameters
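As a quick illustration of the common arguments, here is a hedged sketch that reads the password from the console with -P (as the warning in the transcript further below recommends) and passes extra JDBC parameters through --connection-param-file. The conn.properties file and its keys are illustrative assumptions, not part of the run shown later:

# conn.properties -- a plain Java properties file of JDBC connection parameters (hypothetical example)
#   useUnicode=true
#   characterEncoding=utf8

./sqoop import \
    --connect jdbc:mysql://192.168.1.101/test \
    --username root \
    -P \
    --connection-param-file conn.properties \
    --table people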
Arguments for importing data into Hive
Table 7. Hive arguments:
Argument | Description
--hive-home <dir> | Override $HIVE_HOME
--hive-import | Import tables into Hive (uses Hive's default delimiters if none are set)
--hive-overwrite | Overwrite existing data in the Hive table
--create-hive-table | If set, the job fails if the target Hive table already exists; false by default
--hive-table <table-name> | Sets the table name to use when importing to Hive
--hive-drop-import-delims | Drops \n, \r, and \01 from string fields when importing to Hive
--hive-delims-replacement | Replaces \n, \r, and \01 in string fields with a user-defined string when importing to Hive
--hive-partition-key | Name of the Hive field to partition on
--hive-partition-value <v> | String value that serves as the partition key for this import job
--map-column-hive <map> | Override default mapping from SQL type to Hive type for configured columns
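As a sketch of how these Hive arguments combine, the command below imports into a partitioned Hive table, overwriting existing data and stripping Hive's delimiter characters from string fields. The target table people_partitioned and the partition column dt are assumptions for illustration:

./sqoop import \
    --connect jdbc:mysql://192.168.1.101/test \
    --username root -P \
    --table people \
    --hive-import \
    --hive-table people_partitioned \
    --hive-overwrite \
    --hive-drop-import-delims \
    --hive-partition-key dt \
    --hive-partition-value '2016-06-23' \
    --fields-terminated-by ','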
Example:
[root@hadoop0 bin]# ./sqoop import --connect jdbc:mysql://192.168.1.101/test -username root --password root --table people --hive-import --hive-table emps --fields-terminated-by ','
Warning: /opt/bigdata/sqoop-1.4.6/bin/../../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /opt/bigdata/sqoop-1.4.6/bin/../../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
99/06/23 19:07:49 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
99/06/23 19:07:49 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
99/06/23 19:07:49 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
99/06/23 19:07:49 INFO tool.CodeGenTool: Beginning code generation
99/06/23 19:07:50 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `people` AS t LIMIT 1
99/06/23 19:07:50 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `people` AS t LIMIT 1
99/06/23 19:07:50 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /opt/bigdata/hadoop272
99/06/23 19:08:09 INFO mapreduce.Job: Job job_local417795562_0001 completed successfully
99/06/23 19:08:10 INFO mapreduce.Job: Counters: 20
File System Counters
FILE: Number of bytes read=18029933
FILE: Number of bytes written=18472806
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=0
HDFS: Number of bytes written=7294
HDFS: Number of read operations=4
HDFS: Number of large read operations=0
HDFS: Number of write operations=3
Map-Reduce Framework
Map input records=108
Map output records=108
Input split bytes=87
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=17
Total committed heap usage (bytes)=84185088
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=7294
99/06/23 19:08:10 INFO mapreduce.ImportJobBase: Transferred 7.123 KB in 16.6249 seconds (438.7394 bytes/sec)
99/06/23 19:08:10 INFO mapreduce.ImportJobBase: Retrieved 108 records.
99/06/23 19:08:10 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `people` AS t LIMIT 1
99/06/23 19:08:10 INFO hive.HiveImport: Loading uploaded data into Hive
99/06/23 19:08:51 INFO hive.HiveImport: SLF4J: Class path contains multiple SLF4J bindings.
99/06/23 19:08:52 INFO hive.HiveImport: SLF4J: Found binding in [jar:file:/opt/bigdata/hive2.0/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
99/06/23 19:08:52 INFO hive.HiveImport: SLF4J: Found binding in [jar:file:/opt/bigdata/hbase-1.1.5/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
99/06/23 19:08:52 INFO hive.HiveImport: SLF4J: Found binding in [jar:file:/opt/bigdata/hadoop272/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
99/06/23 19:08:52 INFO hive.HiveImport: SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
99/06/23 19:08:54 INFO hive.HiveImport:
99/06/23 19:08:54 INFO hive.HiveImport: Logging initialized using configuration in file:/opt/bigdata/hive2.0/conf/hive-log4j2.properties
99/06/23 19:09:27 INFO hive.HiveImport: OK
99/06/23 19:09:27 INFO hive.HiveImport: Time taken: 3.378 seconds
99/06/23 19:09:32 INFO hive.HiveImport: Loading data to table default.emps
99/06/23 19:09:35 INFO hive.HiveImport: OK
99/06/23 19:09:35 INFO hive.HiveImport: Time taken: 7.827 seconds
99/06/23 19:09:36 INFO hive.HiveImport: Hive import complete.
99/06/23 19:09:36 INFO hive.HiveImport: Export directory is contains the _SUCCESS file only, removing the directory.
[root@hadoop0 bin]#
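To confirm the import landed, you can query the new table from the Hive CLI. A minimal check (the count should match the "Retrieved 108 records" line above):

hive -e 'SELECT COUNT(*) FROM default.emps;'   # expect 108
hive -e 'SELECT * FROM default.emps LIMIT 5;'  # spot-check a few rows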
Troubleshooting:
99/06/23 18:57:26 INFO mapreduce.Job: Running job: job_local1178174559_0001
99/06/23 18:57:26 INFO mapred.LocalJobRunner: OutputCommitter set in config null
99/06/23 18:57:26 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
99/06/23 18:57:26 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
99/06/23 18:57:26 INFO mapred.LocalJobRunner: Waiting for map tasks
99/06/23 18:57:26 INFO mapred.LocalJobRunner: Starting task: attempt_local1178174559_0001_m_000000_0
99/06/23 18:57:26 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
99/06/23 18:57:27 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
99/06/23 18:57:27 INFO db.DBInputFormat: Using read commited transaction isolation
99/06/23 18:57:27 INFO mapred.MapTask: Processing split: 1=1 AND 1=1
99/06/23 18:57:27 INFO mapred.LocalJobRunner: map task executor complete.
99/06/23 18:57:27 WARN mapred.LocalJobRunner: job_local1178174559_0001
java.lang.Exception: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class people not found
at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class people not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2195)
at org.apache.sqoop.mapreduce.db.DBConfiguration.getInputClass(DBConfiguration.java:403)
at org.apache.sqoop.mapreduce.db.DataDrivenDBInputFormat.createDBRecordReader(DataDrivenDBInputFormat.java:237)
at org.apache.sqoop.mapreduce.db.DBInputFormat.createRecordReader(DBInputFormat.java:263)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.<init>(MapTask.java:515)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:758)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: Class people not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2101)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2193)
... 12 more
99/06/23 18:57:27 INFO mapreduce.Job: Job job_local1178174559_0001 running in uber mode : false
99/06/23 18:57:27 INFO mapreduce.Job: map 0% reduce 0%
99/06/23 18:57:27 INFO mapreduce.Job: Job job_local1178174559_0001 failed with state FAILED due to: NA
99/06/23 18:57:27 INFO mapreduce.Job: Counters: 0
99/06/23 18:57:27 WARN mapreduce.Counters: Group FileSystemCounters is deprecated. Use org.apache.hadoop.mapreduce.FileSystemCounter instead
99/06/23 18:57:27 INFO mapreduce.ImportJobBase: Transferred 0 bytes in 8.0751 seconds (0 bytes/sec)
99/06/23 18:57:27 WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead
99/06/23 18:57:27 INFO mapreduce.ImportJobBase: Retrieved 0 records.
99/06/23 18:57:27 ERROR tool.ImportTool: Error during import: Import job failed!
Solution: copy the table's generated class file and jar file from the local filesystem's /tmp directory into Sqoop's bin directory, then run the command again; it now succeeds.
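In concrete terms, something like the following (the compile directory hash is taken from the log of the successful rerun below; yours will differ):

cp /tmp/sqoop-root/compile/ef3e50779e6c6fd3f28e16401ec54afc/people.class /opt/bigdata/sqoop-1.4.6/bin/
cp /tmp/sqoop-root/compile/ef3e50779e6c6fd3f28e16401ec54afc/people.jar /opt/bigdata/sqoop-1.4.6/bin/

Alternatively, Sqoop's code-generation option --bindir <dir> should let you choose where the compiled class and jar are written in the first place.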
[root@hadoop0 bin]# ./sqoop import --connect jdbc:mysql://192.168.1.101/test -username root --password root --table people --hive-import --hive-table emps --fields-terminated-by ','
Warning: /opt/bigdata/sqoop-1.4.6/bin/../../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /opt/bigdata/sqoop-1.4.6/bin/../../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
99/06/23 19:07:28 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
99/06/23 19:07:28 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
99/06/23 19:07:29 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
99/06/23 19:07:29 INFO tool.CodeGenTool: Beginning code generation
99/06/23 19:07:29 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `people` AS t LIMIT 1
99/06/23 19:07:29 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `people` AS t LIMIT 1
99/06/23 19:07:29 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /opt/bigdata/hadoop272
Note: /tmp/sqoop-root/compile/ef3e50779e6c6fd3f28e16401ec54afc/people.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
99/06/23 19:07:32 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/ef3e50779e6c6fd3f28e16401ec54afc/people.jar
99/06/23 19:07:32 WARN manager.MySQLManager: It looks like you are importing from mysql.
99/06/23 19:07:32 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
99/06/23 19:07:32 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
99/06/23 19:07:32 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
99/06/23 19:07:32 INFO mapreduce.ImportJobBase: Beginning import of people
99/06/23 19:07:32 INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/bigdata/hadoop272/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/bigdata/hbase-1.1.5/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
99/06/23 19:07:33 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
99/06/23 19:07:33 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
99/06/23 19:07:34 INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
99/06/23 19:07:34 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
99/06/23 19:07:34 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
99/06/23 19:07:34 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
99/06/23 19:07:35 ERROR tool.ImportTool: Encountered IOException running import job: org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://hadoop0:9000/user/root/people already exists
at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:146)
at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:266)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:139)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1308)
at org.apache.sqoop.mapreduce.ImportJobBase.doSubmitJob(ImportJobBase.java:196)
at org.apache.sqoop.mapreduce.ImportJobBase.runJob(ImportJobBase.java:169)
at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:266)
at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:673)
at org.apache.sqoop.manager.MySQLManager.importTable(MySQLManager.java:118)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:497)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
Solution: delete the corresponding output directory.
[root@hadoop0 bin]# ./hadoop fs -rmr /user/root/people
rmr: DEPRECATED: Please use 'rm -r' instead.
99/06/23 19:07:43 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
99/06/23 19:07:44 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.
Deleted /user/root/people
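As an alternative to deleting the directory by hand before each rerun, Sqoop 1.4.6 also has a --delete-target-dir import option that removes the target directory if it already exists; a sketch (verify against your Sqoop version):

./sqoop import --connect jdbc:mysql://192.168.1.101/test --username root -P \
    --table people --delete-target-dir \
    --hive-import --hive-table emps --fields-terminated-by ','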