The Most Detailed Guide to Importing Data into Hive with Sqoop (2)
1) Import data from a relational database into a Hive table
Table 1. Common arguments
Argument | Description
--connect <jdbc-uri> | Specify JDBC connect string
--connection-manager <class-name> | Specify connection manager class to use
--driver <class-name> | Manually specify JDBC driver class to use
--hadoop-home <dir> | Override $HADOOP_HOME
--help | Print usage instructions
-P | Read password from console
--password <password> | Set authentication password
--username <username> | Set authentication username
--verbose | Print more information while working
--connection-param-file <filename> | Optional properties file that provides connection parameters
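As a quick illustration of the common arguments, here is a hedged sketch that reads the password from the console with -P (as the warning in the transcript further below recommends) and passes extra JDBC parameters through --connection-param-file. The conn.properties file and its keys are illustrative assumptions, not part of the run shown later:

# conn.properties -- a plain Java properties file of JDBC connection parameters (hypothetical example)
#   useUnicode=true
#   characterEncoding=utf8

./sqoop import \
    --connect jdbc:mysql://192.168.1.101/test \
    --username root \
    -P \
    --connection-param-file conn.properties \
    --table people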
Arguments for importing data into Hive
Table 7. Hive arguments:
Argument | Description
--hive-home <dir> | Override $HIVE_HOME
--hive-import | Import tables into Hive (uses Hive's default delimiters if none are set)
--hive-overwrite | Overwrite existing data in the Hive table
--create-hive-table | If set, the job fails if the target Hive table already exists; false by default
--hive-table <table-name> | Sets the table name to use when importing to Hive
--hive-drop-import-delims | Drops \n, \r, and \01 from string fields when importing to Hive
--hive-delims-replacement | Replaces \n, \r, and \01 in string fields with a user-defined string when importing to Hive
--hive-partition-key | Name of the Hive field to partition on
--hive-partition-value <v> | String value that serves as the partition key for this import job
--map-column-hive <map> | Override default mapping from SQL type to Hive type for configured columns
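As a sketch of how these Hive arguments combine, the command below imports into a partitioned Hive table, overwriting existing data and stripping Hive's delimiter characters from string fields. The target table people_partitioned and the partition column dt are assumptions for illustration:

./sqoop import \
    --connect jdbc:mysql://192.168.1.101/test \
    --username root -P \
    --table people \
    --hive-import \
    --hive-table people_partitioned \
    --hive-overwrite \
    --hive-drop-import-delims \
    --hive-partition-key dt \
    --hive-partition-value '2016-06-23' \
    --fields-terminated-by ','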
Example:
[root@hadoop0 bin]# ./sqoop import --connect jdbc:mysql://192.168.1.101/test -username root --password root --table people --hive-import --hive-table emps --fields-terminated-by ','
Warning: /opt/bigdata/sqoop-1.4.6/bin/../../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /opt/bigdata/sqoop-1.4.6/bin/../../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
99/06/23 19:07:49 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
99/06/23 19:07:49 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
99/06/23 19:07:49 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
99/06/23 19:07:49 INFO tool.CodeGenTool: Beginning code generation
99/06/23 19:07:50 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `people` AS t LIMIT 1
99/06/23 19:07:50 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `people` AS t LIMIT 1
99/06/23 19:07:50 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /opt/bigdata/hadoop272
99/06/23 19:08:09 INFO mapreduce.Job: Job job_local417795562_0001 completed successfully
99/06/23 19:08:10 INFO mapreduce.Job: Counters: 20
File System Counters
FILE: Number of bytes read=18029933
FILE: Number of bytes written=18472806
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=0
HDFS: Number of bytes written=7294
HDFS: Number of read operations=4
HDFS: Number of large read operations=0
HDFS: Number of write operations=3
Map-Reduce Framework
Map input records=108
Map output records=108
Input split bytes=87
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=17
Total committed heap usage (bytes)=84185088
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=7294
99/06/23 19:08:10 INFO mapreduce.ImportJobBase: Transferred 7.123 KB in 16.6249 seconds (438.7394 bytes/sec)
99/06/23 19:08:10 INFO mapreduce.ImportJobBase: Retrieved 108 records.
99/06/23 19:08:10 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `people` AS t LIMIT 1
99/06/23 19:08:10 INFO hive.HiveImport: Loading uploaded data into Hive
99/06/23 19:08:51 INFO hive.HiveImport: SLF4J: Class path contains multiple SLF4J bindings.
99/06/23 19:08:52 INFO hive.HiveImport: SLF4J: Found binding in [jar:file:/opt/bigdata/hive2.0/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
99/06/23 19:08:52 INFO hive.HiveImport: SLF4J: Found binding in [jar:file:/opt/bigdata/hbase-1.1.5/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
99/06/23 19:08:52 INFO hive.HiveImport: SLF4J: Found binding in [jar:file:/opt/bigdata/hadoop272/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
99/06/23 19:08:52 INFO hive.HiveImport: SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
99/06/23 19:08:54 INFO hive.HiveImport:
99/06/23 19:08:54 INFO hive.HiveImport: Logging initialized using configuration in file:/opt/bigdata/hive2.0/conf/hive-log4j2.properties
99/06/23 19:09:27 INFO hive.HiveImport: OK
99/06/23 19:09:27 INFO hive.HiveImport: Time taken: 3.378 seconds
99/06/23 19:09:32 INFO hive.HiveImport: Loading data to table default.emps
99/06/23 19:09:35 INFO hive.HiveImport: OK
99/06/23 19:09:35 INFO hive.HiveImport: Time taken: 7.827 seconds
99/06/23 19:09:36 INFO hive.HiveImport: Hive import complete.
99/06/23 19:09:36 INFO hive.HiveImport: Export directory is contains the _SUCCESS file only, removing the directory.
[root@hadoop0 bin]#
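To confirm the import landed, you can query the new table from the Hive CLI. A minimal check (the count should match the "Retrieved 108 records" line above):

hive -e 'SELECT COUNT(*) FROM default.emps;'   # expect 108
hive -e 'SELECT * FROM default.emps LIMIT 5;'  # spot-check a few rows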
Troubleshooting:
99/06/23 18:57:26 INFO mapreduce.Job: Running job: job_local1178174559_0001
99/06/23 18:57:26 INFO mapred.LocalJobRunner: OutputCommitter set in config null
99/06/23 18:57:26 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
99/06/23 18:57:26 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
99/06/23 18:57:26 INFO mapred.LocalJobRunner: Waiting for map tasks
99/06/23 18:57:26 INFO mapred.LocalJobRunner: Starting task: attempt_local1178174559_0001_m_000000_0
99/06/23 18:57:26 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
99/06/23 18:57:27 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
99/06/23 18:57:27 INFO db.DBInputFormat: Using read commited transaction isolation
99/06/23 18:57:27 INFO mapred.MapTask: Processing split: 1=1 AND 1=1
99/06/23 18:57:27 INFO mapred.LocalJobRunner: map task executor complete.
99/06/23 18:57:27 WARN mapred.LocalJobRunner: job_local1178174559_0001
java.lang.Exception: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class people not found
at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class people not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2195)
at org.apache.sqoop.mapreduce.db.DBConfiguration.getInputClass(DBConfiguration.java:403)
at org.apache.sqoop.mapreduce.db.DataDrivenDBInputFormat.createDBRecordReader(DataDrivenDBInputFormat.java:237)
at org.apache.sqoop.mapreduce.db.DBInputFormat.createRecordReader(DBInputFormat.java:263)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.<init>(MapTask.java:515)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:758)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: Class people not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2101)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2193)
... 12 more
99/06/23 18:57:27 INFO mapreduce.Job: Job job_local1178174559_0001 running in uber mode : false
99/06/23 18:57:27 INFO mapreduce.Job: map 0% reduce 0%
99/06/23 18:57:27 INFO mapreduce.Job: Job job_local1178174559_0001 failed with state FAILED due to: NA
99/06/23 18:57:27 INFO mapreduce.Job: Counters: 0
99/06/23 18:57:27 WARN mapreduce.Counters: Group FileSystemCounters is deprecated. Use org.apache.hadoop.mapreduce.FileSystemCounter instead
99/06/23 18:57:27 INFO mapreduce.ImportJobBase: Transferred 0 bytes in 8.0751 seconds (0 bytes/sec)
99/06/23 18:57:27 WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead
99/06/23 18:57:27 INFO mapreduce.ImportJobBase: Retrieved 0 records.
99/06/23 18:57:27 ERROR tool.ImportTool: Error during import: Import job failed!
Solution: copy the table's generated class file and jar file from the local filesystem's /tmp directory into Sqoop's bin directory, then run the command again; it now succeeds.
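In concrete terms, something like the following (the compile directory hash is taken from the log of the successful rerun below; yours will differ):

cp /tmp/sqoop-root/compile/ef3e50779e6c6fd3f28e16401ec54afc/people.class /opt/bigdata/sqoop-1.4.6/bin/
cp /tmp/sqoop-root/compile/ef3e50779e6c6fd3f28e16401ec54afc/people.jar /opt/bigdata/sqoop-1.4.6/bin/

Alternatively, Sqoop's code-generation option --bindir <dir> should let you choose where the compiled class and jar are written in the first place.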
[root@hadoop0 bin]# ./sqoop import --connect jdbc:mysql://192.168.1.101/test -username root --password root --table people --hive-import --hive-table emps --fields-terminated-by ','
Warning: /opt/bigdata/sqoop-1.4.6/bin/../../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /opt/bigdata/sqoop-1.4.6/bin/../../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
99/06/23 19:07:28 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
99/06/23 19:07:28 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
99/06/23 19:07:29 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
99/06/23 19:07:29 INFO tool.CodeGenTool: Beginning code generation
99/06/23 19:07:29 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `people` AS t LIMIT 1
99/06/23 19:07:29 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `people` AS t LIMIT 1
99/06/23 19:07:29 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /opt/bigdata/hadoop272
Note: /tmp/sqoop-root/compile/ef3e50779e6c6fd3f28e16401ec54afc/people.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
99/06/23 19:07:32 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/ef3e50779e6c6fd3f28e16401ec54afc/people.jar
99/06/23 19:07:32 WARN manager.MySQLManager: It looks like you are importing from mysql.
99/06/23 19:07:32 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
99/06/23 19:07:32 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
99/06/23 19:07:32 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
99/06/23 19:07:32 INFO mapreduce.ImportJobBase: Beginning import of people
99/06/23 19:07:32 INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/bigdata/hadoop272/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/bigdata/hbase-1.1.5/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
99/06/23 19:07:33 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
99/06/23 19:07:33 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
99/06/23 19:07:34 INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
99/06/23 19:07:34 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
99/06/23 19:07:34 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
99/06/23 19:07:34 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
99/06/23 19:07:35 ERROR tool.ImportTool: Encountered IOException running import job: org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://hadoop0:9000/user/root/people already exists
at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:146)
at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:266)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:139)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1308)
at org.apache.sqoop.mapreduce.ImportJobBase.doSubmitJob(ImportJobBase.java:196)
at org.apache.sqoop.mapreduce.ImportJobBase.runJob(ImportJobBase.java:169)
at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:266)
at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:673)
at org.apache.sqoop.manager.MySQLManager.importTable(MySQLManager.java:118)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:497)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
Solution: delete the corresponding output directory.
[root@hadoop0 bin]# ./hadoop fs -rmr /user/root/people
rmr: DEPRECATED: Please use 'rm -r' instead.
99/06/23 19:07:43 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
99/06/23 19:07:44 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.
Deleted /user/root/people
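As an alternative to deleting the directory by hand before each rerun, Sqoop 1.4.6 also has a --delete-target-dir import option that removes the target directory if it already exists; a sketch (verify against your Sqoop version):

./sqoop import --connect jdbc:mysql://192.168.1.101/test --username root -P \
    --table people --delete-target-dir \
    --hive-import --hive-table emps --fields-terminated-by ','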