hive创造外部表过程
hive创建外部表过程
hadoop@hadoopSlave2:/opt/hadoop/testData$ touch hiveOuter.log hadoop@hadoopSlave2:/opt/hadoop/testData$ vim hiveOuter.log 20140413 jim0 2014-06-06 20140414 jim1 2014-06-06 ... hadoop@hadoopSlave2:/opt/hadoop/testData$ scp -rv hiveOuter.log hadoopMaster:/opt/hadoop/hive/ hadoop@hadoopSlave2:/opt/hadoop/testData$ ssh hadoopMaster hadoop@hadoopMaster:~$ cd /opt/hadoop/hive/ hadoop@hadoopMaster:/opt/hadoop/hive$ hdfs dfs -moveFromLocal /opt/hadoop/hive/hiveOuter.log /user/hive/ hadoop@hadoopMaster:/opt/hadoop/hive$ hdfs dfs -ls -R / -rw-r--r-- 2 hadoop supergroup 2591 2014-06-06 10:00 /user/hive/hiveOuter.log drwxr-xr-x - hadoop supergroup 0 2014-06-06 09:51 /user/hive/warehouse drwxr-xr-x - hadoop supergroup 0 2014-06-06 09:51 /user/hive/warehouse/test1 ... hive> create EXTERNAL table IF NOT EXISTS userOuter (id int,name string, ed string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' location 'hdfs://hadoopMaster:9000/user/hive/warehouse/test1'; OK Time taken: 0.309 seconds hive> show tables; OK userouter Time taken: 0.045 seconds, Fetched: 12 row(s) hive> LOAD DATA INPATH 'hdfs://hadoopMaster:9000/user/hive/warehouse/hiveOuter.log' INTO TABLE userOuter partition(ptDate='20140606'); Loading data to table default.hiveOuter partition (ptdate=20140606) Partition default.testouter{ptdate=20140606} stats: [numFiles=3, numRows=0, totalSize=4312, rawDataSize=0] OK Time taken: 1.153 seconds hive> SELECT * FROM userouter; OK 20140413 jim0 2014-06-06 20140414 jim1 2014-06-06 20140415 jim2 2014-06-06 .... 此时hive将该表的数据文件信息保存到metadata数据库中。 hadoop@hadoopMaster:~$ mysql -uroot -proot Welcome to the MySQL monitor. Commands end with ; or \g. Your MySQL connection id is 47 Server version: 5.5.37-0ubuntu0.12.04.1 (Ubuntu) Copyright (c) 2000, 2014, Oracle and/or its affiliates. All rights reserved. Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners. Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. mysql> mysql> show databases; +--------------------+ | Database | +--------------------+ | information_schema | | hiveDB | | mysql | | performance_schema | | schools | +--------------------+ 5 rows in set (0.00 sec) mysql> use hiveDB; Reading table information for completion of table and column names You can turn off this feature to get a quicker startup with -A Database changed 可以看到该表的类型为EXTERNAL_TABLE。 mysql> SELECT * FROM TBLS WHERE TBL_NAME = 'userouter'; +--------+-------------+-------+------------------+--------+-----------+-------+-----------+----------------+--------------------+--------------------+ | TBL_ID | CREATE_TIME | DB_ID | LAST_ACCESS_TIME | OWNER | RETENTION | SD_ID | TBL_NAME | TBL_TYPE | VIEW_EXPANDED_TEXT | VIEW_ORIGINAL_TEXT | +--------+-------------+-------+------------------+--------+-----------+-------+-----------+----------------+--------------------+--------------------+ | 25 | 1402020143 | 1 | 0 | hadoop | 0 | 26 | userouter | EXTERNAL_TABLE | NULL | NULL | +--------+-------------+-------+------------------+--------+-----------+-------+-----------+----------------+--------------------+--------------------+ 1 row in set (0.00 sec) 在表SDS中记录了表userouter的数据文件路径为hdfs://hadoopMaster:9000/user/hive/warehouse/test1。 mysql> SELECT * FROM SDS WHERE SD_ID=25; +-------+-------+------------------------------------------+---------------+---------------------------+----------------------------------------------------+-------------+------------------------------------------------------------+----------+ | SD_ID | CD_ID | INPUT_FORMAT | IS_COMPRESSED | IS_STOREDASSUBDIRECTORIES | LOCATION | NUM_BUCKETS | OUTPUT_FORMAT | SERDE_ID | +-------+-------+------------------------------------------+---------------+---------------------------+----------------------------------------------------+-------------+------------------------------------------------------------+----------+ | 25 | 24 | org.apache.hadoop.mapred.TextInputFormat | | | hdfs://hadoopMaster:9000/user/hive/warehouse/test1 | -1 | org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat | 25 | +-------+-------+------------------------------------------+---------------+---------------------------+----------------------------------------------------+-------------+------------------------------------------------------------+----------+ 1 row in set (0.00 sec)