Loading text into an ORC file
How do I load a text file into a Hive ORC external table?
create table MyDB.TEST (
Col1 String,
Col2 String,
Col3 String,
Col4 String)
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat';
I have already created the table above as ORC, but fetching data from it fails with the error below:
Failed with exception java.io.IOException:org.apache.orc.FileFormatException: Malformed ORC file hdfs://localhost:9000/Ext/sqooporc/part-m-00000. Invalid postscript.
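The error suggests that the files under the table's location are not actually ORC files (plain text, for example), so the ORC reader cannot parse them. Converting them takes several steps; the details follow.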
- Create a Hive table that can read the plain text file. Assuming your file is comma-delimited and stored on HDFS at /user/data/file1.txt, the syntax is as follows.
create table MyDB.TEST (
Col1 String,
Col2 String,
Col3 String,
Col4 String
)
row format delimited
fields terminated by ','
-- LOCATION must be an HDFS directory (the one containing file1.txt), not a file
location '/user/data';
Now you have a schema that matches the format of your data.
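To confirm that the text table parses the file as intended, a quick check along these lines may help. This is only a sketch, assuming the table was created in MyDB as shown above:

```sql
-- Peek at a few rows to confirm the comma delimiter splits the columns correctly.
SELECT Col1, Col2, Col3, Col4
FROM MyDB.TEST
LIMIT 5;
```

If the whole line lands in Col1 and the remaining columns are NULL, the delimiter in the table definition probably does not match the file.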
- Create another table with the ORC schema. This is the ORC table you were trying to create earlier; here is a simpler syntax for it.
create table MyDB.TEST_ORC (
Col1 String,
Col2 String,
Col3 String,
Col4 String)
STORED AS ORC;
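Since the question asks about an external table, the same definition can presumably be made external. A minimal sketch follows; the table name TEST_ORC_EXT and the HDFS directory /user/orc/test_orc are hypothetical and do not come from the original post:

```sql
-- Hypothetical external variant; the table name and LOCATION are illustrative only.
CREATE EXTERNAL TABLE MyDB.TEST_ORC_EXT (
  Col1 STRING,
  Col2 STRING,
  Col3 STRING,
  Col4 STRING)
STORED AS ORC
-- LOCATION is a directory that will hold the ORC files.
LOCATION '/user/orc/test_orc';
```

With an external table, dropping the table removes only the metadata and leaves the ORC files in place.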
- Your TEST_ORC table is empty at this point. You can populate it with the data from the TEST table using the following command.
INSERT OVERWRITE TABLE MyDB.TEST_ORC SELECT * FROM MyDB.TEST;
The statement above selects all records from the TEST table and writes them to the TEST_ORC table. Because TEST_ORC is an ORC table, the data is converted to ORC format on the fly as it is written.
You can even check the storage location of the TEST_ORC table to see the ORC files.
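One way to find that location is sketched below, assuming you are working in the Hive CLI. The `<location>` value is a placeholder for whatever path the first command reports, not a real path:

```sql
-- Shows table metadata, including the Location, InputFormat and OutputFormat rows.
DESCRIBE FORMATTED MyDB.TEST_ORC;

-- List the files under the reported location from within the Hive CLI.
-- Replace <location> with the Location value printed by the command above.
dfs -ls <location>;
```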
Now your data is in ORC format, and the TEST_ORC table has the schema required to parse it. You may drop the TEST table now if you no longer need it.
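Dropping the staging table might look like the sketch below. One caveat worth checking first: TEST was created without the EXTERNAL keyword, so it is a managed table, and dropping a managed table normally removes the files under its location as well. Keep a copy of the text file if you still need it.

```sql
-- Removes the text staging table; for a managed table this also removes its data files.
DROP TABLE IF EXISTS MyDB.TEST;
```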
Hope that helps!