Hive external table pointing to a CSV file with embedded double quotes
Problem description:
I am trying to create an external Hive table that points to a CSV file. One column in the file (col2) can contain double quotes and commas as part of its value.
Data in each column:
Col1 : 150
Col2 : BATWING, ABC "D " TEST DATA
Col3 : 300
Row in CSV format:
150,"BATWING, ABC ""D "" TEST DATA",300
Create table DDL:
CREATE EXTERNAL TABLE test (
col1 INT,
col2 STRING,
col3 INT)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
ESCAPED BY '"'
LOCATION 's3://test-folder/test-file.csv'
When I query the table, I see NULL values in col3.
What am I missing in the table definition? Any help is appreciated.
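Answer
Use OpenCSVSerde. ROW FORMAT DELIMITED has no notion of quoted fields, so the comma inside col2 is treated as a field separator, and the text that ends up in col3 cannot be read as an INT. Here is an example.
Create the table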
CREATE TABLE bala (col1 int, col2 string, col3 int)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES(
"separatorChar" = ",", "escapeChar"='\"'
);
Load the data
hive> LOAD DATA INPATH '/../test.csv' INTO TABLE bala;
Loading data to table bala
Table testing.bala stats: [numFiles=1, totalSize=40]
OK
Time taken: 0.514 seconds
Check that it loaded
hive> select * from bala;
OK
150 BATWING, ABC "D " TEST DATA 300
Time taken: 0.288 seconds, Fetched: 1 row(s)
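Applied to the external table from the question, the same SerDe settings might look like the sketch below. It is not tested against the asker's bucket and rests on two assumptions: that LOCATION should name the directory holding test-file.csv rather than the file itself (Hive reads every file under a table's location), and that a view is an acceptable place to restore the integer types. OpenCSVSerde returns every column as STRING regardless of the declared type, which is why the columns are declared as STRING here. The view name test_typed is made up for illustration.

```sql
-- Sketch only: the asker's external table re-declared with OpenCSVSerde,
-- reusing the SerDe properties from the answer above.
-- Assumption: 's3://test-folder/' is the directory that contains test-file.csv.
CREATE EXTERNAL TABLE test (
  col1 STRING,
  col2 STRING,
  col3 STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  "separatorChar" = ",", "escapeChar" = '\"'
)
STORED AS TEXTFILE
LOCATION 's3://test-folder/';

-- OpenCSVSerde exposes all columns as STRING, so cast where numbers are needed.
-- test_typed is a hypothetical view name.
CREATE VIEW test_typed AS
SELECT CAST(col1 AS INT) AS col1,
       col2,
       CAST(col3 AS INT) AS col3
FROM test;
```

Querying test_typed instead of test keeps the casts in one place; any other files placed under the same location would also be read as part of the table.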