Hive external table pointing to a CSV file with embedded double quotes

Problem description:

I am trying to create an external Hive table pointing to a CSV file. The file has a column (col2) whose values can contain double quotes and commas.

Data in each column:

Col1 : 150
Col2 : BATWING, ABC "D " TEST DATA
Col3 : 300

The row in CSV format:

150,"BATWING, ABC ""D "" TEST DATA",300

Create table DDL:

CREATE EXTERNAL TABLE test (
   col1 INT, 
   col2 STRING, 
   col3 INT) 
ROW FORMAT DELIMITED 
FIELDS TERMINATED BY ',' 
ESCAPED BY '"' 
LOCATION 's3://test-folder/test-file.csv'

When I query the table, I see NULL values in col3.

What am I missing in the CREATE TABLE statement? Any help is appreciated.

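ROW FORMAT DELIMITED does not understand quoted fields, so the comma inside col2 is treated as a field separator. Use OpenCSVSerde instead. Here is an example: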

Create the table:

CREATE TABLE bala (col1 int, col2 string, col3 int)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES(
  "separatorChar" = ",", "escapeChar"='\"'
);

Load the data:

hive> LOAD DATA INPATH '/../test.csv' INTO TABLE bala;
Loading data to table bala
Table testing.bala stats: [numFiles=1, totalSize=40]
OK
Time taken: 0.514 seconds

Check that it loaded:

hive> select * from bala;
OK
150 BATWING, ABC "D " TEST DATA 300
Time taken: 0.288 seconds, Fetched: 1 row(s)
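
The example above uses a managed table, while the question asks for an external one. A minimal sketch of the external variant follows, under a few assumptions: the LOCATION is changed to the directory 's3://test-folder/' because Hive expects LOCATION to name a directory rather than a single file, and the SerDe properties are simply OpenCSVSerde's documented defaults written out explicitly (a doubled quote inside a quoted field is then read as a literal quote). OpenCSVSerde also treats every column as STRING regardless of the declared type, so the columns are declared as STRING here and cast back in the query.

```sql
-- Sketch only: table and column names are taken from the question;
-- the directory-style LOCATION is an assumption, not the asker's actual path.
CREATE EXTERNAL TABLE test (
  col1 STRING,
  col2 STRING,
  col3 STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  "separatorChar" = ",",   -- field delimiter (the SerDe default)
  "quoteChar"     = "\"",  -- fields may be wrapped in double quotes (default)
  "escapeChar"    = "\\"   -- backslash escape (default)
)
LOCATION 's3://test-folder/';

-- OpenCSVSerde returns every column as STRING, so cast at query time.
SELECT CAST(col1 AS INT) AS col1,
       col2,
       CAST(col3 AS INT) AS col3
FROM test;
```

If typed columns are needed permanently, one option is to wrap the SELECT in a view so callers do not have to repeat the casts.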