我可以按/存储通过"CREATE TABLE AS SELECT ....."创建的表进行聚类吗?在蜂巢?
我正在尝试在Hive中创建一个表
I am trying to create a table in Hive
CREATE TABLE BUCKET_TABLE AS
SELECT a.* FROM TABLE1 a LEFT JOIN TABLE2 b ON (a.key=b.key) WHERE b.key IS NUll
CLUSTERED BY (key) INTO 1000 BUCKETS;
此语法失败-但我不确定是否有可能执行此组合语句.有任何想法吗?
This syntax is failing - but I am not sure if it is even possible to do this combined statement. Any ideas?
在这个问题上出现问题,发现没有提供答案.我进一步看了一下,并在Hive文档中找到了答案.
Came across this question and saw there was no answer provided. I looked further and found the answer in the Hive documentation.
由于对CTAS的以下限制,这永远不会起作用:
This will never work, because of the following restrictions on CTAS:
- 目标表不能是分区表.
- 目标表不能是外部表.
- 目标表不能是列表存储表.
此外 https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL
创建[临时] [外部]表[如果不存在] [数据库名称.]表名称
...
[由(col_name,col_name,...)聚类[由(col_name [ASC | DESC],...)排序]]放入num_buckets BUCKETS]
...
[AS select_statement];
CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
...
[CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
...
[AS select_statement];
聚簇要求先定义该列,然后cfg转到As select_statement,因此这时是不可能的.
Clustering requires the column to be defined and then the cfg goes to the As select_statement Therefore at this time it is not possible.
(可选)您可以更改表并添加存储桶,但这不会更改现有数据.
Optionally, you can ALTER the table and add buckets, but this does not change existing data.
CREATE TABLE BUCKET_TABLE
STORED AS ORC AS
SELECT a.* FROM TABLE1 a LEFT JOIN TABLE2 b ON (a.key=b.key) WHERE b.key IS NUll limit 0;
ALTER TABLE BUCKET_TABLE CLUSTERED BY (key) INTO 1000 BUCKETS;
ALTER TABLE BUCKET_TABLE SET TBLPROPERTIES ('transactional'='true');
INSERT INTO BUCKET_TABLE
SELECT a.* FROM TABLE1 a LEFT JOIN TABLE2 b ON (a.key=b.key) WHERE b.key IS NUll;