我可以按/存储通过"CREATE TABLE AS SELECT ....."创建的表进行聚类吗?在蜂巢?

问题描述：

我正在尝试在Hive中创建一个表

I am trying to create a table in Hive

CREATE TABLE BUCKET_TABLE AS 
SELECT a.* FROM TABLE1 a LEFT JOIN TABLE2 b ON (a.key=b.key) WHERE b.key IS NUll
CLUSTERED BY (key) INTO 1000 BUCKETS;

此语法失败-但我不确定是否有可能执行此组合语句.有任何想法吗?

This syntax is failing - but I am not sure if it is even possible to do this combined statement. Any ideas?

答

在这个问题上出现问题，发现没有提供答案.我进一步看了一下，并在Hive文档中找到了答案.

Came across this question and saw there was no answer provided. I looked further and found the answer in the Hive documentation.

由于对CTAS的以下限制，这永远不会起作用:

This will never work, because of the following restrictions on CTAS:

目标表不能是分区表.
目标表不能是外部表.
目标表不能是列表存储表.

来源: https://cwiki.apache.org/confluence/display/Hive/LanguageManual + DDL#LanguageManualDDL-CreateTableAsSelect％28CTAS

此外 https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL

创建[临时] [外部]表[如果不存在] [数据库名称.]表名称
...
[由(col_name，col_name，...)聚类[由(col_name [ASC | DESC]，...)排序]]放入num_buckets BUCKETS]
...
[AS select_statement];

CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
...
[CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
...
[AS select_statement];

聚簇要求先定义该列，然后cfg转到As select_statement，因此这时是不可能的.

Clustering requires the column to be defined and then the cfg goes to the As select_statement Therefore at this time it is not possible.

(可选)您可以更改表并添加存储桶，但这不会更改现有数据.

Optionally, you can ALTER the table and add buckets, but this does not change existing data.

CREATE TABLE BUCKET_TABLE 
STORED AS ORC AS 
SELECT a.* FROM TABLE1 a LEFT JOIN TABLE2 b ON (a.key=b.key) WHERE b.key IS NUll limit 0;
ALTER TABLE BUCKET_TABLE CLUSTERED BY (key) INTO 1000 BUCKETS;
ALTER TABLE BUCKET_TABLE SET TBLPROPERTIES ('transactional'='true');
INSERT INTO BUCKET_TABLE 
SELECT a.* FROM TABLE1 a LEFT JOIN TABLE2 b ON (a.key=b.key) WHERE b.key IS NUll;

我可以按/存储通过"CREATE TABLE AS SELECT ....."创建的表进行聚类吗?在蜂巢?

相关推荐