在Google大查询中按未嵌套的值分组时获取不同值的总和
我正在查询有很多行的google大查询表,但是我感兴趣的行看起来像这样:
I am querying google big query table which has many rows but the ones I am interested in looks like this:
date fullVisitorId hits.product.productSKU hits.product.v2ProductName hits.transaction.transactionId
20210427 63546815 MM52AF panda 149816182
20210427 65198162 KGSA5A giraffe 321498182
我试图通过计算不同的 hits.transaction.transactionId
来计算总交易额.
I am trying to count the total transactions by counting distinct hits.transaction.transactionId
.
with t1 as
(
SELECT
DATE_TRUNC(PARSE_DATE("%Y%m%d", date), MONTH) as month,
fullVisitorId,
product.productSKU as sku,
product.v2ProductName as v2,
case when hits.ecommerceaction.action_type = '2' then 1 else 0 end as pdp_visitor,
count(case when hits.ecommerceaction.action_type = '2' then fullvisitorid else null end) AS views_pdp,
count(case when hits.ecommerceaction.action_type = '3' then fullvisitorid else null end) AS add_cart,
count(case when hits.ecommerceaction.action_type = '6' then hits.transaction.transactionid else null end) AS conversions,
count(distinct(hits.transaction.transactionId)) as transaction_id_cnt,
FROM `table` AS nr,
UNNEST(hits) hits,
UNNEST(product) product
GROUP BY 1,2,3,4,5
)
select
month,
product.productSKU as sku,
product.v2ProductName as v2,
sum(views_pdp) as pdp
,sum(add_cart) as add_cart
,sum(conversions) as conversions
,sum(transaction_id_cnt) as transactions
from t1
group by 1
order by 1 desc;
哪个返回:
month sku v2 pdp add_cart conversions transactions
2021-04-01 AHBS 615 10146410 365569 46885 46640
2021-03-01 HERD 154 10074095 399483 58162 57811
但是 transactions
是不正确的,我使用以下命令获得了正确的输出:
But transactions
is not correct, I get the correct output using this:
with t1 as
(
SELECT
DATE_TRUNC(PARSE_DATE("%Y%m%d", date), MONTH) as month,
fullVisitorId,
ARRAY_AGG(DISTINCT product.productSKU IGNORE NULLS) AS productSKU_list, -- changed this
ARRAY_AGG(DISTINCT product.v2ProductName IGNORE NULLS) AS productName_list, -- changed this
case when hits.ecommerceaction.action_type = '2' then 1 else 0 end as pdp_visitor,
0 AS views_impressions,
count(case when hits.ecommerceaction.action_type = '2' then fullvisitorid else null end) AS views_pdp,
count(case when hits.ecommerceaction.action_type = '3' then fullvisitorid else null end) AS add_cart,
0 AS add_shortlist,
count(case when hits.ecommerceaction.action_type = '5' then fullvisitorid else null end) AS checkouts,
count(case when hits.ecommerceaction.action_type = '6' then hits.transaction.transactionid else null end) AS conversions,
count(distinct(hits.transaction.transactionId)) as transaction_id_cnt,
FROM `table` AS nr,
UNNEST(hits) hits,
UNNEST(product) product
GROUP BY 1,2,5
)
select
month,
sum(views_pdp) as pdp
,sum(add_cart) as add_cart
,sum(conversions) as conversions
,sum(transaction_id_cnt) as transactions
from t1
group by 1
order by 1 desc;
哪个会返回正确的交易
month pdp add_cart conversions transactions
2021-04-01 9978511 396333 46885 30917
2021-03-01 15101718 568904 58162 23017
但是使用这个:
...
ARRAY_AGG(DISTINCT product.productSKU IGNORE NULLS) AS productSKU_list,
ARRAY_AGG(DISTINCT product.v2ProductName IGNORE NULLS) AS productName_list,
...
不允许我在第二条选择语句中分组或选择 productSKU_list
和 productName_list
.
Does not allow me to group or select productSKU_list
and productName_list
in my second select statement.
我相信这是因为,如果一个订单是由购物篮中的多个商品组成的,则在Google大查询中有多行具有相同的 hits.transaction.transactionId
我尝试通过以下方式确认这一点:
I believe this is because if one order is made with multiple items in the basket there are multiple lines in google big query with the same hits.transaction.transactionId
I tried confirming this with:
select distinct(hits.transaction.transactionId), count(distinct hits.transaction.transactionId) as total
FROM `table` AS nr,
UNNEST(hits) hits,
UNNEST(product) product
WHERE _TABLE_SUFFIX between '200101' AND '210428'
GROUP BY 1
order by 2 desc
但是我得到了
transactionId total
ABSAD54 1
515ABDG 1
因此,我现在迷路了,因为不确定使用第二个脚本还是从第一个查询中注释掉这部分内容时,为什么会得到正确的答案.
So at this point, I am lost, as I am unsure why I get the correct answer if I use the second script or when I comment out this part from the first query.
--product.productSKU,
--product.v2ProductName,
有关Google大查询如何工作的任何提示都被接受.
Any tips on how google big query works is accepted.
我的目标是在第二个脚本中获得正确的 transactions
输出,但仍能够分组并具有 product.productSKU
和 product.v2ProductName
.
My goal is to have the correct output of transactions
which is achieved in the second script but still be able to group and have values of product.productSKU
and product.v2ProductName
.
在第二个查询中,您需要再次对其进行聚合:
in your second query , you need to aggregate them again :
select
month,
sum(views_pdp) as pdp
,sum(add_cart) as add_cart
,sum(conversions) as conversions
,sum(transaction_id_cnt) as transactions
,ARRAY_AGG(productSKU_list)
,ARRAY_AGG(productName_list)
from t1
group by month
order by month desc;