在Google大查询中按未嵌套的值分组时获取不同值的总和

在Google大查询中按未嵌套的值分组时获取不同值的总和

问题描述:

我正在查询有很多行的google大查询表,但是我感兴趣的行看起来像这样:

I am querying google big query table which has many rows but the ones I am interested in looks like this:

date             fullVisitorId       hits.product.productSKU     hits.product.v2ProductName     hits.transaction.transactionId

20210427        63546815            MM52AF                      panda                           149816182
20210427        65198162            KGSA5A                      giraffe                         321498182

我试图通过计算不同的 hits.transaction.transactionId 来计算总交易额.

I am trying to count the total transactions by counting distinct hits.transaction.transactionId.

with t1 as 
(
SELECT
    DATE_TRUNC(PARSE_DATE("%Y%m%d", date), MONTH) as month,
    fullVisitorId,
    product.productSKU as sku,
    product.v2ProductName as v2,
    case when hits.ecommerceaction.action_type = '2' then 1 else 0 end as pdp_visitor,
    count(case when hits.ecommerceaction.action_type = '2' then fullvisitorid else null end) AS views_pdp,
    count(case when hits.ecommerceaction.action_type = '3' then fullvisitorid else null end) AS add_cart,
    count(case when hits.ecommerceaction.action_type = '6' then hits.transaction.transactionid else null end) AS conversions,
    count(distinct(hits.transaction.transactionId)) as transaction_id_cnt,
FROM `table` AS nr, 
    UNNEST(hits) hits,
    UNNEST(product) product
GROUP BY 1,2,3,4,5
)
select 
    month,
    product.productSKU as sku,
    product.v2ProductName as v2,
    sum(views_pdp) as pdp 
    ,sum(add_cart) as add_cart
    ,sum(conversions) as conversions
    ,sum(transaction_id_cnt) as transactions
from t1
group by 1
order by 1 desc;

哪个返回:

month               sku          v2      pdp            add_cart    conversions     transactions    
2021-04-01          AHBS         615     10146410       365569      46885           46640
2021-03-01          HERD         154     10074095       399483      58162           57811

但是 transactions 是不正确的,我使用以下命令获得了正确的输出:

But transactions is not correct, I get the correct output using this:


with t1 as 
(
SELECT
    DATE_TRUNC(PARSE_DATE("%Y%m%d", date), MONTH) as month,
    fullVisitorId,
    ARRAY_AGG(DISTINCT product.productSKU IGNORE NULLS) AS productSKU_list, -- changed this
    ARRAY_AGG(DISTINCT product.v2ProductName IGNORE NULLS) AS productName_list, -- changed this
    case when hits.ecommerceaction.action_type = '2' then 1 else 0 end as pdp_visitor,
    0 AS views_impressions,
    count(case when hits.ecommerceaction.action_type = '2' then fullvisitorid else null end) AS views_pdp,
    count(case when hits.ecommerceaction.action_type = '3' then fullvisitorid else null end) AS add_cart,
    0 AS add_shortlist,
    count(case when hits.ecommerceaction.action_type = '5' then fullvisitorid else null end) AS checkouts,
    count(case when hits.ecommerceaction.action_type = '6' then hits.transaction.transactionid else null end) AS conversions,
    count(distinct(hits.transaction.transactionId)) as transaction_id_cnt,
FROM `table` AS nr, 
    UNNEST(hits) hits,
    UNNEST(product) product
GROUP BY 1,2,5
)
select 
    month,
    sum(views_pdp) as pdp 
    ,sum(add_cart) as add_cart
    ,sum(conversions) as conversions
    ,sum(transaction_id_cnt) as transactions
from t1
group by 1
order by 1 desc;

哪个会返回正确的交易

month       pdp         add_cart     conversions      transactions  
2021-04-01  9978511     396333       46885            30917 
2021-03-01  15101718    568904       58162            23017

但是使用这个:

...
ARRAY_AGG(DISTINCT product.productSKU IGNORE NULLS) AS productSKU_list, 
ARRAY_AGG(DISTINCT product.v2ProductName IGNORE NULLS) AS productName_list,
...

不允许我在第二条选择语句中分组或选择 productSKU_list productName_list .

Does not allow me to group or select productSKU_list and productName_list in my second select statement.

我相信这是因为,如果一个订单是由购物篮中的多个商品组成的,则在Google大查询中有多行具有相同的 hits.transaction.transactionId 我尝试通过以下方式确认这一点:

I believe this is because if one order is made with multiple items in the basket there are multiple lines in google big query with the same hits.transaction.transactionId I tried confirming this with:

select distinct(hits.transaction.transactionId), count(distinct hits.transaction.transactionId) as total
FROM `table` AS nr, 
    UNNEST(hits) hits,
    UNNEST(product) product
WHERE _TABLE_SUFFIX between '200101' AND '210428'  
GROUP BY 1
order by 2 desc

但是我得到了

transactionId   total   
ABSAD54         1   
515ABDG         1

因此,我现在迷路了,因为不确定使用第二个脚本还是从第一个查询中注释掉这部分内容时,为什么会得到正确的答案.

So at this point, I am lost, as I am unsure why I get the correct answer if I use the second script or when I comment out this part from the first query.

 --product.productSKU,
 --product.v2ProductName,

有关Google大查询如何工作的任何提示都被接受.

Any tips on how google big query works is accepted.

我的目标是在第二个脚本中获得正确的 transactions 输出,但仍能够分组并具有 product.productSKU product.v2ProductName .

My goal is to have the correct output of transactions which is achieved in the second script but still be able to group and have values of product.productSKU and product.v2ProductName.

在第二个查询中,您需要再次对其进行聚合:

in your second query , you need to aggregate them again :

select 
    month,
    sum(views_pdp) as pdp 
    ,sum(add_cart) as add_cart
    ,sum(conversions) as conversions
    ,sum(transaction_id_cnt) as transactions
    ,ARRAY_AGG(productSKU_list)
    ,ARRAY_AGG(productName_list)
from t1
group by month
order by month desc;