How to transpose columns and rows in Pig

Question:

I'm not sure whether this can be done with built-in Pig operators or whether I'll need to write a UDF. Essentially, I have a table and I simply want to transpose its data.

Simply put, given:

(1, 2, 3, 4, 5)
(6, 7, 8, 9, 10)
(11, 12, 13, 14, 15)
 ... 300 plus more tuples

I would get:

(1,6,11,...) -> goes on for a few hundred more
(2,7,12,...)
(3,8,13,...)
(4,9,14,...)
(5,10,15,...)

Any suggestions on how I could accomplish this?

This is not possible with Pig, nor would it make much sense for it to be. Remember that a relation is a bag of tuples, and by definition a bag does not guarantee any particular order for its tuples. You might start with

(1, 2, 3, 4, 5)
(6, 7, 8, 9, 10)
(11, 12, 13, 14, 15)

but from Pig's perspective there is no difference between this and

(11, 12, 13, 14, 15)
(1, 2, 3, 4, 5)
(6, 7, 8, 9, 10)

which means that "transpose" is ill-defined. Look at it this way: if you transpose twice, you should get the original data structure back, but because the tuples can be reordered along the way, that is not guaranteed to happen.
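To make the ordering problem concrete, here is a small sketch in plain Python (not Pig). It assumes a hypothetical helper `transpose` that treats list order as row order, and uses `collections.Counter` to stand in for an unordered bag. The two inputs hold the same tuples, so they are the same bag, yet their transposes differ.

```python
from collections import Counter


def transpose(rows):
    """Transpose equal-length tuples, treating list order as row order.

    Hypothetical helper for illustration; it is not part of Pig.
    """
    return [tuple(col) for col in zip(*rows)]


# The same three tuples in two different orders.
order_a = [(1, 2, 3, 4, 5), (6, 7, 8, 9, 10), (11, 12, 13, 14, 15)]
order_b = [(11, 12, 13, 14, 15), (1, 2, 3, 4, 5), (6, 7, 8, 9, 10)]

# Compared as bags (unordered multisets), the two inputs are equal.
same_bag = Counter(order_a) == Counter(order_b)

# Their transposes are not equal, even when compared as bags.
transposed_a = transpose(order_a)
transposed_b = transpose(order_b)
same_transpose = Counter(transposed_a) == Counter(transposed_b)

print(same_bag)          # True
print(same_transpose)    # False
print(transposed_a[0])   # (1, 6, 11)
print(transposed_b[0])   # (11, 1, 6)
```

Since a bag gives no way to tell `order_a` from `order_b`, there is no single correct result for a transpose to return.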

In the end, if you really must do matrix operations, you would be better off using a tool that respects ordering in both rows and columns.
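As a rough illustration of what "respecting row order" could look like, the Python sketch below assumes each row carries an explicit row index. That index is an assumption made here and does not appear in the data shown in the question. With it, the result no longer depends on the order in which rows arrive. The function name `transpose_indexed` is hypothetical.

```python
from collections import defaultdict


def transpose_indexed(indexed_rows):
    """Transpose rows given as (row_index, values) pairs, in any order.

    Assumes row indices are unique and all rows have the same length.
    """
    columns = defaultdict(list)
    for row_index, values in indexed_rows:
        for col_index, value in enumerate(values):
            # Remember which row each cell came from.
            columns[col_index].append((row_index, value))
    # Within each column, restore row order using the explicit index.
    return [
        tuple(value for _, value in sorted(columns[col_index], key=lambda cell: cell[0]))
        for col_index in sorted(columns)
    ]


# Rows arrive shuffled, but each one carries its own index.
shuffled = [
    (2, (11, 12, 13, 14, 15)),
    (0, (1, 2, 3, 4, 5)),
    (1, (6, 7, 8, 9, 10)),
]

result = transpose_indexed(shuffled)
print(result[0])  # (1, 6, 11)
print(result[4])  # (5, 10, 15)
```

The idea is only that order has to be stored in the data itself; whether it fits depends on where such an index would come from in the real dataset.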

That said, what are you trying to accomplish?