How to transpose columns and rows in Pig

Question:

I'm not sure whether this can be done with built-in Pig functionality or whether I'll need to write a UDF. Essentially, I have a table whose data I simply want to transpose.

Simply put, given:

(1, 2, 3, 4, 5)
(6, 7, 8, 9, 10)
(11, 12, 13, 14, 15)
 ... 300 plus more tuples

I would end up with:

(1,6,11,...) -> goes on for a few hundred more
(2,7,12,...)
(3,8,13,...)
(4,9,14,...)
(5,10,15,...)

Any suggestions on how I could accomplish this?

This is not possible with Pig, nor does it make much sense for it to be. Remember that a relation is a bag of tuples, and by definition a bag is not guaranteed to keep its tuples in any specific order. You might start with

(1, 2, 3, 4, 5)
(6, 7, 8, 9, 10)
(11, 12, 13, 14, 15)

but from Pig's perspective there is no difference between this and

(11, 12, 13, 14, 15)
(1, 2, 3, 4, 5)
(6, 7, 8, 9, 10)

which means that "transpose" is ill-defined. Look at it this way: if you transpose twice, you should get the original data structure back, but because the tuples can be reordered along the way, this is not guaranteed to happen.
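To make the point concrete, here is a small sketch in plain Python (not Pig). The helper `transpose` is just an illustrative name; it treats list order as row order, which is exactly the guarantee a Pig bag does not give you. The two inputs hold the same tuples in different orders, so as bags they are identical, yet their transposes are not.

```python
def transpose(rows):
    """Transpose equal-length tuples, treating list order as row order."""
    return [tuple(col) for col in zip(*rows)]

# The same three tuples, in two different orders.
a = [(1, 2, 3, 4, 5), (6, 7, 8, 9, 10), (11, 12, 13, 14, 15)]
b = [(11, 12, 13, 14, 15), (1, 2, 3, 4, 5), (6, 7, 8, 9, 10)]

# Compared as unordered collections (bags), a and b are the same relation.
same_bag = sorted(a) == sorted(b)

# Their transposes differ even when compared as unordered collections,
# because row order decides the order of values inside each output tuple.
same_transpose = sorted(transpose(a)) == sorted(transpose(b))

print(same_bag, same_transpose)
print(transpose(a)[0], transpose(b)[0])
```

The first output row is `(1, 6, 11)` for one ordering and `(11, 1, 6)` for the other, so the result depends on something the relation does not define.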

In the end, if you really must do matrix operations, you would be better off using a tool that respects ordering in both rows and columns.
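As a rough illustration of what "respecting row order" could look like outside Pig, the Python sketch below assumes each row carries an explicit row index. That index is an assumption made for this example; the data shown in the question has no such field. With it, the transpose is well-defined no matter what order the rows arrive in.

```python
def transpose_indexed(indexed_rows):
    """Transpose (row_index, values) pairs, ordering rows by row_index first.

    The row_index field is hypothetical: it stands in for whatever explicit
    ordering key the data would need to carry.
    """
    ordered = [values for _, values in sorted(indexed_rows, key=lambda pair: pair[0])]
    return [tuple(col) for col in zip(*ordered)]

# Rows arrive in arbitrary order, but each one says where it belongs.
shuffled = [
    (2, (11, 12, 13, 14, 15)),
    (0, (1, 2, 3, 4, 5)),
    (1, (6, 7, 8, 9, 10)),
]

result = transpose_indexed(shuffled)
for row in result:
    print(row)
```

Because the order is carried in the data rather than implied by position, any arrival order of the same rows yields the same five output tuples.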

That said, what are you trying to accomplish?