Pig join fails with java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Integer
Question:
I have the following in data1:
1 3
1 2
5 1
and in data2:
2 3
2 4
I then tried to read them into Pig:
d1 = LOAD 'data1';
d2 = foreach d1 generate flatten(STRSPLIT($0, ' +')) as (f1:int,f2:int);
d3 = LOAD 'data2' ;
d4 = foreach d3 generate flatten(STRSPLIT($0, ' +')) as (f1:int,f2:int);
data = join d2 by f1, d4 by f2;
Then I got:
2013-08-04 00:48:26,032 [Thread-21] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local_0005
java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Integer
at org.apache.pig.backend.hadoop.HDataType.getWritableComparableTypes(HDataType.java:85)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Map.collect(PigGenericMapReduce.java:112)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:285)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:278)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
Could anybody help me? Thank you.
Answer:
First, I'd define a simple schema for the inputs. Based on your example, I assume that your inputs are text files.
You get the ClassCastException because applying the schema (f1:int, f2:int) in the AS clause unfortunately does not perform any conversion: the fields produced by STRSPLIT are still strings at runtime, which is what the exception raised in the map phase of the join reports. You need to explicitly cast the output of STRSPLIT to (tuple(int,int)) so that flatten can generate f1:int and f2:int from it, i.e.:
d1 = LOAD 'data1' as (line:chararray);
d2 = foreach d1 generate flatten((tuple(int,int))(STRSPLIT($0, ' +')))
as (f1:int,f2:int);
d3 = LOAD 'data2' as (line:chararray);
d4 = foreach d3 generate flatten((tuple(int,int))(STRSPLIT($0, ' +')))
as (f1:int,f2:int);
data = join d2 by f1, d4 by f2;
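As an alternative that is not part of the approach above, the loader itself can split and type the fields. This is only a sketch, and it assumes that the two columns are always separated by exactly one space (the ' +' regex above also tolerates runs of spaces, which PigStorage does not):

```pig
-- Assumption: fields are separated by exactly one space character.
-- PigStorage splits each line on the delimiter, and the AS clause
-- makes Pig cast the loaded fields to int.
d2 = LOAD 'data1' USING PigStorage(' ') AS (f1:int, f2:int);
d4 = LOAD 'data2' USING PigStorage(' ') AS (f1:int, f2:int);

-- Same join as above: d2.f1 against d4.f2.
data = JOIN d2 BY f1, d4 BY f2;
```

If the spacing in the files is irregular, the STRSPLIT version with the explicit tuple cast is the safer choice.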