com.google.cloud.dataflow.sdk.coders.CoderException: cannot encode a null String
I got the following error in Google Cloud Data Flow:
java.lang.RuntimeException: com.google.cloud.dataflow.sdk.util.UserCodeException: java.lang.RuntimeException: com.google.cloud.dataflow.sdk.util.UserCodeException: java.lang.RuntimeException: com.google.cloud.dataflow.sdk.util.UserCodeException: java.lang.RuntimeException: java.lang.RuntimeException: com.google.cloud.dataflow.sdk.util.UserCodeException: java.lang.RuntimeException: java.lang.RuntimeException: com.google.cloud.dataflow.sdk.coders.CoderException: cannot encode a null String
    at com.google.cloud.dataflow.sdk.runners.worker.SimpleParDoFn$1.output(SimpleParDoFn.java:162)
    at com.google.cloud.dataflow.sdk.util.DoFnRunnerBase$DoFnContext.outputWindowedValue(DoFnRunnerBase.java:287)
    at com.google.cloud.dataflow.sdk.util.DoFnRunnerBase$DoFnProcessContext.output(DoFnRunnerBase.java:449)
    at reports.transforms.JsonToObject.processElement(JsonToObject.java:35)
Caused by: com.google.cloud.dataflow.sdk.util.UserCodeException: java.lang.RuntimeException: com.google.cloud.dataflow.sdk.util.UserCodeException: java.lang.RuntimeException: com.google.cloud.dataflow.sdk.util.UserCodeException: java.lang.RuntimeException: java.lang.RuntimeException: com.google.cloud.dataflow.sdk.util.UserCodeException: java.lang.RuntimeException: java.lang.RuntimeException: com.google.cloud.dataflow.sdk.coders.CoderException: cannot encode a null String
    at com.google.cloud.dataflow.sdk.util.UserCodeException.wrap(UserCodeException.java:35)
    at com.google.cloud.dataflow.sdk.util.UserCodeException.wrapIf(UserCodeException.java:40)
    at com.google.cloud.dataflow.sdk.util.DoFnRunnerBase.wrapUserCodeException(DoFnRunnerBase.java:368)
    at com.google.cloud.dataflow.sdk.util.SimpleDoFnRunner.invokeProcessElement(SimpleDoFnRunner.java:51)
    at com.google.cloud.dataflow.sdk.util.DoFnRunnerBase.processElement(DoFnRunnerBase.java:138)
    at com.google.cloud.dataflow.sdk.runners.worker.SimpleParDoFn.processElement(SimpleParDoFn.java:190)
    at com.google.cloud.dataflow.sdk.runners.worker.ForwardingParDoFn.processElement(ForwardingParDoFn.java:42)
    at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerLoggingParDoFn.processElement(DataflowWorkerLoggingParDoFn.java:47)
    at com.google.cloud.dataflow.sdk.util.common.worker.ParDoOperation.process(ParDoOperation.java:53)
    at com.google.cloud.dataflow.sdk.util.common.worker.OutputReceiver.process(OutputReceiver.java:52)
    at com.google.cloud.dataflow.sdk.runners.worker.SimpleParDoFn$1.output(SimpleParDoFn.java:160)
    at com.google.cloud.dataflow.sdk.util.DoFnRunnerBase$DoFnContext.outputWindowedValue(DoFnRunnerBase.java:287)
    at com.google.cloud.dataflow.sdk.util.DoFnRunnerBase$DoFnProcessContext.output(DoFnRunnerBase.java:449)
    at reports.transforms.JsonToObject.processElement(JsonToObject.java:35)
    at com.google.cloud.dataflow.sdk.util.SimpleDoFnRunner.invokeProcessElement(SimpleDoFnRunner.java:49)
    at com.google.cloud.dataflow.sdk.util.DoFnRunnerBase.processElement(DoFnRunnerBase.java:138)
In my class (JsonToObject) I do the following:
if (obj != null) { processContext.output(obj); }
That is where the exception is thrown.
Any idea why this happens?
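One way to read the "Caused by" trace: `processContext.output(obj)` at JsonToObject.java:35 does not just hand the element off for later. The frames above it (OutputReceiver.process, ParDoOperation.process, then another processElement) show the next, fused step running inside that same call, and the CoderException is raised when that step's output is encoded. So a non-null `obj` can still lead to a null String further downstream. The stdlib-only sketch below illustrates that reading; it is not Dataflow code. `Report`, `jsonToObject`, `runFused` and `encodeString` are hypothetical stand-ins, and IllegalArgumentException stands in for the SDK's CoderException.

```java
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;
import java.util.function.Function;

/**
 * Stdlib-only sketch (NOT Dataflow SDK code) of why the exception surfaces at
 * processContext.output(obj): in a fused stage, output() passes the element
 * straight to the next step within the same call stack.
 */
public class FusionSketch {

    /** Hypothetical stand-in for a String coder that, like the one in the trace, rejects null. */
    static byte[] encodeString(String value) {
        if (value == null) {
            throw new IllegalArgumentException("cannot encode a null String");
        }
        return value.getBytes(StandardCharsets.UTF_8);
    }

    /** Hypothetical parsed record: the object is non-null, but a field may be null. */
    static final class Report {
        final String name;
        Report(String name) { this.name = name; }
    }

    /** Hypothetical stand-in for JsonToObject: checks obj != null, then outputs it. */
    static void jsonToObject(Report obj, Consumer<Report> output) {
        if (obj != null) {
            output.accept(obj); // plays the role of JsonToObject.java:35
        }
    }

    /** Runs both "steps" fused: the downstream result is encoded inside output(). */
    static byte[] runFused(Report input, Function<Report, String> downstream) {
        byte[][] result = new byte[1][];
        jsonToObject(input, obj -> result[0] = encodeString(downstream.apply(obj)));
        return result[0];
    }

    public static void main(String[] args) {
        // Works: the downstream step produces a non-null String.
        System.out.println(runFused(new Report("daily"), r -> r.name).length); // 5
        try {
            // Fails inside jsonToObject's output call, although obj itself is non-null.
            runFused(new Report(null), r -> r.name);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // cannot encode a null String
        }
    }
}
```

If this reading is right, the place to look is the step (or steps) that consume the output of JsonToObject, not the `obj != null` check itself.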
NullableCoder is a composite coder: it has to be constructed around another coder, which encodes the non-null values. Because of this requirement to be parameterized by another coder, @DefaultCoder cannot be used with composite coders (KvCoder, IterableCoder, ...). One way to solve your problem is to set the coder manually on each PCollection that may contain null values. For example:
PCollection<String> pc = pipeline.apply(... transform that produces nulls ...);
// Wrap the String coder so that null elements can be encoded.
pc.setCoder(NullableCoder.of(StringUtf8Coder.of()));
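To make the "composite coder" idea concrete, here is a stdlib-only sketch of what a nullable wrapper does: write a marker byte saying whether the value is null, and only delegate to the wrapped coder for non-null values. This is an illustration under assumptions, not the SDK's NullableCoder source. `SimpleCoder`, `STRING`, `nullable` and `roundTrip` are hypothetical names, and the marker-byte layout is an assumption chosen for the example; the real wire format may differ.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

/**
 * Stdlib-only sketch of a nullable wrapper coder (hypothetical, NOT the
 * Dataflow SDK implementation): marker byte, then the inner encoding.
 */
public class NullableSketch {

    /** Hypothetical, simplified coder interface. */
    interface SimpleCoder<T> {
        void encode(T value, DataOutputStream out) throws IOException;
        T decode(DataInputStream in) throws IOException;
    }

    /** Strict String coder: rejects null, like the coder in the question. */
    static final SimpleCoder<String> STRING = new SimpleCoder<String>() {
        public void encode(String value, DataOutputStream out) throws IOException {
            if (value == null) {
                throw new IOException("cannot encode a null String");
            }
            byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
            out.writeInt(bytes.length); // length prefix
            out.write(bytes);
        }

        public String decode(DataInputStream in) throws IOException {
            byte[] bytes = new byte[in.readInt()];
            in.readFully(bytes);
            return new String(bytes, StandardCharsets.UTF_8);
        }
    };

    /** Composite coder: only usable when given an inner coder to delegate to. */
    static <T> SimpleCoder<T> nullable(SimpleCoder<T> inner) {
        return new SimpleCoder<T>() {
            public void encode(T value, DataOutputStream out) throws IOException {
                if (value == null) {
                    out.writeByte(0);         // marker: null, nothing else written
                } else {
                    out.writeByte(1);         // marker: value present
                    inner.encode(value, out); // delegate to the wrapped coder
                }
            }

            public T decode(DataInputStream in) throws IOException {
                return in.readByte() == 0 ? null : inner.decode(in);
            }
        };
    }

    /** Encodes a value and decodes it again. */
    static <T> T roundTrip(SimpleCoder<T> coder, T value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        coder.encode(value, out);
        out.flush();
        return coder.decode(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
    }

    public static void main(String[] args) throws IOException {
        SimpleCoder<String> coder = nullable(STRING);
        System.out.println(roundTrip(coder, "hello")); // hello
        System.out.println(roundTrip(coder, null));    // null
    }
}
```

The wrapper cannot exist on its own: `nullable(...)` needs an inner coder, which mirrors why NullableCoder has to be built as `NullableCoder.of(StringUtf8Coder.of())` rather than picked up through @DefaultCoder.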