How do I encode nullable objects in Google Cloud Dataflow?
This post is intended to answer questions like the following:
- Which built-in Coders support nullable values?
- How can I encode nullable objects?
- What about classes with nullable fields?
- What about collections with null entries?
Some of the default Coders do not support null values, often for efficiency. For example, DoubleCoder always encodes a double using 8 bytes; adding a bit to reflect whether the double is null would add a (padded) 9th byte to all non-null values.
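The size trade-off can be seen in a small plain-Java sketch. This is not the SDK's DoubleCoder; the class NullableDoubleSketch and its methods are hypothetical names used only for illustration, and the one-byte presence marker is an assumed layout.

```java
import java.nio.ByteBuffer;

public class NullableDoubleSketch {
    // Fixed-width encoding: always 8 bytes, with no way to represent null.
    static byte[] encodeDouble(double value) {
        return ByteBuffer.allocate(8).putDouble(value).array();
    }

    // Nullable encoding (assumed layout): one presence byte,
    // followed by the 8-byte payload only when the value is non-null.
    static byte[] encodeNullableDouble(Double value) {
        if (value == null) {
            return new byte[] {0};
        }
        return ByteBuffer.allocate(9).put((byte) 1).putDouble(value).array();
    }

    // Reads the presence byte first, then the payload if one follows.
    static Double decodeNullableDouble(byte[] encoded) {
        ByteBuffer in = ByteBuffer.wrap(encoded);
        if (in.get() == 0) {
            return null;
        }
        return in.getDouble();
    }

    public static void main(String[] args) {
        System.out.println(encodeDouble(1.5).length);          // 8
        System.out.println(encodeNullableDouble(1.5).length);  // 9
        System.out.println(encodeNullableDouble(null).length); // 1
    }
}
```

Every non-null value pays the extra byte, which is why a fixed-width coder may reasonably leave null support out.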
It is possible to encode nullable values using the techniques outlined below.
We generally recommend using AvroCoder to encode classes. AvroCoder has support for nullable fields annotated with the org.apache.avro.reflect.Nullable annotation:
@DefaultCoder(AvroCoder.class)
class MyClass {
  @Nullable String nullableField;
}
See the TrafficMaxLaneFlow for a more complete code example.
AvroCoder also supports fields that include Null in a Union.
We recommend using NullableCoder to encode nullable objects themselves. This implements the strategy in #1.
For example, consider the following working code:
// NullableCoder.of takes the coder to wrap, not a Class.
PCollection<String> output =
    p.apply(Create.of(null, "test1", null, "test2", null)
        .withCoder(NullableCoder.of(StringUtf8Coder.of())));
Nested null fields/objects are supported by many coders, as long as the nested coder supports null fields/objects.
For example, the SDK should be able to infer a working coder for a List<MyClass> using the default CoderRegistry: it should automatically use a ListCoder with a nested AvroCoder.
Similarly, a List<String> with possibly-null entries can be encoded with the Coder:
Coder<List<String>> coder = ListCoder.of(NullableCoder.of(StringUtf8Coder.of()));
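To make the composition concrete without depending on the SDK, here is a minimal plain-Java sketch of the same idea: a list encoder delegating to a nullable wrapper, which in turn delegates to a string encoder that rejects null. The Codec interface, the nullable and listOf helpers, and the byte layout are all hypothetical stand-ins, not the SDK's Coder, NullableCoder, or ListCoder.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class NullableListSketch {
    /** Hypothetical stand-in for a coder; not the SDK's Coder interface. */
    interface Codec<T> {
        void encode(T value, ByteArrayOutputStream out);
        T decode(ByteBuffer in);
    }

    // Writes a 4-byte big-endian int.
    static void writeInt(ByteArrayOutputStream out, int v) {
        byte[] b = ByteBuffer.allocate(4).putInt(v).array();
        out.write(b, 0, b.length);
    }

    /** Length-prefixed UTF-8 strings; rejects null, like a non-nullable coder. */
    static final Codec<String> STRING = new Codec<String>() {
        public void encode(String value, ByteArrayOutputStream out) {
            if (value == null) {
                throw new IllegalArgumentException("cannot encode null");
            }
            byte[] utf8 = value.getBytes(StandardCharsets.UTF_8);
            writeInt(out, utf8.length);
            out.write(utf8, 0, utf8.length);
        }
        public String decode(ByteBuffer in) {
            byte[] utf8 = new byte[in.getInt()];
            in.get(utf8);
            return new String(utf8, StandardCharsets.UTF_8);
        }
    };

    /** Wraps any codec with a one-byte presence marker (0 = null, 1 = value follows). */
    static <T> Codec<T> nullable(final Codec<T> inner) {
        return new Codec<T>() {
            public void encode(T value, ByteArrayOutputStream out) {
                if (value == null) {
                    out.write(0);
                } else {
                    out.write(1);
                    inner.encode(value, out);
                }
            }
            public T decode(ByteBuffer in) {
                return in.get() == 0 ? null : inner.decode(in);
            }
        };
    }

    /** Encodes the element count, then each element with the nested codec. */
    static <T> Codec<List<T>> listOf(final Codec<T> element) {
        return new Codec<List<T>>() {
            public void encode(List<T> values, ByteArrayOutputStream out) {
                writeInt(out, values.size());
                for (T v : values) {
                    element.encode(v, out);
                }
            }
            public List<T> decode(ByteBuffer in) {
                int n = in.getInt();
                List<T> values = new ArrayList<>(n);
                for (int i = 0; i < n; i++) {
                    values.add(element.decode(in));
                }
                return values;
            }
        };
    }

    // Encodes and decodes a list whose entries may be null.
    static List<String> roundTrip(List<String> input) {
        Codec<List<String>> codec = listOf(nullable(STRING));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        codec.encode(input, out);
        return codec.decode(ByteBuffer.wrap(out.toByteArray()));
    }

    public static void main(String[] args) {
        // prints [null, test1, null, test2, null]
        System.out.println(roundTrip(Arrays.asList(null, "test1", null, "test2", null)));
    }
}
```

The list codec never inspects its elements for null; null handling lives entirely in the nested wrapper, which is why the nested coder decides whether null entries are supported.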
Finally, in some cases Coders must be deterministic, e.g., the key used for GroupByKey. In AvroCoder, the @Nullable fields are coded deterministically as long as the Coder for the base type is itself deterministic. Similarly, using NullableCoder should not affect whether an object can be encoded deterministically.