Jackson JsonNode with empty element key
I am using jackson-dataformat-xml (2.9) to parse XML into a JsonNode and then serialize it to JSON. The XML is very dynamic (e.g. the 'elementName' and 'id' names may vary), which is why I am using JsonNode instead of binding to a POJO.
In the resulting JSON, one of the element keys turns out to be an empty string ("").
XML:
<elementName>
<id type="pid">abcdef123</id>
</elementName>
Parsing logic:
// Mappers are fields so that parseXmlResponse can use them.
private final ObjectMapper jsonMapper;
private final XmlMapper xmlMapper;

public Parser() {
    this.jsonMapper = new ObjectMapper();
    this.xmlMapper = new XmlMapper(new XmlFactory(new WstxInputFactory()));
}

public InputStream parseXmlResponse(InputStream xmlStream) {
    InputStream stream = null;
    try {
        JsonNode node = xmlMapper.readTree(xmlStream);
        stream = new ByteArrayInputStream(jsonMapper.writer().writeValueAsBytes(node));
    } catch (IOException e) {
        e.printStackTrace();
    }
    return stream;
}
JSON:
Result:
{
  "elementName": {
    "id": {
      "type": "pid",
      "": "abcdef123"
    }
  }
}
Expected:
{
  "elementName": {
    "id": {
      "type": "pid",
      "value": "abcdef123"
    }
  }
}
My idea is to find every occurrence of the empty key "" and replace it with "value", either during XML deserialization or during JSON serialization. I have tried a default serializer and a filter, but have not gotten either to work in a clean and concise way.
Suggestions are much appreciated.
Thank you for your help.
Based on @shoek's suggestion, I decided to write a custom serializer to avoid creating an intermediate object (ObjectNode) during the process.
Edit: refactored based on the same solution proposed by @shoek.
public class CustomNode {
    private JsonNode jsonNode;

    public CustomNode(JsonNode jsonNode) {
        this.jsonNode = jsonNode;
    }

    public JsonNode getJsonNode() {
        return jsonNode;
    }
}

public class CustomObjectsResponseSerializer extends StdSerializer<CustomNode> {
    protected CustomObjectsResponseSerializer() {
        super(CustomNode.class);
    }

    @Override
    public void serialize(CustomNode node, JsonGenerator jgen, SerializerProvider provider) throws IOException {
        convertObjectNode(node.getJsonNode(), jgen, provider);
    }

    private void convertObjectNode(JsonNode node, JsonGenerator jgen, SerializerProvider provider) throws IOException {
        jgen.writeStartObject();
        for (Iterator<String> it = node.fieldNames(); it.hasNext(); ) {
            String childName = it.next();
            JsonNode childNode = node.get(childName);
            // XML parser returns an empty string as value name. Replacing it with "value"
            if (Objects.equals("", childName)) {
                childName = "value";
            }
            if (childNode instanceof ArrayNode) {
                jgen.writeFieldName(childName);
                convertArrayNode(childNode, jgen, provider);
            } else if (childNode instanceof ObjectNode) {
                jgen.writeFieldName(childName);
                convertObjectNode(childNode, jgen, provider);
            } else {
                provider.defaultSerializeField(childName, childNode, jgen);
            }
        }
        jgen.writeEndObject();
    }

    private void convertArrayNode(JsonNode node, JsonGenerator jgen, SerializerProvider provider) throws IOException {
        jgen.writeStartArray();
        for (Iterator<JsonNode> it = node.elements(); it.hasNext(); ) {
            JsonNode childNode = it.next();
            if (childNode instanceof ArrayNode) {
                convertArrayNode(childNode, jgen, provider);
            } else if (childNode instanceof ObjectNode) {
                convertObjectNode(childNode, jgen, provider);
            } else {
                provider.defaultSerializeValue(childNode, jgen);
            }
        }
        jgen.writeEndArray();
    }
}
You could also simply post-process the JSON DOM: traverse all objects and rename keys that are empty strings to "value".
Caveat (key collision): a key named "value" may already exist, and it must not be overwritten (e.g. <id type="pid" value="existing">abcdef123</id>).
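Usage:
(Note: you should not silently suppress the exception and return null. Propagate it instead, so that the caller can decide whether to catch it and apply failover logic.)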
public InputStream parseXmlResponse(InputStream xmlStream) throws IOException {
    JsonNode node = xmlMapper.readTree(xmlStream);
    postprocess(node);
    return new ByteArrayInputStream(jsonMapper.writer().writeValueAsBytes(node));
}
Post-processing:
private void postprocess(JsonNode jsonNode) {
    if (jsonNode.isArray()) {
        ArrayNode array = (ArrayNode) jsonNode;
        Iterable<JsonNode> elements = () -> array.elements();
        // recursive post-processing
        for (JsonNode element : elements) {
            postprocess(element);
        }
    }
    if (jsonNode.isObject()) {
        ObjectNode object = (ObjectNode) jsonNode;
        Iterable<String> fieldNames = () -> object.fieldNames();
        // recursive post-processing
        for (String fieldName : fieldNames) {
            postprocess(object.get(fieldName));
        }
        // check if an attribute with empty string key exists, and rename it to 'value',
        // unless there already exists another non-null attribute named 'value' which
        // would be overwritten.
        JsonNode emptyKeyValue = object.get("");
        JsonNode existing = object.get("value");
        if (emptyKeyValue != null) {
            if (existing == null || existing.isNull()) {
                object.set("value", emptyKeyValue);
                object.remove("");
            } else {
                System.err.println("Skipping empty key value as a key named 'value' already exists.");
            }
        }
    }
}
Output: as expected.
{
  "elementName": {
    "id": {
      "type": "pid",
      "value": "abcdef123"
    }
  }
}
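To see the rename-unless-taken rule in isolation, here is a minimal sketch that applies the same logic to plain java.util collections instead of Jackson nodes. This is an illustration only: the class name EmptyKeyRenamer and the use of Map/List as stand-ins for ObjectNode/ArrayNode are my assumptions, not part of the code above.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the Jackson-based postprocess(): nested Maps play
// the role of ObjectNode, Lists the role of ArrayNode.
public class EmptyKeyRenamer {

    /** Recursively renames "" keys to "value", unless a non-null "value" already exists. */
    @SuppressWarnings("unchecked")
    public static void rename(Object node) {
        if (node instanceof List) {
            for (Object element : (List<Object>) node) {
                rename(element);
            }
        } else if (node instanceof Map) {
            Map<String, Object> object = (Map<String, Object>) node;
            // Recurse first; this only mutates the children, not this map.
            for (Object child : object.values()) {
                rename(child);
            }
            // Rename only when no non-null "value" entry would be overwritten.
            if (object.containsKey("") && object.get("value") == null) {
                object.put("value", object.remove(""));
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Object> id = new LinkedHashMap<>();
        id.put("type", "pid");
        id.put("", "abcdef123");
        Map<String, Object> root = new LinkedHashMap<>();
        root.put("elementName", id);

        rename(root);
        System.out.println(root); // prints {elementName={type=pid, value=abcdef123}}
    }
}
```

If the map already holds a non-null "value" entry, the "" entry is left untouched, which mirrors the "Skipping empty key value" branch above.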
Note on performance:
I ran a test with a large XML file (enwikiquote-20200520-pages-articles-multistream.xml, the en.wikiquote XML dump, 498.4 MB) over 100 rounds and measured the following times (using System.nanoTime() deltas):
- average read time (file, SSD): 2870.96 ms (JsonNode node = xmlMapper.readTree(xmlStream);)
- average post-processing time: 0.04 ms (postprocess(node);)
- average write time (memory): 0.31 ms (new ByteArrayInputStream(jsonMapper.writer().writeValueAsBytes(node));)
The post-processing step takes a fraction of a millisecond on the object tree built from a ~500 MB file, so its performance is excellent and not a concern.
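The harness behind these numbers is not shown, so the following is only a sketch of how such averages could be taken with System.nanoTime() deltas. The class AverageTimer, the method averageMillis, and the placeholder workload are all hypothetical. Note that a plain loop like this ignores JIT warm-up; a benchmark framework such as JMH would give more reliable figures.

```java
// Hypothetical timing helper: averages wall-clock time over several rounds.
public class AverageTimer {

    /** Runs the task 'rounds' times and returns the mean duration in milliseconds. */
    public static double averageMillis(int rounds, Runnable task) {
        if (rounds <= 0) {
            throw new IllegalArgumentException("rounds must be positive");
        }
        long totalNanos = 0;
        for (int i = 0; i < rounds; i++) {
            long start = System.nanoTime();
            task.run();
            totalNanos += System.nanoTime() - start; // delta for this round
        }
        return totalNanos / (rounds * 1_000_000.0);
    }

    public static void main(String[] args) {
        // Placeholder workload; in the scenario above this would be the
        // readTree, postprocess, or writeValueAsBytes call.
        double avg = averageMillis(100, () -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 10_000; i++) {
                sb.append(i);
            }
        });
        System.out.printf("average: %.2f ms%n", avg);
    }
}
```

Timing each of the three steps separately, as in the list above, makes it clear that reading and parsing the XML dominates the total.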