Python:使用 ElementTree 更新 XML 文件,同时尽可能地保留布局
我有一个使用 XML 命名空间的文档,我想将其 /group/house/dogs
增加一:(文件名为 houses.xml
)
I have a document which uses an XML namespace for which I want to increase /group/house/dogs
by one: (the file is called houses.xml
)
<?xml version="1.0"?>
<group xmlns="http://dogs.house.local">
<house>
<id>2821</id>
<dogs>2</dogs>
</house>
</group>
我当前使用以下代码的结果是:(创建的文件名为houses2.xml
)
My current result using the code below is: (the created file is called houses2.xml
)
<ns0:group xmlns:ns0="http://dogs.house.local">
<ns0:house>
<ns0:id>2821</ns0:id>
<ns0:dogs>3</ns0:dogs>
</ns0:house>
</ns0:group>
我想解决两件事(如果可以使用 ElementTree.如果不是,我很乐意提供关于我应该使用什么的建议):
I would like to fix two things (if it is possible using ElementTree. If it isn´t, I´d be greatful for a suggestion as to what I should use instead):
- 我想保留
行.
- 我不想为所有标签添加前缀,我想保持原样.
总而言之,我不想过多地弄乱文档.
我当前的代码(除了上述缺陷外都有效)生成上述结果如下.
My current code (which works except for the above mentioned flaws) generating the above result follows.
我制作了一个实用函数,它使用 ElementTree 加载一个 XML 文件并返回 elementTree 和命名空间(因为我不想对命名空间进行硬编码,并且愿意承担它所暗示的风险):
I have made a utility function which loads an XML file using ElementTree and returns the elementTree and the namespace (as I do not want to hard code the namespace, and am willing to take the risk it implies):
def elementTreeRootAndNamespace(xml_file):
from xml.etree import ElementTree
import re
element_tree = ElementTree.parse(xml_file)
# Search for a namespace on the root tag
namespace_search = re.search('^({\S+})', element_tree.getroot().tag)
# Keep the namespace empty if none exists, if a namespace exists set
# namespace to {namespacename}
namespace = ''
if namespace_search:
namespace = namespace_search.group(1)
return element_tree, namespace
这是我更新狗的数量并将其保存到新文件houses2.xml
的代码:
This is my code to update the number of dogs and save it to the new file houses2.xml
:
elementTree, namespace = elementTreeRootAndNamespace('houses.xml')
# Insert the namespace before each tag when when finding current number of dogs,
# as ElementTree requires the namespace to be prefixed within {...} when a
# namespace is used in the document.
dogs = elementTree.find('{ns}house/{ns}dogs'.format(ns = namespace))
# Increase the number of dogs by one
dogs.text = str(int(dogs.text) + 1)
# Write the result to the new file houses2.xml.
elementTree.write('houses2.xml')
不幸的是,往返并不是一个小问题.对于 XML,除非您使用特殊的解析器(例如 DecentXML 但那是针对 Java 的).
Round-tripping, unfortunately, isn't a trivial problem. With XML, it's generally not possible to preserve the original document unless you use a special parser (like DecentXML but that's for Java).
根据您的需要,您有以下选择:
Depending on your needs, you have the following options:
如果您控制源代码并且可以通过单元测试保护您的代码,您就可以编写自己的简单解析器.此解析器不接受 XML,而只接受有限的子集.例如,您可以将整个文档作为字符串读取,然后使用 Python 的字符串操作来定位
并替换下一个之前的任何内容.黑客?是的.
If you control the source and you can secure your code with unit tests, you can write your own, simple parser. This parser doesn't accept XML but only a limited subset. You can, for example, read the whole document as a string and then use Python's string operations to locate
<dogs>
and replace anything up to the next<
. Hack? Yes.
您可以过滤输出.XML 只允许字符串 <ns0:
出现在一处,因此您可以使用 <
搜索并替换它,然后使用 <group xmlns 搜索和替换它:ns0="
→
You can filter the output. XML allows the string <ns0:
only in one place, so you can search&replace it with <
and then the same with <group xmlns:ns0="
→ <group xmlns="
. This is pretty safe unless you can have CDATA in your XML.
您可以编写自己的简单 XML 解析器.将输入作为字符串读取,然后为每对 加上它们在输入中的位置创建元素.这使您可以快速拆分输入,但仅适用于小输入.
You can write your own, simple XML parser. Read the input as a string and then create Elements for each pair of <>
plus their positions in the input. That allows you to take the input apart quickly but only works for small inputs.