How to handle different versions of xsd files in one Java application?

Problem description:

In my Java application I have to handle XML files with different schema versions (xsd files) simultaneously. The content of the XML files changed only a little between the versions, so I'd like to use mostly the same code to handle them and just make a few case distinctions depending on the version of the schema in use.

Right now I'm parsing the XML files with a SAX parser and my own ContentHandler, ignoring the schema version and just checking whether the tags I need for processing are present.

I'd really like to use JAXB to generate the classes for parsing the XML files. That way I could remove all the hardcoded strings (constants) from my Java code and work with the generated classes instead.

  • How can I handle the different schema versions in a uniform way using JAXB?
  • Is there a better solution?

I compiled the schema versions to different packages v1, v2 and v3. Now I can create an Unmarshaller this way:

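import javax.xml.bind.JAXBContext;
import javax.xml.bind.Unmarshaller;

// One context that knows the Root class generated from each schema version
JAXBContext jc = JAXBContext.newInstance(
    v1.Root.class, v2.Root.class, v3.Root.class);
Unmarshaller u = jc.createUnmarshaller();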

Now u.unmarshal( xmlInputStream ); gives me an instance of the Root class from the package matching the schema of the XML file.

Next I'll try to define an interface to access the common parts of the schemas. If you have done something like this before, please let me know. In the meantime I'm reading through the JAXB specs...

First, you need some way to identify the schema appropriate for the particular instance document. You say that the documents have a schemaLocation attribute, so this is one solution. Note, however, that you have to specifically configure the parser to use this attribute, and a malicious document could specify a schema location that you don't control. Instead, I'd recommend reading the attribute value yourself and using it as a key to look up the appropriate schema in an internal table.
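The lookup described above could be sketched roughly as follows. This is only an illustration under assumptions: the class name, the schema URLs and the resource paths in the table are made up, and it uses the JDK's StAX reader to peek at the root element without letting the parser fetch anything the document points to.

```java
import java.io.StringReader;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class SchemaVersionLookup {

    // Hypothetical internal table: accepted location hints -> schema bundled with the application.
    static final Map<String, String> KNOWN_SCHEMAS = Map.of(
        "http://example.com/schema/v1.xsd", "/schemas/v1.xsd",
        "http://example.com/schema/v2.xsd", "/schemas/v2.xsd",
        "http://example.com/schema/v3.xsd", "/schemas/v3.xsd");

    /** Returns the raw schema location hint on the root element, or null if there is none. */
    public static String readSchemaHint(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newFactory();
        // Only peek at the document; never resolve DTDs or external entities from untrusted input.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    String xsi = XMLConstants.W3C_XML_SCHEMA_INSTANCE_NS_URI;
                    String hint = reader.getAttributeValue(xsi, "noNamespaceSchemaLocation");
                    if (hint == null) {
                        hint = reader.getAttributeValue(xsi, "schemaLocation");
                    }
                    return hint; // only the root element is inspected
                }
            }
            return null;
        } finally {
            reader.close();
        }
    }

    /** Maps the hint to a schema we control; anything not in the table is rejected. */
    public static String resolveSchema(String xml) throws XMLStreamException {
        String hint = readSchemaHint(xml);
        if (hint != null) {
            // schemaLocation holds whitespace-separated "namespace location" pairs,
            // so every token is checked against the table.
            for (String token : hint.trim().split("\\s+")) {
                String resource = KNOWN_SCHEMAS.get(token);
                if (resource != null) {
                    return resource;
                }
            }
        }
        throw new IllegalArgumentException("Unknown or missing schema location: " + hint);
    }
}
```

The returned resource path would then be loaded from the classpath and used for validation or for choosing the version-specific code path, so the document itself never decides which schema file gets read.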

Next is access to the data. You don't say why you're using three different schemas. The only rational reason is an evolving data spec (i.e., the schemas represent versions 1, 2, and 3 of the same data). If that's not your reason, then you need to rethink your design.

If you are trying to support an evolving data spec, then you need to answer the question "how do I deal with data that's missing?" There are a couple of answers to this. One is to maintain multiple versions of the code. If the common functionality is refactored out, this is not a bad idea, but it can easily become unmaintainable.

The alternative is to use a single codebase, and some sort of adapter object that incorporates your rules. If you go down this path, JAXB is the wrong solution, since it is tied to a schema. You might be able to use a permissive XML->Java converter: I believe XStream will work, and I know that the 1.1 release of Practical XML will work (since I wrote it), although you'd have to build it yourself.
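To make the adapter idea a little more concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the field names ("id", "currency", "note"), the rule that version 1 documents imply EUR, and the premise that some permissive converter has already turned the document into a flat map of field values.

```java
import java.util.Map;
import java.util.Optional;

public class OrderAdapter {

    private final int schemaVersion;
    private final Map<String, String> fields; // output of a permissive XML->Java conversion (assumed)

    public OrderAdapter(int schemaVersion, Map<String, String> fields) {
        this.schemaVersion = schemaVersion;
        this.fields = fields;
    }

    /** Hypothetical rule: "id" exists in every schema version, so its absence is an error. */
    public String getId() {
        String id = fields.get("id");
        if (id == null) {
            throw new IllegalStateException("id is required in all schema versions");
        }
        return id;
    }

    /** Hypothetical rule: "currency" was added in version 2; version 1 documents imply EUR. */
    public String getCurrency() {
        String currency = fields.get("currency");
        if (currency != null) {
            return currency;
        }
        if (schemaVersion == 1) {
            return "EUR";
        }
        throw new IllegalStateException("currency is required from schema version 2 on");
    }

    /** Hypothetical rule: "note" is optional in every version, so callers get an Optional. */
    public Optional<String> getNote() {
        return Optional.ofNullable(fields.get("note"));
    }
}
```

The processing code only ever talks to the adapter, so the answer to "what if this data is missing?" lives in one place per field instead of being scattered across version checks.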

Another, better alternative, depending on the complexity of the schema, is to develop a set of objects that use XPath to retrieve the data. I would probably implement this using a "master" object that contains XPath expressions for every field, in every variant of the schema. Then create lightweight "wrapper" objects that hold a DOM version of your instance document, and use the XPath appropriate to the schema. Note, however, that this is limited to read-only access.
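A rough sketch of that XPath approach, using only the JDK's DOM and XPath APIs, might look like this. The field names, the version numbers and the XPath expressions are invented for the example, and the wrapper takes the schema version as a plain int that the caller has already determined.

```java
import java.io.StringReader;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class VersionedDocument {

    // "Master" table: field name -> (schema version -> XPath). All paths are hypothetical.
    static final Map<String, Map<Integer, String>> PATHS = Map.of(
        "title", Map.of(1, "/root/title", 2, "/root/header/title"),
        "author", Map.of(1, "/root/author", 2, "/root/header/author/name"));

    private final Document dom;
    private final int version;
    private final XPath xpath = XPathFactory.newInstance().newXPath();

    public VersionedDocument(Document dom, int version) {
        this.dom = dom;
        this.version = version;
    }

    /** Parses the XML into a DOM and wraps it together with its schema version. */
    public static VersionedDocument parse(String xml, int version) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        // Refuse DOCTYPE declarations so untrusted input cannot pull in external entities.
        dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        Document dom = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        return new VersionedDocument(dom, version);
    }

    /** Read-only access: evaluates the XPath registered for this field and schema version. */
    public String get(String field) throws XPathExpressionException {
        Map<Integer, String> byVersion = PATHS.get(field);
        if (byVersion == null) {
            throw new IllegalArgumentException("Unknown field: " + field);
        }
        String path = byVersion.get(version);
        if (path == null) {
            throw new IllegalArgumentException(
                "Field " + field + " is not defined for schema version " + version);
        }
        // evaluate(String, Object) returns the node's string value, or "" when nothing matches
        return xpath.evaluate(path, dom);
    }
}
```

With this in place, VersionedDocument.parse(xml, 2).get("title") reads /root/header/title, while the same call on a version 1 document reads /root/title, and the calling code does not need to know the difference.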