Finding all XPaths in an XQuery using Saxon-HE and C#

Problem description:

I have an XML schema definition ("the schema") that includes several other XSDs, all in the same namespace. Some of those import other XSDs from foreign namespaces. All in all, the schema declares several global elements that can be instantiated as XML documents. Let's call them Global_1, Global_2 and Global_3.

The schema is augmented by a Schematron file that defines the "business rules". It defines a number of abstract rules, and each abstract rule contains a number of assertions using the data model defined via XSD. For instance:

<sch:pattern>
    <sch:rule id="rule_A" abstract="true">
        <sch:assert test="if (abc:a/abc:b = '123') then abc:x/abc:y = ('aaa', 'bbb', 'ccc') else true()" id="A-01">Error message</sch:assert>
        <sch:assert test="not(abc:c = 'abcd' and abc:d = 'zz')" id="A-02">Some other error message</sch:assert>
    </sch:rule>
    <!-- (...) -->
</sch:pattern>

Each abstract rule is extended by one or more non-abstract (concrete) rules, each of which defines a specific context in which the abstract rule's assertions are to be validated. For example:

<sch:pattern>
    <!-- (...) -->
    <sch:rule context="abc:Global_1/abc:x/abc:y">
        <sch:extends rule="rule_A"/>
    </sch:rule>
    <sch:rule context="abc:Global_2/abc:j//abc:k/abc:l">
        <sch:extends rule="rule_A"/>
    </sch:rule>
    <!-- (...) -->
</sch:pattern>
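
The relationship between abstract rules and the concrete rules extending them can be indexed with LINQ to XML. The following is only a minimal sketch: the class and method names are hypothetical, and it assumes the sch: prefix is bound to the ISO Schematron namespace and that every abstract rule carries a unique id.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class SchematronRuleIndex
{
    // Assumption: sch: maps to the ISO Schematron namespace.
    static readonly XNamespace Sch = "http://purl.oclc.org/dsdl/schematron";

    // Maps each abstract rule id to the test expressions of its assertions.
    public static Dictionary<string, List<string>> AssertionsByRule(XDocument schematron)
    {
        return schematron.Descendants(Sch + "rule")
            .Where(r => (string)r.Attribute("abstract") == "true")
            .ToDictionary(
                r => (string)r.Attribute("id"),
                r => r.Elements(Sch + "assert")
                      .Select(a => (string)a.Attribute("test"))
                      .ToList());
    }

    // Maps each abstract rule id to the contexts of the concrete rules that extend it.
    public static ILookup<string, string> ContextsByRule(XDocument schematron)
    {
        return schematron.Descendants(Sch + "rule")
            .Where(r => r.Attribute("context") != null)
            .SelectMany(
                r => r.Elements(Sch + "extends"),
                (r, e) => new
                {
                    Rule = (string)e.Attribute("rule"),
                    Context = (string)r.Attribute("context")
                })
            .ToLookup(x => x.Rule, x => x.Context);
    }
}
```

For the two patterns shown above, AssertionsByRule would list the two tests of rule_A, and ContextsByRule["rule_A"] would yield both concrete contexts.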

In other words, all the assertions defined within the abstract rule_A are being applied to their specific contexts.

Both "the schema" and "the business rules" are subject to change - my program gets them at run-time and I don't know their content at design-time. The only thing I can safely assume is that there are no endless recursive structures in the schema: There is always one definite leaf node for every type and no type contains itself. Put differently, there are no "infinite loops" possible in the instances.

Basically, I want to evaluate programmatically if each of the defined rules is correct. Since correctness can be quite a problematic topic, here by correctness I simply mean: Each XPath used in a rule (i.e. its context and within the XQueries of its inherited assertions) is "possible", meaning it can exist according to the data model defined in the schema. If, for instance, a namespace prefix is forgotten (abc:a/b instead of abc:a/abc:b), this XPath will never return anything other than an empty node set. The same is true if one step in the XPath is accidentally omitted, or spelled wrong, etc. This is obviously not a very strong claim for "correctness" of such a rule, but it'll do for a first step.

At least to me, evaluating an XPath (let alone an entire XQuery!) that was designed for instances of a schema against the schema itself does not seem like a trivial problem, given that it may contain steps such as //, ancestor:: or following-sibling::. So I decided to construct something I call a "maximum instance": by recursively iterating through all global elements and their children (and the structure of their respective complex types, etc.), I build an XML instance at run-time that contains every possible element and attribute at the position it would have in a normal instance, but all at once. That includes every optional element and attribute, every element within a choice block, and so on. The maximum instance looks something like this:

<maximumInstance>
    <Global_1>
        <abc:a>
            <abc:b additionalAttribute="some_fixed_value">
                <abc:j/>
                <abc:k/>
                <abc:l/>
            </abc:b>
        </abc:a>
    </Global_1>
    <Global_2>
        <abc:x>
            <abc:y>
                <abc:a/>
                <abc:z>
                    <abc:l/>
                </abc:z>
            </abc:y>
        </abc:x>
    </Global_2>
    <Global_3>
        <!-- ... -->
    </Global_3>
    <!-- ... -->
</maximumInstance>
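
The recursive traversal that produces such a maximum instance could look roughly like the sketch below. It is not the program described in this question, just an illustration built on System.Xml.Schema. Assumptions: the class name is hypothetical, the schema is handed over as one self-contained XSD string (real includes and imports would need resolvable locations), elements are emitted in their actual namespace, and wildcards, substitution groups and abstract elements are ignored. It relies on the stated guarantee that no type contains itself; otherwise the recursion would not terminate.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Linq;
using System.Xml.Schema;

public static class MaximumInstanceSketch
{
    public static XDocument Build(string xsdText)
    {
        var set = new XmlSchemaSet();
        using (var reader = XmlReader.Create(new StringReader(xsdText)))
        {
            set.Add(null, reader); // null: use the schema's own targetNamespace
        }
        set.Compile();

        var root = new XElement("maximumInstance");
        foreach (XmlSchemaElement global in set.GlobalElements.Values)
        {
            root.Add(Expand(global));
        }
        return new XDocument(root);
    }

    // Emits one element and, for complex types, all attributes and child particles.
    static XElement Expand(XmlSchemaElement decl)
    {
        var element = new XElement(
            XName.Get(decl.QualifiedName.Name, decl.QualifiedName.Namespace));

        if (decl.ElementSchemaType is XmlSchemaComplexType complexType)
        {
            foreach (XmlSchemaAttribute attr in complexType.AttributeUses.Values)
            {
                var name = XName.Get(attr.QualifiedName.Name, attr.QualifiedName.Namespace);
                element.Add(new XAttribute(name, attr.FixedValue ?? attr.DefaultValue ?? ""));
            }
            AddParticle(element, complexType.ContentTypeParticle);
        }
        return element;
    }

    // Sequence, choice and all are treated alike: every branch is emitted once.
    static void AddParticle(XElement parent, XmlSchemaParticle particle)
    {
        if (particle is XmlSchemaElement child)
        {
            parent.Add(Expand(child));
        }
        else if (particle is XmlSchemaGroupBase group)
        {
            foreach (XmlSchemaParticle item in group.Items)
            {
                AddParticle(parent, item);
            }
        }
    }
}
```

Treating a choice like a sequence is what makes the instance "maximal": both alternatives end up in the tree, even though no valid instance could contain them together.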

All that remains is to iterate over all abstract rules: for every assertion in each abstract rule, and for every context in which that rule is extended, each XPath within the assertion must yield a non-empty node set when evaluated against the maximum instance.
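
For a single context and a single extracted path, that check might be sketched as follows. This is an illustration only: the class and method names are hypothetical, the namespace URI bound to abc in the usage example is a placeholder, and the built-in System.Xml.XPath engine only understands XPath 1.0 and, via XPathSelectElements, only paths that select elements. Contexts containing a union (|) are not handled.

```csharp
using System.Linq;
using System.Xml;
using System.Xml.Linq;
using System.Xml.XPath;

public static class PossibilityCheck
{
    // True if the relative path selects at least one element from at least
    // one node matching the rule context inside the maximum instance.
    public static bool IsPossible(
        XDocument maximumInstance,
        string context,
        string relativePath,
        IXmlNamespaceResolver namespaces)
    {
        // A Schematron context is a pattern, so a relative context is searched
        // for anywhere in the document; an absolute one is used unchanged.
        string contextQuery = context.StartsWith("/") ? context : "//" + context;

        return maximumInstance
            .XPathSelectElements(contextQuery, namespaces)
            .Any(node => node.XPathSelectElements(relativePath, namespaces).Any());
    }
}
```

A call would pass an XmlNamespaceManager on which the prefix has been registered, for example with AddNamespace("abc", "urn:example:abc"). A path with a forgotten prefix, such as abc:a/b, then comes back as not possible because b is looked up in no namespace.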

I have written a C# (.NET Framework 4.8) program that parses "the schema" into said "maximum instance" (which is an XDocument at run-time). It also parses the business rules into a structure that makes it easy to get each abstract rule, its assertions, and the contexts these assertions are to be validated against.

But currently, I only have each complete XQuery (just like they are in the Schematron file) which effectively creates an assertion. But I actually need to break the XQuery down into its components (I guess I'd need the abstract syntax tree) so that I would have all individual XPaths. For instance, when given the XQuery if (abc:a/abc:b = '123') then abc:x/abc:y = ('aaa', 'bbb', 'ccc') else true(), I would need to retrieve abc:a/abc:b and abc:x/abc:y.
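
To make the desired output concrete, here is a deliberately crude regex heuristic. It is not a parser and not a substitute for a real syntax tree: it only picks up chains of prefixed names joined by / or //, after blanking out string literals. The class name is hypothetical, and unprefixed steps, predicates, explicit axes, attribute steps and prefixed function names are all beyond it.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class PathCandidateExtractor
{
    // A prefixed name such as abc:a
    const string Step = @"[A-Za-z_][\w.-]*:[A-Za-z_][\w.-]*";

    // Single- or double-quoted string literals (escaped quotes are not handled).
    static readonly Regex StringLiteral = new Regex(@"'[^']*'|""[^""]*""");

    // Optional leading / or //, then prefixed steps separated by / or //.
    static readonly Regex Path = new Regex(@"/{0,2}" + Step + @"(?:/{1,2}" + Step + @")*");

    public static List<string> Extract(string expression)
    {
        // Blank out literals first so that their content is never taken for a path.
        string withoutLiterals = StringLiteral.Replace(expression, "''");

        return Path.Matches(withoutLiterals)
            .Cast<Match>()
            .Select(m => m.Value)
            .Distinct()
            .ToList();
    }
}
```

For the assertion quoted above, Extract returns abc:a/abc:b and abc:x/abc:y. Anything beyond such simple paths needs a proper grammar-based parser.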

I assume that this could be done using Saxon-HE (or maybe another Parser/Compiler currently available for C# I don't know about). Unfortunately, I have yet to understand how to make use of Saxon well enough to even find at least a valid starting point for what I want to achieve. I've been trying to use the abstract syntax tree (so I can access the respective XPaths in the XQuery) seemingly accessible via XQueryExecutable:

Processor processor = new Processor();
XQueryCompiler xqueryCompiler = processor.NewXQueryCompiler();
XQueryExecutable exe = xqueryCompiler.Compile(xquery);
var AST = exe.getUnderlyingCompiledQuery();

var st = new XDocument();
st.Add(new XElement("root"));
XdmNode node = processor.NewDocumentBuilder().Build(st.CreateReader());
AST.explain(node); // <-- this is an error!

But that doesn't get me anywhere: I can't find any exposed properties I could work with. And while Visual Studio offers AST.explain(...) (which seems promising), I can't figure out what to pass as its argument. I tried an XdmNode, which I thought would be a Destination. Also, I am using Saxon 10 (via NuGet), while Destination seems to be from Saxon 9: net.sf.saxon.s9api.Destination?!

Does anybody who was kind enough to read through all of this have any advice for me on how to tackle this? :-) Or, maybe there's a better way to solve my problem I haven't thought of - I'm also grateful for suggestions.

Sorry for the wall of text! In short: I have Schematron rules that augment an XML schema with business logic. To evaluate these rules (not: validate instances against the rules!) without actual XML instances, I need to break down the XQueries that make up the Schematron's assertions into their components so that I can handle all XPaths used in them. I think it can be done with Saxon-HE, but my knowledge is too limited to even understand what a good starting point would be. I'm also open to suggestions regarding a possibly better approach to solving my actual problem (as described in detail above).

Thank you for taking the time to read this.

While XQueryX (as pointed out by Michael Kay) would theoretically have been exactly what I was looking for, unfortunately I could not find anything useful regarding an implementation for .NET during my research.

So I eventually solved the whole thing by creating my own parser, using the XPath 3.1 grammar for ANTLR4 as an ideal starting point. This way, I am now able to retrieve a syntax tree of any Schematron rule expression, allowing me to extract each contained XPath expression (and its subexpressions) separately.

Note that another stumbling block was that .NET still (!) natively supports only XPath 1.0: while my parser does everything it is supposed to, .NET gave me "illegal token" errors when I tried to evaluate some of the extracted expressions. Installing the XPath2 NuGet package by Chertkov/Heyenrath solved that.