Erik Johnson: UPA is your Friend (Repost)

The W3C XML Schema specification has a rule called unique particle attribution (UPA) that confuses, well, many. And after 4+ years of XML Schema in the wild, it still amazes me how inconsistently toolkits handle the issue. One toolkit – I think XMLSpy – at one time rebelled against UPA by enforcing the rule only if the user wished it. Just this week, I found a discrepancy in handling UPA between SQL 2005 and .NET Framework 2.0 (more on that later).

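One problem in trying to comply with the UPA rule is that it is hard to describe in words. Roughly, it says that as a validator walks through an element’s children, it must be able to tell which particle in the content model (an element declaration or a wildcard) each child matches, without looking at the children that come after it. Here are a couple of examples to get the point across. This schema type is perfectly fine: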

<xs:element name="MyType">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="Foo" />
    </xs:sequence>
  </xs:complexType>
</xs:element>

But this type violates UPA:

<xs:element name="MyType">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="Foo" minOccurs="0"/>
      <xs:element name="Foo" />
    </xs:sequence>
  </xs:complexType>
</xs:element>

Why? Suppose a schema processor is checking a document that looks like this:

<MyType>
  <Foo />
</MyType>

The processor can’t figure out which element declaration in the schema the element “Foo” matches, at least not without looking ahead to see whether a second “Foo” follows. It must be able to find the match unambiguously. You might think that it just doesn’t matter: the document still fits the description. But it’s critical for a schema processor, because it must be able to align each node in the XML document with exactly one schema declaration. This disconnect between how people feel they should be able to describe a document and how a schema processor really works is what’s behind most of the criticism of UPA. But most documents and schemas are WAY more complicated than these examples, and the UPA rule keeps the validation mechanisms manageable. By the way, this change to the schema fixes the UPA problem – can you see why?

<xs:element name="MyType">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="Foo" />
      <xs:element name="Foo" minOccurs="0"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>

When the schema validator hits the first “Foo” element, it knows that “Foo” must match the first element declaration within “MyType”, because that declaration is required. A second “Foo”, if there is one, can only match the optional declaration.
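UPA is closely related to the idea of a deterministic (one-unambiguous) content model. As a rough illustration, here is a minimal Python sketch, assuming a made-up mini syntax tree (Elem, Seq, Choice, and Occurs are hypothetical names, not any real schema library). It numbers every element particle, works out which particles may legally come next at each point (a Glushkov-style construction), and reports any element name that two different particles could both claim. It ignores namespaces and wildcards, and only supports minOccurs of 0 or 1 with maxOccurs of 1 or unbounded.

```python
from dataclasses import dataclass

# Hypothetical mini content-model tree, not a real library's API.
@dataclass
class Elem:
    name: str            # <xs:element name="...">

@dataclass
class Seq:
    items: list          # <xs:sequence>

@dataclass
class Choice:
    items: list          # <xs:choice>

@dataclass
class Occurs:
    particle: object
    min_occurs: int      # only 0 or 1 in this sketch
    unbounded: bool      # True for maxOccurs="unbounded", False for maxOccurs="1"

def _glushkov(p, names, follow):
    """Number each element particle (a 'position'), fill in follow[pos],
    and return (nullable, first positions, last positions) for p."""
    if isinstance(p, Elem):
        pos = len(names)
        names.append(p.name)
        follow[pos] = set()
        return False, {pos}, {pos}
    if isinstance(p, Seq):
        nullable, first, last = True, set(), set()
        for item in p.items:
            n, f, l = _glushkov(item, names, follow)
            for pos in last:          # anything that can end so far may be followed by item
                follow[pos] |= f
            if nullable:              # everything before item may be skipped
                first |= f
            last = (last | l) if n else l
            nullable = nullable and n
        return nullable, first, last
    if isinstance(p, Choice):
        parts = [_glushkov(item, names, follow) for item in p.items]
        return (any(n for n, _, _ in parts),
                set().union(*(f for _, f, _ in parts)),
                set().union(*(l for _, _, l in parts)))
    if isinstance(p, Occurs):
        n, f, l = _glushkov(p.particle, names, follow)
        if p.unbounded:               # a repeat can loop back to its own start
            for pos in l:
                follow[pos] |= f
        return n or p.min_occurs == 0, f, l
    raise TypeError(f"unsupported particle: {p!r}")

def upa_conflicts(model):
    """Element names that two different particles could both claim."""
    names, follow = [], {}
    _, first, _ = _glushkov(model, names, follow)
    conflicts = set()
    for candidates in [first, *follow.values()]:
        seen = set()
        for pos in candidates:
            if names[pos] in seen:
                conflicts.add(names[pos])
            seen.add(names[pos])
    return conflicts

one_foo = Seq([Elem("Foo")])
optional_first = Seq([Occurs(Elem("Foo"), 0, False), Elem("Foo")])
optional_last = Seq([Elem("Foo"), Occurs(Elem("Foo"), 0, False)])
print(upa_conflicts(one_foo))         # set()
print(upa_conflicts(optional_first))  # {'Foo'}
print(upa_conflicts(optional_last))   # set()
```

Run against the three “MyType” variants, it flags only the one with the optional “Foo” first, which matches the reasoning above.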

The UPA problem can also come up just as a result of schema factoring, that is, how content is split into reusable groups. I came across this situation when looking at a schema for an XML document containing a SQL expression tree:

<xs:element name="Subquery">
  <xs:complexType>
    <xs:choice>
      <xs:sequence>
        <xs:group ref="tns:grpColumn" minOccurs="1" maxOccurs="1" />
        <xs:group ref="tns:grpQuery"/>
      </xs:sequence>
      <xs:sequence>
        <xs:group ref="tns:grpColumn" minOccurs="1" maxOccurs="1" />
        <xs:element name="ListItem" minOccurs="1" maxOccurs="unbounded">
          <xs:complexType>
            <xs:attributeGroup ref="attGrpScalarValue" />
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:choice>
    <xs:attributeGroup ref="tns:attGrpSubqueryItems" />
  </xs:complexType>
</xs:element>

This schema type gives a choice of two sequences, both starting with the element group “grpColumn”. So when the schema validation processor encounters an element from that group in an XML document, it can’t tell which branch of the choice is in play. Also, the schema spec designers did not want schema processors to look “past” the current node to try to resolve which schema node matches a document node. By the way, this is the schema construct that .NET Framework 2.0 compiles with no errors, but SQL 2005 (via MSXML 6.0) rejects with a UPA violation error.
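The conflict is visible from the first element of each branch alone. Here is a minimal Python sketch, using hypothetical classes rather than a real schema API, that computes the set of element names each branch of a choice can start with and reports names that more than one branch shares. “Column” and “Query” are made-up stand-ins for the first elements of tns:grpColumn and tns:grpQuery, whose contents the post doesn’t show. This is only a partial check: it looks at the start of the branches of a single choice, not at the whole content model.

```python
from dataclasses import dataclass

# Hypothetical mini content-model tree, not a real library's API.
@dataclass
class Elem:
    name: str

@dataclass
class Seq:
    items: list

@dataclass
class Choice:
    items: list

@dataclass
class Plus:
    particle: object     # minOccurs="1" maxOccurs="unbounded"

def nullable(p):
    """True if p can match an empty run of child elements."""
    if isinstance(p, Elem):
        return False
    if isinstance(p, Seq):
        return all(nullable(i) for i in p.items)
    if isinstance(p, Choice):
        return any(nullable(i) for i in p.items)
    if isinstance(p, Plus):
        return nullable(p.particle)
    raise TypeError(f"unsupported particle: {p!r}")

def first_names(p):
    """Element names that can appear first when p starts matching."""
    if isinstance(p, Elem):
        return {p.name}
    if isinstance(p, Seq):
        names = set()
        for item in p.items:
            names |= first_names(item)
            if not nullable(item):   # later items can't come first
                break
        return names
    if isinstance(p, Choice):
        return set().union(*(first_names(i) for i in p.items))
    if isinstance(p, Plus):
        return first_names(p.particle)
    raise TypeError(f"unsupported particle: {p!r}")

def shared_first_names(choice):
    """Names that more than one branch of the choice can start with."""
    seen, shared = set(), set()
    for branch in choice.items:
        names = first_names(branch)
        shared |= seen & names
        seen |= names
    return shared

original = Choice([
    Seq([Elem("Column"), Elem("Query")]),
    Seq([Elem("Column"), Plus(Elem("ListItem"))]),
])
tail_choice = Choice([Elem("Query"), Plus(Elem("ListItem"))])
print(shared_first_names(original))     # {'Column'}
print(shared_first_names(tail_choice))  # set()
```

The second call shows why pulling the shared prefix out in front of the choice helps: what remains inside the choice starts with different elements in each branch.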

I refactored the schema a little to eliminate the UPA violation:

<xs:element name="Subquery">
  <xs:complexType>
    <xs:sequence>
      <xs:group ref="tns:grpColumn" minOccurs="1" maxOccurs="1" />
      <xs:choice>
        <xs:group ref="tns:grpQuery"/>
        <xs:element name="ListItem" minOccurs="1" maxOccurs="unbounded">
          <xs:complexType>
            <xs:attributeGroup ref="attGrpScalarValue" />
          </xs:complexType>
        </xs:element>
      </xs:choice>
    </xs:sequence>
    <xs:attributeGroup ref="tns:attGrpSubqueryItems" />
  </xs:complexType>
</xs:element>

So again, you could argue that both schema examples are identical when it comes to describing an XML document: they accept exactly the same documents. Why should UPA make me use a specific structure? It’s because the people who created the XML Schema specification were also envisioning how schema validation tools would be built.
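One way to sanity-check the claim that the two “Subquery” schemas accept the same documents is to enumerate the child-element sequences each one accepts, up to some length. Here is a minimal Python sketch, assuming hypothetical classes (not a real schema library) and using “Column” and “Query” as made-up stand-ins for the contents of tns:grpColumn and tns:grpQuery. A bounded enumeration only shows agreement up to the chosen length; it is not a proof.

```python
from dataclasses import dataclass
from itertools import product

# Hypothetical mini content-model tree, not a real library's API.
@dataclass
class Elem:
    name: str

@dataclass
class Seq:
    items: list

@dataclass
class Choice:
    items: list

@dataclass
class Plus:
    particle: object     # minOccurs="1" maxOccurs="unbounded"

def words(p, limit):
    """Every sequence of child-element names, up to `limit` long, that p accepts."""
    if isinstance(p, Elem):
        return {(p.name,)} if limit >= 1 else set()
    if isinstance(p, Seq):
        result = {()}
        for item in p.items:
            result = {a + b for a, b in product(result, words(item, limit))
                      if len(a + b) <= limit}
        return result
    if isinstance(p, Choice):
        return set().union(*(words(i, limit) for i in p.items))
    if isinstance(p, Plus):
        once = words(p.particle, limit)
        result, frontier = set(once), set(once)
        while frontier:  # append one more repetition until nothing new fits
            frontier = {a + b for a, b in product(frontier, once)
                        if len(a + b) <= limit} - result
            result |= frontier
        return result
    raise TypeError(f"unsupported particle: {p!r}")

original = Choice([
    Seq([Elem("Column"), Elem("Query")]),
    Seq([Elem("Column"), Plus(Elem("ListItem"))]),
])
refactored = Seq([Elem("Column"), Choice([Elem("Query"), Plus(Elem("ListItem"))])])
print(words(original, 6) == words(refactored, 6))  # True
```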

In the case above, UPA forced me – IMO – to create a better schema. So aside from just general unfamiliarity with UPA and inconsistent toolkit support, does UPA really do any harm? Schema authors sometimes want to allow users to add extra elements to their XML documents. Since you don’t know what the XML will look like ahead of time, it’s nice to use a wildcard like xs:any in the schema. The problem is that UPA restricts where a wildcard can appear, depending on the occurrence constraints (minOccurs and maxOccurs) of the element declarations next to it.

For example, this is a legal sequence:

<xs:sequence>
  <xs:element name="Foo"/>
  <xs:any />
</xs:sequence>

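However, you can’t give “Foo” a variable occurrence (minOccurs="0", or a maxOccurs larger than its minOccurs) without violating UPA: a “Foo” element could then match either the declaration or the wildcard, because the default xs:any accepts elements from any namespace. In general, UPA does not allow a wildcard to sit directly after an optional or repeating element that the wildcard could also match. Likewise, you can’t put an optional or repeating wildcard directly before an element whose namespace the wildcard accepts. Some feel this constrains extensibility in some cases.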
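Whether the element and the wildcard compete depends on which namespaces the wildcard accepts. Here is a minimal Python sketch covering only the element-followed-by-wildcard sequence shown above. wildcard_allows follows the XSD 1.0 meaning of the namespace attribute values ##any, ##other, ##targetNamespace, ##local, and explicit URI lists; element_then_any_violates_upa is a hypothetical helper that reports a conflict when “Foo” has a variable occurrence and the wildcard could also match it. The target namespace urn:example is an assumption, with “Foo” qualified in it.

```python
from typing import Optional

# None stands for "no namespace" (an unqualified element).

def wildcard_allows(namespace_attr: str, target_ns: Optional[str],
                    ns: Optional[str]) -> bool:
    """Does <xs:any namespace="..."> accept an element in namespace `ns`?
    Follows the XSD 1.0 rules for the namespace attribute."""
    if namespace_attr == "##any":
        return True
    if namespace_attr == "##other":
        # "not the target namespace"; in XSD 1.0 it never matches no-namespace either
        return ns is not None and ns != target_ns
    allowed = set()
    for token in namespace_attr.split():
        if token == "##targetNamespace":
            allowed.add(target_ns)
        elif token == "##local":
            allowed.add(None)
        else:
            allowed.add(token)
    return ns in allowed

def element_then_any_violates_upa(min_occurs: int, max_occurs: Optional[int],
                                  element_ns: Optional[str], namespace_attr: str,
                                  target_ns: Optional[str]) -> bool:
    """For <xs:sequence><xs:element .../><xs:any .../></xs:sequence>:
    whenever the element has matched at least min_occurs but fewer than
    max_occurs times, a following element with the same name could be claimed
    by either particle, unless the wildcard's namespace test rules it out."""
    variable = max_occurs is None or min_occurs < max_occurs  # None = "unbounded"
    return variable and wildcard_allows(namespace_attr, target_ns, element_ns)

TNS = "urn:example"  # hypothetical target namespace
print(element_then_any_violates_upa(1, 1, TNS, "##any", TNS))    # False
print(element_then_any_violates_upa(0, 1, TNS, "##any", TNS))    # True
print(element_then_any_violates_upa(0, 1, TNS, "##other", TNS))  # False
```

The last case shows a commonly used extensibility pattern: namespace="##other" keeps the wildcard from competing with elements in the schema’s own target namespace.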

But honestly, I haven’t found a situation where UPA caused pain that a workaround doesn’t fix. I also think you can extend schemas better by wrapping them rather than extending them. In other words, create a schema that imports the schema you want to extend, add some new elements and tie them to the original schema through key/keyref declarations. It’s really the only way for schema validation to work with content unknown at design time.

3 comments:

orcmid said...: It's interesting that RELAX-NG does not require unique parse trees, which I suppose is one reason that there's no convention for associating Relax-NG schemas to instances.
- - - - - - - - - - - - - - -

The code reads fine (even without indents) on the page, but the feed is awful. As presented by NewsGator in Outlook 2003, the fonts keep shrinking each time there's another chunk of XML, and they keep shrinking until they are completely unreadable. I suspect there's some weird tag nesting but haven't looked at the source.

(Try using BlogJet to make and upload posts, by the way. You can edit the HTML there too, or paste in already created HTML, something that I do for hairy posts.); 7:44 AM
Erik J. said...: As you can see, I'm not having any luck improving the format of the XML samples! I tried BlogJet, but I think the Blogger template is getting in the way. I'll keep working at it.; 8:50 AM
orcmid said...: The repost maintains size (though a bit small compared to the text of comments, for example). I'm suspicious of all of those spans that set fonts at 85%.

The colorization is nice. I would try using the block indent to get indenting, but depending on the template you are using, if the designer assumed blocks meant quotations, you will get weird behavior. So much for semantics versus presentation in markup. Heh [;<).; 12:53 PM

Thursday, December 08, 2005
