Sr. Director, Product Research, Epicor Software

Software Architecture

This blog is not associated with my employer.

Monday, December 12, 2005

The W3C Schema Patterns WG is not Misguided

Dare Obasanjo’s recent post labeling the W3C XML Schema Patterns for Databinding Working Group as misguided seems like an overreaction. The toolkit vendors put out bad XML Schema processors and invented those leaky abstractions in the (misguided?) rush to make XML painless and web services a transparent feature for programmers with typical skillsets and approaches.

Many (including me) think that moving to XML as a primary integration mechanism for applications should be a great step forward. But many IT staff who actually have to link different apps together are complaining that their job is much harder now than it ever was. Industry consortia are having a hell of a time publishing good standardized schemas because including some seemingly innocuous XML Schema features can unexpectedly break their constituents’ implementations.

You can’t tell developers to simply avoid statically typed languages (at least not yet). You also can’t tell developers to wait a bit longer until the toolkit vendors somehow make their abstractions watertight in an interoperable way. You *can* tell developers to avoid leaky abstractions when processing XML, but you get resistance (which is regrettable). Worse, developers sometimes have to fight their toolkits even to get to the message payload.

So, I don’t know what is so wrong about the W3C trying to alleviate the situation by attempting to shine a light on issues that seriously impact users. It may perpetuate XML <-> OO binding, which many people — including me — think is a problematic strategy. But maybe the W3C can get the Infoset in more hands sooner. More people can walk before they run, if you will.

Thursday, December 08, 2005

UPA is your Friend (Repost)

The W3C XML Schema specification has a rule called unique particle attribution (UPA) that confuses, well, many. And after 4+ years of XML Schema in the wild, it still amazes me how inconsistently toolkits handle the issue. One toolkit (I think XMLSpy) at one time rebelled against UPA by enforcing the rule only if the user wished it. Just this week, I found a discrepancy in handling UPA between SQL Server 2005 and .NET Framework 2.0 (more on that later).


One problem in trying to comply with the UPA rule is that it is hard to describe in words. Here are a couple of examples to get the point across. This schema type is perfectly fine:



<xs:element name="MyType">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="Foo" />
      <xs:element name="Foo" />
    </xs:sequence>
  </xs:complexType>
</xs:element>




But this type violates UPA:


<xs:element name="MyType">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="Foo" minOccurs="0"/>
      <xs:element name="Foo" />
    </xs:sequence>
  </xs:complexType>
</xs:element>


Why? Suppose a schema processor is checking a document that looks like this:


<MyType>
  <Foo />
</MyType>


The processor can’t figure out which element declaration in the schema the element “Foo” should match: it could be the optional first declaration or the required second one, and the processor has to decide without looking ahead. It must be able to find the match unambiguously. You might think that it just doesn’t matter: the document still fits the description. But it’s critical for a schema processor, because it must be able to align the current XML document node with exactly one schema declaration. This disconnect between how people feel they should be able to describe a document and how a schema processor really works is what’s really behind the criticism of UPA. But most documents and schemas are WAY more complicated than these examples, and the UPA rule keeps the validation mechanisms manageable. By the way, this change to the schema fixes the UPA problem. Can you see why?



<xs:element name="MyType">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="Foo" />
      <xs:element name="Foo" minOccurs="0"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>




When the schema validator hits the first “Foo” element, it knows that “Foo” must have been declared in the first element declaration within “MyType”. A second “Foo”, if there is one, can only match the optional declaration.
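
If the intent is simply “one or two Foo elements”, a single particle with an occurrence range is another way to say it. This is only a sketch of an alternative, not something taken from the schemas discussed here; it should accept the same documents while leaving just one declaration for any “Foo” to match:

```xml
<xs:element name="MyType">
  <xs:complexType>
    <xs:sequence>
      <!-- One declaration covers both occurrences, so attribution is never in doubt -->
      <xs:element name="Foo" minOccurs="1" maxOccurs="2"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
```

The two-declaration form is still useful when the first and second “Foo” need different types or annotations.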


The UPA problem can come up just as a result of schema factoring. I came across this situation when looking at a schema for an XML document containing a SQL expression tree:


<xs:element name="Subquery">
  <xs:complexType>
    <xs:choice>
      <xs:sequence>
        <xs:group ref="tns:grpColumn" minOccurs="1" maxOccurs="1" />
        <xs:group ref="tns:grpQuery"/>
      </xs:sequence>
      <xs:sequence>
        <xs:group ref="tns:grpColumn" minOccurs="1" maxOccurs="1" />
        <xs:element name="ListItem" minOccurs="1" maxOccurs="unbounded">
          <xs:complexType>
            <xs:attributeGroup ref="attGrpScalarValue" />
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:choice>
    <xs:attributeGroup ref="tns:attGrpSubqueryItems" />
  </xs:complexType>
</xs:element>


This schema type gives a choice of two sequences, both starting with an element group called “grpColumn”. So, when the schema validation processor encounters the group’s elements in an XML document, it can’t tell which branch of the choice is in play. Also, the schema spec designers did not want schema processors to look “past” the current node to resolve which schema node matches a document node. By the way, this is the schema construct that .NET Framework 2.0 compiles with no errors, but that SQL Server 2005 (via MSXML 6.0) rejects with a UPA violation error.


I refactored the schema a little to eliminate the UPA violation:


<xs:element name="Subquery">
  <xs:complexType>
    <xs:sequence>
      <xs:group ref="tns:grpColumn" minOccurs="1" maxOccurs="1" />
      <xs:choice>
        <xs:group ref="tns:grpQuery"/>
        <xs:element name="ListItem" minOccurs="1" maxOccurs="unbounded">
          <xs:complexType>
            <xs:attributeGroup ref="attGrpScalarValue" />
          </xs:complexType>
        </xs:element>
      </xs:choice>
    </xs:sequence>
    <xs:attributeGroup ref="tns:attGrpSubqueryItems" />
  </xs:complexType>
</xs:element>


So again, you could argue that both schema examples are equivalent when it comes to describing an XML document. Why should UPA make me use a specific methodology? It’s because the people who created the XML Schema specification were also envisioning how schema validation tools would be created.



In the case above, UPA forced me – IMO – to create a better schema. So aside from just general unfamiliarity with UPA and inconsistent toolkit support, does UPA really do any harm? Schema authors sometimes want to allow users to add extra elements to their XML documents. Since you don’t know what the XML will look like ahead of time, it’s nice to use a wildcard like xs:any in the schema. The problem is that UPA restricts the occurrence of a wildcard element depending on the occurrence of the element previously declared.



For example, this is a legal sequence:



<xs:sequence>
  <xs:element name="Foo"/>
  <xs:any />
</xs:sequence>


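However, you can’t make “Foo” optional or repeatable (that is, give it a minOccurs lower than its maxOccurs) or you violate UPA, because the next element could then be attributed to either “Foo” or the wildcard. UPA does not allow a wildcard to be next to an optional element that the wildcard could also match. Likewise, you can’t put an optional or repeating wildcard before an element in a namespace the wildcard covers. Some feel this constrains extensibility in some cases.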
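
As a rough illustration (the type names, the target namespace and the occurrence values are made up for this example), the first type below trips UPA because a “Foo” in the document could match either the optional declaration or the wildcard. The second restricts the wildcard to other namespaces with namespace="##other", a common workaround, so the two particles no longer compete:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="urn:example:upa"
           elementFormDefault="qualified">

  <!-- Violates UPA: an incoming Foo could be the optional Foo or the wildcard -->
  <xs:complexType name="Ambiguous">
    <xs:sequence>
      <xs:element name="Foo" minOccurs="0"/>
      <xs:any/>
    </xs:sequence>
  </xs:complexType>

  <!-- OK: ##other excludes the target namespace, so Foo can only match Foo -->
  <xs:complexType name="Extensible">
    <xs:sequence>
      <xs:element name="Foo" minOccurs="0"/>
      <xs:any namespace="##other" processContents="lax"
              minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
```

The trade-off is that extension elements have to live in a namespace other than the schema’s own.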

But honestly, I haven’t found a situation where UPA caused pain that a workaround doesn’t fix. I also think you can extend schemas better by wrapping them rather than extending them. In other words, create a schema that imports the schema you want to extend, add some new elements and tie them to the original schema through key/keyref declarations. It’s really the only way for schema validation to work with content unknown at design time.
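
Here is a minimal sketch of what such a wrapper might look like. Everything in it is hypothetical: the namespaces, the file name base.xsd, and the assumption that the original schema declares a global “Order” element with an “id” attribute.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:base="urn:example:base"
           xmlns:ext="urn:example:ext"
           targetNamespace="urn:example:ext"
           elementFormDefault="qualified">

  <!-- Pull in the schema being extended (hypothetical namespace and location) -->
  <xs:import namespace="urn:example:base" schemaLocation="base.xsd"/>

  <xs:element name="Envelope">
    <xs:complexType>
      <xs:sequence>
        <!-- The original content, used as-is; assumes base.xsd declares Order -->
        <xs:element ref="base:Order" maxOccurs="unbounded"/>
        <!-- The new content sits beside it instead of inside it -->
        <xs:element name="OrderNote" minOccurs="0" maxOccurs="unbounded">
          <xs:complexType>
            <xs:simpleContent>
              <xs:extension base="xs:string">
                <xs:attribute name="orderId" type="xs:string" use="required"/>
              </xs:extension>
            </xs:simpleContent>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>

    <!-- Each Order is identified by its (assumed) id attribute -->
    <xs:key name="orderKey">
      <xs:selector xpath="base:Order"/>
      <xs:field xpath="@id"/>
    </xs:key>

    <!-- Each OrderNote must point at an existing Order -->
    <xs:keyref name="noteToOrder" refer="ext:orderKey">
      <xs:selector xpath="ext:OrderNote"/>
      <xs:field xpath="@orderId"/>
    </xs:keyref>
  </xs:element>
</xs:schema>
```

Because the new elements have their own names and their own place in the sequence, no wildcard is needed and UPA never comes into play.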
