QTAssistant provides an easy to use feature to derive and visualize an ADO.NET DataSet, from an XML or an XML Schema (XSD) file. The core functionality comes from the .NET Framework and is documented on MSDN. This tool is available as an ADO.NET DataSet tab page in XML and XSD editors.
Using this method to understand an XSD, or to facilitate writing SQL queries for analysis of XML files, sometimes yields unexpected results; the most common issue is missing relations between tables.
This topic demonstrates how two different XSD authoring styles, describing the same XML structure, affect the inference of a DataSet. This should help users understand the options they have to overcome what may be seen as a limitation in the .NET framework.
The XML Document
For illustration we are using a simplified document, derived from the W3C Purchase Order (PO) example XML Schema.
<?xmlversion="1.0"?>
<purchaseOrder xmlns="http://tempuri.org/po.xsd">
<shipTo country="US">
<name>Alice Smith</name>
<street>123 Maple Street</street>
<city>Mill Valley</city>
<state>CA</state>
<zip>90952</zip>
</shipTo>
<billTo country="US">
<name>Robert Smith</name>
<street>8 Oak Avenue</street>
<city>Old Town</city>
<state>PA</state>
<zip>95819</zip>
</billTo>
<items>
<item partNum="872-AA">
<productName>Lawnmower</productName>
<quantity>1</quantity>
<USPrice>148.95</USPrice>
<comment>Confirm this is electric</comment>
</item>
<item partNum="926-AA">
<productName>Baby Monitor</productName>
<quantity>1</quantity>
<USPrice>39.98</USPrice>
<shipDate>1999-05-21</shipDate>
</item>
</items>
</purchaseOrder>
The expectation of most users is to see a relational structure having five interconnected tables. However, the ADO.NET DataSet view in the QTAssistant XML Editor shows a different layout. A major drawback, often pointed out, is that such a relational structure would not be able to store the content of more than one XML document. For instance, none of the top level entities have an unique identifier to associate an entry with an XML document.
Figure 1.
XML Schema - Version 1
The inferred XML Schema for the above XML document is shown below.
<?xml version="1.0" encoding="utf-8"?>
<xs:schema attributeFormDefault="unqualified" elementFormDefault="qualified" targetNamespace="http://tempuri.org/po.xsd" xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="purchaseOrder">
<xs:complexType>
<xs:sequence>
<xs:element name="shipTo">
<xs:complexType>
<xs:sequence>
<xs:element name="name" type="xs:string" />
<xs:element name="street" type="xs:string" />
<xs:element name="city" type="xs:string" />
<xs:element name="state" type="xs:string" />
<xs:element name="zip" type="xs:unsignedInt" />
</xs:sequence>
<xs:attribute name="country" type="xs:string" use="required" />
</xs:complexType>
</xs:element>
<xs:element name="billTo">
<xs:complexType>
<xs:sequence>
<xs:element name="name" type="xs:string" />
<xs:element name="street" type="xs:string" />
<xs:element name="city" type="xs:string" />
<xs:element name="state" type="xs:string" />
<xs:element name="zip" type="xs:unsignedInt" />
</xs:sequence>
<xs:attribute name="country" type="xs:string" use="required" />
</xs:complexType>
</xs:element>
<xs:element name="items">
<xs:complexType>
<xs:sequence>
<xs:element maxOccurs="unbounded" name="item">
<xs:complexType>
<xs:sequence>
<xs:element name="productName" type="xs:string" />
<xs:element name="quantity" type="xs:unsignedByte" />
<xs:element name="USPrice" type="xs:decimal" />
<xs:element minOccurs="0" name="shipDate" type="xs:date" />
<xs:element minOccurs="0" name="comment" type="xs:string" />
</xs:sequence>
<xs:attribute name="partNum" type="xs:string" use="required" />
</xs:complexType>
</xs:element>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
The derived relational structure is the same as Figure 1.
XML Schema - Version 2
One way to force the creation of a table for the document element (here purchaseOrder) is to make its type global.
<?xml version="1.0" encoding="utf-8"?>
<xs:schema attributeFormDefault="unqualified" elementFormDefault="qualified" xmlns="http://tempuri.org/po.xsd" targetNamespace="http://tempuri.org/po.xsd" xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="purchaseOrder" type="purchaseOrder"/>
<xs:complexType name="purchaseOrder">
<xs:sequence>
<xs:element name="shipTo">
<xs:complexType>
<xs:sequence>
<xs:element name="name" type="xs:string" />
<xs:element name="street" type="xs:string" />
<xs:element name="city" type="xs:string" />
<xs:element name="state" type="xs:string" />
<xs:element name="zip" type="xs:unsignedInt" />
</xs:sequence>
<xs:attribute name="country" type="xs:string" use="required" />
</xs:complexType>
</xs:element>
<xs:element name="billTo">
<xs:complexType>
<xs:sequence>
<xs:element name="name" type="xs:string" />
<xs:element name="street" type="xs:string" />
<xs:element name="city" type="xs:string" />
<xs:element name="state" type="xs:string" />
<xs:element name="zip" type="xs:unsignedInt" />
</xs:sequence>
<xs:attribute name="country" type="xs:string" use="required" />
</xs:complexType>
</xs:element>
<xs:element name="items">
<xs:complexType>
<xs:sequence>
<xs:element maxOccurs="unbounded" name="item">
<xs:complexType>
<xs:sequence>
<xs:element name="productName" type="xs:string" />
<xs:element name="quantity" type="xs:unsignedByte" />
<xs:element name="USPrice" type="xs:decimal" />
<xs:element minOccurs="0" name="shipDate" type="xs:date" />
<xs:element minOccurs="0" name="comment" type="xs:string" />
</xs:sequence>
<xs:attribute name="partNum" type="xs:string" use="required" />
</xs:complexType>
</xs:element>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:sequence>
</xs:complexType>
</xs:schema>
For the inferred DataSet, the relations are now displayed as expected.
Conclusion
This paper demonstrated the impact the XML Schema authoring styles have on deriving a ADO.NET DataSet by showing how changing the type of the root element from anonymous to global creates new tables and relationships in the DataSet.
References
Inferring Relationships (ADO.NET)
Summary of the DataSet Schema Inference Process