Skip to Content

LoadXML

LoadXML operation loads an arbitrary XML document into a set of tables. You specify rules to map the XML document into a set of tables using XPath expressions.

The implementation of the LoadXML operation is based on streaming (ie StAX) mechanism. It allows loading data of any size in a fast way with low memory consumption.

LoadXML parameters overview:

  • Source, Space, and Table specify location of the XML document to be loaded.
  • Target Source and Target Space specify location of tables to be created.
  • Mapping rules specify how to make XML onto tables.

Mapping rules uses path expressions to address XML elements in the document. Path expression is a subset of the XPath language: a sequence of XML element names separated by /. To address an attribute use @ before the attribute name (for example, ./@id). No other features such as filters (denoted in []), decendent axis (denoted //), etc are supported. There are several types of path expressions:

  • Absolute path expression starts with /, for example, /classes/class. Absolute path specifies a path from the root of the document. In the previous example it means that classes element must be the top element of the document.

  • Relative path expression starts with them name of the element. For example, classes/class matches any class element that is nested in classes and it does not matter where these element (classes, class) located in the document. They can even be arbitrary nested in each other - recursive structure.

  • Immediate relative path expression are only used to specify a path to nested tables (rules that have Parent Table) and columns. Immediate relative path starts with . and means an absolute path from a parent/record element as its root element. For example, ./Properties/Property can be used to address Property element nested in Properties element that is an immediate child of the parent/record element. Note, that immediate relative path can be but not required to be used to address nested tables and/or columns - you can use relative path for that purpose as well.

Mapping rules consist of Table name, Parent Table name, and Path. XML elements addressed by the path are parsed into records and loaded into the Table.

The record parsed from the address element includes il_id column that is usually a sequential number of the element at the path and is used as a key to refer to the record from nested tables. The record also includes all attributes of the addressed XML element as columns. Nested elements of the address XML element are added as columns to the record according to the column mapping. To define the column mapping expand the row of the mapping rule by clicking on the triangle at the left end of the row. You define the name of the column and relative or immediate relative path expression to an XML element representing the column. Name of the column is optional: the name of the element will be used as column name when the name of the column is blank. You must specify at least one column in the column mapping.

You can map nested XML data using Parent Table column of the mapping rule. Specifying Parent Table means:

  • Path must be relative or immediate relative from the elements that represents records of the parent table.
  • Records of the table that has Parent Table will include il_parent_id column that refers to il_id column of the parent table.
  • It is an error to specify Parent Table that is not defined in some other rule as Table .

Example 1: Nested elements

Below is an example of XML document where multiple Feature elements are nested in each element class. Also notice that the structure of Feature elements is simple (ie all feature’s properties are immediately nested elements and all possible properties appear in the first Feature element) so that we can use automatic columns creation (ie no explicit rules specified). The structure of class elements is more complicated: some properties of a class (eg name) are direct descendants but some (eg price) are nested in otherFields subelement so that we have to use explicit column mapping for class elements.

<?xml version="1.0" encoding="utf-8"?> <root> <class id="c1"> <name>Car</name> <otherFields> <price>3</price> </otherFields> <Features> <Feature> <name>color</name> <type>string</type> </Feature> <Feature> <name>weight</name> <type>decimal</type> </Feature> </Features> </class> <class id="c2"> <name>Book</name> <otherFields> <price>5</price> </otherFields> <Features> <Feature> <name>paper</name> <type>string</type> </Feature> <Feature> <name>numOfPages</name> <type>int</type> </Feature> </Features> </class> </root>

Below are mapping rules we use to load the XML document.

<table> <thead> <tr> <th>Table</th> <th>Parent Table</th> <th>Path</th> </tr> </thead> <tbody> <tr> <td>class_table</td> <td></td> <td>/root/class</td> </tr> <tr> <td><table> <caption>Columns mapping rules</caption> <thead> <tr class="header"> <th>Column</th> <th>Path</th> </tr> </thead> <tbody> <tr class="odd"> <td></td> <td>./name</td> </tr> <tr class="even"> <td>class_price</td> <td>price</td> </tr> </tbody> </table></td> <td></td> <td></td> </tr> <tr> <td>feature_table</td> <td>class_table</td> <td>./Features/Feature</td> </tr> <tr class="even"> <td><table> <caption>Columns mapping rules</caption> <thead> <tr class="header"> <th>Column</th> <th>Path</th> </tr> </thead> <tbody> </tbody> </table></td> <td></td> <td></td> </tr> </tbody> </table>

The above rules will create two tables class_table and feature_table that are shown below.

il_ididnameclass_price
0c1Car3
1c2Book5

Result table: class_table

il_idil_parent_idnametype
00colorstring
10weightstring
21paperstring
31numOfPagesint

Result table: feature_table

Example 2: Recursively nested elements

Below is an example of XML document with class nested in each other with unknown depth.

<?xml version="1.0" encoding="utf-8"?> <root> <class> <name>Car</name> <subclasses> <class> <name>Sedan</name> <subclasses> <class> <name>Mazda</name> </class> <class> <name>BMW</name> </class> </subclasses> </class> <class> <name>Hatchback</name> </class> </subclasses> </class> </root>

To load arbitrary nested elements use a relative path express, which is just the name of the element class in our example, and set the Parent Table to the Table as shown below.

TableParent TablePath
recursive_class_tablerecursive_class_tableclass

The above rule will create table recursive_class_table where each record represents a class element. il_parent_id stores the nesting structure.

il_idil_parent_idname
21Mazda
31BMW
10Sedan
40Hatchback
0nullCar
Last updated on