LoadXML
LoadXML operation loads an arbitrary XML document into a set of tables. You specify rules to map the XML document into a set of tables using XPath expressions.
The implementation of the LoadXML operation is based on streaming (ie StAX) mechanism. It allows loading data of any size in a fast way with low memory consumption.
LoadXML parameters overview:
Source,Space, andTablespecify location of the XML document to be loaded.Target SourceandTarget Spacespecify location of tables to be created.Mapping rulesspecify how to make XML onto tables.
Mapping rules uses path expressions to address XML elements in the
document. Path expression is a subset of the XPath language: a sequence
of XML element names separated by /. To address an attribute use @ before the attribute name (for example, ./@id). No other features such as filters
(denoted in []), decendent axis (denoted //), etc are supported.
There are several types of path expressions:
-
Absolute path expression starts with
/, for example,/classes/class. Absolute path specifies a path from the root of the document. In the previous example it means thatclasseselement must be the top element of the document. -
Relative path expression starts with them name of the element. For example,
classes/classmatches anyclasselement that is nested inclassesand it does not matter where these element (classes,class) located in the document. They can even be arbitrary nested in each other - recursive structure. -
Immediate relative path expression are only used to specify a path to nested tables (rules that have
Parent Table) and columns. Immediate relative path starts with.and means an absolute path from a parent/record element as its root element. For example,./Properties/Propertycan be used to addressPropertyelement nested inPropertieselement that is an immediate child of the parent/record element. Note, that immediate relative path can be but not required to be used to address nested tables and/or columns - you can use relative path for that purpose as well.
Mapping rules consist of Table name, Parent Table name, and Path.
XML elements addressed by the path are parsed into records and loaded
into the Table.
The record parsed from the address element includes il_id column that
is usually a sequential number of the element at the path and is used as
a key to refer to the record from nested tables. The record also
includes all attributes of the addressed XML element as columns. Nested
elements of the address XML element are added as columns to the record according to the column mapping. To define the column mapping expand the row of the mapping rule by clicking on the triangle
at the left end of the row. You define the name of the column and
relative or immediate relative path expression to an XML element
representing the column. Name of the column is optional: the name of
the element will be used as column name when the name of the column
is blank. You must specify at least one column in the column mapping.
You can map nested XML data using Parent Table column of the mapping
rule. Specifying Parent Table means:
- Path must be relative or immediate relative from the elements that represents records of the parent table.
- Records of the table that has Parent Table will include il_parent_id column that refers to il_id column of the parent table.
- It is an error to specify Parent Table that is not defined in some other rule as Table .
Example 1: Nested elements
Below is an example of XML document where multiple Feature elements
are nested in each element class. Also notice that the structure of
Feature elements is simple (ie all feature’s properties are
immediately nested elements and all possible properties appear in the
first Feature element) so that we can use automatic columns creation
(ie no explicit rules specified). The structure of class elements is
more complicated: some properties of a class (eg name) are direct
descendants but some (eg price) are nested in otherFields subelement
so that we have to use explicit column mapping for class elements.
<?xml version="1.0" encoding="utf-8"?>
<root>
<class id="c1">
<name>Car</name>
<otherFields>
<price>3</price>
</otherFields>
<Features>
<Feature>
<name>color</name>
<type>string</type>
</Feature>
<Feature>
<name>weight</name>
<type>decimal</type>
</Feature>
</Features>
</class>
<class id="c2">
<name>Book</name>
<otherFields>
<price>5</price>
</otherFields>
<Features>
<Feature>
<name>paper</name>
<type>string</type>
</Feature>
<Feature>
<name>numOfPages</name>
<type>int</type>
</Feature>
</Features>
</class>
</root>
Below are mapping rules we use to load the XML document.
<table>
<thead>
<tr>
<th>Table</th>
<th>Parent Table</th>
<th>Path</th>
</tr>
</thead>
<tbody>
<tr>
<td>class_table</td>
<td></td>
<td>/root/class</td>
</tr>
<tr>
<td><table>
<caption>Columns mapping rules</caption>
<thead>
<tr class="header">
<th>Column</th>
<th>Path</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td></td>
<td>./name</td>
</tr>
<tr class="even">
<td>class_price</td>
<td>price</td>
</tr>
</tbody>
</table></td>
<td></td>
<td></td>
</tr>
<tr>
<td>feature_table</td>
<td>class_table</td>
<td>./Features/Feature</td>
</tr>
<tr class="even">
<td><table>
<caption>Columns mapping rules</caption>
<thead>
<tr class="header">
<th>Column</th>
<th>Path</th>
</tr>
</thead>
<tbody>
</tbody>
</table></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>The above rules will create two tables class_table and feature_table
that are shown below.
| il_id | id | name | class_price |
|---|---|---|---|
| 0 | c1 | Car | 3 |
| 1 | c2 | Book | 5 |
Result table: class_table
| il_id | il_parent_id | name | type |
|---|---|---|---|
| 0 | 0 | color | string |
| 1 | 0 | weight | string |
| 2 | 1 | paper | string |
| 3 | 1 | numOfPages | int |
Result table: feature_table
Example 2: Recursively nested elements
Below is an example of XML document with class nested in each other
with unknown depth.
<?xml version="1.0" encoding="utf-8"?>
<root>
<class>
<name>Car</name>
<subclasses>
<class>
<name>Sedan</name>
<subclasses>
<class>
<name>Mazda</name>
</class>
<class>
<name>BMW</name>
</class>
</subclasses>
</class>
<class>
<name>Hatchback</name>
</class>
</subclasses>
</class>
</root>To load arbitrary nested elements use a relative path express, which is
just the name of the element class in our example, and set the Parent Table to the Table as shown below.
| Table | Parent Table | Path |
|---|---|---|
| recursive_class_table | recursive_class_table | class |
The above rule will create table recursive_class_table where each
record represents a class element. il_parent_id stores the nesting
structure.
| il_id | il_parent_id | name |
|---|---|---|
| 2 | 1 | Mazda |
| 3 | 1 | BMW |
| 1 | 0 | Sedan |
| 4 | 0 | Hatchback |
| 0 | null | Car |