Overview
Version
FineDataLink Version | Functional Change |
---|---|
4.0.9 | Added the XML Parsing operator, which can be used to parse input XML data into row-and-column format. |
Application Scenario
You want to parse XML data returned by APIs, WebServices, or OData-based APIs, or XML data read from XML files, into row-and-column format for subsequent processing and storage.
Function Introduction
You can use the XML Parsing operator in the Data Transformation node in FineDataLink to parse XML data into row-and-column format for subsequent processing and storage.
Function Description
The Parsing Configuration page of the XML Parsing operator is shown in the following figure.
Selecting the Source Field
The drop-down list of Select Source Field contains all field names in preceding nodes.
If the upstream node is API Input and the data is not expanded into a two-dimensional table, the source field defaults to Default.
If you tick Keep All Upstream Output Fields After Parsing, all fields output by the upstream node will be merged with the new fields generated after XML parsing for output.
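The effect of Keep All Upstream Output Fields After Parsing can be pictured with a minimal Python sketch. It is not FineDataLink's implementation: the parse_row function, the id field, and the one-row-in, one-row-out simplification are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

def parse_row(row, source_field, xpaths, keep_upstream=True):
    """Parse the XML held in row[source_field] and return one output row.

    Hypothetical helper: it only illustrates how upstream fields and
    parsed fields could be merged into a single output row.
    """
    root = ET.fromstring(row[source_field])
    parsed = {}
    for name, path in xpaths.items():
        node = root.find(path)
        # Parsed fields are strings; a path without a match gives "".
        parsed[name] = (node.text or "") if node is not None else ""
    # Upstream fields keep their original types when they are kept.
    return {**row, **parsed} if keep_upstream else parsed

upstream_row = {
    "id": 7,
    "Default": "<book><title>Learning XML</title><price>39.95</price></book>",
}
xpaths = {"title": "title", "price": "price"}

merged = parse_row(upstream_row, "Default", xpaths)
only_parsed = parse_row(upstream_row, "Default", xpaths, keep_upstream=False)
```

In this sketch, merged contains id, Default, title, and price, while only_parsed contains only title and price.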
Namespace
If the XML file uses a namespace, specify it so that the nodes can be read correctly.
The namespace list is displayed after you tick Specify Namespace, where you can add and delete namespaces.
Field | Description |
---|---|
Namespace Prefix | It is editable. Duplicate prefixes are not allowed. If the XML file contains identical namespace prefixes, give the prefixes different names here and fill in the corresponding URI for each so that the data can be parsed normally. |
Namespace URI | It is editable. Duplicate URIs are allowed. |
If the XML file declares a default namespace (a namespace without a prefix), define a custom namespace prefix and fill in the URI of the default namespace so that the data can be parsed normally.
For example, if the default namespace http://111111 has no prefix, define a custom prefix such as xlms and fill in http://111111 as the namespace URI.
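The reason a custom prefix is needed for a default namespace can be seen with a general-purpose XML library. The following sketch uses Python's standard xml.etree.ElementTree rather than FineDataLink itself; the sample XML is made up, and only the xlms prefix and the http://111111 URI come from the example above.

```python
import xml.etree.ElementTree as ET

# Sample document (an assumption for illustration) whose elements all
# belong to the default namespace http://111111.
xml_text = """<bookstore xmlns="http://111111">
  <book><title>Harry Potter</title></book>
</bookstore>"""

root = ET.fromstring(xml_text)

# Without a prefix, the path looks for elements in no namespace,
# so nothing is found.
without_prefix = root.findall("book/title")

# Map a custom prefix to the default namespace URI and use it in the path.
namespaces = {"xlms": "http://111111"}
with_prefix = root.findall("xlms:book/xlms:title", namespaces)
titles = [node.text for node in with_prefix]
```

Here without_prefix is empty, and titles holds the text of the single title element once the xlms prefix is mapped to the URI.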
Parsing XML Data
Selecting the XML Node
Click the Select XML Node button and select the XML node in the pop-up node selection box.
Example | Multiple Selection Tree Content |
---|---|
![]() | Leaf node: a node that has no child nodes. The fields in yellow are leaf nodes. Others are non-leaf nodes. Non-leaf nodes cannot be selected. When two nodes with the same name but different parent nodes are selected, the name of one output field is suffixed with 1. For example, if you select the title node in both the /bookstore/store path and the /bookstore/book path in the figure, the output fields after parsing are named title and title1, and the XPath of each field is the valid path of the corresponding node. |
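The leaf-node rule and the title/title1 naming can be sketched in Python as follows. The sample XML and the leaf_paths and field_names helpers are hypothetical. Which of the two fields receives the suffix, and how a third duplicate would be named, are assumptions rather than documented behavior.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample with a title node under both store and book.
xml_text = """<bookstore>
  <store><title>Main Street Books</title></store>
  <book><title>Harry Potter</title><price>29.99</price></book>
</bookstore>"""

def leaf_paths(element, prefix=""):
    """Yield the absolute path of every leaf node (no child nodes)."""
    path = f"{prefix}/{element.tag}"
    children = list(element)
    if not children:
        yield path
    for child in children:
        yield from leaf_paths(child, path)

def field_names(paths):
    """Name each field after its node; later duplicates get 1, 2, ...

    Assumption: the later-selected node receives the suffix.
    """
    seen, names = {}, {}
    for path in paths:
        base = path.rsplit("/", 1)[-1]
        count = seen.get(base, 0)
        names[base if count == 0 else f"{base}{count}"] = path
        seen[base] = count + 1
    return names

paths = list(leaf_paths(ET.fromstring(xml_text)))
fields = field_names(paths)
```

With this sample, fields maps title to /bookstore/store/title, title1 to /bookstore/book/title, and price to /bookstore/book/price.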
Outputting the Field
You can add and delete output fields.
All fields generated after XML parsing are of the string type. (The type of fields passed from the upstream node remains unchanged.)
Field | Description |
---|---|
Field Name After Parsing | It is editable. You can configure the name of fields generated after XML parsing. Note: 1. Duplicate field names are not allowed. 2. Referencing parameters is not allowed. |
XPath | It is editable. It is the XPath expression of the output field. Referencing parameters is not allowed. Setting XPath manually is allowed. |
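As a rough picture of how field names and XPath expressions turn XML into rows and columns, consider the following Python sketch. It relies on the standard xml.etree.ElementTree module, not on FineDataLink. The output_fields mapping and the position-based alignment of values into rows are assumptions. Because ElementTree evaluates paths relative to the root element, /bookstore/book/title is written here as ./book/title.

```python
import xml.etree.ElementTree as ET

xml_text = """<bookstore>
  <book><title lang="eng">Harry Potter</title><price>29.99</price></book>
  <book><title lang="eng">Learning XML</title><price>39.95</price></book>
</bookstore>"""

# Field Name After Parsing -> XPath (relative to the bookstore root).
output_fields = {"title": "./book/title", "price": "./book/price"}

root = ET.fromstring(xml_text)

# One column of string values per output field.
columns = {
    name: [node.text or "" for node in root.findall(path)]
    for name, path in output_fields.items()
}

# Assumption: values are aligned into rows by their position.
rows = [dict(zip(columns, values)) for values in zip(*columns.values())]
```

Each entry of rows is one record with title and price, and every value stays a string, for example "29.99" rather than a number.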
You can enter two kinds of XPath expressions: node set and predicate.
The following is an example of an XML file:
<?xml version="1.0" encoding="ISO-8859-1"?>
<bookstore>
  <book>
    <title lang="eng">Harry Potter</title>
    <price>29.99</price>
  </book>
  <book>
    <title lang="eng">Learning XML</title>
    <price>39.95</price>
  </book>
</bookstore>
Node Set
Nodes in the XML file are selected based on path expressions.
Some path expressions and results are shown in the following table.
Path Expression | Result |
---|---|
bookstore | All elements named bookstore that are children of the current node are selected. |
/bookstore | The root element bookstore is selected. Note: A path beginning with a forward slash (/) always represents an absolute path to an element. |
bookstore/book | All book elements under the bookstore element are selected. |
//book | All book elements are selected, regardless of their locations in the file. |
bookstore//book | All book elements under the bookstore element are selected, regardless of their locations. |
//@lang | All attributes named lang are selected. |
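Some of the path expressions in the table can be tried out with Python's standard xml.etree.ElementTree module, as in the sketch below. ElementTree supports only a subset of XPath and evaluates paths relative to an element, so the expressions are adapted: the parsed root stands for /bookstore, //book becomes .//book, and //@lang is emulated with a loop because attribute selection is not supported. This illustrates XPath semantics only and does not show how FineDataLink evaluates expressions.

```python
import xml.etree.ElementTree as ET

xml_text = """<bookstore>
  <book><title lang="eng">Harry Potter</title><price>29.99</price></book>
  <book><title lang="eng">Learning XML</title><price>39.95</price></book>
</bookstore>"""

# The parsed root is the bookstore element, i.e. /bookstore.
root = ET.fromstring(xml_text)

# bookstore/book: book elements that are children of bookstore.
child_books = root.findall("book")

# //book: book elements anywhere below the root.
all_books = root.findall(".//book")

# //@lang: ElementTree cannot select attributes directly, so collect
# the lang attribute of every element that has one.
lang_values = [el.get("lang") for el in root.iter() if "lang" in el.attrib]
```

For the sample file, child_books and all_books both contain the two book elements, and lang_values holds one eng entry per title.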
Predicate
Predicates are used to look for a specific node or a node that contains a specified value.
Predicates are enclosed in square brackets ([ ]).
Some path expressions with predicates and results are shown in the following table.
Path Expression | Result |
---|---|
/bookstore/book[1] | The first book element under the bookstore element is selected. |
/bookstore/book[last()] | The last book element under the bookstore element is selected. |
/bookstore/book[last()-1] | The penultimate book element under the bookstore element is selected. |
/bookstore/book[position()<3] | The first two book elements under the bookstore element are selected. |
//title[@lang] | All title elements with the lang attribute are selected. |
//title[@lang='eng'] | All title elements that have a lang attribute with a value of eng are selected. |
/bookstore/book[price>35.00] | All book elements under the bookstore element whose price element has a value greater than 35.00 are selected. |
/bookstore/book[price>35.00]/title | The title elements of all book elements under the bookstore element whose price element has a value greater than 35.00 are selected. |
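The predicates in the table can be explored with the following Python sketch, again using the standard xml.etree.ElementTree module as a stand-in. ElementTree supports positional and attribute predicates but not position()<3 or numeric comparisons such as price>35.00, so those two cases are emulated in plain Python. The variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

xml_text = """<bookstore>
  <book><title lang="eng">Harry Potter</title><price>29.99</price></book>
  <book><title lang="eng">Learning XML</title><price>39.95</price></book>
</bookstore>"""

root = ET.fromstring(xml_text)  # the bookstore element

first = root.find("book[1]")               # /bookstore/book[1]
last = root.find("book[last()]")           # /bookstore/book[last()]
penultimate = root.find("book[last()-1]")  # /bookstore/book[last()-1]

with_lang = root.findall(".//title[@lang]")       # //title[@lang]
eng_titles = root.findall(".//title[@lang='eng']")  # //title[@lang='eng']

# Emulation of /bookstore/book[position()<3]: the first two books.
first_two = root.findall("book")[:2]

# Emulation of /bookstore/book[price>35.00] and .../title.
expensive = [
    book for book in root.findall("book")
    if float(book.findtext("price")) > 35.00
]
expensive_titles = [book.findtext("title") for book in expensive]
```

Because the sample file has only two book elements, book[1] and book[last()-1] select the same element, and expensive_titles contains only Learning XML.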
Special Scenario Handling Strategy
Scenario | Result or Handling Strategy |
---|---|
The source XML data contains multiple root elements, as shown in the following figure. | When you click Select XML Node, or when you preview or run the task with a manually set XPath, an error message appears: XML data root node is missing. |
The configured XPath is incorrect. | The field content after parsing is empty. |
The configured XPath is invalid or the namespace prefix contains characters other than English letters. | A parsing exception occurs. |
The namespace prefixes are repeated. | Namespace prefixes cannot be repeated. In this example, rename one of the s prefixes and fill in the corresponding URIs. The data is parsed normally after you select nodes from the node tree. If you fill in the paths manually, use the new namespace prefix in the paths. |
The source XML data is incomplete, as shown in the following figure. | When you click Select XML Node, or when you preview or run the task with a manually set XPath, an error message appears: XML data format is incomplete. |
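Three of these scenarios can be reproduced with a general-purpose parser, as in the Python sketch below. The try_parse helper and the sample strings are hypothetical, and the error text produced by Python's xml.etree.ElementTree differs from the FineDataLink messages quoted in the table.

```python
import xml.etree.ElementTree as ET

def try_parse(xml_text, path):
    """Return ("error", message) for malformed XML, else ("ok", values)."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        # Raised for multiple root elements and for incomplete documents.
        return ("error", str(exc))
    # A well-formed document with a non-matching path gives no values.
    return ("ok", [node.text or "" for node in root.findall(path)])

# Two root elements in one document.
multiple_roots = try_parse("<a>1</a><b>2</b>", "a")

# Closing tags for book and bookstore are missing.
incomplete = try_parse(
    "<bookstore><book><title>Harry Potter</title>", "book/title"
)

# Valid XML, but the path points to a node that does not exist.
wrong_path = try_parse(
    "<bookstore><book><title>Harry Potter</title></book></bookstore>",
    "book/author",
)
```

The first two calls return an error tuple, while the last one succeeds with an empty list, which corresponds to the empty field content described for an incorrect XPath.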
Example
For details about using the XML Parsing operator, see Example of XML Parsing.