Text values matching
Structure matching
- Matching text equality
- Matching text with wildcard pattern
- Matching text with Regular Expression pattern
- Matching numbers
- Matching angle values
- Matching time-of-day values
Matching XML structures with regular expressions
Advanced matching
Internal design
- Matcher
- Strategy
- StrategySelector
To save space, the XML Matcher namespace declaration is omitted from all examples in this guide. In a real template it is declared on the root element:
<root xmlns:xm='http://xml.sf.net/xmlmatcher/1.0'>
...
</root>
Template:
<name>John Doe</name>
Will match the same text value after normalization:
<name>John Doe</name>
Will not match text that differs in character case:
<name>jOHN dOE</name>
or has extra spaces:
<name> John Doe
Template specifying an empty element:
<name/>
Will match the other form of an empty element:
<name></name>
An empty element will not match text that contains space characters:
<name>
or
<name> </name>
Template:
<name xm:ignorecase='true' xm:trim='true'>
Will match case-insensitively, with leading and trailing space characters removed:
<name> jOHN dOE </name>
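The effect of the two attributes can be sketched in a few lines of Python. This is only an illustration of the comparison rule described above, not the library's implementation; the function name `text_matches` and the template value "John Doe" are assumptions made for the example.

```python
def text_matches(template: str, actual: str,
                 ignorecase: bool = False, trim: bool = False) -> bool:
    """Compare two text values the way xm:ignorecase / xm:trim are described."""
    if trim:
        # Drop leading and trailing whitespace on both sides.
        template, actual = template.strip(), actual.strip()
    if ignorecase:
        # casefold() is a slightly more aggressive lower() for caseless matching.
        template, actual = template.casefold(), actual.casefold()
    return template == actual


print(text_matches("John Doe", " jOHN dOE ", ignorecase=True, trim=True))
print(text_matches("John Doe", "jOHN dOE"))
```

With both flags set the padded, mixed-case value is accepted; with the defaults the comparison stays exact.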
Template:
<street xm:wild="true">Great*St*</street>
Will match:
<street>Great Plain St</street>
Will not match:
<street>Great Ridge Ave</street>
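As a rough illustration of wildcard matching, the sketch below uses Python's `fnmatch.fnmatchcase`, where `*` stands for any run of characters. It is an assumption that the guide's wildcard syntax behaves the same way; `fnmatchcase` also gives special meaning to `?` and `[...]`, which this guide does not mention.

```python
from fnmatch import fnmatchcase


def wildcard_matches(pattern: str, actual: str) -> bool:
    """Case-sensitive shell-style wildcard match of the whole value."""
    return fnmatchcase(actual, pattern)


print(wildcard_matches("Great*St*", "Great Plain St"))
print(wildcard_matches("Great*St*", "Great Ridge Ave"))
```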
Template:
<street xm:regex-text="true">.*30.*</street>
Will match:
<street>Route 30</street>
Will not match:
<street>31 Washington St</street>
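The same pair of values can be checked with an ordinary regular expression engine. The sketch assumes that the pattern must match the entire text value (hence `re.fullmatch`); the guide does not state whether partial matches are accepted.

```python
import re


def regex_text_matches(pattern: str, actual: str) -> bool:
    """True when the pattern matches the whole text value."""
    return re.fullmatch(pattern, actual) is not None


print(regex_text_matches(".*30.*", "Route 30"))
print(regex_text_matches(".*30.*", "31 Washington St"))
```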
Template:
<x xm:tolerance='0.01'>-72.98</x>
Will match:
<x>-72.9873170132</x>
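A minimal sketch of the numeric comparison, assuming the tolerance is an inclusive bound on the absolute difference between the template value and the actual value (the guide does not say whether the bound is inclusive). The function name is hypothetical.

```python
def number_matches(template: str, actual: str, tolerance: str) -> bool:
    """Compare two numeric text values within an absolute tolerance."""
    return abs(float(actual) - float(template)) <= float(tolerance)


# |-72.9873170132 - (-72.98)| is about 0.0073, which is within 0.01.
print(number_matches("-72.98", "-72.9873170132", "0.01"))
```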
The template below defines that the turn angle should be equal to 10 plus or minus 15, with a period of 360:
<turnAngle xm:tolerance='15.0' xm:period='360'>10</turnAngle>
The following actual element will match, since 359 is equivalent to -1 modulo 360, which lies within -5...+25:
<turnAngle>359</turnAngle>
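The periodic comparison can be sketched as a circular distance check. This is an illustration under the assumption that the tolerance is inclusive and applies to the shortest distance around the period; the function name is hypothetical.

```python
def angle_matches(template: float, actual: float,
                  tolerance: float, period: float = 360.0) -> bool:
    """Compare two periodic values using the shortest distance around the circle."""
    diff = (actual - template) % period          # always in [0, period)
    return min(diff, period - diff) <= tolerance


# 359 is 11 away from 10 when measured around a 360 circle.
print(angle_matches(10, 359, 15))
print(angle_matches(10, 30, 15))
```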
Please note that this strategy uses a custom thread-safe parser which does have I18N support.
Time-of-day string format: H[H][:M[M][:S[S]]] [am|pm|AM|PM]
For example:
5 is the same as 05:00:00
0:5 is the same as 00:05:00
5 pm is the same as 17:00:00
Template:
<start xm:time-tolerance='0:01'>10:00 am</start>
Will match:
<start>10:00:56</start>
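The sketch below parses the H[H][:M[M][:S[S]]] [am|pm|AM|PM] format into seconds since midnight and then applies the tolerance. It is not the library's parser. It assumes that the xm:time-tolerance value is written in the same format (so '0:01' means one minute), that the bound is inclusive, and that no wrap-around at midnight is needed; all function names are hypothetical.

```python
import re

_TIME = re.compile(
    r"^\s*(\d{1,2})(?::(\d{1,2})(?::(\d{1,2}))?)?\s*(am|pm|AM|PM)?\s*$"
)


def parse_time_of_day(text: str) -> int:
    """Return seconds since midnight for a time-of-day string."""
    m = _TIME.match(text)
    if m is None:
        raise ValueError(f"not a time of day: {text!r}")
    hours = int(m.group(1))
    minutes = int(m.group(2) or 0)
    seconds = int(m.group(3) or 0)
    suffix = (m.group(4) or "").lower()
    if suffix:
        if not 1 <= hours <= 12:
            raise ValueError(f"hour out of range for 12-hour clock: {text!r}")
        # 12 am is midnight, 12 pm is noon.
        hours = hours % 12 + (12 if suffix == "pm" else 0)
    if hours > 23 or minutes > 59 or seconds > 59:
        raise ValueError(f"field out of range: {text!r}")
    return hours * 3600 + minutes * 60 + seconds


def time_matches(template: str, actual: str, tolerance: str) -> bool:
    """Compare two times of day within a tolerance given in the same format."""
    delta = abs(parse_time_of_day(actual) - parse_time_of_day(template))
    return delta <= parse_time_of_day(tolerance)


print(parse_time_of_day("5 pm") == parse_time_of_day("17:00:00"))
print(time_matches("10:00 am", "10:00:56", "0:01"))
```

In the second call the two values are 56 seconds apart, which is inside the one-minute tolerance.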
In the simple case, a template element matches an instance element when both have the same XML tag name and a matching sequence of child elements.
Template:
<street>16 Tech Circle</street>
Will match the instance:
<street>16 Tech Circle</street>
Will not match an element with a different tag name (note that tag names are case-sensitive):
<Street>16 Tech Circle</Street>
Will not match an instance containing a different text value:
<street>120 Oak St</street>
Will not match an instance without a text value:
<street/>
Template:
<address>
Will match the instance:
<address>
Will not match an instance containing an extra element:
<address>
Will not match an instance that is missing an element (<city>):
<address>
Will not match an instance whose elements are in a different order:
<address>
Will not match an instance that has an extra text node (mixed content case):
<address> Some Unexpected text
Will not match an instance that has a non-matching child (here the child has a different text value):
<address>
If you want to match a single element with any tag name and any content, use the special <xm:any> element.
Note: an <xm:any> element in a template document may not have any sub-elements, but it can match actual elements with or without sub-elements.
Template:
<address xm:regex-dom="true">
Will match an instance of <address> that has any single element as its content:
<address>
Note that <xm:any> can match an element that has complex content:
<address>
Will not match an instance of <address> that has empty content:
<address/>
Will not match more than one element (see the maxOccurs attribute description below on how to match the same template element multiple times):
<address>
If you want to specify the multiplicity of an element, use the optional xm:minOccurs and xm:maxOccurs attributes. When left unspecified, both default to 1.
Use the special value "unbounded" to specify a "zero or many" type of occurrence. The value of xm:minOccurs must be less than or equal to xm:maxOccurs.
These attributes can be defined on any element, including elements from the xm namespace (any, group, choice, not).
Note: the current version only supports the following values: 0, 1, unbounded.
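The occurrence rules above (defaults of 1, the three supported values, and minOccurs not exceeding maxOccurs) can be sketched as a small validation helper. The function names are hypothetical and "unbounded" is modelled as infinity purely for the comparison.

```python
import math

# Only these values are supported according to the note above.
ALLOWED = {"0": 0, "1": 1, "unbounded": math.inf}


def parse_occurs(min_occurs: str = "1", max_occurs: str = "1"):
    """Validate a minOccurs/maxOccurs pair and return it as numbers."""
    for value in (min_occurs, max_occurs):
        if value not in ALLOWED:
            raise ValueError(f"unsupported occurrence value: {value!r}")
    low, high = ALLOWED[min_occurs], ALLOWED[max_occurs]
    if low > high:
        raise ValueError("minOccurs must be less than or equal to maxOccurs")
    return low, high


def count_allowed(count: int, min_occurs: str = "1", max_occurs: str = "1") -> bool:
    """True when an element seen `count` times satisfies the multiplicity."""
    low, high = parse_occurs(min_occurs, max_occurs)
    return low <= count <= high


print(count_allowed(0, "0", "1"))          # optional element may be absent
print(count_allowed(2, "0", "1"))          # but may not repeat
print(count_allowed(5, "0", "unbounded"))  # zero or many
```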
Template:
<address xm:regex-dom="true">
Will match an instance with or without a <street> child:
<address>
or
<address/>
Will not match more than one occurrence:
<address>
You can also specify occurrences on the <xm:any> element.
Template:
<address xm:regex-dom="true">
Will match any instance that contains the same <state> element:
<address>
or
<address>
or
<address>
Will not match any instance missing a <state> element:
<address>
Will not match an instance containing a different value of the <state> element:
<address>
Will match multiple occurrences of <state>:
<address>
Template:
<address xm:regex-dom="true">
Will match an instance when all elements of the sequence appear exactly once, in the order they are defined in the template:
<address>
or when the entire group of elements is missing (minOccurs is 0):
<address/>
Will not match when the elements appear in a different order:
<address>
Will not match when one element from the group is missing:
<address>
Template:
<xm:choice>
Will match either the element <nickname/> or the pair of elements <first/><last/>.
Template:
<xm:except-any-of>
Will match any single element except an element with the tag name "red" or "green".
Template:
<xm:except-any-of xm:maxOccurs="unbounded">
Will match any number of elements, each of which can be anything except a simple element with the tag name "red" or "green".
Template:
<xm:except-any-of xm:minOccurs='unbounded'>
Will match any sequence of elements that contains one and only one <left/> element immediately followed by <right/>.
JavaScript object name | Description
out | java.lang.System.out
err | java.lang.System.err
a | Object of type org.w3c.dom.Element; in the current context it is initialized to the current element of the actual document.
t | Object of type org.w3c.dom.Element; in the current context it is initialized to the current element of the template document.
assert.pathExists(xpath) | Verifies that the given XPath string selects at least one node in the actual document. The XPath context node is the current element.
assert.equals(xpath1, xpath2), assert.equals(xpath1, xpath2, tolerance) | Verifies that the text values of the two nodes selected by the XPath strings in the actual document are equal. The XPath context node is the current element.
assert.isTrue(condition), assert.isFalse(condition) | Verifies that the given JavaScript condition is true/false.
... what else do we need? ... |
Template:
<?javascript asserts.pathExists("/step/[street='Route 30']") ?>
Ensures that the actual document has an element that matches the XPath /step/[street='Route 30']:
<steps>
Template:
<?javascript
Ensures that the first and last step elements use the same street:
<steps>
In the following template, street elements are compared using a wildcard mask:
...
Will match when both elements match the '* Main St' wildcard and are identical to each other:
...
The following fragment will not match because the two values are not identical (although both match their own template wildcards):
...
The following template shows numeric equality with tolerance:
<?equ-tolerance sameCoordinates=0.00001 ?>
The following fragment will not match because the difference between two numbers in the same equality set exceeds the defined tolerance (although they are within their own tolerances):
...
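The equality-set check can be sketched as follows, assuming that every pair of values in the set must lie within the set's tolerance, which is equivalent to comparing the largest and smallest values. The function name and the sample numbers are hypothetical.

```python
def equality_set_holds(values: list[float], tolerance: float = 0.0) -> bool:
    """True when all values in one equality set agree within the tolerance."""
    if not values:
        return True
    # If the extremes are close enough, every pair is close enough.
    return max(values) - min(values) <= tolerance


print(equality_set_holds([42.123456, 42.123459], tolerance=0.00001))
print(equality_set_holds([42.12345, 42.12347], tolerance=0.00001))
```

The second call fails even though each value could still pass its own per-element tolerance, which is the situation the paragraph above describes.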
TODO: Do we need optional pattern=<regex> parameter for equality sets that match text elements?
Order | Strategy | Accept elements without children? | Accept elements with children? | Selection Criteria
1 | FloatingPointNumbersMatchingStrategy | yes | no | Presence of xm:tolerance attribute
2 | RegExTextMatchingStrategy | yes | yes | Presence of xm:regextext='true' attribute value
3 | WildcardMatchingStrategy | yes | yes | Presence of xm:wild='true' attribute value
4 | TimeOfDayMatchingStrategy | yes | no | Presence of xm:time-tolerance attribute
5 | ChildrenOkMatchingStrategy | no | no | Presence of xm:children='ignore' attribute value
6 | AngleMatchingStrategy | yes | no | Presence of xm:period attribute
7 | EqualTextValueMatchingStrategy | yes | yes | Default for text-only elements, otherwise presence of xm:ignorecase attribute.
8 | RegExElementsMatchingStrategy | no | yes | Presence of xm:regexdom='true' attribute value
9 | ElementSequenceMatchingStrategy | yes | yes | Presence of xm:children='sequence'.
10 | ElementSetMatchingStrategy | yes | yes | Presence of xm:children='set'.
11 | ElementBagMatchingStrategy | yes | yes | Presence of xm:children='bag', also default for complex elements.
Traditional regular expression construct | Description | XML analogue | Description
x | single symbol | <x>...</x> | Matches a single element with tag name x.
. | any symbol | <xm:any/> | Matches a single element with any tag name and any content.
x? x+ x* {n:m} | repetition | <x xm:minOccurs='n' xm:maxOccurs="m">...</x> | Matches content zero or more times.
(xyz) | group | <xm:group> <x/><y/><z/> </xm:group> | Defines a group of elements.
(x | y | z ) | choice | <xm:choice> <x/><y/><z/> </xm:choice> | Defines matching alternatives.
(^ xyz) | negation | <xm:except-any-of> <x/><y/><z/> </xm:except-any-of> | Matches any single element that doesn't match the given alternative(s).
Note: this strategy is applicable to XML structure matching. There is a similarly named strategy for matching text node values.
<!-- any element except a step via the 'Mass Pike' route -->
<xm:except-any-of xm:minOccurs='0' xm:maxOccurs='unbounded'> <!-- Note 'unbounded' represents "zero or more" multiplicity -->
<step>
<route>Mass Pike</route>
</step>
</xm:except-any-of>
<!-- followed by two steps via 'Mass Pike' and 'Route 30' -->
<step>
<route>Mass Pike</route>
</step>
<step>
<route>Route 30</route>
</step>
<!-- followed by at least one element -->
<xm:any xm:maxOccurs='unbounded'/>
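To make the analogy with traditional regular expressions concrete, the sketch below encodes each step as a single character and expresses the template above as an ordinary regular expression. It is only an illustration: the one-character encoding, the assumption that each step is identified by its route alone, and the sample route names other than 'Mass Pike' and 'Route 30' are all made up for the example.

```python
import re

# One symbol per kind of step; anything else becomes "o" (other).
SYMBOLS = {"Mass Pike": "M", "Route 30": "R"}

# [^M]*  any number of steps that are not via Mass Pike (except-any-of, 0..unbounded)
# MR     a Mass Pike step immediately followed by a Route 30 step
# .+     at least one further element (xm:any, 1..unbounded)
PATTERN = re.compile(r"[^M]*MR.+")


def encode(routes: list[str]) -> str:
    """Turn a list of route names into a string of step symbols."""
    return "".join(SYMBOLS.get(route, "o") for route in routes)


def matches(routes: list[str]) -> bool:
    """True when the whole sequence of steps fits the pattern."""
    return PATTERN.fullmatch(encode(routes)) is not None


print(matches(["Main St", "Mass Pike", "Route 30", "Oak St"]))
print(matches(["Mass Pike", "Route 30"]))  # nothing follows Route 30
```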