XPath and XQuery - Logowanie - Uniwersytet Warszawskiczarnik/zajecia/xml10/07-en... · ·...
Transcript of XPath and XQuery - Logowanie - Uniwersytet Warszawskiczarnik/zajecia/xml10/07-en... · ·...
XPath and XQuery
Patryk Czarnik
Institute of Informatics University of Warsaw
XML and Modern Techniques of Content Management – 2010/11
IntroductionStatusXPath Data Model
XPath languageBasic constructsXPath 2.0 extrasPaths
XQueryXQuery query structureConstructorsFunctions
XPath and XQueryQuerying XML documents
Common properties
I Expression languages designed to query XML documentsI Convenient access to document nodesI Intuitive syntax analogous to filesystem pathsI Comparison and arithmetic operators, functions, etc. . .
XPathUsed within other standards:
I XSLTI XML SchemaI XPointerI DOM
XQueryStandalone standard. Mainapplications:
I XML data access andprocessing
I XML databases
XPath — status
XPath 1.0
I W3C Recommendation, XI 1999I used within XSLT 1.0, XML Schema, XPointer
XPath 2.0I Several W3C Recommendations, I 2007:
I XML Path Language (XPath) 2.0I XQuery 1.0 and XPath 2.0 Data ModelI XQuery 1.0 and XPath 2.0 Functions and OperatorsI XQuery 1.0 and XPath 2.0 Formal Semantics
I Used within XSLT 2.0I Related to XQuery 1.0
XPath and XQuery Data ModelI Theoretical basis of XPath, XSLT, and XQueryI XML document treeI Structures and simple data typesI Basic operations (type conversions etc.)
Differences between 1.0 and 2.0XPath 1.0 XPath 2.0
simple data types boolean,number,string
all XML Schema simpletypes
structures node sets sequences of nodes and sim-ple values
XML document in XPath modelI Document treeI Physical representation level fully expanded
I CDATA, references to characters and entities
I No adjacent text nodesI Namespaces applied and accessibleI XML Schema applied and accessible
! XPath 2.0 “schema aware” processors only
I Attribute nodes as element “properties”! formally, attribute is not child of element! however, element is parent of its attributes
I Root of tree — document node! main element (aka document element) is not the root
XPath node kindsSeven kinds of nodes:
I document node (root)I elementI attributeI text nodeI processing instructionI commentI namespace node
SequencesI Values in XPath 2.0 — sequencesI Sequence consists of zero or more items
I nodesI atomic values
Sequences properties
I Items order and number of occurrence meaningfulI Singleton sequence equivalent to its item:3.14 = (3.14)
I Nested sequences implicitly flattened to canonicalrepresentation:(3.14, (1, 2, 3), ’Alice’) = (3.14, 1, 2, 3, ’Alice’)
Data model in XPath 1.0I Four types:
I booleanI stringI numberI node set
I No collection of simple valuesI Sets (and not sequences) of nodes
Effective Boolean ValueI Treating any value as booleanI Motivation: convenience in condition writing,
e.g. if ( person[@passport] ) ...
Conversion rulesempty sequence → falsesequence starting with node → truesingle boolean value → that valuesingle empty string → falsesingle non-empty string → truesingle number equal to 0 or NaN → falseother single number → trueother value → error
AtomizationI Treating any sequence as sequence of atomic valuesI Motivation: sequences comparison, arithmetic, type casting
Conversion rules (for each item)
atomic value → that value
node of declared atomic type → node value
node of list type → sequence of list elements
node of unknown simple type orone of xs:untypedAtomic,xs:anySimpleType
→ text content as singleitem
node with mixed content → text content as singleitem
node with element content → error
Paths — typical XPath applicationI /company/department/person
I //person
I /company/department[name = ’accountancy’]
I /company/department[@id = ’D07’]/person[3]
I ./surname
I surname
I ../person[position = ’manager’]/surname
But there is much more to learn about XPath :)
Literals and variables
LiteralsI strings: ’12.5’, "He said, ""I don’t like it."""
I numbers: 12, 12.5, 1.13e-8
VariablesI $x — reference to variable x,I Variables introduced through:
I XPath 2.0 (for, some, every)I XQuery (FLWOR, some, every, function parameters)I XSLT 1.0 and 2.0 (variable, param)
Type casting
Type constructors
I xs:date("2010-08-25")
I xs:float("NaN")
I adresy:kod-pocztowy("48-200") (schema aware processing)I string(//obiekt[4]) (valid in XPath 1.0 too)
cast as operator
I "2010-08-25" cast as xs:date
I . . .
FunctionsI Function invocation:
I concat(’Mrs ’, name, ’ ’, surname)I count(//person)I my:fac(12)
I 150 (XPath 2.0) built-in functions:I Custom functions defining:
I XQueryI XSLT 2.0I execution environment
I EXSLT — de-facto standard of additional XPath functions andextension mechanism for XSLT 1.0
Chosen built-in XPath functions
Text
concat(s1, s2, ...) substring(s, pos, len)starts-with(s1, s2) contains(s1, s2)string-length(s) translate(s, t1, t2)
Numbersfloor(x) ceiling(x)round(x) abs(x)
Nodes
name(n?) local-name(n?) namespace-uri(n?)id(s) nilled(n?) document-uri(doc)
Contextcurrent() position()last() current-time()
Sequences
count(S) sum(S) min(S) max(S) avg(S)empty(S) reverse(S) distinct-values(S)
Date and timemonth-from-date(t)adjust-date-to-timezone(t, tz)
OperatorsI 68 operators in XPath 2.0 (after overloading expansion)
Arithmetic
I + - * div idiv mod on numbersI + and - on date/time and duration
Node sets / sequences
I union | intersect except
I not nodes found — type errorI result without repeats, document order preserved
Logical values
I and or
Comparison operators
Atomic comparison (XPath 2.0 only)
I eq ne lt le gt ge
I applied to singletons
General comparison (XPath 1.0 and 2.0)
I = != < <= > >=
I applied to sequencesI XPath 2.0 semantics:
There exists a pair of items, one from each argument sequences, forwhich the corresponding atomic comparison holds.(Argument sequences atomized on entry.)
General comparison — nonobvious behaviour
Equality operator does not check the real equality
(1, 2) = (2, 3) – true(1, 2) != (1, 2) – true
Equality is not transitive
(1, 2) = (2, 3) – true(2, 3) = (3, 4) – true(1, 2) = (3, 4) – false
Inequality is not just equality negation
(1, 2) = (1, 2) – true(1, 2) != (1, 2) – true() = () – false() != () – false
Conditional expression (XPath 2.0)
if CONDITIONthen RESULT1else RESULT2
I Effective Boolean Value of CONDITIONI One branch computed
Example
if details/pricethen
if details/price >= 1000then ’Insured mail’else ’Ordinary mail’
else ’No data’
Iteration through sequence (XPath 2.0)
for $VAR in SEQUENCEreturn RESULT
I VAR assigned subsequent values from SEQUENCEI RESULT computer in context where VAR is assigned current valueI overall result — (flattened) sequence of subsequent partial results
Examples
for $i in (1 to 10)return $i * $i
for $p in //personreturn concat($p/name, ’ ’, $p/surname)
Sequence quantifiers (XPath 2.0)
some $VAR in SEQUENCEsatisfies CONDITION
every $VAR in SEQUENCEsatisfies CONDITION
I Effective Boolean Value of CONDITIONI Lazy evaluation allowedI Arbitrary order of items checking
Examples
some $i in (1 to 10) satisfies $i > 7
every $p in //person satisfies $p/surname
XPath paths
Absolute path/step/step ...
Relative pathstep/step ...
Step — fully expanded syntaxaxis::node-test [predicate1] ...[predicateN]
axis direction in document tree
node-test selecting nodes basing on kind, name, . . .
predicates arbitrary filtering expressions
Example
/descendant::team[attribute::id = ’3’]/child::person[1]/child::surname/child::text()
AxisI child
I descendant
I parent
I ancestor
I following-sibling
I preceding-sibling
I following
I preceding
I attribute
I namespace
I self
I descendant-or-self
I ancestor-or-self
Axis
cite: www.GeorgeHernandez.com
Node test
Kind of node
I node()
I text()
I document-node()
I element()
I attribute()
I processing-instruction(xml-stylesheet)
Kind and name of node
I element(person)
I attribute(id)
I processing-instruction(xml-stylesheet)
Node test (ctd.)
Kind and type of node
! XPath 2.0 schema aware processing
I element(*, studentType)
I element(person, studentType)
I attribute(*, xs:integer)
I attribute(id, xs:integer)
Name of node
! Kind of node default for current axis (element or attribute)
I person
I *
I pre:*
I *:person
PredicatesI Evaluated for each node selected so far
(node becomes the context node).I Every predicate filters result sequence.I Depending on result type:
I number — compared to item position (counted from 1)I not number — Effective Boolean Value used
I Filter expressions — predicates outside paths (XSLT 2.0)
Examples
/child::staff/child::person[child::name = ’Patryk’]
child::person[child::name = ’Patryk’]/child::surname
//person[attribute::passport][3]
(1 to 10)[. mod 2 = 0]
Abbreviated SyntaxI child axis may be omittedI @ before name indicates attribute axisI . instead of self::node()I .. instead of parent::node()I // instead of /descendant-or-self::node()/
Example
.//object[@id = ’E4’]
self::node()/descendant-or-self::node()/child::object[attribute::id = ’E4’]
Evaluation orderI From left to rightI Step by step
I //department/person[1]I (//department/person)[1]
I Predicate by predicateI //person[@manages and position() = 5]I //person[@manages][position() = 5]
XQuery — the query language for XML
Status
I XQuery 1.0 — W3C Recommendation, I 2007I Data model, functions and operators — shared with XPath 2.0I Formally: syntax defined from scratchI Practically: XPath syntax extension
Features
I Picking up data from XML documentsI Constructing new result nodesI Sorting, grouping, . . .I Custom functions definitionI Various output methods (XML, HTML, XHTML, text) — shared
with XSLT
XQuery query structureI Header and bodyI Header consists of declarations:
I version declarationI importI flags and optionsI namespace declarationI variable or query parameterI function
Example
xquery version "1.0" encoding "utf-8";declare namespace foo = "http://example.org";declare variable $id as xs:string external;declare variable $doc := doc("example.xml");
$doc//foo:object[@id = $id]
FLWOR expressionI Acronym of For, Let, Where, Order by, ReturnI Replaces for from XPathI Motivation: SQL SELECT
Example
for $obj in doc("example.xml")/list/objectlet $prev := $obj/preceding-sibling::element()let $prev-name := $prev[1]/@namewhere $obj/@nameorder by $obj/@namereturn<div class="result">Object named {xs:string($obj/@name)}has {count($prev)} predecessors.The nearest predecessor name is{xs:string($prev-name)}.
</div>
Node constructors — direct
XML document fragment within query
for $el in doc("example.xml")//* return<p style="color: blue">I have found an element.<?pi bla Bla ?><!--Comments and PIs also taken to result-->
</p>
Expressions nested within constructors — braces
<result>{for $el in doc("example.xml")//* return<elem depth="{count($el/ancestor::node())}">{name($el)}</elem>
}</result>
Node constructors — computed
Syntax
for $el in doc("example.xml")//* returnelement p {attribute style {"color: blue"},text { "I have found an element."},processing-instruction pi { "bla Bla" }comment { "Comments and PIs also taken to result" }}
Application example — dynamically computed name
<result>{for $el in doc("example.xml")//* returnelement {concat("elem-", name($el))} {attribute depth {count($el/ancestor::node())},text {name($el)}
} }</result>
Custom function definitions
Example
declare functionlocal:twice($x)
{ 2 * $x };
Type declarations example
declare functionlocal:twice($x as xs:double)as xs:double
{ 2 * $x };
Type declarationsI Type declarations possible (not obligatory) for:
I variablesI function arguments and resultI also in XSLT 2.0
I Capabilities:I type nameI node kind | node() | item()I occurrence modifier (?, *, +, exactly one occurrence by default).
I Examples:I xs:doubleI element()I node()*I xs:integer?I item()+