Abbreviated XPath syntax. Examples of XPath queries against HTML. Using axes

XPath is used to navigate through the elements and attributes of an XML document. XPath is one of the core elements in the W3C XSLT standard.

1 What is XPath

XPath Expressions

XPath uses path expressions to select individual nodes or a set of nodes in an XML document. These expressions are very similar to the expressions you see when working with a traditional computer file system.

Standard XPath Functions

XPath includes over 100 built-in functions. There are functions for string and numeric values, date and time, node comparison and QName manipulation, sequence management, boolean values, and much more.

XPath is used in XSLT

XPath is one of the core elements in the XSLT standard. Without knowledge of XPath, you will not be able to create XSLT documents.

2 XPath terminology

Nodes

There are seven kinds of nodes in XPath: element, attribute, text, namespace, processing-instruction, comment, and document nodes. XML documents are treated as trees of nodes. The topmost element of the tree is called the root element. Look at the following XML document:

<bookstore>
  <book>
    <title lang="en">Harry Potter</title>
    <author>J. K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
</bookstore>

Example nodes in the XML document above:

<bookstore> (root element node)
<author>J. K. Rowling</author> (element node)
lang="en" (attribute node)

Atomic values

Atomic values are nodes that have no children and no parent. Examples of atomic values:

J. K. Rowling
"en"

Items

Items are atomic values or nodes.

3 Relationships between nodes

Parent

Each element and attribute has one parent. In the following example, the book element is the parent of the title, author, year, and price elements:

<book>
  <title>Harry Potter</title>
  <author>J K Rowling</author>
  <year>2005</year>
  <price>29.99</price>
</book>

Children

Element nodes can have zero, one, or more children. In the following example, the elements "title", "author", "year" and "price" are all children of the book element:

<book>
  <title>Harry Potter</title>
  <author>J K Rowling</author>
  <year>2005</year>
  <price>29.99</price>
</book>

Siblings

Siblings are nodes that have the same parent. In the following example, the elements "title", "author", "year" and "price" are all siblings:

<book>
  <title>Harry Potter</title>
  <author>J K Rowling</author>
  <year>2005</year>
  <price>29.99</price>
</book>

Ancestors

A node's parent, the parent's parent, and so on. In the following example, the ancestors of the title element are the book and bookstore elements:

<bookstore>
  <book>
    <title>Harry Potter</title>
    <author>J K Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
</bookstore>

Descendants

A node's children, the children's children, and so on. In the following example, the descendants of the "bookstore" element are the elements "book", "title", "author", "year", and "price":

<bookstore>
  <book>
    <title>Harry Potter</title>
    <author>J K Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
</bookstore>

4 XPath syntax

XPath uses path expressions to select nodes or sets of nodes in an XML document. A node can be selected by following a path or by steps. We will use the following XML document in the examples below.

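<bookstore>
  <book>
    <title lang="en">Harry Potter</title>
    <price>29.99</price>
  </book>
  <book>
    <title lang="en">Learning XML</title>
    <price>39.95</price>
  </book>
</bookstore>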

Node selection

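The most useful path expressions are listed below:

Expression   Description
nodename     Selects all nodes with the name "nodename"
/            Selects from the root node
//           Selects matching nodes anywhere below the current node
.            Selects the current node
..           Selects the parent of the current node
@            Selects attributes

The table below lists some path expressions and the result of evaluating each one: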

XPath expression   Result
bookstore          Selects all nodes named "bookstore"
/bookstore         Selects the bookstore root element
bookstore/book     Selects all "book" elements that are children of the "bookstore" element
//book             Selects all "book" elements, regardless of where they are in the document
bookstore//book    Selects all "book" elements that are descendants of the "bookstore" element, regardless of where they are under the "bookstore" element
//@lang            Selects all attributes that are named "lang"

Note: If a path begins with a slash (/), it is always an absolute path to the element!
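The path expressions above can be tried out with a short script. What follows is a minimal sketch in Python, assuming the standard library's xml.etree.ElementTree, which implements only a subset of XPath. The variable names are illustrative, and the paths are written relative to the root element because ElementTree does not accept absolute paths on an element.

```python
import xml.etree.ElementTree as ET

# The two-book sample document used in this section.
XML = """<bookstore>
  <book>
    <title lang="en">Harry Potter</title>
    <price>29.99</price>
  </book>
  <book>
    <title lang="en">Learning XML</title>
    <price>39.95</price>
  </book>
</bookstore>"""

root = ET.fromstring(XML)  # root is the <bookstore> element

# bookstore/book: "book" elements that are children of the root element
books = root.findall("book")

# //book: "book" elements anywhere below the root element
all_books = root.findall(".//book")

# //title[@lang]: ElementTree returns the elements, not the attribute
# nodes, so the attribute value is read with .get()
langs = [title.get("lang") for title in root.findall(".//title[@lang]")]

print(len(books), len(all_books), langs)
```

A full XPath engine would return the attribute nodes themselves for //@lang; reading the attribute from the matched elements is the closest equivalent available here.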

Predicates

Predicates are used to find a specific node or a node that contains a specific value. Predicates are always surrounded by square brackets. The table below lists some path expressions with predicates, and the result of the expression:

XPath expression                     Result
/bookstore/book[1]                   Selects the first "book" element that is a child of the "bookstore" element
/bookstore/book[last()]              Selects the last "book" element that is a child of the "bookstore" element
/bookstore/book[last()-1]            Selects the penultimate "book" element that is a child of the "bookstore" element
/bookstore/book[position()<3]        Selects the first two "book" elements that are children of the "bookstore" element
//title[@lang]                       Selects all "title" elements that have an attribute named "lang"
//title[@lang="en"]                  Selects all "title" elements that have a "lang" attribute with a value of "en"
/bookstore/book[price>35.00]         Selects all "book" elements of the "bookstore" element that have a "price" element with a value greater than 35.00
/bookstore/book[price>35.00]/title   Selects all "title" elements of the "book" elements of the "bookstore" element that have a "price" element with a value greater than 35.00

Note: In IE 5, 6, 7, 8 and 9, the first node has an index of [0], but according to the W3C specification it is [1]. To solve this problem in IE, set the "SelectionLanguage" property to XPath.

In JavaScript: xml.setProperty("SelectionLanguage", "XPath");
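Positional and attribute predicates can be checked in the same way. This is a minimal sketch, again assuming Python's xml.etree.ElementTree. That module supports [1], [last()], [last()-1], [@attr], [@attr='value'] and [tag='text'], but not comparisons such as price>35.00, so only the supported forms are shown.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<bookstore>"
    "<book><title lang='en'>Harry Potter</title><price>29.99</price></book>"
    "<book><title lang='en'>Learning XML</title><price>39.95</price></book>"
    "</bookstore>"
)

# book[1]: positions are counted from 1, not 0
first = root.find("book[1]/title").text

# book[last()]: the last "book" child
last = root.find("book[last()]/title").text

# //title[@lang='en']: titles whose lang attribute equals "en"
english = [t.text for t in root.findall(".//title[@lang='en']")]

# book[price='39.95']: books whose price child has exactly this text
by_price = [b.findtext("title") for b in root.findall("book[price='39.95']")]

print(first, "|", last, "|", english, "|", by_price)
```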

Selecting unknown nodes

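XPath wildcards can be used to select unknown XML nodes.

Wildcard   Description
*          Matches any element node
@*         Matches any attribute node
node()     Matches any node of any kind

The table below lists some path expressions and their results:

XPath expression   Result
/bookstore/*       Selects all child element nodes of the "bookstore" element
//*                Selects all elements in the document
//title[@*]        Selects all "title" elements that have at least one attribute of any kind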

Selecting Multiple Paths

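Using the | operator in an XPath expression, you can select several paths. The table below lists several path expressions and their results:

XPath expression                 Result
//book/title | //book/price      Selects all "title" and "price" elements of all "book" elements
//title | //price                Selects all "title" and "price" elements in the document
/bookstore/book/title | //price  Selects all "title" elements of the "book" elements of the "bookstore" element, and all "price" elements in the document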

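5 XPath axes

We will use the following XML document in the examples below.

<bookstore>
  <book>
    <title lang="en">Harry Potter</title>
    <price>29.99</price>
  </book>
  <book>
    <title lang="en">Learning XML</title>
    <price>39.95</price>
  </book>
</bookstore>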

Axes define sets of nodes, relative to the current node.

Axis name            Result
ancestor             Selects all ancestors (parent, grandparent, etc.) of the current node
ancestor-or-self     Selects all ancestors (parent, grandparent, etc.) of the current node and the current node itself
attribute            Selects all attributes of the current node
child                Selects all children of the current node
descendant           Selects all descendants (children, grandchildren, etc.) of the current node
descendant-or-self   Selects all descendants (children, grandchildren, etc.) of the current node and the current node itself
following            Selects everything in the document after the closing tag of the current node
following-sibling    Selects all siblings after the current node
namespace            Selects all namespace nodes of the current node
parent               Selects the parent of the current node
preceding            Selects all nodes that appear before the current node in the document, excluding ancestors, attribute nodes, and namespace nodes
preceding-sibling    Selects all siblings before the current node
self                 Selects the current node

6 Location path expressions

A location path can be absolute or relative. An absolute location path begins with a slash (/); a relative one does not. In both cases, the location path consists of one or more steps separated by slashes:

Absolute location path:

/step/step/...

Relative location path:

step/step/...

Each step is evaluated against the nodes in the current node set. The step consists of:

  • an axis (defines the tree relationship between the selected nodes and the current node);
  • a node test (identifies a node within an axis);
  • zero or more predicates (to further refine the selected set of nodes).

The syntax of a location step is:

axisname::nodetest[predicate]

Example                  Result
child::book              Selects all book nodes that are children of the current node
attribute::lang          Selects the lang attribute of the current node
child::*                 Selects all element children of the current node
attribute::*             Selects all attributes of the current node
child::text()            Selects all text node children of the current node
child::node()            Selects all children of the current node
descendant::book         Selects all book descendants of the current node
ancestor::book           Selects all book ancestors of the current node
ancestor-or-self::book   Selects all book ancestors of the current node, and the current node itself if it is a book node
child::*/child::price    Selects all price grandchildren of the current node
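Axes such as ancestor and descendant are easiest to understand as walks over the node tree. Python's xml.etree.ElementTree has no axis syntax, so the following is a rough sketch that emulates ancestor::* and descendant::* by hand. The parent_of map and the ancestors() helper are names invented for this illustration; they are not part of any XPath API.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<bookstore><book><title>Harry Potter</title>"
    "<price>29.99</price></book></bookstore>"
)

# ElementTree elements keep no reference to their parent,
# so build a child -> parent lookup table first.
parent_of = {child: parent for parent in root.iter() for child in parent}


def ancestors(node):
    """Yield the ancestors of node, nearest first (like ancestor::*)."""
    while node in parent_of:
        node = parent_of[node]
        yield node


title = root.find(".//title")

# ancestor::* evaluated from the title element
ancestor_tags = [a.tag for a in ancestors(title)]

# descendant::* evaluated from the root element, in document order
descendant_tags = [d.tag for d in root.iter() if d is not root]

print(ancestor_tags, descendant_tags)
```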

7 XPath operators

An XPath expression returns a node-set, a string, a Boolean, or a number. Below is a list of operators used in XPath expressions:

Operator   Description                Example
|          Union of two node-sets     //book | //cd
+          Addition                   6 + 4
-          Subtraction                6 - 4
*          Multiplication             6 * 4
div        Division                   8 div 4
=          Equal                      price=9.80
!=         Not equal                  price!=9.80
<          Less than                  price<9.80
<=         Less than or equal to      price<=9.80
>          Greater than               price>9.80
>=         Greater than or equal to   price>=9.80
or         Or                         price=9.80 or price=9.70
and        And                        price>9.00 and price<9.90
mod        Modulus (remainder)        5 mod 2

8 XPath examples

Let's walk through the basic XPath syntax with a few examples. We will use the following XML document "books.xml" in the examples below:

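<bookstore>
  <book>
    <title>Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <price>30.00</price>
  </book>
  <book>
    <title>Harry Potter</title>
    <author>J K Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
  <book>
    <title>XQuery Kick Start</title>
    <author>James McGovern</author>
    <author>Per Bothner</author>
    <author>Kurt Cagle</author>
    <author>James Linn</author>
    <author>Vaidyanathan Nagarajan</author>
    <year>2003</year>
    <price>49.99</price>
  </book>
  <book>
    <title>Learning XML</title>
    <author>Erik T. Ray</author>
    <year>2003</year>
    <price>39.95</price>
  </book>
</bookstore>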

Loading an XML document

Most modern browsers support XMLHttpRequest, which can be used to load XML documents:

var xmlhttp = new XMLHttpRequest();

Code for legacy Microsoft browsers (IE 5 and 6):

var xmlhttp = new ActiveXObject("Microsoft.XMLHTTP");

Node selection

Unfortunately, XPath may work differently in Internet Explorer than in other browsers. In our examples we will use code that should work in most browsers. Internet Explorer uses the "selectNodes()" method to select nodes in an XML document:

xmlDoc.selectNodes(xpath);

Firefox, Chrome, Opera and Safari use the evaluate() method to select nodes from an XML document:

xmlDoc.evaluate(xpath, xmlDoc, null, XPathResult.ANY_TYPE, null);

Select all titles

The following example selects all title nodes:

/bookstore/book/title

Select the title of the first book

The following example selects the title of the first "book" node under the "bookstore" element:

/bookstore/book[1]/title

Select all prices

The following example selects the text of all price nodes:

/bookstore/book/price/text()

Select price nodes with price > 35

The following example selects all price nodes with a price above 35:

/bookstore/book[price>35]/price

Select title nodes with price > 35

The following example selects the title nodes of all books with a price greater than 35:

/bookstore/book[price>35]/title
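Outside the browser, the price filter can be reproduced in a few lines. This is a minimal sketch in Python, assuming the standard library's xml.etree.ElementTree. That module does not support comparison predicates such as [price>35], so the path selects the books and the comparison is done in Python. The document below keeps only the titles and prices listed for books.xml.

```python
import xml.etree.ElementTree as ET

# Titles and prices taken from the books.xml values listed above.
root = ET.fromstring("""<bookstore>
  <book><title>Everyday Italian</title><price>30.00</price></book>
  <book><title>Harry Potter</title><price>29.99</price></book>
  <book><title>XQuery Kick Start</title><price>49.99</price></book>
  <book><title>Learning XML</title><price>39.95</price></book>
</bookstore>""")

# Same intent as /bookstore/book[price>35]/title:
# pick the books with the path, then compare the price as a number.
expensive_titles = [
    book.findtext("title")
    for book in root.findall("book")
    if float(book.findtext("price")) > 35
]

print(expensive_titles)
```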

XPath is a query language for the elements of an XML or XHTML document. Like SQL, XPath is declarative: to get the data you want, you write a query that describes it, and the XPath interpreter does the rest.
Convenient, isn't it? Let's see what XPath offers for accessing the nodes of a web page.

Querying the nodes of a web page

Here is a short lab exercise in which I demonstrate how to build XPath queries against a web page. You can repeat my queries and, more importantly, try your own. I hope this makes the article equally interesting to beginners and to programmers who already know XPath for XML.

For the lab we will need:
- an XHTML web page;
- the Mozilla Firefox browser with these add-ons:
  - Firebug;
  - FirePath;
  (you can use any other browser with visual XPath support)
- a little time.

As the web page for the experiment, I suggest the home page of the World Wide Web Consortium ("http://w3.org"). This is the organization that develops the XQuery and XPath languages, the XHTML specification and many other web standards.

Task
Using XPath queries, extract information about the consortium's conferences from the XHTML code of the w3.org home page.
Let's start writing XPath queries.
First XPath query
Open the FirePath tab in Firebug, pick the element to analyze with the selector, and click it: FirePath generates an XPath query for the selected element.

If you selected the title of the first event, then the request will be like this:

After removing the unnecessary indexes, the query matches all elements of the "title" type:
.//*[@id="w3c_home_upcoming_events"]/ul/li/div/p/a

FirePath highlights the elements that match the query, so you can see in real time which document nodes it selects.

Query for the conference venues:
.//*[@id="w3c_home_upcoming_events"]/ul/li/div/p[2]

This is how we get a list of sponsors:
.//*[@id="w3c_home_upcoming_events"]/ul/li/div/p

XPath syntax

Let's go back to the queries we created and look at how they are structured.
Consider the first query in detail.

I have split this query into three parts to demonstrate the capabilities of XPath. (The split is arbitrary.)

First part
.// - recursive descent through zero or more levels of the hierarchy from the current context. In our case, the current context is the document root.

Second part
* - any element;
[@id="w3c_home_upcoming_events"] - a predicate that selects the node whose id attribute equals "w3c_home_upcoming_events". XHTML element IDs must be unique, so the query "any element with a specific ID" should return the single node we are looking for.

We can replace * with the exact node name, div, in this query:
div[@id="w3c_home_upcoming_events"]

In this way we descend the document tree to the div[@id="w3c_home_upcoming_events"] node we need. We do not care which nodes the DOM tree consists of or how many levels of hierarchy lie above it.

Third part
/ul/li/div/p/a - the XPath path to a specific element. The path consists of location steps and node tests (ul, li, and so on). Steps are separated by the "/" (slash) character.
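The three parts can be seen working together in a small script. This is a minimal sketch in Python, assuming the standard library's xml.etree.ElementTree. The markup is a hypothetical fragment shaped the way the query expects, not the real w3.org page, and the event names, cities and example.org URLs are invented for the illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment shaped like the markup the query assumes;
# it is NOT a copy of the real w3.org home page.
page = ET.fromstring("""<html><body>
<div id="w3c_home_upcoming_events">
  <ul>
    <li><div>
      <p><a href="http://example.org/event1">Event one</a></p>
      <p>City A</p>
    </div></li>
    <li><div>
      <p><a href="http://example.org/event2">Event two</a></p>
      <p>City B</p>
    </div></li>
  </ul>
</div>
</body></html>""")

base = ".//*[@id='w3c_home_upcoming_events']"

# Part 1 (.//) + part 2 (*[@id=...]) + part 3 (/ul/li/div/p/a)
links = page.findall(base + "/ul/li/div/p/a")
titles = [a.text for a in links]
urls = [a.get("href") for a in links]

# p[2]: the second paragraph inside each div
venues = [p.text for p in page.findall(base + "/ul/li/div/p[2]")]

print(titles, urls, venues)
```

Because the lookup starts from the id, the query keeps working if the nodes above the div are rearranged.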

XPath collections
It is not always possible to reach the node of interest with a predicate or with location steps alone. Very often there are many nodes of the same type at one level of the hierarchy, and you need to select "only the first" or "only the second" of them. Collections are provided for such cases.

XPath collections let you access an element by its index. The indexes follow the order in which the elements appear in the original document. Numbering starts from one.

Since the venue is always the second paragraph after the conference name, we get the following query:
.//*[@id="w3c_home_upcoming_events"]/ul/li/div/p[2]
Here p[2] is the second p element in the set for each node matched by /ul/li/div.

Similarly, we can get a list of sponsors with the request:
.//*[@id="w3c_home_upcoming_events"]/ul/li/div/p

Some XPath functions
XPath has many functions for working with the elements of a collection. Here are just a few.

last():
Returns the index of the last element of the collection.
The query ul/li/div/p[last()] returns the last paragraph of each div inside the items of the "ul" list.
There is no first() function. To access the first element, use the index 1.

text():
Returns the text content of an element.
.//a[text()="Archive"] - selects all links with the text "Archive".

position() and mod:
position() returns the position of an element in the set.
mod is the remainder of a division.

By combining them we can get:
- odd elements: ul/li[position() mod 2 = 1]
- even elements: ul/li[position() mod 2 = 0]
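A predicate such as ul/li[position() mod 2 = 1] keeps the items whose 1-based position is odd. The sketch below reproduces that logic in Python, assuming the standard library's xml.etree.ElementTree, which supports neither position() nor mod inside predicates. The five-item list is made up for the illustration.

```python
import xml.etree.ElementTree as ET

ul = ET.fromstring(
    "<ul><li>a</li><li>b</li><li>c</li><li>d</li><li>e</li></ul>"
)
items = ul.findall("li")

# XPath positions start at 1, so enumerate from 1 as well.
odd = [li.text for pos, li in enumerate(items, start=1) if pos % 2 == 1]
even = [li.text for pos, li in enumerate(items, start=1) if pos % 2 == 0]

# position() > 2: everything from the third item onward
from_third = [li.text for pos, li in enumerate(items, start=1) if pos > 2]

print(odd, even, from_third)
```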

Comparison operations

  • < - logical "less than"
  • > - logical "greater than"
  • <= - logical "less than or equal to"
  • >= - logical "greater than or equal to"
ul/li[position() > 2], ul/li[position() <= 2] - the list items from the 3rd onward and, conversely, the first two.

Try it yourself

Try to get:
- the URLs of the even-numbered links in the "Standards" menu on the left;
- the headlines of all news items except the first on the w3c.org home page.

XPath in PHP5

$dom = new DOMDocument();
$dom->loadHTML($HTMLCode);
$xpath = new DOMXPath($dom);
// Single quotes outside so the double quotes inside the query need no escaping
$_res = $xpath->query('.//*[@id="w3c_home_upcoming_events"]/ul/li/div/p/a');
foreach ($_res as $obj) {
    echo "URL: " . $obj->getAttribute("href");
    echo $obj->nodeValue;
}

Finally

Using a simple example, we have seen what XPath can do when accessing the nodes of a web page.
XPath is the industry standard for addressing elements in XML and XHTML and for XSLT transformations.
You can use it to parse any HTML page. If the source HTML code contains significant errors in the markup, run it through

Today we will take a closer look at using XPath with PHP. The examples show how XPath significantly reduces the amount of code. We will look at both queries and functions in XPath.

First, here are the two documents, a DTD and an XML file, that we will use to explore PHP DOM XPath. Here is what they look like:

A Book An Author Horror chapter one Another Book Another Author Science Fiction chapter one

Basic XPath queries

XPath's simple syntax lets you access elements in an XML document. In the simplest case, you specify the path to the element you want. For the XML document above, the following XPath query returns all book elements that are children of the library element:

//library/book

That's it! The two leading slashes tell XPath to search the whole document for library elements, and the single slash steps down to their book child elements. Simple and fast, isn't it?

But what if you want to select specific book elements from the set? Suppose you want the books written by "An Author". The XPath query for this would be:

//library/book/author[text() = "An Author"]/..

You can use text() in square brackets to compare the value of a node. The trailing "/.." means we want the parent element (that is, go back up one node).

XPath queries are run with one of two methods: query() and evaluate(). Both execute the query; the difference is in what they return. query() always returns a DOMNodeList, whereas evaluate() returns a typed result when possible. For example, if your XPath query returns the number of books written by a particular author rather than the nodes themselves, query() returns an empty DOMNodeList, while evaluate() simply returns the number, which you can use directly instead of extracting it from a node.

XPath code and speed benefits

Let's look at a simple example that returns the number of books written by a specific author. The first method does it the usual way, without XPath, so you can see how much simpler the XPath versions are.

// Method name is illustrative; the original signature was lost.
public function getNumberOfBooksByAuthor($author)
{
    $total = 0;
    $elements = $this->domDocument->getElementsByTagName("author");
    foreach ($elements as $element) {
        if ($element->nodeValue == $author) {
            $total++;
        }
    }
    return $total;
}

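The next method returns the same result, but uses XPath to select the books written by a specific author.

// Method name and query string are illustrative reconstructions.
public function getNumberOfBooksByAuthorXPath($author)
{
    $query = "//library/book/author[text() = '$author']/..";
    $xpath = new DOMXPath($this->domDocument);
    $result = $xpath->query($query);
    return $result->length;
}

Note that we no longer need to check the value of each element to find out which author wrote each book. We can simplify the code further by using the XPath function count() to count the elements matched by the path.

// Method name and query string are illustrative reconstructions.
public function getNumberOfBooksByAuthorCount($author)
{
    $query = "count(//library/book/author[text() = '$author']/..)";
    $xpath = new DOMXPath($this->domDocument);
    return $xpath->evaluate($query);
}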

We can get the information we need with a one-line XPath query, with no need to write PHP filtering code. This is the simplest and quickest way to implement this functionality.

Note that evaluate() was used in the last example. This is because count() returns a number rather than a set of nodes. Using query() would return a DOMNodeList, but it would be empty.
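The contrast between looping by hand and letting the path expression do the filtering does not depend on PHP. Here is a minimal sketch in Python, assuming the standard library's xml.etree.ElementTree. The library document is a simplified, hypothetical stand-in containing only the elements the queries need, and both function names are invented for the illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified library document (title and author only).
library = ET.fromstring("""<library>
  <book isbn="isbn1234"><title>A Book</title><author>An Author</author></book>
  <book isbn="isbn1235"><title>Another Book</title><author>Another Author</author></book>
</library>""")


def count_books_loop(root, author):
    """Manual approach: visit every author element and compare its text."""
    total = 0
    for element in root.iter("author"):
        if element.text == author:
            total += 1
    return total


def count_books_path(root, author):
    """Path approach: the predicate [author='...'] does the filtering.

    The author name is interpolated as-is, so this sketch assumes it
    contains no single quote.
    """
    return len(root.findall(f"book[author='{author}']"))


print(count_books_loop(library, "An Author"),
      count_books_path(library, "An Author"))
```

ElementTree has no count() function, so len() over the matched elements plays that role here.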

XPath is worth using because not only does it make your PHP code simpler, it also offers a speed benefit. I noticed that the first version was 30% faster on average compared to the second. But the third is 10% faster than the first. Of course, this depends on your server and the queries you are using. Using XPath in its pure form gives the greatest results in speed and ease of writing code.

XPath functions

Here are a few functions that can be used with XPath. There are also plenty of resources that describe each available function in detail. If you need to count the nodes in a DOMNodeList or compare a nodeValue (node value), you can probably find an XPath function that saves you from writing extra PHP code.

You have already seen this with the count() function. Let's use the id() function to get the titles of the books with the given ISBNs. To do this, use the following XPath expression:

id("isbn1234 isbn1235")/title

Note that the values you are looking for are passed as a single quoted string and separated only by spaces. Do not separate them with commas:

// Method name and parameter are illustrative; $isbns is an array of ISBN strings.
public function findBooksByISBNs(array $isbns)
{
    $ids = implode(" ", $isbns);
    $query = "id('$ids')/title";
    $xpath = new DOMXPath($this->domDocument);
    $result = $xpath->query($query);
    $books = array();
    foreach ($result as $node) {
        $book = array("title" => $node->nodeValue);
        $books[] = $book;
    }
    return $books;
}

Handling complex functions in XPath is incredibly simple.

Using PHP functions with XPath

Sometimes you will need more functionality that the standard XPath functions cannot provide. Fortunately, the PHP DOM allows native PHP functions to interact with XPath queries.

Let's look at an example that returns the number of words in a book title. In its simplest form, the function looks like this:

public function getNumberOfWords($isbn)
{
    $query = "//library/book[@isbn = '$isbn']";
    $xpath = new DOMXPath($this->domDocument);
    $result = $xpath->query($query);
    $title = $result->item(0)->getElementsByTagName("title")->item(0)->nodeValue;
    return str_word_count($title);
}

But we can also call str_word_count() directly inside the XPath query. This takes a few steps. First, we need to register a namespace with the XPath object. PHP functions in XPath queries are called with the string "php:functionString", followed by the name of the function you want. The namespace URI must be http://php.net/xpath; any other value causes an error. After that, we need to call registerPHPFunctions(). This tells PHP that calls made through the "php:" namespace are to be handled by PHP.

An example syntax for calling functions would be:

php:functionString("nameoffunction", arg, arg...)

Let's put it all together in the following getNumberOfWords() example:

public function getNumberOfWords($isbn)
{
    $xpath = new DOMXPath($this->domDocument);
    // register the php namespace
    $xpath->registerNamespace("php", "http://php.net/xpath");
    // now PHP functions can be called in XPath queries
    $xpath->registerPHPFunctions();
    $query = "php:functionString('str_word_count', (//library/book[@isbn = '$isbn']/title))";
    return $xpath->evaluate($query);
}

Note that you don't need to call the XPath function text() to get the node's text; php:functionString does this automatically. The following line, which selects the text explicitly, is also valid:

php:functionString("str_word_count", (//library/book[@isbn = "$isbn"]/title/text()))

Registering PHP functions is not limited to functions that are included in PHP. You can define your own functions and use them inside XPath. The only difference is that you will have to use "php:function" instead of "php:functionString".

Let's write a function outside the class to demonstrate the basic idea. The function is used to find the books by the author "George Orwell". It must return true for each node you want to include in the query result.

function compare($node)
{
    return $node[0]->nodeValue == "George Orwell";
}

The argument passed to the function is an array of DOM elements. The function inspects the array and decides whether the node being tested should be included in the resulting DOMNodeList. In this example, the node being tested is /book, and /author is what we pass in to make the decision.

Now we can create the getGeorgeOrwellBooks() function:

public function getGeorgeOrwellBooks()
{
    $xpath = new DOMXPath($this->domDocument);
    $xpath->registerNamespace("php", "http://php.net/xpath");
    $xpath->registerPHPFunctions();
    // The predicate is reconstructed from the description above.
    $query = "//library/book[php:function('compare', author)]";
    $result = $xpath->query($query);
    $books = array();
    foreach ($result as $node) {
        $books[] = $node->getElementsByTagName("title")->item(0)->nodeValue;
    }
    return $books;
}

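If compare() is a static method of a class, amend the XPath query accordingly (ClassName here stands for the class that defines it):

//library/book[php:function("ClassName::compare", author)]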

To be honest, all of this could have been implemented in pure XPath. But the example shows how you can extend XPath queries and make them more powerful.

In conclusion

XPath is a great way to reduce the amount of code and to speed up processing when working with XML. The additional PHP DOM functionality lets you extend the XPath functions. It is a genuinely useful tool: the more you use it and learn its details, the less code you will have to write.


XPath uses path expressions to select a node or a set of nodes in an XML document. A node is selected by following a path or by steps.

Example XML document

We will use this XML document in the examples below.

<bookstore>
  <book>
    <title lang="eng">Harry Potter</title>
    <price>29.99</price>
  </book>
  <book>
    <title lang="eng">Learning XML</title>
    <price>39.95</price>
  </book>
</bookstore>

Selecting nodes

XPath uses path expressions to select nodes in an XML document. A node is selected by following a path or by steps. The following are the most useful path expressions:

The table below lists some path expressions and their results:

Path expression   Result
bookstore         Selects all nodes named "bookstore"
/bookstore        Selects the root element bookstore
bookstore/book    Selects all book elements that are children of bookstore
//book            Selects all book elements, regardless of their position in the document
bookstore//book   Selects all book elements that are descendants of the bookstore element, regardless of their position under bookstore
//@lang           Selects all attributes named lang

Note: If a path begins with a slash (/), it always represents an absolute path to an element!

Predicates

A predicate is used to find a specific node or a node that contains a specific value.

A predicate is enclosed in square brackets.

The table below lists some path expressions with predicates and their results:

Path expression                      Result
/bookstore/book[1]                   Selects the first book element that is a child of the bookstore element
/bookstore/book[last()]              Selects the last book element that is a child of the bookstore element
/bookstore/book[last()-1]            Selects the last but one book element that is a child of the bookstore element
/bookstore/book[position()<3]        Selects the first two book elements that are children of the bookstore element
//title[@lang]                       Selects all title elements that have an attribute named lang
//title[@lang="eng"]                 Selects all title elements that have a lang attribute with the value "eng"
/bookstore/book[price>35.00]         Selects all book elements of the bookstore element whose price element has a value greater than 35.00
/bookstore/book[price>35.00]/title   Selects all title elements of the book elements of the bookstore element whose price element has a value greater than 35.00

Selecting unknown nodes

XPath wildcards can be used to select unknown XML elements.

In the table below, we have listed some path expressions and the results of these expressions:

Selecting several paths

By using the "|" operator in a path expression, you can select several paths.

In the table below, we have listed some path expressions and the results of these expressions.