Abbreviated XPath syntax. Examples of XPath queries against HTML. Using axes
XPath is used to navigate through the elements and attributes of an XML document. XPath is one of the core elements in the W3C XSLT standard.
1 What is XPath
XPath Expressions
XPath uses path expressions to select individual nodes or a set of nodes in an XML document. These expressions are very similar to the expressions you see when working with a traditional computer file system.
Standard XPath Functions
XPath includes over 100 built-in functions. There are functions for string and numeric values, date and time, node comparison and QName manipulation, sequence management, boolean values, and much more.
XPath is used in XSLT
XPath is one of the core elements in the XSLT standard. Without knowledge of XPath, you will not be able to create XSLT documents.
2 XPath terminology
Nodes
There are seven kinds of nodes in XPath: element, attribute, text, namespace, processing-instruction, comment, and document nodes. An XML document is treated as a tree of nodes. The topmost element of the tree is called the root element. Look at the following XML document:
Example nodes in the XML document above:
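The sample document itself is not reproduced here, so the sketch below uses a hypothetical one built from the element names mentioned in this section (bookstore, book, title, author, year, price). It relies on Python's standard xml.etree.ElementTree module purely as a convenient way to look at nodes; the document content is an assumption.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; only the element names come from the text.
SAMPLE = """<bookstore>
  <book>
    <title lang="en">Harry Potter</title>
    <author>J. K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
</bookstore>"""

root = ET.fromstring(SAMPLE)            # <bookstore>: the root element node
author = root.find("book/author")       # <author>: an element node
title = root.find("book/title")         # <title>: an element node

root_name = root.tag                    # name of the root element
author_text = author.text               # text content of <author>
lang_value = title.get("lang")          # value of the lang attribute node
```

Here `author_text` and `lang_value` hold plain values rather than nodes, which is the distinction the next subsection draws.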
Atomic values
Atomic values are nodes that have no children or parents. Example of atomic values:
J. K. Rowling "en"
Items
Items are atomic values or nodes.
3 Node relationships
Parent
Each element and attribute has one parent. In the following example, the book element is the parent of the title, author, year, and price elements:
Children
Element nodes can have zero, one, or more children. In the following example, the elements "title", "author", "year" and "price" are all children of the book element:
Siblings
Siblings are nodes that have the same parent. In the following example, the elements "title", "author", "year" and "price" are all siblings:
Ancestors
A node's parent, the parent's parent, and so on. In the following example, the ancestors of the title element are the book and bookstore elements:
Descendants
A node's children, the children's children, and so on. In the following example, the descendants of the "bookstore" element are the elements "book", "title", "author", "year", and "price":
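The relationships above can be checked on a small tree. The following is a minimal sketch in Python's standard xml.etree.ElementTree; the document is hypothetical, and because ElementTree elements do not remember their parent, a child-to-parent map is built by hand (an implementation detail of this sketch, not part of XPath).

```python
import xml.etree.ElementTree as ET

# Hypothetical document using the element names from the text.
root = ET.fromstring(
    "<bookstore><book>"
    "<title>Harry Potter</title><author>J. K. Rowling</author>"
    "<year>2005</year><price>29.99</price>"
    "</book></bookstore>"
)

# ElementTree does not store parents, so map every child to its parent.
parent_of = {child: parent for parent in root.iter() for child in parent}

title = root.find("book/title")
book = parent_of[title]                               # parent of <title>

children = [c.tag for c in book]                      # children of <book>
siblings = [c.tag for c in book if c is not title]    # siblings of <title>

ancestors = []                                        # nearest ancestor first
node = title
while node in parent_of:
    node = parent_of[node]
    ancestors.append(node.tag)

# Every element below the root, in document order.
descendants = [d.tag for d in root.iter() if d is not root]
```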
4 XPath syntax
XPath uses path expressions to select nodes or node-sets in an XML document. A node is selected by following a path or steps. We will use the following XML document in the examples below.
Node selection
The most useful path expressions are listed below:
Expression | Description |
---|---|
nodename | Selects all child elements of the current node named "nodename" |
/ | Selects from the root node |
// | Selects matching nodes anywhere below the current node, at any depth |
. | Selects the current node |
.. | Selects the parent of the current node |
@ | Selects attributes |
The table below lists some path expressions and the result of evaluating each one:
XPath expression | Result |
---|---|
bookstore | Selects all child elements of the current node named "bookstore" |
/bookstore | Selects the bookstore root element. Note: If a path begins with a slash (/), it is always an absolute path to the element! |
bookstore/book | Selects all "book" elements that are children of the "bookstore" element |
//book | Selects all "book" elements regardless of where they are in the document |
bookstore//book | Selects all "book" elements that are descendants of the "bookstore" element, regardless of where they are under the "bookstore" element |
//@lang | Selects all attributes that are named "lang" |
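A few of these expressions can be tried with Python's standard xml.etree.ElementTree. This is only a sketch under two assumptions: the two-book document is hypothetical, and ElementTree implements a subset of XPath, so paths are written relative to the root element (".//book" instead of "//book") and attribute steps such as "//@lang" have to be emulated.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-book document.
root = ET.fromstring("""<bookstore>
  <book><title lang="en">Harry Potter</title><price>29.99</price></book>
  <book><title lang="en">Learning XML</title><price>39.95</price></book>
</bookstore>""")

# bookstore/book: relative to the <bookstore> root this is simply "book".
books = root.findall("book")

# //book: ElementTree spells "anywhere below this element" as ".//book".
all_books = root.findall(".//book")

# //@lang: attribute steps are not supported by ElementTree, so select the
# elements that carry the attribute and read its value from each of them.
langs = [el.get("lang") for el in root.findall(".//*[@lang]")]
```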
Predicates
Predicates are used to find a specific node or a node that contains a specific value. Predicates are always surrounded by square brackets. The table below lists some path expressions with predicates, and the result of the expression:
XPath Expressions | Result |
---|---|
/bookstore/book[1] | Selects the first "book" element that is a child of the "bookstore" element. Note: In IE 5, 6, 7, 8, and 9 the first node has index [0], but according to the W3C specification it is [1]. To solve this problem in IE, set the "SelectionLanguage" property to XPath. In JavaScript: xml.setProperty("SelectionLanguage", "XPath"); |
/bookstore/book[last()] | Selects the last "book" element that is a child of the "bookstore" element |
/bookstore/book[last()-1] | Selects the penultimate "book" element that is a child of the "bookstore" element |
/bookstore/book[position()<3] | Selects the first two "book" elements that are children of the "bookstore" element |
//title[@lang] | Selects all "title" elements that have an attribute named "lang" |
//title[@lang="en"] | Selects all "title" elements that have a "lang" attribute with a value of "en" |
/bookstore/book[price>35.00] | Selects all "book" elements of the "bookstore" element that have a "price" element with a value greater than 35.00 |
/bookstore/book[price>35.00]/title | Selects all "title" elements of the "book" elements of the "bookstore" element that have a "price" element with a value greater than 35.00 |
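The positional and attribute predicates from the table can be sketched with Python's standard xml.etree.ElementTree. The four-book document is hypothetical. ElementTree supports [1], [last()], [last()-1], [@attr] and [@attr='value'], but not position()<3, so that case is emulated with list slicing.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; the third title deliberately has no lang attribute.
root = ET.fromstring("""<bookstore>
  <book><title lang="en">Everyday Italian</title><price>30.00</price></book>
  <book><title lang="en">Harry Potter</title><price>29.99</price></book>
  <book><title>XQuery Kick Start</title><price>49.99</price></book>
  <book><title lang="en">Learning XML</title><price>39.95</price></book>
</bookstore>""")

def titles(books):
    """Return the title text of each book element."""
    return [b.findtext("title") for b in books]

first = titles(root.findall("book[1]"))
last = titles(root.findall("book[last()]"))
penultimate = titles(root.findall("book[last()-1]"))

# position()<3 is outside ElementTree's subset; slicing gives the first two.
first_two = titles(root.findall("book")[:2])

with_lang = [t.text for t in root.findall(".//title[@lang]")]
english = [t.text for t in root.findall(".//title[@lang='en']")]
```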
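Selecting unknown nodes
XPath wildcards can be used to select XML nodes whose names are not known in advance.
Wildcard | Description |
---|---|
* | Matches any element node |
@* | Matches any attribute node |
node() | Matches any node of any kind |
For example, /bookstore/* selects all child elements of the "bookstore" element, //* selects all elements in the document, and //title[@*] selects all "title" elements that have at least one attribute.
Selecting Multiple Paths
Using the | operator in an XPath expression, you can select several paths at once. For example, //book/title | //book/price selects all "title" and all "price" elements of all "book" elements.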
5 XPath axes
We will use the following XML document in the examples below.
An axis defines a node-set relative to the current node.
Axis name | Result |
---|---|
ancestor | Selects all ancestors (parents, grandparents, etc.) of the current node |
ancestor-or-self | Selects all ancestors (parents, grandparents, etc.) of the current node and the current node itself |
attribute | Selects all attributes of the current node |
child | Selects all children of the current node |
descendant | Selects all descendants (children, grandchildren, etc.) of the current node |
descendant-or-self | Selects all descendants (children, grandchildren, etc.) of the current node and the current node itself |
following | Selects everything in the document after the current node's tag closes |
following-sibling | Selects all siblings after the current node |
namespace | Selects all namespace nodes of the current node |
parent | Selects the parent of the current node |
preceding | Selects all nodes that appear before the current node in the document, excluding ancestors, attribute nodes, and namespace nodes |
preceding-sibling | Selects all siblings before the current node |
self | Selects the current node |
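Python's standard xml.etree.ElementTree does not implement axes, but several of them are easy to emulate, which helps to see what each axis contains. The sketch below is an emulation under assumptions (hypothetical document, hand-built parent map), not an XPath engine.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; <year> plays the role of the current node.
root = ET.fromstring(
    "<bookstore><book><title/><author/><year/><price/></book></bookstore>"
)
parent_of = {child: parent for parent in root.iter() for child in parent}

year = root.find("book/year")
book = parent_of[year]

parent_tag = book.tag                                  # parent::*
same_level = list(book)                                # children of the parent
i = same_level.index(year)

preceding_sibling = [s.tag for s in same_level[:i]]    # preceding-sibling::*
following_sibling = [s.tag for s in same_level[i + 1:]]  # following-sibling::*

# descendant-or-self::* evaluated at <book>: iter() yields the element itself
# first, then all elements below it in document order.
descendant_or_self = [e.tag for e in book.iter()]
```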
6 Location path expressions
A location path can be absolute or relative. An absolute location path begins with a slash (/); a relative one does not. In both cases, the location path consists of one or more steps separated by slashes:
Absolute location path:
/step/step/...
Relative location path:
step/step/...
Each step is evaluated against the nodes in the current node-set. A step consists of:
- an axis (defines the tree relationship between the selected nodes and the current node);
- a node test (identifies a node within the axis);
- zero or more predicates (to further refine the selected node-set).
The syntax of a location step is:
axisname::nodetest[predicate]
Example | Result |
---|---|
child::book | Selects all book nodes that are children of the current node |
attribute::lang | Selects the language attribute (lang) of the current node |
child::* | Selects all children of the current node |
attribute::* | Selects all attributes of the current node |
child::text() | Selects all text node children of the current node |
child::node() | Selects all immediate children of the current node |
descendant::book | Selects all book descendants of the current node |
ancestor::book | Selects all book ancestors of the current node |
ancestor-or-self::book | Selects all book ancestors of the current node, and the current node itself if it is also a book |
child::*/child::price | Selects all price grandchildren of the current node |
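To make the three parts of a step concrete, here is a small, deliberately simplified splitter written in Python. It is an illustrative sketch, not a real XPath parser: the function name split_step is made up for this example, it only understands the unabbreviated axisname::nodetest[predicate] form, and it does not handle nested brackets or brackets inside string literals.

```python
import re

STEP = re.compile(
    r"^(?P<axis>[a-z-]+)::"           # axis name, e.g. child or ancestor-or-self
    r"(?P<test>[^\[\]]+)"             # node test, e.g. book, *, text()
    r"(?P<preds>(?:\[[^\]]*\])*)$"    # zero or more [predicate] parts
)

def split_step(step):
    """Split an unabbreviated location step into (axis, node test, predicates)."""
    match = STEP.match(step)
    if match is None:
        raise ValueError(f"not an unabbreviated location step: {step!r}")
    predicates = re.findall(r"\[([^\]]*)\]", match.group("preds"))
    return match.group("axis"), match.group("test"), predicates

simple = split_step("child::book")
text_step = split_step("child::text()")
filtered = split_step("child::book[price>35][1]")
```

Splitting a whole path on "/" and passing each piece to split_step would work for the examples in the table, but not for predicates that themselves contain slashes.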
7 XPath operators
An XPath expression returns a node-set, a string, a Boolean, or a number. Below is a list of operators used in XPath expressions:
Operator | Description | Example |
---|---|---|
| | Computes the union of two node-sets | //book | //cd |
+ | Addition | 6 + 4 |
- | Subtraction | 6 - 4 |
* | Multiplication | 6 * 4 |
div | Division | 8 div 4 |
= | Equality | price=9.80 |
!= | Inequality | price!=9.80 |
< | Less than | price<9.80 |
<= | Less than or equal to | price<=9.80 |
> | Greater than | price>9.80 |
>= | Greater than or equal to | price>=9.80 |
or | Or | price=9.80 or price=9.70 |
and | And | price>9.00 and price<9.90 |
mod | Remainder of the division | 5 mod 2 |
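Two of these operators behave slightly differently from their look-alikes in many programming languages: in XPath 1.0, div is floating-point division, and mod is the remainder of a truncating division, so the result takes the sign of the left operand. The Python sketch below mirrors that behaviour; the helper names xpath_div and xpath_mod are made up for this example, and unlike XPath the division helper raises an error on division by zero.

```python
import math

def xpath_div(a, b):
    """XPath 'div': floating-point division (8 div 4 is 2)."""
    return a / b

def xpath_mod(a, b):
    """XPath 'mod': remainder of truncating division (sign follows a)."""
    return math.fmod(a, b)

quotient = xpath_div(8, 4)
remainder = xpath_mod(5, 2)
negative_remainder = xpath_mod(-5, 2)   # differs from Python's -5 % 2
python_remainder = -5 % 2
```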
8 XPath examples
Let's walk through the basic XPath syntax with a few examples. We will use the following XML document "books.xml" in the examples below:
Loading an XML document
XMLHttpRequest, which all modern browsers support, is used to load XML documents:
var xmlhttp = new XMLHttpRequest();
Code for legacy Microsoft browsers (IE 5 and 6):
var xmlhttp = new ActiveXObject("Microsoft.XMLHTTP");
Node selection
Unfortunately, XPath may work differently in Internet Explorer than in other browsers. In our examples we will use code that should work in most browsers. Internet Explorer uses the selectNodes() method to select nodes in an XML document:
xmlDoc.selectNodes(xpath);
Firefox, Chrome, Opera and Safari use the evaluate() method to select nodes from an XML document:
xmlDoc.evaluate(xpath, xmlDoc, null, XPathResult.ANY_TYPE, null);
Select all titles
The following example selects all title nodes:
/bookstore/book/title
Select the title of the first book
The following example selects the title of the first "book" node under the "bookstore" element:
/bookstore/book[1]/title
Select all prices
The following example selects the text of all price nodes:
/bookstore/book/price
Select price nodes with price > 35
The following example selects all price nodes with a price above 35:
/bookstore/book[price>35]/price
Select title nodes with price > 35
The following example selects all title nodes of books with a price greater than 35:
/bookstore/book[price>35]/title
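For readers who want to try the same five selections outside a browser, here is a sketch in Python's standard xml.etree.ElementTree. The contents of books.xml are not shown above, so the document below is hypothetical. ElementTree cannot evaluate numeric comparisons such as [price>35], so that filter is done in Python.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for books.xml.
root = ET.fromstring("""<bookstore>
  <book><title lang="en">Everyday Italian</title><price>30.00</price></book>
  <book><title lang="en">Harry Potter</title><price>29.99</price></book>
  <book><title lang="en">XQuery Kick Start</title><price>49.99</price></book>
  <book><title lang="en">Learning XML</title><price>39.95</price></book>
</bookstore>""")

# /bookstore/book/title
all_titles = [t.text for t in root.findall("book/title")]

# /bookstore/book[1]/title
first_title = root.findtext("book[1]/title")

# /bookstore/book/price
all_prices = [p.text for p in root.findall("book/price")]

# [price>35] is not supported by ElementTree, so compare in Python instead.
expensive = [b for b in root.findall("book") if float(b.findtext("price")) > 35]
prices_over_35 = [b.findtext("price") for b in expensive]
titles_over_35 = [b.findtext("title") for b in expensive]
```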
XPath is a query language for the elements of an XML or XHTML document. Like SQL, XPath is declarative: to obtain the data you want, you only need to write a query that describes it, and the XPath interpreter does all the dirty work for you.
Very convenient, isn't it? Let's see what XPath offers for accessing the nodes of a web page.
Building queries for web page nodes
Here is a short lab exercise in which I demonstrate how to build XPath queries against a web page. You can repeat the queries I give and, more importantly, try your own. I hope this makes the article equally interesting to beginners and to programmers who already know XPath from working with XML.
For the lab we will need:
- an XHTML web page;
- the Mozilla Firefox browser with two add-ons:
  - Firebug;
  - FirePath;
  (any other browser with visual XPath support will also do)
- a little time.
As the web page for the experiment, I suggest the home page of the World Wide Web Consortium website ("http://w3.org"). This is the organization that develops the XQuery and XPath languages, the XHTML specification, and many other Internet standards.
Task
Using XPath queries, retrieve information about the consortium's conferences from the XHTML code of the w3.org home page.
Let's start writing XPath queries.
First XPath query
Open the FirePath tab in Firebug, pick the element to analyze with the selector, and click it: FirePath creates an XPath query for the selected element. If you selected the title of the first event, the query will look like this:
After removing the unnecessary indexes, the query matches all elements of the "title" type:
.//*[@id="w3c_home_upcoming_events"]/ul/li/div/p/a
FirePath highlights the elements that match the query, so you can see in real time which document nodes it selects.
Query for information about conference venues:
.//*[@id="w3c_home_upcoming_events"]/ul/li/div/p[2]
This is how we get a list of sponsors:
.//*[@id="w3c_home_upcoming_events"]/ul/li/div/p
XPath syntax
Let's go back to the queries we created and look at how they are structured.
Let's consider the first query in detail.
I have split this query into three parts to demonstrate the capabilities of XPath. (The division into parts is arbitrary.)
First part
.// is a recursive descent through zero or more levels of the hierarchy from the current context. In our case, the current context is the document root.
Second part
* matches any element, and [@id="w3c_home_upcoming_events"] is a predicate that restricts the match to a node whose id attribute equals "w3c_home_upcoming_events". XHTML element IDs must be unique, so the query "any element with this specific ID" should return exactly the node we are looking for.
In this query we can replace * with the exact node name, div:
div[@id="w3c_home_upcoming_events"]
Thus, we descend the document tree to the div[@id="w3c_home_upcoming_events"] node we need. We do not care at all which nodes the DOM tree consists of or how many levels of hierarchy lie above it.
Third part
/ul/li/div/p/a is an XPath path to a specific element. The path consists of addressing steps and node tests (ul, li, and so on). Steps are separated by the "/" (slash) character.
XPath collections
It is not always possible to reach the node of interest using a predicate or addressing steps alone. Very often there are many nodes of the same type at one level of the hierarchy, and you need to select "only the first" or "only the second" of them. Collections are provided for such cases.
XPath collections let you access an element by its index. The indexes follow the order in which the elements appear in the original document. Numbering in collections starts from one.
Since the "venue" is always the second paragraph after the "conference name", we get the following query:
.//*[@id="w3c_home_upcoming_events"]/ul/li/div/p[2]
Here p[2] is the second paragraph in the set for each node matched by /ul/li/div.
Similarly, we can get a list of sponsors with the request:
.//*[@id="w3c_home_upcoming_events"]/ul/li/div/p
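The link query and the venue query can be tried offline on a simplified page. The sketch below uses Python's standard xml.etree.ElementTree; the markup is a hypothetical, namespace-free stand-in for the w3.org events block (the real page structure, URLs and texts are not reproduced here), keeping only the id and the ul/li/div/p nesting used by the queries.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the events block of the page.
page = ET.fromstring("""<html><body>
<div id="w3c_home_upcoming_events"><ul>
  <li><div>
    <p><a href="http://example.org/conf-a">Conference A</a></p>
    <p>City A</p>
  </div></li>
  <li><div>
    <p><a href="http://example.org/conf-b">Conference B</a></p>
    <p>City B</p>
  </div></li>
</ul></div>
</body></html>""")

base = ".//*[@id='w3c_home_upcoming_events']/ul/li/div/p"

# .../p/a: every link inside a paragraph of every list item.
links = [(a.get("href"), a.text) for a in page.findall(base + "/a")]

# .../p[2]: the second paragraph of each div, which holds the venue here.
venues = [p.text for p in page.findall(base + "[2]")]
```

A real XHTML page declares a namespace, so with ElementTree the element names in the path would need a namespace prefix; the sketch sidesteps that by omitting the namespace.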
Some XPath functions
XPath has many functions for working with elements within a collection. I will give only a few of them.
last():
Returns the index of the last element of the collection, so [last()] selects that element.
The query ul/li/div/p[last()] returns the last paragraph for each item of the "ul" list.
There is no first() function. To access the first element, use the index "1".
text():
Matches the text content of an element.
.//a[text()="Archive"] selects all links with the text "Archive".
position() and mod:
position() returns the position of an element in a set.
mod is the remainder of a division.
By combining them we can get:
- odd elements: ul/li[position() mod 2 = 1]
- even elements: ul/li[position() mod 2 = 0]
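The odd/even selection can be reproduced in plain Python to check which items each expression picks. This is a sketch on a hypothetical five-item list; positions are counted from one, as in XPath.

```python
import xml.etree.ElementTree as ET

# Hypothetical list with five items.
ul = ET.fromstring("<ul><li>a</li><li>b</li><li>c</li><li>d</li><li>e</li></ul>")
items = ul.findall("li")

# ul/li[position() mod 2 = 1]: positions 1, 3, 5
odd = [li.text for pos, li in enumerate(items, start=1) if pos % 2 == 1]

# ul/li[position() mod 2 = 0]: positions 2, 4
even = [li.text for pos, li in enumerate(items, start=1) if pos % 2 == 0]
```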
Comparison Operations
- < - logical “less than”
- > - logical “greater than”
- <= - logical “less than or equal”
- >= - logical “greater than or equal”
Try it yourself
Try to get:
- the even-numbered links from the "Standards" menu on the left;
- the headlines of all news items except the first one on the w3.org home page.
XPath in PHP5
$dom = new DomDocument();
$dom->loadHTML($HTMLCode);
$xpath = new DomXPath($dom);
// single quotes inside the query keep the PHP string literal intact
$_res = $xpath->query(".//*[@id='w3c_home_upcoming_events']/ul/li/div/p/a");
foreach ($_res as $obj) {
    echo "URL: " . $obj->getAttribute("href");
    echo $obj->nodeValue;
}
Finally
Using a simple example, we have seen what XPath can do for accessing web page nodes. XPath is the industry standard for accessing elements of XML and XHTML documents and for XSLT transformations.
You can use it to parse any html page. If the source html code contains significant errors in the markup, run it through
Today we will take a closer look at using XPath with PHP. The examples show how XPath significantly reduces the amount of code. We will look at both queries and functions in XPath.
To begin, here are two documents, a DTD and an XML file, that we will use to see how PHP DOM XPath works. Here's what they look like:
Basic XPath queries
XPath's simple syntax lets you access elements in an XML document. In the simplest case, you specify the path to the desired element. Using the XML document provided above, the following XPath query returns the collection of all book elements found in the library element:
//library/book
That's it! The two leading slashes tell XPath to search from the document root for library elements at any depth, and the single slash steps down to the book child elements. Simple and fast, isn't it?
But what if you want to select specific book elements from the set? Let's assume you want the books by "Certain Author". The XPath query for this would be:
//library/book/author[text() = "Certain Author"]/..
You can use text() in square brackets to compare the node's value. The trailing "/.." means we want the parent element (i.e. go one node back up).
XPath queries are run using one of two methods: query() and evaluate(). Both execute the query; the difference is in the result they return. query() always returns a DOMNodeList, whereas evaluate() returns a typed result (such as a number or a string) whenever possible. For example, if your XPath query counts the books written by a particular author, query() returns an empty DOMNodeList, while evaluate() simply returns the number, which you can use directly instead of retrieving the data from a node.
XPath Code and Speed Benefits
Let's look at a simple example that returns the number of books written by a specific author. The first method does it the usual way, without XPath, so that you can see how much easier the same task becomes with XPath.
...
$elements = $this->domDocument->getElementsByTagName("author");
foreach ($elements as $element) {
    if ($element->nodeValue == $author) {
        $total++;
    }
}
return $total;
}
The next method returns the same result, but uses XPath to select only the books written by the specific author.
...
$xpath = new DOMXPath($this->domDocument);
$result = $xpath->query($query);
return $result->length;
}
Note that we no longer need to check the value of each element to determine which author wrote each book. But we can simplify the code further by using the XPath function count() to count the elements that match the path.
...
$xpath = new DOMXPath($this->domDocument);
return $xpath->evaluate($query);
}
We can get the information we need with a single-line XPath query. There is no need to write lots of PHP filtering code. This is the easiest and fastest way to write this functionality!
Note that evaluate() was used in the last example. This is because the count() function returns a number rather than a set of nodes. Using query() would return a DOMNodeList, but it would be empty.
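The contrast between the loop-based and the path-based approach can also be sketched outside PHP. The example below uses Python's standard xml.etree.ElementTree on a hypothetical library document (the titles are illustrative and the function names are made up); ElementTree has no count() function, so the length of the result list plays that role.

```python
import xml.etree.ElementTree as ET

# Hypothetical library document.
library = ET.fromstring("""<library>
  <book isbn="isbn1234"><title>Animal Farm</title><author>George Orwell</author></book>
  <book isbn="isbn1235"><title>Nineteen Eighty-Four</title><author>George Orwell</author></book>
  <book isbn="isbn1236"><title>Brave New World</title><author>Aldous Huxley</author></book>
</library>""")

def count_by_loop(author):
    """Walk every <author> element and compare its text by hand."""
    total = 0
    for element in library.iter("author"):
        if element.text == author:
            total += 1
    return total

def count_by_path(author):
    """Let the path predicate do the filtering, then count the matches.

    Assumes the author name contains no single quote.
    """
    return len(library.findall(f"book[author='{author}']"))

orwell_loop = count_by_loop("George Orwell")
orwell_path = count_by_path("George Orwell")
```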
XPath is worth using not only because it makes your PHP code simpler, but also because it can be faster. In my tests, the first version was on average 30% faster than the second, while the third was about 10% faster than the first. Of course, this depends on your server and on the queries you run. Using XPath in its pure form, as in the third version, gave the best results in both speed and ease of writing code.
XPath Functions
Here are a few functions that can be used with XPath. You'll also find plenty of resources that describe each available function in detail. If you need to count the items in a DOMNodeList or compare a nodeValue (node value), you can usually find a suitable XPath function that eliminates unnecessary PHP code.
You already know this from the example of the count() function. Let's use the id() function to get the titles of books with the given ISBNs. To do this you need to use the following XPath expression:
id("isbn1234 isbn1235")/title
Note that the values you are looking for are passed as a single quoted string and separated by spaces; there is no need for commas:
...
$xpath = new DOMXPath($this->domDocument);
$result = $xpath->query($query);
$books = array();
foreach ($result as $node) {
    $book = array("title" => $node->nodeValue);
    $books[] = $book;
}
return $books;
}
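The id() lookup can be approximated without a DTD-aware XPath engine. The sketch below uses Python's standard xml.etree.ElementTree, which has no id() function, so the lookup is emulated on the isbn attribute; the library document and the helper name titles_by_id are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical library document.
library = ET.fromstring("""<library>
  <book isbn="isbn1234"><title>Animal Farm</title><author>George Orwell</author></book>
  <book isbn="isbn1235"><title>Nineteen Eighty-Four</title><author>George Orwell</author></book>
  <book isbn="isbn1236"><title>Brave New World</title><author>Aldous Huxley</author></book>
</library>""")

def titles_by_id(id_list):
    """Emulate id("a b")/title: ids are whitespace-separated, no commas."""
    wanted = set(id_list.split())
    # Walk the books in document order and keep those whose isbn was requested.
    return [
        book.findtext("title")
        for book in library.iter("book")
        if book.get("isbn") in wanted
    ]

found = titles_by_id("isbn1234 isbn1235")
```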
Handling complex functions in XPath is incredibly simple.
Using PHP functions with XPath
Sometimes you will need more functionality that the standard XPath functions cannot provide. Fortunately, the PHP DOM allows native PHP functions to interact with XPath queries.
Let's look at an example that returns the number of words in a book title. Written the straightforward way, the function looks like this:
...
$xpath = new DOMXPath($this->domDocument);
$result = $xpath->query($query);
$title = $result->item(0)->getElementsByTagName("title")->item(0)->nodeValue;
return str_word_count($title);
}
But we can also include the str_word_count() function directly in the XPath query. This takes a few steps. First of all, we need to register a namespace with the XPath object. PHP functions in XPath queries are called using the string "php:functionString", followed by the name of the desired function. The namespace URI must be http://php.net/xpath; other namespace values will throw an error. After this we need to call registerPHPFunctions(). This method tells PHP that calls made through the "php:" namespace are to be handled by PHP.
An example syntax for calling functions would be:
php:functionString("nameoffunction", arg, arg...)
Let's put it all together in the following getNumberOfWords() function example:
...
$xpath = new DOMXPath($this->domDocument);
// register the php namespace
$xpath->registerNamespace("php", "http://php.net/xpath");
// now PHP functions can be called in XPath queries
$xpath->registerPHPFunctions();
// single quotes inside the query keep the PHP string literal intact
$query = "php:functionString('str_word_count', (//library/book[@isbn = '$isbn']/title))";
return $xpath->evaluate($query);
}
Note that you don't need to call the XPath function text() to get the node's text. The registerPHPFunctions() method makes this automatic. Although, the following example line of code will also be valid:
php:functionString("str_word_count",(//library/book[@isbn = "$isbn"]/title))
Registering PHP functions is not limited to functions that are included in PHP. You can define your own functions and use them inside XPath. The only difference is that you will have to use "php:function" instead of "php:functionString".
Let's write a function outside of the class to demonstrate the basic functionality. The function we will use returns the books by the author "George Orwell". It should return true for each node you want to include in the query.
... ->nodeValue == "George Orwell"; }
The argument that is passed to the function is an array of DOM elements. This function goes through the array and determines the necessary elements, and then includes them in the DOMNodeList. In this example, the node being tested was /book, and we also used /author to determine the required elements.
Now we can create the getGeorgeOrwellBooks() function:
...
$xpath = new DOMXPath($this->domDocument);
$xpath->registerNamespace("php", "http://php.net/xpath");
$xpath->registerPHPFunctions();
$query = "//library/book[php:function('compare', .)]";
$result = $xpath->query($query);
$books = array();
foreach ($result as $node) {
    $books[] = $node->getElementsByTagName("title")->item(0)->nodeValue;
}
return $books;
}
If the compare() function is static, then you need to amend the XPath query:
//library/book
To be honest, all this functionality could have been implemented using pure XPath code. But the example shows how you can expand XPath queries and make them more complex.
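The idea of plugging your own function into a selection can be sketched in Python as well. ElementTree cannot call Python functions from inside a path, so in this sketch the callback is applied to the nodes after the path has selected them; the library document and the names is_orwell and select_titles are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical library document.
library = ET.fromstring("""<library>
  <book><title>Animal Farm</title><author>George Orwell</author></book>
  <book><title>Brave New World</title><author>Aldous Huxley</author></book>
  <book><title>Nineteen Eighty-Four</title><author>George Orwell</author></book>
</library>""")

def is_orwell(book):
    """Return True for each book node that should stay in the result."""
    return book.findtext("author") == "George Orwell"

def select_titles(predicate):
    """Select all books by path, then keep those the callback accepts."""
    return [
        book.findtext("title")
        for book in library.findall("book")
        if predicate(book)
    ]

orwell_titles = select_titles(is_orwell)
```

As the text notes for the PHP version, this particular filter could be written as a plain path predicate; the callback form pays off only when the test cannot be expressed in the path language.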
In conclusion
XPath is a great way to reduce the amount of code and speed up processing when working with XML. The additional PHP DOM functionality lets you extend XPath with your own functions. This is really useful: once you start using it and dig into the details, you will find yourself writing less and less code.
XPath uses path expressions to select a node or a set of nodes in an XML document. Nodes are selected by following a path or steps.
Sample XML document
We will use this XML document in the examples below.
Selecting nodes
XPath uses path expressions to select nodes in an XML document. A node is selected by following a path or steps. The most useful path expressions are listed below:
The table below shows some path expressions and their results:
Path expression | Result |
---|---|
bookstore | Selects all child elements of the current node named "bookstore". |
/bookstore | Selects the root element bookstore. Note: if a path begins with a slash (/), it always represents an absolute path to an element! |
bookstore/book | Selects all book elements that are children of bookstore. |
//book | Selects all book elements, regardless of their position in the document. |
bookstore//book | Selects all book elements that are descendants of the bookstore element, regardless of where they are under bookstore. |
//@lang | Selects all attributes named lang. |
Predicates
A predicate is used to find a specific node or a node that contains a specific value.
Predicates are enclosed in square brackets.
In the table below, we have listed some path expressions with predicates and their results:
Path expression | Result |
---|---|
/bookstore/book[1] | Selects the first book element that is a child of the bookstore element. |
/bookstore/book[last()] | Selects the last book element that is a child of the bookstore element. |
/bookstore/book[last()-1] | Selects the penultimate book element that is a child of the bookstore element. |
/bookstore/book[position()<3] | Selects the first two book elements that are children of the bookstore element. |
//title[@lang] | Selects all title elements that have an attribute named lang. |
//title[@lang="eng"] | Selects all title elements that have a lang attribute with the value "eng". |
/bookstore/book[price>35.00] | Selects all book elements of the bookstore element whose price element has a value greater than 35.00. |
/bookstore/book[price>35.00]/title | Selects all title elements of the book elements of the bookstore element whose price element has a value greater than 35.00. |
Selecting unknown nodes
XPath wildcards can be used to select unknown XML elements.
In the table below, we have listed some path expressions and their results:
Selecting multiple paths
By using the "|" operator in a path expression, you can select several paths.
In the table below, we have listed some path expressions and their results.