Transforming XML: XML Pipelines
XML transformation is the process of converting XML data from one format to another, typically for presentation or for integration with other systems. The output can be HTML, CSV, XML with a different structure, or another text format.
These transformations enable us to select, filter, sort, and reorganize XML data according to specific requirements. This is crucial for extracting relevant information from large XML documents.
Transformations also allow XML data to be converted into other formats, making it more versatile and usable across different systems and applications.
XML is designed primarily for storing and transporting structured data, not for presentation. Transformations let us convert it into more reader-friendly formats such as HTML for web display.
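As a small illustration of "XML in, another format out", the sketch below uses only Python's standard library to select, filter, and sort a made-up catalogue and write it out as CSV. The sample document, its element names, and the to_csv helper are assumptions for illustration; they are not part of the concert data used later in this article.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample data, invented for this sketch.
SAMPLE = """
<catalogue>
  <book year="2001"><title>Beta</title><price>30</price></book>
  <book year="1995"><title>Alpha</title><price>12</price></book>
  <book year="2010"><title>Gamma</title><price>8</price></book>
</catalogue>
"""

def to_csv(xml_text, min_price):
    root = ET.fromstring(xml_text)
    # Select every <book>, then filter on price.
    books = [b for b in root.findall("book")
             if float(b.findtext("price")) >= min_price]
    # Sort by title, then flatten each element into one CSV row.
    books.sort(key=lambda b: b.findtext("title"))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["title", "year", "price"])
    for b in books:
        writer.writerow([b.findtext("title"), b.get("year"), b.findtext("price")])
    return out.getvalue()

print(to_csv(SAMPLE, min_price=10))
```

XSLT and XQuery, introduced next, express the same select, filter, sort, and reorganise steps declaratively instead of in hand-written loops.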
The two primary languages for performing XML transformations are:
XSLT (eXtensible Stylesheet Language Transformations):
XSLT is designed specifically for transforming XML documents into other formats like HTML, XML, or plain text.
It operates based on a set of rules defined in an XSLT stylesheet, which specifies how elements and attributes in the XML document should be transformed into the desired output format.
XSLT uses XPath to navigate through the XML structure and apply transformation rules defined in templates.
XQuery:
XQuery is a query and functional programming language designed for querying and manipulating XML data.
It can also be used for transforming XML data into different formats similar to XSLT, but it is more geared towards querying and extracting data from XML documents.
XQuery plays a role for XML similar to the one SQL plays for relational data: its FLWOR expressions (for, let, where, order by, return) support complex querying, filtering, sorting, and transformation of XML data.
Both XSLT and XQuery are powerful tools in XML processing:
XSLT is best suited for tasks where the primary goal is to transform XML into structured formats like HTML, often used in web development and document generation.
XQuery, on the other hand, is used more for querying XML data and extracting specific information, often integrated into XML database systems and used in scenarios where data extraction and transformation are needed.
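To make the idea of template rules driven by XPath concrete, here is a minimal sketch that runs a two-rule XSLT 1.0 stylesheet with lxml. The list and item elements and the name attribute are made up for this example; only etree.fromstring and etree.XSLT come from lxml itself.

```python
from lxml import etree

XSL = b"""<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="text"/>
  <!-- Rule 1: at the document root, process every <item> inside <list>. -->
  <xsl:template match="/">
    <xsl:apply-templates select="list/item"/>
  </xsl:template>
  <!-- Rule 2: each <item> becomes one line of the form "name: value". -->
  <xsl:template match="item">
    <xsl:value-of select="@name"/>
    <xsl:text>: </xsl:text>
    <xsl:value-of select="."/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>"""

# Hypothetical input document for this sketch.
XML = b"<list><item name='a'>1</item><item name='b'>2</item></list>"

transform = etree.XSLT(etree.fromstring(XSL))
result = str(transform(etree.fromstring(XML)))
print(result)
```

The match attributes decide which rule fires for which node, and the select attributes are XPath expressions evaluated relative to the node being processed.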
Transforming XML Data to HTML Using XSLT and Flask
A Practical Guide
Let's look at a practical example: using XSLT to transform sample XML from the New York Philharmonic Orchestra's concert archive, which is available in their GitHub repository (nyphilarchive/PerformanceHistory).
In this example, I'll guide you through setting up a Python Flask project, installing and configuring the necessary tools, and writing the XSLT code to display concert information in a user-friendly HTML format. By the end of this tutorial, you'll be able to extend and customise your application to explore and present complex XML data in various formats, making it an essential skill for working with XML and web technologies.
This tutorial is designed for macOS, but students using Windows or Linux can follow along with minimal adjustments.
Start by opening your terminal and creating a new project directory for the application:
mkdir concerts && cd concerts
Next, create and activate a virtual environment:
python -m venv venv
source venv/bin/activate
# On Windows, use `venv\Scripts\activate`
Install Flask and lxml
First make sure you are using the latest version of pip, then install the necessary packages:
pip install --upgrade pip
pip install Flask lxml
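One quick way to confirm that both packages are importable from the active virtual environment is a short check like the one below. It is only a sketch: it prints the installed versions, plus the libxslt version lxml is using, which is the engine that will run the XSLT stylesheet later on.

```python
from importlib.metadata import version

from lxml import etree

# Installed package versions as reported by pip's metadata.
flask_version = version("flask")
lxml_version = version("lxml")
print("Flask", flask_version)
print("lxml", lxml_version)

# lxml delegates XSLT 1.0 processing to libxslt; this is its version tuple.
print("libxslt", etree.LIBXSLT_VERSION)
```

If either import fails, the virtual environment is probably not activated in the current terminal.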
Create Directory Structure and Download XML Data
Now create two subdirectories called templates and data, where your XSLT file and XML data will be stored:
mkdir templates data
Next, download the XML file from the New York Philharmonic Orchestra archive. We'll put this file in our data folder:
curl -o data/concerts.xml https://raw.githubusercontent.com/nyphilarchive/PerformanceHistory/master/Programs/xml/1842-43_TO_1910-11.xml
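Before writing a stylesheet, it can help to look at which elements a file actually contains. The sketch below counts the program elements and lists the child tags of the first one. The inline SAMPLE is a made-up fragment that only mirrors the element names used later in this tutorial, and summarise is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Made-up fragment mirroring the element names the stylesheet relies on.
SAMPLE = """<programs>
  <program>
    <season>1842-43</season>
    <orchestra>New York Philharmonic</orchestra>
    <concertInfo>
      <Venue>Apollo Rooms</Venue>
      <Date>1842-12-07T05:00:00Z</Date>
      <Time>8:00PM</Time>
    </concertInfo>
  </program>
</programs>"""

def summarise(root):
    """Return (number of <program> elements, child tag names of the first one)."""
    programs = root.findall(".//program")
    first_tags = [child.tag for child in programs[0]] if programs else []
    return len(programs), first_tags

print(summarise(ET.fromstring(SAMPLE)))
```

To run the same check on the downloaded file, call summarise(ET.parse("data/concerts.xml").getroot()) from the concerts directory.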
Create the XSLT File
Next, create a new file called table.xsl inside the templates directory:
touch templates/table.xsl
Add the following content to table.xsl. This XSL (Extensible Stylesheet Language) file transforms an XML document into an HTML document; the comments explain each part:
<?xml version="1.0" encoding="UTF-8"?>
<!-- The XML declaration defines the XML version and the character encoding used. -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <!-- This is the root element of the XSLT stylesheet.
       It declares the XSL namespace and specifies that this is version 1.0 of XSLT. -->
  <xsl:output method="html" indent="yes" />
  <!-- This element defines the output method as HTML and specifies that the output should be indented for readability. -->
  <xsl:template match="/">
    <!-- This template matches the root node of the XML document.
         It is the starting point for the transformation. -->
    <html>
      <head>
        <title>NY Philharmonic Concerts</title>
        <!-- The title of the HTML document. -->
      </head>
      <body>
        <h1>NY Philharmonic Concerts</h1>
        <!-- The main heading of the HTML document. -->
        <table border="1">
          <!-- Creates an HTML table with a border. -->
          <tr>
            <th>Season</th>
            <th>Orchestra</th>
            <th>Date</th>
            <th>Venue</th>
            <th>Time</th>
            <!-- Table headers for the concert data. -->
          </tr>
          <xsl:for-each select="//program">
            <!-- Iterates over each <program> element in the XML document. -->
            <tr>
              <!-- Creates a new table row for each <program> element. -->
              <td><xsl:value-of select="season"/></td>
              <!-- Adds a table cell with the value of the <season> element. -->
              <td><xsl:value-of select="orchestra"/></td>
              <!-- Adds a table cell with the value of the <orchestra> element. -->
              <td><xsl:value-of select="concertInfo/Date"/></td>
              <!-- Adds a table cell with the value of the <Date> element inside <concertInfo>. -->
              <td><xsl:value-of select="concertInfo/Venue"/></td>
              <!-- Adds a table cell with the value of the <Venue> element inside <concertInfo>. -->
              <td><xsl:value-of select="concertInfo/Time"/></td>
              <!-- Adds a table cell with the value of the <Time> element inside <concertInfo>. -->
            </tr>
          </xsl:for-each>
        </table>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
Summary of the XSLT File
The provided XSLT file transforms XML data related to concerts into an HTML table format. Here's a detailed breakdown of its components and functionality:
XML Declaration:
<?xml version="1.0" encoding="UTF-8"?>
- Specifies the version of XML and the character encoding used.
XSLT Stylesheet Root Element:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
Declares the document as an XSLT stylesheet.
Defines the XSL namespace and specifies XSLT version 1.0.
Output Method Declaration:
<xsl:output method="html" indent="yes" />
Specifies that the output should be in HTML format.
Indicates that the output should be indented for better readability.
Template for Root Node:
<xsl:template match="/">
Matches the root node of the XML document.
Serves as the starting point for the transformation.
HTML Document Structure:
<html> <head> <title>NY Philharmonic Concerts</title> </head> <body> <h1>NY Philharmonic Concerts</h1> <table border="1"> <tr> <th>Season</th> <th>Orchestra</th> <th>Date</th> <th>Venue</th> <th>Time</th> </tr> ... </table> </body> </html>
Constructs the basic HTML structure, including the <html>, <head>, and <body> tags.
Sets the document title to "NY Philharmonic Concerts".
Adds a main heading (<h1>) with the same title.
Creates an HTML table with a border.
Table Headers:
<tr> <th>Season</th> <th>Orchestra</th> <th>Date</th> <th>Venue</th> <th>Time</th> </tr>
- Defines the headers for the table columns: Season, Orchestra, Date, Venue, and Time.
Iteration Over XML Data:
<xsl:for-each select="//program">
- Iterates over each <program> element in the XML document.
Table Rows for Each Program:
<tr> <td><xsl:value-of select="season"/></td> <td><xsl:value-of select="orchestra"/></td> <td><xsl:value-of select="concertInfo/Date"/></td> <td><xsl:value-of select="concertInfo/Venue"/></td> <td><xsl:value-of select="concertInfo/Time"/></td> </tr>
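For each <program> element:
Creates a new table row (<tr>).
Adds table cells (<td>) for the values of season, orchestra, concertInfo/Date, concertInfo/Venue, and concertInfo/Time.
Uses <xsl:value-of> to extract and insert the text content of the specified XML elements. In XSLT 1.0, <xsl:value-of> outputs only the first node when a path matches several, so a program with more than one <concertInfo> shows just its first date, venue, and time.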
This XSLT file is designed to take an XML document containing concert information and transform it into a well-structured HTML table. It processes each <program> element in the XML, extracting relevant details such as season, orchestra, date, venue, and time, and displaying them in a tabular format on a web page. This setup ensures that concert data is presented clearly and accessibly to users.
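Before wiring the stylesheet into Flask, the for-each and value-of logic can be checked in isolation. The sketch below runs a trimmed-down copy of the stylesheet (two columns only) against a made-up two-program sample and collects the resulting table cells. Both the trimmed stylesheet and the sample data are assumptions for illustration, not the real files.

```python
from lxml import etree

# Trimmed copy of the stylesheet logic: one row per <program>, two columns.
XSL = b"""<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="html" indent="yes"/>
  <xsl:template match="/">
    <table>
      <xsl:for-each select="//program">
        <tr>
          <td><xsl:value-of select="season"/></td>
          <td><xsl:value-of select="concertInfo/Venue"/></td>
        </tr>
      </xsl:for-each>
    </table>
  </xsl:template>
</xsl:stylesheet>"""

# Made-up sample with the same element names as the concert data.
SAMPLE = b"""<programs>
  <program><season>1842-43</season><concertInfo><Venue>Apollo Rooms</Venue></concertInfo></program>
  <program><season>1843-44</season><concertInfo><Venue>Apollo Rooms</Venue></concertInfo></program>
</programs>"""

transform = etree.XSLT(etree.fromstring(XSL))
result = transform(etree.fromstring(SAMPLE))

# Walk the result tree: one list of cell texts per generated <tr>.
rows = result.getroot().findall(".//tr")
cells = [[td.text for td in row] for row in rows]
print(cells)
```

Each inner list corresponds to one program, which makes it easy to see whether a changed XPath expression still picks up the intended values.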
Create the Flask Application
Create a file named app.py in the root of your concerts directory:
touch app.py
Add the following content to app.py:
from flask import Flask, render_template_string
import lxml.etree as ET

# Create a Flask application instance
app = Flask(__name__)

# Define the route for the root URL
@app.route('/')
def home():
    # Load the XML and XSL files
    xml_path = 'data/concerts.xml'
    xsl_path = 'templates/table.xsl'

    # Parse the XML and XSL files
    xml_tree = ET.parse(xml_path)
    xsl_tree = ET.parse(xsl_path)

    # Create an XSLT transformer
    transform = ET.XSLT(xsl_tree)

    # Apply the transformation
    result_tree = transform(xml_tree)

    # Render the result as a string and return as the HTTP response
    return render_template_string(str(result_tree))

# Run the Flask application
if __name__ == '__main__':
    app.run(debug=True, port=8088)
Explanation of app.py
Imports:
from flask import Flask, render_template_string
import lxml.etree as ET
Flask and render_template_string are imported from the flask module to create the web server and render the HTML response.
lxml.etree is imported as ET to handle XML parsing and XSLT transformations.
Flask Application Instance:
app = Flask(__name__)
- An instance of the Flask application is created.
Route Definition:
@app.route('/')
def home():
- The home function is defined to handle requests to the root URL (/).
Load and Parse XML and XSL Files:
xml_path = 'data/concerts.xml'
xsl_path = 'templates/table.xsl'
xml_tree = ET.parse(xml_path)
xsl_tree = ET.parse(xsl_path)
The paths to the XML and XSL files are defined.
ET.parse is used to parse the XML and XSL files into tree structures.
Create XSLT Transformer and Apply Transformation:
transform = ET.XSLT(xsl_tree)
result_tree = transform(xml_tree)
An XSLT transformer is created using the parsed XSL tree.
The transformation is applied to the XML tree, resulting in a new tree structure (result_tree).
Render and Return the Result:
return render_template_string(str(result_tree))
- The transformed result is converted to a string and rendered as the HTTP response using render_template_string.
Run the Flask Application:
if __name__ == '__main__':
    app.run(debug=True, port=8088)
The Flask application is started, listening on port 8088 with debug mode enabled.
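One possible variation, sketched below, addresses two side effects of the version above: the stylesheet is re-parsed on every request, and render_template_string passes the generated HTML through Jinja, so any {{ or {% sequence in the data would be treated as template syntax. The create_app factory name and its parameters are assumptions for this sketch; Flask, Response, ET.parse, and ET.XSLT are the same library calls used in app.py.

```python
from flask import Flask, Response
import lxml.etree as ET

def create_app(xml_path="data/concerts.xml", xsl_path="templates/table.xsl"):
    """Hypothetical factory: build the app around one compiled stylesheet."""
    app = Flask(__name__)

    # Compile the stylesheet once at startup rather than on every request.
    transform = ET.XSLT(ET.parse(xsl_path))

    @app.route("/")
    def home():
        result_tree = transform(ET.parse(xml_path))
        # Return the serialized HTML directly, without the Jinja template engine.
        return Response(str(result_tree), mimetype="text/html")

    return app
```

With this layout, calling create_app().run(debug=True, port=8088) from the concerts directory would start the same server as before. Taking the file paths as parameters also makes it straightforward to point the app at a small test file.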
Run Your Flask Application
Make sure you are in the concerts directory with the virtual environment activated, then start the app:
python app.py
Finally, open your web browser and go to http://localhost:8088 to see the transformed HTML.
You should see the NY Philharmonic concerts listed in tabular format.
Transforming XML Data to HTML Using XQuery and Flask
Below is a tutorial for creating a Flask application that uses XQuery to transform XML data and display it as an HTML table. The example assumes the same XML structure as before, with details about concerts by the NY Philharmonic.
As before, create a project directory, this time called flask_xquery, activate a virtual environment, and pip install Flask plus the XML library imported at the top of app.py below. Your project directory should look something like this:
flask_xquery/
├── app.py
├── static/
│   └── styles.css
└── data/
    └── concerts.xml
Place your concert data in the data/concerts.xml file.
Create the Flask Application
Create the app.py file to define the route and return the generated HTML. Instead of using a separate .xq file, we will embed the XQuery logic directly in the Flask route.
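from pathlib import Path

from flask import Flask, Response
# lxml supports XPath and XSLT 1.0 but does not implement XQuery, so this
# version assumes the saxonche package (Saxon-HE for Python) is installed:
#   pip install saxonche
from saxonche import PySaxonProcessor

app = Flask(__name__)

@app.route('/')
def index():
    # Define the XQuery directly within Python
    xquery = '''
    xquery version "3.1";
    declare namespace output = "http://www.w3.org/2010/xslt-xquery-serialization";
    declare option output:method "html";

    <html>
      <head>
        <title>NY Philharmonic Concerts</title>
      </head>
      <body>
        <h1>NY Philharmonic Concerts</h1>
        <table border="1">
          <tr>
            <th>Season</th>
            <th>Orchestra</th>
            <th>Date</th>
            <th>Venue</th>
            <th>Time</th>
          </tr>
          {
            for $program in //program
            return
              <tr>
                <td>{ data($program/season) }</td>
                <td>{ data($program/orchestra) }</td>
                <td>{ data($program/concertInfo/Date) }</td>
                <td>{ data($program/concertInfo/Venue) }</td>
                <td>{ data($program/concertInfo/Time) }</td>
              </tr>
          }
        </table>
      </body>
    </html>
    '''

    # Run the query with concerts.xml as the context item, so that
    # //program is evaluated against that document
    xml_path = Path(__file__).parent / 'data' / 'concerts.xml'
    with PySaxonProcessor(license=False) as proc:
        xquery_processor = proc.new_xquery_processor()
        xquery_processor.set_context(file_name=str(xml_path))
        xquery_processor.set_query_content(xquery)
        html_result = xquery_processor.run_query_to_string()

    # Return the serialized HTML as the HTTP response
    return Response(html_result, mimetype='text/html')

if __name__ == '__main__':
    app.run(debug=True)
XQuery Definition: The XQuery is defined as a string (xquery) directly within the index() function. It queries concerts.xml for concert details and formats them into an HTML table. The data() calls insert each element's text value rather than the element itself, so the cells contain plain text.
Execution: lxml does not implement XQuery, so the query string is compiled and run by Saxon through the saxonche package, with concerts.xml supplied as the context item. Treat this as a sketch that assumes the saxonche API as used above (PySaxonProcessor, new_xquery_processor, set_context, set_query_content, run_query_to_string). The resulting HTML is returned as a string (html_result) in the Flask response.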
Run the Flask Application
Run your Flask application from the terminal:
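python app.py
Then open http://localhost:5000 in your browser. This app.py does not set a port, so Flask uses its default of 5000.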
Written by
Pedram Badakhchani