Manipulating XML with Python - Guru Software (2024)

XML (eXtensible Markup Language) is a popular format for storing and transporting data. With Python's built-in tools and libraries, working with XML files is straightforward. In this comprehensive guide, we'll explore how to parse, read, write, and modify XML using Python.

Parsing XML Files in Python

To start manipulating XML files, we first need to load and parse the content. Python's xml package provides several APIs for this purpose. The most commonly used ones are xml.etree.ElementTree and xml.dom.minidom. Let's look at examples of using both to parse an XML file.

Using xml.etree.ElementTree

ElementTree provides a simple way to parse and create XML data. Here's how to use it:

    import xml.etree.ElementTree as ET

    tree = ET.parse('data.xml')
    root = tree.getroot()

This loads the XML file, parses it and returns the root element. We can now traverse the tree to access elements and extract data.

For example, to print all item names:

    for elem in root.findall('item'):
        name = elem.get('name')
        print(name)

ElementTree also makes it easy to manipulate the XML structure by adding, updating and removing elements.
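As a minimal sketch of the traversal described above, the following parses an inline document instead of data.xml so that it runs on its own. The catalog/item layout, the name attribute values, and the prices are assumptions made for illustration, since the guide does not show the contents of data.xml.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the guide's data.xml is not shown.
SAMPLE = """<catalog>
    <item name="keyboard">49.90</item>
    <item name="mouse">19.90</item>
</catalog>"""

root = ET.fromstring(SAMPLE)  # fromstring() returns the root element directly

# Attributes are read with get(); element text is available as .text
items = {elem.get("name"): float(elem.text) for elem in root.findall("item")}
print(items)
```

Note that findall('item') only looks at direct children of root; a path such as './/item' would be needed to reach items nested deeper.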

Using xml.dom.minidom

xml.dom.minidom uses the DOM (Document Object Model) approach to represent XML as a tree structure in memory for easy traversal and manipulation.

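Here's an example that parses the XML and prints some basic details:

    import xml.dom.minidom as md

    doc = md.parse('data.xml')
    print(doc.nodeName)                  # the document node, always '#document'
    print(doc.documentElement.tagName)   # the root element's tag name

This prints the node name of the document node and the tag name of the root element. documentElement is safer than firstChild here, because the first child of a document may be a comment or doctype node. We can also fetch all elements with a given tag name using doc.getElementsByTagName().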

In summary, ElementTree offers a compact, Pythonic API, while minidom exposes the W3C DOM interface, which is more verbose but allows closely examining and manipulating every node of an XML document.
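To make the comparison concrete, here is a small sketch that reads the same value with both APIs. The team/employee/name document is a hypothetical example, not the guide's data.xml.

```python
import xml.dom.minidom as md
import xml.etree.ElementTree as ET

# Hypothetical one-line document used for both parsers
SAMPLE = '<team><employee id="1"><name>Ada</name></employee></team>'

# ElementTree: text is an attribute of the element itself
et_root = ET.fromstring(SAMPLE)
et_name = et_root.find("employee/name").text

# minidom: text lives in a separate child text node
dom = md.parseString(SAMPLE)
dom_name = dom.getElementsByTagName("name")[0].firstChild.data

print(et_name, dom_name)
```

Both lines retrieve the same string; the difference is mostly in how much of the node structure you have to spell out.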

Reading XML Data in Python

Now let's discuss different ways to extract or read data from parsed XML using Python.

Accessing Elements

ElementTree's getroot() returns a single element, while minidom's getElementsByTagName() returns a list of element nodes. In both cases, each element has attributes and methods for exploring its properties. With minidom:

    for employee in doc.getElementsByTagName('employee'):
        emp_id = employee.getAttribute('id')
        name = employee.getElementsByTagName('name')[0].childNodes[0].nodeValue
        role = employee.getElementsByTagName('role')[0].childNodes[0].nodeValue
        print(f'ID: {emp_id}, Name: {name}, Role: {role}')

This loops through the employee elements, reads each id attribute, and extracts the inner text of the name and role tags to print the employee details.
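Indexing with [0].childNodes[0] fails with an IndexError when a tag is missing or empty. One possible way to guard against that is a small helper; child_text is a hypothetical name introduced here, and the sample document is an assumption.

```python
import xml.dom.minidom as md

def child_text(parent, tag, default=""):
    """Return the text of the first <tag> descendant of parent, or default."""
    matches = parent.getElementsByTagName(tag)
    if not matches:
        return default
    # Join all text nodes; an empty element has no child nodes at all
    parts = [n.data for n in matches[0].childNodes if n.nodeType == n.TEXT_NODE]
    return "".join(parts).strip() or default

# Hypothetical document with an empty <role> and no <email>
doc = md.parseString(
    '<team><employee id="7"><name>Ada</name><role></role></employee></team>'
)
employee = doc.getElementsByTagName("employee")[0]

name = child_text(employee, "name")
role = child_text(employee, "role", "unknown")
email = child_text(employee, "email", "n/a")
print(name, role, email)
```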

Finding Elements

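With ElementTree, we can search for elements by tag name, attribute, or text value using .find() and .findall(), which accept a limited subset of XPath:

    # Get the first expertise element whose text is "SQL" (Python 3.7+)
    sql_skill = root.find(".//expertise[.='SQL']")

    # Get all expertise elements
    skills = root.findall('.//expertise')

Here .// searches recursively from the current node, and the square brackets hold the matching condition. ElementTree does not support the XPath text() function, so the element's own text is matched with [.='SQL'] instead.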
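ElementTree supports only a subset of XPath predicates. The sketch below exercises the predicate forms documented for the standard library, against a hypothetical team document (the element names and values are assumptions for illustration).

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data
SAMPLE = """<team>
    <employee id="1"><name>Ada</name><expertise>SQL</expertise></employee>
    <employee id="2"><name>Linus</name><expertise>Testing</expertise></employee>
</team>"""
root = ET.fromstring(SAMPLE)

by_attr = root.find(".//employee[@id='2']")           # attribute equals value
by_child = root.find(".//employee[expertise='SQL']")  # child text equals value
by_text = root.find(".//expertise[.='SQL']")          # own text (Python 3.7+)
second = root.find("employee[2]")                     # 1-based position

print(by_attr.find("name").text, by_child.get("id"), by_text.text, second.get("id"))
```

find() returns None when nothing matches, so real code should check the result before calling methods on it.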

XPath Queries

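For more complex queries, full XPath expressions can be used. The standard library does not implement full XPath, and minidom documents have no xpath() method, so this example uses the third-party lxml package (pip install lxml):

    from lxml import etree

    tree = etree.parse('data.xml')
    # text() results come back as plain strings
    for name in tree.xpath('//employee/name/text()'):
        print(name)

This prints the name of every employee element by evaluating the XPath query.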

So in short, we have multiple options to search or extract any data from XML using Python.

Modifying and Writing XML

We often need to make changes by adding, updating or removing elements and attributes in XML files. Python has good support for this as well with xml.etree.ElementTree and xml.dom.minidom.

Let's look at examples of modifying the parsed XML tree and also writing it back to a file.

Updating Elements

To update some data:

    for expertise in doc.getElementsByTagName('expertise'):
        if expertise.getAttribute('name') == 'Testing':
            expertise.setAttribute('name', 'Software Testing')

    print(doc.toxml())

Here we change the name attribute from "Testing" to "Software Testing" on every matching expertise element, then print the updated XML.

Adding Elements

We can also add new elements to the XML:

    new_expertise = doc.createElement('expertise')
    new_expertise.setAttribute('name', 'Big Data')
    doc.documentElement.appendChild(new_expertise)

This appends a new expertise node as the last child of the root element.
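The update and add steps can also be done with ElementTree. The following is a sketch on a hypothetical skills document, assuming the same name attribute as the minidom examples.

```python
import xml.etree.ElementTree as ET

# Hypothetical starting document
root = ET.fromstring('<skills><expertise name="Testing"/></skills>')

# Update: set() overwrites an attribute value in place
for expertise in root.iter("expertise"):
    if expertise.get("name") == "Testing":
        expertise.set("name", "Software Testing")

# Add: SubElement() creates the child and attaches it in one call
ET.SubElement(root, "expertise", name="Big Data")

names = [e.get("name") for e in root.findall("expertise")]
print(names)
```

SubElement() saves the separate create-then-append step that the DOM API requires.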

Deleting Nodes

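Similarly, unwanted elements can be removed. With ElementTree, remove() must be called on the element's direct parent, so we walk the tree and check each parent's children:

    for parent in root.iter():
        for child in list(parent):  # copy, since we modify parent while looping
            if child.tag == 'expertise' and child.text == 'SQL':
                parent.remove(child)

This finds every expertise element whose text is "SQL" and deletes it from its parent.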

Writing XML Files

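Finally, we can write the updated minidom document back to a file:

    with open('new_data.xml', 'w', encoding='utf-8') as f:
        f.write(doc.toprettyxml(indent='  '))

This writes indented XML to new_data.xml. If the source file was already indented, toprettyxml() can add extra blank lines, because the existing whitespace is kept as text nodes. An ElementTree tree is saved with tree.write('new_data.xml') instead.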
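For trees built with ElementTree, a comparable sketch is shown below. It assumes Python 3.9 or later for ET.indent(), and writes to an in-memory buffer in place of a real file so that it runs without touching disk.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical tree to serialize
root = ET.fromstring("<skills><expertise>SQL</expertise></skills>")
tree = ET.ElementTree(root)

ET.indent(tree, space="  ")  # in-place pretty-printing, Python 3.9+

buffer = io.BytesIO()  # stands in for open('new_data.xml', 'wb')
tree.write(buffer, encoding="utf-8", xml_declaration=True)
output = buffer.getvalue().decode("utf-8")
print(output)
```

Passing a filename to tree.write() instead of a buffer writes the same bytes to disk.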

So Python gives lots of flexibility to modify XML and persist changes.

Parsing Large XML Files

When dealing with large XML data, loading fully into memory with parse() may not be feasible.

We can use streaming APIs like iterparse() to handle such cases. It generates events while parsing and we handle each element separately.

Here's an example:

    import xml.etree.ElementTree as ET

    # Iterate through elements as the parser encounters them
    for event, elem in ET.iterparse('large_data.xml', events=('start', 'end')):
        if event == 'start' and elem.tag == 'item':
            # Attributes are available at 'start'; text and children are not yet
            print(f'Reading item {elem.get("id")}')
        if event == 'end':
            elem.clear()  # discard the element's contents once fully parsed

This minimizes memory usage by clearing each element once it has been fully parsed. Note that iterparse() only reads; it does not write XML.
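When the element's text or children are needed, it is safer to do the work on the end event, since only then is the element guaranteed to be complete. Below is a sketch that sums a value per item; the items/item/price layout and the in-memory stand-in for large_data.xml are assumptions for illustration.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical in-memory stand-in for large_data.xml
data = io.BytesIO(b"<items>" + b"".join(
    b'<item id="%d"><price>%d</price></item>' % (i, i * 10) for i in range(1, 4)
) + b"</items>")

total = 0
for event, elem in ET.iterparse(data, events=("end",)):
    if elem.tag == "item":
        # At the 'end' event the element's children and text are complete
        total += int(elem.findtext("price"))
        elem.clear()  # free the subtree we no longer need

print(total)
```

Only item elements are cleared here, so each price is still intact when its parent item is processed.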

XML Schema Validation

We may want to validate XML against an XSD schema file to ensure integrity. The standard library has no XSD validator; the third-party xmlschema package (pip install xmlschema) makes validation simple:

    import xmlschema

    xml_doc = 'data.xml'
    xsd_doc = 'schema.xsd'
    xmlschema.validate(xml_doc, xsd_doc)

This raises xmlschema.XMLSchemaValidationError if the XML structure does not conform to the schema. We can catch it to take corrective action.
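Schema validation presupposes a well-formed document. As a standard-library-only sketch of that preliminary check (not a substitute for XSD validation), a parse attempt can be wrapped as follows; is_well_formed is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if xml_text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        # Raised for mismatched tags, missing root element, bad syntax, etc.
        return False
    return True

good = is_well_formed("<a><b/></a>")
bad = is_well_formed("<a><b></a>")  # <b> is never closed
print(good, bad)
```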

Conclusion

In this guide, we explored various ways to parse, read, modify and write XML using Python's standard library modules, mainly xml.etree.ElementTree and xml.dom.minidom.

Key points are:

  • Use ElementTree for simple XML parsing and processing
  • Prefer minidom for advanced DOM based manipulation
  • Extract data via element attributes, tag names, XPath queries etc.
  • Update XML tree in memory and write changes back to files
  • Employ streaming for large XML data
  • Validate against schema for catching errors

Python makes XML handling quite straightforward. Hope you find the techniques useful for your applications. Let me know if you have any other XML usage tricks with Python.

Author: Dong Thiel
