Skip to content

What is an XML-File?

  • Data

XML stands for Extensible Markup Language and is used today as a text-based data format for the exchange of structured data. It was originally developed to replace HTML because it had reached its limits in terms of data technology.

How are XML files structured?

The so-called markup languages, such as HTML or XML, use markup to structure textual information. In HTML there is a fixed set of markers, which are defined in angle brackets (<>). These include, for example, headings (<h1>, <h2>,<h3>, …) or paragraphs (<p>), which are used to build up the structure of a website:

<!DOCTYPE html>
<html>
<head>
<title>Title Text</title>
</head>
<body>

<h1>Main Header</h1>
<p>The first paragraph</p>

</body>
</html>

For example, this is the structure of a very simple website. The markups define the structure of the page, which currently consists only of a heading and a short paragraph.

XML uses the same structure with the difference that the amount of markups is not limited and their namings are freely selectable. This makes it relatively easy to emulate many types of data structures.

The only rules given by Extensible Markup Language are that markups must always start with an opening tag <markup_1> and end with a closing tag </markup_1>. The only exception is a tag with this form <markup_1/>.

To display nested information, several new tags can also be defined within an open tag. Here is an example:

# Car XML File
<car>
    <brand>Audi</brand>
    <model>A6</model>
    <construction_year>1997</construction_year>
</car>

An additional functionality are the so-called parameters. These can be defined for each markup and must always contain a name and a value. The value must be defined in quotes, even if it is a number.

In our example, the introduction of parameters is useful when we want to describe multiple cars in a collection:

# Car XML File
<car id="1">
    <brand>Audi</brand>
    <model>A6</model>
    <construction_year>1997</construction_year>
</car>
<car id="2">
    <brand>Mercedes-Benz</brand>
    <model>A-class</model>
    <construction_year>2015</construction_year>
</car>

Advantages and Disadvantages of the Extensible Markup Language

The following advantages are offered by the use of Extensible Markup Language files:

  • Wide distribution and thus high compatibility with already existing applications
  • High security of the files
  • Easy recovery of information due to the readability of the text files
  • Easy interpretation by man and machine
  • Simple structure and layout, so that it can be quickly understood by many users
  • Extensibility in the form of “dialects”

The only real drawback to this long list of positives comes from the textual format that the Extensible Markup Language uses. Textual information can only be stored with comparatively more memory and can therefore lead to lower performance in processing. Binary file formats, such as BSON, require significantly less storage space for the same information, but are not human readable because the information is stored in zeros and ones.

Which Applications use the Extensible Markup Language?

Due to the text-based storage of the Extensible Markup Language, the format is relatively easy to read and understand. This is why it is used for a wide variety of applications. One of the most common use cases is data exchange, i.e. importing and exporting data in applications.

In addition, there are few general uses of the Extensible Markup Language, as most use cases have created a variation of XML that is specific to their application. For example, there is the Mathematical Markup Language (MathML), which is a dialect of XML and is used to correctly represent mathematical equations and terms.

How to edit XML Files in Python?

There are several ways and modules to open XML files in Python. We will try to use a string consisting of our previous example. We can either try to preserve the original structure:

# Example
string = """
<car id="1"> 
    <brand>Audi</brand> 
    <model>A6</model> 
    <construction_year>1997</construction_year> 
</car> 
"""

# Import Modules
import xml.etree.ElementTree as ET
from xml.etree.ElementTree import fromstring
tree = ET.ElementTree(ET.fromstring(string))
root = tree.getroot()

# Get elements 
print(root[0])

Out: 
<Element 'brand' at 0x00000229449A54F8>

On the other hand, we can also try to convert the structure of the Extensible Markup Language into a Python dictionary. This is much easier for many developers who work a lot with Python:

# Import Module
import xmltodict

print(xmltodict.parse(xml_string))

Out: 
{'car': {'@id': '1',
  'brand': 'Audi',
  'model': 'A6',
  'construction_year': '1997'}}

This is what you should take with you

  • XML stands for Extensible Markup Language and is used today as a text-based data format for the exchange of structured data.
  • Due to the text-based storage of the Extensible Markup Language, the format is relatively easy to read and understand.
  • Among other things, the format has the advantage of being adaptable to different use cases through dialects.

Other Articles on the Topic of XML

close
Das Logo zeigt einen weißen Hintergrund den Namen "Data Basecamp" mit blauer Schrift. Im rechten unteren Eck wird eine Bergsilhouette in Blau gezeigt.

Don't miss new articles!

We do not send spam! Read everything in our Privacy Policy.

Cookie Consent with Real Cookie Banner