What is a YAML File?

The abbreviation YAML is either “Yet Another Markup Language” or “YAML Ain’t Markup Language”, depending on the source. The language is used to serialize data, i.e. to convert structured data into a sequence. Whereby the first name is not really valid, because it is not a Markup Language.

The so-called markup languages, such as HTML or XML, use markers to structure textual information. In HTML, for example, there is a fixed set of markers written in angle brackets (<>). These include, for example, headings (<h1>, <h2>, <h3>, …) or paragraphs (<p>), which are used to build up the structure of a website.

Das Bild zeigt die Konsole in Google Chrome mit dem Quellcode der Seite www.databasecamp.de. — HTML-Code of Data Basecamp | Source: Author

How does the syntax of YAML work?

This file format uses functions from various programming languages, such as Perl, C, or HTML. At the same time, it is a so-called superset of JSON, which means that JSON files can be used in YAML without significant problems.

A YAML file starts with three hyphens (“—“) to indicate that the file is starting. It can have either the structure of a list or a map. In our example, we show a map with fixed key-value pairs, which are similar to Python dictionaries. Each key may occur only once and the order of the pairs is not specified:

---
name: Boris Miller
telephone: 0152 48912348
age: 28

In addition, there are other syntax rules, such as:

Comments: YAML supports both single-line and multi-line comments. Single-line comments start with the hash symbol (#) and continue to the end of the line. Multiline comments start with a carriage return followed by a hash symbol (#) and continue to the end of the block.
Indentation: The file format uses indentations to define the structure of the data. Each indentation level represents a hierarchy level. Indentation can be done with either spaces or tabs but must be consistent throughout the file.
Scalars: Scalars are single values in YAML, such as strings, numbers, and Boolean values. They can be represented in various syntaxes, such as plain scalars, scalars in quotes, or folded scalars.
Collections: YAML supports two types of collections: Sequences and Mappings. Sequences are ordered lists of values, while mappings are unordered key-value pairs.
Anchors and aliases: YAML supports anchors and aliases to reference the same object multiple times in a file. An anchor is defined with the ampersand (&), an alias with the asterisk (*).

What Data Types does the File Format support?

In a “Yet Another Markup Language” file the common data types can be stored and saved:

Boolean (True/False)
Integer (e.g. 5 or 2304)
Floating Point (decimal numbers, e.g. 2.48 or 1983.4335)
String
Null values

What do you use YAML for?

YAML is a flexible, human-readable format for data serialization and is used in a variety of applications in different domains. One of the most popular uses is configuration files, which define settings for software applications and systems. YAML configuration files are used by many popular tools such as Ansible, Docker Compose, Kubernetes, and Helm.

Another application is in the area of data processing and analysis. YAML can be used to define data pipelines that specify the flow of data and the operations to be performed on the data. Apache Airflow is an example of a tool that uses this file type to define data pipelines.

YAML is also widely used in web development to define structured data, such as API specifications and Swagger documentation. It is popular for these use cases due to its readability and ease of parsing by programming languages.

OpenAPI Definition in YAML | Screenshot from Swagger

In addition, the file format can also be used in database applications, where it is used to define the structure and schema of the database, as well as to import and export data. Tools such as the Python-based ORM SQLAlchemy or the document-oriented NoSQL database MongoDB use YAML files for database configuration and schema definition.

Overall, YAML’s flexibility and readability make it suitable for a wide range of applications in different domains. Its use continues to increase and it can be assumed that it will continue to be a popular format for serialization in the coming years.

What are the advantages of using YAML files?

YAML is a human-readable data serialization language commonly used for configuration files. The following are some of the advantages of using this file format:

Readability: YAML is designed to be easy for humans to read and write. The syntax is intuitive to understand due to indentation and simple structures.
Flexible: It is a flexible format that can be used to represent a variety of data types such as lists, dictionaries, and scalar values. This makes it suitable for use in a variety of contexts, from simple configuration files to more complex data structures.
Integration: The file format is supported by many programming languages and frameworks, making it easy to integrate with other tools in the tech stack. For example, YAML is often used in conjunction with Docker and Kubernetes for container orchestration.
Version control: Files can be easily managed with version control systems like Git. This allows you to track changes over time, collaborate with others, and revert to previous versions as needed.
Portability: Since it is a text-based format, it can be easily transferred between different systems and platforms. This makes it a good choice for sharing configuration files and other data between teams and organizations.

Overall, the advantages of YAML make it a popular choice for configuration files and other use cases where readability and flexibility are important.

YAML and JSON in Comparison

YAML and JSON are two popular formats for serializing data used in modern software development. However, besides some similarities, there are also some differences between the two formats.

One of the main differences between them is the syntax. YAML uses indentations and spaces to structure the data, while JSON uses curly braces and parentheses. This makes the former file format more readable and easier to write than JSON, which can be lengthy and difficult to read.

Another difference between the file formats is the supported data types. YAML supports a wider range of data types, including strings, numbers, dates, and even custom data types. In contrast, JSON supports only basic data types such as strings, numbers, and Boolean values.

In terms of extensibility, YAML provides better support for custom tags and metadata. This makes it easier to create and use custom data types, which is essential for some applications.

One advantage of JSON is better cross-language compatibility. It is supported by almost all programming languages, while YAML support is more limited. In addition, JSON is generally faster to parse and serialize.

In terms of readability, YAML is often perceived as more natural than JSON due to its space-based syntax. The latter, on the other hand, can be more difficult to read and write due to its use of parentheses and curly lines.

In summary, the two data types have some similarities, but also distinct differences that make them more suitable for different applications. YAML is more readable and supports a wider range of data types, while JSON has better cross-language compatibility and faster parsing times. Ultimately, the decision depends on the specific requirements of the project at hand.

How to write and process YAML files in Python?

With the help of Python, you can very easily write and open these files. There is even its own library for processing:

# Import modules
import yaml as yml

# open the example file
with open('details.yml') as f:

    data = yml.load(f, Loader=yml.FullLoader)
    print(data)

Out:
{"name": "Boris Miller", "telephone": "0152 48912348", "age": 28}

With the help of the library, our read sample file is directly converted into a Python Dictionary and is then available for further processing. Just as easily, Python dictionaries can be saved in “Yet Another Markup Language” files:

dict_file = [
{'cars' : ['A6', 'A180', 'M3', 'A4', 'X5']},
{'brands' : ['Audi', 'Mercedes', 'BMW', 'Audi', 'BMW']}
]

# Create new file and dump data into it
with open(data.yml', 'w') as file:  
    data = yml.dump(dict_file, file)

This is what you should take with you

YAML is the abbreviation for “Yet Another Markup Language” and is mainly used for data serialization.
Besides data serialization, it is also used for configuration files of software.
It is very similar to JSON in its use and many other properties and is therefore also compatible.

Univariate Analysis / Univariate Analyse

What is the Univariate Analysis?

22. March 2025

Master Univariate Analysis: Dive Deep into Data with Visualization, and Python - Learn from In-Depth Examples and Hands-On Code.

What is OpenAPI?

15. February 2025

Explore OpenAPI: A Comprehensive Guide to Building and Consuming RESTful APIs. Learn How to Design, Document, and Test APIs.

What is Data Governance?

25. January 2025

Ensure the quality, availability, and integrity of your organization's data through effective data governance. Learn more here.

What is Data Quality?

18. January 2025

Ensuring Data Quality: Importance, Challenges, and Best Practices. Learn how to maintain high-quality data to drive better business decisions.

What is Data Imputation?

11. January 2025

Impute missing values with data imputation techniques. Optimize data quality and learn more about the techniques and importance.

What is Outlier Detection?

23. November 2024

Discover hidden anomalies in your data with advanced outlier detection techniques. Improve decision-making and uncover valuable insights.

You can find more information about Yet Another Markup Language on the official website.

Niklas Lang

I have been working as a machine learning engineer and software developer since 2020 and am passionate about the world of data, algorithms and software development. In addition to my work in the field, I teach at several German universities, including the IU International University of Applied Sciences and the Baden-Württemberg Cooperative State University, in the fields of data science, mathematics and business analytics.

My goal is to present complex topics such as statistics and machine learning in a way that makes them not only understandable, but also exciting and tangible. I combine practical experience from industry with sound theoretical foundations to prepare my students in the best possible way for the challenges of the data world.