Skip to content

What is a YAML File?

  • Data

The abbreviation YAML is either “Yet Another Markup Language” or “YAML Ain’t Markup Language”, depending on the source. The language is used to serialize data, i.e. to convert structured data into a sequence. Whereby the first name is not really valid, because it is not a Markup Language.

The so-called markup languages, such as HTML or XML, use markers to structure textual information. In HTML, for example, there is a fixed set of markers written in angle brackets (<>). These include, for example, headings (<h1>, <h2>, <h3>, …) or paragraphs (<p>), which are used to build up the structure of a website.

Das Bild zeigt die Konsole in Google Chrome mit dem Quellcode der Seite www.databasecamp.de.
HTML-Code of Data Basecamp | Source: Author

How does the syntax of YAML work?

This file format uses functions from various programming languages, such as Perl, C, or HTML. At the same time, it is a so-called superset of JSON, which means that JSON files can be used in YAML without significant problems.

A YAML file starts with three hyphens (“—“) to indicate that the file is starting. It can have either the structure of a list or a map. In our example, we show a map with fixed key-value pairs, which are similar to Python dictionaries. Each key may occur only once and the order of the pairs is not specified:

---
name: Boris Miller
telephone: 0152 48912348
age: 28

In addition, there are other syntax rules, such as:

  • Comments: YAML supports both single-line and multi-line comments. Single-line comments start with the hash symbol (#) and continue to the end of the line. Multiline comments start with a carriage return followed by a hash symbol (#) and continue to the end of the block.
  • Indentation: The file format uses indentations to define the structure of the data. Each indentation level represents a hierarchy level. Indentation can be done with either spaces or tabs but must be consistent throughout the file.
  • Scalars: Scalars are single values in YAML, such as strings, numbers, and Boolean values. They can be represented in various syntaxes, such as plain scalars, scalars in quotes, or folded scalars.
  • Collections: YAML supports two types of collections: Sequences and Mappings. Sequences are ordered lists of values, while mappings are unordered key-value pairs.
  • Anchors and aliases: YAML supports anchors and aliases to reference the same object multiple times in a file. An anchor is defined with the ampersand (&), an alias with the asterisk (*).

What Data Types does the File Format support?

In a “Yet Another Markup Language” file the common data types can be stored and saved:

  • Boolean (True/False)
  • Integer (e.g. 5 or 2304)
  • Floating Point (decimal numbers, e.g. 2.48 or 1983.4335)
  • String
  • Null values

What do you use YAML for?

YAML is a flexible, human-readable format for data serialization and is used in a variety of applications in different domains. One of the most popular uses is configuration files, which define settings for software applications and systems. YAML configuration files are used by many popular tools such as Ansible, Docker Compose, Kubernetes, and Helm.

Another application is in the area of data processing and analysis. YAML can be used to define data pipelines that specify the flow of data and the operations to be performed on the data. Apache Airflow is an example of a tool that uses this file type to define data pipelines.

YAML is also widely used in web development to define structured data, such as API specifications and Swagger documentation. It is popular for these use cases due to its readability and ease of parsing by programming languages.

OpenAPI Definition in YAML | Screenshot from Swagger

In addition, the file format can also be used in database applications, where it is used to define the structure and schema of the database, as well as to import and export data. Tools such as the Python-based ORM SQLAlchemy or the document-oriented NoSQL database MongoDB use YAML files for database configuration and schema definition.

Overall, YAML’s flexibility and readability make it suitable for a wide range of applications in different domains. Its use continues to increase and it can be assumed that it will continue to be a popular format for serialization in the coming years.

What are the advantages of using YAML files?

YAML is a human-readable data serialization language commonly used for configuration files. The following are some of the advantages of using this file format:

  • Readability: YAML is designed to be easy for humans to read and write. The syntax is intuitive to understand due to indentation and simple structures.
  • Flexible: It is a flexible format that can be used to represent a variety of data types such as lists, dictionaries, and scalar values. This makes it suitable for use in a variety of contexts, from simple configuration files to more complex data structures.
  • Integration: The file format is supported by many programming languages and frameworks, making it easy to integrate with other tools in the tech stack. For example, YAML is often used in conjunction with Docker and Kubernetes for container orchestration.
  • Version control: Files can be easily managed with version control systems like Git. This allows you to track changes over time, collaborate with others, and revert to previous versions as needed.
  • Portability: Since it is a text-based format, it can be easily transferred between different systems and platforms. This makes it a good choice for sharing configuration files and other data between teams and organizations.

Overall, the advantages of YAML make it a popular choice for configuration files and other use cases where readability and flexibility are important.

YAML and JSON in Comparison

YAML and JSON are two popular formats for serializing data used in modern software development. However, besides some similarities, there are also some differences between the two formats.

One of the main differences between them is the syntax. YAML uses indentations and spaces to structure the data, while JSON uses curly braces and parentheses. This makes the former file format more readable and easier to write than JSON, which can be lengthy and difficult to read.

Another difference between the file formats is the supported data types. YAML supports a wider range of data types, including strings, numbers, dates, and even custom data types. In contrast, JSON supports only basic data types such as strings, numbers, and Boolean values.

In terms of extensibility, YAML provides better support for custom tags and metadata. This makes it easier to create and use custom data types, which is essential for some applications.

One advantage of JSON is better cross-language compatibility. It is supported by almost all programming languages, while YAML support is more limited. In addition, JSON is generally faster to parse and serialize.

In terms of readability, YAML is often perceived as more natural than JSON due to its space-based syntax. The latter, on the other hand, can be more difficult to read and write due to its use of parentheses and curly lines.

In summary, the two data types have some similarities, but also distinct differences that make them more suitable for different applications. YAML is more readable and supports a wider range of data types, while JSON has better cross-language compatibility and faster parsing times. Ultimately, the decision depends on the specific requirements of the project at hand.

How to write and process YAML files in Python?

With the help of Python, you can very easily write and open these files. There is even its own library for processing:

# Import modules
import yaml as yml

# open the example file
with open('details.yml') as f:

    data = yml.load(f, Loader=yml.FullLoader)
    print(data)

Out:
{"name": "Boris Miller", "telephone": "0152 48912348", "age": 28}

With the help of the library, our read sample file is directly converted into a Python Dictionary and is then available for further processing. Just as easily, Python dictionaries can be saved in “Yet Another Markup Language” files:

dict_file = [
{'cars' : ['A6', 'A180', 'M3', 'A4', 'X5']},
{'brands' : ['Audi', 'Mercedes', 'BMW', 'Audi', 'BMW']}
]

# Create new file and dump data into it
with open(data.yml', 'w') as file:  
    data = yml.dump(dict_file, file)

This is what you should take with you

  • YAML is the abbreviation for “Yet Another Markup Language” and is mainly used for data serialization.
  • Besides data serialization, it is also used for configuration files of software.
  • It is very similar to JSON in its use and many other properties and is therefore also compatible.
RESTful API

What is a RESTful API?

Learn all about RESTful APIs and how they can make your web development projects more efficient and scalable.

Time Series Data / Zeitreihendaten

What is Time Series Data?

Unlock insights from time series data with analysis and forecasting techniques. Discover trends and patterns for informed decision-making.

Balkendiagramm / Bar Chart

What is a Bar Chart?

Discover the power of bar charts in data visualization. Learn how to create, customize, and interpret bar charts for insightful data analysis.

Liniendiagramm / Line Chart

What is a Line Chart?

Master the art of line charts: Learn how to visualize trends and patterns in your data with our comprehensive guide.

Data Preprocessing

What is Data Preprocessing?

Streamline your data analysis with effective data preprocessing techniques. Learn the essentials in our guide to data preprocessing.

Kreisdiagramm / Pie Chart

What is a Pie Chart?

Visualize data proportions with pie charts: an intuitive and effective way to understand relative distribution.

  • You can find more information about Yet Another Markup Language on the official website.
Das Logo zeigt einen weißen Hintergrund den Namen "Data Basecamp" mit blauer Schrift. Im rechten unteren Eck wird eine Bergsilhouette in Blau gezeigt.

Don't miss new articles!

We do not send spam! Read everything in our Privacy Policy.

Cookie Consent with Real Cookie Banner