The abbreviation ACID (Atomicity, Consistency, Isolation, Durability) is a term from database theory and describes rules and procedures for database transactions. If a system adheres to the ACID properties, its data remains reliable and consistent. In the course of this article, we also look at the CAP theorem, which concerns distributed systems in general rather than databases alone. We also highlight the advantages that result when the ACID properties are satisfied.
What are the basic ACID principles?
Classical relational databases fulfill the four ACID properties. Together they express the most important requirement for a database: the data must remain correct and meaningful. In many cases, data stores serve as a “single point of truth”, so it would be fatal if erroneous information were stored and passed on. The four properties are:
- Atomicity (A): A transaction, e.g. inserting a new data record or deleting an old one, is executed either completely or not at all. Its changes become visible to other users only once it has been completed.
- Consistency (C): Every transaction moves the database from one consistent state to another consistent state, i.e. all defined rules, such as primary keys, foreign keys, and other constraints, still hold after it completes.
- Isolation (I): When multiple transactions run concurrently, they must not interfere with each other: the final state must be the same as if the transactions had been executed one after the other. In other words, heavy parallel load must not lead to incorrect results.
- Durability (D): Once a transaction has been completed (committed), its changes are permanent. They must survive later failures such as power outages or system crashes, and they must not be lost or altered unintentionally, for example by a software update.
The Basic Principles Using an Example
The most common example used to illustrate the components of ACID is a bank transfer, in which money is moved from one account to another. The goal, of course, is to ensure that all transfers are correct and that every customer has exactly the amount of money in their account to which they are entitled.
Assume that a transfer from account A to account B takes place. Atomicity means that the transaction is either executed completely or fails completely. In our example, if account A has already been debited and a system failure occurs before account B is credited, the debit is rolled back and the money is returned to account A. If this did not happen, money would have disappeared and the system would be in an incorrect state.
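The rollback behavior for the transfer from A to B can be sketched with Python's built-in sqlite3 module. This is a minimal sketch, not a production design. The `accounts` table, the `transfer` helper, and the `fail_after_debit` flag, which simulates a crash between debit and credit, are hypothetical. Using the connection as a context manager (`with conn:`) commits the transaction when the block finishes normally and rolls it back if an exception escapes.

```python
import sqlite3


def make_bank() -> sqlite3.Connection:
    """Create an in-memory bank with two accounts (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
    conn.execute("INSERT INTO accounts VALUES ('A', 1000), ('B', 0)")
    conn.commit()
    return conn


def transfer(conn, src, dst, amount, fail_after_debit=False):
    """Move money from src to dst as one transaction (hypothetical helper)."""
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
        if fail_after_debit:
            # Simulated system failure after the debit but before the credit.
            raise RuntimeError("simulated system failure")
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))


def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))


conn = make_bank()
transfer(conn, "A", "B", 100)  # succeeds and is committed
try:
    transfer(conn, "A", "B", 50, fail_after_debit=True)
except RuntimeError:
    pass  # the debit of 50 was rolled back, no money was destroyed
print(balances(conn))  # {'A': 900, 'B': 100}
```

The failed transfer leaves no trace: the total amount of money in the bank is still 1000.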
For consistency, the database must still be in a consistent state after every transaction, i.e. it must not contain any conflicting data. Suppose our example bank maintains a table of all accounts and their current balances. In this table, the account number is the primary key, so each account number may occur only once in the database. If a faulty transaction would create a second record for an existing account number, the data would be inconsistent, so the transaction is rejected and rolled back.
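The primary key rule can be tried out with sqlite3 as well. This minimal sketch assumes a hypothetical `accounts` table and a hypothetical `open_accounts` helper that inserts several accounts in one transaction. The second row reuses an existing account number, so the database raises an `IntegrityError`. Because the whole transaction is rolled back, the valid first row is discarded as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: the account number is the primary key.
conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
with conn:
    conn.execute("INSERT INTO accounts VALUES ('A', 1000), ('B', 0)")


def open_accounts(conn, rows):
    """Insert several accounts in one transaction: all or nothing."""
    with conn:  # rolls back every insert if one of them fails
        conn.executemany("INSERT INTO accounts VALUES (?, ?)", rows)


try:
    # 'C' is new, but 'A' already exists, so the primary key is violated.
    open_accounts(conn, [("C", 10), ("A", 5)])
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)

# The transaction was rolled back, so 'C' was not created either.
print(conn.execute("SELECT id FROM accounts ORDER BY id").fetchall())  # [('A',), ('B',)]
```

Note that SQLite by default only undoes the failing statement; it is the `with conn:` block that rolls back the rest of the transaction.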
Isolation states that several transactions running in parallel must not lead to different results than if they had been executed individually, one after the other. If a bank has to process 100 transfers simultaneously during peak times, it must ensure that the balances of the affected accounts end up exactly as they would have if the transfers had been processed one after another.
Finally, durability requires that the bank can guarantee that completed transfers are not lost or impaired by external influences. These include, for example, power failures, system crashes, or restarts for software updates.
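A simplified illustration of durability, assuming SQLite on a temporary file. Closing a connection and opening a new one stands in for a crash and restart. This is only a gentle approximation: real crash safety depends on how the database flushes its log to disk, which SQLite controls via its journal mode and the `synchronous` pragma. The function name `durability_demo` is hypothetical. The committed transfer survives the "restart", while a change that was never committed is discarded.

```python
import os
import sqlite3
import tempfile


def durability_demo(path):
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
    conn.execute("INSERT INTO accounts VALUES ('A', 1000), ('B', 0)")
    conn.commit()

    # A committed transfer: it must survive a restart.
    conn.execute("UPDATE accounts SET balance = balance - 100 WHERE id = 'A'")
    conn.execute("UPDATE accounts SET balance = balance + 100 WHERE id = 'B'")
    conn.commit()

    # An uncommitted change, then the process "stops": close() does not commit.
    conn.execute("UPDATE accounts SET balance = 0 WHERE id = 'A'")
    conn.close()

    # "Restart": a fresh connection sees only committed data.
    conn = sqlite3.connect(path)
    result = dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))
    conn.close()
    return result


with tempfile.TemporaryDirectory() as tmp:
    print(durability_demo(os.path.join(tmp, "bank.db")))  # {'A': 900, 'B': 100}
```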
What are the Benefits of ACID?
In practice, databases that comply with the ACID principles offer several advantages. These include:
- Several people can work on the database at the same time without concurrent changes corrupting the data.
- Database users and developers can rely on the database to keep the data consistent and do not have to handle half-completed operations themselves.
- Failed transactions are rolled back automatically, so no manual cleanup of partially written data is needed.
Do NoSQL Databases fulfill the ACID Properties?
Many NoSQL solutions do not guarantee all ACID properties, although there are exceptions, such as graph databases that comply with all of them. NoSQL databases are often distributed across multiple machines and servers. This allows much larger amounts of data to be stored and processed in parallel, which is a key requirement for these systems. However, keeping every copy of the data consistent at all times across many machines is expensive, so such systems often relax the consistency property.
Suppose we have implemented a NoSQL database on two physical servers, one in Germany and the other in the USA. The databases contain the account balances and transactions of German and American customers. The German accounts are stored in Germany and the American accounts are stored on the American server.
Now suppose a German customer transfers money to an American account. Both data stores have to be changed, and until both changes are complete, they are inconsistent with each other. For example, a database query may run after the processing in Germany has finished but before the processing in the USA has finished. During this time window, the so-called “inconsistency window”, the data in the database is incorrect. In a relational database running on a single server, atomicity and isolation would hide this intermediate state from other queries.
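The inconsistency window can be made visible with a toy simulation in plain Python. Two dictionaries stand in for the German and the American server, and the account numbers are hypothetical. The transfer is modeled as a generator that pauses between the two local updates, so a query can run in between. No real NoSQL system or replication protocol is modeled here.

```python
def cross_border_transfer(germany, usa, src, dst, amount):
    """Two separate local updates on two servers, with no global transaction."""
    germany[src] -= amount
    yield "debited in Germany"  # the inconsistency window opens here
    usa[dst] += amount
    yield "credited in the USA"  # both servers agree again


def total(germany, usa):
    return sum(germany.values()) + sum(usa.values())


germany = {"DE-001": 1000}  # hypothetical German account
usa = {"US-001": 0}  # hypothetical American account

observed = []
for step in cross_border_transfer(germany, usa, "DE-001", "US-001", 100):
    # A query that runs between the steps sees the intermediate state.
    observed.append((step, total(germany, usa)))

print(observed)
# [('debited in Germany', 900), ('credited in the USA', 1000)]
```

During the window, a query reports a total of 900 instead of 1000, exactly the incorrect state described above.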
What is the CAP Theorem?
The CAP theorem describes three properties of distributed data stores that can never all be guaranteed at the same time. CAP is an abbreviation for the terms “Consistency”, “Availability” and “Partition Tolerance”. The theorem applies to any database that is distributed across multiple machines, which includes most NoSQL databases. For classic relational databases running on a single server, on the other hand, the ACID principles are the usual frame of reference.
In essence, CAP consists of the following three properties:
- Consistency means that every node returns the same, up-to-date data at all times. No matter which node is queried, there must be no discrepancies when retrieving the data. In practical terms, when a new data record is inserted, it must be visible on all nodes as soon as the insert is confirmed.
- Availability means that the distributed system always provides a response, even if individual nodes have failed. Every request to a node that is still running receives an answer, regardless of which node one addresses, so the system as a whole remains continuously available.
- Partition tolerance describes the ability of the system to keep working even if communication between the nodes fails, i.e. messages between them are lost or delayed and the network is split into parts that cannot reach each other.
It has been formally proven that a distributed system cannot guarantee all three properties simultaneously. The CAP theorem therefore states that, when building a distributed database, one must limit oneself to two of the properties and give up the third. In practice, network failures can never be ruled out completely, so the decision usually comes down to consistency or availability for as long as a partition lasts.
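The trade-off can be illustrated with a deliberately simplified toy model of a two-node cluster. The `Cluster` class, its `mode` values "CP" and "AP", and the `Unavailable` exception are hypothetical and do not correspond to any real database. During a partition, the CP variant rejects writes to keep both replicas identical, giving up availability. The AP variant accepts the write locally and lets the replicas diverge, giving up consistency.

```python
from dataclasses import dataclass, field


class Unavailable(Exception):
    """Raised when the hypothetical CP cluster refuses a request during a partition."""


@dataclass
class Cluster:
    mode: str  # "CP" or "AP"
    replicas: list = field(default_factory=lambda: [{}, {}])  # two nodes
    partitioned: bool = False  # True: the nodes cannot reach each other

    def write(self, node, key, value):
        if not self.partitioned:
            for replica in self.replicas:  # synchronous replication to every node
                replica[key] = value
            return
        if self.mode == "CP":
            # Consistency first: refuse rather than let the replicas diverge.
            raise Unavailable("other replica unreachable, write rejected")
        # Availability first: accept locally; the replicas now disagree.
        self.replicas[node][key] = value

    def read(self, node, key):
        return self.replicas[node].get(key)


cp, ap = Cluster("CP"), Cluster("AP")
for c in (cp, ap):
    c.write(0, "balance", 100)
    c.partitioned = True

try:
    cp.write(0, "balance", 50)
except Unavailable as exc:
    print("CP:", exc)
print("CP reads:", cp.read(0, "balance"), cp.read(1, "balance"))  # 100 100

ap.write(0, "balance", 50)
print("AP reads:", ap.read(0, "balance"), ap.read(1, "balance"))  # 50 100
```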
ACID vs. CAP Theorem
In short, the CAP theorem and the ACID properties differ in that CAP deals with distributed systems, whereas ACID makes statements about transactions in databases. However, we want to go into more detail at this point.
Both concepts deal with the consistency of data, but they mean different things by it. In ACID, consistency refers to the data in a (relational) database: the data is consistent as long as it satisfies all defined rules, and as soon as conflicting data exists in the system, the database is inconsistent. Faulty duplicates are one possible cause, but not the only one. The concept reaches much deeper and includes the relationships between tables and the rules behind them, such as primary and foreign keys.
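Consistency across tables can be sketched with a foreign key in SQLite. This minimal sketch assumes a hypothetical `transfers` table whose `src` and `dst` columns must reference existing rows in `accounts`, plus a hypothetical `record_transfer` helper. SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`; other databases enforce them by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.execute(
    "CREATE TABLE transfers ("
    " id INTEGER PRIMARY KEY,"
    " src TEXT NOT NULL REFERENCES accounts(id),"
    " dst TEXT NOT NULL REFERENCES accounts(id),"
    " amount INTEGER NOT NULL)"
)
with conn:
    conn.execute("INSERT INTO accounts VALUES ('A', 1000), ('B', 0)")


def record_transfer(conn, src, dst, amount):
    with conn:  # rolled back if the insert violates a constraint
        conn.execute(
            "INSERT INTO transfers (src, dst, amount) VALUES (?, ?, ?)",
            (src, dst, amount),
        )


record_transfer(conn, "A", "B", 100)  # both accounts exist
try:
    record_transfer(conn, "A", "Z", 100)  # account 'Z' does not exist
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)

print(conn.execute("SELECT COUNT(*) FROM transfers").fetchone()[0])  # 1
```

No duplicate is involved here, yet the second transfer would still have left the database inconsistent, because it would reference an account that does not exist.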
In the CAP theorem, on the other hand, consistency means that the distributed system always returns the same result for a query, no matter which node answers it. Copies of the same data may exist on different servers, but they must always be in the same state so that no differences can occur. Only then is consistency in the sense of the CAP theorem fulfilled.
This is what you should take with you
- ACID (Atomicity, Consistency, Isolation, Durability) is a term from database theory and describes rules and procedures for database transactions.
- Relational databases typically fulfill these properties, so their data moves only from one consistent state to another. Many NoSQL databases, on the other hand, are not fully ACID compliant.
- Compliance with these principles ensures that the data in a database remains consistent at all times and that concurrent access does not corrupt it.
- ACID differs from the so-called CAP theorem mainly in the definition of consistency and in the fact that ACID is a concept for databases and CAP is a concept of distributed systems.
Other Articles on the Topic of ACID
- IBM also provides a detailed explanation of the principles of ACID.