The abbreviation ACID (Atomicity, Consistency, Isolation, Durability) is a term from database theory and describes the guarantees a database gives for its transactions. If a system adheres to the ACID specifications, its data stays reliable and consistent. In this article, we explain the four properties and the advantages that arise when they are satisfied. We also cover the CAP theorem, which deals with distributed systems in general rather than only with databases.
What are the A-C-I-D basic principles?
Classical relational databases are designed to fulfill the four ACID properties. These state that the most important requirement for a database is to maintain the correctness and integrity of the data. In many cases, data stores are seen as a “single point of truth”, so it would be fatal if erroneous information were stored and passed on. The four properties are:
- Atomicity (A): Data transactions, e.g. the entry of a new data record or the deletion of an old one, should either be executed completely or not at all. The transaction only becomes visible to other users once it has been executed completely.
- Consistency (C): This property is satisfied when each data transaction moves the database from one consistent state to another consistent state.
- Isolation (I): When multiple transactions run simultaneously, the final state must be the same as if the transactions had run one after the other. In other words, concurrent access must not cause transactions to interfere with each other and produce incorrect results.
- Durability (D): Once a transaction has been committed, its changes must persist and must not be lost or altered by external influences such as a power failure or a system crash. Likewise, a software update must not inadvertently cause data to change or be deleted.
The Basic Principles Using an Example
The most common example to illustrate the components of ACID is a bank transfer, where money is moved from one account to another. The goal, of course, is to ensure that all transfers are correct and that all customers have the amount of money in their account that they are entitled to.
Assume that a transfer from account A to account B takes place. Atomicity means that a transaction is either executed completely or fails completely. For our example, this means that if account A has been debited and a system failure occurs before account B is credited, the debit is rolled back so that account A has its original balance again. If this did not happen, we would have destroyed money and the system would be in an incorrect state.
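The rollback described above can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration, not a banking implementation: the `accounts` table, the `transfer` function, and the `fail_midway` flag that simulates the system failure are all assumptions made for this example.

```python
import sqlite3


def transfer(conn, src, dst, amount, fail_midway=False):
    """Move `amount` from `src` to `dst` inside a single transaction."""
    # `with conn` commits if the block succeeds and rolls back if it raises.
    with conn:
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src)
        )
        if fail_midway:
            # Simulated system failure after the debit, before the credit.
            raise RuntimeError("simulated crash")
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
        )


def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()

try:
    transfer(conn, "A", "B", 30, fail_midway=True)
except RuntimeError:
    pass
after_failure = balances(conn)   # the debit was rolled back

transfer(conn, "A", "B", 30)
after_success = balances(conn)   # debit and credit were applied together

print(after_failure)  # {'A': 100, 'B': 50}
print(after_success)  # {'A': 70, 'B': 80}
```

After the failed attempt, both balances are unchanged, so no money has been destroyed. Only the second, complete transfer changes the accounts.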
For consistency, it must be ensured after each transaction that the database is still in a consistent state, for example, that it does not contain any conflicting data. Suppose our example bank maintains a table with all accounts and their current balances. In this table, the account number is the primary key, so each account number may occur only once in the database. If an incorrect transaction would leave two records for the same account number, the database would be inconsistent, so the transaction must be rejected or rolled back.
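The primary-key rule from this example can be tried out with `sqlite3`. The following is a small sketch under assumed names (`accounts`, `account_no`, `try_insert`, and the account numbers are invented for illustration): the database itself rejects the second record for an existing account number.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " account_no TEXT PRIMARY KEY,"   # each account number may occur only once
    " balance INTEGER NOT NULL)"
)
conn.execute("INSERT INTO accounts VALUES ('DE001', 100)")
conn.commit()


def try_insert(conn, account_no, balance):
    """Return True if the row was stored, False if a constraint rejected it."""
    try:
        with conn:  # rolls back automatically when the insert fails
            conn.execute(
                "INSERT INTO accounts VALUES (?, ?)", (account_no, balance)
            )
        return True
    except sqlite3.IntegrityError:
        return False


duplicate_accepted = try_insert(conn, "DE001", 999)  # same key again
new_accepted = try_insert(conn, "DE002", 50)         # unused key

rows = conn.execute(
    "SELECT account_no, balance FROM accounts ORDER BY account_no"
).fetchall()
print(duplicate_accepted, new_accepted)  # False True
print(rows)                              # [('DE001', 100), ('DE002', 50)]
```

The duplicate never reaches the table, so the database moves only between consistent states.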
Isolation states that several transactions running in parallel must not lead to different results than if the transactions had taken place individually and one after the other. Thus, if a bank has to process 100 transfers simultaneously during peaks, it must be ensured that the balances of the affected accounts are just as high as if the transfers had taken place one after the other.
Finally, for durability, the bank must be able to guarantee that a transfer, once it has been committed, is stored permanently and that the consistent data is not lost or altered by external influences. These include, for example, power failures, system crashes, or software updates.
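A simple way to observe durability is to close a database and open it again. The sketch below uses a temporary SQLite file with an assumed `accounts` table. Closing the connection only stands in for a process ending and does not reproduce a real power failure. It also shows that a change that was never committed does not survive.

```python
import os
import sqlite3
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "bank.db")

    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
    conn.execute("INSERT INTO accounts VALUES ('A', 100)")
    conn.commit()   # committed: this state must survive

    conn.execute("UPDATE accounts SET balance = 0 WHERE id = 'A'")
    conn.close()    # closed without commit: the update is discarded

    # "Restart": a new connection sees only what was committed.
    reopened = sqlite3.connect(path)
    balance = reopened.execute(
        "SELECT balance FROM accounts WHERE id = 'A'"
    ).fetchone()[0]
    reopened.close()

print(balance)  # 100
```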
What are the Benefits of ACID?
In application, databases that comply with ACID principles offer many advantages. These include:
- ACID makes it possible for several people to work on a database at the same time without corrupting each other's changes.
- Database users and developers can rely on every transaction leaving the data in a valid state and do not have to handle partially applied changes themselves.
- Manual cleanup after failed transactions is largely unnecessary because incomplete changes are rolled back automatically.
Do NoSQL Databases fulfill the ACID Properties?
NoSQL solutions often do not comply fully with the ACID properties, although there are exceptions, such as some graph databases, which comply with all the concepts. NoSQL databases are in many cases distributed across multiple devices and servers. This allows much larger amounts of data to be processed and stored simultaneously, which is a key requirement for these systems. However, it also means that they often give up strict consistency, at least temporarily.
Suppose we have implemented a NoSQL database on two physical servers, one in Germany and the other in the USA. The databases contain the account balances and transactions of German and American customers. The German accounts are stored in Germany and the American accounts are stored on the American server.
It may now happen that a German customer makes a transfer to an American account. Then both data stores are changed, and they are inconsistent with each other while the transfer is being processed. For example, we may start a database query when the processing in Germany has already been completed, but the processing in the USA has not. In this time window, the “inconsistency window”, the data in the database is not correct. In a relational database running on a single server, a transaction would hide this intermediate state from other queries.
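The inconsistency window can be imitated in a few lines of plain Python. This is a deliberately simplified model: the `Node` class, the account identifiers, and the amounts are invented for illustration, and the "network delay" is simply the gap between two statements.

```python
class Node:
    """One server holding a share of the accounts (toy model)."""

    def __init__(self, name, balances):
        self.name = name
        self.balances = dict(balances)


def total(*nodes):
    """Sum of all balances across the given servers."""
    return sum(sum(node.balances.values()) for node in nodes)


germany = Node("DE", {"DE-001": 100})
usa = Node("US", {"US-001": 50})

total_before = total(germany, usa)

# Step 1: the debit has already been applied on the German server ...
germany.balances["DE-001"] -= 30
# ... but the credit has not reached the American server yet.
total_during = total(germany, usa)   # a query in the inconsistency window

# Step 2: the credit arrives in the USA.
usa.balances["US-001"] += 30
total_after = total(germany, usa)

print(total_before, total_during, total_after)  # 150 120 150
```

A query that runs between the two steps sees 30 units of money missing, although the state before and after the transfer is correct.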
What is the CAP Theorem?
The CAP theorem describes three properties of databases on distributed systems that can never all be fulfilled at the same time. CAP is an abbreviation for the terms “Consistency”, “Availability” and “Partition Tolerance”. The theorem is mostly discussed for databases that are distributed across multiple systems, many of which belong to the field of NoSQL databases. Classic relational databases, on the other hand, are usually described using the ACID principle.
In essence, CAP consists of the following three properties:
- Consistency describes the fact that the data in the database must be consistent at all times. This means that there must be no discrepancies when retrieving the data, regardless of which of the nodes is addressed. In practical terms, this means, for example, that when a new data record is inserted, the data on all nodes must be updated simultaneously, so that every node returns the new record.
- Availability means that the distributed system always provides a response, even if individual nodes have just failed. This holds regardless of which node is addressed in the system: if a request happens to hit a failed node, another node answers it. The system as a whole is thus continuously available.
- Partition tolerance, also known as failure tolerance, describes the ability of the system to keep processing requests even if communication between nodes fails and messages are lost or delayed.
It has been formally proven that these three properties cannot all be fulfilled simultaneously in a distributed system. The CAP theorem therefore states that one must limit oneself to two of the properties when building distributed databases and must accept compromises on the third. Because network failures between nodes cannot be ruled out in practice, the choice usually comes down to consistency versus availability for as long as a partition lasts.
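The trade-off can be illustrated with a toy model of two replicas. The `Cluster` class, its `"CP"` and `"AP"` modes, and the `Unavailable` exception are hypothetical names for this sketch. Real systems decide this with quorums and failure detectors rather than a simple flag.

```python
class Unavailable(Exception):
    """Raised when the system refuses a request in order to stay consistent."""


class Cluster:
    """Two replicas of a single value, with a switchable network partition."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.replicas = {"n1": 0, "n2": 0}
        self.partitioned = False

    def write(self, node, value):
        if not self.partitioned:
            # Healthy network: every replica receives the write.
            for name in self.replicas:
                self.replicas[name] = value
        elif self.mode == "CP":
            # Prefer consistency: refuse writes that cannot be replicated.
            raise Unavailable(f"{node} cannot reach its peer")
        else:
            # Prefer availability: accept locally and let replicas diverge.
            self.replicas[node] = value

    def read(self, node):
        return self.replicas[node]


def run(mode):
    cluster = Cluster(mode)
    cluster.write("n1", 1)
    cluster.partitioned = True          # the network link between n1 and n2 fails
    try:
        cluster.write("n1", 2)
        accepted = True
    except Unavailable:
        accepted = False
    return accepted, (cluster.read("n1"), cluster.read("n2"))


cp_result = run("CP")
ap_result = run("AP")
print(cp_result)  # (False, (1, 1)): write refused, replicas agree
print(ap_result)  # (True, (2, 1)): write accepted, replicas disagree
```

During the partition, the CP variant gives up availability for writes, while the AP variant keeps answering but returns different values depending on the node.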
What are the limitations of the ACID properties?
While ACID properties provide important guarantees for database transactions, they also have some limitations and trade-offs.
Firstly, strict adherence to ACID properties can result in decreased performance and scalability, particularly for large-scale distributed systems. Maintaining transactional consistency across distributed nodes can be challenging, leading to increased network latency and reduced throughput.
Secondly, enforcing strict isolation can lead to increased contention and locking, resulting in reduced concurrency and throughput. This can become a bottleneck for systems with high write rates or high contention for shared resources.
Thirdly, ACID transactions can be expensive in terms of resource usage, particularly for long-running transactions or for systems with high write rates, because locks are held and log records accumulate until a transaction ends.
Finally, strict adherence to ACID properties may not be necessary for all applications. In cases where high availability or performance is more important than strict consistency, some trade-offs can be made to relax the ACID guarantees in favor of improved performance or scalability.
Overall, while ACID properties provide important guarantees for transactional consistency and reliability, they may not always be the best fit for every application. Careful consideration should be given to the trade-offs between consistency, availability, and performance when designing database systems.
How can you implement ACID in databases?
Implementing ACID properties in a database system involves a series of fundamental steps to ensure data integrity and reliability:
1. Atomicity (A):
- Transaction Management: Develop a transaction management system that treats a sequence of SQL statements as a single, indivisible unit.
- Transaction Logs: Create detailed transaction logs to record all changes during a transaction. These logs enable rollback in case of failure.
- Rollback Mechanism: Implement a mechanism to revert changes if any part of a transaction fails, ensuring consistency.
2. Consistency (C):
- Data Validation: Enforce data integrity constraints, like unique keys and referential integrity, to maintain data consistency.
- Validation Rules: Define validation rules to validate data before insertion or update, ensuring only valid data is accepted.
- Pre-transaction Checks: Perform checks on data before a transaction starts to prevent violations of consistency rules.
3. Isolation (I):
- Concurrency Control: Implement mechanisms, such as locking or timestamps, to manage concurrent transactions.
- Isolation Levels: Support different isolation levels, allowing users to choose the level of isolation required.
- Deadlock Handling: Detect and resolve deadlocks where transactions are mutually waiting for resources.
4. Durability (D):
- Write-Ahead Logging: Employ write-ahead logging (WAL) to log changes before applying them, ensuring durability.
- Redo Logs: Maintain redo logs containing committed transactions for recovery after a system crash.
- Data Backup: Regularly back up the database to enable data restoration in catastrophic failures.
5. Transaction Commit:
- Two-Phase Commit (2PC): Implement a two-phase commit protocol for distributed transactions to ensure all participants commit or abort together.
- Transaction Recovery: Develop recovery procedures using transaction logs to complete or roll back incomplete transactions when the database restarts after a failure.
6. Testing and Validation:
- Compliance Testing: Thoroughly test the database to ensure ACID compliance in various scenarios, including normal operations, failures, and concurrent access.
- Validation Tools: Employ validation tools to verify consistent adherence to ACID requirements.
7. Monitoring and Maintenance:
- Continuous Monitoring: Set up monitoring systems to track database performance and health continuously.
- Regular Maintenance: Schedule routine tasks, such as log management, index optimization, and backup verification, to maintain ACID compliance.
- Guidelines and Documentation: Provide clear documentation and guidelines for administrators and developers regarding ACID implementation and maintenance.
These steps collectively ensure a robust and reliable database system, crucial for applications requiring secure and consistent data management, even under challenging conditions.
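Several of these steps (transaction logs, write-ahead logging, and recovery after a restart) can be shown together in a toy key-value store. This is a sketch under stated assumptions and not how any particular database implements them: the `TinyStore` class, the JSON log format, and the `crash_before_commit` flag are invented for illustration.

```python
import json
import os
import tempfile


class TinyStore:
    """Toy key-value store with a write-ahead log (illustrative only)."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.data = {}
        self._recover()

    def _append(self, record):
        # Force each log record to disk before continuing.
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps(record) + "\n")
            log.flush()
            os.fsync(log.fileno())

    def commit(self, tx_id, changes, crash_before_commit=False):
        # 1. Log the intended changes before touching the data.
        for key, value in changes.items():
            self._append({"tx": tx_id, "op": "set", "key": key, "value": value})
        if crash_before_commit:
            raise RuntimeError("simulated crash before the commit record")
        # 2. The commit record is what makes the transaction durable.
        self._append({"tx": tx_id, "op": "commit"})
        # 3. Only now apply the changes to the in-memory state.
        self.data.update(changes)

    def _recover(self):
        # Replay the log: redo committed transactions, ignore incomplete ones.
        if not os.path.exists(self.log_path):
            return
        pending = {}
        with open(self.log_path, encoding="utf-8") as log:
            for line in log:
                record = json.loads(line)
                if record["op"] == "set":
                    pending.setdefault(record["tx"], {})[record["key"]] = record["value"]
                elif record["op"] == "commit":
                    self.data.update(pending.pop(record["tx"], {}))


with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "wal.log")
    store = TinyStore(log_path)
    store.commit(1, {"A": 70, "B": 80})
    try:
        store.commit(2, {"A": 0, "B": 150}, crash_before_commit=True)
    except RuntimeError:
        pass
    # "Restart": a fresh instance rebuilds its state from the log alone.
    recovered = TinyStore(log_path).data

print(recovered)  # {'A': 70, 'B': 80}
```

Transaction 2 left log entries but no commit record, so recovery ignores it (atomicity), while transaction 1 is restored completely from the log (durability).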
ACID vs. CAP Theorem
In short, the CAP theorem and the ACID properties differ in that CAP deals with distributed systems whereas ACID makes statements about databases. However, we want to go into more detail at this point.
Both concepts deal with the consistency of data, but they mean different things by it. With ACID, consistency refers to the data within a (relational) database: as soon as there is conflicting data in the system, the database is inconsistent. This can occur, for example, due to faulty duplicates, but the concept reaches much deeper and includes the relationships between tables and the logic behind them, such as primary and foreign keys.
In the CAP theorem, on the other hand, consistency means that the distributed system always returns the same result for a query, regardless of which node answers it. This means that there may be copies of the data on different servers, but they must always have the same state so that no differences can occur. Then the consistency in the CAP theorem is fulfilled.
This is what you should take with you
- ACID (Atomicity, Consistency, Isolation, Durability) is a term from database theory and describes rules and procedures for database transactions.
- Relational databases are designed to fulfill these properties and therefore keep their data consistent. Many NoSQL databases, on the other hand, are not fully ACID compliant.
- Compliance with the principles ensures that a database contains consistent data at all times and that concurrent access is possible without conflicts.
- ACID differs from the so-called CAP theorem mainly in the definition of consistency and in the fact that ACID is a concept for databases, whereas CAP is a concept for distributed systems.
Other Articles on the Topic of ACID
- IBM also provides a detailed explanation of the principles of ACID.