Referential integrity is a term from database theory and describes rules of how a relational database ensures the integrity and consistency, i.e. the true content, of the data. It focuses primarily on the primary and foreign keys in the tables.
What are primary and foreign keys?
Each table in a database ideally has one column or a combination of several columns that have a unique value for a record. This column or several common columns are called the primary key of the table. It ensures that each record, i.e. each row in the table, can be uniquely identified. This provides the ability to access individual records using an SQL query.
If a table does not have a column with unique values, multiple columns can also be used to establish this uniqueness. This distinguishes different types of primary keys.
The foreign key is an attribute or a combination of several attributes that is the primary key in another table. This is used to create links between tables in databases. Depending on the primary key in the other table, it can be a composite or unique foreign key.
However, it is important to mention that the foreign key does not have to be unique in its own table, but values can also occur more often. With the help of such connections, large tables can be split into several smaller tables, which saves storage space and improves clarity.
What is referential integrity?
Referential integrity (RI) ensures that table relationships in databases do not lead to inconsistencies in the database. For this purpose, a total of two rules are established that must be adhered to in order to ensure referential integrity:
- RI rule: When a new record with a foreign key is inserted, it must be ensured that a record for this key really exists in the referenced table. If this is not the case, the insertion process should not be possible.
- RI rule: A record must not be deleted if it is used as a foreign key in another table. The database must ensure that such transactions are not possible.
If these two rules are fulfilled, inconsistencies cannot occur when changes are made in the database.
How to create referential integrity in a database?
Modern database systems, such as MySQL, ensure that no transactions take place that violate referential integrity. In case of a violation, the transaction is not executed and a corresponding error message is issued. For example, if we wanted to delete a customer in our example with the SQL statement “DELETE”, an error would occur because the customer would still be referenced in the order table.
Since it can quickly become confusing in large databases and you don’t want to delete all entries manually to ensure referential integrity, there is a special SQL command that can be used to delete records and ensure integrity. Using “ON DELETE CASCADE” we can delete the customer and it would also automatically delete all the orders of the customer in the order table so that the integrity still exists.
In reality, of course, we would not perform such a transaction as it would distort the sales figures and instead rather work with a status for customers that marks them as either active or inactive.
What are the integrity rules for databases?
In addition to referential integrity, there are other integrity provisions that a database must meet in order to be in a consistent, or truthful, state. These include:
- Range integrity: The values in an attribute, i.e., a column, must lie within a predefined range. For example, the entry of age can only be between 0 and 150 years, but if an entry wants to store an age of 200 years, the range integrity is violated.
- Entity integrity: A unique primary key must be defined for each record, i.e. each entity.
- Referential integrity: It must be ensured that a foreign key refers to a data record. In addition, a primary key can only be deleted if the data record in which the key appears as a foreign key is also deleted.
- Logical consistency: Depending on the application, the user can set up his own integrity requirements that must also be met.
The database is only in a consistent state if all these conditions are met.
How do you maintain the referential integrity of a database sustainably?
Referential integrity not only ensures the initial establishment of relationships between tables but also plays a critical role in maintaining data integrity when updates or deletions occur. Handling these operations correctly is essential to prevent inconsistencies and maintain the accuracy of your database. Here’s how referential integrity comes into play when dealing with updates and deletions:
When you need to update a value in a table that is part of a relationship, referential integrity constraints help ensure that the changes don’t compromise the established relationships. Consider the following scenarios:
- Cascade Updates: With CASCADE referential integrity, updating a primary key in the referenced table will automatically update corresponding foreign keys in the related tables. This can be helpful when you want changes to propagate throughout the database seamlessly.
- Restrict Updates: Using the NO ACTION option prevents updates to the referenced primary key if related records exist. This can be useful when you want to ensure data consistency and avoid unintentional changes to key values.
- Set Null or Set Default: If you’ve defined the foreign key with the SET NULL or SET DEFAULT option, updating the primary key in the referenced table can lead to setting the corresponding foreign key values to NULL or the default value.
Dealing with deletions in the context of referential integrity requires careful consideration to prevent orphaned records and maintain the validity of relationships:
- Cascade Deletions: CASCADE referential integrity automatically deletes related records in child tables when the corresponding primary key is deleted. This can be a powerful tool but should be used with caution to avoid unintended data loss.
- Restrict Deletions: Similar to the NO ACTION option during updates, using NO ACTION or RESTRICT prevents deletion of a record if it has related records in child tables. This is useful to maintain data integrity and avoid accidental data loss.
- Set Null or Set Default: When using the SET NULL or SET DEFAULT option, deleting a record with associated foreign keys can lead to setting those foreign keys to NULL or the default value.
- Before implementing CASCADE actions, carefully analyze the impact on your database. Cascading deletions can lead to significant data loss if not used judiciously.
- Choose the appropriate referential integrity option based on your application’s requirements and data integrity goals.
- Avoid overly complex relationships that might result in unintended side effects when updates or deletions occur.
- Document the relationships and referential integrity constraints in your database schema to provide clear guidance for future maintenance and development.
- Test updates and deletions in a controlled environment before deploying changes to a production database.
- Regularly monitor your database for any anomalies related to updates and deletions, especially when using CASCADE actions.
In conclusion, handling updates and deletions within the framework of referential integrity is crucial for maintaining a consistent and accurate database. By selecting appropriate referential integrity actions and considering the potential consequences of updates and deletions, you ensure that your database remains reliable and free from inconsistencies, even as data evolves over time.
What is Normalization?
Normalization refers to a concept from database design with the aim of eliminating redundancies, i.e. duplications in the database. This saves storage space and also prevents anomalies from occurring. Integrity and normalization are two closely related topics in database theory, so they should always be considered together.
In practice, three normal forms are particularly important. This is because often, when these are satisfied, the database is performing well, and relatively little work has had to be done. Thus, the cost-benefit ratio is comparatively high up to the third normal form. In theory, however, there are up to five normal forms, but in this article, we will limit ourselves only to the first three:
The 1st normal form is achieved when all records are atomic. This means that each data field may contain only one value. It should also be ensured that each column contains only values of the same data type (numeric, text, etc.). The following examples would need to be modified accordingly to have a database in 1st normal form:
The 2nd normal form is satisfied if the first normal form is satisfied, and also each column in a row is fully functionally dependent on the primary key. The primary key denotes an attribute that can be used to uniquely identify a database row. This includes, for example, the invoice number to identify an invoice or the ID number to identify a person.
The 3rd normal form exists if the two preceding normal forms are fulfilled and there are additionally no so-called transitive dependencies. A transitive dependency exists when an attribute that is not a primary key depends not only on this primary key but also on other attributes.
This is what you should take with you
- Referential integrity describes provisions about foreign keys in a database that lead to consistent records.
- When changing data records, it must be ensured that foreign keys refer to existing data records and that all referring data records are also deleted when primary keys are deleted.
- Modern database systems already do not allow transactions that would violate referential integrity and issue a corresponding error message.
- In addition to referential integrity, there is also range integrity, entity integrity, and logical consistency, which ensure that data is consistent.
Other Articles on the Topic of Referential Integrity
IBM has published an interesting article on referential integrity.