Primary keys and foreign keys are used to define the data model of a relational database. They make it possible to establish relationships between individual tables and help ensure the integrity, i.e. the correctness, completeness, and consistency, of the database.
What is a Primary Key?
Each table in a database ideally has one column, or a combination of several columns, whose values are unique for every record. This column or combination of columns is called the primary key of the table. It ensures that each record, i.e. each row in the table, can be uniquely identified, which makes it possible to access individual records with an SQL query.
If a table does not have a single column with unique values, several columns can be combined to establish this uniqueness. This is what distinguishes the different types of primary keys.
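As a minimal sketch of this idea, the following example uses Python's built-in sqlite3 module with an in-memory database. The table "orders" and its columns are hypothetical names chosen for illustration. It shows that a primary key lets a query address exactly one record and that the database rejects a second row with the same key value.

```python
import sqlite3

# In-memory database; the "orders" table and its columns are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE orders (
        order_id   INTEGER PRIMARY KEY,  -- single-column primary key
        customer   TEXT NOT NULL,
        order_date TEXT NOT NULL
    )
    """
)
conn.execute("INSERT INTO orders VALUES (1001, 'Alice', '2023-01-05')")
conn.execute("INSERT INTO orders VALUES (1002, 'Bob', '2023-01-06')")

# The primary key lets a query address exactly one record.
row = conn.execute(
    "SELECT customer FROM orders WHERE order_id = ?", (1002,)
).fetchone()

# A second row with an existing key value is rejected by the database.
try:
    conn.execute("INSERT INTO orders VALUES (1001, 'Carol', '2023-01-07')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(row, duplicate_rejected)
```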
What are the types of Primary Keys?
Depending on whether an attribute has unique values or not, there are different types of primary keys.
If there is exactly one column consisting of unique values, it can serve as a unique key. If there are several such columns, you can decide which of them to use. Common examples of unique keys are social security numbers, order numbers, or ID card numbers.
In this table, only the attribute “Order ID” has unique values and is also suitable because the order number will remain unique.
What is a composite key?
If a table has no single attribute with only unique values, you may have to resort to a composite key. In this case, the combination of several attributes produces a unique value for each row.
With a small change to the previous table, the “Order ID” column no longer has unique values, since an order number occurs several times if the order has multiple items. For this reason, the primary key must be composed of two attributes. Within each order, every position number occurs only once, so the two columns “Order ID” and “Position ID” together are unique again.
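The composite key described above could be sketched as follows, again with sqlite3 and an in-memory database. The table name "order_items" and the sample products are assumptions made for illustration. Each column on its own contains repeated values, but the pair of order_id and position_id must be unique.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE order_items (
        order_id    INTEGER NOT NULL,
        position_id INTEGER NOT NULL,
        product     TEXT NOT NULL,
        PRIMARY KEY (order_id, position_id)  -- composite primary key
    )
    """
)

# order_id 1001 and position_id 1 each repeat, but no pair repeats.
conn.executemany(
    "INSERT INTO order_items VALUES (?, ?, ?)",
    [(1001, 1, "Keyboard"), (1001, 2, "Mouse"), (1002, 1, "Monitor")],
)

# Reusing an existing (order_id, position_id) pair violates the key.
try:
    conn.execute("INSERT INTO order_items VALUES (1001, 2, 'Webcam')")
    duplicate_pair_rejected = False
except sqlite3.IntegrityError:
    duplicate_pair_rejected = True

item_count = conn.execute("SELECT COUNT(*) FROM order_items").fetchone()[0]
print(duplicate_pair_rejected, item_count)
```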
What is an artificial key?
In rare cases, even the combination of several columns does not lead to the desired unique values. Then you have to fall back on a new column that holds an artificial key (also called a surrogate key), for example a simple running number for the data records.
In this slightly modified table there is now no single column and also no combination of attributes that is unique. Therefore, the additional column “ID” is inserted, which numbers the individual data records. Thus, the column “ID” contains only unique values and each record can be addressed uniquely.
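A small sketch of an artificial key, assuming SQLite as the database: a column declared as INTEGER PRIMARY KEY is filled with a running number automatically when no value is supplied. The table "order_log" and its contents are hypothetical. The first two rows are identical in every other column and can only be told apart by the generated "id".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE order_log (
        id       INTEGER PRIMARY KEY,  -- artificial key, assigned by SQLite
        order_id INTEGER NOT NULL,
        product  TEXT NOT NULL
    )
    """
)

# No id is supplied; the first two rows are otherwise identical.
conn.executemany(
    "INSERT INTO order_log (order_id, product) VALUES (?, ?)",
    [(1001, "Mouse"), (1001, "Mouse"), (1002, "Monitor")],
)

rows = conn.execute(
    "SELECT id, order_id, product FROM order_log ORDER BY id"
).fetchall()
print(rows)
```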
What is a Foreign Key?
The foreign key is an attribute or a combination of several attributes that refers to the primary key of another table. This is used to create links between tables in databases. Depending on the primary key in the other table, it can consist of a single column or of several columns.
However, it is important to mention that the foreign key does not have to be unique in its own table; the same value can occur in several rows. With the help of such connections, large tables can be split into several smaller tables, which saves storage space and improves clarity.
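The link between two tables can be sketched like this. The tables "customers" and "orders" are hypothetical, and the example assumes SQLite, where foreign key checks have to be switched on per connection with a PRAGMA. Two orders share the same customer_id, which is allowed, while an order pointing to a non-existent customer is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this is enabled on the connection.
conn.execute("PRAGMA foreign_keys = ON")

conn.execute(
    "CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)
conn.execute(
    """
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers (customer_id)
    )
    """
)

conn.execute("INSERT INTO customers VALUES (1, 'Alice')")
# The foreign key value 1 appears twice; it need not be unique here.
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1001, 1), (1002, 1)])

# Customer 99 does not exist, so this row is rejected.
try:
    conn.execute("INSERT INTO orders VALUES (1003, 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# The key relationship is what a join follows to combine the tables.
joined = conn.execute(
    """
    SELECT o.order_id, c.name
    FROM orders AS o
    JOIN customers AS c ON c.customer_id = o.customer_id
    ORDER BY o.order_id
    """
).fetchall()
print(orphan_rejected, joined)
```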
What is the difference between primary and foreign keys?
In database management systems, both keys are used to establish relationships between tables. Although both types of constraints are used to ensure data integrity, they serve different purposes.
A primary key is a column or group of columns in a table that uniquely identifies each row in that table. It is a unique identifier that cannot be duplicated within the same table. A primary key is used to ensure the integrity of entities, which means that each row in the table represents a unique entity. Primary keys are essential for indexing, sorting, and searching data, and are also used to establish relationships between tables.
A foreign key, on the other hand, is a column or set of columns in a table that references the primary key of another table. It is used to enforce referential integrity, which means that every value in the foreign key column must match an existing value in the primary key column of the referenced table. In this way, foreign keys establish relationships between tables and ensure that data remains consistent across multiple tables.
Here are some key differences between the two types of keys:
- Primary keys are used to identify unique rows within a table, while foreign keys are used to establish relationships between tables.
- Primary keys must be unique and cannot contain null values, while foreign keys can contain null values.
- A table has at most one primary key, while it can contain several foreign keys. In SQL, both can be declared at the column level or, as is necessary for composite keys, at the table level.
- A primary key can be referenced by foreign keys in other tables, while a foreign key only points to a key elsewhere and does not identify the rows of its own table.
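The difference in how the two keys treat null values can be illustrated with a short sketch. The tables "departments" and "employees" are hypothetical. One SQLite-specific detail is an assumption of this example: SQLite tolerates NULL in non-INTEGER primary key columns unless NOT NULL is written out, so the constraint is stated explicitly here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.execute(
    """
    CREATE TABLE departments (
        -- NOT NULL is spelled out because SQLite otherwise tolerates NULL
        -- in non-INTEGER primary key columns.
        dept_code TEXT PRIMARY KEY NOT NULL,
        name      TEXT NOT NULL
    )
    """
)
conn.execute(
    """
    CREATE TABLE employees (
        emp_id    INTEGER PRIMARY KEY,
        name      TEXT NOT NULL,
        dept_code TEXT REFERENCES departments (dept_code)  -- nullable
    )
    """
)

conn.execute("INSERT INTO departments VALUES ('ENG', 'Engineering')")
# An employee without a department: the foreign key may stay NULL.
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, "Chris", "ENG"), (2, "Dana", None)],
)
nullable_fk_row = conn.execute(
    "SELECT name, dept_code FROM employees WHERE emp_id = 2"
).fetchone()

# A department without a key value is not accepted.
try:
    conn.execute("INSERT INTO departments VALUES (NULL, 'Unknown')")
    null_pk_rejected = False
except sqlite3.IntegrityError:
    null_pk_rejected = True

print(nullable_fk_row, null_pk_rejected)
```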
In summary, both keys are important tools for ensuring data integrity and establishing relationships between tables in a database. While primary keys are used to identify unique rows within a table, foreign keys are used to create relationships between tables by referencing the former in other tables. Understanding the differences between these two types of keys is essential to designing a well-structured and efficient database.
What is the Star Schema?
Large amounts of data stored in a database or data warehouse quickly become confusing, and queries are not only complicated but also take a relatively long time. Therefore, intelligent ways of structuring tables are needed so that memory can be saved and queries can run more quickly.
The first approach to this is the star schema, which includes star-shaped table structures. A distinction is made between facts and dimensions:
- The facts are key figures or measured values that are to be analyzed or illustrated. They form the center of the analysis and are located in the central fact table. In addition to the key figures, the fact table also contains the keys that refer to the surrounding dimensions. In the business environment, facts are, for example, the sales quantity, the turnover, or the incoming orders.
- The dimensions, on the other hand, are the properties of the facts and can be used to group and visualize the key figures. Their different levels of detail are stored in the dimension tables, which saves memory since the details only have to be stored once there instead of in every row of the fact table. Dimensions in the corporate environment are, for example, customer information, the date of the order, or product information.
The star schema deliberately dispenses with full normalization, which is otherwise an important concept in database theory: its dimension tables typically violate the third normal form. In return, every dimension is only one join away from the fact table, so the structure is particularly efficient and provides fast answers even for complex queries.
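A very small star schema could look like the following sketch. All table names, column names, and figures are made up for illustration. The fact table holds the key figures and the foreign keys, and each dimension table is joined directly to it. Note that the customer dimension repeats the country for every customer, which is the kind of redundancy the star schema accepts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables: descriptive properties, stored once per entity.
    CREATE TABLE dim_product (
        product_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        category   TEXT NOT NULL
    );
    CREATE TABLE dim_customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        city        TEXT NOT NULL,
        country     TEXT NOT NULL  -- repeated per customer (denormalized)
    );
    -- Fact table: key figures plus foreign keys to the dimensions.
    CREATE TABLE fact_sales (
        product_id  INTEGER NOT NULL REFERENCES dim_product (product_id),
        customer_id INTEGER NOT NULL REFERENCES dim_customer (customer_id),
        quantity    INTEGER NOT NULL,
        revenue     REAL NOT NULL
    );
    """
)
conn.executemany(
    "INSERT INTO dim_product VALUES (?, ?, ?)",
    [(1, "Keyboard", "Accessories"), (2, "Mouse", "Accessories"),
     (3, "Monitor", "Displays")],
)
conn.executemany(
    "INSERT INTO dim_customer VALUES (?, ?, ?, ?)",
    [(1, "Alice", "Berlin", "Germany"), (2, "Bob", "Munich", "Germany")],
)
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
    [(1, 1, 2, 100.0), (2, 1, 1, 20.0), (3, 2, 1, 300.0), (2, 2, 3, 60.0)],
)

# One join per dimension is enough to analyze the facts.
revenue_by_category = conn.execute(
    """
    SELECT p.category, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    GROUP BY p.category
    ORDER BY p.category
    """
).fetchall()
print(revenue_by_category)
```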
What is the Snowflake Schema?
The so-called snowflake schema is a further development of the star schema with the goal of normalizing the tables and thereby avoiding the disadvantages of the star schema to a certain extent. In short, the snowflake structure results from breaking the dimension tables down into further sub-tables. The fact table, however, remains unchanged.
In our example, this could lead to the dimension table with the delivery addresses being split further into country, state, and city. This normalizes the tables so that the third normal form is fulfilled, but it comes at the expense of additional branches. These are a disadvantage in later queries, since the branches must be reassembled with complex joins.
The further branching means that the data is stored with less redundancy, so the amount of data is reduced again compared to the star schema. However, this is at the expense of performance, since the dimension tables have to be joined again during the query, which is often very time-consuming.
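The trade-off can be sketched with a delivery address dimension that is split into city, state, and country tables. All names and figures are hypothetical. Each state and country name is stored only once, but a query that groups revenue by state now has to walk through several joins instead of one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- The address dimension is normalized into a chain of tables.
    CREATE TABLE dim_country (
        country_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL
    );
    CREATE TABLE dim_state (
        state_id   INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        country_id INTEGER NOT NULL REFERENCES dim_country (country_id)
    );
    CREATE TABLE dim_city (
        city_id  INTEGER PRIMARY KEY,
        name     TEXT NOT NULL,
        state_id INTEGER NOT NULL REFERENCES dim_state (state_id)
    );
    CREATE TABLE dim_address (
        address_id INTEGER PRIMARY KEY,
        street     TEXT NOT NULL,
        city_id    INTEGER NOT NULL REFERENCES dim_city (city_id)
    );
    -- The fact table looks the same as in a star schema.
    CREATE TABLE fact_sales (
        address_id INTEGER NOT NULL REFERENCES dim_address (address_id),
        revenue    REAL NOT NULL
    );

    INSERT INTO dim_country VALUES (1, 'Germany');
    INSERT INTO dim_state VALUES (1, 'Berlin', 1), (2, 'Bavaria', 1);
    INSERT INTO dim_city VALUES (1, 'Berlin', 1), (2, 'Munich', 2);
    INSERT INTO dim_address VALUES (1, 'Example Street 1', 1),
                                   (2, 'Sample Road 2', 2);
    INSERT INTO fact_sales VALUES (1, 120.0), (2, 300.0), (2, 60.0);
    """
)

# Three joins are needed to get from the facts to the state name.
revenue_by_state = conn.execute(
    """
    SELECT s.name, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_address AS a ON a.address_id = f.address_id
    JOIN dim_city    AS c ON c.city_id = a.city_id
    JOIN dim_state   AS s ON s.state_id = c.state_id
    GROUP BY s.name
    ORDER BY s.name
    """
).fetchall()
print(revenue_by_state)
```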
This is what you should take with you
- The primary key is one or more attributes that uniquely identify each record.
- The foreign key consists of one or more attributes that are the primary key in another table. This allows connections to be made between tables in a database.
- Depending on the structure and connection of the tables, a distinction is made between the star schema and the snowflake schema.
Other Articles on the Topic of Primary Keys and Foreign Keys
- IBM provides a good article on the different keys of databases.