Primary Key & Foreign Key – easily explained!

Primary keys and foreign keys give a relational database a well-defined data model. They make every record identifiable and allow relationships to be established between individual tables. This ensures the integrity, i.e. the correctness, completeness, and consistency, of the database.

In this article, we look in detail at primary and foreign keys and introduce their different types using a simple example. We also examine the role of these keys in the structure of intelligent database schemas and explain how the keys can be created using SQL.

What is a Primary Key?

Each table in a database ideally has one column, or a combination of several columns, whose value is unique for every record. This column or combination of columns is called the primary key of the table. It ensures that each record, i.e. each row in the table, can be uniquely identified, which makes it possible to access individual records with an SQL query.

If a table does not have a single column with unique values, multiple columns can be combined to establish this uniqueness. This leads to different types of primary keys.

What are the types of Primary Keys?

Depending on whether an attribute has unique values or not, there are different types of primary keys.

Unique Key

If a single column consists only of unique values, it can serve as a unique key. If there are several such columns, you have to choose one of them, as each table can only have one primary key. Common examples of unique keys are social security numbers, order numbers or ID numbers. This key type must also not contain any null values, so every data record in the table has a unique value in this column.

Unique Key | Source: Author

In this table, only the attribute “Order ID” has unique values and is also suitable because the order number will remain unique.

What is a composite key?

If a table has no single attribute with only unique values, you may have to resort to a composite key. In this case, the combination of several attributes produces a unique value for each row.

Composite Key | Source: Author

With a small change to the previous table, the “Order ID” column no longer has unique values, since an order number occurs several times if the order has multiple items. For this reason, the primary key must be composed of two attributes. Within each order, every position number occurs only once, so the two columns “Order ID” and “Position ID” together are unique again.
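The composite key described above can be tried out with a small sketch. It assumes SQLite, accessed through Python's built-in sqlite3 module; the table name "OrderItems", the "Product" column and all rows are made up for illustration and are not taken from the article's table.

```python
import sqlite3

# Illustrative data only; OrderID and PositionID mirror the article's columns.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE OrderItems (
        OrderID    INTEGER NOT NULL,
        PositionID INTEGER NOT NULL,
        Product    TEXT,
        PRIMARY KEY (OrderID, PositionID)  -- composite key over two columns
    )
""")

# The same OrderID may repeat as long as the PositionID differs.
conn.executemany(
    "INSERT INTO OrderItems VALUES (?, ?, ?)",
    [(1001, 1, "Keyboard"), (1001, 2, "Mouse"), (1002, 1, "Monitor")],
)

# Repeating an existing (OrderID, PositionID) pair violates the key.
try:
    conn.execute("INSERT INTO OrderItems VALUES (1001, 1, 'Headset')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM OrderItems").fetchone()[0]
print(duplicate_rejected, row_count)
```

Neither column is unique on its own here; only the pair is checked by the database.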

What is an artificial key?

In rare cases, even the combination of several columns does not lead to the desired unique values. Then a new column has to be added that holds an artificial key (also called a surrogate key). The simplest option is to number the data records consecutively.

Artificial Key | Source: Author

In this slightly modified table, there is no single column and no combination of attributes that is unique. Therefore, the additional column “ID” is inserted, which numbers the individual data records. Thus, the column “ID” contains only unique values and each record can be addressed uniquely.
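As a rough sketch of such an artificial key, the following example lets the database assign the numbering itself. It assumes SQLite via Python's sqlite3 module, where AUTOINCREMENT is the SQLite-specific keyword (other systems use their own syntax for generated IDs). The rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Orders (
        ID       INTEGER PRIMARY KEY AUTOINCREMENT,  -- artificial key
        OrderID  INTEGER,
        Customer TEXT,
        Sales    REAL
    )
""")

# Two identical rows: no combination of the natural columns can tell them apart.
rows = [(1001, "Miller", 19.99), (1001, "Miller", 19.99)]
conn.executemany(
    "INSERT INTO Orders (OrderID, Customer, Sales) VALUES (?, ?, ?)", rows
)

# The ID column is filled in by the database, one new number per record.
ids = [r[0] for r in conn.execute("SELECT ID FROM Orders ORDER BY ID")]
print(ids)
```

Because the ID is not supplied in the INSERT, the two otherwise identical records still end up with different key values.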

What is a Foreign Key?

The foreign key is an attribute or a combination of several attributes that is the primary key in another table. This is used to create links between tables in databases. Depending on the primary key in the other table, it can consist of a single column or be a composite key.

Example of a Foreign Key | Source: Author

However, it is important to mention that the foreign key does not have to be unique in its own table; the same value can occur several times. With the help of such connections, large tables can be split into several smaller tables, which saves storage space and improves clarity.
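To make the idea of splitting a large table more concrete, here is a minimal sketch, assuming SQLite via Python's sqlite3 module. The tables "Customers" and "Orders", the "CustomerID" and "City" columns and all values are hypothetical. The customer details are stored once, and the orders only carry the foreign key, which repeats for customers with several orders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE Customers (
        CustomerID INTEGER PRIMARY KEY,
        Name       TEXT,
        City       TEXT
    );
    CREATE TABLE Orders (
        OrderID    INTEGER PRIMARY KEY,
        CustomerID INTEGER REFERENCES Customers(CustomerID),  -- foreign key
        Sales      REAL
    );
    INSERT INTO Customers VALUES (1, 'Miller', 'Berlin'), (2, 'Smith', 'Hamburg');
    -- CustomerID 1 appears twice: a foreign key need not be unique.
    INSERT INTO Orders VALUES (1001, 1, 20.0), (1002, 1, 35.0), (1003, 2, 12.5);
""")

# A join reassembles the information from both tables when it is needed.
totals = conn.execute("""
    SELECT c.Name, COUNT(*), SUM(o.Sales)
    FROM Orders AS o
    JOIN Customers AS c ON c.CustomerID = o.CustomerID
    GROUP BY c.Name
    ORDER BY c.Name
""").fetchall()
print(totals)
```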

What is the difference between primary and foreign keys?

In database management systems, both keys are used to establish relationships between tables. Although both types of constraints are used to ensure data integrity, they serve different purposes.

A primary key is a column or group of columns in a table that uniquely identifies each row in that table. It is a unique identifier that cannot be duplicated within the same table. It is used to ensure the integrity of entities, which means that each row in the table represents a unique entity. Primary keys are essential for indexing, sorting, and searching data, and are also used to establish relationships between tables.

A foreign key, on the other hand, is a column or set of columns in a table that references the primary key of another table. It is used to enforce referential integrity, which means that the values in its column must match values in the primary key column of the referenced table. In this way, foreign keys establish relationships between tables and ensure that data remains consistent across multiple tables.

Here are some key differences between the two types of keys:

  • Primary keys are used to identify unique rows within a table, while foreign keys are used to establish relationships between tables.
  • Primary keys must be unique and cannot contain null values, while foreign keys can contain duplicate values and null values.
  • Both key types can be declared at either the column or the table level. A table can have only one primary key, but several foreign keys.
  • A primary key can be referenced by foreign keys in other tables, while a foreign key does not identify anything itself but points to an existing primary key.

In summary, both keys are important tools for ensuring data integrity and establishing relationships between tables in a database. While primary keys are used to identify unique rows within a table, foreign keys are used to create relationships between tables by referencing the former in other tables. Understanding the differences between these two types of keys is essential to designing a well-structured and efficient database.

How can the keys be created using SQL?

In practice, databases are often managed using the Structured Query Language (SQL), which can be used to implement various functionalities such as querying or changing data, as well as creating keys.

The primary key is often already defined during table creation. Using the above example of an order table, this could look like this:

CREATE TABLE Orders (
    OrderID INT PRIMARY KEY,  -- written without a space so that it is a valid identifier
    Customer VARCHAR(100),
    Sales FLOAT
);

The individual columns and their data types are defined here. The order ID column receives the additional keyword “PRIMARY KEY”. If new data is now loaded into the table, the database automatically checks that the order number is unique and otherwise generates an error.
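The automatic check can be observed with a short sketch. It assumes SQLite through Python's sqlite3 module, the column is written as "OrderID" without a space, and the inserted rows are invented. Other database systems report the violation with their own error types and messages.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Orders (
        OrderID  INT PRIMARY KEY,
        Customer VARCHAR(100),
        Sales    FLOAT
    )
""")
conn.execute("INSERT INTO Orders VALUES (1001, 'Miller', 19.99)")

# A second row with the same order number is refused by the database.
try:
    conn.execute("INSERT INTO Orders VALUES (1001, 'Smith', 5.00)")
    error_message = None
except sqlite3.IntegrityError as exc:
    error_message = str(exc)

stored_rows = conn.execute("SELECT COUNT(*) FROM Orders").fetchone()[0]
print(error_message, stored_rows)
```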

If no primary key was defined when the table was created, it can also be added afterwards, although this should be avoided. To do this, use the “ALTER TABLE” command to change an existing table. You can then use “ADD CONSTRAINT” to define a constraint and specify the column to be used as the primary key.

ALTER TABLE Orders
ADD CONSTRAINT pk_Orders PRIMARY KEY (OrderID);

Each constraint needs a unique name, which should ideally be descriptive so that it is still meaningful at a later point in time. In our example, we have therefore opted for the name “pk_Orders”.

A similar logic can also be used to define a foreign key, which in the best case is also defined during the creation of a table. Assuming we define the “Orders” table differently and store the primary key and the foreign key for the “Customer” table directly, the command looks like this:

CREATE TABLE Orders (
    OrderID INT PRIMARY KEY,  -- written without a space so that it is a valid identifier
    Customer VARCHAR(100),
    Sales FLOAT,
    FOREIGN KEY (Customer) REFERENCES Customers(Customer)
);

This command now specifies that the “Customer” column in the “Orders” table references the “Customer” column in the “Customers” table as a foreign key.
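What this reference means in practice can be sketched as follows, assuming SQLite via Python's sqlite3 module. The "Customers" table is reduced to a single key column, and the helper "try_insert" and all rows are hypothetical. Note that SQLite only enforces foreign keys after "PRAGMA foreign_keys = ON"; most other systems enforce them by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # required in SQLite, assumption of this sketch
conn.executescript("""
    CREATE TABLE Customers (
        Customer VARCHAR(100) PRIMARY KEY
    );
    CREATE TABLE Orders (
        OrderID  INTEGER PRIMARY KEY,
        Customer VARCHAR(100),
        Sales    FLOAT,
        FOREIGN KEY (Customer) REFERENCES Customers(Customer)
    );
    INSERT INTO Customers VALUES ('Miller');
""")

def try_insert(order_id, customer, sales):
    """Return True if the order was accepted, False if a constraint refused it."""
    try:
        conn.execute(
            "INSERT INTO Orders VALUES (?, ?, ?)", (order_id, customer, sales)
        )
        return True
    except sqlite3.IntegrityError:
        return False

known_customer = try_insert(1001, "Miller", 19.99)   # referenced row exists
unknown_customer = try_insert(1002, "Smith", 5.00)   # no such row in Customers
no_customer = try_insert(1003, None, 7.50)           # NULL foreign keys are not checked
print(known_customer, unknown_customer, no_customer)
```

The last call illustrates that a foreign key column may stay empty unless it is additionally declared NOT NULL.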

What is the Star Schema?

With large amounts of data, a database or data warehouse quickly becomes confusing, and queries are not only complicated but also take a relatively long time. Therefore, intelligent ways of structuring the tables are needed so that storage space can be saved and queries run more quickly.

The first approach to this is the star schema, which includes star-shaped table structures. A distinction is made between facts and dimensions:

  • The facts are key figures or measured values that are to be analyzed or illustrated. They form the center of the analysis and are located in the central fact table. In addition to the key figures, the fact table also contains the keys that refer to the surrounding dimensions. In the business environment, facts are, for example, the sales quantity, the turnover, or the incoming orders.
  • The dimensions, on the other hand, are the properties of the facts and can be used to break down and visualize the key figures. The descriptive details are stored in the dimension tables, which saves storage space because each detail only has to be stored once there instead of in every fact record. Dimensions in the corporate environment are, for example, customer information, the date of the order, or product information.

The star schema deliberately dispenses with normalization, which is otherwise an important concept in database theory. In particular, the dimension tables of a star schema violate the third normal form. In return, the structure is particularly efficient and provides fast answers even for complex queries.
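A star schema along these lines might be sketched as follows, assuming SQLite via Python's sqlite3 module. The table names "FactSales", "DimCustomer" and "DimProduct", their columns and the data are all hypothetical. The fact table holds the key figures plus one foreign key per dimension, so an analysis needs only one join per dimension.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: descriptive attributes, stored once per customer/product.
    CREATE TABLE DimCustomer (CustomerKey INTEGER PRIMARY KEY, Name TEXT, Country TEXT);
    CREATE TABLE DimProduct  (ProductKey  INTEGER PRIMARY KEY, Name TEXT, Category TEXT);

    -- Fact table: key figures plus foreign keys pointing to the dimensions.
    CREATE TABLE FactSales (
        CustomerKey INTEGER REFERENCES DimCustomer(CustomerKey),
        ProductKey  INTEGER REFERENCES DimProduct(ProductKey),
        Quantity    INTEGER,
        Revenue     REAL
    );

    INSERT INTO DimCustomer VALUES (1, 'Miller', 'Germany'), (2, 'Smith', 'USA');
    INSERT INTO DimProduct  VALUES (1, 'Keyboard', 'Hardware'), (2, 'Mouse', 'Hardware');
    INSERT INTO FactSales   VALUES (1, 1, 2, 100.0), (1, 2, 1, 25.0), (2, 1, 1, 50.0);
""")

# Revenue per country: the fact table is joined directly to one dimension.
revenue_by_country = conn.execute("""
    SELECT d.Country, SUM(f.Revenue)
    FROM FactSales AS f
    JOIN DimCustomer AS d ON d.CustomerKey = f.CustomerKey
    GROUP BY d.Country
    ORDER BY d.Country
""").fetchall()
print(revenue_by_country)
```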

What is the Snowflake Schema?

The so-called snowflake schema is a further expansion stage of the star schema with the goal of completely normalizing the tables and thereby circumventing the disadvantages of the star schema to a certain extent. In short, the snowflake structure results from breaking the dimension tables down and classifying them even further. The fact table, however, remains unchanged.

In our example, this could lead to the dimension table with the delivery addresses being further classified into country, state, and city. This normalizes the tables and the third normal form is also fulfilled, but this is at the expense of further branches. These are particularly disadvantageous for later queries, since the branches must be reassembled with complex joins.

The further branching thus leads to the fact that the data is stored less redundantly and thus the amount of data is reduced again in comparison to the star schema. However, this is at the expense of performance, since the dimension tables have to be merged again during the query, which is often very time-consuming.
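The additional joins can be illustrated with a small sketch, again assuming SQLite via Python's sqlite3 module. The address dimension is split into the hypothetical tables "DimCity", "DimState" and "DimCountry", and the data is invented. Each level of the hierarchy references the next one through a foreign key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- The address dimension is normalized into three linked tables.
    CREATE TABLE DimCountry (CountryKey INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE DimState (
        StateKey   INTEGER PRIMARY KEY,
        Name       TEXT,
        CountryKey INTEGER REFERENCES DimCountry(CountryKey)
    );
    CREATE TABLE DimCity (
        CityKey  INTEGER PRIMARY KEY,
        Name     TEXT,
        StateKey INTEGER REFERENCES DimState(StateKey)
    );
    CREATE TABLE FactSales (
        CityKey INTEGER REFERENCES DimCity(CityKey),
        Revenue REAL
    );

    INSERT INTO DimCountry VALUES (1, 'Germany');
    INSERT INTO DimState   VALUES (1, 'Bavaria', 1), (2, 'Hesse', 1);
    INSERT INTO DimCity    VALUES (1, 'Munich', 1), (2, 'Frankfurt', 2);
    INSERT INTO FactSales  VALUES (1, 100.0), (2, 50.0), (1, 25.0);
""")

# Revenue per state within one country now needs three joins instead of one.
revenue_by_state = conn.execute("""
    SELECT s.Name, SUM(f.Revenue)
    FROM FactSales AS f
    JOIN DimCity    AS ci ON ci.CityKey    = f.CityKey
    JOIN DimState   AS s  ON s.StateKey    = ci.StateKey
    JOIN DimCountry AS co ON co.CountryKey = s.CountryKey
    WHERE co.Name = 'Germany'
    GROUP BY s.Name
    ORDER BY s.Name
""").fetchall()
print(revenue_by_state)
```

The country name is stored only once, but every query that filters by country has to walk the whole chain of dimension tables.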

What common errors can occur?

If primary and foreign keys are maintained in a database, errors can occur in practice during data maintenance that make work more difficult. The following points should therefore be noted, as they occur frequently:

  • Lack of referential integrity: If a foreign key in a table refers to a primary key that no longer exists, inconsistencies can occur. Therefore, before deleting or changing a data record, it should be checked whether certain key entries need to be changed. To avoid the occurrence of such problems from the outset, the database can also be set up in such a way that these cases are handled automatically with the help of constraints.
  • Complexity in data maintenance: Simple processes such as deleting or updating data records can become significantly more complex due to key dependencies. Therefore, when deleting a data record with a primary key, for example, it must be defined in advance how the dependent data records are to be handled. If this is not done, the data records must be checked and assigned manually, which is not only time-consuming but also prone to errors.
  • Be careful when inserting data: When inserting new data records, the database will refuse to execute if a foreign key value is inserted that does not match a primary key value. This can occur in particular if data is imported in the wrong order or is simply inconsistent. The import sequence should therefore be carefully considered in advance or temporary tables should be used to avoid these problems.
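How constraints can handle deletions automatically is sketched below, assuming SQLite via Python's sqlite3 module. The tables "Customers", "Orders" and "Invoices" and their rows are hypothetical. The orders are declared with ON DELETE CASCADE, while the invoices keep the default behavior, under which a referenced record cannot be deleted. The parent rows are also inserted before the rows that reference them, which reflects the point about import order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # required in SQLite, assumption of this sketch
conn.executescript("""
    CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Orders (
        OrderID    INTEGER PRIMARY KEY,
        CustomerID INTEGER REFERENCES Customers(CustomerID) ON DELETE CASCADE
    );
    CREATE TABLE Invoices (
        InvoiceID  INTEGER PRIMARY KEY,
        CustomerID INTEGER REFERENCES Customers(CustomerID)  -- no delete rule defined
    );

    -- Parent rows first, then the rows that reference them.
    INSERT INTO Customers VALUES (1, 'Miller'), (2, 'Smith');
    INSERT INTO Orders    VALUES (1001, 1), (1002, 1);
    INSERT INTO Invoices  VALUES (500, 2);
""")

# Customer 1 is only referenced by orders, so the delete cascades to them.
conn.execute("DELETE FROM Customers WHERE CustomerID = 1")
remaining_orders = conn.execute("SELECT COUNT(*) FROM Orders").fetchone()[0]

# Customer 2 is still referenced by an invoice, so the delete is refused.
try:
    conn.execute("DELETE FROM Customers WHERE CustomerID = 2")
    delete_blocked = False
except sqlite3.IntegrityError:
    delete_blocked = True

print(remaining_orders, delete_blocked)
```

Which rule fits depends on the data: cascading is convenient for dependent details, while blocking the delete protects records that must not disappear silently.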

If these main points are taken into account, the consistency of the database will be strengthened and database maintenance will be easier in the long term.

This is what you should take with you

  • The primary key is one or more attributes that uniquely identify each data record.
  • The foreign key consists of one or more attributes that are a primary key in another table. This makes it possible to create links between tables in a database.
  • SQL can be used to define such key columns. In the best case, they are stored when the table is created, but it is also possible to define them afterwards.
  • Depending on the structure and connection of the tables, a distinction is made between the star and snowflake schema.
  • The use of primary and foreign keys can lead to problems when maintaining databases, for example when deleting or updating data records. It is therefore important to observe certain rules to avoid problems.

Niklas Lang

I have been working as a machine learning engineer and software developer since 2020 and am passionate about the world of data, algorithms and software development. In addition to my work in the field, I teach at several German universities, including the IU International University of Applied Sciences and the Baden-Württemberg Cooperative State University, in the fields of data science, mathematics and business analytics.

My goal is to present complex topics such as statistics and machine learning in a way that makes them not only understandable, but also exciting and tangible. I combine practical experience from industry with sound theoretical foundations to prepare my students in the best possible way for the challenges of the data world.
