A database schema describes the logical dependencies between database objects, such as relations (i.e., tables) or views. It defines how attributes and tables relate to one another. In relational databases, the schema is defined using the query language SQL.
What is a relational Database Schema?
Large amounts of data stored in databases or a data warehouse quickly become confusing, and queries become not only complicated but also slow. Therefore, tables need to be designed intelligently so that storage space can be saved and queries can run more quickly.
In order to exploit the full potential of databases, their structure is described and optimized in the database schema. The schema also ensures that all data users can independently find out where the required information is located and which query can be used to access it. In addition to the relationships between the entities, such as tables or views, the database processes are also stored in the schema.
What are the Types of Database Schemas?
In practice, two different types of database schemas are distinguished. They differ mainly in how close they are to the actual implementation:
- The logical schema defines the relationships between tables and other entities. It also determines the so-called integrity constraints, which keep the data consistent. For example, one of them can specify that a particular column in a table must not contain null values.
- The physical schema, on the other hand, is more concerned with the practical storage of the data and describes how the data is physically stored on the storage medium, for example which indices are created.
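The two layers can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The table names, column names, and the index name are made up for illustration. The NOT NULL and REFERENCES clauses belong to the logical schema, while the CREATE INDEX statement is a physical design decision that does not change what the tables mean.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

conn.executescript("""
    -- Logical schema: tables, their relationship, and integrity constraints.
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        order_date  TEXT NOT NULL
    );
    -- Physical schema: an index that speeds up lookups of orders by customer.
    CREATE INDEX idx_orders_customer ON orders(customer_id);
""")


def insert_customer(name):
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        conn.execute("INSERT INTO customer (name) VALUES (?)", (name,))
        return True
    except sqlite3.IntegrityError:
        return False


accepted = insert_customer("Alice")
rejected = insert_customer(None)  # violates the NOT NULL constraint on name

# List the user-defined indices recorded in the database catalog.
index_names = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master "
    "WHERE type = 'index' AND name NOT LIKE 'sqlite_%'")]

print(accepted, rejected, index_names)
```

Dropping the index would leave every query result unchanged and only affect how quickly some lookups run, which is why it counts as part of the physical rather than the logical schema.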
What is the difference between a Database Schema and a Database Instance?
The terms database instance and database schema are often used as synonyms, but they describe different things. The database instance is a snapshot of a database at a certain point in time, including the data it contains. Thus, the same database can have different instances at different points in time.
The database schema, on the other hand, is rather static and is hard to change once it has been put into operation. Changing existing tables that already hold data is difficult to implement, since the pipelines that populate the database must be changed as well.
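The distinction can be made concrete with a small sketch, again using Python's sqlite3 module and a hypothetical product table. The helper functions schema and instance are illustrative names, not part of any library: the first reads the stored table definitions from SQLite's catalog, the second reads the rows currently held.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product (product_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")


def schema(connection):
    """The schema: the stored CREATE statements, independent of the data."""
    return [row[0] for row in connection.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' ORDER BY name")]


def instance(connection):
    """The instance: the rows the table holds at this moment."""
    return connection.execute(
        "SELECT product_id, name FROM product ORDER BY product_id").fetchall()


schema_before, instance_before = schema(conn), instance(conn)

conn.execute("INSERT INTO product (name) VALUES ('Laptop')")
conn.execute("INSERT INTO product (name) VALUES ('Monitor')")

schema_after, instance_after = schema(conn), instance(conn)

print(schema_before == schema_after)      # the schema did not change
print(instance_before, instance_after)    # the instance did
```

Inserting rows produces a new instance, while the schema stays exactly as it was defined.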
What are the Database Schemas used in practice?
In practice, two relational database schemas, in particular, have become established, which are used depending on the specific application.
What is the Star Schema?
The first approach to this is the star schema, which includes star-shaped table structures. A distinction is made between facts and dimensions:
- The facts are key figures or measured values that are to be analyzed or illustrated. They form the center of the analysis and are located in the central fact table. In addition to the key figures, the fact table also contains the keys that refer to the surrounding dimensions. In the business environment, facts are, for example, the sales quantity, the turnover, or the incoming orders.
- The dimensions, on the other hand, are the properties of the facts and can be used to break down and visualize the key figures. Each dimension table stores the different levels of detail of its dimension. This saves storage space, since the details only have to be stored once in the dimension table instead of in every row of the fact table. Dimensions in the corporate environment are, for example, customer information, the date of the order, or product information.
The star schema deliberately forgoes full normalization, which is normally an important concept in database theory. In particular, a star schema violates the third normal form. In return, the structure is particularly efficient and provides fast answers even for complex queries.
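A minimal sketch of such a structure, using Python's sqlite3 module, might look as follows. The table names (fact_sales, dim_product, dim_customer), the columns, and the sample rows are all invented for illustration. Note that dim_customer stores the country next to the city, so 'Germany' is repeated; this is the kind of redundancy that the missing third normal form allows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: descriptive properties, stored once per product/customer.
    CREATE TABLE dim_product (
        product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE dim_customer (
        customer_id INTEGER PRIMARY KEY, name TEXT, city TEXT, country TEXT);

    -- Central fact table: key figures plus keys pointing to the dimensions.
    CREATE TABLE fact_sales (
        product_id  INTEGER REFERENCES dim_product(product_id),
        customer_id INTEGER REFERENCES dim_customer(customer_id),
        quantity    INTEGER,
        revenue     REAL
    );

    INSERT INTO dim_product VALUES
        (1, 'Laptop', 'Hardware'), (2, 'Mouse', 'Hardware'), (3, 'Editor', 'Software');
    INSERT INTO dim_customer VALUES
        (1, 'Alice', 'Berlin', 'Germany'), (2, 'Bob', 'Munich', 'Germany');
    INSERT INTO fact_sales VALUES
        (1, 1, 1, 1000.0), (2, 1, 2, 40.0), (3, 2, 1, 60.0), (2, 2, 1, 20.0);
""")

# Analysing a key figure by a dimension attribute needs a single join.
revenue_by_category = conn.execute("""
    SELECT p.category, SUM(f.revenue)
    FROM fact_sales f
    JOIN dim_product p ON p.product_id = f.product_id
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()

print(revenue_by_category)
```

Every dimension is exactly one join away from the fact table, which is what keeps typical analytical queries short.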
What is the Snowflake Schema?
The so-called snowflake schema is an extension of the star schema with the goal of completely normalizing the tables and thereby avoiding some of the disadvantages of the star schema. In short, the snowflake structure results from breaking down the dimension tables into further, more finely classified tables. The fact table, however, remains unchanged.
For example, a dimension table with delivery addresses could be further broken down into country, state, and city. This normalizes the tables, so that the third normal form is also fulfilled. However, this comes at the expense of additional branches, which are particularly disadvantageous for later queries, since the split tables must be reassembled with complex joins.
The further branching means that the data is stored less redundantly, so the amount of data is reduced compared to the star schema. However, this is at the expense of performance, since the dimension tables have to be joined again during the query, which is often very time-consuming.
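The delivery address example can be sketched with Python's sqlite3 module as follows. All table names, columns, and rows are hypothetical. Each country and state name is stored only once, but a question as simple as "revenue per country" now has to walk through the whole chain of dimension tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- The address dimension is split into a chain of normalized tables.
    CREATE TABLE dim_country (country_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE dim_state (
        state_id INTEGER PRIMARY KEY, name TEXT,
        country_id INTEGER REFERENCES dim_country(country_id));
    CREATE TABLE dim_city (
        city_id INTEGER PRIMARY KEY, name TEXT,
        state_id INTEGER REFERENCES dim_state(state_id));
    CREATE TABLE dim_address (
        address_id INTEGER PRIMARY KEY, street TEXT,
        city_id INTEGER REFERENCES dim_city(city_id));

    -- The fact table still only refers to the innermost dimension table.
    CREATE TABLE fact_sales (
        address_id INTEGER REFERENCES dim_address(address_id), revenue REAL);

    INSERT INTO dim_country VALUES (1, 'Germany'), (2, 'Austria');
    INSERT INTO dim_state   VALUES (1, 'Bavaria', 1), (2, 'Berlin', 1), (3, 'Tyrol', 2);
    INSERT INTO dim_city    VALUES (1, 'Munich', 1), (2, 'Berlin', 2), (3, 'Innsbruck', 3);
    INSERT INTO dim_address VALUES
        (1, 'Main St 1', 1), (2, 'Park Ave 2', 2), (3, 'Hill Rd 3', 3);
    INSERT INTO fact_sales  VALUES (1, 100.0), (2, 50.0), (3, 30.0), (1, 20.0);
""")

# Revenue per country now requires four joins instead of one.
revenue_by_country = conn.execute("""
    SELECT co.name, SUM(f.revenue)
    FROM fact_sales f
    JOIN dim_address a  ON a.address_id  = f.address_id
    JOIN dim_city ci    ON ci.city_id    = a.city_id
    JOIN dim_state s    ON s.state_id    = ci.state_id
    JOIN dim_country co ON co.country_id = s.country_id
    GROUP BY co.name
    ORDER BY co.name
""").fetchall()

print(revenue_by_country)
```

In a star schema the country would simply be a column of the address dimension, so the same question would need only one join, at the price of repeating the country name in many rows.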
Star Schema vs. Snowflake Schema
The Star schema and the Snowflake schema are relatively similar in structure and are often compared with each other for this reason. In fact, the choice of a suitable database schema depends mainly on the concrete application.
In short, the goal of the star schema is to provide a good basis for frequent queries while still reducing the amount of data. This is achieved by splitting the data into fact and dimension tables, which removes many redundancies and satisfies the first two normal forms. The number of tables remains relatively small, so queries need only a few joins and fast response times are possible. However, the database is not completely normalized and some redundancies remain.
The snowflake schema, on the other hand, is a further development of the star schema with the aim of fully normalizing the database. The fact table is retained, and the dimension tables are further classified and divided into additional relations. Although this eliminates the remaining redundancies of the star schema, it makes queries slower and more complex, since the dimension tables must first be joined again.
This is what you should take with you
- A database schema is the logical description of the relationships between different database objects, such as tables or views.
- A distinction is made between the logical and the physical database schema.
- In practice, database schemas are mainly used in connection with relational databases. The star schema and the snowflake schema are the most commonly used.
Other Articles on the Topic of Database Schemas
A detailed article about the database schema can also be found at Lucidchart. This served as a basis and source for this article.