What is a Data Warehouse?

A data warehouse is a central data store in a company or organization that collects relational data from various sources. The information is transferred from different transactional systems or other relational databases to the data warehouse, where it is available to analysts and decision-makers.

Data Warehouse Function

The data warehouse is used across the company to make data-driven decisions and to examine processes. Because it draws information from many different systems, it is treated as a single point of truth: everyone in the company works from the same data, and decisions are based on that shared information.

Across departments, the data warehouse can be used for the following tasks:

  • Cost and resource analysis
  • Analysis of internal processes (e.g. production, hiring, etc.)
  • Business Intelligence
  • Calculation and provision of company-wide key performance indicators
  • Data source for analyses or data mining
  • Standardization of company-wide data into a fixed schema

Properties of Data Warehouses

When building a central data warehouse, a few characteristic properties help to narrow down its structure and the data it needs to hold.

Topic Orientation

A data warehouse contains information on a specific topic and not on individual business transactions. These topics can be, for example, sales, purchasing or marketing.

The warehouse aims to support decision-making with the help of business intelligence and targeted KPIs. For the same reason, information that is neither relevant to decision-making nor used for analysis is not loaded into the central database in the first place.

Integration

The warehouse integrates data from a wide variety of systems and sources. A common schema must therefore be created so that the information is uniform and comparable. Otherwise, central analysis and KPI creation are not possible.
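
As a minimal sketch of what a common schema means in practice, the following Python snippet maps records from two source systems into one uniform structure. The two sources (a CRM export and a web shop export), all field names, and the cent-to-euro conversion are assumptions made up for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class SaleRecord:
    """Common warehouse schema (hypothetical) shared by all sources."""
    source: str
    customer_id: str
    sale_date: date
    amount_eur: float


def from_crm(row: dict) -> SaleRecord:
    # Assumed CRM export: ISO date string, amount as a string in euros.
    return SaleRecord(
        source="crm",
        customer_id=str(row["CustomerNo"]),
        sale_date=date.fromisoformat(row["Date"]),
        amount_eur=float(row["Amount"]),
    )


def from_shop(row: dict) -> SaleRecord:
    # Assumed web shop export: separate date parts, amount in cents.
    return SaleRecord(
        source="shop",
        customer_id=str(row["user"]),
        sale_date=date(row["year"], row["month"], row["day"]),
        amount_eur=row["amount_cents"] / 100,
    )


crm_rows = [{"CustomerNo": 17, "Date": "2023-03-01", "Amount": "120.50"}]
shop_rows = [{"user": "17", "year": 2023, "month": 3, "day": 2, "amount_cents": 4999}]

# After mapping, records from both systems are directly comparable.
warehouse = [from_crm(r) for r in crm_rows] + [from_shop(r) for r in shop_rows]
total_for_customer = sum(r.amount_eur for r in warehouse if r.customer_id == "17")
```

Once both sources share the same fields and units, a KPI such as revenue per customer can be computed across systems with a single expression.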

Time Period Reference

The data warehouse stores data for a defined time period and therefore describes the past. The data is usually loaded in aggregated form, for example at the daily level, so that the data volume stays manageable. As a result, the granularity is often coarser than what one is used to from the operational systems.

The operational systems, by contrast, reflect the current state, since they output information as it accumulates. That information can also be viewed in great detail.
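
The aggregation to a daily level can be sketched as follows. The transaction records and the function name `aggregate_daily` are hypothetical; the point is only that individual timestamps disappear once the data is summed per day.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical transaction-level events as an operational system might hold them.
transactions = [
    {"ts": "2023-03-01T09:15:00", "amount": 20.0},
    {"ts": "2023-03-01T17:40:00", "amount": 35.0},
    {"ts": "2023-03-02T11:05:00", "amount": 12.5},
]


def aggregate_daily(events):
    """Sum amounts per calendar day; the individual timestamps are dropped."""
    totals = defaultdict(float)
    for event in events:
        day = datetime.fromisoformat(event["ts"]).date()
        totals[day.isoformat()] += event["amount"]
    return dict(totals)


daily_totals = aggregate_daily(transactions)
```

Three detailed rows become two daily rows here, which illustrates the trade-off: less volume in the warehouse, but no way to recover the time of day of a single transaction.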

Non-Volatility

Another important feature of a central warehouse is that its data is non-volatile. In operational systems, information is often kept only for a short time, and old data is overwritten as soon as new data arrives. In a data warehouse, by contrast, data is stored permanently, and old data persists when newer data is added.
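
The contrast between overwriting and persisting can be illustrated with a small sketch. The stock-level example, the variable names, and the `record_stock` helper are assumptions for illustration only; real systems implement this with database tables rather than in-memory structures.

```python
from datetime import date

# Operational view: one current value per key, updates overwrite in place.
operational_stock = {}

# Warehouse view: append-only history, earlier rows are never modified.
warehouse_stock_history = []


def record_stock(product_id: str, quantity: int, as_of: date) -> None:
    # Any previous value for this product is overwritten.
    operational_stock[product_id] = quantity
    # A new row is appended; rows from earlier loads stay untouched.
    warehouse_stock_history.append(
        {"product_id": product_id, "quantity": quantity, "as_of": as_of}
    )


record_stock("P-1", 100, date(2023, 3, 1))
record_stock("P-1", 80, date(2023, 3, 2))
```

After both calls, the operational view only knows the current quantity, while the warehouse history still holds both states and can answer questions about the past.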

Difference from a Data Lake

The data warehouse can be complemented by a data lake, in which unstructured raw data is held at low cost until it is needed. The two concepts differ primarily in the kind of data they store and in how that data is stored.

| Features | Data Warehouse | Data Lake |
|---|---|---|
| Data | Relational data from operational systems or other databases | All data types (structured, semi-structured, unstructured) |
| Data Schema | Defined either before the data warehouse is created or only during the analysis (schema-on-write or schema-on-read) | Defined exclusively at the time of analysis (schema-on-read) |
| Query | Very fast query results with local storage | Decoupling of compute and storage; fast query results with inexpensive storage |
| Data Quality | Pre-processed data from different sources; unified; single point of truth | Raw data; processed and unprocessed |
| Applications | Business intelligence and graphical preparation of data | Artificial intelligence, analytics, business intelligence, big data |

Comparison of Data Warehouse and Data Lake
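
The difference between schema-on-write and schema-on-read can be sketched in a few lines of Python. The schema, the field names, and all helper functions here are hypothetical and heavily simplified; the sketch only shows at which point the schema is applied.

```python
import json

REQUIRED_FIELDS = {"order_id": int, "amount": float}  # hypothetical schema


def validate(record: dict) -> dict:
    """Raise if a required field is missing; otherwise return a typed copy."""
    typed = {}
    for field, field_type in REQUIRED_FIELDS.items():
        if field not in record:
            raise ValueError(f"missing field: {field}")
        typed[field] = field_type(record[field])
    return typed


# Schema-on-write: records are checked before they are stored.
warehouse_table = []


def write_to_warehouse(record: dict) -> None:
    warehouse_table.append(validate(record))


# Schema-on-read: raw payloads are stored untouched and interpreted later.
lake_files = []


def write_to_lake(raw_payload: str) -> None:
    lake_files.append(raw_payload)


def read_from_lake() -> list:
    rows = []
    for payload in lake_files:
        try:
            rows.append(validate(json.loads(payload)))
        except (ValueError, TypeError):
            continue  # payloads that do not fit the schema are skipped at read time
    return rows


write_to_warehouse({"order_id": "1", "amount": "9.99"})
write_to_lake('{"order_id": 2, "amount": 5}')
write_to_lake('{"note": "free-form log line"}')
lake_rows = read_from_lake()
```

The lake accepts both payloads at write time and only sorts out which ones fit when they are read, whereas the warehouse would reject a non-conforming record immediately.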

This is what you should take with you

  • A data warehouse stores company-wide information centrally.
  • This is intended to support data-driven decisions and make business intelligence possible.
  • The unstructured raw data in the data lake provides a good complement to the relational and processed data in the data warehouse.

Other Articles on the Topic of Data Warehouses

  • Amazon Web Services provides a good summary here.