A database is an organized and structured collection of information that is normally stored in a computer system (source: Oracle). The operation and administration of the database usually take place in a database management system (DBMS).
What is a database?
In a database, large amounts of data are usually stored in a structured manner and made available for retrieval. This is almost always an electronic system. Theoretically, however, analog information collections, such as a library, are also databases.
As early as the 1960s, the need for centralized data storage arose because things like data access authorization or data validation should not be done within an application, but separately from it.
What is a DBMS?
Databases consist of two major components. One is the actual data storage and the other is the so-called database management system (DBMS for short). Simply put, it acts as an interface between the data and the end-users. MySQL is an example of a concrete DBMS from Oracle.
The central tasks of a DBMS include, for example:
- Storage, modification, and deletion of data
- Definition and compliance with the data model
- Adding users and creating the corresponding rights
This management system further ensures that the so-called ACID properties are maintained within the data store. These include the following points:
- Atomicity (A): Data transactions, e.g. the entry of a new data record or the deletion of an old one, should either be executed completely or not at all. For other users, the transaction is only visible when it is completely executed. In the database of a financial institution, for example, the transfer from one account to another is only visible when the transaction is completely executed in both tables.
- Consistency (C): This property is satisfied when each data transaction moves the data store from a consistent state to a consistent state.
- Isolation (I): When multiple transactions occur simultaneously, the final state must be the same as if the transactions occurred separately. That is, the database should pass the stress test. In other words, it should not result in incorrect database transactions due to overload.
- Durability (D): The data must only change as a result of a transaction and must not be changeable by external influences. For example, a software update must not inadvertently cause data to change or possibly be deleted.
What are the Types of Databases?
There are many different types of data collection, which also depend primarily on the type of use within an organization or company. Various influencing factors play a role, such as the number of potential users and data queries, as well as the type of data to be stored:
- Relational Databases: This is where data is stored that can be stored in a tabular format, i.e. with rows and columns.
- Distributed Databases: If the data is to be stored on several different computers, this is called a distributed database. This is useful, for example, if you want to make the data collection fail-safe or if you need to handle a large number of data queries.
- Data Warehouse: If data is to be centrally accessible within a company, this is referred to as a data warehouse. Here, data from different source systems are stored and brought into a uniform data form.
- NoSQL Database: If the data to be stored does not correspond to a relational schema, for example in the case of unstructured data, it is stored in so-called NoSQL (“Not only SQL”) data collections.
These are just a few of the most common database types. Over time, many more types have emerged, but we cannot go into detail about them in this article.
If large data warehouses are introduced into organizations, administrators face a wide variety of challenges. The following points should already be considered when creating the data collection:
- Ability to increase the amount of data: Due to the ever-increasing amount of data that is generated and stored within a company, the system must have sufficient resources to expand the amount of data.
- Data Security: When partially confidential information is stored in a central location, it naturally provides a target for unauthorized access. This includes not only securing it from outside access but also distributing permissions for users within the organization.
- Scalability: As a company grows, the amount of information naturally grows as well. The database solution should be prepared for this and be able to handle more user queries and data.
- Data Timeliness: In today’s world, we are accustomed to receiving information without delay, and the same naturally applies to data storage. Therefore, architectures must be built that process and make information available as quickly as possible.
Structured Query Language
The Structured Query Language (SQL) is the most commonly used language when working with relational databases. The language can be used for much more than simple queries, despite its name. It can also be used to perform all operations necessary to create and maintain data collections.
SQL offers many functions to read, modify or delete data. It is actually used in all common relational database systems and is widely used. In addition, non-relational systems also offer extensions so that the query language can be used even though the data is not arranged in tables. This is probably due to the numerous advantages SQL offers:
- It is semantically very easy to read and understand. The commands can be understood to a large extent even by beginners.
- The language can be used directly within the database environment. For basic work with information, the data does not have to be transferred from the collection to another tool first.
- Simple calculations and queries are possible directly in the data collection.
- Compared to other spreadsheet tools, such as Excel, data analysis with Structured Query Language can be easily replicated and copied because everyone has access to the same data in the collection. Thus, the same query always leads to the same result.
This is what you should take with you
- A database is a system used to collect information in an organized and structured way.
- The relational storage system is still the most common. However, NoSQL solutions or data warehouses are also becoming increasingly popular.
- When creating such data collections, there are many different challenges to consider, such as scalability or data security.
- For querying and maintaining databases, the Structured Query Language (SQL) is still used in many cases.
Explanation of the Apache Hadoop Distributed File System with examples and benefits.
Other Articles on the Topic of Databases
- A detailed topic page can be found at Oracle, from which, for example, the well-known DBMS called “MySQL” originates.