NoSQL (“Not Only SQL”) describes databases that, unlike relational SQL databases, are non-relational, i.e. they do not organize data in tables linked by fixed relationships. Many of these systems can also be distributed across different computer systems and are highly scalable, which makes NoSQL solutions very interesting for many Big Data applications.
These databases are characterized by two very broad criteria in particular. Firstly, data is not stored in relational tables, and secondly, the query language is not (only) SQL, as the name Not Only SQL indicates.
What are the Advantages of NoSQL solutions?
These database systems offer several advantages over traditional SQL solutions that are crucial in the Big Data environment. The following factors apply to most NoSQL databases:
- NoSQL databases can usually be scaled further than comparable relational data stores because they are designed to be distributed across many servers and make fewer demands on data schemas, which also allows them to store data faster.
- Many NoSQL databases are open source and can therefore be used free of charge, including database management (apart from infrastructure costs such as storage capacity).
- In practice, there are various data queries that conventional relational databases do not support or support only with a lot of effort, such as following long chains of relationships, which graph databases handle natively.
- Because NoSQL solutions place few demands on the data schema, they are less restrictive towards the data model. A relational data structure, on the other hand, requires a fixed schema to be defined in advance and can therefore be very restrictive.
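The difference in schema requirements can be sketched in a few lines. In this minimal illustration, Python's built-in `sqlite3` module stands in for a relational database and plain dictionaries stand in for documents in a document store; it is not the API of any particular NoSQL product, and the table and field names are hypothetical.

```python
import json
import sqlite3

# Relational side: the table schema must be defined before any data is stored.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO users (id, name) VALUES (?, ?)", (1, "Anna"))

try:
    # Storing a new attribute requires an ALTER TABLE first, so this insert is rejected.
    conn.execute(
        "INSERT INTO users (id, name, newsletter) VALUES (?, ?, ?)", (2, "Ben", True)
    )
    relational_accepts_new_field = True
except sqlite3.OperationalError:
    relational_accepts_new_field = False

# Document side: each document carries its own structure, so records in the same
# (hypothetical) collection may have different or nested fields.
users_collection = [
    {"_id": 1, "name": "Anna"},
    {"_id": 2, "name": "Ben", "newsletter": True},
    {"_id": 3, "name": "Cleo", "addresses": [{"city": "Berlin"}, {"city": "Boston"}]},
]
stored_documents = [json.dumps(doc) for doc in users_collection]

print(relational_accepts_new_field)  # False
print(json.loads(stored_documents[2])["addresses"][1]["city"])  # Boston
```

The relational table rejects the unknown `newsletter` column until the schema is changed, while the documents simply carry whatever fields they need.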
ACID Properties of SQL Databases
Classic relational databases fulfill the four so-called ACID properties. Their underlying idea is that the most important requirement for a database is to keep the data correct and meaningful. In many cases, data stores are seen as a “single source of truth”, so it would be fatal if erroneous information were stored and passed on. The four properties are the following:
- Atomicity (A): Data transactions, e.g. the entry of a new data record or the deletion of an old one, must either be executed completely or not at all. For other users, the transaction only becomes visible once it has been completely executed. In the database of a financial institution, for example, a transfer from one bank account to another only becomes visible once both the debit from the first account and the credit to the second have been recorded.
- Consistency (C): This property is satisfied when each data transaction moves the database from one consistent state to another, i.e. all defined rules and constraints still hold after the transaction.
- Isolation (I): When multiple transactions run simultaneously, the final state must be the same as if the transactions had been executed one after the other. In other words, concurrent transactions must not interfere with each other, even when the database is under heavy load.
- Durability (D): Once a transaction has been completed, its changes must be stored permanently and must survive external influences such as a system crash or power failure. For example, a confirmed transfer must not be lost just because the server restarts immediately afterwards.
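Atomicity and consistency can be demonstrated with Python's built-in `sqlite3` module, since SQLite is a relational database with transactional guarantees. The account IDs, the `CHECK` constraint that forbids negative balances, and the `transfer` and `balances` helpers are hypothetical choices for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The CHECK constraint is a consistency rule: no account may go below zero.
conn.execute(
    "CREATE TABLE accounts (id TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("DE-1", 100), ("US-1", 50)])
conn.commit()


def transfer(conn, source, target, amount):
    """Credit the target and debit the source inside one transaction."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, target)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, source)
            )
        return True
    except sqlite3.IntegrityError:
        # The debit violated the CHECK constraint; the credit above was rolled back too.
        return False


def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))


print(transfer(conn, "DE-1", "US-1", 30), balances(conn))  # True {'DE-1': 70, 'US-1': 80}
print(transfer(conn, "DE-1", "US-1", 500), balances(conn))  # False {'DE-1': 70, 'US-1': 80}
```

The second call fails on the debit, but because both updates belong to the same transaction, the credit to `US-1` is undone as well, so neither account changes.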
Does NoSQL meet the ACID properties?
Many NoSQL solutions do not fully comply with the ACID properties, although there are exceptions, such as the graph database Neo4j, which supports ACID transactions. NoSQL databases are in many cases distributed across multiple devices and servers. This allows much larger amounts of data to be processed and stored simultaneously, which is the main requirement for these systems. However, this distribution means that many of them do not guarantee strict consistency at all times and instead only ensure that the data eventually becomes consistent (eventual consistency).
Suppose we have implemented a NoSQL database on two physical servers, one located in Germany and the other in the USA. The databases contain the account balances and transactions of German and American customers. The German accounts are stored in Germany and the American accounts are stored on the American server.
It may now happen that a German customer makes a transfer to an American account. Then both data stores have to be changed, and until both changes are complete, they are inconsistent with each other. For example, we may start a database query when the processing in Germany has already been completed but the processing in the USA has not. In this time window, the so-called “inconsistency window”, the data in the database is incorrect and inconsistent. In a relational database with ACID transactions, this would not happen, because the transfer only becomes visible once both changes have been made.
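The inconsistency window can be made concrete with a small simulation. The two dictionaries below are hypothetical stand-ins for the German and American servers, and the generator only serves to pause between the two local updates; a real distributed database replicates over the network, but the effect on a reader is similar: a query issued between the two updates sees money that has left one account but not yet arrived in the other.

```python
# Hypothetical stand-ins for the two servers: each holds only its own region's accounts.
germany = {"DE-1": 100}
usa = {"US-1": 50}


def total_money(*servers):
    """A global read across all servers; the sum should never change during a transfer."""
    return sum(sum(server.values()) for server in servers)


def transfer_steps(source, source_id, target, target_id, amount):
    """Apply the transfer as two separate local updates, pausing between them."""
    source[source_id] -= amount  # step 1: debit on the German server
    yield "debited"
    target[target_id] += amount  # step 2: credit on the US server, e.g. after network delay
    yield "credited"


before = total_money(germany, usa)
steps = transfer_steps(germany, "DE-1", usa, "US-1", 30)
next(steps)  # Germany has finished, the USA has not
during = total_money(germany, usa)  # read inside the inconsistency window
next(steps)
after = total_money(germany, usa)

print(before, during, after)  # 150 120 150
```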
What are the different NoSQL Categories?
NoSQL solutions are usually divided into four main categories:
- Document stores store a variety of related information together in a single document, often in a JSON-like format. For example, a document could contain all the data for one day.
- Key-value stores are very simple data structures in which each record is stored as a value under a unique key. This key is used to retrieve the value.
- Wide-column stores organize data in rows whose columns are flexible, so that each row can have a different set of columns, grouped into column families. They are optimized to store and query very large data sets quickly.
- Graph databases store information in so-called nodes and edges. This makes it very easy to represent social networks, for example, in which people are individual nodes and the relationships between them are edges.
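To make the four data models easier to compare, the sketch below expresses each one with plain Python data structures. These are simplified illustrations, not how any specific product stores data internally; all keys, names, and values are hypothetical. The graph part also shows a breadth-first search, the kind of relationship traversal graph databases are built for.

```python
from collections import deque

# Document store: one self-contained, possibly nested record, e.g. all data for one day.
daily_document = {
    "_id": "2024-05-01",
    "orders": [
        {"customer": "Anna", "total": 42.5},
        {"customer": "Ben", "total": 10.0},
    ],
}

# Key-value store: each value is retrieved only through its unique key.
key_value = {"session:abc123": "user=anna;cart=3", "session:def456": "user=ben;cart=0"}

# Wide-column store: row key -> columns; different rows may have different columns.
wide_column = {
    "anna": {"email": "anna@example.com", "city": "Berlin"},
    "ben": {"email": "ben@example.com", "last_login": "2024-05-01"},
}

# Graph database: nodes connected by labeled, directed edges.
edges = [("anna", "FOLLOWS", "ben"), ("ben", "FOLLOWS", "cleo"), ("cleo", "FOLLOWS", "dan")]


def reachable(start, edges):
    """Breadth-first search: every node reachable from `start` by following edges."""
    seen, queue = {start}, deque([start])
    while queue:
        current = queue.popleft()
        for src, _, dst in edges:
            if src == current and dst not in seen:
                seen.add(dst)
                queue.append(dst)
    return seen - {start}


print(sum(order["total"] for order in daily_document["orders"]))  # 52.5
print(key_value["session:abc123"])  # user=anna;cart=3
print(sorted(reachable("anna", edges)))  # ['ben', 'cleo', 'dan']
```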
What are Examples of Not Only SQL Databases?
The best-known examples of NoSQL databases are Apache Cassandra, MongoDB, Redis, and Neo4j. Each of them belongs to a different NoSQL category:
- MongoDB is a document store. The individual data records are stored in so-called documents, where a document can be thought of as roughly comparable to a row in a relational database. Documents with a comparable structure are grouped into so-called collections, which are comparable to tables in the relational world.
- Redis is an in-memory key-value store that is often used together with streaming solutions such as Apache Kafka. Real-time data is cached in it and made available for analyses or calculations. Due to its low latency, it is well suited for chats, games, or caching applications.
- Apache Cassandra is an open-source wide-column store. It is designed to store large amounts of data on distributed systems while ensuring high availability.
- Neo4j is a graph database that can be used for free to a certain extent. With the help of nodes and edges, knowledge graphs can be organized or social networks can be mapped, for example.
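The caching use case mentioned for Redis usually follows a pattern often called cache-aside: look up a key in the fast key-value store first and only query the slower primary database on a miss. Redis itself supports expiring entries, for example via the `EX` option of its `SET` command. The sketch below does not use Redis or its client library; `TTLCache`, `get_user_profile`, `slow_lookup`, and the fake clock are hypothetical so that it runs without a server.

```python
import time


class TTLCache:
    """Hypothetical in-memory key-value cache whose entries expire after `ttl` seconds."""

    def __init__(self, ttl, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._data = {}

    def set(self, key, value):
        self._data[key] = (value, self.clock() + self.ttl)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self._data[key]  # expired: behave as if the key never existed
            return None
        return value


def get_user_profile(user_id, cache, load_from_database):
    """Cache-aside: try the fast cache first, fall back to the slow source, then cache it."""
    key = f"user:{user_id}"
    profile = cache.get(key)
    if profile is None:
        profile = load_from_database(user_id)
        cache.set(key, profile)
    return profile


# Usage with a fake clock and a stand-in for the slow primary database.
now = [0.0]
cache = TTLCache(ttl=60, clock=lambda: now[0])
database_calls = []


def slow_lookup(user_id):
    database_calls.append(user_id)
    return {"id": user_id, "name": "Anna"}


get_user_profile(7, cache, slow_lookup)  # miss: loads from the database
get_user_profile(7, cache, slow_lookup)  # hit: served from the cache
now[0] = 61.0
get_user_profile(7, cache, slow_lookup)  # expired: loads again
print(database_calls)  # [7, 7]
```

Injecting the clock keeps the expiry logic deterministic; in production the default monotonic clock would be used.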
Why do we need NoSQL Databases?
NoSQL solutions are especially suitable when the advantages of these databases are central to the application. The most important factors in such projects are usually that there is no uniform data schema, that a very large amount of data is expected and should be distributed across different systems, and that the data must be processed very quickly. In such cases, a short-term inconsistency of the information is also acceptable.
A classic example of this is web tracking data (clickstream data), i.e. all the information we collect about a user while they are on our website. On an established site, a lot of data can accumulate in a short time, especially if we want to track the user’s actions at a very granular level. At the same time, the data structure is large and flexible, with logins, visits to different pages, and many buttons that can potentially be clicked. In this case, it makes sense to use a graph database, in which the movement patterns can be tracked very well along the edges.
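The idea of tracking movement patterns along edges can be sketched without a database: each pair of consecutive page views in a session becomes a directed edge, and the edge weight counts how often visitors took that path. In a graph database these would be page nodes connected by weighted edges; here a `collections.Counter` stands in for the graph, and the session data and function names are hypothetical.

```python
from collections import Counter

# Hypothetical sessions: the ordered pages each visitor viewed.
sessions = [
    ["home", "login", "products", "cart"],
    ["home", "products", "cart", "checkout"],
    ["home", "products", "product_detail"],
]


def page_transitions(sessions):
    """Turn each consecutive pair of page views into a directed, weighted edge."""
    edges = Counter()
    for pages in sessions:
        for src, dst in zip(pages, pages[1:]):
            edges[(src, dst)] += 1
    return edges


def most_likely_next(page, edges):
    """Follow the outgoing edges of `page` and return the most frequent next page."""
    candidates = {dst: n for (src, dst), n in edges.items() if src == page}
    return max(candidates, key=candidates.get) if candidates else None


edges = page_transitions(sessions)
print(edges[("products", "cart")])  # 2
print(most_likely_next("home", edges))  # products
```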
How to find the right NoSQL Database?
The selection of a suitable NoSQL database depends on various factors and it is not always necessary to switch from a classic relational database. Here are some points that should be considered when making your choice:
- Data model: NoSQL databases are available for a wide variety of data models, such as document, key-value, wide-column, or graph data. You should choose a database that fits your data model to be able to take full advantage of it.
- Scalability: NoSQL databases are known for scaling horizontally, i.e. by adding more servers, to ensure fast response times even with large amounts of data and many queries. Before setting up a database, you should make sure that scaling it is straightforward and can be handled by your team and your hardware.
- Performance: The performance of a database is evaluated differently depending on the type of application. In some applications, small amounts of data are written to the database very frequently, while in others large amounts are loaded once a day. Before selecting a database, the performance requirements should be defined so that the most suitable database can be chosen and it can then be verified that it meets them.
- Consistency: Depending on how strictly consistent the data must be, the choice of databases can be narrowed considerably. When choosing a NoSQL database, the required degree of consistency should be taken into account to avoid problems in practice.
- Durability: In the area of durability, you should evaluate how well the data must be protected against loss. It should be checked how the database behaves in the event of a system failure and which data can be restored afterwards.
- Ease of use: User-friendliness also plays an important role in the selection of a suitable database. Attention should be paid to how independently users can operate the new database and whether training may be necessary. The Neo4j graph database, for example, uses its own query language, Cypher, which may not be familiar to all users. It should also be checked how the database can be connected to existing systems, for example via APIs.
- Costs: Finally, the license and usage costs that may arise over the total operating time should not be forgotten. In addition to the pure usage costs, costs for hosting and maintenance should also be taken into account.
If these factors are taken into account when selecting a NoSQL database, the best possible solution can be found for the respective application.
This is what you should take with you
- NoSQL databases are a popular alternative to traditional relational databases.
- They are designed to handle large amounts of unstructured or semi-structured data.
- NoSQL databases offer high scalability, availability, and fault tolerance.
- Choosing the right NoSQL database depends on your specific use case and requirements.
- Key factors to consider include data modeling, scalability, consistency, availability, and security.
- Some popular NoSQL databases include MongoDB, Cassandra, Couchbase, Redis, and Amazon DynamoDB.
- Overall, NoSQL databases offer a flexible and powerful solution for managing large and complex data sets, but it is important to carefully evaluate and choose the right database for your needs.
Niklas Lang
I have been working as a machine learning engineer and software developer since 2020 and am passionate about the world of data, algorithms and software development. In addition to my work in the field, I teach at several German universities, including the IU International University of Applied Sciences and the Baden-Württemberg Cooperative State University, in the fields of data science, mathematics and business analytics.
My goal is to present complex topics such as statistics and machine learning in a way that makes them not only understandable, but also exciting and tangible. I combine practical experience from industry with sound theoretical foundations to prepare my students in the best possible way for the challenges of the data world.