Last Updated:

What databases exist and what they are used for: SQL, NoSQL, OLAP

When you start working on a new project, it is important to understand what criteria to choose databases by and what solutions there are in general.

Database

There are many different database families that differ in the structure of data storage, processing, and indexing. The main user scenarios determine which model is best suited in a particular case. Sometimes it is quite justified to use several different databases at once, copying the same data several times.

Here are the most important criteria for choosing a database:

  • what data should be stored in it,
  • what will be the amount of data,
  • What kind of queries will be executed when the database is accessed.

Relational databases
(SQL databases)

Relational databases are the most common. Here are just the most popular of them: Oracle, Microsoft SQL Server, PostgreSQL, MySQL. Such databases provide line-by-line storage of data in tables, which implies a strict data structure. And it is also implied that for one access to the database you will request a relatively small number of records.

Such solutions generally work quite well if the total amount of your data does not exceed a few terabytes (of course, if you have the right infrastructure), which generally makes them suitable for most projects – especially at the initial stage of development.

Relational databases are worth choosing if the following characteristics are important to you:

  • transactionality;
  • Frequent data changes
  • search by indexes;
  • Request a small number of records at a time.
  • the amount of data does not exceed several terabytes.

The choice of a particular relational database depends on additional security requirements, support, and other factors. For example, in the banking sector, they prefer to use Oracle and Microsoft SQL Server. However, these are paid solutions - and open and free PostgreSQL also shows very good performance, is actively developing and distributed under a free license. If you have a very small project, you can use any relational database.

NoSQL Databases

An alternative for relational databases is NoSQL databases. These can be document-oriented graph databases or key-value-stores.

Document-oriented databases

Document-oriented databases (for example, MongoDB, Amazon DocumentDB, CouchDB, and others) store data immediately in ready-made "documents", and not in tables and rows - like relational databases. This storage method is appropriate when the structure of the data can change or your main use cases involve loading a composite structure. Document-oriented databases are very close to the format of resources exchanged between the client and the server, which simplifies the preparation of data for transmission over the network.

In articles and books, as a typical example of the work of such databases, the scenario of loading the user's page in a social network is often cited. In this case, the person has basic data: full name, date of birth, gender – and additional information, which may include several elements and require a different structure. For example, places of study, residence, work.

As you know, one person can have many places of study or residence, and addresses in different countries can consist of different entities. In Russia, this region → city → street → house, and in France - the province of → region → city → street → house. Yes, such structures can be collected on relational databases, but in this case you will have to perform several additional queries to collect all the information about a person and draw his page.

Document-oriented databases allow you to store such information about the user in its entirety, in one place and receive it in one request.

At the same time, document-oriented databases support the ability to use references to other records in the database - and this allows you to get closer to the relational model. By the way, relational models are also moving towards supporting composite data structures – such as JSON – and allow you to search the contents of a "complex, composite" field.

Thus, the main advantage of document-oriented databases is the ability to store data without strict restrictions on the structure.

Key-value-storage

Key-value-stores (Redis, Aerospike, DynamoDB and others) store data in the form of a hash table. In such a model, each record has only one index. There is no strict restriction on the structure of the value. As a rule, such databases change data on the principle of logs, that is, they always add values to the end, and deletion is performed by adding a special record. Key-value-stores serve well as a cache, queue or logging.

Graph databases

Graph databases store all data in the form of nodes and relationships between them. This approach can speed up queries in some cases and is used in recommendation engines and applications related to geolocation. So, if you're making an app like this, maybe you should pay attention to graph storage. Although we must admit that these are specific solutions.

OLAP Databases

For analytical systems that involve working with huge amounts of data - tens of terabytes, or even several petabytes - there is a separate class of storage that stores data in a column-oriented model.

Unlike a relational data model, where the entire row of a table is stored sequentially, or a document-oriented model, where the entire document is stored sequentially, here the values of one column are stored sequentially, and it is understood that in the same position of each column values related to one row are stored. Such a model allows you to effectively compress data and build various aggregates over them: sum, average, quantity and others.

OLAP systems allow you to add data in a continuous stream or load in chunks - but they are usually not allowed to be deleted or changed. Their main task is to allow analysts and company management to analyze data. Therefore, they should store the history of events, while the rest of the databases imply storing only the current state of each entity. Olap stores typically support SQL query syntax because it is well suited for analytical tasks.

Most solutions in this area – for example, Vertica, Teradata, BigQuery – are quite expensive. However, there are also open source solutions such as ClickHouse, Apache Druid and others.

What to choose?

As a rule, most projects have enough relational databases, among which you can choose suitable free solutions. However, for some specific use cases, a document-oriented data storage model is better suited.

Of course, it is worth considering that now many relational databases support the ability to store data in JSON or XML format, which to some extent allows them to compete with the document-oriented model.

If your project has grown to a size where these solutions are not enough, it makes sense to add key-value storage as a cache or look for another hybrid data management model.

But if you need to build an analytics system with the processing of huge amounts of data, then, most likely, you will have to consider storage from the OLAP category. These include column-oriented stores or column family category stores, such as HBase or Google Cloud Bigtable.