The new breed of business needs to be polyglot in its databases: to achieve the efficiency that modern business demands, it has to use a best-of-breed solution for every problem.
If you work in the financial and banking sector, the word “database” usually brings to mind a large box that stores and retrieves data, with reliability as its defining characteristic:
- Data reliability: the data are safe, forever. If a problem arises and something breaks, you can be sure that once a banking system has reported the data as saved, they stay saved.
- System reliability: the database never breaks. We are not talking only about major failures but also about very minor, short-lived ones; even the small issues that would cause brief outages typically do not happen.
- Security reliability: databases have a long-standing reputation for being secure. You can define fine-grained data security, where single tables, rows and columns are accessible only to the people who need them, and every action can be logged if needed.
- People reliability: the “database administrator” or DBA is a very specific professional figure, and you can find services that provide this resource if you have a shortage of people.
This reliability obviously comes at a cost. There are evident costs in terms of licensing, infrastructure and personnel, but there is also a hidden cost: a database is a “one size fits all” kind of infrastructure. As the old saying goes, “if the only tool you have is a hammer, everything looks like a nail”. The relational nature of databases and the standardization of the SQL language have had a significant positive impact on the business side, but they also come with a business cost.
Typical anti-patterns of relational databases
The ubiquitous presence of the relational database, its availability, and the fact that a lot of developers are comfortable designing a database and working with the SQL language have created a lot of anti-patterns that we’ve seen implemented in various projects.
The database as a queue
It’s very simple to insert data into a database in order to send a message to another component, and so to use the DB as a message queue. The upside is that you don’t have to implement a message queue component, but it creates a lot of issues when messages have to be deleted, because delete operations are typically quite slow.
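As a minimal sketch of this anti-pattern, assume an SQLite database accessed through Python's standard sqlite3 module and a hypothetical table named message_queue (neither comes from a specific project; they are illustrative assumptions). The producer inserts a row, and the consumer reads the oldest row and then deletes it:

```python
import sqlite3


def make_queue(conn):
    # Hypothetical queue table: one row per pending message.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS message_queue ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " payload TEXT NOT NULL)"
    )


def send(conn, payload):
    # "Sending" a message is just an INSERT.
    conn.execute("INSERT INTO message_queue (payload) VALUES (?)", (payload,))
    conn.commit()


def receive(conn):
    # Read the oldest message, then delete it; returns None when the queue is empty.
    row = conn.execute(
        "SELECT id, payload FROM message_queue ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    conn.execute("DELETE FROM message_queue WHERE id = ?", (row[0],))
    conn.commit()
    return row[1]
```

In this sketch every message costs an insert, a select and a delete, and the consumer has to keep polling the table to discover new messages. A sketch like this is also safe for a single consumer only; coordinating several consumers would need locking that is not shown here.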
The database as a document archive
Using the database to store large documents is quite common. The drawback is that once documents are stored in a database, you lose the ability to search and index their content (unless your relational database offers optional indexing capabilities that you can use).
The database to store time series
Some data, like logs or historical data, have a naturally time-driven structure that can easily be implemented in a database schema by adding some date-time columns. The drawback is that once you have millions of rows and want to purge old data, deletion performs poorly. To solve this, database vendors have created specific tools (sometimes requiring an additional license) that partition data by date and avoid the issue.
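The idea behind partitioning by date can be sketched by hand, again assuming SQLite and Python's sqlite3 module. This is not any vendor's partitioning feature: it simply emulates it with one hypothetical table per month (log_YYYY_MM), so that purging old data means dropping whole tables instead of deleting rows one by one.

```python
import sqlite3
from datetime import date


def partition_name(day):
    # Map an ISO date such as "2024-03-15" to a monthly table name.
    d = date.fromisoformat(day)  # also rejects malformed input
    return f"log_{d.year:04d}_{d.month:02d}"


def insert_log(conn, day, message):
    # Create the month's table on first use, then store the row in it.
    table = partition_name(day)
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {table} "
        "(day TEXT NOT NULL, message TEXT NOT NULL)"
    )
    conn.execute(f"INSERT INTO {table} (day, message) VALUES (?, ?)", (day, message))
    conn.commit()


def purge_before(conn, first_day_to_keep):
    # Drop every monthly table older than the month of first_day_to_keep.
    cutoff = partition_name(first_day_to_keep)
    tables = [
        row[0]
        for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    ]
    dropped = sorted(t for t in tables if t.startswith("log_") and t < cutoff)
    for table in dropped:
        conn.execute(f"DROP TABLE {table}")
    conn.commit()
    return dropped
```

The price of the hand-rolled version is that every query spanning several months has to know about the table layout, which is the bookkeeping that the vendors' partitioning tools take care of.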
The multiplicity of SQL dialects
Because SQL is a standard that anyone can extend with new keywords, almost every database vendor has freely added its own to define and manage features that standard SQL does not cover. This means, on the one hand, that the ability to move easily from one database platform to another is only theoretical, because it requires rewriting a lot of SQL code, and, on the other, that additional features (think of geo-spatial data management or graph management) are implemented in very different ways by different databases.
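Even a request as simple as “give me the first N rows” shows the problem. The sketch below builds that query for a few well-known dialects; the function name, the dialect labels and the table and column names in the usage are hypothetical, and identifiers are interpolated directly into the SQL text, so it is meant for illustration with trusted names only.

```python
def first_rows_query(dialect, table, order_by, n):
    # Build a "first n rows" query in the syntax of the given dialect.
    n = int(n)
    if dialect in ("postgresql", "mysql", "sqlite"):
        return f"SELECT * FROM {table} ORDER BY {order_by} LIMIT {n}"
    if dialect == "sqlserver":
        return f"SELECT TOP ({n}) * FROM {table} ORDER BY {order_by}"
    if dialect == "oracle":
        # Row-limiting clause available from Oracle 12c onwards.
        return f"SELECT * FROM {table} ORDER BY {order_by} FETCH FIRST {n} ROWS ONLY"
    raise ValueError(f"unsupported dialect: {dialect}")


if __name__ == "__main__":
    for name in ("postgresql", "sqlserver", "oracle"):
        print(name, "->", first_rows_query(name, "accounts", "opened_on", 10))
```

Three databases, three spellings of the same request; for vendor-specific features such as geo-spatial or graph queries the differences are far larger.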
It's a polyglot world
At the beginning of this century, the research laboratories of Google, Facebook, Amazon and others publicly released a lot of new technologies that started what is called the “NoSQL” movement. The name is read as “Not Only SQL”, and the movement is typically associated with the notion of “big data”. Certainly, one of the main goals of Google or Facebook was to overcome the limitations of databases when dealing with large (very large) amounts of data, without incurring all the costs of a relational database outlined above. However, it would be very short-sighted to see these technologies as useful only for “big data”, because many of them can be used to solve business problems that have nothing to do with large volumes of data.
The new breed of business needs to be polyglot in its databases: to achieve the efficiency that modern business demands, it has to use a best-of-breed solution for every problem. The fact that the aforementioned NoSQL databases were created to sustain the needs of Google or Facebook does not mean that they must be reserved for businesses with very large data volumes; it means that they can be scaled up to extremely large volumes (with hundreds or thousands of nodes, like those deployed at Google), but also scaled down to a limited number of nodes for practical business needs.
Massimo Gentilini
CRIF IT Solutions Area Director