This guide is geared towards experienced professionals who build enterprise scalable systems with relational databases such as Oracle, SQL Server, MySQL, ProgresSQL, etc. Now they might need to use NoSQL databases for their next project or migrate the current DB to a NoSQL, given the right use case. This could be daunting as NoSQL will challenge some fundamental principles such as Normalization.
NoSQL databases have evolved over the last decade. NoSQL can now support ACID transactions also replicate data globally providing high availability and auto-failover — all of which you can setup with a few clicks! The question is when to use NoSQL and when to stick to good old SQL databases. As always, the answer is it depends. This article will introduce you to some key concepts to guide you through that thought process.
To start with, here’s a comparison table — it has a lot so take it with a grain of salt.
NoSQL engines are fundamentally built with scalability in mind. They are much simpler than RDBMS engines as they are document stores as opposed to a set of tables with all the datatypes enforcements, constraint requirements and storing relationships. NoSQL DB is built up on a concept called ‘consistent hashing’, which means all documents are equally distributed across multiple partitions (servers/nodes). This architecture facilitates scaling out and allows for horizontal partitioning.
High availability and scalability are prioritized over strong consistency in the NoSQL world (CAP Theorem).
In a nutshell, you are dealing with a much simpler core DB engine.
Database design and modeling
When modeling a system, usually a set of entities is designed and an entity relationship is established (ER diagrams). This gives us a good view of how entities will look in their normalized forms and how data will flow. There will be some lookup tables, system tables, transactional tables, etc. This is the normal process for the enterprise line of business applications. In the NoSQL world, you should think in terms of documents that will be a denormalized version of a set of entities, and about how users will access and store data. In SQL, you could write a ‘join’ sql query to get all the data whereas in NoSQL you will design documents in such a way that they will contain all the required data. This means you may have replicated data across collections (also known as a table, collection or container).
In some cases, you may have systems that do not require a huge number of tables or relationships. For example Logging, or an IoT app will take a stream of a device. NoSQL is ideal for these scenarios.
For relational data design, take the example of an order processing system, we could design in NoSQL with multiple collections. A customer collection can be created to get customers data. An order collection can be created to get orders for that user. For relationships and dependencies, you may put Users in Order and/or Orders in Users based on the requirements. The usage of data is driving the storage format. It may even store duplicate data. Storage is relatively cheap compared to expensive queries for data retrieval because it may use more CPU and memory. The number of tables in SQL will not be equal to the number of collections in NoSQL.
Take another example: a blogging website. Let’s say there are two screens, for anonymous user, the home screen will show the top ten popular blogs whereas a logged in user will see their blogs. In this case, we could have two collections: Users and Blogs that both will store blogs. One will partition by UserId while other will partition by BlogId. Again, this is one possibility.
Two takeaways to keep in mind: 1) You could model relational data with NoSQL. 2) Design documents as per the read and write model requirements.
Scaling NoSQL databases
This concept is new for SQL DB users and could be difficult to work with. With Cloud NoSQL solutions, you could auto or manual scale out or scale back your instance for cost-saving purposes. Traditionally, SQL DBAs don’t have to modify server hardware. In the new world, you could auto scale and save money.
Software development life cycle
Usually, with RDBMS, you would install databases locally as you start the project or use them in a data center development environment. When you ready to promote, it will be deployed to higher environments like QA and eventually to Production.
In the NoSQL world, you will rarely install the database locally (I have done it once, install MongoDB locally). But for the most part, you would access the database from the cloud using either a browser or any other client tools.
Deployment of databases
The deployment of SQL databases (schema, tables, stored procedures, constraints, etc) is a big affair whether it’s an automated or manual process. Adding a column or modifying a column type can be a deployment nightmare. For the most part, this item is not required with NoSQL. There will be no minimum deployment work with NoSQL databases.
With SQL based databases, implementing and setting up high availability (read replicas/clusters) is requires a huge amount of infrastructure and configuration work. With NoSQL, this will be a relatively easy job.
With SQL, you would typically scale up by adding resources to your servers or upgrading server hardware. You bear the cost no matter if the hardware is underutilized. There are some limited manual sharding scale out options available with SQL. With NoSQL you can scale out, adding more nodes dynamically.
You will be able to secure your cloud databases just like your SQL databases with authentication, encryption, IP whitelist/blacklisting, etc depending on the provider’s feature availability.
Backup and Administration
NoSQL administration is relatively easy because of the simple nature of the document store.
Using database tools to access/query data directly
With SQL or NoSQL, you need to install a tool to view and query the database. With NoSQL, this is relatively easy as you are accessing data from a browser.
Programmatic Access — Libraries and SDKs
NoSQL databases provide SDKs for CRUD, filter, and batch operations. Getting results with SDK is very comparable to using ORMs like Entity Framework or Hibernate.
With SQL, you get ACID transactions, data integrity, and durability. With NoSQL, we trade all fine SQL properties for their simplicity, scalability, and high availability. Let’s say we had to replicate data at the other end of the world (i.e. real-time data replication happening from the Americas or Asia), there is a latency to replicate that data and mark it committed but you as a user don’t have to wait for it; therefore, it’s called eventual consistency (as opposed to strong consistency). Note, there are some NoSQL databases that are fully transactional. However, it may cost you more and there is a drawback of high latency in the case of multi-region database replication. Please visit the links below if you are interested in more details on global replication.
Partition key plays a very important role in the NoSQL world and should be carefully designed. As we scale-out, it is important to evenly distribute data across the partitions (servers/nodes). An obvious basic partition key is your primary key. There can be other keys such as date ranges, time-series data, or country/state for geographical distribution.
On the top of your partition key, you can add one or more indexes based on your requirements.
Next steps and other considerations
Based on the list above you can weigh what items are important for you, your team, the product, and the company.
My first suggestion for you is to try NoSQL databases if you haven’t. If you are an expert in SQL then get a taste of NoSQL. Explore all the features before you reject it.
Consider whether you are trying to implement for a new application or adding features to an existing one. For bigger systems, you can take a hybrid approach and have a combination of both.
Are you required to use ACID transactions and data integrity?
Considering how SQL can support NoSQL (with JSON columns), this means more branding, preference, engineering team skills set, and preferred cloud vendor supported databases.
I suggest opening up to the NoSQL. It’s certainly something to consider going forward in the cloud world. There are issues such as data integrity that can be achieved with client SDKs.
NoSQL may not be the answer for the small apps that do not require scale or high availability. But for the right use cases, consider using NoSQL.