NoSQL Cloud Databases guide for Decision Makers

Database Engines

NoSQL engines are built with scalability in mind. The document stores discussed here are simpler than RDBMS engines because they store documents rather than a set of tables with enforced data types, constraints, and relationships. Many NoSQL databases are built on a concept called ‘consistent hashing’, which spreads documents roughly evenly across multiple partitions (servers/nodes). This architecture allows for horizontal partitioning and makes scaling out easier.
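To make the idea concrete, here is a minimal sketch of a consistent-hash ring in Python. It is a toy and not how any particular product implements partitioning: the `HashRing` class, the node names, and the choice of MD5 with 100 virtual nodes per server are all assumptions made for illustration. It shows the property that matters for scaling out: when a node is added, only a fraction of the documents move, and they all move to the new node.

```python
import bisect
import hashlib


def stable_hash(value: str) -> int:
    """Map a string to a large integer that is stable across runs."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes, vnodes=100):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (hash, node) pairs
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        # Each server gets several positions on the ring to smooth the spread.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (stable_hash(f"{node}#{i}"), node))

    def node_for(self, key: str) -> str:
        # Walk clockwise to the first ring position at or after the key's hash.
        h = stable_hash(key)
        idx = bisect.bisect_left(self._ring, (h, ""))
        if idx == len(self._ring):
            idx = 0  # wrap around to the start of the ring
        return self._ring[idx][1]


ring = HashRing(["node-a", "node-b", "node-c"])
doc_ids = [f"doc-{i}" for i in range(10_000)]
before = {d: ring.node_for(d) for d in doc_ids}

ring.add_node("node-d")  # scale out by one node
after = {d: ring.node_for(d) for d in doc_ids}

moved = [d for d in doc_ids if before[d] != after[d]]
print(f"{len(moved)} of {len(doc_ids)} documents moved to a new node")
```

With a plain `hash(key) % number_of_nodes` scheme, by contrast, adding a node would remap most keys, which is why ring-style hashing is a common choice for systems that expect to grow.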

Database design and modeling

When modeling a system, you usually design a set of entities and establish the relationships between them (ER diagrams). This gives a good view of how the entities look in their normalized forms and how data will flow. There will be some lookup tables, system tables, transactional tables, etc. This is the normal process for enterprise line-of-business applications. In the NoSQL world, you should think in terms of documents that are a denormalized version of a set of entities, and about how users will access and store data. In SQL, you could write a ‘join’ query to get all the data, whereas in NoSQL you design documents so that each one contains all the data it needs. This means you may have the same data replicated across collections (a collection is also known as a table or container, depending on the product).
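The difference can be sketched with plain Python dictionaries. The customer, product, and order data below are made up for illustration, and `order_view_with_joins` is a hypothetical helper that mimics what a SQL join does. The denormalized document holds the same information in one place, at the cost of duplicating the customer name and product details.

```python
# Normalized, relational-style data: three "tables" linked by ids.
customers = {1: {"name": "Ada"}}
products = {10: {"title": "Keyboard", "price": 45.0}}
orders = [{"order_id": 100, "customer_id": 1, "product_id": 10, "qty": 2}]


def order_view_with_joins(order_id):
    """Assemble an order the way a SQL join would, by following the ids."""
    order = next(o for o in orders if o["order_id"] == order_id)
    customer = customers[order["customer_id"]]
    product = products[order["product_id"]]
    return {
        "order_id": order["order_id"],
        "customer_name": customer["name"],
        "items": [
            {"title": product["title"], "price": product["price"], "qty": order["qty"]}
        ],
    }


# Denormalized document: everything the order screen needs, stored together.
# The customer name and product details are copied in, so they are duplicated.
order_document = {
    "order_id": 100,
    "customer_name": "Ada",
    "items": [{"title": "Keyboard", "price": 45.0, "qty": 2}],
}

# Both routes produce the same view; the document needs no lookups to get there.
print(order_view_with_joins(100) == order_document)
```

The trade-off is on the write side: if a product title changes, every document that copied it may need to be updated.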

Scaling NoSQL databases

This concept is new for SQL DB users and can be difficult to get used to. With cloud NoSQL solutions, you can scale your instance out or back in, automatically or manually, to save costs. Traditionally, SQL DBAs have not had to modify server hardware. In the new world, you can auto scale and pay less when demand is low.

Software development life cycle

Usually, with an RDBMS, you install databases locally as you start the project or use them in a data center development environment. When you are ready to promote, the database is deployed to higher environments such as QA and eventually to Production.

Deployment of databases

The deployment of SQL databases (schema, tables, stored procedures, constraints, etc.) is a big affair, whether it’s an automated or manual process. Adding a column or modifying a column type can be a deployment nightmare. For the most part, this work is not required with NoSQL, where there is little to no schema deployment work.
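As a rough illustration of why a new field needs no schema deployment in a document store, the sketch below uses a plain Python list as a stand-in for a collection. The `preferred_language` field and its default of `"en"` are assumptions made for the example. Note that the work does not disappear entirely: the application code has to tolerate older documents that lack the field.

```python
# Documents written before the new field existed.
users = [
    {"id": "u1", "name": "Ada"},
    {"id": "u2", "name": "Linus"},
]

# A newer release starts writing an extra field; no schema change is deployed.
users.append({"id": "u3", "name": "Grace", "preferred_language": "fr"})


def preferred_language(user: dict) -> str:
    """Read the new field, falling back to a default for older documents."""
    return user.get("preferred_language", "en")


languages = {u["id"]: preferred_language(u) for u in users}
print(languages)
```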

High availability

With SQL-based databases, setting up high availability (read replicas/clusters) requires a huge amount of infrastructure and configuration work. With cloud NoSQL, this is a relatively easy job.

Scalability

With SQL, you would typically scale up by adding resources to your servers or upgrading server hardware. You bear the cost even if the hardware is underutilized. There are some limited, manual sharding options for scaling out with SQL. With NoSQL, you can scale out by adding more nodes dynamically.

Security

You can secure your cloud NoSQL databases just as you would your SQL databases, with authentication, encryption, IP whitelisting/blacklisting, etc., depending on the features your provider offers.

Backup and Administration

NoSQL administration is relatively easy because of the simple nature of the document store.

Using database tools to access/query data directly

With either SQL or NoSQL, you need a tool to view and query the database. With cloud NoSQL, this is relatively easy because you can usually access the data from a browser.

Programmatic Access — Libraries and SDKs

NoSQL databases provide SDKs for CRUD, filter, and batch operations. Getting results with an SDK is comparable to using ORMs such as Entity Framework or Hibernate.

ACID Transactions

With SQL, you get ACID transactions, data integrity, and durability. With NoSQL, we trade some of these SQL properties for simplicity, scalability, and high availability. Say we had to replicate data to the other side of the world (e.g. real-time replication between the Americas and Asia). It takes time to replicate that data and mark it committed everywhere, but you as a user don’t have to wait for it. This is called eventual consistency (as opposed to strong consistency). Note that some NoSQL databases are fully transactional. However, this may cost more, and with multi-region replication it comes with the drawback of higher latency. Please visit the links below if you are interested in more details on transactions and global replication.

From AWS docs: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/transaction-apis.html#transaction-integration
From Azure docs: https://docs.microsoft.com/en-us/azure/cosmos-db/consistency-levels
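The behavior described above can be sketched with a toy in-memory model. The `Region` and `ReplicatedStore` classes are hypothetical and stand in for no real product; replication is triggered by hand here, where a real system would do it asynchronously in the background. The point is only that a read in the remote region can briefly return older data after the write has already succeeded in the primary region.

```python
from collections import deque


class Region:
    """A toy key-value store standing in for one region's copy of the data."""

    def __init__(self, name):
        self.name = name
        self.data = {}


class ReplicatedStore:
    """Writes commit in the primary region; replicas catch up later."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = replicas
        self.pending = deque()  # writes not yet shipped to the replicas

    def write(self, key, value):
        # The caller gets control back as soon as the primary has the write.
        self.primary.data[key] = value
        self.pending.append((key, value))

    def replicate(self):
        # Stand-in for the asynchronous, cross-region replication step.
        while self.pending:
            key, value = self.pending.popleft()
            for replica in self.replicas:
                replica.data[key] = value


americas = Region("americas")
asia = Region("asia")
store = ReplicatedStore(primary=americas, replicas=[asia])

store.write("order-100", "shipped")
stale_read = asia.data.get("order-100")  # replica has not caught up yet
store.replicate()
fresh_read = asia.data.get("order-100")  # replica has converged

print(stale_read, fresh_read)
```

A strongly consistent design would make `write` wait until every replica had acknowledged the change, which is where the extra latency comes from.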

Partitions/Sharding patterns

The partition key plays a very important role in the NoSQL world and should be designed carefully. As we scale out, it is important to distribute data evenly across the partitions (servers/nodes). An obvious basic choice of partition key is your primary key. Other keys are possible, such as date ranges for time-series data, or country/state for geographical distribution.
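The effect of the key choice can be seen in a small simulation. The workload below is invented (10,000 orders, 90% of them from one country), and `partition_for` uses simple hash-modulo placement over four partitions as an assumption, which is cruder than what real engines do. Even so, it shows how a low-cardinality key such as country piles most of the data onto one partition, while a unique id spreads it out.

```python
import hashlib
from collections import Counter

PARTITIONS = 4


def partition_for(key: str) -> int:
    """Place a key on one of PARTITIONS partitions using a stable hash."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % PARTITIONS


# Hypothetical workload: 10,000 orders, 90% of them from one country.
orders = [
    {"order_id": f"order-{i}", "country": "US" if i % 10 else "CA"}
    for i in range(10_000)
]

by_order_id = Counter(partition_for(o["order_id"]) for o in orders)
by_country = Counter(partition_for(o["country"]) for o in orders)

print("partitioned by order_id:", sorted(by_order_id.values()))
print("partitioned by country: ", sorted(by_country.values()))
```

A skewed key is not always wrong. If most queries are scoped to one country, keeping that data together may be worth the imbalance, so the access pattern should drive the choice.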

Indexes

On top of your partition key, you can add one or more indexes based on your requirements.
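Conceptually, a secondary index is a lookup table from a field value to the keys of the documents that contain it. The sketch below builds one by hand over made-up user documents; `build_index` is a hypothetical helper, and managed databases maintain such indexes for you rather than exposing them like this.

```python
from collections import defaultdict

# Documents keyed by their primary (partition) key.
documents = {
    "u1": {"name": "Ada", "country": "UK"},
    "u2": {"name": "Linus", "country": "FI"},
    "u3": {"name": "Alan", "country": "UK"},
}


def build_index(docs: dict, field: str) -> dict:
    """Map each value of `field` to the keys of the documents that hold it."""
    index = defaultdict(list)
    for key, doc in docs.items():
        index[doc[field]].append(key)
    return index


country_index = build_index(documents, "country")

# Without the index: scan every document.
scan_result = [k for k, d in documents.items() if d["country"] == "UK"]
# With the index: a single lookup.
index_result = country_index["UK"]

print(scan_result, index_result)
```

The index speeds up reads on that field, but it has to be updated on every write, which is why indexes are added based on actual query requirements.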

Next steps and other considerations

Based on the list above, you can weigh which items are important for you, your team, the product, and the company.

Conclusion

I suggest opening up to NoSQL. It’s certainly something to consider going forward in the cloud world. There are open issues, such as data integrity, but these can be addressed with client SDKs.

Manish Jain

Cloud Architect and Software Engineer