With the advent of noSQL databases, IT systems have a new mechanism to persist data. So the obvious question arises. How does you compare RDBMS to noSQL databases? Well here are some parameters for comparison.
RDBMS are designed to handle large datasets while noSQL databases handle huge datasets. Now there is no defined value to demarcate what constitues large and what can be considered as huge. However one can safely say that a large social media type web site like Google or Facebook will use a noSQL database given the quantum of data that it needs to handle. Also NoSQL databases offer better scalability vis-a-vis RDBMS.
Data Consistency Model
RDBMS use Atomicity, Consistency, Isolation and Durability (ACID) model, while noSQL databases use Basic Availability. Soft State and Eventual Consistency (BASE). The ACID model ensures that all database clients are provided a consistent view of the data. Meaning that all committed data is available to all clients. In case of BASE model, the data view is subject to the client consistency standard adopted by the noSQL database. Where consistency is paramout like stock trading, core banking applications we will opt for RDBMS. In case of a social media web site on the other hand the consistency model can be relaxed. Here we choose noSQL databases. The varied client consistency view is lucidly explained here.
Brewer stated that a distributed computer system cannot simultaneously provide three attributes namely consistency, availability and partition tolerance. According to him, a system can provide at the most two of the three attributes. RDBMSs are CA systems while noSQL databases can be either CP or AP systems. Example of CP systems are Google BigTable, MongoDB, Hbase. Example of AP systems are Amazon Dynamo, SimpleDB, CouchDB. For more details visit this excellent reference.
RDBMS have a standardized data model consisting of rows and columns. NoSQL databases have variety of data models like:
- Key Value (DynamoDB, Terrastore, GenieDB)
- Column-oriented,(Cassandra, HBase, BigTable)
- Document-oriented (MongoDB, CouchDB)
In RDBMS one can ensure data integrity by using foreign keys, constraints like unique, not null, check etc. Broadly constraints ensure that only logical data reside within the database. NoSQL databases does not have the equivalent of constraints due the nature of their design. It is primarily the responsiblility of the application encapsulating the noSQL database to ensure data integrity.
RDBMS supports the standardized SQL as its querying language. A key feature of RDBMS is support for JOINs. Implementing JOINs in noSQL is not possible/not favored as a solution. NoSQL supports the following broad categories of querying
- Programming language specific API (Java, Python, .NET, PHP etc)
- Proprietary Language (e.g. CQL in Cassandra)
- MapReduce API
- Object graph navigation API
Mapping of querying capabilities to some sample noSQL Database types
Key Value: Programming langauge specific APIs For example Amazon DynamoDB supports Java, .NET and PHP APIs.
Column Oriented: Cassandra has CQL. HBase has Java APIs.
Document Store: MongoDB has its own document-based Query langauge. MongoDB supports language specific APIs like Python, Java, Ruby, Perl. Terrastore supports HTTP and Java APIs.
NoSQL databases provide their own properiatary APIs to retrieve information. For example, HBase provides Java APIs namely Get, Put, Scans and Delete. Note in noSQL databases all operations are based on the noSQL database’s key column. This is unlike RDBMS where a developer is free to query any columns of any table. When designing a noSQL database, the key column is significant as it will serve as the focal point for all queries. There are methodologies to replicate the behavior of a JOIN. Refer the following blogpost for details on implementing the same in Casssandra. Some noSQL supports SQL like query langauge. For example Cassandra has CQL; Cassandra Query Language.
RDBMS have fixed schema. We need to define the database schema upfront before using it. noSQL databases have no such restrictions. This is an important feature. In social web sites, the need for a new column/table arrives at run time also the data structure can vary widely between two similar entities.
That’s all the points I could think of at the moment.