The term BigData has been buzzing across every sector for the past few years. One of the terms often heard alongside BigData is NoSql. NoSql is considered to be the solution to the challenges introduced by BigData.
So what is meant by NoSql..? Actually, nobody is sure what the acronym stands for..! It was an accidental term, coined while searching for a Twitter hashtag with little presence on Google, so it would gather enough attention to host a meet-up about Distributed Database Systems. Most people assume NoSql stands for 'Not only SQL', and some assume 'No more SQL'.. Whatever the case, I would say NoSql is a fancy nickname for Distributed Database Systems: DDS.
Actually, there is a common misconception that NoSql is the database of the future, that it offers numerous benefits over existing RDBMS (Relational Database Management Systems), that it is the recommended database for future applications, etc.. Not at all.. NoSql actually introduces a lot of challenges, compromises, confusion and querying limitations. If you have a choice between an RDBMS and NoSql, it is always recommended to go for the RDBMS, which is more stable and feature-rich, with powerful querying capabilities. So then why are people moving towards NoSql..? You will get the answer shortly..
In every application or use case, the application data always grows. The rate of growth may differ depending on the application's scale, but it does grow; application data never stays the same or shrinks. When the application load reaches a certain threshold and the server can no longer handle it efficiently, the organization scales the server hardware to a higher level. This is what traditionally happened in most types of applications. But over the last decade, with the growth of the WWW (World Wide Web) and the evolution of tiny intelligent devices, more and more data is being generated (refer to my older post: An Era of Data Explosion), and application load started hitting that threshold too quickly and repeatedly.
An organization can choose between two ways of scaling.
1. Scale-up/Vertical Scaling:
Upgrading the server hardware, i.e. adding more resources to it or migrating the application to a bigger box, is referred to as Scaling Up or Vertical Scaling. Vertical Scaling is the initial level of scaling, traditionally used by organizations when the threshold is reached predictably and not too quickly.
2. Scale-out/Horizontal Scaling/Sharding:
Vertical Scaling has a limit, which means we can't upgrade the server hardware beyond a certain level. So the next level of scaling is Horizontal Scaling. In this method, we follow the most famous approach to handling a sizable problem: 'Divide and Conquer'. Instead of storing the data on a single server, the data is divided into several small shards and stored across several individual servers connected to a common network. The process of dividing a massive dataset into several smaller subsets and storing them on multiple servers is called Sharding. The set of individual servers connected to a common network is collectively called a Cluster.
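To make the idea of sharding concrete, here is a minimal sketch in Python. It assumes a simple hash-based partitioning scheme (the function name `shard_for` and the use of MD5 are my own illustrative choices; real systems often use consistent hashing or range-based partitioning instead):

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    """Map a record's key to one of `num_shards` servers.

    Hypothetical sketch: hash the key and take it modulo the
    number of shards, so every record with the same key always
    lands on the same server.
    """
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards

# The same key is always routed to the same shard,
# and different keys spread across the cluster.
print(shard_for("user:1001", 4))
print(shard_for("user:1002", 4))
```

The trade-off of this naive modulo scheme is that changing `num_shards` remaps almost every key, which is why production systems prefer consistent hashing.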
When the data is divided into smaller shards and stored across multiple servers, the immediate risk we need to address is a Single Point of Failure. Yes, if the data is spread across multiple servers and one of those servers goes down for some reason, then we lose access to all the data stored on that server until it heals and comes back up. Hence, in order to tolerate one or a few server failures and keep the system up and running, replicating each shard on more than one server is inevitable.
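A simple way to sketch this replica placement: store each shard on its "home" server plus the next server(s) around a ring of nodes, so a single failure never makes a shard unavailable. Everything here (the node names, `REPLICATION_FACTOR`, the ring-style placement) is an illustrative assumption, not a specific database's algorithm:

```python
REPLICATION_FACTOR = 2  # each shard is stored on 2 servers

def replica_servers(shard_id: int, servers: list[str]) -> list[str]:
    """Place a shard on REPLICATION_FACTOR consecutive servers (ring style).

    If the primary server for a shard fails, a copy still exists
    on the next server in the ring.
    """
    n = len(servers)
    return [servers[(shard_id + i) % n] for i in range(REPLICATION_FACTOR)]

servers = ["node-a", "node-b", "node-c", "node-d"]
# Shard 3's primary is node-d; its replica wraps around to node-a.
print(replica_servers(3, servers))  # ['node-d', 'node-a']
```

With a replication factor of 2, the cluster survives any single server failure; tolerating more simultaneous failures requires a higher replication factor, at the cost of extra storage and write traffic.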
The problems are not solved once replication is in place. Actually, the main class of problems arises after the replication is done. We will look at the problems introduced by data replication in the next post.