Sep 11, 2011

NoSQL - Rapid Application Development Platform




Lately, I have come across a lot of discussion about NoSQL's slow enterprise adoption. This prompted me to blog about my experience trying to add NoSQL to our enterprise’s toolset.

We are working on a program to modernize our systems/applications using Web 2.0/RESTful services and, in the process, build a highly automated agile development platform.

As part of this modernization, we came across certain use cases that required extensible schema support. The rigid schema of our legacy systems led to highly complex solutions to handle these use cases.


The use cases we came across were a shopping cart with extensible line items and a highly customizable customer metadata repository. The customization of the metadata ranged from the number of fields and the definition of fields to the data with which the fields can be associated. This customer metadata repository is a large data set (2.3 billion records) spread across a large set of vertically partitioned databases (80+). Our end users can also search on these metadata fields through our Web 2.0/RESTful application. We are using Apache Solr 3 (migrating from Oracle Text) for our text search solution.

In the legacy system, schema flexibility was achieved by adding a few spare fields to various tables or by using a key-value schema design. Neither approach is elegant. We also considered storing the metadata as a blob with customizable fields and their definitions, which, again, wasn’t an optimal use of an RDBMS.
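To make the key-value approach concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (customer_metadata, record_id, field_name, field_value) and the sample data are hypothetical and are not taken from our actual schema. It shows why the design feels clumsy: every logical record has to be reassembled from several rows, and searching on N fields takes N copies of the same table.

```python
import sqlite3

# Hypothetical key-value table; all names and data are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer_metadata ("
    " record_id INTEGER NOT NULL,"
    " field_name TEXT NOT NULL,"
    " field_value TEXT,"
    " PRIMARY KEY (record_id, field_name))"
)
conn.executemany(
    "INSERT INTO customer_metadata VALUES (?, ?, ?)",
    [
        (1, "region", "EMEA"),
        (1, "priority", "high"),
        (2, "region", "APAC"),  # record 2 simply has no 'priority' row
    ],
)


def load_record(connection, record_id):
    """Reassemble one logical record from its key-value rows."""
    cursor = connection.execute(
        "SELECT field_name, field_value FROM customer_metadata"
        " WHERE record_id = ?",
        (record_id,),
    )
    return dict(cursor.fetchall())


def find_records(connection, criteria):
    """Return ids of records matching every field in criteria.

    Each extra search field needs one more self-join on the same table.
    """
    items = sorted(criteria.items())
    if not items:
        raise ValueError("criteria must not be empty")
    sql = "SELECT DISTINCT m0.record_id FROM customer_metadata m0"
    for i in range(1, len(items)):
        sql += f" JOIN customer_metadata m{i} ON m{i}.record_id = m0.record_id"
    sql += " WHERE " + " AND ".join(
        f"m{i}.field_name = ? AND m{i}.field_value = ?"
        for i in range(len(items))
    )
    sql += " ORDER BY m0.record_id"
    params = [part for pair in items for part in pair]
    return [row[0] for row in connection.execute(sql, params)]
```

Adding a new field needs no schema change, which is the attraction, but the query for two fields already joins the table to itself, and the database can no longer enforce types or constraints per field.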


This need for schema flexibility led us to NoSQL.


We are indexing all of the metadata in Apache Solr. Solr gives us high-performance, near-real-time search while providing a consolidated (across 80+ databases) and denormalized (document-oriented) view of the data. So my first instinct was to consider Solr as the NoSQL solution. However, we are still in the middle of the migration from Oracle, and that option looks premature.

Moving on to the real NoSQL products (there are so many of them), I considered Apache Cassandra, which seemed the most mature and widely adopted of the NoSQL solutions. At the time, I did not find any HTTP interface for it, and its Java API did not look easy to get started with. Moreover, Cassandra came across as a Big Data solution for high-volume, high-velocity data, geared more toward social media type scenarios. Our problem was more one of denormalization and unstructured data storage.

Next in line was Apache CouchDB. It had an HTTP/JSON interface and was easy to get started with. Implementing a PoC with CouchDB was quite easy: I already had REST services with data as JSON generated from JAXB. For persistence, I just did an HTTP POST with the JSON document and a unique row id. There was no database schema design, no ORM layer, no JPA, no transaction handling, nor any of the other persistence machinery needed for an RDBMS.
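For a sense of how little code that persistence step takes, here is a rough sketch in Python (the PoC itself was Java). The server URL, the database name "carts", the document id, and the cart fields are all assumptions made up for this example. The code only builds the request, so it can be read and run without a CouchDB server.

```python
import json
import urllib.request

# Assumptions: a local CouchDB server and a database named "carts".
COUCHDB_URL = "http://localhost:5984"
DATABASE = "carts"


def build_save_request(doc_id, document):
    """Build (but do not send) the HTTP POST that stores one JSON document.

    CouchDB uses the "_id" field of a posted document as its unique id.
    """
    body = dict(document, _id=doc_id)
    return urllib.request.Request(
        url=f"{COUCHDB_URL}/{DATABASE}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical shopping cart; the second line item carries an extra field,
# which needs no schema change.
cart = {
    "customer": "c-1001",
    "lineItems": [
        {"sku": "A-1", "qty": 2},
        {"sku": "B-7", "qty": 1, "giftWrap": True},
    ],
}
request = build_save_request("cart-0001", cart)

# To send it against a running server:
#     with urllib.request.urlopen(request) as response:
#         print(response.status, response.read())
```

An HTTP PUT to the document's own URL (database name followed by the document id) is the other common way to store a document under a chosen id.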

A flexible schema and an HTTP interface in themselves seemed a big reason to use NoSQL; in addition, there was all the goodness of scalability and replication. This got me thinking about the price that applications have been paying for a rigid schema and a JDBC/ODBC driver-based database interface.

The data access layer is usually the most complex layer of an application (rather than the business logic layer), mostly due to the complexities of ORM, transaction/concurrency management, etc. Also, to get any meaningful data from a normalized schema, an application needs to perform joins, which adds to the complexity. For example, an Order Line has no meaning by itself until it is joined with its Order Header.
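The Order Header / Order Line point can be illustrated with a small sketch using Python's built-in sqlite3 module. The table names, columns, and sample order are hypothetical. The normalized form needs a join before a line means anything, while the document form carries its lines with it.

```python
import sqlite3

# Hypothetical normalized schema: header and lines live in separate tables.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE order_header (
        order_id INTEGER PRIMARY KEY, customer TEXT, order_date TEXT
    );
    CREATE TABLE order_line (
        order_id INTEGER REFERENCES order_header(order_id),
        line_no INTEGER, sku TEXT, qty INTEGER
    );
    INSERT INTO order_header VALUES (1, 'c-1001', '2011-09-01');
    INSERT INTO order_line VALUES (1, 1, 'A-1', 2);
    INSERT INTO order_line VALUES (1, 2, 'B-7', 1);
    """
)

# Normalized: a line only tells us who ordered what after the join.
joined = conn.execute(
    "SELECT h.order_id, h.customer, l.line_no, l.sku, l.qty"
    " FROM order_line l JOIN order_header h ON h.order_id = l.order_id"
    " ORDER BY l.line_no"
).fetchall()


def to_document(rows):
    """Fold the joined rows of a single order into one self-contained document."""
    order_id, customer = rows[0][0], rows[0][1]
    return {
        "order_id": order_id,
        "customer": customer,
        "lines": [
            {"line_no": line_no, "sku": sku, "qty": qty}
            for _, _, line_no, sku, qty in rows
        ],
    }


# Denormalized: the whole order is one document, readable without a join.
order_document = to_document(joined)
```

In a document store the application would save and load order_document as is, which is where much of the data access layer's complexity goes away.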

I think normalization and schema design should be an optimization step and not an upfront design activity. In the initial stages of application development, one should start with a NoSQL database; once all the data storage and retrieval needs of the application are built, one can design an optimized schema.


Another aspect of schema design is that in modern architectures, system data is not directly exposed to external entities. It is abstracted by a services or data-access layer that centralizes enterprise data access. In legacy systems, enterprises were trying to build a centralized enterprise database with a standard schema that provided a consistent data format for all enterprise applications. Because various systems and applications were accessing the data directly, there was a need for a rigid schema.


I think enterprises will be more interested in NoSQL if it is presented as a low-cost, rapid application development platform that provides business agility and cost savings through reduced application complexity.

In my case, even though management recognized the value of NoSQL, we did not go ahead with it at the time, for the various reasons raised in this discussion thread.

To name a few: at the time, NoSQL solutions lacked maturity and enterprise readiness (support, management, monitoring, etc.).
The overall organizational impact of having a NoSQL platform was not well understood, i.e. the change in skill sets and in the responsibilities of organizational units (application developers, middleware admins, database admins/data modelers).
There was some concern about having an open source product in the most secured enterprise layer; most enterprises are still slow to adopt open source.
There was also a lack of success stories of enterprises using NoSQL; the few that existed were mostly in the context of Big Data and social media type problems.
  
Since then, NoSQL technology has progressed and matured a lot, and some of the above concerns have now been addressed.

NoSQL is an impressive technology and can provide a lot of value to enterprises; it just needs a Don Draper to make that push.