Lately, I have come across a lot of discussion about NoSQL's slow enterprise adoption. This prompted me to blog about my experience trying to add NoSQL to our enterprise's toolset.
We are working on a program to modernize our systems and applications using Web 2.0/RESTful services and, in the process, build a highly automated agile development platform.
As part of this modernization, we came across certain use cases that required extensible schema support. The rigid schema of our legacy systems led to highly complex solutions for handling these use cases.
The use cases we came across were a shopping cart with extensible line items and a highly customizable repository of customer metadata. The customization of the metadata covered the number of fields, the definition of each field, and the data with which the fields can be associated. This customer metadata repository is a large data set (2.3 billion records) spread across a large set of vertically partitioned databases (80+). Our end users can also search on these metadata fields through our Web 2.0/RESTful application. We are using Apache Solr 3 (migrating from Oracle Text) for our text search solution.
In the legacy system, schema flexibility was achieved by adding a few spare fields to various tables or by using a key-value schema design. Neither approach is elegant. We also considered storing the metadata as a blob containing the customizable fields and their definitions, which, again, is not an optimal use of an RDBMS.
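To make the key-value workaround concrete, here is a minimal sketch in Python using the standard library's SQLite driver. The table name, column names and sample values are hypothetical and are not taken from our actual systems; the point is only to show why this design gets awkward once you query on more than one field.

```python
import sqlite3

# In-memory database; table and field names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer_metadata (
        record_id   INTEGER NOT NULL,
        field_name  TEXT    NOT NULL,
        field_value TEXT,
        PRIMARY KEY (record_id, field_name)
    );
""")

rows = [
    (1, "region", "EMEA"),
    (1, "priority", "high"),
    (2, "region", "APAC"),  # record 2 simply has no "priority" row
]
conn.executemany("INSERT INTO customer_metadata VALUES (?, ?, ?)", rows)


def load_record(record_id):
    """Reassemble one logical record from its key-value rows."""
    cursor = conn.execute(
        "SELECT field_name, field_value FROM customer_metadata "
        "WHERE record_id = ? ORDER BY field_name",
        (record_id,),
    )
    return dict(cursor.fetchall())


def find_records(field_a, value_a, field_b, value_b):
    """Filter on two custom fields; each extra field needs another self-join."""
    cursor = conn.execute(
        "SELECT a.record_id FROM customer_metadata a "
        "JOIN customer_metadata b ON a.record_id = b.record_id "
        "WHERE a.field_name = ? AND a.field_value = ? "
        "AND b.field_name = ? AND b.field_value = ? "
        "ORDER BY a.record_id",
        (field_a, value_a, field_b, value_b),
    )
    return [row[0] for row in cursor]
```

Adding a new custom field costs nothing here, but every value is stored as text and every additional search criterion adds a self-join, which is the inelegance referred to above.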
This need for schema flexibility led us to NoSQL.
We are indexing all of the metadata in Apache Solr. Solr was giving us high-performance, near-real-time search while also providing a consolidated (across 80+ databases) and denormalized (document-oriented) view of the data. So my first instinct was to consider Solr as the NoSQL solution. However, we are still in the middle of the migration from Oracle, and that option looks premature.
Moving on to the real NoSQL products (there are so many of them), I considered Apache Cassandra, which seemed the most mature and widely adopted of the NoSQL solutions. At the time, I did not find an HTTP interface for it, and its Java API did not look all that easy to get started with. Moreover, Apache Cassandra came across as a Big Data solution for high-volume, high-velocity data, geared more toward social-media-type scenarios. Ours was more of a denormalization and unstructured data storage problem.
Next in line was Apache CouchDB. It had an HTTP/JSON interface and was easy to get started with. Implementing a PoC with CouchDB was quite easy: I already had REST services with data as JSON generated from JAXB. For persistence, I just did an HTTP POST with the JSON document and a unique row ID. There was no database schema design, no ORM layer, no JPA, no transaction handling, nor any of the other persistence machinery needed for an RDBMS.
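The persistence step looked roughly like the sketch below. It is written in Python for brevity rather than the Java/JAXB stack the PoC actually used, and the server URL, database name, document ID and cart fields are all assumptions for illustration. It relies on CouchDB accepting a POST of a JSON document to the database URL, with the `_id` field in the body supplying the document ID.

```python
import json
import urllib.request

COUCHDB_URL = "http://localhost:5984"  # assumed local CouchDB instance
DATABASE = "carts"                     # hypothetical database name


def build_save_request(doc_id, document):
    """Build (but do not send) the HTTP POST that stores one document."""
    body = dict(document, _id=doc_id)  # the unique row ID becomes CouchDB's _id
    return urllib.request.Request(
        url=f"{COUCHDB_URL}/{DATABASE}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# A cart whose line items carry different, item-specific fields.
cart = {
    "customer": "C-42",
    "lines": [
        {"sku": "BOOK-1", "quantity": 2},
        {"sku": "SHIRT-7", "quantity": 1, "size": "M", "engraving": None},
    ],
}

request = build_save_request("cart-1001", cart)

# Sending requires a running CouchDB, an existing database and, on recent
# versions, credentials:
# with urllib.request.urlopen(request) as response:
#     print(response.status, response.read().decode("utf-8"))
```

The second line item has fields the first one lacks, and nothing had to be declared anywhere for that to be stored.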
A flexible schema and an HTTP interface in themselves seemed a big reason to use NoSQL; in addition, there was all the goodness of scalability and replication. This got me thinking about the price that applications have been paying for a rigid schema and a JDBC/ODBC driver-based database interface.
The data access layer is usually the most complex layer of any application (rather than the business logic layer), mostly due to the complexities of ORM, transaction and concurrency management, and so on. Also, to get any meaningful data from a normalized schema, an application needs to perform joins, which adds to the complexity. For example, an Order Line on its own carries no meaning unless it is joined with its Order Header.
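The Order Header/Order Line point can be illustrated with a small sketch, again in Python with SQLite. The tables, columns and sample order are hypothetical. The relational version needs a join plus reassembly code to produce what a document store would hand back as a single document.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE order_header (
        order_id   INTEGER PRIMARY KEY,
        customer   TEXT,
        order_date TEXT
    );
    CREATE TABLE order_line (
        order_id INTEGER REFERENCES order_header(order_id),
        line_no  INTEGER,
        sku      TEXT,
        quantity INTEGER,
        PRIMARY KEY (order_id, line_no)
    );
    INSERT INTO order_header VALUES (1, 'C-42', '2012-08-01');
    INSERT INTO order_line VALUES (1, 1, 'BOOK-1', 2);
    INSERT INTO order_line VALUES (1, 2, 'SHIRT-7', 1);
""")


def load_order_relational(order_id):
    """Join header and lines, then fold the flat rows back into one order."""
    rows = conn.execute(
        "SELECT h.customer, h.order_date, l.line_no, l.sku, l.quantity "
        "FROM order_header h JOIN order_line l ON l.order_id = h.order_id "
        "WHERE h.order_id = ? ORDER BY l.line_no",
        (order_id,),
    ).fetchall()
    if not rows:
        return None  # unknown order (or, with this inner join, one with no lines)
    return {
        "order_id": order_id,
        "customer": rows[0][0],
        "order_date": rows[0][1],
        "lines": [
            {"line_no": r[2], "sku": r[3], "quantity": r[4]} for r in rows
        ],
    }


# The same order as a denormalized document: stored and fetched as one unit.
order_document = {
    "order_id": 1,
    "customer": "C-42",
    "order_date": "2012-08-01",
    "lines": [
        {"line_no": 1, "sku": "BOOK-1", "quantity": 2},
        {"line_no": 2, "sku": "SHIRT-7", "quantity": 1},
    ],
}
```

Both paths end at the same structure; the difference is how much data-access code sits between the application and its data.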
I think normalization and schema design should be an optimization step and not an upfront design activity. In the initial stages of application development, one should start with a NoSQL database; once all the data storage and retrieval needs of the application are built, one can design an optimized schema.
Another aspect of schema design is that in modern architectures, system data is not directly exposed to external entities. It is abstracted by a services or data access layer that centralizes enterprise data access. In legacy systems, enterprises tried to build a centralized enterprise database with a standard schema that provided a consistent data format for all enterprise applications. Because various systems and applications accessed the data directly, there was a need for a rigid schema.
I think enterprises will be more interested in NoSQL if it is presented as a low-cost, rapid application development platform that provides business agility and cost savings due to reduced application complexity.
In my case, even though management realized the value of NoSQL, at the time we did not go ahead with it, for the various reasons discussed in this discussion thread.
To name a few: at the time, NoSQL solutions did not have the maturity and enterprise readiness (support, management, monitoring, etc.).
The overall organizational impact of having a NoSQL platform was not well understood, i.e., the change in skill sets and the changes in responsibility of organizational units (application developers, middleware admins, database admins/data modelers).
There was some concern about having an open source product in the most secured enterprise layer; most enterprises are still slow to adopt open source.
There was also a lack of success stories of enterprises using NoSQL; the few that existed were mostly in the context of Big Data and social-media-type problems.
Since then, NoSQL technology has progressed and matured a lot, and some of the above concerns have now been addressed.
NoSQL is an impressive technology and can
provide a lot of value to enterprises; it just needs a Don Draper to make that push.
Great post Romi, I think the enterprise will adopt this tech basically because they will be able to create applications faster than anything else, and they will deliver full products in no time. I want to share two posts with your users with a deeper explanation about the "Order Header/Order Line" problem solved from the NoSQL perspective: http://crosstantine.blogspot.com/2012/08/nosql-or-rdbms-again.html and here's a "generic" explanation about the master/detail problem compared between NoSQL and RDBMS: http://djondb.com/blog/?p=4 I hope you like it.
Thanks for sharing your posts; they provide real-world, use-case-based NoSQL examples in comparison to RDBMS. This is the type of information I was looking for when I was exploring NoSQL technology. I am sure a lot of people will benefit from them.
Thanks for the perspective, Romi. Great post. I've been going back and forth with a few others who are also in an enterprise situation and would like to adopt NoSQL. They are looking at NoSQL to solve the sharding/partitioning problem (instead of 80+ databases, they'd like a single database that can naturally distribute across multiple hosts). For those people, JPA and Spring Data might provide a nice transition. Since they are familiar interfaces to the enterprise for data access, you can "swap out" RDBMS for NoSQL storage while keeping the same interfaces. I've started work on spring-data-cassandra and spring-data-jpa-cassandra. Hopefully, they'll be available sometime soon. (https://github.com/boneill42/spring-data-cassandra & https://github.com/boneill42/spring-data-jpa-cassandra)
Thanks for sharing these solutions; they will enable an easy transition from relational platforms and help with early adoption. However, I wonder whether, after the transition, JPA will make it difficult for applications to leverage the dynamic/flexible schema of NoSQL.
Romi, good write-up! Did you ever consider Couchbase in addition to CouchDB? I'm at Couchbase and just curious if we ever came up in your research and if we did, why you looked elsewhere. Keep up the writing!
Baxter