We are designing a large-scale, distributed, event-driven system
for real-time data replication
across transactional databases. The data (messages) from the source system
undergoes a series of transformations and routing logic before reaching
its destination. These transformations are multi-process,
multi-threaded operations composed of smaller stateless steps and tasks that
can be performed concurrently. There is no shared state across processes;
instead, the state transformations are persisted in the database, and each
process pulls its work queue directly from the database.
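
To make the shape of the problem concrete, here is a minimal sketch in plain Java, under loud assumptions: an in-memory queue stands in for the database work queue, and the two string steps (`TRIM`, `UPPER`) are hypothetical placeholders for the real transformations. It only illustrates stateless steps being applied concurrently by workers that pull their own work; it is not the actual design.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

// Hypothetical sketch: an in-memory queue stands in for the database work queue.
public class WorkQueueSketch {

    // Stateless steps: each output depends only on the input message.
    static final Function<String, String> TRIM = String::trim;
    static final Function<String, String> UPPER = String::toUpperCase;
    static final Function<String, String> PIPELINE = TRIM.andThen(UPPER);

    /** Drains the pending messages with the given number of worker threads. */
    public static List<String> process(List<String> pending, int workers) {
        Queue<String> workQueue = new ConcurrentLinkedQueue<>(pending);
        Queue<String> done = new ConcurrentLinkedQueue<>();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int i = 0; i < workers; i++) {
            pool.execute(() -> {
                String message;
                // poll() hands each item to exactly one worker, so no extra locking is needed.
                while ((message = workQueue.poll()) != null) {
                    done.add(PIPELINE.apply(message));
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        List<String> result = new ArrayList<>(done);
        Collections.sort(result); // completion order is nondeterministic, so sort before returning
        return result;
    }

    public static void main(String[] args) {
        System.out.println(process(Arrays.asList(" order-1 ", "order-2 ", " order-3"), 2));
    }
}
```

In the real system the queue would be a database table and the workers would live in separate processes, which is exactly the part a thread pool alone does not cover.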
Based on this, we needed a technology
that supports distributed event processing, routing and concurrency on the Java + Spring platform. The three options
considered were a message broker (RabbitMQ),
Spring Integration and Akka.
RabbitMQ: A message queue (MQ) was the first choice because it is the traditional, proven solution for messaging and event processing. We picked RabbitMQ because it is a popular, lightweight open source option with commercial support from a vendor we already use. I was pretty impressed with RabbitMQ: it was easy to use and lean, yet supported advanced distribution and messaging features. The only thing it lacked for us was the ability to persist messages in Oracle.
Even though RabbitMQ is open source (free), there is a substantial cost to using it in the enterprise. Because an MQ is an additional component in the middleware stack, it requires dedicated staff for administration and maintenance, plus commercial support for the product. Setting up and configuring a message broker also has its own complexity and involves cross-team coordination.
MQs are primarily EAI products and provide
cross-platform (multi-language, multi-protocol) support. They might be too bulky and expensive when used only as an asynchronous concurrency and parallelism solution.
Spring Integration: Spring has a few
modules that provide scalable asynchronous execution.
Spring TaskExecutor provides asynchronous
processing with lightweight thread pool options.
Spring Batch allows distributed
asynchronous processing via the Job Launcher and Job Repository.
Spring Integration extends this further by
providing EAI features, messaging, routing and mediation capabilities.
While all three Spring modules have some of the required features,
it was difficult to get everything together. Like this user, I was expecting Spring Integration to have RMI-like remoting capability.
Akka Java: Akka is a toolkit and
runtime for building highly concurrent, distributed, and fault tolerant
event-driven applications on the JVM. It has a Java API and I decided to give
it a try.
Akka was easy to get started with, and I found Activator quite helpful. Akka is
based on the Actor Model, which
is a message-passing paradigm for achieving concurrency without
shared objects and blocking. In Akka, rather than invoking an object
directly, a message is constructed and sent to the object (called an actor)
by way of an actor reference. This design greatly simplifies concurrency management.
However, this simplicity does not mean that a traditional lock-based
concurrent program (threads/synchronization) can be converted to Akka with few code changes. One needs to design the actor system by defining smaller tasks, messages and the communication between them.
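
As an illustration of the idea only, here is a minimal sketch in plain Java. It deliberately does not use Akka's API: `RecordingActor`, `RowChanged` and `tell` are hypothetical names, and a single-threaded executor plays the role of the mailbox. The point it tries to show is that when an object is reached only through messages processed one at a time, its private state needs no locks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical mini-actor: NOT Akka's API, only an illustration of message passing.
public class MiniActorSketch {

    /** An immutable message describing one replicated row (hypothetical). */
    static final class RowChanged {
        final long rowId;
        RowChanged(long rowId) { this.rowId = rowId; }
    }

    /** Processes one message at a time, so its private state needs no locks. */
    static final class RecordingActor {
        // A single-threaded executor acts as mailbox plus dispatcher.
        private final ExecutorService mailbox = Executors.newSingleThreadExecutor();
        private final List<Long> seen = new ArrayList<>(); // touched only by the mailbox thread

        /** Fire-and-forget send: enqueue the message and return immediately. */
        void tell(RowChanged message) {
            mailbox.execute(() -> seen.add(message.rowId));
        }

        /** Stops accepting messages, drains the mailbox, and returns what was processed. */
        List<Long> stopAndGet() throws InterruptedException {
            mailbox.shutdown();
            if (!mailbox.awaitTermination(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("mailbox did not drain in time");
            }
            return seen;
        }
    }

    /** Sends messages from four threads and returns the row ids the actor processed. */
    public static List<Long> demo() {
        RecordingActor actor = new RecordingActor();
        Thread[] senders = new Thread[4];
        for (int i = 0; i < senders.length; i++) {
            final long id = i;
            senders[i] = new Thread(() -> actor.tell(new RowChanged(id)));
            senders[i].start();
        }
        try {
            for (Thread t : senders) {
                t.join();
            }
            return actor.stopAndGet();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("processed " + demo().size() + " messages");
    }
}
```

Akka adds what this toy leaves out: actor references, supervision, routing and remoting. The design work of splitting a job into tasks and messages is the same, though.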
There is a learning curve for Akka’s concepts and the Actor Model
paradigm, but it is comparatively small given the complexity of concurrency and
parallelism that it abstracts.
Akka offers the right level of
abstraction: you do not have to worry about threads and synchronization of
shared state, yet you get full flexibility and control to write your custom
concurrency solution.
Besides simplicity, I think the real power of Akka is remoting and its ability to distribute actors across multiple nodes for high scalability. Akka's Location Transparency and Fault Tolerance make it easy to scale and distribute an application without code changes.
I was able to build a PoC for my multi-process, multi-threaded use case fairly easily. I still need to work out Spring injection in actors.
A few words of caution: Akka’s Java code has a lot of typecasting
due to Scala’s type system, and working with mutable objects can be
tricky, since messages between actors are expected to be immutable. I am tempted to reuse my existing JPA entities
(mutable) as messages to reduce database calls.
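
One way to sidestep the mutable-entity temptation is to copy the fields a receiver needs into a small immutable message. The sketch below is only an assumption about how that could look: `OrderEntity` and `OrderChanged` are hypothetical names, and the entity is a plain class with the JPA annotations left out so the example has no dependencies.

```java
// Hypothetical names throughout; OrderEntity stands in for a mutable JPA entity.
public class ImmutableMessageSketch {

    /** Stand-in for a mutable JPA entity (annotations omitted to stay dependency-free). */
    static class OrderEntity {
        private long id;
        private String status;
        long getId() { return id; }
        void setId(long id) { this.id = id; }
        String getStatus() { return status; }
        void setStatus(String status) { this.status = status; }
    }

    /** Immutable snapshot that is safe to hand to another actor or thread. */
    static final class OrderChanged {
        private final long id;
        private final String status;

        private OrderChanged(long id, String status) {
            this.id = id;
            this.status = status;
        }

        /** Copies the needed fields so later entity changes cannot leak into the message. */
        static OrderChanged from(OrderEntity entity) {
            return new OrderChanged(entity.getId(), entity.getStatus());
        }

        long id() { return id; }
        String status() { return status; }
    }

    /** Builds a message, mutates the entity afterwards, and returns the message's status. */
    public static String demo() {
        OrderEntity entity = new OrderEntity();
        entity.setId(42L);
        entity.setStatus("NEW");
        OrderChanged message = OrderChanged.from(entity);
        entity.setStatus("SHIPPED"); // later mutation does not affect the message
        return message.status();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The trade-off is the one mentioned above: a snapshot carries only what was copied, so a receiver that needs more may have to go back to the database.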
Also, the Akka community is
geared towards Scala, and there is less material on Akka Java.
In spite of all this, Akka Java seems to be the cheapest, fastest and
most efficient option of the three.