Embracing Event-Driven Microservices

A few months ago, Anand discussed the benefits of transitioning from traditional HTTP communication to event-driven microservices. He noted how this transition decoupled our services, enforced scalable patterns, and increased our resilience. Here, we'll detail the lessons we learned when we re-implemented our Cron-esque scheduler service, Khronos.

Keeping Microservices Domain Specific

Khronos handled many tasks for us: health checks, container cleanup, network tests, metrics gathering, and more. While it was very effective, we determined that it violated domain boundaries throughout its processes: Khronos needed to know the internal workings of other services in order to perform its scheduled tasks. As a result, it had to pull in the same dependencies and external libraries as those services, and even duplicate code for actions that another service already handled. Additionally, any change to another service could break Khronos unless Khronos was updated at the same time.

To adhere to our microservices pattern, we needed to remove these cross-domain dependencies. Rather than performing tasks in other domains, Khronos could be stripped down to its timekeeping duties alone, with external tasks moved to their respective microservices.

Before vs. After

Before, Khronos scheduled its own workers to perform tasks in other domains. This meant that scheduling a new task with Khronos required multiple pull requests: not only did you have to add a new worker to Khronos, you also had to modify deploy scripts to trigger that worker within Khronos. Because these workers acted in other domains, Khronos became bloated with external dependencies.

Because of the event-driven nature of our infrastructure, we realized all of this bloat was unnecessary. Now, Khronos exclusively publishes timekeeping events that other microservices can listen to. Domain-specific code stays within its respective microservice, and our services remain decoupled. Code duplication and external dependencies have been minimized. Developers can now add scheduled tasks to their services simply by listening to these timekeeping events.
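To make the split concrete, here is a minimal sketch of what the stripped-down timekeeper could look like. It assumes Node's built-in `EventEmitter` as a stand-in for the RabbitMQ exchange; `exchange`, `SCHEDULE`, and `startKhronos` are hypothetical names, and the periods simply match the timekeeping event names Khronos publishes. Note that Khronos only announces that time has passed and knows nothing about what listeners do.

```javascript
const { EventEmitter } = require('events')

// Hypothetical stand-in for the RabbitMQ exchange that Khronos publishes to.
const exchange = new EventEmitter()

const MINUTE = 60 * 1000

// Each timekeeping event and how often it fires, in milliseconds.
const SCHEDULE = [
  { name: 'time.five-minutes.passed', periodMs: 5 * MINUTE },
  { name: 'time.thirty-minutes.passed', periodMs: 30 * MINUTE },
  { name: 'time.one-hour.passed', periodMs: 60 * MINUTE },
  { name: 'time.four-hours.passed', periodMs: 4 * 60 * MINUTE },
  { name: 'time.one-day.passed', periodMs: 24 * 60 * MINUTE }
]

// Start one timer per event. Khronos only announces that time has passed;
// it has no knowledge of what listeners do in response.
function startKhronos (emitter, schedule) {
  const timers = schedule.map(({ name, periodMs }) =>
    setInterval(() => emitter.emit(name, { timestamp: Date.now() }), periodMs)
  )
  // Stop every timer, e.g. on shutdown or at the end of a test.
  return () => timers.forEach((timer) => clearInterval(timer))
}
```

At boot, Khronos would call `startKhronos(exchange, SCHEDULE)` once; a service that needs a daily job subscribes to `time.one-day.passed` instead of running its own `setInterval`.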

Timekeeping Event Listener

Our intent is to listen for global events and publish local tasks for a service's workers to consume. This way, an event such as the passing of a set duration of time can trigger tasks in many different domains without direct communication between services. Ponos, our open-source RabbitMQ-based worker server, lets us accomplish this easily. In our case, a Ponos worker consumes the event time.one-day.passed and then publishes a docker.images.cleanup task, which triggers a worker to clean up stale Docker images.
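The event-to-task translation can be sketched with a small in-memory broker. This is not the Ponos API: `InMemoryBroker`, `subscribe`, `publishEvent`, `consumeTask`, `publishTask`, and `registerDockerWorkers` are hypothetical names chosen to model the two delivery styles, where an event fans out to every subscriber and a task is handled by a single worker. Delivery is synchronous here to keep the example short.

```javascript
// Minimal in-memory stand-in for RabbitMQ (hypothetical; not the Ponos API).
// Events fan out to every subscriber; each task goes to exactly one handler.
class InMemoryBroker {
  constructor () {
    this.eventSubscribers = new Map() // event name -> array of handlers
    this.taskHandlers = new Map() // task name -> single worker handler
  }

  subscribe (eventName, handler) {
    const handlers = this.eventSubscribers.get(eventName) || []
    handlers.push(handler)
    this.eventSubscribers.set(eventName, handlers)
  }

  publishEvent (eventName, job) {
    for (const handler of this.eventSubscribers.get(eventName) || []) {
      handler(job)
    }
  }

  consumeTask (taskName, handler) {
    this.taskHandlers.set(taskName, handler)
  }

  publishTask (taskName, job) {
    const handler = this.taskHandlers.get(taskName)
    // A real queue would hold the task until a worker connects; this sketch throws.
    if (!handler) throw new Error(`no worker for task ${taskName}`)
    handler(job)
  }
}

// Inside the Docker domain: turn the global time event into a local task,
// then let a local worker do the actual cleanup.
function registerDockerWorkers (broker, cleanupImages) {
  broker.subscribe('time.one-day.passed', (job) => {
    broker.publishTask('docker.images.cleanup', { requestedAt: job.timestamp })
  })
  broker.consumeTask('docker.images.cleanup', (job) => cleanupImages(job))
}
```

Because events fan out, a second service could subscribe to `time.one-day.passed` with its own handler without either service knowing about the other.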

Naming Timekeeping Events

Below is an excerpt from the Ponos constructor within Khronos pertaining to our timekeeping events. Events that we propagate through our message queue should be well named and describe exactly what has transpired. We start with the domain, time, for obvious reasons. This is followed by the duration, such as one-day or thirty-minutes, which keeps our events human-readable and understandable at a glance. We finish with the past-tense verb passed, which describes what happened.

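const joi = require('joi')
// RabbitMQ client bundled with Ponos
const RabbitMQ = require('ponos/lib/rabbitmq')

// Every timekeeping event Khronos publishes. The jobs carry no required
// fields, so each schema accepts any object.
const publisher = new RabbitMQ({
  events: [
    { name: 'time.one-day.passed', jobSchema: joi.object({}).unknown() },
    { name: 'time.four-hours.passed', jobSchema: joi.object({}).unknown() },
    { name: 'time.one-hour.passed', jobSchema: joi.object({}).unknown() },
    { name: 'time.thirty-minutes.passed', jobSchema: joi.object({}).unknown() },
    { name: 'time.five-minutes.passed', jobSchema: joi.object({}).unknown() }
  ]
})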
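Because every listener depends on these exact strings, a one-letter typo such as `passsed` silently leaves subscribers waiting for an event that never arrives. A hypothetical helper like the one below, run in a unit test over the configured event names, could enforce the `time.<duration>.passed` convention. `isValidTimeEventName` and `invalidTimeEventNames` are illustrative names, and the regular expression assumes durations are lowercase words joined by hyphens, as in the names above.

```javascript
// Hypothetical helper: checks a timekeeping event name against the
// <domain>.<duration>.<past-tense verb> convention, i.e. time.<words>.passed
const TIME_EVENT_PATTERN = /^time\.[a-z]+(?:-[a-z]+)*\.passed$/

function isValidTimeEventName (name) {
  return TIME_EVENT_PATTERN.test(name)
}

// Return the names that break the convention, e.g. to fail a unit test.
function invalidTimeEventNames (names) {
  return names.filter((name) => !isValidTimeEventName(name))
}
```

Checking names once at startup or in CI is cheaper than discovering a misspelled event when a daily cleanup quietly stops running.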

Conclusions

Some of the major benefits we have found since making this switch are:

Our code is easier to maintain.
We can listen to any number of timekeeping events.
We have a single source of truth for the passage of time (other services don't need their own setInterval).
Adding scheduled tasks no longer requires multiple PRs or changes to deploy scripts.

Future Ideas

Moving forward, I would like to make Khronos even more robust by:

Adding a listener that can create customizable timekeeping events.
Publishing Khronos as an open-source project!