Writing good code requires an understanding of how the language’s features fit together. Writing great code requires an understanding of how the code will actually run underneath. Without that understanding, your code can cause huge hidden issues.

When we first started implementing our socket logic in Node.js, we didn’t have much experience doing real-time work with Node.js. We quickly learned that our socket experience from other stacks was of little use. The more we investigated, the less we understood. To find the solution, we first had to understand Node.js itself.

The Event Loop

To start, let me give a brief explanation of how Node.js runs. Node.js is single-threaded, but to enable it to do many different things at any given time, it uses the “event loop”. This is the main execution loop for every function that is invoked from an asynchronous event. Any asynchronous callback that is invoked is appended to the event loop’s queue, and it runs uninterrupted until it finishes. Any synchronous task holds the event loop until it finishes, blocking other important events from being handled.

Optimizing Node.js code is all about keeping that event loop flowing. The more synchronous code you have, the longer the event loop can potentially be blocked. This includes toString() and most of the array methods, which seem harmless enough, but are synchronous and could run much longer than you anticipated. Check out my older blog post to read about how much performance can be lost from these inconspicuous functions.
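
As a quick illustration (a minimal sketch, not from our codebase), here’s how a long synchronous loop delays a timer that was due almost immediately:

// A tiny demonstration of event loop blocking
setTimeout(() => console.log('timer fired'), 10) // due in ~10ms

let sum = 0
for (let i = 0; i < 1e9; i++) { sum += i } // synchronous work hogs the event loop

console.log('loop done')
// 'timer fired' only prints after the loop completes,
// long after its 10ms deadline has passed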

Our Problem

The issue we were having involved a microservice we use to retrieve container logs from Docker containers. If you’ve ever had to do this, you may have seen a couple of junk characters at the beginning of each line. That’s actually a header put on each line to tell you which stream it belongs to (stdin, stdout, or stderr), and the length of the message.

Sounds easy enough, right? Dockerode, the Node.js Docker API client, suggests using docker-modem’s demuxer to split up the messages and remove the header. We hooked it up, and it worked for a bit, but soon it started slowing down and crashing. What was wrong? Let’s check the code:

Modem.prototype.demuxStream = function(stream, stdout, stderr) {
  var header = null;

  stream.on('readable', function() { // grab the event loop
    header = header || stream.read(8);
    while (header !== null) { // repeat until header is null
      var type = header.readUInt8(0);
      var payload = stream.read(header.readUInt32BE(4));
      if (payload === null) break; // if the payload isn’t valid, break the loop
      if (type == 2) {
        stderr.write(payload);
      } else {
        stdout.write(payload);
      }
      header = stream.read(8); // if there’s another message on the stream, load it into header
    }
    // release the event loop
  });
};
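
For context, this is roughly how that demuxer gets wired up with Dockerode (a sketch; the container ID and the destination streams are placeholders):

const Docker = require('dockerode')

const docker = new Docker()
const container = docker.getContainer('my-container-id') // placeholder ID

// Follow the container's logs and let docker-modem split stdout/stderr
container.logs({ follow: true, stdout: true, stderr: true }, (err, logStream) => {
  if (err) return console.error(err)
  container.modem.demuxStream(logStream, process.stdout, process.stderr)
})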

That doesn’t seem that bad, right? This library was most likely designed for small, local Docker operations. You don’t really have to worry about blocking the event loop when you’re only serving a couple of clients, because the socket buffers are emptied as quickly as they are filled. Since the while loop barely runs at all, the event loop is never blocked long enough to affect performance.

But when you start adding more clients and more containers, the server can’t keep up with all the new messages, and everything snowballs. As the sockets fill up with more data, the while loop has to process more messages per ‘readable’ event. The longer the while loop runs, the longer the event loop can’t be interrupted to serve other clients. And the longer the event loop is stuck, the more data piles up on the sockets, increasing the time spent in the while loop.

Eventually, by the time a new container’s logs are read for the first time, its entire log could be waiting in the buffer. If that happens, the while loop will run until the buffer is emptied, while the other clients don’t get anything at all.

How Did We Fix It?

We had to replace that while loop, which isn’t as straightforward as it sounds. We couldn’t just use an asynchronous while loop, because new socket data could get processed before the loop resolves. We needed something that interrupts the flow of data, and transforms it as it goes. Node.js has such a thing, called a Stream.Transform.
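
If you haven’t used one before, a Transform is a duplex stream with a _transform hook, and the stream won’t hand you the next chunk until you invoke the callback. A trivial sketch (not our actual code) to show the shape:

const { Transform } = require('stream')

class UpperCaser extends Transform {
  _transform (chunk, enc, cb) {
    this.push(chunk.toString().toUpperCase())
    cb() // the next chunk isn't delivered until we call this
  }
}

process.stdin.pipe(new UpperCaser()).pipe(process.stdout)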

Transforms allow us to modify the data while it flows through the socket, and the next chunk is only processed when we invoke the callback. The transform buffers incoming data until we have a valid message. It then removes the header from the message and emits the data to the client. To cycle through multiple messages, we can use an async.whilst and call its callback with a setTimeout. The setTimeout schedules the next block to be read at some later time, allowing other sockets to have their messages processed. Newer messages for this socket will be queued up, waiting in line to be transformed. Once that transform finally calls back, the next messages will be ready to be transformed. The event loop is never blocked for more than one message, and the messages are always processed in order. Here is an example of what that all looks like:

const stream = require('stream')
const async = require('async')

const HEADER_LENGTH = 8

class Cleanser extends stream.Transform {
  constructor (options) { super(options) }

  cleanse (chunk, enc, cb) {
    if (!chunk || !chunk.length) { return cb() }
    let header = null
    let endOfData = 0

    async.whilst(
      () => {
        if (chunk.length <= HEADER_LENGTH) {
          return false // if chunk isn’t long enough, just buffer until next run
        }
        header = chunk.slice(0, HEADER_LENGTH)
        endOfData = HEADER_LENGTH + header.readUInt32BE(4) // get eod position
        return chunk.length >= endOfData // only continue while a full message is available
      },
      whilstCb => {
        const content = chunk.slice(HEADER_LENGTH, endOfData)
        if (content.length) {
          this.push(content, enc)
        }
        // move chunk along itself
        chunk = chunk.slice(endOfData)

        setTimeout(whilstCb, 0) // schedules the next message to run later
      }, () => {
        if (chunk.length) {
          this.buffer = Buffer.from(chunk) // buffer remaining data
        }
        cb() // concludes processing this chunk, will cause next one to run
      })
  }

  _transform (chunk, enc, cb) {
    if (this.buffer) {
      // prepend any partial message left over from the previous chunk
      chunk = Buffer.concat([this.buffer, chunk])
      delete this.buffer
    }
    this.cleanse(chunk, enc, cb)
  }

  _flush (cb) { this.cleanse(this.buffer, 'buffer', cb) }
}

module.exports = Cleanser
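
Hooking it in is then just a matter of piping the raw log stream through the transform before it reaches the client. A sketch, assuming container is the Dockerode container object from the earlier snippet and clientSocket is a placeholder for wherever the cleaned output should go:

const Cleanser = require('./cleanser')

container.logs({ follow: true, stdout: true, stderr: true }, (err, logStream) => {
  if (err) return console.error(err)
  logStream
    .pipe(new Cleanser())  // strips the 8-byte headers, one message per event loop turn
    .pipe(clientSocket)    // placeholder destination for the cleaned logs
})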

Conclusion

If you’re writing Node.js, make everything as asynchronous as you can, especially if you’re working with real-time socket data. It can be hard to predict how code will behave at scale, even when you consider its O(n) complexity. Because of this, it’s best to avoid synchronous loops, especially when the data comes from an external source.

This investigation not only helped us solve the issue, but also helped us understand how to write better Node.js code in general. With this understanding, we can write code that takes advantage of Node’s strengths instead of working around them.