A JSON event-based convention for WebSockets

Ismael Celis

This post was originally published on the New Bamboo blog, before New Bamboo joined thoughtbot in London.


HTML5 WebSockets are cool. Given a compliant server – and browser – all you need to do is instantiate your socket object and start listening to server-pushed data.

var socket = new WebSocket('ws://socket.server.com');

socket.onmessage = function(evt){
  alert("I got data: " + evt.data)
}

How awesome is that?! Your browser is now officially hooked up to the server by a persistent, firewall-safe, bidirectional TCP connection. The server can send data to the browser at any time, and you receive it via predefined callbacks, while talking back to the server is as straightforward as:

socket.send( "Thank you, Mr. server!" );
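
One caveat: browsers throw an error if you call send() while the connection is still being established, so real code usually waits for the open event first. Here is a minimal sketch of that idea. The name sendWhenOpen is hypothetical (it is not part of the WebSocket API), and the helper assumes the socket is either still connecting or already open:

```javascript
// sendWhenOpen is a hypothetical helper, not part of the WebSocket API.
// It sends right away if the socket is open, otherwise waits for "open".
function sendWhenOpen(socket, message) {
  var OPEN = 1; // numeric value of WebSocket.OPEN

  if (socket.readyState === OPEN) {
    socket.send(message);
  } else {
    // Assumption: the socket is still connecting. A closing or closed
    // socket never fires "open", so the message would never be sent.
    socket.addEventListener('open', function () {
      socket.send(message);
    });
  }
}
```

With the socket from the first example, that would be called as sendWhenOpen(socket, "Thank you, Mr. server!").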

Think about it for a moment. You’ve just upgraded your browser to a Real-Time, Hyper-Responsive Information Device (RTHRID. I’m sure it’s going to catch on). All that without leaving the comfort of your beloved HTML and Javascript.

Not excited yet? What’s wrong with you?!

So your browser and server are now aware of each other. Great! But what can we do with it? A Javascript chat client, for one (because that’s what the world really needs). That’s easy enough:

// Incoming messages
socket.onmessage = function(evt){
  // We use jQuery because life's too short for raw DOM scripting
  $('#messages').prepend( "<li>" + evt.data + "</li>" );
}

// Send message to chat server
$('form').submit(function(){
  socket.send( $(this).find('input.message').val() );
  return false;
});

We get a message from the server, we add it to the page. We hijack a form’s submit event and send its contents to the chat server. Other clients will be listening to my messages in the same way.

But now let’s say you want to know when other users have connected (and you want to know their names). What do you do? How can you tell what type of message you’re getting from the server? Is it a text message from another user? Is it a “user just connected” message? Which user?

Here’s the thing: WebSockets just give you a bare-bones message pipe on top of TCP. Normally you would use it to send text data back and forth, but there’s nothing preventing you from building a more structured protocol on top of it. Take STOMP, for example. STOMP is a text-based messaging protocol. You can do things like subscribe to channels, let other users (or machines) know that you’ve connected, publish data, etc. A STOMP message looks like this:

SEND
destination:/some/channel

hello everybody
^@

It’s got a command name, a destination header (a channel, chat room, topic, etc.) and a body (the “^@” at the end is a NULL byte, which means “end of the message”). It’s up to client implementations to parse this by splitting out each part and doing something with it.
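
To make that splitting concrete, here is a rough sketch of a parser for a frame shaped like the one above. The name parseFrame is hypothetical, and the code only handles this simple shape (one command line, "key:value" headers, a blank line, a body, a trailing NULL byte); a real STOMP client has to deal with much more:

```javascript
// parseFrame is a hypothetical helper for the simple frame shape shown above.
function parseFrame(frame) {
  // Drop the trailing NULL byte that terminates the frame.
  var text = frame.replace(/\0$/, '');

  // A blank line separates the command and headers from the body.
  var divider = text.indexOf('\n\n');
  if (divider === -1) {
    throw new Error('Malformed frame: no blank line before the body');
  }

  var head = text.slice(0, divider).split('\n');
  var body = text.slice(divider + 2);

  // Every line after the command is a "key:value" header.
  var headers = {};
  for (var i = 1; i < head.length; i++) {
    var colon = head[i].indexOf(':');
    headers[head[i].slice(0, colon)] = head[i].slice(colon + 1);
  }

  return { command: head[0], headers: headers, body: body };
}

var frame = parseFrame('SEND\ndestination:/some/channel\n\nhello everybody\0');
// frame.command is "SEND"
// frame.headers.destination is "/some/channel"
// frame.body is "hello everybody"
```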

See where I’m going with this? You could implement a STOMP Javascript client and make it talk to a STOMP server over WebSockets (here’s how).

But who would want to do such a thing? Parsing sloppy text strings to make out what the server wants from you? No fun.

We might as well use JSON.

JSON is Javascript, Javascript is JSON. Decent browsers even come with built-in JSON parsers! Let’s have our imaginary chat server send us JSON strings like this one:

[
  "user_connected",
  {
    "name" : "Ismael Celis",
    "message" : "Yo",
    "twitter" : "ismasan"
  }
]

This is a made-up message format, but it’s useful for our purposes: it’s a JSON array where the first element is an event name (“user_connected”) and the second element is the event “data”, in this case a JSON object with some properties. In our particular application some of these properties might be mandatory (such as name and message), whereas others can be optional (twitter). It’s important, though, that we settle on the `[event_name, data_object]` general message format. I’ll show you why.
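
Decoding a message in this format takes only a couple of lines. A minimal sketch follows; parseEvent is a hypothetical helper name, and the shape check is an assumption about how strict you want to be with what the server sends:

```javascript
// parseEvent is a hypothetical helper: it turns a raw JSON string that
// follows the [event_name, data_object] convention into a small object.
function parseEvent(raw) {
  var message = JSON.parse(raw);

  // Assumption: anything that is not [string, data] is treated as an error.
  if (!Array.isArray(message) || typeof message[0] !== 'string') {
    throw new Error('Expected a JSON array of the form [event_name, data_object]');
  }

  return { name: message[0], data: message[1] };
}

var parsed = parseEvent(
  '["user_connected", {"name": "Ismael Celis", "message": "Yo", "twitter": "ismasan"}]'
);
// parsed.name is "user_connected"
// parsed.data.twitter is "ismasan"
```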

Standard, mini-protocol, call it what you like. The fact is that this tiny convention for server-browser communication brings about all sorts of awesomeness to our little Javascript application. We can now wrap message-parsing in a neat, Javascript-y object with its own event semantics and everything. This is how it looks:

var server = new ServerEventsDispatcher();

server.bind('user_connected', function(user){
  $('#messages').prepend( "<li class='connected'>" + user.name + " has connected!</li>" );
})

server.bind('user_message', function(data){
  $('#messages').prepend( "<li>" + data.name + " says: " +  data.message + "</li>" );
})

WTF just happened?!

You courteously inquire.

See that ServerEventsDispatcher thing? I just made it up. Go on, take a look. It’s only 30 lines.

Quite trivial, isn’t it? It wraps the native WebSocket object you’ve come to know and love during the last 10 minutes. It then implements its own event-binding and triggering mechanism and pipes said events to and from the server as JSON-encoded strings in the format we just defined.
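
A minimal sketch of such a dispatcher might look like the following. Treat it as an illustration of the description above rather than the original code: the url argument and the optional SocketClass argument (there so a fake socket can be swapped in for testing) are assumptions, and the examples in this post call the constructor with no arguments.

```javascript
// A sketch of a dispatcher for the [event_name, data_object] convention.
// Assumptions: the `url` and `SocketClass` arguments are illustrative only.
function ServerEventsDispatcher(url, SocketClass) {
  // In a browser, fall back to the native WebSocket constructor.
  var Socket = SocketClass || WebSocket;
  var conn = new Socket(url);
  var callbacks = {};

  // Register a handler. Several handlers can share one event name.
  this.bind = function (eventName, callback) {
    callbacks[eventName] = callbacks[eventName] || [];
    callbacks[eventName].push(callback);
    return this; // allow chaining
  };

  // Send an event to the server as a JSON-encoded [event_name, data] array.
  this.trigger = function (eventName, data) {
    conn.send(JSON.stringify([eventName, data]));
    return this;
  };

  // Call every handler bound to eventName, in the order they were bound.
  function dispatch(eventName, data) {
    var chain = callbacks[eventName] || [];
    for (var i = 0; i < chain.length; i++) {
      chain[i](data);
    }
  }

  // Incoming messages are decoded and routed by their event name.
  conn.onmessage = function (evt) {
    var message = JSON.parse(evt.data);
    dispatch(message[0], message[1]);
  };

  // Native socket events are exposed through the same bind() mechanism.
  conn.onopen = function () { dispatch('open', null); };
  conn.onclose = function () { dispatch('close', null); };
}
```

In this sketch, events with no bound handlers are silently ignored, and a message that is not valid JSON makes JSON.parse throw inside onmessage. Whether to catch that is a design choice left out here.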

In fact, you can even hook in multiple handlers to the same event, already an improvement over WebSocket semantics!

server.bind('user_connected', function(user){
  $('#messages').prepend( "<li class='connected'>" + user.name + " has connected!</li>" );
})

var user_count = 0;

server.bind('user_connected', function(user){
  user_count++;
  $('#user_count').html( user_count + " users currently connected");
})

You can use the same mechanism to add handlers to the native WebSocket events open, message and close.

server.bind('close', function(){
  $('#status').prepend( "Connection closed by server" );
})

… And you can send messages (“events”, in our idiom) back to the server and to all connected users. The underlying method is called “send” in the WebSocket API, but I’ve chosen to call it “trigger” so we stay close to the idea of events. I like that because Javascript in the browser is all about events (“click”, “mouseover”, “submit”) and I really like the idea of treating the server just like you would any other DOM element.

// Send message to chat server ... By "triggering" the user_message event.
$('form').submit(function(){

  server.trigger(
    'user_message',
    {
      name: 'Ismael Celis',
      message: $(this).find('input.message').val()
    }
  );

  return false;
});

This is just how jQuery’s events API works, by the way, and this little pattern fits quite well with the bigger picture of Javascript applications.

It’s not limited to boring chat applications, either. The following example illustrates sending mouse coordinates for, say, moving your character around in the next generation of Call of Duty, Javascript Edition:

// Broadcast your mouse movements to other players

$('body').mousemove(function(evt){
  server.trigger('player_move', {
    player_name: 'Ismael',
    x: evt.clientX,
    y: evt.clientY
  });
});

// Listen to other players' moves and update their characters on screen.

server.bind('player_move', function( move ){
  $('#player_' + move.player_name).css( {left: move.x, top: move.y} );
});

The one thing to take from all this? WebSockets are important. You can add them easily to your web applications and build your own, domain-specific protocols on top of them. AND IT’S ALL JAVASCRIPT.

I hope you are excited by now.

UPDATE: I’ve uploaded slides from a presentation I did some days ago around this subject. You can see them here.

NOTE: The server side of the WebSockets world is a whole different post. While the protocol itself is ridiculously simple, there are some performance considerations to keep in mind. Luckily, smarter people than us have already done most of the work. There are tons of solutions for Java, Python and even server-side Javascript, but if you’re as fond of Ruby as I am, I recommend you take EM-WebSocket or Cramp for a spin.