How we did it: Scalable and Reliable Near Real-time Messaging

Posted in Technology

Introduction

At Akosha (now Helpchat), we handle three lakh plus customer service issues a month, a number that continues to grow exponentially. Our prime focus has been to provide a swift response to these issues. That’s why the logical step for us was to build an Android app which lets users chat with agents about any query or complaint related to any company or service.

After months of work, we finally launched the app, Akosha (now Helpchat). Play with it and do share your feedback at app.support@helpchat.in.

In this post, I describe how we overcame the specific challenges we faced while building this messaging application, which supports thousands of messages per second.

Our messaging application consists of three components:

  • An Android application: A native Android app through which a user can initiate a chat with an agent.
  • A messaging server: A server which receives messages from the user and relays them to the agent, and vice versa.
  • An agent dashboard: An AngularJS-based web front end for agents. Agents receive chat messages from users and can respond to their queries through this dashboard.

Design Philosophy

The most important factors in our design were:

  • Lightweight: The messaging protocol needed to be lightweight for the simple reason that our communication was over mobile devices. This was one of the reasons we didn’t go for XMPP, which is the most widespread chat protocol. Its long session handshake and the verbosity of XML make XMPP an unsuitable choice for mobile devices.
  • Both Device and Web: Unlike WhatsApp, our platform had to provide messaging on both device and web. Telegram solved this problem with their custom protocol MTProto, which was certainly an inspiration for us.
  • Low Latency: In any messaging tool, swift delivery of messages is important, and that was one of our prime decision factors in selecting a messaging platform.
  • Scalability: With a user base of over 2 million (and growing) and more than three lakh complaints a month, it was imperative that we design this system with scalability in mind.
  • Simple interface: A simple interface allows agents to respond to user queries swiftly.

Selection of messaging platform

We carefully evaluated the pros and cons of the following messaging platforms during our selection process:

Socket.io

Socket.io enables real-time bi-directional, event-based communication and works on almost any platform, browser, or device, focusing equally on reliability and speed.

Socket.io is composed of two parts:

  • A server that integrates with the Node.js HTTP server
  • A client library that is loaded on the browser/device

The missing parts

Socket.io is not able to detect when a user disconnects from the internet and reconnects (this is very important for presence management).

There is no acknowledgement to the sender when a message is successfully sent.

RabbitMQ

RabbitMQ is a messaging broker – an intermediary for messaging. It provides applications a common platform to send and receive messages, and gives messages a safe place to reside until received.

RabbitMQ provides the following functionality in terms of sending/receiving messages:

  • Delivery acknowledgements
  • Persistence connection
  • High availability

The core idea in the full messaging model of RabbitMQ is as follows:

  • A producer is a user application that sends messages.
  • An exchange receives messages from producers and pushes them to queues.
  • A queue is a buffer that stores messages.
  • A consumer is a user application that receives messages.
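The four roles above can be sketched as a toy, in-memory model. This is not the RabbitMQ client API; the class names, the routing-key binding scheme, and the queue name are assumptions made purely for illustration.

```python
from collections import deque


class Queue:
    """A buffer that stores messages until a consumer takes them."""

    def __init__(self, name):
        self.name = name
        self._messages = deque()

    def put(self, message):
        self._messages.append(message)

    def get(self):
        # Return the oldest message, or None when the queue is empty.
        return self._messages.popleft() if self._messages else None


class Exchange:
    """Receives messages from producers and pushes them to bound queues."""

    def __init__(self):
        self._bindings = {}  # routing key -> list of bound queues

    def bind(self, routing_key, queue):
        self._bindings.setdefault(routing_key, []).append(queue)

    def publish(self, routing_key, message):
        # Copy the message into every queue bound to this routing key.
        for queue in self._bindings.get(routing_key, []):
            queue.put(message)


# Producer side: publish to the exchange, never to a queue directly.
exchange = Exchange()
agent_inbox = Queue("agent-42")  # hypothetical queue name
exchange.bind("chat.agent-42", agent_inbox)
exchange.publish("chat.agent-42", "Hi, my refund has not arrived")

# Consumer side: read from the queue the exchange filled.
received = agent_inbox.get()
```

The point of the indirection is that the producer only knows about the exchange; which queues (and therefore which consumers) get the message is decided by the bindings.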

Missing parts in RabbitMQ

  • Power usage: We found the Android client of RabbitMQ to be quite power hungry.

PubNub

PubNub Real-Time Network offers a collection of “building-block” services for any kind of real-time application. We found PubNub to be superior to other platforms available as of now. Some of the unique features of PubNub are:

  • PubNub Presence — Provides real-time updates for users who are online, and alerts for when they go off-line.
  • PubNub Storage/Playback — PubNub automatically stores all data published to each channel, and provides two mechanisms for retrieving this data: (a) a simple REST request to retrieve the data all at once, and (b) a playback mechanism for “playing back” this data.
  • AES Encryption — PubNub provides out-of-the-box AES 256 encryption support, ensuring that data remains encrypted while routing through the PubNub network.
  • Multiplexing — PubNub enhances multi-channel communication with a mechanism that allows all data topics to be streamed over a single TCP Socket Connection. With PubNub Multiplexing the savings are most apparent with mobile device resources such as phones on battery and slower network connectivity.

Push notifications

Sending intuitive and timely push notifications for new messages is important for any messaging application. We found Google Cloud Messaging (GCM) to be a reliable and timely service for sending push notifications to users. Two key reasons we went with GCM:

It is part of Google Play Services, which is already installed on Android phones.

PubNub integrates nicely with GCM. For example, sending a push notification for our app is as simple as sending this lightweight JSON:

{
    "pn_gcm": {
        "data": {
            "summary": "Game update 49ers touchdown",
            "teams": ["49ers", "raiders"],
            "score": [7, 0],
            "lastplay": "5yd run up the middle"
        }
    }
}

One of the challenges we faced while implementing push notifications was that every incoming message reached the device through both PubNub and GCM. While PubNub was usually faster in delivering the message, we could not rely on that assumption. Hence, we read the message from whichever component received it first, which could be either GCM or PubNub. Since both copies carried the same message_id, we simply ignored the duplicate message_id when it arrived from the other component.
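A minimal sketch of this first-arrival-wins rule is shown below. It is written in Python for brevity rather than as Android code, and the class name, the bounded cache size, and the "text" field are assumptions for illustration. Only message_id comes from the description above.

```python
from collections import OrderedDict


class MessageDeduplicator:
    """Remembers recently seen message IDs so each message is handled once."""

    def __init__(self, max_ids=1000):
        self._seen = OrderedDict()
        self._max_ids = max_ids  # assumed bound so memory does not grow forever

    def should_process(self, message_id):
        if message_id in self._seen:
            return False  # duplicate from the slower channel
        self._seen[message_id] = True
        if len(self._seen) > self._max_ids:
            self._seen.popitem(last=False)  # forget the oldest ID
        return True


def on_message(dedup, message, source, handled):
    # source is "pubnub" or "gcm"; whichever copy arrives first is handled.
    if dedup.should_process(message["message_id"]):
        handled.append((source, message["text"]))


dedup = MessageDeduplicator()
handled = []
msg = {"message_id": "m-1", "text": "Your ticket is resolved"}

on_message(dedup, msg, "gcm", handled)     # GCM happened to be first this time
on_message(dedup, msg, "pubnub", handled)  # PubNub copy is ignored
```

Both callbacks share one deduplicator, so the code never needs to know which channel is usually faster.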

Emoticons for both Device and Web

One of the challenges we faced was finding a cross-platform emoticon solution. There are just too many competing emoticon conventions out there, and there is no single standard as of now, although Google has been involved in efforts to formalize Emoji symbols.

GitHub introduced emoji PNGs, and Campfire supports emoji as well. GitHub, Campfire, and others support a simple set of codes to inject these PNGs into an HTML page.

http://www.emoji-cheat-sheet.com has emerged as a guide to users and has almost formalized how these codes should work.

Most of the above techniques fall into three basic categories:

  • Colon: The emojis are converted to their colon-style strings. This is simple to save in the database since it’s just a string. See examples at http://www.emoji-cheat-sheet.com. The web interface of Telegram uses this technique for rendering and storing emoji. GitHub Flavored Markdown also uses this.
  • UTF-8 Characters: Emojis are mapped to their Unicode characters. The advantage of this method is that some platforms (such as Android and iOS) can render them automatically as emoji, unlike colon-style encoding, which almost always requires decoding. One disadvantage is that saving them in databases requires special handling. A comprehensive list of Unicode codes can be obtained from: http://apps.timwhitlock.info/emoji/tables/unicode
  • HTML: Emojis are converted to HTML <img> tags, rendering each emoji as an image either from a single image or from a sprite. This is the least useful method to adopt as it’s not cross-platform. There is no standardization of emoji sprite images, and hence you can never be sure whether the target platform has the same emoji images. This solution works perfectly for a single isolated website, though.
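To make the first two categories concrete, here is a small sketch that converts between colon-style codes and Unicode characters. It is written in Python for illustration and is not the AngularJS code discussed in this post. The three-entry table and the function names are assumptions, and a real table would cover the whole cheat sheet.

```python
import re

# Tiny illustrative table; a real one would cover the full cheat sheet.
COLON_TO_UNICODE = {
    ":smile:": "\U0001F604",
    ":cry:": "\U0001F622",
    ":thumbsup:": "\U0001F44D",
}
UNICODE_TO_COLON = {char: code for code, char in COLON_TO_UNICODE.items()}

# Matches candidate codes such as :smile: or :thumbsup:
COLON_PATTERN = re.compile(r":[a-z0-9_+-]+:")


def colon_to_unicode(text):
    """Replace known colon codes with emoji; unknown codes are left as-is."""
    return COLON_PATTERN.sub(
        lambda m: COLON_TO_UNICODE.get(m.group(0), m.group(0)), text
    )


def unicode_to_colon(text):
    """Replace known emoji characters with their colon codes."""
    for char, code in UNICODE_TO_COLON.items():
        text = text.replace(char, code)
    return text


decoded = colon_to_unicode("Thanks :smile: :unknown:")
round_trip = unicode_to_colon(colon_to_unicode("Great :thumbsup:"))
```

Leaving unknown codes untouched means ordinary text that happens to contain colons, such as a timestamp, passes through unchanged.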

Our Solution

Since we are using AngularJS as our frontend, we came up with an AngularJS directive and a set of filters to handle all of the above cases for emoji. The solution supports encoding and decoding emoji between the various formats. We decided to store emoji as UTF-8 characters for the simple reason that Android can render them natively, so no special effort is required on the Android side. It did, however, require switching our MySQL tables and database from utf8 to utf8mb4:

ALTER DATABASE my_database DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;
ALTER TABLE my_table CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;
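The switch to utf8mb4 is needed because MySQL's legacy utf8 character set stores at most three bytes per character, while most emoji take four bytes in UTF-8. The quick check below illustrates this in Python; the helper name needs_utf8mb4 is hypothetical.

```python
def needs_utf8mb4(text):
    """Return True if any character in text takes four bytes in UTF-8."""
    return any(len(ch.encode("utf-8")) == 4 for ch in text)


plain = "hello"
with_emoji = "thanks \U0001F604"  # U+1F604 lies outside the Basic Multilingual Plane

emoji_byte_length = len("\U0001F604".encode("utf-8"))
plain_needs_it = needs_utf8mb4(plain)
emoji_needs_it = needs_utf8mb4(with_emoji)
```

A check like this can be useful before migrating, to find out whether existing application input already contains characters that a three-byte column cannot hold.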

Giving it back!

We are also open sourcing our AngularJS directives and filters for handling emoticons. Check out this GitHub repository for information on using these in your own application.

Looking Ahead

This is the first of (hopefully) many posts on our technology milestones. We will be posting more in our “How we did it” series as we take readers through our journey of making Akosha a one-stop solution for all customer service issues.

For now, we are excited about the launch of our Android application. We think we have solved some fun design, product, and engineering challenges. And there’s still a lot left to do.

If you’re excited about building product-driven technology, come join us!
