Emoji Encoding: A new style for binary encoding for the web
Computers think in binary, and you would have thought that sending binary data around would be pretty easy. But that turns out to be a completely non-trivial task. The problem is those pesky humans and the need to interface with them.
For example, if I need to send some binary data over email, I can either do that as an attachment, with a high probability of at least a few people never getting it, or I can encode it somehow. Typical choices are Base64 encoding for the low tech, and barcodes, QR codes and the like. For the fancy among us, we can try to go with Base85 and other such things. That is pretty standard, but it really has a lot of limitations. Base64 will increase the size of the data by about 33% (four characters for every three bytes), and it is case sensitive, so it is hard to get right if you need to actually look at it and not just copy/paste it. It is also limited to plain old ASCII, for compatibility reasons that don’t make a lot of sense in today’s world.
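To put numbers on that overhead, here is a small Python sketch using only the standard library's `base64` module. The 300-byte payload is an arbitrary stand-in chosen so the ratios come out even; it is not data from this post.

```python
import base64

# Arbitrary stand-in payload: 300 zero bytes (illustration only).
raw = bytes(300)

b64 = base64.b64encode(raw)  # 4 output characters per 3 input bytes
b85 = base64.b85encode(raw)  # 5 output characters per 4 input bytes

for name, encoded in (("Base64", b64), ("Base85", b85)):
    overhead = (len(encoded) - len(raw)) / len(raw)
    print(f"{name}: {len(encoded)} characters, {overhead:.0%} larger than the input")
```

For this input, Base64 produces 400 characters and Base85 produces 375, so Base85 is the one with roughly 25% overhead, while Base64 costs about a third.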
I have been thinking about this for a long time, because we need to send binary data (license information) as text, and we also need it to look good and be well formatted.
After a lot of thought and experimentation, I’m proud to announce a new form of encoding: the Emoji Encoder, available currently for .NET, but soon to be available for Ruby, Python, Go, Node.JS, Ember.js, React.JS and maybe jQuery.
The idea for this innovation came to me because of the following observations:
- Emojis are becoming much more important in any textual conversation (to the point where people will say an emoji). That means that we can rely on them for the long term, which is very important for storage technology.
- Trying to read meaning from emojis being sent is clearly impossible, as anyone taking a peek at a text conversation between two teenage girls can tell you. (Although they appear to have a hidden meaning: if she sent the red heel and not the blue heel emoji, that apparently means something.)
- Because emojis are so relevant, they can be sent anywhere a normal text would go, including email, social media, printing, etc.
- There are a lot of emojis, allowing us to overcome the bloat of Base64 and its friends by dedicating a single emoji to each byte in a 1:1 mapping.
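To make the 1:1 idea concrete, here is a minimal Python sketch. It is not the actual Emoji Encoder: the real emoji table is not given in this post, so the sketch assumes a hypothetical alphabet of 256 consecutive code points starting at U+1F400 (`EMOJI_BASE`), and the names `emoji_encode` and `emoji_decode` are made up for illustration. Not every code point in that range necessarily renders as an emoji on every platform.

```python
import base64

# Hypothetical alphabet (an assumption, not the real Emoji Encoder table):
# byte value b maps to the code point EMOJI_BASE + b.
EMOJI_BASE = 0x1F400


def emoji_encode(data: bytes) -> str:
    """Map each byte to exactly one code point (1:1)."""
    return "".join(chr(EMOJI_BASE + b) for b in data)


def emoji_decode(text: str) -> bytes:
    """Invert emoji_encode, rejecting characters outside the alphabet."""
    out = bytearray()
    for ch in text:
        offset = ord(ch) - EMOJI_BASE
        if not 0 <= offset <= 0xFF:
            raise ValueError(f"not in the emoji alphabet: {ch!r}")
        out.append(offset)
    return bytes(out)


key = bytes(range(32))  # stand-in for a 256-bit key
encoded = emoji_encode(key)
assert emoji_decode(encoded) == key  # round trip is lossless

print("Base64 characters:", len(base64.b64encode(key)))
print("Emoji characters: ", len(encoded))
print("Emoji UTF-8 bytes:", len(encoded.encode("utf-8")))
```

For a 32-byte key this gives 44 Base64 characters against 32 emoji characters. Counted in UTF-8 bytes, the same emoji string is 128 bytes under this assumed alphabet, so the saving is in characters only.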
That means that in terms of characters, Emoji Encoding is a net win. Consider the following equivalent information:
- I5xy4dT9Qyjp7DKwuVI6y95EwlDeO/NBeiuc3GJ5Mjo= <—45 characters
- ℹ⤴⚫✔⭕㊗◀☔➖✂♥⛵✖♍❤⛵✅✏ℹ⛲✂ <—33 characters
That is quite important when dealing with constrained textual formats, such as Twitter, where the above will be rendered as:
There are other advantages. This data is actually a 256-bit key for use in encryption. And you can actually show it to a user and have a reasonably good chance that they will be able to tell it apart from something else. It relies on the ability of humans to recognize shapes, but it will be very hard for them to actually tell someone your key. There has been a lot of research around such things, and while it isn’t a primary motivation for us, it is a very nice perk.
I mentioned that a key interest for us is the usage in licensing code. Here is an example of how a license email will now look:
I think that in addition to being pretty, it is also going to bring a smile to people's faces, so the Emoji Encoder is a win all around.
looks pretty useful... can't wait for same day of the next year for v2 :)
This is life changing! I wonder what my password cache will look like in emoji form! So beautiful!
This will also increase the size of the encoded bytes... But at least it looks nicer!
Well, I don't quite know what is wrong, but I just received the newly encoded RavenDB license file and it tells me "invalid license" when I try using it. What could be going wrong? Do I need to upgrade to the new RavenDB smiley edition?
CheloXL, Yes, it would, but we are counting characters here, since that is what is widely used, and that is quite different.
Alex, We have considered this issue, but then we realized that we have a great solution for this. An untapped resource for tech support and a way to promote equality and goodness in the world. Teenage girls. While typically overlooked for tech support for reasons unspecified, the fact is that due to their lifestyle, they have a lot of experience in dealing with emoji related issues. As such, they have developed the expertise necessary to ensure proper emoji interchange across various media.
A particularly juicy result of this is that teenage girls are everywhere, and we expect that anyone who runs into an issue like that is capable of reaching out to such a person and getting their immediate assistance. And, thinking long term, those teenage girls are likely to grow up and become people who use our software, so we are aiming at a key demographic at an early stage.
The encoding increases the overall payload and it is not quite efficient. ℹ⤴⚫✔⭕㊗◀☔➖✂♥⛵✖♍❤⛵✅✏ℹ⛲✂ <—33 characters and is 66 bytes
Salar, There are plenty of cases in which characters are what count. Twitter, for example. But more to the point, the target is humans, and they see chars, not bytes.