A personal blog about technical things I find useful. Also, random ramblings and rants...

MySQL UTF-8 Encoding Bug

TL;DR: MySQL's utf8 encoding was one byte short of real UTF-8!

While working with a LAMP stack in 2015, I ran into something weird: some emojis worked while others didn't 🫨. Emojis weren't as popular back then, and the stakeholders were not pleased to see me working on them, since they didn't solve any business problem. But I loved emojis 🐙; they express emotions. Typing on slow Android phones was a hassle back then, and a single emoji conveyed more.

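TIL: the issue was MySQL's character encoding. Although it was set to utf8, MySQL's utf8 is really utf8mb3, which stores at most 3 bytes per character. That covers the Basic Multilingual Plane, including a few older emojis such as ☺ (U+263A), but most emojis, such as 🐙 (U+1F419), sit outside it and need 4 bytes in UTF-8. That is why some emojis worked and others didn't. To store 4-byte characters, you need utf8mb4. This should have been made far more obvious in MySQL's documentation.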
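To see why only some emojis broke, here is a minimal Python sketch that counts the UTF-8 bytes of each character and flags anything longer than the 3 bytes utf8mb3 can hold. It only inspects strings in Python and does not talk to MySQL; the helper names (utf8_byte_length, fits_utf8mb3, needs_utf8mb4) are hypothetical, chosen for illustration.

```python
# Hypothetical helpers: MySQL's legacy utf8 (utf8mb3) holds at most 3 bytes per character.
UTF8MB3_MAX_BYTES = 3


def utf8_byte_length(ch: str) -> int:
    """Number of bytes a single character occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))


def fits_utf8mb3(text: str) -> bool:
    """True if every character encodes to 3 bytes or fewer (i.e. lies in the BMP)."""
    return all(utf8_byte_length(ch) <= UTF8MB3_MAX_BYTES for ch in text)


def needs_utf8mb4(text: str) -> list:
    """Characters in text that need 4 bytes in UTF-8, and therefore utf8mb4."""
    return [ch for ch in text if utf8_byte_length(ch) > UTF8MB3_MAX_BYTES]


# Escapes avoid ambiguity: A, e-acute, white smiling face, octopus, shaking face.
for ch in ["A", "\u00e9", "\u263a", "\U0001F419", "\U0001FAE8"]:
    print(f"U+{ord(ch):04X}: {utf8_byte_length(ch)} bytes, fits utf8mb3: {fits_utf8mb3(ch)}")
```

Running it shows ☺ (U+263A) at 3 bytes, so it fits utf8mb3, while 🐙 (U+1F419) and 🫨 (U+1FAE8) take 4 bytes and only fit in utf8mb4.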

Who should care?

Anyone working with legacy MySQL databases (~version 5), or with any schema still declared with the utf8 character set.

Link to the bug

Link to MySQL’s explanation

Photo by Denis Cherkashin on Unsplash
