Character encoding is foundational to how data is stored, queried, and rendered in modern applications and hosting environments. If you run a website or web application on shared hosting, a VPS, or a dedicated server backed by MySQL or MariaDB, you’ve likely encountered the character sets utf8 and utf8mb4. At first glance they seem synonymous, since both are Unicode encodings. Under the hood, however, there is a critical difference: only utf8mb4 can store characters outside the Basic Multilingual Plane, such as emoji and some rarely used CJK (Chinese, Japanese, Korean) characters. That difference can make or break an internationalized or multilingual site.
utf8
MySQL’s legacy Unicode character set, an alias for utf8mb3.
Uses at most 3 bytes per character.
Capable of storing characters in the Basic Multilingual Plane (BMP): U+0000 to U+FFFF.
Cannot store emojis, musical symbols, certain Chinese characters, and other supplementary characters.
utf8mb4
(Multi-Byte 4) MySQL’s complete implementation of UTF-8.
Supports full Unicode, including characters outside the BMP.
Uses up to 4 bytes per character, the maximum the UTF-8 standard (RFC 3629) allows.
Required for storing emojis (😊), rare Chinese characters (𠀋), or mathematical symbols (𝛑).
In MySQL, the utf8 character set is not a full implementation of the UTF-8 standard. It is limited to 3 bytes, whereas standard UTF-8 uses up to 4 bytes. This means:
utf8 in MySQL is not real UTF-8.
It’s more like a subset of UTF-8 that excludes code points beyond U+FFFF.
By contrast, utf8mb4 complies fully with the UTF-8 standard.
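A quick way to see the 3-byte limit is to encode a few characters with Python’s built-in UTF-8 codec. The sketch below is illustrative only: `utf8_length` and `fits_mysql_utf8` are hypothetical helper names, and the code assumes, as described above, that MySQL’s utf8 accepts exactly the code points U+0000 to U+FFFF.

```python
# Minimal sketch: compare real UTF-8 byte lengths with MySQL's 3-byte utf8 limit.
# Assumption: MySQL's legacy utf8 (utf8mb3) accepts exactly U+0000..U+FFFF.

BMP_MAX = 0xFFFF  # last code point of the Basic Multilingual Plane


def utf8_length(ch: str) -> int:
    """Bytes the character occupies in standard UTF-8 (1 to 4)."""
    return len(ch.encode("utf-8"))


def fits_mysql_utf8(ch: str) -> bool:
    """True if the character could be stored in a utf8 (utf8mb3) column."""
    return ord(ch) <= BMP_MAX


if __name__ == "__main__":
    samples = [
        "A",           # U+0041, ASCII
        "\u00e9",      # é
        "\u4e2d",      # 中
        "\U0001F60A",  # 😊
        "\U0002000B",  # 𠀋
        "\U0001D6D1",  # 𝛑
    ]
    for ch in samples:
        print(f"U+{ord(ch):04X}: {utf8_length(ch)} bytes, fits utf8: {fits_mysql_utf8(ch)}")
```

The first three samples need 1, 2, and 3 bytes; the last three need 4 bytes each, and they are exactly the ones utf8 cannot hold.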
Feature | utf8 | utf8mb4 |
---|---|---|
Max bytes per character | 3 | 4 |
Unicode coverage | Up to U+FFFF (BMP only) | Full range (up to U+10FFFF) |
Emoji support | ❌ No | ✅ Yes |
Supplementary character support | ❌ No | ✅ Yes |
Status in MySQL 8.0 | ⚠️ Deprecated (alias for utf8mb3) | ✅ Default character set |
Collation options | Limited | More extensive (e.g., utf8mb4_0900_ai_ci) |
You can’t store 🐱, 🧠, 🚀, or 🇩🇪 using MySQL’s utf8, because their code points lie outside the BMP. A flag such as 🇩🇪 is actually two such code points.
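The flag is the subtle case: 🇩🇪 is not a single code point but a pair of regional indicator symbols. The following Python sketch lists each code point of a string together with its UTF-8 size; `describe` is a hypothetical helper, not a library function.

```python
# Sketch: show each code point in a string and how many UTF-8 bytes it needs.
# A single on-screen symbol (such as a flag) may consist of several code points.


def describe(text: str) -> list[tuple[str, int]]:
    """Return (code point label, UTF-8 byte count) for every code point."""
    return [(f"U+{ord(cp):04X}", len(cp.encode("utf-8"))) for cp in text]


if __name__ == "__main__":
    examples = {
        "cat": "\U0001F431",
        "brain": "\U0001F9E0",
        "rocket": "\U0001F680",
        "flag of Germany": "\U0001F1E9\U0001F1EA",  # regional indicators D + E
    }
    for name, text in examples.items():
        print(name, describe(text))
```

For the flag, `describe` returns `[("U+1F1E9", 4), ("U+1F1EA", 4)]`: two supplementary code points, 8 bytes in total.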
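Common utf8mb4 collations include:
utf8mb4_unicode_ci: Based on the Unicode Collation Algorithm (UCA 4.0.0); accurate Unicode sorting
utf8mb4_general_ci: Fast but less accurate
utf8mb4_0900_ai_ci: Based on UCA 9.0.0, accent-insensitive (ai) and case-insensitive (ci); available only for utf8mb4 in MySQL 8.0+, where it is the default collation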
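The ai and ci suffixes describe how strictly strings are compared. As a rough illustration only, the Python sketch below approximates accent- and case-insensitive equality by removing combining accents after NFD normalization and then case-folding. It is not MySQL’s UCA-based algorithm, which also defines sort order, expansions, and ignorable characters; `fold_ai_ci` and `equal_ai_ci` are hypothetical names.

```python
import unicodedata


def fold_ai_ci(text: str) -> str:
    """Rough approximation of an accent- and case-insensitive comparison key."""
    # NFD splits letters like "é" into a base letter plus a combining accent.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks (canonical combining class != 0).
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # casefold() is a stronger lower(); for example it maps "ß" to "ss".
    return no_accents.casefold()


def equal_ai_ci(a: str, b: str) -> bool:
    """Compare two strings using the approximate ai/ci key above."""
    return fold_ai_ci(a) == fold_ai_ci(b)


if __name__ == "__main__":
    print(equal_ai_ci("R\u00e9sum\u00e9", "RESUME"))  # accent and case ignored
    print(equal_ai_ci("Stra\u00dfe", "strasse"))      # ß folds to ss
    print(equal_ai_ci("resume", "result"))            # different words
```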
Unicode keeps growing, and because the BMP is nearly full, new characters (including new emoji) are mostly assigned outside it. utf8mb4 ensures you’re not locked out of future symbols.
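If you attempt to insert a 4-byte character (like an emoji) into a utf8 column while strict SQL mode is enabled, MySQL rejects the statement with an error.
Worse, without strict mode the insert can succeed with only a warning, leaving the stored value truncated or corrupted unless your app validates its input.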
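Until a column is converted, one way to avoid silent corruption is to validate text in the application before writing it. A minimal Python sketch, assuming the target column is still utf8 (utf8mb3); `check_utf8mb3` and `UnsupportedCharacterError` are hypothetical names:

```python
# Sketch of an application-side guard for a column that is still utf8 (utf8mb3).
# Hypothetical names; adapt the error handling to your framework.

BMP_MAX = 0xFFFF


class UnsupportedCharacterError(ValueError):
    """Raised when text contains a code point a utf8 (utf8mb3) column cannot store."""


def check_utf8mb3(text: str, column: str = "value") -> str:
    """Return text unchanged, or raise if any code point needs 4 bytes in UTF-8."""
    for index, ch in enumerate(text):
        if ord(ch) > BMP_MAX:
            raise UnsupportedCharacterError(
                f"{column}: U+{ord(ch):04X} at position {index} needs 4 bytes; "
                "store it in a utf8mb4 column instead"
            )
    return text


if __name__ == "__main__":
    print(check_utf8mb3("caf\u00e9", column="comment"))
    try:
        check_utf8mb3("hi \U0001F60A", column="comment")
    except UnsupportedCharacterError as exc:
        print("rejected:", exc)
```

Failing loudly at this point is a stopgap; the lasting fix is converting the column to utf8mb4.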
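To migrate an existing schema safely, back up the database first, then change the database’s default character set and convert every existing table to utf8mb4. Changing the database default alone does not alter tables that already exist.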
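The conversion can be scripted. The Python sketch below only builds the SQL text so you can review it before running it. It uses MySQL’s `ALTER DATABASE ... CHARACTER SET` and `ALTER TABLE ... CONVERT TO CHARACTER SET` statements and assumes utf8mb4_unicode_ci as the target collation; the helper names and the `shop`, `customers`, and `orders` identifiers are hypothetical.

```python
import re

# Accept only simple utf8mb4 collation names so the value is safe to interpolate.
_COLLATION_RE = re.compile(r"utf8mb4_[0-9a-z_]+")


def quote_ident(name: str) -> str:
    """Quote a MySQL identifier with backticks, doubling any embedded backtick."""
    return "`" + name.replace("`", "``") + "`"


def migration_statements(database: str, tables: list[str],
                         collation: str = "utf8mb4_unicode_ci") -> list[str]:
    """Build ALTER statements that move a database and its tables to utf8mb4."""
    if not _COLLATION_RE.fullmatch(collation):
        raise ValueError(f"unexpected collation name: {collation!r}")
    # Changes the default used for tables created later.
    stmts = [
        f"ALTER DATABASE {quote_ident(database)} "
        f"CHARACTER SET utf8mb4 COLLATE {collation};"
    ]
    # Converts existing tables, including the data in their text columns.
    for table in tables:
        stmts.append(
            f"ALTER TABLE {quote_ident(database)}.{quote_ident(table)} "
            f"CONVERT TO CHARACTER SET utf8mb4 COLLATE {collation};"
        )
    return stmts


if __name__ == "__main__":
    for stmt in migration_statements("shop", ["customers", "orders"]):
        print(stmt)
```

Restricting the collation to a simple pattern and doubling backticks in identifiers keeps unexpected input from changing the generated SQL.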
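Finally, make sure your application connects using utf8mb4, either through the driver’s charset option or by issuing SET NAMES utf8mb4 on each new connection. If the connection itself uses utf8, 4-byte characters can still be rejected or mangled before they reach a utf8mb4 column.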
✅ Always use utf8mb4 for new databases.
✅ Use utf8mb4_unicode_ci for accuracy or utf8mb4_general_ci for performance.
✅ Set the default character set and collation at both the database and table levels.
✅ Ensure application-layer libraries (e.g., PDO, MySQLi, Sequelize) support utf8mb4.
The difference between utf8 and utf8mb4 in MySQL is more than just a byte: it is the difference between full Unicode compatibility and rejected or silently corrupted data. utf8 (utf8mb3) is deprecated as of MySQL 8.0 and survives mainly for backward compatibility. Always prefer utf8mb4 to future-proof your application and ensure complete support for multilingual text, emoji, and special symbols.