What is the Difference Between utf8 and utf8mb4 in MySQL?

Character encoding determines how text is stored, queried, and rendered. If you run a website or web application on shared hosting, a VPS, or a dedicated server powered by MySQL or MariaDB, you have likely encountered the terms utf8 and utf8mb4. At first glance they seem synonymous, since both are Unicode encodings. Under the hood, however, there is a critical distinction that decides whether your app can store modern text such as emojis and certain CJK (Chinese, Japanese, Korean) characters, which matters most for internationalized or multilingual sites.

Definitions

utf8

  • MySQL’s legacy Unicode encoding (an alias for utf8mb3).

  • Supports at most 3 bytes per character.

  • Capable of storing characters in the Basic Multilingual Plane (BMP): U+0000 to U+FFFF.

  • Cannot store emojis, musical symbols, certain Chinese characters, and other supplementary characters.

utf8mb4 (Multi-Byte 4)

  • The real UTF-8 implementation.

  • Supports full Unicode, including characters outside the BMP.

  • Uses up to 4 bytes per character, as the UTF-8 standard specifies.

  • Required for storing emojis (😊), rare Chinese characters (𠀋), or mathematical symbols (𝛑).
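To make the byte counts concrete, here is a minimal Python sketch (Python is used purely for illustration and does not talk to MySQL). It encodes a few sample characters as standard UTF-8 and reports how many bytes each one needs. The helper name utf8_width and the sample list are assumptions made for this example.

```python
def utf8_width(ch: str) -> int:
    """Return the number of bytes standard UTF-8 needs for one character."""
    return len(ch.encode("utf-8"))


# Sample characters: the first four are in the BMP, the last three are not.
samples = ["A", "é", "€", "漢", "😊", "𠀋", "𝛑"]

for ch in samples:
    width = utf8_width(ch)
    # Anything that needs 4 bytes cannot be stored in MySQL's 3-byte utf8.
    verdict = "needs utf8mb4" if width == 4 else "fits in utf8"
    print(f"U+{ord(ch):04X}: {width} byte(s), {verdict}")
```

Characters up to U+FFFF take one to three bytes, while everything above that takes four, which is exactly the boundary between the two MySQL character sets.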

The Misleading utf8 in MySQL

In MySQL, the utf8 character set is not a full implementation of the UTF-8 standard. It is limited to 3 bytes, whereas standard UTF-8 uses up to 4 bytes. This means:

  • utf8 in MySQL is not real UTF-8.

  • It’s more like a subset of UTF-8 that excludes code points beyond U+FFFF.

By contrast, utf8mb4 complies fully with the UTF-8 standard.
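The U+FFFF limit can be checked on the application side before data ever reaches the database. The following is a minimal sketch under the assumption that a simple code point comparison is enough for your use case. The function names fits_mysql_utf8 and supplementary_chars are hypothetical helpers, not part of any MySQL driver.

```python
BMP_MAX = 0xFFFF  # highest code point in the Basic Multilingual Plane


def fits_mysql_utf8(text: str) -> bool:
    """True if every character is in the BMP, i.e. storable in 3-byte utf8."""
    return all(ord(ch) <= BMP_MAX for ch in text)


def supplementary_chars(text: str) -> list:
    """Return the characters that lie beyond U+FFFF and need utf8mb4."""
    return [ch for ch in text if ord(ch) > BMP_MAX]


print(fits_mysql_utf8("Grüße, 漢字"))    # True: all characters are in the BMP
print(fits_mysql_utf8("Launch 🚀"))      # False: the rocket is above U+FFFF
print(supplementary_chars("a😊b𠀋"))     # the two supplementary characters
```

A check like this only models the code point limit. It says nothing about the collation, column length, or connection settings that also affect what MySQL accepts.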

Technical Comparison

Feature | utf8 | utf8mb4
Max bytes per character | 3 | 4
Unicode coverage | Up to U+FFFF (BMP only) | Full range (up to U+10FFFF)
Emoji support | ❌ No | ✅ Yes
Supplementary character support | ❌ No | ✅ Yes
MySQL compatibility | ✅ Legacy-safe | ✅ Full Unicode
Collation options | Limited | More extensive (e.g., utf8mb4_0900_ai_ci)

Why utf8mb4 Is the Right Choice

1. Emoji and Modern Symbol Support

You can’t store 🐱, 🧠, 🚀, or 🇩🇪 using MySQL’s utf8. All of them use code points outside the BMP (the flag is a pair of regional indicator symbols, each above U+FFFF).

2. Better Collation and Sorting

utf8mb4 offers a wider choice of collations, including:

  • utf8mb4_unicode_ci: Unicode standard sorting

  • utf8mb4_general_ci: Fast but less accurate

  • utf8mb4_0900_ai_ci: Unicode 9.0-based, accent- and case-insensitive collation (available in MySQL 8.0+, utf8mb4 only)

3. Future-Proofing

As Unicode expands, newer characters will fall outside the 3-byte range. utf8mb4 ensures you’re not locked out of future symbols.

What Happens If You Use utf8?

If you attempt to insert a 4-byte character (like an emoji) into a column that uses utf8, you’ll get this error:

ERROR 1366 (HY000): Incorrect string value: '\xF0\x9F\x98\x81' for column 'title' at row 1

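Worse, when strict SQL mode is disabled, MySQL downgrades the error to a warning and silently truncates the value at the first 4-byte character, so data can be lost unless the application validates its input.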
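The hex bytes quoted in the error message above are simply the UTF-8 encoding of the rejected character. As a small illustration, this Python sketch decodes them and confirms that the code point lies beyond U+FFFF. The function name describe_rejected_bytes is an assumption for this example.

```python
def describe_rejected_bytes(raw: bytes) -> str:
    """Decode a single UTF-8 character and summarize why utf8 rejects it."""
    ch = raw.decode("utf-8")
    code_point = ord(ch)
    outside_bmp = code_point > 0xFFFF
    return f"{len(raw)} bytes -> U+{code_point:04X} (outside BMP: {outside_bmp})"


# Bytes copied from the ERROR 1366 message.
print(describe_rejected_bytes(b"\xF0\x9F\x98\x81"))
# prints "4 bytes -> U+1F601 (outside BMP: True)"
```

Decoding the bytes this way is a quick method for identifying which character in a failed insert caused the problem.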

Migrating from utf8 to utf8mb4

To safely migrate your schema:

Step 1: Update table and column definitions

ALTER TABLE my_table CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

Step 2: Update database defaults

ALTER DATABASE my_db CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci;
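When a schema has many tables, the statements from Steps 1 and 2 can be generated rather than typed by hand. The sketch below is one possible approach, not an official migration tool: migration_statements is a hypothetical helper, and the table name orders is an invented example. It only builds SQL strings and never connects to a server.

```python
def quote_identifier(name: str) -> str:
    """Quote a MySQL identifier with backticks, doubling embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def migration_statements(database, tables, collation="utf8mb4_unicode_ci"):
    """Build the ALTER statements for Steps 1 and 2, tables first."""
    statements = [
        f"ALTER TABLE {quote_identifier(table)} "
        f"CONVERT TO CHARACTER SET utf8mb4 COLLATE {collation};"
        for table in tables
    ]
    statements.append(
        f"ALTER DATABASE {quote_identifier(database)} "
        f"CHARACTER SET = utf8mb4 COLLATE = {collation};"
    )
    return statements


for statement in migration_statements("my_db", ["my_table", "orders"]):
    print(statement)
```

Review the generated statements and take a backup before running them, since CONVERT TO rewrites every text column in the table.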

Step 3: Update application connection settings

Ensure your app connects using utf8mb4:

SET NAMES utf8mb4;

Best Practices

  • ✅ Always use utf8mb4 for new databases.

  • ✅ Use utf8mb4_unicode_ci for accuracy or utf8mb4_general_ci for performance.

  • ✅ Set default charset at table and database levels.

  • ✅ Ensure application-layer libraries (e.g., PDO, MySQLi, Sequelize) support utf8mb4.

Conclusion

The difference between utf8 and utf8mb4 in MySQL is more than just a byte: it is the difference between full Unicode support and rejected or truncated data. utf8 remains available for backward compatibility, but it is an alias for utf8mb3, which is deprecated as of MySQL 8.0. Prefer utf8mb4 to future-proof your application and ensure complete multilingual, emoji, and special symbol support.