When can you encode email addresses using RFC 2047?
Matthew Whittaker
Co-founder & CTO, Suped
Published 2 Jun 2025
Updated 19 Aug 2025
When handling email, especially for international audiences, we often encounter characters beyond the basic ASCII set. RFC 2047 is the standard designed to allow non-ASCII text in email headers. It uses a specific encoding syntax, seen as strings of the form =?charset?encoding?encoded-text?=. The intention behind RFC 2047 is to ensure that email headers, which are traditionally limited to 7-bit ASCII, can carry characters from various languages and scripts and have them displayed correctly.
The challenge arises when implementers or email senders misinterpret the scope of RFC 2047, leading to compliance issues with major email service providers. While it enables the inclusion of characters like umlauts, accented letters, or Cyrillic script, its application is strictly defined to prevent potential security vulnerabilities and ensure consistent parsing across different mail clients and servers.
Understanding the precise boundaries of RFC 2047 is crucial for maintaining good email deliverability and avoiding bounces or emails landing in spam. It's not just about getting the encoding right, but also about applying it in the correct places within the email header structure.
What RFC 2047 is for
RFC 2047, the third part of the MIME (Multipurpose Internet Mail Extensions) series, titled "Message Header Extensions for Non-ASCII Text", specifies how to encode non-ASCII text in certain parts of an email message header. The primary goal is to ensure that email clients can properly display characters outside the standard 7-bit ASCII character set without breaking the underlying header format, which relies on ASCII characters for parsing.
It defines a mechanism called 'encoded-word', which allows segments of header field bodies to contain characters other than US-ASCII. An encoded word uses a specific syntax that names the character set (e.g., UTF-8, ISO-8859-1), the encoding method ("B" for Base64 or "Q", a variant of Quoted-Printable), and the encoded text itself. For example, a subject line containing the word "Grüße" could appear in UTF-8 with Q encoding as =?UTF-8?Q?Gr=C3=BC=C3=9Fe?=.
This encoding is vital for email headers that contain non-ASCII characters, allowing names, subjects, and comments to be expressed in a user's native language. Without it, these characters would be rendered incorrectly, appearing as garbled text or question marks, which degrades the user experience and can make emails look unprofessional or even suspicious.
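As a minimal sketch of the encoded-word mechanism, the snippet below uses Python's standard email.header module to encode a subject and decode it again. The subject text is only an illustrative assumption, and the library chooses between B and Q encoding on its own, so the exact encoded string may differ from hand-written examples.

```python
from email.header import Header, decode_header, make_header

# Illustrative subject line (an assumption, not taken from a real message).
subject = "Grüße aus Köln"

# Encode the non-ASCII text as an RFC 2047 encoded-word.
encoded = Header(subject, charset="utf-8").encode()
print(encoded)  # an ASCII-only string of the form =?utf-8?...?...?=

# Decode it again, as a receiving mail client would.
decoded = str(make_header(decode_header(encoded)))
print(decoded)
```

The round trip is the useful check here: the encoded form contains only ASCII, and decoding it yields the original text.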
Permitted header fields for encoding
RFC 2047 explicitly defines which header fields (or parts of them) are permitted to use this encoding. According to RFC 2047 Section 5, encoded words are only allowed in specific locations to maintain the integrity and parsability of the email headers. This means you cannot simply encode any part of a header you wish.
The standard allows encoding of the display name (commonly called the 'friendly name') that precedes the address in From, To, Cc, and Bcc headers. For instance, in a header such as Jürgen Müller <jurgen@example.com>, only the name "Jürgen Müller" is encoded, not the address. Additionally, the Subject and Comments headers permit RFC 2047 encoding.
In short, RFC 2047 encoding is typically used for display names in From, To, Cc, and Bcc, for the Subject, and for Comments. It should be avoided in the email address itself (local-part@domain) and in headers such as List-Unsubscribe.
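To illustrate encoding only the display name, here is a small sketch using email.utils.formataddr from Python's standard library. The name and the example.com address are hypothetical. formataddr encodes a non-ASCII name as an encoded-word and refuses a non-ASCII address outright.

```python
from email.header import decode_header, make_header
from email.utils import formataddr

# Hypothetical sender, used only for illustration.
display_name = "Jürgen Müller"
address = "jurgen@example.com"

# Only the display name is RFC 2047-encoded; the address stays plain ASCII.
from_value = formataddr((display_name, address))
print(from_value)

# Recover the display name from the encoded part before " <".
name_part = from_value.rsplit(" <", 1)[0]
decoded_name = str(make_header(decode_header(name_part)))

# A non-ASCII address is rejected rather than encoded.
try:
    formataddr((display_name, "jürgen@example.com"))
    rejected = False
except UnicodeEncodeError:
    rejected = True
```

The resulting value keeps <jurgen@example.com> readable for any parser, while the name in front of it is carried as an encoded-word.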
One of the most frequent mistakes I see is attempting to encode the entire email address portion of a header using RFC 2047. While some older or more lenient email systems might process these emails, major providers like Gmail will reject them with errors such as 'Messages missing a valid address in From: header, or having no From: header, are not accepted.'
This happens because RFC 2047 is not intended for the local-part or domain of an email address. These parts must remain in plain ASCII, or, for truly internationalized email addresses, they must comply with newer standards like RFC 6530 and its related specifications which define Email Address Internationalization (EAI). Misusing RFC 2047 for the entire address is considered invalid and can trigger RFC compliance errors.
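The check below is a rough heuristic sketch for spotting this mistake before sending. The function names and the regular expression are my own assumptions, not part of any standard or library, and the bracket matching is deliberately simplistic, so it will not handle every valid header form.

```python
import re

# Loose pattern for an RFC 2047 encoded-word: =?charset?B-or-Q?text?=
ENCODED_WORD = re.compile(r"=\?[^?\s]+\?[bBqQ]\?[^?\s]*\?=")

def addr_part(header_value: str) -> str:
    """Return the text inside <...>, or the whole value if there are no brackets."""
    match = re.search(r"<([^<>]*)>", header_value)
    return match.group(1) if match else header_value.strip()

def has_encoded_address(header_value: str) -> bool:
    """True if an encoded-word appears where the plain address should be."""
    return bool(ENCODED_WORD.search(addr_part(header_value)))

# Display name encoded, address plain: fine.
ok = "=?UTF-8?Q?J=C3=BCrgen?= <jurgen@example.com>"
# Entire "name <address>" wrapped in one encoded-word: the error described above.
bad_whole = "=?UTF-8?Q?J=C3=BCrgen_=3Cjurgen=40example.com=3E?="
# Encoded-word inside the address itself: also invalid.
bad_local = "Jurgen <=?UTF-8?Q?j=C3=BCrgen?=@example.com>"

for value in (ok, bad_whole, bad_local):
    print(has_encoded_address(value), value)
```

A check like this is cheap to run in a sending pipeline, though a full RFC 5322 parser would be more reliable.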
Incorrect RFC 2047 usage
Encoding the entire 'From' header, including the email address itself, is a common error. This often leads to messages being rejected by stricter email providers because it violates RFC 5322 specifications regarding address format.
Security risk: Can be exploited for phishing and spoofing by obscuring malicious addresses.
Another common pitfall is attempting to use RFC 2047 encoding for the List-Unsubscribe header. Despite some systems seemingly allowing it, this is not permitted by the RFCs. I've observed that Google may silently remove such encoding when viewing the original message, leading to confusion when unsubscribe links fail. Adhering to the specific guidelines for each header, as outlined in the relevant RFCs, is essential for robust email deliverability.
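If the non-ASCII data sits inside the unsubscribe URL, one option is to percent-encode it so the header value stays plain ASCII and needs no RFC 2047 encoding at all. This is a sketch under that assumption; the helper name, the URL, and the query parameter are hypothetical.

```python
from urllib.parse import quote

def list_unsubscribe_value(base_url: str, token: str) -> str:
    """Build an ASCII-only List-Unsubscribe value (hypothetical helper)."""
    # Percent-encode the token so no non-ASCII bytes reach the header.
    return f"<{base_url}?u={quote(token, safe='')}>"

value = list_unsubscribe_value("https://example.com/unsubscribe", "käufer-42")
print(value)  # <https://example.com/unsubscribe?u=k%C3%A4ufer-42>
```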
The proper way to encode
RFC 2047 is designed for encoding the human-readable display names and subject lines, not the email address itself. The address (local-part@domain) must remain unencoded ASCII or use specific EAI standards for non-ASCII characters.
Compliance: Ensures adherence to RFC 5322 specifications, improving acceptance by major providers.
Readability: Ensures friendly names display correctly while maintaining a parsable email address.
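Putting this together, the sketch below builds a message with Python's standard email package and the SMTP policy, which is expected to apply RFC 2047 encoding to display names and the subject while leaving the addresses untouched. The names, addresses, and subject are assumptions made for illustration.

```python
from email.headerregistry import Address
from email.message import EmailMessage
from email.policy import SMTP

msg = EmailMessage(policy=SMTP)
# Hypothetical sender and recipient: non-ASCII names, ASCII addresses.
msg["From"] = Address(display_name="Jürgen Müller", username="jurgen", domain="example.com")
msg["To"] = Address(display_name="Zoë Example", username="zoe", domain="example.com")
msg["Subject"] = "Grüße aus Köln"
msg.set_content("Plain ASCII body.")

# Serialize as it would go over the wire; the headers come out ASCII-only.
raw = msg.as_bytes()
print(raw.decode("ascii"))
```

In the printed output, the display names and subject appear as encoded-words, and <jurgen@example.com> and <zoe@example.com> remain readable.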
Conclusion
In summary, RFC 2047 is a critical standard for allowing non-ASCII characters in email headers, primarily for the 'human-readable' parts like display names in From, To, and Cc headers, as well as the Subject and Comments headers. It is not, however, designed for encoding the email address itself.
Attempting to encode the full email address using RFC 2047 will likely result in delivery failures, especially with stringent providers. Furthermore, misapplication to headers like List-Unsubscribe can lead to silent failures that impact compliance and user experience. For non-ASCII characters in the actual email address, Email Address Internationalization (EAI) standards (RFC 6530 series) are the correct approach.
Adhering strictly to these RFC guidelines ensures that your emails are not only correctly displayed but also reliably delivered, fostering better trust and deliverability. Always test your email header encoding to ensure compliance and avoid common pitfalls.
Views from the trenches
Best practices
Always encode only the display name, subject, or comments in headers, leaving the actual email address in plain ASCII.
Use UTF-8 as your character set whenever possible for broader compatibility with modern email clients and languages.
Test your email headers using a reliable email testing tool to ensure correct RFC 2047 implementation and avoid errors.
Common pitfalls
Encoding the entire 'From' or 'To' header, including the email address itself, using RFC 2047, which leads to bounces.
Applying RFC 2047 encoding to headers like 'List-Unsubscribe', which is not permitted and can cause issues.
Using incorrect character sets or encoding methods, leading to garbled text for recipients.
Expert tips
For full internationalization of email addresses, investigate RFC 6530 (EAI) rather than misapplying RFC 2047.
Be aware that some email providers might silently strip or modify non-compliant header encoding, making troubleshooting difficult.
Prioritize strict RFC compliance over perceived compatibility; modern email providers are becoming stricter.
Expert view
Expert from Email Geeks says RFC 2047 encoding is strictly for human-readable text, such as the friendly name in From or To headers, and the subject line, not the actual email address.
2024-03-08 - Email Geeks
Expert view
Expert from Email Geeks says encoding the email address itself, even if it occasionally works with some software, is never considered a valid practice according to email standards.