Can I encode the entire email address using RFC 2047?

No, RFC 2047 is designed only for specific, human-readable parts of email headers like the display name (e.g., in the From or To field) and the Subject or Comments headers. The actual email address (the local-part and domain) must remain in plain ASCII, or follow specific internationalization standards like RFC 6530 for non-ASCII characters.

What happens if I incorrectly encode an email address with RFC 2047?

Attempting to encode the full email address using RFC 2047 can lead to delivery failures and bounce messages, particularly from major email providers like Gmail, which enforce strict adherence to RFC 2047 and RFC 5322 standards. It can also pose a security risk, allowing for potential phishing attacks.

Which parts of an email header can be encoded with RFC 2047?

RFC 2047 encoding is specifically allowed for the 'display name' (e.g., the John Doe that precedes the address in a From header), the Subject header, and the Comments header. These are the parts intended for human readability where non-ASCII characters are frequently used.

Where should I absolutely avoid using RFC 2047 encoding?

You should avoid using RFC 2047 encoding for the actual email address part of any header (local-part@domain). Additionally, it should not be used for technical headers like Message-ID or List-Unsubscribe, as this is not compliant and can cause functionality issues.

When can you encode email addresses using RFC 2047?

When handling email, especially for international audiences, we often encounter characters beyond the basic ASCII set. This is where RFC 2047 comes into play, a standard designed to allow non-ASCII text in email headers. It uses a specific encoding syntax, often seen as strings like =?charset?encoding?encoded-text?=. The intention behind RFC 2047 is to ensure that email headers, which are traditionally limited to 7-bit ASCII, can correctly display characters from various languages and scripts.

The challenge arises when implementers or email senders misinterpret the scope of RFC 2047, leading to compliance issues with major email service providers. While it enables the inclusion of characters like umlauts, accented letters, or Cyrillic script, its application is strictly defined to prevent potential security vulnerabilities and ensure consistent parsing across different mail clients and servers.

Understanding the precise boundaries of RFC 2047 is crucial for maintaining good email deliverability and avoiding bounces or emails landing in spam. It's not just about getting the encoding right, but also about applying it in the correct places within the email header structure.

What RFC 2047 is for

RFC 2047, also known as MIME (Multipurpose Internet Mail Extensions) Part 3, specifies how to encode non-ASCII text in various parts of an email message header. The primary goal is to ensure that email clients can properly display characters that are outside the standard 7-bit ASCII character set, without breaking the underlying header format which relies on ASCII characters for parsing.

It defines a mechanism called 'encoded-word' which allows segments of header field bodies to contain characters other than US-ASCII. Each encoded word has a fixed syntax that names the character set (e.g., UTF-8, ISO-8859-1), the encoding method (B for Base64, or Q for a variant of Quoted-Printable), and the encoded text itself. For example, a subject line with special characters might appear like this:

Example of an RFC 2047 encoded subject line

Subject: =?UTF-8?Q?Fwd:_Rechnung_f=C3=BCr_M=C3=BChlen?=

This encoding is vital for email headers that contain non-ASCII characters, allowing names, subjects, and comments to be expressed in a user's native language. Without it, these characters would be rendered incorrectly, appearing as garbled text or question marks, which degrades the user experience and can make emails look unprofessional or even suspicious.
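The encoded-word syntax described above can be taken apart and decoded with Python's standard `email.header` module. This is a minimal sketch rather than a full header parser: the helper names `split_encoded_word` and `decode_header_value` are made up for illustration, and the regular expression is a simplified approximation of the RFC 2047 grammar.

```python
import re
from email.header import decode_header, make_header

# Simplified pattern for one encoded-word: =?charset?encoding?encoded-text?=
ENCODED_WORD = re.compile(r"=\?([^?\s]+)\?([bBqQ])\?([^?\s]*)\?=")


def split_encoded_word(word):
    """Return (charset, encoding, encoded_text), or None if it is not an encoded-word."""
    match = ENCODED_WORD.fullmatch(word)
    return match.groups() if match else None


def decode_header_value(value):
    """Decode every encoded-word in a header value into a Unicode string."""
    return str(make_header(decode_header(value)))


subject = "=?UTF-8?Q?Fwd:_Rechnung_f=C3=BCr_M=C3=BChlen?="
print(split_encoded_word(subject))   # the three fields of the encoded-word
print(decode_header_value(subject))  # the text a mail client would display
```

In the Q encoding an underscore stands for a space and `=C3=BC` is the UTF-8 byte pair for "ü", so the decoded subject reads "Fwd: Rechnung für Mühlen".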

Permitted header fields for encoding

RFC 2047 explicitly defines which header fields (or parts of them) are permitted to use this encoding. According to RFC 2047 Section 5, encoded words are only allowed in specific locations to maintain the integrity and parsability of the email headers. This means you cannot simply encode any part of a header you wish.

The standard allows encoding for the display name in From, To, Cc, and Bcc headers, meaning the human-readable 'friendly name' that precedes the address. The Subject and Comments headers also permit RFC 2047 encoding.

This table summarizes where RFC 2047 encoding is typically used and where it should be avoided:

Header Field Part	Permitted for RFC 2047 Encoding	Example
Display Name (From, To, Cc)	Yes	From: =?utf-8?b?SGVsbG8gV29ybGQ=?= <user@example.com>
Subject	Yes	Subject: =?UTF-8?Q?An_Important_E-mail?=
Comments	Yes	Comments: =?UTF-8?Q?Some_Encoded_Comment?=
Email Address	No
Message-ID	No
List-Unsubscribe	No

When not to encode with RFC 2047

One of the most frequent mistakes I see is attempting to encode the entire email address portion of a header using RFC 2047. While some older or more lenient email systems might process these emails, major providers like Gmail will reject them with errors such as 'Messages missing a valid address in From: header, or having no From: header, are not accepted.'

This happens because RFC 2047 is not intended for the local-part or domain of an email address. These parts must remain in plain ASCII, or, for truly internationalized email addresses, they must comply with newer standards like RFC 6530 and its related specifications which define Email Address Internationalization (EAI). Misusing RFC 2047 for the entire address is considered invalid and can trigger RFC compliance errors.

Incorrect RFC 2047 usage

Encoding the entire 'From' header, including the email address itself, is a common error. This often leads to messages being rejected by stricter email providers because it violates RFC 5322 specifications regarding address format.

Fully encoded From header (incorrect)

FROM: =?utf-8?b?0JTQvNC40YLRgNC+INCT0L7QvNC+0L3RjtC6IDxkbXl0cm9AaG9tb25pdWsuY29tPg==?=

Deliverability impact: Leads to rejections (e.g., Gmail bouncing emails with RFC 5322 errors) because the receiving server finds no parsable address in the header.
Security risk: Can be exploited for phishing and spoofing by obscuring malicious addresses.
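A pre-send check can catch this mistake before a provider does. The sketch below is a hypothetical helper (`check_from_value` is not a library function) and rests on simplifying assumptions: the header holds a single mailbox, and the regular expressions only approximate RFC 5322 and RFC 2047 syntax. It also decodes the incorrect example to show that the real address is hidden inside the encoded word, where a receiving parser cannot see it.

```python
import re
from email.header import decode_header, make_header

# Simplified patterns; real RFC 5322 address parsing is more involved.
ENCODED_WORD = re.compile(r"=\?[^?\s]+\?[bBqQ]\?[^?\s]*\?=")
ANGLE_ADDR = re.compile(r"<([^<>\s]+@[^<>\s]+)>\s*$")


def check_from_value(value):
    """Classify a raw From header value (single mailbox assumed)."""
    value = value.strip()
    match = ANGLE_ADDR.search(value)
    # Without an angle-bracketed address, treat the whole value as the address.
    address = match.group(1) if match else value
    if ENCODED_WORD.search(address):
        return "encoded-word in address"
    if "@" not in address:
        return "no address found"
    return "ok"


bad = ("=?utf-8?b?0JTQvNC40YLRgNC+INCT0L7QvNC+0L3RjtC6"
       "IDxkbXl0cm9AaG9tb25pdWsuY29tPg==?=")
good = "=?utf-8?b?0JTQvNC40YLRgNC+INCT0L7QvNC+0L3RjtC6?= <dmytro@homoniuk.com>"

print(check_from_value(bad))
print(check_from_value(good))
# The address only becomes visible after decoding the encoded word:
print(str(make_header(decode_header(bad))))
```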

Another common pitfall is attempting to use RFC 2047 encoding for the List-Unsubscribe header. Despite some systems seemingly allowing it, this is not permitted by the RFCs. I've observed that Google may silently remove such encoding when viewing the original message, leading to confusion when unsubscribe links fail. Adhering to the specific guidelines for each header, as outlined in the relevant RFCs, is essential for robust email deliverability.
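One way to catch encoded words in technical headers is to scan the header block before sending. The following is a rough sketch under stated assumptions: `find_misencoded_headers` is a hypothetical helper, the set of technical headers contains only the two named in this article, and the sample message with its example.com URL is invented for illustration.

```python
import re
from email import message_from_string

# Simplified encoded-word pattern (an approximation of the RFC 2047 grammar).
ENCODED_WORD = re.compile(r"=\?[^?\s]+\?[bBqQ]\?[^?\s]*\?=")

# Headers named in this article as off-limits for RFC 2047 encoding.
TECHNICAL_HEADERS = {"message-id", "list-unsubscribe"}


def find_misencoded_headers(raw_headers):
    """Return the names of technical headers that contain an encoded-word."""
    msg = message_from_string(raw_headers)
    return [
        name
        for name, value in msg.items()
        if name.lower() in TECHNICAL_HEADERS and ENCODED_WORD.search(str(value))
    ]


# Hypothetical message: the Subject is fine, the List-Unsubscribe is not.
sample = (
    "Subject: =?UTF-8?Q?An_Important_E-mail?=\n"
    "List-Unsubscribe: =?UTF-8?Q?<https://example.com/unsub>?=\n"
    "Message-ID: <1234@example.com>\n"
    "\n"
)
print(find_misencoded_headers(sample))
```

The Subject header is deliberately not flagged, since encoded words are permitted there.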

The proper way to encode

RFC 2047 is designed for encoding the human-readable display names and subject lines, not the email address itself. The address (local-part@domain) must remain unencoded ASCII or use specific EAI standards for non-ASCII characters.

Correctly encoded From header

FROM: =?utf-8?b?0JTQvNC40YLRgNC+INCT0L7QvNC+0L3RjtC6?= <dmytro@homoniuk.com>

Compliance: Ensures adherence to RFC 5322 specifications, improving acceptance by major providers.
Readability: Ensures friendly names display correctly while maintaining a parsable email address.
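In Python, the standard library's `email.utils.formataddr` produces this shape: it encodes a non-ASCII display name as an encoded word and leaves the address untouched, raising `UnicodeError` if the address itself is not ASCII. The wrapper `build_from_header` below is an illustrative name, not part of any library, and the sketch assumes a single sender.

```python
from email.utils import formataddr


def build_from_header(display_name, address):
    """Build a From header with only the display name RFC 2047 encoded."""
    # formataddr raises UnicodeError when the address is not pure ASCII,
    # which prevents the address from being encoded by accident.
    return "From: " + formataddr((display_name, address), charset="utf-8")


header = build_from_header("Дмитро Гомонюк", "dmytro@homoniuk.com")
print(header)

try:
    build_from_header("Дмитро Гомонюк", "дмитро@homoniuk.com")
except UnicodeError:
    print("non-ASCII address rejected; it needs EAI support, not RFC 2047")
```

Whether the library chooses the B or Q encoding for the name depends on the text, so the exact encoded word may differ from the example above while decoding to the same name.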

Conclusion

In summary, RFC 2047 is a critical standard for allowing non-ASCII characters in email headers, primarily for the 'human-readable' parts like display names in From, To, and Cc headers, as well as the Subject and Comments headers. It is not, however, designed for encoding the email address itself.

Attempting to encode the full email address using RFC 2047 will likely result in delivery failures, especially with stringent providers. Furthermore, misapplication to headers like List-Unsubscribe can lead to silent failures that impact compliance and user experience. For non-ASCII characters in the actual email address, Email Address Internationalization (EAI) standards (RFC 6530 series) are the correct approach.

Adhering strictly to these RFC guidelines ensures that your emails are not only correctly displayed but also reliably delivered, fostering better trust and deliverability. Always test your email header encoding to ensure compliance and avoid common pitfalls.

Views from the trenches

Best practices

Always encode only the display name, subject, or comments in headers, leaving the actual email address in plain ASCII.

Use UTF-8 as your character set whenever possible for broader compatibility with modern email clients and languages.

Test your email headers using a reliable email testing tool to ensure correct RFC 2047 implementation and avoid errors.

Common pitfalls

Encoding the entire 'From' or 'To' header, including the email address itself, using RFC 2047, which leads to bounces.

Applying RFC 2047 encoding to headers like 'List-Unsubscribe', which is not permitted and can cause issues.

Using incorrect character sets or encoding methods, leading to garbled text for recipients.

Expert tips

For full internationalization of email addresses, investigate RFC 6530 (EAI) rather than misapplying RFC 2047.

Be aware that some email providers might silently strip or modify non-compliant header encoding, making troubleshooting difficult.

Prioritize strict RFC compliance over perceived compatibility; modern email providers are becoming stricter.

Expert view

Expert from Email Geeks says RFC 2047 encoding is strictly for human-readable text, such as the friendly name in From or To headers, and the subject line, not the actual email address.

2024-03-08 - Email Geeks

Expert view

Expert from Email Geeks says encoding the email address itself, even if it occasionally works with some software, is never considered a valid practice according to email standards.

2024-02-20 - Email Geeks