Suped

Why are automated scripts and crawlers opening my emails, and how can I identify and exclude them from tracking?

Summary

Automated scripts and crawlers open emails for a combination of reasons: security scans by email providers and organizations, indexing by search engine bots (such as Googlebot and Bingbot), and malicious activity from spammers. The resulting inflated open rates can be misleading. Mitigation requires a multi-faceted approach. Key strategies involve implementing double opt-in processes and CAPTCHAs to prevent bot sign-ups, and regularly cleaning email lists to remove unengaged users. Identifying and excluding bot traffic requires monitoring user-agent strings (e.g., 'python-requests', 'AHC/2.1'), analyzing IP addresses (particularly those originating from cloud services like AWS, GCP, DigitalOcean, and Azure), and scrutinizing open patterns (e.g., very rapid opens after sending). Public resources like AWS's IP range JSON file and Spamhaus blacklists can aid in identifying malicious IPs. Apple's Mail Privacy Protection (MPP) also inflates open rates and needs separate consideration, and the IETF's SMTP standards provide a baseline for detecting traffic anomalies. Ultimately, a combination of preventative measures, identification techniques, and continuous monitoring is vital for maintaining accurate email analytics.

Key findings

  • Security Scanning: Email security programs scan emails for threats, leading to automated opens.
  • Search Engine Indexing: Search engine crawlers like Googlebot and Bingbot index email content.
  • Cloud Service Origins: A significant portion of bot traffic originates from cloud services like AWS, GCP, Digital Ocean, and Azure.
  • User-Agent Patterns: Specific user-agent strings (e.g., 'python-requests', 'AHC/2.1') are indicative of bot activity.
  • MPP Inflation: Apple's Mail Privacy Protection (MPP) inflates open rates by pre-loading images.
  • Double Opt-In: Requiring confirmation of each sign-up reduces the number of bot subscriptions.

Key considerations

  • Implement Double Opt-In: Require double opt-in for new subscribers to prevent bot sign-ups.
  • Monitor User Agents: Continuously monitor user-agent strings and filter out known bot user agents.
  • Analyze IP Addresses: Analyze and exclude traffic from IP addresses associated with cloud services and known bot networks.
  • Review Open Patterns: Examine open patterns for anomalies like rapid opens immediately after sending.
  • Utilize Public Resources: Leverage resources like AWS's IP range JSON file and Spamhaus blacklists.
  • Regular List Cleaning: Remove inactive or unengaged subscribers to reduce overall traffic from bots.
  • Segmentation Testing: Segmenting and testing mailings helps identify bot activity so it can be filtered out.
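The user-agent monitoring described above can be sketched in Python. The marker list below is illustrative, drawn from the strings this article names; a real deployment should maintain its own list from observed analytics:

```python
# Minimal sketch: classify tracked opens by user-agent substring.
# BOT_UA_MARKERS is an illustrative set, not an exhaustive one.
BOT_UA_MARKERS = ("python-requests", "AHC/", "Googlebot", "bingbot", "curl")

def is_bot_open(user_agent: str) -> bool:
    """Return True when an open's user-agent matches a known bot marker."""
    ua = user_agent.lower()
    return any(marker.lower() in ua for marker in BOT_UA_MARKERS)

# Example open log; only the second entry looks like a real mail client.
opens = [
    {"email": "a@example.com", "ua": "python-requests/2.31.0"},
    {"email": "b@example.com", "ua": "Mozilla/5.0 (iPhone; CPU iPhone OS 17_0)"},
]
human_opens = [o for o in opens if not is_bot_open(o["ua"])]
```

Substring matching keeps the filter simple, but it can over-match; reviewing a sample of excluded opens before discarding them is a sensible safeguard.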

What email marketers say

12 marketer opinions

Automated scripts and crawlers open emails primarily due to security scans and indexing by search engines, inflating open rates and distorting email marketing metrics. To mitigate this, marketers should implement double opt-in processes, CAPTCHAs, and regular list cleaning. Identifying and excluding bot traffic involves monitoring user agent strings (e.g., python-requests, AHC/2.1), IP addresses (especially those from AWS), and open patterns (e.g., very rapid opens). Tools and techniques include AWS's IP range JSON file, analyzing open times and frequencies, and considering the impact of Apple's Mail Privacy Protection (MPP).

Key opinions

  • Security Scans: Security software and appliances open emails to scan for threats, leading to inflated open rates.
  • Bot Identification: Bots can be identified by their user agent strings (e.g., python-requests), IP addresses (often from AWS or other cloud providers), and rapid open times.
  • Double Opt-In: Implementing double opt-in helps ensure that email addresses are valid and reduces the number of bot sign-ups.
  • List Cleaning: Regularly cleaning email lists removes unengaged subscribers and reduces the impact of bot traffic.
  • MPP Impact: Apple's Mail Privacy Protection (MPP) loads images automatically, inflating open rates and mimicking bot behavior.

Key considerations

  • User Agent Monitoring: Regularly monitor user agent strings in email analytics to identify and exclude known bot user agents.
  • IP Address Exclusion: Exclude IP addresses associated with cloud providers (e.g., AWS) and known bot networks from open tracking.
  • Pattern Analysis: Analyze open patterns, such as unusually fast opens after sending, to identify and filter out bot traffic.
  • AWS IP Ranges: Utilize AWS's JSON file of IP ranges to identify and exclude AWS-originated traffic.
  • Double Opt-In Implementation: Ensure a robust double opt-in process is in place to validate new subscribers and reduce bot sign-ups.
  • Tracking Pixel: Implement a unique tracking pixel per recipient and monitor unusual patterns like rapid opens.
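The open-pattern check above (flagging opens recorded almost immediately after the send) can be sketched like this. The 2-second threshold is an illustrative assumption, not a standard value, and should be tuned against your own campaign data:

```python
from datetime import datetime, timedelta

# Illustrative cutoff: opens within 2 seconds of the send are treated as
# likely scanner activity. Tune this against real campaign data.
RAPID_OPEN_THRESHOLD = timedelta(seconds=2)

def is_rapid_open(sent_at: datetime, opened_at: datetime) -> bool:
    """True when the gap between send and open is implausibly short."""
    return timedelta(0) <= (opened_at - sent_at) < RAPID_OPEN_THRESHOLD

sent = datetime(2024, 1, 1, 12, 0, 0)
scanner_open = sent + timedelta(seconds=1)   # likely a security scanner
human_open = sent + timedelta(minutes=5)     # plausible human engagement
```

A per-recipient tracking pixel makes this check possible, because each open event can then be matched back to that recipient's exact send timestamp.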

Marketer view

Email marketer from ZeroBounce.net explains that implementing a double opt-in to confirm each email address can reduce invalid signups. This is one of the first lines of defense in preventing bots from skewing open rates.

23 Mar 2025 - ZeroBounce.net

Marketer view

Email marketer from EmailonAcid.com shares that security programs are scanning emails as a means of providing security to their users. Recommends using a combination of methods to filter bots, including excluding known bot IPs, identifying common bot user agents (like python-requests), and analyzing open patterns (like very fast opens after sending).

3 Dec 2023 - EmailonAcid.com

What the experts say

4 expert opinions

Automated scripts and crawlers open emails primarily due to security software scanning for threats and automated systems interacting with email content. To address this, experts recommend treating traffic from cloud services like AWS, GCP, Digital Ocean, and Azure suspiciously, as these are unlikely to represent genuine user opens. Identifying these non-human interactions involves monitoring user agent strings (e.g., 'python-requests'), IP addresses (specifically those from cloud providers), and analyzing open patterns, such as rapid opens immediately after sending. Segmenting and testing mailings can further refine bot identification and mitigation efforts.

Key opinions

  • Cloud Service Traffic: Traffic originating from cloud services (AWS, GCP, Digital Ocean, Azure) should be treated with suspicion as it is less likely to be from real users.
  • Security Software Scanning: Security software scanning emails for threats can cause automated opens, inflating open rates.
  • User Agent Monitoring: Monitoring user agent strings like 'python-requests' helps identify automated scripts and crawlers.
  • Open Pattern Analysis: Analyzing open patterns, such as rapid opens, helps distinguish bot activity from genuine user engagement.

Key considerations

  • IP Exclusion: Consider excluding IP addresses associated with cloud providers from open tracking metrics.
  • Suspicious Traffic Handling: Treat traffic from cloud services as potentially non-human and adjust reporting accordingly.
  • User Agent Tracking: Implement systems to track and filter out traffic based on identified bot user agent strings.
  • Segmentation and Testing: Segment mailings and test results to refine bot identification and improve the accuracy of email marketing metrics.

Expert view

Expert from Word to the Wise shares that bot traffic from security scans is often misattributed, and suggests monitoring user-agent strings and looking for patterns in opens to spot these non-human interactions. They also recommend segmenting and testing your mailings.

25 Oct 2021 - Word to the Wise

Expert view

Expert from Spam Resource explains that one reason for automated opens is security software scanning emails for threats. To identify these opens, they suggest monitoring user-agent strings such as 'python-requests' or looking for rapid opens shortly after the email is sent.

1 Jun 2022 - Spam Resource

What the documentation says

5 technical articles

Automated scripts and crawlers open emails for various reasons, including indexing by search engines (Googlebot, Bingbot) and malicious activity. Identifying these bots involves using user-agent strings, IP addresses, and publicly available resources such as AWS's IP ranges and Spamhaus's blacklists. Understanding SMTP standards, as defined by the IETF, helps identify anomalies in traffic patterns. Excluding this bot traffic is essential for accurate email analytics.

Key findings

  • Search Engine Crawlers: Googlebot and Bingbot crawl web content, potentially triggering email opens.
  • User-Agent & IP Identification: Bots can be identified using user-agent strings and IP addresses provided by search engines (Google, Microsoft).
  • AWS IP Ranges: Amazon Web Services publishes a JSON file of their IPv4 and IPv6 ranges, helping identify AWS-originated traffic.
  • Spamhaus Blacklists: Spamhaus maintains blacklists of IPs and domains used by spammers and bots.
  • SMTP Standards: IETF's SMTP standards provide context for identifying legitimate email behavior and anomalies.

Key considerations

  • User-Agent Filtering: Filter email traffic based on known bot user-agent strings to prevent skewed analytics.
  • IP Address Analysis: Analyze and potentially exclude traffic originating from AWS IP ranges or IPs listed on Spamhaus blacklists.
  • Regular Updates: Regularly update IP address ranges and blacklist checks due to the dynamic nature of bot networks.
  • SMTP Compliance: Use SMTP standards to guide anomaly detection and identify suspicious email traffic patterns.
  • Documentation Review: Refer to official documentation from Google, Microsoft, AWS, Spamhaus, and IETF for accurate identification and mitigation strategies.
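As a sketch of the Spamhaus blacklist checks mentioned above: a DNSBL lookup is performed by reversing the IPv4 octets and appending the list's zone (zen.spamhaus.org here). Resolving the resulting hostname (e.g., with socket.gethostbyname) returns a 127.0.0.x code if the IP is listed, while NXDOMAIN means it is not; the resolution step is omitted so the example runs offline:

```python
import ipaddress

def dnsbl_query_name(ip: str, zone: str = "zen.spamhaus.org") -> str:
    """Build the DNSBL query hostname for an IPv4 address.

    The octets are reversed per DNSBL convention, e.g. 192.0.2.1
    becomes 1.2.0.192.zen.spamhaus.org.
    """
    octets = ipaddress.IPv4Address(ip).exploded.split(".")
    return ".".join(reversed(octets)) + "." + zone
```

Note that Spamhaus requires registration for production-volume queries, so check their usage terms before wiring this into an analytics pipeline.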

Technical article

Documentation from IETF provides detailed technical standards for SMTP, including user agent conventions. These documents are used to understand the expected behavior and format of legitimate email clients and identify anomalies associated with bot traffic.

25 Jan 2024 - ietf.org

Technical article

Documentation from Amazon Web Services shares that they publish a JSON file containing all their public IPv4 and IPv6 address ranges. This list can be used to identify and filter out bot traffic originating from AWS infrastructure. The ip-ranges.json file is updated frequently and should be checked regularly.

24 Jun 2021 - Amazon Web Services
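A sketch of consuming that file to exclude AWS-originated opens is below. The real file lives at https://ip-ranges.amazonaws.com/ip-ranges.json; a small inline sample in the same shape is used here so the example runs offline, and the two prefixes shown are illustrative:

```python
import ipaddress

# Inline sample mirroring the structure of AWS's ip-ranges.json.
# In production, fetch the real file and refresh it regularly.
sample_ranges = {
    "prefixes": [
        {"ip_prefix": "3.5.140.0/22", "service": "AMAZON"},
        {"ip_prefix": "52.94.76.0/22", "service": "AMAZON"},
    ]
}

networks = [ipaddress.ip_network(p["ip_prefix"]) for p in sample_ranges["prefixes"]]

def from_aws(ip: str) -> bool:
    """True when the IP falls inside one of the loaded AWS prefixes."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in networks)
```

For large lists, collapsing the prefixes or using a radix-tree lookup is faster than a linear scan, but the linear version keeps the sketch readable.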
