How to Conduct Email Forensics

Email forensics is the process of analysing the metadata, content, and attachments of an email to identify phishing attempts and malicious content, and to extract potential indicators of compromise (IOCs).

Email forensics follows a structured methodology for investigating suspicious emails: identify their true origin, analyse their content, and determine whether they pose a threat. This guide breaks the process into three key areas of analysis: the email header, the body, and the attachments.

The Email Header

Email headers are the lines of metadata at the top of every message, providing critical information about its origin, routing, and how it should be handled. You can often view full headers directly within email clients like Mozilla Thunderbird (by going to View > Message Source), Gmail (by More > Show original), or Outlook on the web (by More actions > View message source).

Headers can be viewed using any text editor like Sublime Text or command-line tools like cat or grep if you have the email saved as a file (e.g., .eml format). Tools like MX Toolbox can parse headers into an easy-to-read table format, often performing client-side processing to protect sensitive data.
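
If the message is saved as an .eml file, the headers can also be listed programmatically. The following is a minimal sketch using Python's standard email package. The sample message, its addresses, and the load_headers helper are hypothetical and stand in for a real saved email.

```python
from email import policy
from email.parser import BytesParser

# Hypothetical sample message; in practice, read these bytes from a saved .eml file.
RAW_EMAIL = b"""\
Return-Path: <bounce@mailer.example.net>
Received: from mail.example.net (mail.example.net [203.0.113.10])
 by mx.example.org with ESMTP; Tue, 02 Jan 2024 10:15:30 +0000
From: "Account Team" <security@example.com>
Reply-To: helpdesk@example.net
To: user@example.org
Subject: Unusual sign-in activity
Date: Tue, 02 Jan 2024 10:15:00 +0000
Message-ID: <abc123@mailer.example.net>

Please verify your account.
"""


def load_headers(raw_bytes):
    """Parse raw message bytes and return (name, value) pairs in file order."""
    msg = BytesParser(policy=policy.default).parsebytes(raw_bytes, headersonly=True)
    return [(name, str(value)) for name, value in msg.items()]


for name, value in load_headers(RAW_EMAIL):
    print(f"{name}: {value}")
```

For a real investigation, the bytes would come from a file opened in binary mode (for example a file named suspicious.eml, which is a placeholder name).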

The Anatomy of an Email Header

Screenshot of an email header displaying technical details like received timestamps, server information, SPF and DKIM results, with a subject line mentioning "Microsoft account unusual sign-in activity" and a reply-to address.

Date

Records when the message was composed and handed off for sending, as stamped by the sender's client or server, so it can be inaccurate or forged. It is useful for searching for similar emails or correlating other suspicious activity, for example, in a Security Information and Event Management (SIEM) system.
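
Because senders stamp the Date header in their own time zone, it helps to normalise the value to UTC before searching logs. A minimal sketch using only the standard library follows; the helper name date_header_to_utc is an assumption made for illustration.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime


def date_header_to_utc(date_value):
    """Convert an RFC 5322 Date header value to a timezone-aware UTC datetime."""
    parsed = parsedate_to_datetime(date_value)
    if parsed.tzinfo is None:
        # A "-0000" offset yields a naive datetime; treat it as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


print(date_header_to_utc("Tue, 02 Jan 2024 18:15:00 +0800").isoformat())
```
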
From

This is one of the most frequently spoofed headers by attackers because it can be set to any arbitrary value. Documenting it is important to identify discrepancies and use as an artifact for searching.
Subject

Useful for fingerprinting an email; it can be used to search against an email gateway for associated emails or, in rare cases, to block specific subject lines.
Message-ID

A unique identifier generated by the sending mail client or the first Mail Transfer Agent (MTA) that handles the message. Each message, and each revision of a message, should carry its own Message-ID. A Message-ID that repeats across different messages within the same email system can indicate forgery, as can a Message-ID whose format does not match the one normally produced by the mail system the message claims to come from.
To (Recipient)

Specifies the primary recipient(s) of the email. It helps determine the scope of a potential incident, especially if multiple users received the same suspicious email. Attackers can use Blind Carbon Copy (BCC) to hide multiple recipients, in which case other fields such as Date, Subject, and From become useful for broader searches.
Reply-To

Specifies the email address to which replies should be directed, distinct from the "From" address. Attackers often use a different "Reply-To" address (one they control) when spoofing a legitimate domain, as they do not have access to the legitimate domain's mailbox.
Return-Path

Also known as the envelope sender address or bounce address, it specifies where delivery failure notifications should be sent. It can be checked for inconsistencies with the "From" address, as a mismatch may indicate a forged message. Attackers running mass phishing campaigns may direct bounce messages to a different address.
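
The comparison between "From", "Reply-To", and "Return-Path" can be sketched as follows. This is a minimal illustration using the standard library; the function names and sample addresses are hypothetical. Note that legitimate bulk-mail services often use a different Return-Path domain, so a mismatch is a lead to investigate rather than proof of forgery.

```python
from email import message_from_string
from email.utils import parseaddr


def address_domain(header_value):
    """Return the lower-cased domain of the address in a header value, or ''."""
    _, address = parseaddr(header_value or "")
    return address.rpartition("@")[2].lower() if "@" in address else ""


def sender_mismatches(msg):
    """List (header, domain) pairs whose domain differs from the From domain."""
    from_domain = address_domain(msg.get("From"))
    findings = []
    for name in ("Reply-To", "Return-Path"):
        domain = address_domain(msg.get(name))
        if domain and domain != from_domain:
            findings.append((name, domain))
    return findings


# Hypothetical sample headers.
sample = message_from_string(
    'From: "Account Team" <security@example.com>\n'
    "Reply-To: helpdesk@example.net\n"
    "Return-Path: <bounce@example.com>\n"
    "\n"
    "body\n"
)
print(sender_mismatches(sample))
```
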
X-Header Fields

Headers prefixed with "X-" are custom, experimental, or extended headers. Mail providers commonly add them for purposes like spam filter information, authentication results, and tracking, and some of them store the IP address of the machine that sent the message. If an IP address is present, it can be geolocated, checked for reputation, or used for a reverse DNS lookup to identify the mail server's domain.
Received Headers

Each email server (MTA) that handles a message adds its own "Received" header, creating a chronological record of the message's path (a "received chain") from source to destination.

Received headers are arguably the most important and reliable headers because each one is added by an MTA rather than by the sender. A sender can still insert fake "Received" lines before handing the message over, so the entries added by servers you trust are the most dependable. They appear in reverse chronological order, meaning the topmost "Received" header is the most recent (closest to the recipient), and the bottom-most is the one closest to the source. They often include the sending and receiving mail servers, hostnames, and IP addresses. Once you identify IP addresses or domains from the headers, investigate them further, for example with reputation checks, geolocation, and reverse DNS lookups.
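
Reading the received chain from source to destination can be sketched like this. The sample hops are hypothetical, the received_chain helper is an assumption, and the regular expression only captures bracketed IPv4 addresses, so IPv6 hops would need a broader pattern.

```python
import re
from email import message_from_string

# Matches bracketed IPv4 addresses such as [203.0.113.10]; IPv6 is not covered.
IPV4_RE = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")

# Hypothetical message with two hops; the topmost header is the most recent.
SAMPLE = (
    "Received: from mx.example.org (mx.example.org [198.51.100.7])\n"
    " by inbox.example.org with ESMTPS; Tue, 02 Jan 2024 10:15:31 +0000\n"
    "Received: from sender-pc (unknown [203.0.113.10])\n"
    " by mx.example.org with ESMTP; Tue, 02 Jan 2024 10:15:30 +0000\n"
    "Subject: test\n"
    "\n"
    "body\n"
)


def received_chain(msg):
    """Return the Received headers oldest-first, with any IPv4 addresses found."""
    hops = msg.get_all("Received") or []
    chain = []
    for value in reversed(hops):  # reverse so the hop nearest the source is first
        flat = " ".join(value.split())  # unfold multi-line header values
        chain.append({"raw": flat, "ips": IPV4_RE.findall(flat)})
    return chain


for number, hop in enumerate(received_chain(message_from_string(SAMPLE)), start=1):
    print(number, hop["ips"], hop["raw"])
```
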
Authentication Headers (SPF, DKIM, DMARC)

Authentication headers record the results of a set of technologies that help verify the legitimacy and origin of email messages. The three major email authentication technologies are Sender Policy Framework (SPF), DomainKeys Identified Mail (DKIM), and Domain-based Message Authentication, Reporting, and Conformance (DMARC).

SPF lets a domain owner publish, in DNS, the servers that are authorised to send mail for the domain; the receiving server checks the connecting IP address against that list for the envelope sender (Return-Path) domain. A passing SPF check confirms authorisation but not legitimacy, since an attacker can pass it using their own domain or a lookalike. DKIM uses a digital signature to verify that an email was signed by the claimed domain and that the signed content has not been tampered with in transit. DMARC builds on SPF and DKIM: it allows domain owners to publish a policy for how emails failing these checks should be handled (none, quarantine, or reject) and provides reporting mechanisms.
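
Receiving servers usually summarise these checks in an Authentication-Results header. The sketch below pulls out the SPF, DKIM, and DMARC verdicts with a simple regular expression; the sample header and the auth_results helper are hypothetical, and real headers vary between providers. Only Authentication-Results headers added by your own receiving server should be trusted, since a sender can include forged ones.

```python
import re
from email import message_from_string

# Captures verdicts such as spf=pass, dkim=none, dmarc=fail.
RESULT_RE = re.compile(r"\b(spf|dkim|dmarc)=(\w+)", re.IGNORECASE)

# Hypothetical header as a receiving server might record it.
SAMPLE = (
    "Authentication-Results: mx.example.org;\n"
    " spf=pass smtp.mailfrom=example.net;\n"
    " dkim=none (message not signed);\n"
    " dmarc=fail action=quarantine header.from=example.com\n"
    "Subject: test\n"
    "\n"
    "body\n"
)


def auth_results(msg):
    """Return the first verdict seen for each of spf, dkim, and dmarc."""
    results = {}
    for value in msg.get_all("Authentication-Results") or []:
        for method, verdict in RESULT_RE.findall(value):
            # setdefault keeps the topmost (most recently added) verdict.
            results.setdefault(method.lower(), verdict.lower())
    return results


print(auth_results(message_from_string(SAMPLE)))
```
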
MIME-Version

Indicates the version of the Multipurpose Internet Mail Extensions (MIME) standard being used.

Content-Type

Helps identify if an email contains HTML-rendered content or plain text, and it can indicate the presence of multiple content parts.
Content-Transfer-Encoding

Specifies how the content is encoded for transmission (e.g., 7bit, 8bit, base64, quoted-printable). It is useful for detecting obfuscated content that might evade weak email filters.
Boundary

A string, set as a parameter of the Content-Type header, that separates the various parts of a MIME-encoded email body.
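
The MIME headers above can be inspected together by walking the parts of a message. This is a minimal sketch with a hypothetical two-part sample; the mime_parts helper is an assumption, and decoding relies on the standard library's handling of the declared Content-Transfer-Encoding.

```python
from email import message_from_string

# Hypothetical multipart message: a plain-text part and a Base64-encoded HTML part.
SAMPLE = (
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/alternative; boundary="BOUNDARY42"\n'
    "\n"
    "--BOUNDARY42\n"
    "Content-Type: text/plain; charset=utf-8\n"
    "Content-Transfer-Encoding: 7bit\n"
    "\n"
    "Please verify your account.\n"
    "--BOUNDARY42\n"
    "Content-Type: text/html; charset=utf-8\n"
    "Content-Transfer-Encoding: base64\n"
    "\n"
    "PGI+VmVyaWZ5PC9iPg==\n"
    "--BOUNDARY42--\n"
)


def mime_parts(msg):
    """Describe every MIME part: type, transfer encoding, boundary, decoded bytes."""
    rows = []
    for part in msg.walk():
        # Container parts have no payload of their own to decode.
        decoded = None if part.is_multipart() else part.get_payload(decode=True)
        rows.append({
            "type": part.get_content_type(),
            "encoding": part.get("Content-Transfer-Encoding"),
            "boundary": part.get_boundary(),
            "decoded": decoded,
        })
    return rows


for row in mime_parts(message_from_string(SAMPLE)):
    print(row)
```

Decoding each part before inspection matters because a Base64 or quoted-printable body hides links and keywords from a plain text search of the raw file.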

The Email Body

The email body serves as the main content space within an email message and can include various elements such as plain text, HTML, or even multimedia-rich content. While email clients render the body to appear as intended (e.g., with custom styling, buttons, logos), the raw HTML markup or source can be exported by downloading the email and examining it in a text editor like Sublime Text to see the underlying code. A misspelt company name or grammatical errors can be immediate red flags.

Identify the social engineering techniques the message relies on: urgency (e.g., warnings that demand immediate action), trust (e.g., authentic-looking logos, official language), authority (e.g., impersonating executives), intimidation (e.g., threats of account suspension), scarcity (e.g., limited-time offers), and familiarity.

URLs embedded in emails need thorough analysis to determine their legitimacy and destination. URLs can be extracted manually by hovering over links or copying them, but automated tools like Sublime Text's search function or CyberChef's "Extract URLs" feature are safer and more robust. When documenting IOCs, defang them (e.g., hxxps[://]example[.]com) to avoid accidental clicks.
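
Extraction and defanging can also be scripted. The sketch below uses a deliberately simple regular expression and a common defanging convention (hxxp, [://], [.]); both function names and the sample body are hypothetical, and the pattern will miss URLs that are still Base64 or quoted-printable encoded, so decode the body first.

```python
import re

# Simple pattern: http(s) URLs up to whitespace, quotes, or angle brackets.
URL_RE = re.compile(r"""https?://[^\s"'<>]+""", re.IGNORECASE)


def extract_urls(text):
    """Return unique URLs in order of first appearance."""
    seen = []
    for url in URL_RE.findall(text):
        url = url.rstrip(".,;)")  # drop trailing sentence punctuation
        if url not in seen:
            seen.append(url)
    return seen


def defang(url):
    """Rewrite a URL so it cannot be clicked or auto-linked by accident."""
    scheme, _, rest = url.partition("://")
    return scheme.lower().replace("http", "hxxp") + "[://]" + rest.replace(".", "[.]")


# Hypothetical email body fragment.
body = (
    '<a href="https://login.example.com/verify?id=1">Click here</a> '
    "or visit http://example.net/reset."
)
for url in extract_urls(body):
    print(defang(url))
```
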

Look for discrepancies like typosquatting (e.g., google.com vs. g00gle.com) or suspicious subdomains abusing legitimate services. Open-source exchanges like PhishTank collect and categorise suspected malicious URLs.
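
A rough lookalike check can flag candidate typosquats for manual review. The sketch below is a heuristic, not a definitive detector: the watch-list, the digit-swap table, and the 0.85 similarity threshold are all assumptions chosen for illustration, and the lookalike_of helper is hypothetical.

```python
from difflib import SequenceMatcher

# Digits commonly swapped in for letters; an illustrative, incomplete mapping.
DIGIT_SWAPS = str.maketrans({"0": "o", "1": "l", "3": "e", "5": "s"})

# Hypothetical watch-list of domains that matter to your organisation.
WATCHLIST = ("google.com", "microsoft.com")


def lookalike_of(domain, watchlist=WATCHLIST, threshold=0.85):
    """Return the watch-listed domain that `domain` imitates, or None."""
    domain = domain.lower().rstrip(".")
    if domain in watchlist:
        return None  # an exact match is the genuine domain
    normalised = domain.translate(DIGIT_SWAPS)
    for brand in watchlist:
        similarity = SequenceMatcher(None, domain, brand).ratio()
        if normalised == brand or similarity >= threshold:
            return brand
    return None


for candidate in ("g00gle.com", "goole.com", "google.com", "example.org"):
    print(candidate, "->", lookalike_of(candidate))
```
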

The Email Attachment

Email attachments can range from benign to malicious, often carrying malware or malicious macros. Attackers frequently use attachments in phishing campaigns to deliver malicious payloads, which can lead to severe consequences such as ransomware attacks, deployment of remote access trojans, or theft of sensitive information.

PDFs are often used by attackers because they are widely trusted and common for business purposes (e.g., invoices, forms), and email gateways may not fully analyse embedded content. Similarly, malicious Office documents (maldocs) that contain macros are frequently exploited due to the widespread use of Office software in organisations.
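
Attachments can be inventoried from a saved message without opening them. This is a minimal sketch using the standard library; build_sample creates a hypothetical message with a fake PDF purely so the example is self-contained, and list_attachments is an assumed helper name.

```python
from email import policy
from email.message import EmailMessage
from email.parser import BytesParser


def build_sample():
    """Build a hypothetical message with one attachment, returned as raw bytes."""
    msg = EmailMessage()
    msg["Subject"] = "Invoice"
    msg.set_content("See attached.")
    msg.add_attachment(
        b"%PDF-1.7 test", maintype="application", subtype="pdf", filename="invoice.pdf"
    )
    return msg.as_bytes()


def list_attachments(raw_bytes):
    """Return filename, content type, and decoded size for each attachment."""
    msg = BytesParser(policy=policy.default).parsebytes(raw_bytes)
    found = []
    for part in msg.iter_attachments():
        data = part.get_payload(decode=True) or b""
        found.append({
            "filename": part.get_filename(),
            "type": part.get_content_type(),
            "size": len(data),
        })
    return found


print(list_attachments(build_sample()))
```

The declared content type and file name come from the sender, so treat them as claims to verify rather than facts.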

File Hashes

File hashes are unique "fingerprints" generated from a file's content. Even a minor change in a file results in a wildly different hash value. This makes them crucial for file integrity verification and reputation checks. Most email clients allow users to save attachments to their local system, from which hashes can be computed. Reputation services like VirusTotal can provide details on security vendors flagging the file, detection aliases, historical data, and even contacted URLs/domains, IP addresses, and behavioural information (e.g., dropped files).
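
Hashes of a saved attachment can be computed with Python's hashlib. The sketch below is a minimal example; the helper names are assumptions, and hash_file reads the whole file into memory, which is reasonable for typical attachment sizes but not for very large files.

```python
import hashlib

ALGORITHMS = ("md5", "sha1", "sha256")


def hash_bytes(data):
    """Return hex digests of the given bytes for each algorithm."""
    return {name: hashlib.new(name, data).hexdigest() for name in ALGORITHMS}


def hash_file(path):
    """Return hex digests for a file saved on disk."""
    with open(path, "rb") as handle:
        return hash_bytes(handle.read())


# A one-byte change produces completely different digests.
print(hash_bytes(b"hello")["sha256"])
print(hash_bytes(b"hellp")["sha256"])
```

The SHA-256 value is generally the most useful one to submit to a reputation service.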

Open attachments only in a controlled sandbox environment (e.g., Joe Sandbox, ANY.RUN) so that you can observe their behaviour (process activity, registry changes, network connections, file activity) without risking your own system.

Additional Resources

The phishing samples for this post were retrieved from Phishing Pot.
Learn and Test DMARC (https://www.learndmarc.com/) is an excellent visual resource for learning how email servers communicate and how SPF, DKIM, and DMARC work together.
A Python script that automates extracting IOCs can be found here.
