Understanding PE, ELF, and Mach-O: Executable File Format Deep Dive

A comprehensive guide to the three major executable file formats - PE (Windows), ELF (Linux/Unix), and Mach-O (macOS). Learn their structure, security implications, and analysis techniques for malware research and reverse engineering.

By InventiveHQ Security Team

For cybersecurity professionals, malware analysts, and reverse engineers, understanding executable file formats is fundamental to analyzing binaries, detecting threats, and securing systems. Whether you're investigating suspicious files, developing security tools, or simply deepening your technical knowledge, mastering PE, ELF, and Mach-O formats is essential.

These three formats represent the backbone of executable code across the world's most popular operating systems: PE for Windows, ELF for Linux and Unix systems, and Mach-O for macOS. Each format has unique characteristics shaped by its platform's architecture and security model, yet they share common concepts that make cross-platform analysis possible.

In this deep dive, we'll explore the internal structure of each format, examine their security implications, and provide practical techniques for parsing and analyzing them. Whether you're responding to a security incident, conducting malware research, or building analysis tools, this guide will equip you with the knowledge to navigate these critical file formats.

PE (Portable Executable) Format - Windows

The Portable Executable (PE) format is Microsoft's standard executable format for Windows operating systems, encompassing not just .exe files but also dynamic link libraries (.dll), kernel-mode drivers (.sys), control panel applications (.cpl), and more. Based on the Common Object File Format (COFF), PE supports both 32-bit and 64-bit Windows systems, as well as UEFI environments.

PE File Structure

Every PE file begins with a DOS header, a 64-byte structure that maintains backward compatibility with MS-DOS. This header starts with the famous "MZ" magic (bytes 0x4D 0x5A, which read as the little-endian word 0x5A4D), and its e_lfanew field at offset 0x3C holds the file offset of the actual PE header. Following the DOS header is the DOS stub - a small MS-DOS 2.0 compatible program that simply displays "This program cannot be run in DOS mode" if the file is launched under DOS.

The real PE structure begins with the PE signature ("PE\0\0"), followed by the COFF file header. This header contains critical metadata about the target architecture and file characteristics:

COFF File Header:
- Machine: 0x014c (32-bit Intel) or 0x8664 (x64)
- NumberOfSections: Count of sections in the file
- TimeDateStamp: Compilation timestamp
- Characteristics: File flags (executable, DLL, etc.)
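The header walk described above can be sketched in a few lines of Python using only the standard library. This is a minimal sketch, not a full parser: the function and helper names are hypothetical, and the sample input is a hand-built header prefix rather than a real executable.

```python
import struct

def parse_coff_header(data: bytes) -> dict:
    """Locate and decode the 20-byte COFF file header of a PE image."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # e_lfanew: little-endian offset of the PE signature, stored at 0x3C
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # Machine, NumberOfSections, TimeDateStamp, PointerToSymbolTable,
    # NumberOfSymbols, SizeOfOptionalHeader, Characteristics
    fields = struct.unpack_from("<HHIIIHH", data, e_lfanew + 4)
    machine, sections, timestamp, _, _, opt_size, characteristics = fields
    return {
        "machine": machine,
        "sections": sections,
        "timestamp": timestamp,
        "optional_header_size": opt_size,
        "is_dll": bool(characteristics & 0x2000),  # IMAGE_FILE_DLL
    }

def build_sample_pe_prefix() -> bytes:
    """Hypothetical test input: headers only, not a loadable executable."""
    dos = bytearray(64)
    dos[0:2] = b"MZ"
    struct.pack_into("<I", dos, 0x3C, 64)  # PE signature directly after DOS header
    coff = struct.pack("<HHIIIHH", 0x8664, 3, 0, 0, 0, 240, 0x0022)
    return bytes(dos) + b"PE\0\0" + coff

info = parse_coff_header(build_sample_pe_prefix())
print(hex(info["machine"]), info["sections"], info["is_dll"])
```

On a real file, pass the bytes read from disk instead of the synthetic prefix; the same offsets apply.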

The optional header follows; despite its name, it is mandatory for executable images. The magic number in this header determines whether the file is PE32 (0x010B for 32-bit) or PE32+ (0x020B for 64-bit). This header specifies the entry point address, image base address, section alignment, and addresses of important data directories.
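As a rough illustration of how the magic number changes the layout, the sketch below reads the entry point, image base, and section alignment from a raw optional header. The field offsets follow the published PE/COFF layout (ImageBase is 4 bytes at offset 28 in PE32 and 8 bytes at offset 24 in PE32+); the function name and the synthetic buffer are assumptions made for the example.

```python
import struct

PE32_MAGIC = 0x010B
PE32_PLUS_MAGIC = 0x020B

def parse_optional_header(opt: bytes) -> dict:
    """Decode a few fields from the start of a PE optional header."""
    (magic,) = struct.unpack_from("<H", opt, 0)
    (entry_rva,) = struct.unpack_from("<I", opt, 16)  # AddressOfEntryPoint
    if magic == PE32_MAGIC:
        (image_base,) = struct.unpack_from("<I", opt, 28)
        kind = "PE32"
    elif magic == PE32_PLUS_MAGIC:
        # PE32+ drops BaseOfData and widens ImageBase to 8 bytes
        (image_base,) = struct.unpack_from("<Q", opt, 24)
        kind = "PE32+"
    else:
        raise ValueError(f"unexpected optional header magic {magic:#06x}")
    (section_alignment,) = struct.unpack_from("<I", opt, 32)
    return {
        "kind": kind,
        "entry_rva": entry_rva,
        "image_base": image_base,
        "section_alignment": section_alignment,
    }

# Hypothetical PE32+ header fragment built by hand for demonstration
sample = bytearray(40)
struct.pack_into("<H", sample, 0, PE32_PLUS_MAGIC)
struct.pack_into("<I", sample, 16, 0x1500)
struct.pack_into("<Q", sample, 24, 0x140000000)
struct.pack_into("<I", sample, 32, 0x1000)

print(parse_optional_header(bytes(sample)))
```

The entry point is a relative virtual address, so the address where execution starts in memory is the image base plus that value (before any ASLR relocation).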

Sections and Import/Export Tables

PE files are divided into sections, each serving a specific purpose:

  • .text: Contains executable code, mapped as execute/read-only
  • .data: Holds initialized global variables, mapped as read/write with no-execute
  • .rdata: Contains read-only data including import and export tables
  • .rsrc: Stores resources like icons, dialogs, and version information
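The memory permissions mentioned in the list above are encoded in each section header's Characteristics field. The following sketch decodes one 40-byte section header and flags sections that are both writable and executable, which is unusual in ordinary compiler output. The helper name and the sample bytes are hypothetical; the flag values are the standard IMAGE_SCN_MEM_* constants.

```python
import struct

IMAGE_SCN_MEM_EXECUTE = 0x20000000
IMAGE_SCN_MEM_READ = 0x40000000
IMAGE_SCN_MEM_WRITE = 0x80000000

def parse_section_header(raw: bytes) -> dict:
    """Decode one 40-byte PE section header."""
    (name, vsize, vaddr, raw_size, raw_ptr,
     _relocs, _lines, _nrelocs, _nlines, flags) = struct.unpack(
        "<8sIIIIIIHHI", raw[:40])
    perms = "".join(
        letter if flags & bit else "-"
        for letter, bit in (("r", IMAGE_SCN_MEM_READ),
                            ("w", IMAGE_SCN_MEM_WRITE),
                            ("x", IMAGE_SCN_MEM_EXECUTE))
    )
    return {
        "name": name.rstrip(b"\0").decode("ascii", errors="replace"),
        "virtual_address": vaddr,
        "virtual_size": vsize,
        "raw_size": raw_size,
        "raw_offset": raw_ptr,
        "perms": perms,
        "writable_and_executable": perms[1:] == "wx",
    }

# Hypothetical header for a typical .text section (code | execute | read)
text_header = struct.pack("<8sIIIIIIHHI", b".text", 0x1000, 0x1000,
                          0x1000, 0x400, 0, 0, 0, 0, 0x60000020)
print(parse_section_header(text_header))
```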

The import table is particularly significant for security analysis. It lists all DLL names and function names that the executable requires, allowing analysts to understand external dependencies and potential malicious API calls. The import address table (IAT) is populated by the Windows loader with actual function addresses at runtime, making it a common target for hooking and code injection attacks.

Export tables list functions that a PE file (typically a DLL) makes available to other programs. Malware often uses custom exports for inter-module communication or to masquerade as legitimate system libraries.

ELF (Executable and Linkable Format) - Linux/Unix

The Executable and Linkable Format (ELF) is the standard executable format for Linux, Unix, and many embedded systems. Often considered more flexible and cleaner in design than PE, ELF serves multiple purposes: executable files, shared libraries (.so), object files (.o), and core dumps.

ELF File Structure

An ELF file consists of three main components: the ELF header, program header table (describing segments), and section header table (describing sections). The ELF header is 52 bytes for 32-bit binaries and 64 bytes for 64-bit binaries:

ELF Header:
- Magic: 0x7F 'E' 'L' 'F' (identification bytes)
- Class: ELFCLASS32 (1) or ELFCLASS64 (2)
- Data: Little-endian (1) or Big-endian (2)
- Type: ET_REL (object file), ET_EXEC (executable), ET_DYN (shared object or position-independent executable), ET_CORE (core dump)
- Machine: Architecture (x86, x86-64, ARM, etc.)
- Entry: Virtual address of entry point
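The fields listed above can be read directly with Python's struct module. This is a minimal sketch with a hypothetical function name; it handles both classes and both byte orders, and the sample header is synthetic rather than taken from a real binary.

```python
import struct

ELF_TYPES = {1: "ET_REL", 2: "ET_EXEC", 3: "ET_DYN", 4: "ET_CORE"}

def parse_elf_header(data: bytes) -> dict:
    """Decode the ELF identification bytes and the main header fields."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    ei_class, ei_data = data[4], data[5]
    if ei_class not in (1, 2) or ei_data not in (1, 2):
        raise ValueError("invalid EI_CLASS or EI_DATA")
    endian = "<" if ei_data == 1 else ">"
    addr = "I" if ei_class == 1 else "Q"  # address/offset width by class
    fmt = f"{endian}HHI{addr}{addr}{addr}IHHHHHH"
    (e_type, e_machine, _version, e_entry, e_phoff, e_shoff, _flags,
     e_ehsize, _phentsize, e_phnum, _shentsize, e_shnum,
     _shstrndx) = struct.unpack_from(fmt, data, 16)
    return {
        "bits": 32 if ei_class == 1 else 64,
        "endian": "little" if ei_data == 1 else "big",
        "type": ELF_TYPES.get(e_type, hex(e_type)),
        "machine": e_machine,
        "entry": e_entry,
        "phoff": e_phoff,
        "phnum": e_phnum,
        "shnum": e_shnum,
        "header_size": e_ehsize,
    }

# Hypothetical 64-bit little-endian header (e_machine 62 is x86-64)
ident = b"\x7fELF" + bytes([2, 1, 1, 0]) + bytes(8)
body = struct.pack("<HHIQQQIHHHHHH", 3, 62, 1, 0x1040, 64, 0, 0,
                   64, 56, 13, 64, 0, 0)
sample_elf = ident + body
print(parse_elf_header(sample_elf))
```

The format string for the 64-bit case covers 48 bytes after the 16 identification bytes, matching the 64-byte header size quoted earlier; the 32-bit variant totals 52 bytes.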

Segments vs. Sections: A Critical Distinction

ELF's dual organization into segments and sections is a powerful feature that initially confuses many analysts. Segments contain runtime information, while sections contain information for linking and relocation.

Segments (described by program headers) are what the kernel uses during execution. The loader maps segments into virtual memory using mmap(). Only PT_LOAD segments are actually mapped into memory - other program headers either describe regions that already lie inside a PT_LOAD segment's range (such as PT_DYNAMIC and PT_INTERP) or carry metadata with no file contents of their own (such as PT_GNU_STACK). Common segment types include:

  • PT_LOAD: Loadable segment mapped into memory
  • PT_DYNAMIC: Dynamic linking information
  • PT_INTERP: Path to dynamic linker/loader
  • PT_GNU_STACK: Stack permissions (executable or not)
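To make the segment types above concrete, the sketch below walks a program header table and reports whether PT_GNU_STACK requests an executable stack. It assumes a 64-bit little-endian file (the Elf64_Phdr layout); the function names and the two-entry sample table are hypothetical.

```python
import struct

PT_LOAD, PT_DYNAMIC, PT_INTERP = 1, 2, 3
PT_GNU_STACK = 0x6474E551
PF_X, PF_W, PF_R = 1, 2, 4
PHDR64 = struct.Struct("<IIQQQQQQ")  # Elf64_Phdr, little-endian, 56 bytes

def iter_program_headers(table: bytes, count: int):
    """Yield decoded program headers from a raw Elf64 program header table."""
    for i in range(count):
        (p_type, p_flags, p_offset, p_vaddr, _paddr,
         p_filesz, p_memsz, _align) = PHDR64.unpack_from(table, i * PHDR64.size)
        yield {"type": p_type, "flags": p_flags, "offset": p_offset,
               "vaddr": p_vaddr, "filesz": p_filesz, "memsz": p_memsz}

def executable_stack(table: bytes, count: int):
    """Return True/False from PT_GNU_STACK, or None if the header is absent."""
    for ph in iter_program_headers(table, count):
        if ph["type"] == PT_GNU_STACK:
            return bool(ph["flags"] & PF_X)
    return None  # no marker: the effective default depends on the platform

# Hypothetical table: one r-x PT_LOAD segment and a rw- stack marker
sample_table = (
    PHDR64.pack(PT_LOAD, PF_R | PF_X, 0, 0x400000, 0x400000, 0x800, 0x800, 0x1000)
    + PHDR64.pack(PT_GNU_STACK, PF_R | PF_W, 0, 0, 0, 0, 0, 0x10)
)
print(executable_stack(sample_table, 2))
```

In a real file the table starts at e_phoff and contains e_phnum entries, both taken from the ELF header.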

Sections (described by section headers) are used by compilers and linkers during the build process. Common sections include:

  • .text: Executable code
  • .data: Initialized writable data
  • .bss: Uninitialized data (zero-filled at runtime)
  • .got: Global Offset Table for position-independent code
  • .plt: Procedure Linkage Table for dynamic function resolution

The .got and .plt sections are critical for understanding dynamic linking in ELF files. When a program calls an external function, it jumps through the PLT, which initially redirects to the dynamic linker. After the first call, the GOT entry is updated with the actual function address, optimizing subsequent calls.
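The lazy resolution flow described above can be modeled with a toy class. This is purely a conceptual simulation in Python, not how the dynamic linker is implemented: the class name is hypothetical, a dictionary stands in for the GOT, and ordinary Python callables stand in for library functions.

```python
class LazyGot:
    """Toy model of PLT/GOT lazy binding: resolve on first call, then cache."""

    def __init__(self, library_symbols: dict):
        self._symbols = library_symbols  # stands in for a shared library's exports
        self._got = {}                   # one slot per imported name; empty = unresolved
        self.resolutions = 0             # how often the "dynamic linker" was invoked

    def call_via_plt(self, name: str, *args):
        target = self._got.get(name)
        if target is None:
            # First call: look the symbol up and patch the GOT slot
            target = self._symbols[name]
            self._got[name] = target
            self.resolutions += 1
        # Later calls jump straight through the patched slot
        return target(*args)

got = LazyGot({"strlen": len})
print(got.call_via_plt("strlen", "hello"), got.call_via_plt("strlen", "hi"))
print("resolver invoked", got.resolutions, "time(s)")
```

The writable slot that makes this caching possible is also what GOT overwrite attacks target, which is why hardened builds often resolve symbols at load time and mark the table read-only.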

Entry Points and Execution

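The ELF header's e_entry field specifies the virtual address where execution begins. In binaries built with a standard C runtime, this usually points to the _start symbol, which performs initialization before calling the main() function. For dynamically linked executables, the kernel first runs the interpreter named in PT_INTERP, which loads the required libraries and then transfers control to e_entry. Analysts can use this entry point to begin static or dynamic analysis of binary behavior.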

Mach-O Format - macOS

The Mach-O (Mach Object) format is Apple's executable format for macOS, iOS, and other Apple platforms. Derived from the Mach microkernel, Mach-O files power everything from command-line utilities to full applications, frameworks, and kernel extensions.

Mach-O File Structure

Every Mach-O file contains three major regions: the header, load commands, and segments with their sections. The header identifies the file and specifies its architecture:

Mach-O Header (32-bit):
- magic: 0xFEEDFACE (MH_MAGIC, file byte order matches the reader) or 0xCEFAEDFE (MH_CIGAM, byte-swapped)
- cputype: CPU_TYPE_I386, CPU_TYPE_X86_64, CPU_TYPE_ARM64
- cpusubtype: Specific processor variant
- filetype: MH_EXECUTE, MH_DYLIB, MH_BUNDLE
- ncmds: Number of load commands
- sizeofcmds: Total size of load commands
- flags: MH_PIE (position-independent), MH_TWOLEVEL, etc.

The 64-bit header (mach_header_64) adds a reserved field, growing the header from 28 to 32 bytes, and uses different magic values: 0xFEEDFACF (MH_MAGIC_64) or its byte-swapped form 0xCFFAEDFE (MH_CIGAM_64).
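A short sketch of reading this header in Python follows. It reads the first four bytes as a little-endian integer and uses the result to infer both the word size and the file's byte order; the function name and the sample bytes are assumptions for illustration, while the constants are the standard Mach-O values.

```python
import struct

# Magic as seen when the first 4 bytes are read little-endian
MAGICS = {
    0xFEEDFACE: ("<", 32),  # 32-bit, little-endian file
    0xCEFAEDFE: (">", 32),  # 32-bit, big-endian file
    0xFEEDFACF: ("<", 64),  # 64-bit, little-endian file
    0xCFFAEDFE: (">", 64),  # 64-bit, big-endian file
}
FILETYPES = {2: "MH_EXECUTE", 6: "MH_DYLIB", 8: "MH_BUNDLE"}
MH_PIE = 0x200000

def parse_macho_header(data: bytes) -> dict:
    """Decode the fixed fields shared by mach_header and mach_header_64."""
    (raw_magic,) = struct.unpack_from("<I", data, 0)
    if raw_magic not in MAGICS:
        raise ValueError("not a thin Mach-O file")
    endian, bits = MAGICS[raw_magic]
    (_magic, cputype, _cpusubtype, filetype, ncmds, sizeofcmds,
     flags) = struct.unpack_from(endian + "7I", data, 0)
    return {
        "bits": bits,
        "endian": "little" if endian == "<" else "big",
        "cputype": cputype,
        "filetype": FILETYPES.get(filetype, hex(filetype)),
        "ncmds": ncmds,
        "sizeofcmds": sizeofcmds,
        "pie": bool(flags & MH_PIE),
        "header_size": 28 if bits == 32 else 32,
    }

# Hypothetical arm64 executable header (CPU_TYPE_ARM64 = 0x0100000C)
sample_header = struct.pack("<8I", 0xFEEDFACF, 0x0100000C, 0, 2, 2, 40,
                            0x200085, 0)
print(parse_macho_header(sample_header))
```

The load commands begin immediately after the header, at the offset reported as header_size.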

Load Commands and Segments

Following the header, load commands describe the file's structure and what the loader should do. Common load commands include:

  • LC_SEGMENT_64: Defines a 64-bit segment to be mapped
  • LC_LOAD_DYLIB: Specifies a dynamic library dependency
  • LC_MAIN: Entry point for execution (modern approach)
  • LC_CODE_SIGNATURE: Location of code signature data
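Every load command starts with the same two 32-bit fields, cmd and cmdsize, so the list can be walked without understanding each command's payload. The sketch below shows that loop under the assumption of a little-endian 64-bit file; the function name and sample data are hypothetical, and only the four commands named above are given readable labels.

```python
import struct

LC_NAMES = {
    0x19: "LC_SEGMENT_64",
    0x0C: "LC_LOAD_DYLIB",
    0x80000028: "LC_MAIN",
    0x1D: "LC_CODE_SIGNATURE",
}

def iter_load_commands(data: bytes, offset: int, ncmds: int, endian: str = "<"):
    """Yield (name, offset, size) for each load command after the header."""
    for _ in range(ncmds):
        cmd, cmdsize = struct.unpack_from(endian + "II", data, offset)
        # Reject sizes that would loop forever or run past the buffer
        if cmdsize < 8 or offset + cmdsize > len(data):
            raise ValueError("malformed load command")
        yield LC_NAMES.get(cmd, hex(cmd)), offset, cmdsize
        offset += cmdsize

# Hypothetical file: 32 placeholder header bytes, then LC_MAIN and
# LC_CODE_SIGNATURE with made-up offsets
sample_file = (
    bytes(32)
    + struct.pack("<IIQQ", 0x80000028, 24, 0x3F60, 0)   # entryoff, stacksize
    + struct.pack("<IIII", 0x1D, 16, 0x8000, 0x120)     # dataoff, datasize
)
for name, off, size in iter_load_commands(sample_file, 32, 2):
    print(f"{name:20s} offset={off} size={size}")
```

In practice the starting offset and ncmds come from the Mach-O header (28 or 32 bytes in, depending on word size).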

Segments in Mach-O files always start on page boundaries (typically 4KB or 16KB), though sections within segments aren't necessarily page-aligned. The two essential segments are:

  • __TEXT: Read-only segment containing executable code (__text section), constant strings (__cstring), and other immutable data. Marked as read and execute permissions.
  • __DATA: Read-write segment containing global variables, initialized data, and dynamic linking structures. Marked as read and write permissions.

Universal Binaries

A unique feature of Mach-O is support for universal binaries (fat binaries), which contain multiple architectures in a single file. The file begins with a fat_header and fat_arch structures that point to individual Mach-O binaries for different architectures (Intel x86_64, Apple Silicon ARM64, etc.). This allows a single application to run natively on multiple processor types.
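As a rough sketch of how an analyst might list the slices in a universal binary, the code below reads the fat_header and its fat_arch entries. These structures are stored big-endian regardless of the architectures they describe. The function name and sample bytes are hypothetical, and only the 32-bit fat_arch layout is handled.

```python
import struct

FAT_MAGIC = 0xCAFEBABE
CPU_NAMES = {7: "i386", 0x01000007: "x86_64", 0x0100000C: "arm64"}

def parse_fat_header(data: bytes) -> list:
    """Return one dict per architecture slice in a universal binary."""
    magic, nfat_arch = struct.unpack_from(">II", data, 0)
    if magic != FAT_MAGIC:
        raise ValueError("not a universal binary")
    slices = []
    for i in range(nfat_arch):
        # fat_arch: cputype, cpusubtype, offset, size, align (20 bytes)
        cputype, _sub, offset, size, align = struct.unpack_from(
            ">5I", data, 8 + i * 20)
        slices.append({
            "arch": CPU_NAMES.get(cputype, hex(cputype)),
            "offset": offset,        # where this slice's Mach-O header starts
            "size": size,
            "alignment": 1 << align,  # align is stored as a power of two
        })
    return slices

# Hypothetical two-slice header with made-up offsets and sizes
sample_fat = (
    struct.pack(">II", FAT_MAGIC, 2)
    + struct.pack(">5I", 0x01000007, 3, 0x4000, 0x9000, 14)
    + struct.pack(">5I", 0x0100000C, 0, 0x10000, 0x8800, 14)
)
for s in parse_fat_header(sample_fat):
    print(s)
```

One caveat: Java class files begin with the same 0xCAFEBABE value, so a robust tool would sanity-check the architecture count before trusting the match.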

Code Signing Integration

Mach-O integrates code signing into the format more tightly than the other two: PE can carry an optional Authenticode signature in its certificate table, and ELF defines no standard signature field, whereas Mach-O uses the LC_CODE_SIGNATURE load command. This points to a code signature blob containing cryptographic hashes of each page in the binary, certificates, and entitlements. macOS validates these signatures before allowing signed code to execute, making it significantly harder to modify Mach-O binaries without detection.

Comparison and Common Elements

Despite their differences, PE, ELF, and Mach-O share fundamental concepts shaped by the common challenges of executable file management:

| Feature | PE (Windows) | ELF (Linux/Unix) | Mach-O (macOS) |
|---|---|---|---|
| Magic Bytes | "MZ" (DOS), "PE" | 0x7F 'E' 'L' 'F' | 0xFEEDFACE / 0xFEEDFACF |
| Architecture Field | Machine (COFF header) | Machine (ELF header) | cputype |
| Code Section | .text | .text | __TEXT/__text |
| Data Section | .data | .data | __DATA/__data |
| Dynamic Linking | Import table + IAT | .got + .plt | dyld info |
| Entry Point | AddressOfEntryPoint | e_entry | LC_MAIN/LC_UNIXTHREAD |
| Multi-arch Support | Separate binaries | Separate binaries | Universal/Fat binaries |

All three formats separate code from data for security (DEP/NX protection), support position-independent code for ASLR, and provide mechanisms for dynamic linking to reduce memory footprint and enable code reuse.
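Because each format announces itself in its first few bytes, a triage script can sort unknown files before handing them to a format-specific parser. The following is a minimal sketch with a hypothetical function name; it checks only the magic values discussed in this article and makes no attempt to validate the rest of the file.

```python
import struct

def identify_format(data: bytes) -> str:
    """Classify a file as PE, ELF, or Mach-O from its leading bytes."""
    if data[:4] == b"\x7fELF":
        return "ELF"
    if data[:2] == b"MZ" and len(data) >= 0x40:
        # Follow e_lfanew to confirm the PE signature
        (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
        if data[e_lfanew:e_lfanew + 4] == b"PE\0\0":
            return "PE"
        return "MZ (no PE signature)"
    if len(data) >= 4:
        (magic,) = struct.unpack_from(">I", data, 0)
        if magic in (0xFEEDFACE, 0xCEFAEDFE):
            return "Mach-O (32-bit)"
        if magic in (0xFEEDFACF, 0xCFFAEDFE):
            return "Mach-O (64-bit)"
        if magic == 0xCAFEBABE:
            return "Mach-O universal (or Java class file)"
    return "unknown"

# Hypothetical inputs: only the leading bytes matter for this check
print(identify_format(b"\x7fELF" + bytes(12)))
print(identify_format(bytes.fromhex("cffaedfe") + bytes(28)))
print(identify_format(b"#!/bin/sh\n"))
```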

Practical Analysis Techniques

Understanding the theory is essential, but practical analysis requires the right tools and workflows. Here's how to parse and examine these formats effectively.

Command-Line Tools

Each platform provides native utilities for examining its executable format:

Windows PE Analysis:

  • dumpbin: Microsoft's PE analysis tool (part of Visual Studio)
    dumpbin /headers program.exe
    dumpbin /imports program.exe
    

When examining the .text section of PE files, you'll encounter raw machine code opcodes. Our machine code disassembler tool can help you quickly decode these bytes into readable assembly instructions without installing heavyweight analysis software.

Linux ELF Analysis:

  • readelf: Comprehensive ELF file inspector
    readelf -h binary          # Display ELF header
    readelf -l binary          # Show program headers (segments)
    readelf -S binary          # List sections
    

macOS Mach-O Analysis:

  • otool: Object file display tool
    otool -h binary            # Display Mach-O header
    otool -l binary            # List load commands
    

Python Libraries for Cross-Platform Analysis

Python provides powerful libraries for programmatic file format analysis:

pefile for PE files:

import pefile

pe = pefile.PE('malware.exe')
print(f"Entry point: {hex(pe.OPTIONAL_HEADER.AddressOfEntryPoint)}")
print(f"Image base: {hex(pe.OPTIONAL_HEADER.ImageBase)}")

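# Enumerate imported DLLs and functions. pefile only sets
# DIRECTORY_ENTRY_IMPORT when the file has an import directory,
# so fall back to an empty list instead of raising AttributeError.
for entry in getattr(pe, 'DIRECTORY_ENTRY_IMPORT', []):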
    print(f"\n{entry.dll.decode()}")
    for imp in entry.imports:
        print(f"  - {imp.name.decode() if imp.name else 'Ordinal ' + str(imp.ordinal)}")

pyelftools for ELF files:

from elftools.elf.elffile import ELFFile

with open('binary', 'rb') as f:
    elf = ELFFile(f)
    print(f"Architecture: {elf.get_machine_arch()}")
    print(f"Entry point: {hex(elf.header['e_entry'])}")

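    # List segments. pyelftools can return an unrecognized p_type as an
    # integer, so convert to str before applying a string format spec.
    for segment in elf.iter_segments():
        print(f"{str(segment['p_type']):12s} {hex(segment['p_vaddr'])}")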

For a comprehensive workflow, analysts should combine static examination with dynamic analysis using debuggers (WinDbg, GDB, LLDB) and disassemblers. Tools like our machine code disassembler can help visualize instruction-level details during reverse engineering.

Security Considerations

Executable file formats are both critical infrastructure and attack surface. Understanding their security implications is essential for defending systems and analyzing threats.

Obfuscation and Packing Techniques

Malware authors extensively use packing - compressing or encrypting executables - to evade signature-based detection. Packing changes the on-disk byte patterns that signatures match against, which makes traditional signature-based antivirus detection far less effective. One analysis of 24,000 PE files found that packing remains widely used in modern malware.

Advanced obfuscation techniques include:

  • Section manipulation: Encrypting code sections and decrypting at runtime
  • Import table obfuscation: Hiding API calls by dynamically resolving function addresses
  • GOT/PLT manipulation: In ELF files, modifying Global Offset Tables to redirect function calls
  • Stripped binaries: Removing symbol tables to hinder reverse engineering

MITRE ATT&CK catalogs obfuscated files and information as technique T1027, a persistent technique across all platforms. Separately, 23.6% of observed attacks in 2024 were reportedly classified as zero-day exploits, which often rely on file format manipulation.

Detection and Analysis Strategies

Security professionals should employ layered analysis approaches:

  1. Static analysis: Examine file headers, sections, imports without execution
  2. Heuristic scanning: Look for suspicious patterns (unusual entry points, compressed sections, anti-analysis code)
  3. Dynamic analysis: Execute in sandboxed environments to observe runtime behavior
  4. Memory forensics: Analyze unpacked code in memory after deobfuscation
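For the heuristic step, one widely used signal for compressed or encrypted sections is byte entropy. The article does not prescribe this method, so treat the sketch below as one possible approach: the function name is hypothetical, and the 7.0 bits-per-byte threshold is a common rule of thumb rather than a fixed standard.

```python
import math
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of a byte string in bits per byte (0 to 8)."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(data).values())

def looks_packed(section_bytes: bytes, threshold: float = 7.0) -> bool:
    """Heuristic only: high entropy suggests compression or encryption."""
    return shannon_entropy(section_bytes) >= threshold

# Illustrative inputs: a run of zero bytes versus a uniform byte distribution
padding = bytes(1024)
uniform = bytes(range(256)) * 4
print(round(shannon_entropy(padding), 2), looks_packed(padding))
print(round(shannon_entropy(uniform), 2), looks_packed(uniform))
```

Applied per section rather than per file, this helps separate a genuinely packed code section from a binary that merely embeds compressed resources. High entropy alone is not proof of malice, since legitimate installers and media assets score high as well.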

Tools like Deep File Inspection (DFI) use format-specific parsers (pefile for PE, readelf for ELF) to analyze headers, sections, and imported libraries, providing a foundation for detecting anomalies and potential threats.

Conclusion

Mastering PE, ELF, and Mach-O executable file formats is a foundational skill for cybersecurity professionals. Whether you're analyzing malware, reverse engineering software, developing security tools, or investigating incidents, deep knowledge of these formats enables you to see beyond the surface of binary files.

Each format reflects the design philosophy of its platform - PE's Windows compatibility layers, ELF's clean segment/section separation, and Mach-O's integrated code signing. Yet all three share common goals: efficient code loading, dynamic linking, and increasingly, security features like ASLR and DEP.

As threat actors continue to develop sophisticated evasion techniques, understanding these formats at a structural level becomes ever more critical. The tools and techniques covered in this guide provide a starting point, but true mastery comes from hands-on analysis of real-world binaries.

Continue your learning by examining benign system binaries with the tools discussed, then progress to analyzing malware samples in isolated environments. The ability to read and understand executable file formats will serve as a force multiplier throughout your cybersecurity career.

