Malware Analysis: Compiled Python Executables
This article explains the process of decompiling malicious Python executables using tools like Pyinstxtractor-ng and Decompyle++. It covers the stages of the PyInstaller compilation process, from analysis and collection to bytecode compilation, packaging, and bootloader creation. It concludes with a YARA rule for detecting PyInstaller executables and a step-by-step guide to decompiling a ransomware sample in a malware analysis environment. This knowledge is crucial for malware analysts seeking to understand and counter Python-based threats.
Introduction
If you're exploring Python-compiled malware, you've probably come across PyInstaller. This tool packages Python applications into standalone executables, making it popular among malware creators who want to distribute their malicious scripts without needing the victims to have Python installed. By bundling all necessary files and dependencies into one executable, PyInstaller makes distribution easier but also creates a significant challenge for those trying to analyze and understand the malicious code.
In this blog article, I'll explain the compilation process of malicious Python executables using PyInstaller and guide you through the steps to decompile them. This is an important skill that, in my opinion, every Malware Analyst should learn. Understanding how to reverse-engineer these executables can provide valuable insights into the tactics, techniques, and procedures (TTPs) used by adversaries.
My analysis environment will be a FlareVM virtual machine, a specialized Windows-based VM preloaded with tools for malware analysis and reverse engineering. Along with FlareVM, I will use a few specialized tools that are effective for decompiling PyInstaller executables. These tools will convert the compiled Python files back into readable code, allowing us to dissect and understand the malicious behavior within. By the end of this article, you will understand how to approach the decompilation process, identify key indicators of compromise, and gain deeper insights into the inner workings of Python-based malware.
Compilation Process
The PyInstaller compilation process involves several key steps crucial for creating the final executable. Let's break them down in detail.
Analysis Phase
The first step in the PyInstaller compilation process is the analysis phase. In this phase, PyInstaller examines the main Python script you provided and identifies all the imports and dependencies needed for it to run. It uses Python's built-in AST (Abstract Syntax Tree) module to parse the script and understand its structure.
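Using the parsed structure, PyInstaller resolves each dependency to an absolute path based on your Python environment and sys.path. This ensures all necessary files are accurately located. It also picks up imports that are less obvious, such as conditional imports that only happen under certain conditions and imports inside functions or classes. Imports whose names are only determined at runtime (for example, through __import__ or importlib.import_module with a computed string) cannot be found by static analysis and have to be declared by hand with --hidden-import or supplied through hooks.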
By the end of the analysis phase, PyInstaller has built a detailed dependency graph. This graph shows how all the modules and packages are connected, ensuring no important part is missed. This mapping is crucial for the next stages of the compilation process, as it sets the stage for bundling all the required files into the final executable.
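To make the analysis phase concrete, here is a minimal sketch of import discovery using only the standard library. It is not PyInstaller's actual implementation, which relies on its own module graph and hooks; the names find_imports, locate, and SAMPLE are hypothetical and exist only for illustration.

```python
import ast
import importlib.util


def find_imports(source):
    """Return the top-level names of modules imported anywhere in `source`."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # level == 0 means an absolute import (not "from . import x")
            names.add(node.module.split(".")[0])
    return names


def locate(module_name):
    """Resolve a module name through sys.path; None if it cannot be found."""
    try:
        spec = importlib.util.find_spec(module_name)
    except (ImportError, ValueError):
        return None
    return spec.origin if spec else None


# Hypothetical script used only to exercise the functions above.
SAMPLE = '''
import os
from json import dumps

def late():
    import base64              # import inside a function: still an Import node
    return __import__("zlib")  # dynamic import: just a function call to the AST
'''

if __name__ == "__main__":
    for name in sorted(find_imports(SAMPLE)):
        print(name, "->", locate(name))
```

Note that zlib never appears in the result: the walk only sees import statements, which is why dynamically imported modules are a blind spot for this kind of static analysis.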
Collection Phase
Once PyInstaller identifies all the dependencies, it moves on to the collection phase. Here, PyInstaller gathers everything it needs, such as Python modules, libraries, and other resources, into a temporary directory. This directory mirrors your original project, ensuring all files are in the correct locations.
PyInstaller doesn't just collect Python files; it also includes compiled C extensions, like .pyd files on Windows or .so files on Unix-like systems. Additionally, it gathers data files and any extra resources you specify with command-line options like --add-data or --add-binary. So, if your app needs images, config files, or other non-Python items, PyInstaller ensures they're included in the final package. This way, you get a complete bundle with everything your app needs to run smoothly on the target system. This thorough collection process is essential for creating a standalone executable that works independently of your original development setup.
Bytecode Compilation
Alright, let's talk about bytecode compilation. In this phase, PyInstaller takes all the Python files it gathered and compiles them into bytecode (.pyc files) using the py_compile module. This step is very important for a few reasons:
Speeding Up Execution: Pre-compiling the Python code into bytecode shortens startup, because the interpreter can load the bytecode directly without needing to parse and compile the source code on the fly. It does not make the code run faster once it is loaded.
Obfuscation: Compiling to bytecode makes it harder for anyone to read the original source code. It's not foolproof: names, string constants, and the program's structure survive in the bytecode, which is exactly what makes decompilation possible.
Compatibility: Bytecode is specific to the Python version that produced it, so the modules are compiled for the same interpreter version that gets bundled into the executable. This matters for analysis too, because the decompiler has to support that bytecode version.
The bytecode files are then organized into a separate directory within the temporary build directory. This keeps the compiled modules apart from the collected binaries and data files and ready for packaging. A bundle targets a single Python interpreter, so every module in it is compiled for that one version.
Additionally, PyInstaller can include any necessary metadata alongside the bytecode files. This metadata can contain information about the original source files, compilation options, and dependencies, which can be useful for debugging and further optimization.
By the end of the bytecode compilation phase, PyInstaller has a well-organized collection of bytecode files and associated metadata, all ready for the next steps in the build process. This meticulous organization ensures that the final executable will run smoothly and efficiently on the target system.
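The following sketch shows what "compiling to bytecode" means at the Python level, using the built-in compile() function and the marshal module. It is an illustration under my own assumptions rather than PyInstaller's build code; SOURCE, to_bytecode, and run_bytecode are hypothetical names.

```python
import marshal

# Hypothetical one-function module used as input.
SOURCE = "def greet(name):\n    return 'hello ' + name\n"


def to_bytecode(source, filename="<sample>"):
    """Compile source text to a code object and serialize it with marshal."""
    code = compile(source, filename, "exec")
    return marshal.dumps(code)


def run_bytecode(blob):
    """Load a marshalled code object and execute it in a fresh namespace."""
    namespace = {}
    exec(marshal.loads(blob), namespace)
    return namespace


if __name__ == "__main__":
    blob = to_bytecode(SOURCE)
    print(len(blob), "bytes of marshalled bytecode")
    print(run_bytecode(blob)["greet"]("analyst"))
    # String constants are stored as-is inside the serialized code object.
    print(b"hello " in blob)
```

The last check is the interesting one for analysts: the string constant is sitting in the serialized bytes in plain form, so even before decompiling, a strings-style search over bytecode can surface URLs, file extensions, or keys.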
Packaging
Once everything is set, PyInstaller moves on to the packaging phase. This part is crucial because it involves creating a custom archive format called the CArchive (Container Archive). Think of it as a compressed zip file that includes the compiled Python bytecode, any compiled C extensions (.pyd files on Windows or .so files on Unix-like systems), data files, and other resources you added with options like --add-data or --add-binary. This way, PyInstaller ensures the final executable is self-contained and can run without needing the original development setup.
The CArchive format is well-organized and includes metadata about its contents, making it easy to extract and load what you need when you run the app. This metadata helps the app quickly find and use the required files without any extra hassle.
To compress the archive, PyInstaller uses the zlib compression algorithm. This method is excellent at reducing the archive size, which is beneficial for distribution and deployment. Smaller archives mean faster downloads and less storage space needed on the target systems.
One cool thing about the CArchive is its unique magic number: "MEI\014\013\012\013\016". The escapes are octal, so in hexadecimal the bytes are 4D 45 49 0C 0B 0A 0B 0E. This magic number acts like a signature for PyInstaller archives, making them easy to identify. This can be very useful when setting up detection rules for security or debugging.
In short, the packaging phase is where PyInstaller bundles everything into a single, compressed archive that's ready to share. This step is crucial for ensuring the final product is both efficient and easy to deploy.
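As a sketch of how the archive metadata can be located, the snippet below searches a byte string for the magic number and unpacks the fields that follow it. The field layout (COOKIE_FORMAT) is an assumption based on how public extraction tools read archives from PyInstaller 2.1 and later, and parse_cookie is a hypothetical helper, so treat the details as illustrative rather than authoritative.

```python
import struct

# Octal escapes as written in the article; same bytes as 4D 45 49 0C 0B 0A 0B 0E.
MAGIC = b"MEI\014\013\012\013\016"

# Assumed layout: magic, package length, TOC offset, TOC length,
# Python version, Python library name (big-endian, no padding).
COOKIE_FORMAT = "!8sIIii64s"
COOKIE_SIZE = struct.calcsize(COOKIE_FORMAT)


def parse_cookie(data):
    """Find the last occurrence of the magic and decode the cookie after it."""
    pos = data.rfind(MAGIC)
    if pos == -1 or pos + COOKIE_SIZE > len(data):
        return None
    _, pkg_len, toc_off, toc_len, pyver, pylib = struct.unpack_from(
        COOKIE_FORMAT, data, pos
    )
    # Versions are stored as e.g. 311 for Python 3.11 (or 27 for 2.7).
    version = divmod(pyver, 100) if pyver >= 100 else divmod(pyver, 10)
    return {
        "offset": pos,
        "package_length": pkg_len,
        "toc_offset": toc_off,
        "toc_length": toc_len,
        "python_version": version,
        "python_library": pylib.split(b"\x00", 1)[0].decode("ascii", "replace"),
    }
```

Searching from the end with rfind reflects the fact that the cookie sits at the tail of the archive, after the packaged files it describes.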
Bootloader Creation
The bootloader is a key part of PyInstaller-generated executables. It's a small program written in C that starts the packaged application. The bootloader's main tasks include:
Extracting the CArchive: The bootloader first extracts the CArchive, which contains all the compiled bytecode, dependencies, and resources, to a temporary location. This step ensures that everything the application needs is available.
Setting up the Python interpreter environment: Once the CArchive is extracted, the bootloader sets up the Python interpreter environment. This involves configuring paths and environment variables so the interpreter can find and run the necessary bytecode and modules.
Finding and running the main script's bytecode: After setting up the environment, the bootloader locates the main script's bytecode within the CArchive and executes it. This step starts the application.
Handling cleanup after execution: Once the application has finished running, the bootloader cleans up any temporary files and resources created during the extraction and execution process. This ensures no unnecessary files are left behind.
PyInstaller creates platform-specific bootloaders for different operating systems. On Windows, the bootloader is in PE format, while on Linux, it uses ELF. This design ensures compatibility across various environments.
The bootloader handles different runtime situations, like frozen importlib bootstrap and multiprocessing support, making PyInstaller executables versatile for multi-platform deployment. However, this versatility can also be exploited by threat actors.
Overall, the bootloader's ability to manage extraction, environment setup, execution, and cleanup makes PyInstaller executables robust and reliable across diverse operating systems.
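The environment the bootloader prepares is visible from inside the running Python code: PyInstaller sets sys.frozen and, for the extraction directory, sys._MEIPASS. Decompiled samples often contain a check like the one sketched below to find their bundled resources. The function names is_frozen and bundle_dir are hypothetical.

```python
import sys
from pathlib import Path


def is_frozen():
    """True when running inside a PyInstaller bundle."""
    return bool(getattr(sys, "frozen", False)) and hasattr(sys, "_MEIPASS")


def bundle_dir(fallback):
    """Directory holding bundled resources, or `fallback` when run from source."""
    if is_frozen():
        return Path(sys._MEIPASS)
    return Path(fallback)


if __name__ == "__main__":
    print("frozen:", is_frozen())
    print("resources in:", bundle_dir(Path.cwd()))
```

When you see this pattern in decompiled code, the paths built from it point at files that were shipped inside the executable, which makes them good candidates for a closer look in the extracted directory.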
Executable Generation
In the final step, PyInstaller bundles the bootloader and the CArchive into a single executable file. For Windows portable executables, the CArchive is appended to the end of the PE (Portable Executable) file as overlay data rather than stored in a regular section, which is why the archive's magic number shows up near the end of the file.
The final executable includes several key components:
The Bootloader Code: This is placed at the entry point of the executable. The bootloader initializes the environment and starts the bundled Python application.
The Compressed CArchive: This archive contains all the Python bytecode and dependencies needed by the application. By compressing these files, PyInstaller keeps the executable as small as possible while still including everything required to run the application.
Necessary Metadata: This includes version information, manifest data, and other essential details. The metadata ensures that the executable can be properly identified and executed within the target operating system environment.
The end result is a standalone package that can run on systems without needing Python installed. This is great for developers who want to share their apps with users who might not have Python on their machines, but it is also a useful tool for malware authors.
Tools
Pyinstxtractor-ng
- Pyinstxtractor-ng is a handy tool that helps you pull apart files created by PyInstaller. It breaks the executable down into its bundled pieces: the compiled Python bytecode (.pyc files), libraries, and other resources packed inside. This tool is a lifesaver for reversing the packaging process and getting a peek into the inner workings of the compiled malware.
Decompyle++
- Decompyle++ (its decompiler binary is called pycdc) turns compiled Python bytecode back into readable Python source code. After you've used Pyinstxtractor-ng to extract the contents of a PyInstaller file, you'll usually end up with Python bytecode that isn't human-readable. That's where Decompyle++ comes in. It takes that bytecode and decompiles it into Python code that you can actually read to understand the malware's logic, what it does, and the threats it poses. This deep dive is key to coming up with effective countermeasures.
Decompilation Process
Decompiling a Python executable involves several important steps. Here’s a detailed breakdown of the process:
Bytecode Parsing
Decompyle++ begins by reading the compiled .pyc or .pyo file. It extracts the bytecode instructions embedded within these files. This step is crucial as it lays the foundation for understanding the executable's structure.
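The parsing step can be approximated in a few lines of Python. This sketch assumes the 16-byte .pyc header used since Python 3.7 and only works when the file was compiled by the same Python version that runs the script, a limitation that Decompyle++ does not share. The helpers load_pyc and make_demo_pyc are hypothetical, and the demo module is harmless placeholder code.

```python
import importlib.util
import marshal
import py_compile
from pathlib import Path

HEADER_SIZE = 16  # Python 3.7+: magic (4) + flags (4) + mtime/size or hash (8)


def load_pyc(path):
    """Return the code object stored in a .pyc built by this Python version."""
    raw = Path(path).read_bytes()
    if raw[:4] != importlib.util.MAGIC_NUMBER:
        raise ValueError("bytecode was compiled by a different Python version")
    return marshal.loads(raw[HEADER_SIZE:])


def make_demo_pyc(directory):
    """Compile a tiny placeholder module so there is something to parse."""
    src = Path(directory) / "demo.py"
    src.write_text("import socket\nKEY = 'not-a-real-key'\n")
    return Path(py_compile.compile(str(src), cfile=str(Path(directory) / "demo.pyc")))
```

Once loaded, the code object's co_names and co_consts attributes already list the imported names and the constants, which is often enough for a quick first look before running a full decompiler.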
Instruction Analysis
Once the bytecode is parsed, Decompyle++ analyzes each instruction. This involves identifying the purpose of each bytecode command and how it contributes to the overall functionality of the program. This step helps in mapping out the logic flow of the code.
Control Flow Reconstruction
In this phase, Decompyle++ reconstructs the control flow of the program. It identifies loops, conditionals, and function calls, piecing together how the program executes. This step is vital for understanding the logical structure and flow of the code.
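To get a feel for the raw material behind control flow reconstruction, the sketch below uses Python's dis module to list the jump instructions of a function. It shows only the input to that step (where branches and loops jump to), not how Decompyle++ rebuilds structured code from it; sample and jump_targets are hypothetical names.

```python
import dis


def sample(items):
    """Small function with one loop and one conditional."""
    total = 0
    for item in items:
        if item > 0:
            total += item
    return total


def jump_targets(func):
    """Return (offset, opname, target_offset) for every jump instruction."""
    jumps = []
    for ins in dis.get_instructions(func):
        if ins.opcode in dis.hasjrel or ins.opcode in dis.hasjabs:
            jumps.append((ins.offset, ins.opname, ins.argval))
    return jumps


if __name__ == "__main__":
    for offset, opname, target in jump_targets(sample):
        print(f"{offset:>4}  {opname:<24} -> {target}")
```

The exact opcodes differ between Python versions, which is one reason a decompiler needs version-specific knowledge: a jump back to an earlier offset marks a loop, while a forward jump that skips a block marks a conditional.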
AST Generation
Decompyle++ then creates an Abstract Syntax Tree (AST) from the analyzed bytecode. The AST is a hierarchical representation of the source code's structure, capturing the syntax and semantics of the program. This step translates the low-level bytecode into a more understandable format.
Source Code Generation
Finally, Decompyle++ converts the AST into human-readable Python source code. This step involves generating the actual Python code that can be read and analyzed by developers. The output is a near-original version of the source code before it was compiled.
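The last step, turning a tree back into source text, can be illustrated with Python's own ast module (ast.unparse needs Python 3.9 or later). Decompyle++ is written in C++ and builds its own tree from bytecode, so this is only an analogy; roundtrip is a hypothetical helper.

```python
import ast


def roundtrip(source):
    """Parse source into an AST and generate source text from the tree again."""
    tree = ast.parse(source)
    return ast.unparse(tree)


if __name__ == "__main__":
    print(roundtrip("x=1+2  # the comment is not part of the tree"))
```

The regenerated code is equivalent but not identical: comments and original formatting are not part of the tree, so they cannot come back. The same holds for decompiled malware, where logic, names, and constants are recovered but the author's comments are gone.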
Detection
Now that both the compilation and decompilation processes have been explained, it is time to select an example from my malware collection.
Yeah, never mind. Manually searching through gigabytes of samples doesn't sound like a fun time. There's an easier way to find what we need with better accuracy. Let's make a YARA rule for the task.
YARA Rule Creation
To create a good YARA rule, we need to identify the unique signs and clues associated with PyInstaller executables. I mentioned some of these earlier.
PyInstaller-compiled executables carry a special CArchive (Container Archive) format with a unique magic number: { 4D 45 49 0C 0B 0A 0B 0E }. This magic number is a clear indicator of PyInstaller. Additionally, PyInstaller executables often contain specific strings like "pyi" and "pyiboot," which we can use as extra clues. On its own, "pyi" is short enough to match unrelated files, which is why the rule only counts the strings when both are present.
Now, let's create a YARA rule to detect PyInstaller executables. Based on the indicators we discussed, here's what I came up with:
rule compiled_pyinstaller_executable
{
    meta:
        description = "Detects executables compiled with PyInstaller"
        author = "Dru Banks @S0KRAT3Z"
        date = "2024-08-07"
        version = "1.1"
    strings:
        // PyInstaller CArchive magic number: "MEI\014\013\012\013\016"
        $pyinstaller_magic = { 4D 45 49 0C 0B 0A 0B 0E }
        // PyInstaller signature strings
        $pyinstaller_indicator1 = "pyi" ascii
        $pyinstaller_indicator2 = "pyiboot" ascii
    condition:
        // The magic sits in the archive near the end of the file, not at offset 0
        $pyinstaller_magic or
        // Additional indicators for PyInstaller
        (all of ($pyinstaller_indicator*))
}
As you can see, this rule works like a charm. This is where the fun starts. 😎
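If YARA is not at hand, the same triage idea can be approximated in plain Python. This is a rough stand-in and not a replacement for the rule: it only checks for the MZ header of a Windows executable plus the archive magic, and it reads each file fully into memory. The names looks_like_pyinstaller and scan are hypothetical.

```python
from pathlib import Path

# CArchive magic "MEI\014\013\012\013\016" (hex 4D 45 49 0C 0B 0A 0B 0E)
MAGIC = b"MEI\014\013\012\013\016"


def looks_like_pyinstaller(path):
    """True if the file starts with 'MZ' and contains the CArchive magic."""
    data = Path(path).read_bytes()
    return data[:2] == b"MZ" and MAGIC in data


def scan(directory):
    """Yield every file under `directory` that looks like a PyInstaller PE."""
    for path in sorted(Path(directory).rglob("*")):
        if not path.is_file():
            continue
        try:
            if looks_like_pyinstaller(path):
                yield path
        except OSError:
            # Unreadable sample (permissions, locked file): skip it.
            continue
```

Calling list(scan(some_directory)) on a sample folder returns the candidate paths. For real collections, YARA remains the better choice, since it is built to scan large numbers of files efficiently.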
Analysis
Out of the detections, I noticed an interesting ransomware sample, so let's go with that. First, we need to decompress and extract the Python bytecode from the PE using Pyinstxtractor-ng. A really handy feature is that the tool automatically creates a directory and suggests possible entry points for analysis. For example, the ransomware has a file called "GUImain.pyc".
Inspecting Directory Contents
I'll definitely check GUImain.pyc later, but first I want to show what the extracted directory looks like. It typically contains the following:
- Python bytecode files (.pyc) for the bundled Python modules.
- Compiled C extensions (.pyd files on Windows or .so files on Unix-like systems) used by the application.
- Data files and resources that were bundled with the application using PyInstaller's --add-data option or the datas option in the .spec file.
- Any additional files and directories specified by the application's dependencies.
The PYZ-00.pyz_extracted directory is among the first places you should search to understand the imports and dependencies of the executable.
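A quick way to get an overview of an extracted directory is to group its files by type. The sketch below does this with the standard library only; the category names and the inventory function are my own choices for illustration, not part of any tool's output.

```python
from collections import defaultdict
from pathlib import Path

# Hypothetical mapping from file extension to a coarse category.
CATEGORIES = {
    ".pyc": "bytecode",
    ".pyd": "c_extension",
    ".so": "c_extension",
    ".dll": "native_library",
    ".pyz": "pyz_archive",
}


def inventory(extracted_dir):
    """Group every file under `extracted_dir` by category (relative paths)."""
    root = Path(extracted_dir)
    groups = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            category = CATEGORIES.get(path.suffix.lower(), "other")
            groups[category].append(path.relative_to(root).as_posix())
    return dict(groups)
```

Files that land in the "other" group are usually the bundled data files and resources, which are worth checking for configuration, ransom notes, or embedded payloads.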
One file in this directory that you'll often come across is struct.pyc. It is the compiled version of Python's struct module, a core component used for converting between Python values and C structs represented as Python bytes objects. Malware uses struct for low-level data manipulation, obfuscation, network communication, or binary data processing. Keep in mind, though, that PyInstaller's own bootstrap code also depends on struct, so the file shows up in benign bundles as well. Treat it as a hint about what to look for in the decompiled code rather than as proof of any particular capability.
Decompiling Main Logic
After extracting and inspecting the internals of the executable, our next step is to decompile the Python bytecode found in GUImain.pyc using Decompyle++.
By default, Decompyle++ outputs the decompiled results to the terminal. However, for thorough analysis, I prefer redirecting these results to a text file. This approach makes it easier to review and annotate the code. Now, let's proceed to inspect the GUImain.pyc file.
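When there are many .pyc files to go through, the redirect can be scripted. The sketch below runs the decompiler and writes its standard output to a file. It assumes the Decompyle++ binary is available on the PATH as pycdc and is invoked with the .pyc path as its only argument; decompile_to_file is a hypothetical helper.

```python
import subprocess


def decompile_to_file(pyc_path, out_path, pycdc="pycdc"):
    """Run the decompiler on `pyc_path` and save its stdout to `out_path`.

    Returns (exit_code, stderr_text). `pycdc` is the path to the decompiler
    binary (assumed to be on PATH by default).
    """
    with open(out_path, "wb") as out:
        result = subprocess.run(
            [pycdc, str(pyc_path)],
            stdout=out,               # same effect as "> out.txt" in a shell
            stderr=subprocess.PIPE,   # keep warnings out of the result file
            text=True,
        )
    return result.returncode, result.stderr
```

Keeping stderr separate is a deliberate choice: decompilers tend to emit warnings about unsupported opcodes, and those are easier to review on their own than mixed into the recovered source.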
Upon inspecting the text file, we observe the decompiled result, revealing the actual source code of the malware. By examining this code, we can identify how the malware operates, what data it targets, and how it interacts with the system. As shown below, we can see the plaintext version of the ransomware's encryption and decryption routines.
Side note: What's really interesting about the sample we picked is that the decryption key for the ransomware is actually hard-coded! We're definitely not dealing with an APT here, folks.
Conclusion
In conclusion, knowing how to decompile Python-based malware, especially samples compiled with PyInstaller, is a must for malware analysts. By getting a grip on both the compilation and decompilation stages and using tools like Pyinstxtractor-ng and Decompyle++, we can reverse-engineer these files to uncover hidden malicious code and extract IOCs. Creating YARA rules to spot PyInstaller executables also helps us identify potential threats. This approach boosts our understanding of attacker tactics and strengthens our defenses.
Written by
Dru Banks
I am a cybersecurity professional with a deep passion for offensive security, threat intelligence, reverse engineering, and malware analysis. I believe that 'knowledge is power,' and that at every opportunity, knowledge should be shared. My blog serves that purpose and will be a public source for my studies, including write-ups on various topics.