Lilypad Module Builder Guide

Phil Billingsby

Running AI workloads can be resource-intensive, often requiring expensive GPUs or centralized cloud infrastructure. But what if you could deploy AI models on a decentralized network and run inference jobs without owning high-end hardware?

This is where Lilypad modules come in.

A Lilypad Module is a self-contained, task-specific computational unit designed to run on the Lilypad network. It encapsulates everything needed to perform a specific job, including input handling, model execution, workflow orchestration, and output generation. AI models are the backbone of most Lilypad Modules, enabling advanced computations such as natural language processing, image generation, and video synthesis. Lilypad modules leverage these models to deliver results efficiently and at scale.

Modules are structured as Git repositories, which makes them easy to manage, share, and version. This structure ensures developers can organize their modules in a way that adheres to Lilypad’s standards while remaining flexible for customization.

So why are Lilypad Modules important? They allow users, researchers, developers and the like to add specific functionalities to the Lilypad Network, enabling access to decentralized compute resources that are capable of running AI inference jobs without requiring a high-performance GPU. Each module, tailored for specific tasks, serves as a unique building block to address the diverse needs across AI and computational workloads.

Core characteristics of a Lilypad module

Each Lilypad module orchestrates AI workflows or computations with standardized input, processing, and output pipelines. By encapsulating all dependencies, modules enable isolated, reproducible computations, making them ideal for scaling workloads, sharing within the community or even collaborating on module development.

Lilypad modules are built around several essential components that work together to ensure functionality and versatility:

1. Core Workflow Components

These elements define how a module processes jobs and delivers results:

  • Input Handling: Manages data intake and preparation, adapting inputs to meet a module’s requirements.

  • Task Logic: Encapsulates the computational process, including AI model execution, determining how jobs are processed.

  • Output Generation: Formats and delivers results in a way that is actionable and accessible for users.

2. Operational Infrastructure

Modules require robust infrastructure to run efficiently and reliably:

  • Workflow Orchestration: Coordinates data flow and computations within a module.

  • Dependency Management: Specifies necessary libraries, frameworks, and runtime environments (e.g., requirements.txt or Docker).

  • Module Configuration: Customizes behavior through settings defined in files like lilypad_module.json.tmpl.

3. Reliability and Scalability

To handle diverse environments and workloads, modules incorporate:

  • Logging and Monitoring: Tracks performance and errors, aiding debugging and optimization.

  • Error Handling: Safeguards against invalid inputs or unexpected failures with mechanisms like try-catch blocks.

  • Scalability: Ensures a module runs efficiently across various environments, including GPUs, CPUs or decentralized nodes.

The most common files found in a Lilypad module include:

  • Dockerfile: Defines the containerized environment for a module, specifying the runtime dependencies, such as system packages, libraries, and configurations needed to run a module on nodes on the Lilypad network.

  • requirements.txt: Lists the Python dependencies required by a module, such as AI frameworks (e.g., TensorFlow, PyTorch) and utility libraries.

  • lilypad_module.json.tmpl: This template file defines a module's metadata, including its name, description, required inputs, outputs and other configurations.

  • Inference script: The core script (e.g., run_inference.py) that handles the execution of the AI model or task, including processing inputs, running computations and generating outputs. This is the entry point file for running modules.

  • Model directory: The directory where the model weights and configuration are stored. Bundling the model inside the container makes it readily accessible at runtime without requiring a separate download step (see the example layout after this list).
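Putting these pieces together, a minimal module repository might look like the following (directory and script names are illustrative; pick names that suit your module):

falcon-module/
├── Dockerfile                  # containerized environment for the module
├── requirements.txt            # Python dependencies
├── lilypad_module.json.tmpl    # module configuration for the Lilypad network
├── download_module.py          # fetches and saves the model locally at build time
├── run_inference.py            # entry point that runs the job
└── model/                      # model weights bundled into the image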

Downloading the model

Downloading a model is an essential step to ensure it can be referenced and utilized offline at runtime. In general, this process involves retrieving both the model and its tokenizer or configuration files from a model hub or repository. The exact script used to download a model will vary depending on the type of model and the library it belongs to, such as Hugging Face Transformers or TensorFlow.

In the Hugging Face transformers library, different model classes are designed for specific tasks. AutoModelForSeq2SeqLM is used for sequence-to-sequence (Seq2Seq) models, which have both an encoder and a decoder, making them suitable for tasks like translation, summarization, and text-to-text generation (e.g., T5, BART). On the other hand, causal language models (CLMs), like Falcon and GPT, use AutoModelForCausalLM and generate text in an autoregressive manner, predicting one token at a time without an encoder. Choosing the right class ensures the model functions correctly within your AI module. The structure of the script and the methods for downloading and saving the model can differ between models and libraries, so it’s important to tailor the process to the specific requirements of the model you're working with.

In the example below, the model and tokenizer are fetched using a library-specific method, such as from_pretrained() provided by the AutoTokenizer utility, which downloads the necessary files and configurations to a local directory. This directory (./model) serves as the runtime reference for the model during execution. Once downloaded, the model and tokenizer are saved locally using save_pretrained(), ensuring they are available for repeated use without needing to redownload them.

from transformers import AutoTokenizer, AutoModelForCausalLM

def download_model():
    model_name = "tiiuae/falcon-7b-instruct"

    # Load tokenizer and model
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)

    # Save the tokenizer and model
    tokenizer.save_pretrained('./model')
    model.save_pretrained('./model')

if __name__ == "__main__":
    download_model()

Run the script to begin the download. (Note: This may take a few minutes depending on the size of the model):

python download.py
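If your module uses a sequence-to-sequence (encoder-decoder) model instead, the same pattern applies with AutoModelForSeq2SeqLM. A minimal sketch, assuming a T5-style model (the model name here is purely illustrative):

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

def download_model():
    # Illustrative Seq2Seq model; swap in the model your module needs
    model_name = "google/flan-t5-base"

    # Load tokenizer and model
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

    # Save locally so the module can load them offline at runtime
    tokenizer.save_pretrained('./model')
    model.save_pretrained('./model')

if __name__ == "__main__":
    download_model()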

Dependencies

The requirements.txt file defines the Python dependencies needed for a module to run. It ensures that all necessary libraries, frameworks, and tools are installed in a module’s runtime environment, enabling consistent execution across different nodes in the Lilypad network.

For example, in the Falcon-7B module we explore below, the following dependencies might be included in requirements.txt:

transformers==4.36.0
torch==2.1.0
numpy<2.0.0
accelerate==0.25.0
bitsandbytes>=0.41.1

These libraries support essential tasks such as loading models (transformers), running computations with hardware acceleration (torch) and defining configurations for generation workflows. However, the contents of requirements.txt will vary based on the specific requirements of your module. For instance:

  • A module handling natural language processing (NLP) might include transformers for pre-trained models and sentencepiece for tokenization.

  • A module designed for image processing could require torchvision or Pillow.

  • A module for custom AI workflows might include scipy, numpy, or other task-specific libraries.
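As a hypothetical illustration, a requirements.txt for an image-processing module built on PyTorch might list the following (versions omitted; pin them to match your model and hardware):

torch
torchvision
Pillow
numpy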

Creating the Dockerfile

The Dockerfile is a fundamental component of a Lilypad module, defining the containerized environment in which the module runs. It specifies everything needed to build and execute the module, from the base operating system to the libraries, runtime dependencies, and execution commands. This allows the module to operate consistently across diverse environments, including the nodes on the Lilypad network.

By containerizing a module, the Dockerfile encapsulates all dependencies and configurations, eliminating compatibility issues that might arise from differences in operating systems, installed packages, or hardware. It also simplifies deployment, as nodes can pull and run the prebuilt container without additional setup.

Different modules may require different configurations. Here is the Dockerfile used for the Falcon-7B module:

# Specify architecture
FROM --platform=linux/amd64 python:3.9-slim as builder

WORKDIR /app

# Install build dependencies
RUN apt-get update && \
    apt-get install -y --no-install-recommends \
    build-essential \
    && rm -rf /var/lib/apt/lists/*

# Copy and install requirements
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Start the final runtime stage
FROM --platform=linux/amd64 python:3.9-slim

WORKDIR /app

# Copy installed packages from builder
COPY --from=builder /usr/local/lib/python3.9/site-packages /usr/local/lib/python3.9/site-packages

# Create outputs directory
RUN mkdir -p /outputs
RUN chmod 777 /outputs

# Copy the inference script
COPY run_inference.py .

# Copy the download script
COPY download_module.py .

RUN python3 download_module.py

# Set outputs directory as a volume
VOLUME /outputs

# Run the inference script
CMD ["python", "run_inference.py"]

Building your module

The inference script is the entry point for a Lilypad module, handling inputs, executing computations, and producing outputs. Designed for diverse environments, particularly Lilypad's infrastructure, it relies on dynamic configurations, such as environment variables for user data or model directories, to remain flexible and avoid hardcoded parameters.

While this guide demonstrates one approach to structuring a text-to-text inference script, the implementation will vary depending on your model and its requirements. Some models, like those available through Hugging Face, work with standard libraries, while others, such as the SDXL module, require custom initialization. For more examples and approaches, visit our module examples.

One important thing to note is that Lilypad modules operate within a controlled execution environment where network access is completely restricted. This design ensures security and reproducibility, and prevents unintended data leaks or external dependencies. Since modules cannot fetch external resources or communicate over the internet, any required data must be provided as inputs at runtime.

The script’s core responsibility is to encapsulate task logic in reusable functions, processing inputs and generating outputs for tasks such as paraphrasing or image generation. This modularity allows it to adapt to various models and ensures compatibility with decentralized execution. The expected output format is JSON.

Let’s examine the Falcon-7B module, which takes a text input, generates a response using a locally stored model, and saves the results in JSON format:

from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig
import torch
import os
import json

def clean_response(text):
    # Find the start of Assistant's response
    assistant_start = text.find("Assistant: ")
    if assistant_start != -1:
        # Get everything after "Assistant: "
        response = text[assistant_start + len("Assistant: "):]
        # Find where the next "User: " starts (if it exists)
        user_start = response.find("\nUser")
        if user_start != -1:
            # Only take the text up to the next "User: "
            response = response[:user_start]
        return response.strip()
    return text.strip()

def main():
    # Create outputs directory if it doesn't exist
    os.makedirs("outputs", exist_ok=True)

    # Get input from environment variable, or use default if not provided
    input_text = os.getenv("MODEL_INPUT", "Tell me a story about a giraffe.")

    local_path = "./local-falcon-7b-instruct"
    print(f"Loading model from {local_path}...")

    # Load tokenizer and model from local path
    tokenizer = AutoTokenizer.from_pretrained(local_path, local_files_only=True)
    # Set pad token to eos token if not set
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token

    model = AutoModelForCausalLM.from_pretrained(
        local_path,
        torch_dtype=torch.bfloat16,
        device_map="auto",  # This will use GPU if available, otherwise CPU
        local_files_only=True,
        pad_token_id=tokenizer.pad_token_id
    )

    # Get the device that the model is on
    device = model.device

    # Set up generation config
    generation_config = GenerationConfig(
        max_new_tokens=256,
        pad_token_id=tokenizer.pad_token_id,
        eos_token_id=tokenizer.eos_token_id,
        do_sample=True,
        temperature=0.7,
        top_p=0.9
    )
    model.generation_config = generation_config

    # We use the tokenizer's chat template to format each message
    messages = [
        {"role": "user", "content": input_text},  # Use the environment variable input
    ]

    input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    # Include attention mask in tokenization
    inputs = tokenizer(
        input_text, 
        return_tensors="pt",
        padding=True,
        truncation=True,
        max_length=2048,
        return_attention_mask=True
    )

    # Move input tensors to the same device as the model
    inputs = {k: v.to(device) for k, v in inputs.items()}

    outputs = model.generate(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"]
    )
    generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)

    # Clean up the response to get just the assistant's part
    clean_output = clean_response(generated_text)

    # Prepare output data
    output_data = {
        "prompt": input_text.strip(),
        "response": clean_output
    }

    print(f"Generated text: {clean_output}")
    print(f"Output data: {output_data}")

    output_path = "/outputs/results.json"
    os.makedirs(os.path.dirname(output_path), exist_ok=True)

    # Save to JSON file
    with open(output_path, "w") as f:
        json.dump(output_data, f, indent=2)

    print(f"Results saved to {output_path}")

if __name__ == "__main__":
    main()
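For reference, the saved /outputs/results.json contains the two keys built in main(): the templated prompt and the cleaned response. The values below are purely illustrative; the actual text depends on the tokenizer's chat template and the sampling settings:

{
  "prompt": "User: Tell me a story about a giraffe.\nAssistant:",
  "response": "Once upon a time, a curious giraffe named Lila set out across the savanna..."
}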

Testing your module locally

Once you’re happy with your inference script, you can build and containerize the module and start testing it by running a job.

Please note that if you’re building a module with a high-compute model, running jobs locally may fail or perform poorly if your system doesn’t have a GPU capable of handling the model's requirements. Using a GPU-enabled environment is strongly recommended for such cases.

To build the image, run the following:

docker build -t <MODULE_NAME>:<MODULE_TAG> .

Once the build has completed, run the container by specifying the inputs for the job as environment variables, mounting a local directory for the results (/outputs), and referencing the module name and tag from the previous build step.

docker run -e MODEL_INPUT="Today was a good day." \
-v $(pwd)/outputs:/outputs \
<MODULE_NAME>:<MODULE_TAG>
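If the job completes successfully, the results are written to the mounted directory on your machine:

cat outputs/results.json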

If you run into any issues while testing locally, check the container logs to identify which part of the process is failing:

docker logs <CONTAINER_ID>

Uploading Docker image

To make your Lilypad module accessible on the network, you'll need to upload your Docker image to a container registry, such as Docker Hub.

This step is necessary because Lilypad resource providers will pull your module’s image to execute jobs. In this guide we use Docker Hub, so please refer to the official Docker Hub guide to create your repository.
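Before pushing, make sure your local Docker client is authenticated with the registry:

docker login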

The approach for macOS differs because Mac systems typically use a different architecture (ARM64) than the Linux-based environments (linux/amd64) where Lilypad modules are executed. To make the Docker image compatible with Linux resource providers, use the docker buildx command on macOS, which lets you specify the target platform with --platform linux/amd64.

For Linux: docker build -t <USERNAME>/<MODULE_NAME>:<MODULE_TAG> --push .

For macOS:

docker buildx build \
--platform linux/amd64 \
-t <USERNAME>/<MODULE_NAME>:<MODULE_TAG> \
--push \
.
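As an optional check, you can inspect the pushed image's manifest to confirm it targets linux/amd64:

docker buildx imagetools inspect <USERNAME>/<MODULE_NAME>:<MODULE_TAG>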

Pushing your module to GitHub

The lilypad_module.json.tmpl file serves as the interface between a module and Lilypad, defining how jobs are executed and allowing users to customize inputs, outputs, and tunable parameters. This file is key to making your module functional and adaptable, because it provides the specifications needed for job execution and resource allocation: the compute resources (e.g., GPU, CPU, RAM), job execution details such as the Docker image, entrypoint, and environment variables, output directories for results, job concurrency, and timeouts.

Inputs vary from model to model, so declaring them in this file is crucial for handling user inputs for a module. The EnvironmentVariables section centralizes all user inputs (including tunables like seeds, steps, batch size, etc.) and prepares them as environment variables for the containerized job environment.

Here is an example of the file for the Falcon 7B module:

{
    "machine": {
        "gpu": 1,
        "cpu": 1000,
        "ram": 8000
    },
    "job": {
        "APIVersion": "V1beta1",
        "Spec": {
            "Deal": {
                "Concurrency": 1
            },
            "Docker": {
                "Entrypoint": ["python", "/app/run_inference.py"],
                "WorkingDirectory": "/app",
                "EnvironmentVariables": [
                    {{ if .MODEL_INPUT }}"MODEL_INPUT={{ js .MODEL_INPUT }}"{{ else }}"MODEL_INPUT=Write a haiku about Lilypads"{{ end }},
                    "HF_HUB_OFFLINE=1"
                ],
                "Image": "narbs91/lilypad-falcon-7b-instruct-modulev8:latest"
            },
            "Engine": "Docker",
            "Network": {
                "Type": "None"
            },
            "Outputs": [
                {
                    "Name": "outputs",
                    "Path": "/outputs"
                }
            ],
            "PublisherSpec": {
                "Type": "ipfs"
            },
            "Resources": {
                "GPU": "1"
            },
            "Timeout": 600
        }
    }
}

Before pushing your changes to GitHub, you must change the image reference in the lilypad_module.json.tmpl file. Take note of the "Image": "narbs91/lilypad-falcon-7b-instruct-modulev8:latest" line in the template above, which points to the latest build of that image. Replace it using the same <USERNAME>/<MODULE_NAME>:<MODULE_TAG> structure we used when building and pushing the image.

Create a new repository named after your desired module and push all of your code to it. When running the module with the Lilypad CLI, you’ll reference either a commit hash or a tagged version of this repository.
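A sketch of the typical Git steps, assuming the repository already exists on GitHub (the URL, branch, and tag name are placeholders):

git init
git add .
git commit -m "Add Falcon-7B Lilypad module"
git remote add origin https://github.com/<USERNAME>/<REPO_NAME>.git
git push -u origin main

# Tag a version to reference with the Lilypad CLI
git tag v1.0.0
git push origin v1.0.0

# Or note the commit hash instead
git rev-parse HEAD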

Testing module on Lilypad

To test your Lilypad module, you will need the following things before running the CLI command:

  • Have the Lilypad CLI installed

  • Module repo link (github.com/<USERNAME>/<REPO_NAME>)

  • The desired commit hash (SHA) or tag

Once you have all of the above, open your terminal and run your command. Note that the command will look different based on the inputs you’ve declared in the lilypad_module.json.tmpl file. For example, running the Falcon-7B module illustrated in the previous sections looks like:

lilypad run --network <NETWORK> github.com/<USERNAME>/<REPO_NAME>:<SHA_OR_TAG> --web3-private-key <PK> -i <INPUT_NAME>='<INPUT>'

lilypad run --network demonet github.com/narbs91/lilypad-falcon-7b-instruct-module:v1.8.0 --web3-private-key b3994e7660abe5f65f729bb64163c6cd6b7d0b1a8c67881a7346e3e8c7f026f5 -i MODEL_INPUT='Write me a haiku about Lilypads'

When running a Module on DemoNet, if the job run appears to be stuck after a few minutes (sometimes it takes time for a Module to download to the RP node), cancel the job and try again. Open a ticket in the Lilypad Discord with any issues that persist.

If your module is configured correctly, you should be able to run a job successfully! The CLI will return the results and store them in a directory inside /tmp/lilypad/data/downloaded-files/.
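The CLI prints the exact path when the job finishes; from there you can inspect the module's JSON output. The directory name is job-specific (shown here as a placeholder) and the exact layout may vary:

cat /tmp/lilypad/data/downloaded-files/<JOB_ID>/outputs/results.json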

Conclusion and next steps

Lilypad modules serve as the foundation for enabling decentralized AI workloads, providing a standardized yet flexible framework for running diverse computational tasks on the Lilypad network. By encapsulating input handling, task logic, and output generation into self-contained units, these modules empower developers, researchers and resource providers to contribute to a scalable and collaborative ecosystem.

As next steps, explore creating modules tailored to your specific AI tasks or computational workloads. Follow best practices for optimization, such as efficient dependency management, logging and error handling. You can also contribute to the growing Lilypad ecosystem by sharing your module, collaborating with the community and refining it based on real-world use cases.

Resources

Help us improve! We’d love to hear about your experiences building modules. Please start a new discussion here and provide as much information as possible! Any feedback is very much appreciated.
