Project Aether: Building an AI-Native IaC Tool From a Secure Foundation

Sahil Gada

Phase 2 – Pivoting for Progress: From Manual Data to a Fine-Tuned AI Model

Objective:

With a solid validation workflow, the next goal was to scale up our dataset and train the AI. However, we quickly realized that manually creating hundreds of high-quality examples, even with our script, was a major bottleneck. This led to a critical strategic pivot.


Problem 1: The Dataset Bottleneck:

Manually authoring hundreds of diverse and secure IaC examples would take weeks or months. This wasn't just a time issue; it also risked creating a dataset that wasn't as comprehensive as established academic benchmarks.

Solution: Adopt an Existing Benchmark

A research dive uncovered IaC-Eval, a peer-reviewed benchmark from NeurIPS 2024 containing 458 human-curated Terraform scenarios. This was a game-changer. We made the strategic decision to abandon our manual CloudFormation dataset and switch to Terraform to leverage this incredible resource. This instantly solved our dataset problem and aligned our project with cutting-edge research.

The New Workflow:

We wrote a new script, prepare_dataset.py, to download the IaC-Eval dataset directly from the Hugging Face Hub and automatically format it into the data.jsonl file required for fine-tuning.
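The formatting step can be sketched roughly as below. The column names ("prompt", "code") and the Hub repo id are assumptions for illustration, not necessarily what prepare_dataset.py actually uses; the real names come from the dataset card on the Hub.

```python
import json

def to_training_record(scenario: dict) -> dict:
    # Map one IaC-Eval row to an instruction-tuning record.
    # The source keys ("prompt", "code") are illustrative placeholders.
    return {
        "instruction": scenario["prompt"],
        "output": scenario["code"],
    }

# The full flow in prepare_dataset.py would look roughly like:
#   from datasets import load_dataset
#   ds = load_dataset("autoiac-project/iac-eval")   # repo id is an assumption
#   records = [to_training_record(row) for row in ds["test"]]
#   with open("data.jsonl", "w") as f:
#       for r in records:
#           f.write(json.dumps(r) + "\n")
```

Keeping the row-to-record mapping in one small function makes it easy to re-run the script if the upstream schema changes.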


Problem 2: Fine-Tuning a 7B Model on a Free Tier:

Fine-tuning a 7-billion-parameter model requires a powerful GPU with plenty of memory. Our initial attempts in Google Colab using standard libraries crashed the session by exhausting the available RAM.

Solution: Unsloth for Memory-Efficient Fine-Tuning

We switched to Unsloth, a library specifically designed for high-performance, memory-efficient fine-tuning. It allowed us to load, fine-tune, and save the codellama-7b-hf model on the IaC-Eval dataset using a standard free Colab T4 GPU, a task that was impossible with the default tools.
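A quick back-of-envelope calculation shows why the default half-precision path crashed and why a 4-bit loading path (the QLoRA-style approach Unsloth supports) fits on a T4's roughly 16 GB. This counts model weights only and ignores activations, gradients, and CUDA overhead:

```python
PARAMS = 7e9  # codellama-7b-hf parameter count (approximate)

def weights_gb(bits_per_param: float) -> float:
    """Memory for the model weights alone, in gigabytes."""
    return PARAMS * bits_per_param / 8 / 1e9

# ~14 GB: nearly fills a T4 before training even starts
print(f"fp16 weights: {weights_gb(16):.1f} GB")
# ~3.5 GB: leaves room for LoRA adapters and training overhead
print(f"4-bit weights: {weights_gb(4):.1f} GB")
```

The 4x reduction in the weight footprint is the headline reason a free-tier GPU becomes viable at all; Unsloth's kernel-level optimizations then keep the rest of the training loop within budget.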

The Result:

After a successful training run, we converted the final model to the efficient GGUF format (aether-v1-q4_k_m.gguf) and uploaded it to our new Hugging Face Hub repository, making it publicly available.


Problem 3: The 15-Minute CPU Performance Crawl:

With our custom model built, we integrated it into our local aether CLI tool. The ask command worked, but it was painfully slow, taking over 15 minutes to generate code on a laptop CPU. For the tool to be usable, it needed to run in seconds.

Cause:

The model was running entirely on the CPU. To enable GPU acceleration, we needed to recompile the llama-cpp-python library with NVIDIA CUDA support inside WSL.
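For reference, the rebuild is typically a one-liner. The exact CMake flag depends on the llama-cpp-python version (newer releases use GGML_CUDA; older ones used LLAMA_CUBLAS), so treat this as a sketch rather than the exact command we ran:

```shell
# Reinstall llama-cpp-python from source with CUDA enabled (run inside WSL).
CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python \
    --force-reinstall --no-cache-dir
```

This is also the step that surfaces the toolkit and driver problems described next, since the build needs a working CUDA toolchain to succeed.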

The Debugging Journey:

  • Error 1: Missing CUDA Toolkit. The initial build failed because the core NVIDIA development kit wasn't installed in WSL.

  • Error 2: Driver Mismatch. After installing the toolkit, a new error appeared: CUDA driver version is insufficient for CUDA runtime version. The NVIDIA driver on the host Windows OS was too old for the new CUDA Toolkit inside WSL.

Fix (The Final Performance Breakthrough):

The solution was to update the NVIDIA Studio Driver on the host Windows machine to the latest version. This resolved the incompatibility and allowed the GPU-enabled library to compile.

The Result:

After updating the main.py script to offload 35 layers to the GPU, the ask command's performance went from 15+ minutes to just 8 seconds.


What's Working So Far:

  • A custom AI model, fine-tuned on a 458-scenario benchmark, is now live on Hugging Face Hub.

  • The aether CLI is fully functional with setup, ask, validate, and apply commands.

  • GPU acceleration is working, providing near-instant code generation.

  • The innovative "Auto-Fix" feature has been implemented, allowing the tool to correct its own deployment errors.
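The Auto-Fix idea can be sketched as a simple validate-and-retry cycle. This is an illustrative skeleton, not Aether's actual implementation; the function names and prompt format are hypothetical:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class ValidationResult:
    ok: bool
    error: str = ""

def auto_fix(generate: Callable[[str], str],
             validate: Callable[[str], ValidationResult],
             code: str,
             max_attempts: int = 3) -> str:
    """Validate generated IaC; on failure, feed the error back to the model."""
    for _ in range(max_attempts):
        result = validate(code)
        if result.ok:
            return code
        # Turn the validator's error into a repair prompt for the model.
        code = generate(f"Fix this Terraform error:\n{result.error}\n\n{code}")
    raise RuntimeError(f"auto-fix gave up after {max_attempts} attempts")
```

In a tool like Aether, generate would call the fine-tuned model and validate would shell out to terraform validate or the deployment step; injecting both as callables keeps the retry loop itself small and testable.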

Want to Follow Along?

The core development is now complete! The next and final phase is to benchmark Aether against the original IaC-Eval research paper and publish our findings. Stay tuned for the results.
