Project Aether: Building an AI-Native IaC Tool From a Secure Foundation

This article documents the latest progress on "Aether," our AI-native Infrastructure as Code (IaC) tool. In the last update, we had a feature-complete CLI with a powerful 7B parameter model ("Aether-Pro"). However, its performance on standard hardware was a major concern. This week, we set out to solve that by creating a faster, more accessible "Aether-Lite" model, a journey that revealed the critical challenges of working with smaller AI models.
Phase 3 – Creating "Aether-Lite" (Progress & Challenges)
Objective
To create a smaller, faster "Aether-Lite" model that provides near-instant code generation, even on standard hardware without powerful GPUs. The goal is to make Aether accessible to all users while maintaining a high standard of code quality.
Problem 1: Balancing Speed vs. Quality
Our 7B "Pro" model was powerful, but its speed was still not ideal on a laptop with a consumer-grade GPU. We needed a model that could run in seconds, not tens of seconds.
Solution: We decided to fine-tune a much smaller, but still highly capable, base model: Google's CodeGemma 2B. This roughly 2.7-billion-parameter model is specifically designed for code tasks and has a permissive license. The plan was to run it through the same fine-tuning process using our IaC-Eval dataset to create the "Aether-Lite" version.
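For readers who want to follow along, here is a minimal sketch of what that setup looks like with Unsloth. The model ID, sequence length, and LoRA hyperparameters below are illustrative assumptions rather than our exact configuration, and argument names can shift between Unsloth releases.

```python
# Minimal fine-tuning setup sketch: load CodeGemma 2B in 4-bit and attach LoRA adapters.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="google/codegemma-2b",  # gated repo: requires accepting the license + an HF token
    max_seq_length=2048,
    load_in_4bit=True,                 # 4-bit quantization keeps it within a free Colab GPU
)

# Train only small LoRA adapters instead of the full 2B weights.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)
```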
Problem 2: The Fine-Tuning Gauntlet
While the process was familiar, it presented new, real-world challenges that are common in AI development.
Challenges:
Gated Model Access: We first hit a 401 Unauthorized error. Unlike our previous model, CodeGemma is a "gated" model. This required us to first accept its license terms on the Hugging Face website and then add an authentication step to our Colab notebook to gain access (see the login sketch after this list).
Evolving Libraries: We encountered a series of TypeError and RuntimeError issues. These were caused by recent updates to the Unsloth and datasets libraries, where function arguments had changed. This is a frequent occurrence in the fast-moving AI space and required careful debugging and script updates.
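The authentication step itself is small. Below is a sketch of the login cell, assuming the token is stored as a Colab secret named HF_TOKEN; the secret name and storage choice are illustrative, not anything Aether-specific.

```python
# Cell 1 – authenticate with Hugging Face so the gated CodeGemma weights can be downloaded.
from google.colab import userdata      # Colab's built-in secrets store
from huggingface_hub import login

login(token=userdata.get("HF_TOKEN"))  # assumes a Colab secret named HF_TOKEN
```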
Solution: After stepping through each error, we created a definitive, three-cell Colab script that cleanly separated the login, training, and GGUF conversion steps, making the process robust and repeatable.
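To give a feel for the remaining two cells, here is a sketch of the training and GGUF-export steps. The hyperparameters, dataset field name, and output paths are assumptions for illustration, and, as the errors above showed, exact TRL/Unsloth argument names may differ between library versions.

```python
# Cell 2 – supervised fine-tuning on the IaC dataset with TRL's SFTTrainer.
from trl import SFTTrainer
from transformers import TrainingArguments

trainer = SFTTrainer(
    model=model,                  # LoRA-wrapped CodeGemma from the setup cell
    tokenizer=tokenizer,
    train_dataset=dataset,        # assumed: examples rendered into a single "text" field
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        max_steps=200,            # the initial, too-shallow run described below
        learning_rate=2e-4,
        fp16=True,
        output_dir="aether-lite-checkpoints",
    ),
)
trainer.train()

# Cell 3 – export to GGUF so the model can run locally through llama.cpp.
model.save_pretrained_gguf("aether-lite", tokenizer, quantization_method="q4_k_m")
```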
Problem 3: The Hallucination—When Small Models Go Wrong
This was the most critical finding of the entire project so far. When we tested the newly trained "Aether-Lite" model in our local CLI, it failed spectacularly.
The Output: For a simple prompt like "Create a public S3 bucket for a website," the model started correctly but then entered a repetition loop. It generated hundreds of lines of garbled, nonsensical, and duplicated code, completely failing the task.
Cause: This is a classic case of an undertrained model. Our initial fine-tuning run of 200 steps was enough to teach the model the style of Terraform, but it was not enough for the smaller model to learn the deeper reasoning required to know when a task is complete. While much faster, smaller models are more susceptible to this "hallucination" and require more thorough training to become reliable.
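Deeper training is the real fix, but the CLI can also put guardrails around generation so a runaway loop cannot flood the terminal. The sketch below assumes the GGUF model is served locally with llama-cpp-python; the file name, stop strings, and sampling values are illustrative.

```python
# Inference-side guardrails against repetition loops (a mitigation, not a cure).
from llama_cpp import Llama

llm = Llama(model_path="aether-lite/unsloth.Q4_K_M.gguf", n_ctx=2048)

out = llm(
    "Create a public S3 bucket for a website.",
    max_tokens=512,        # hard cap: a runaway loop can't emit hundreds of lines
    repeat_penalty=1.2,    # penalize verbatim repetition
    temperature=0.2,
    stop=["\n\n\n"],       # assumed stop sequence; tune to the prompt format
)
print(out["choices"][0]["text"])
```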
Key Takeaways & The Path Forward
The Trade-off is Real: We have now seen firsthand the trade-off between model size, inference speed, and output quality. "Aether-Lite" is fast, but our initial version is not yet smart enough.
Training Depth is Crucial: Fine-tuning is not a one-shot process. The number of training steps is critical, especially for smaller models.
Want to Follow Along?
We are on the verge of creating a truly viable "Aether-Lite" model. In the next update, we'll share the results of our deeper training run and move into the final phase: benchmarking both the "Pro" and "Lite" models and publishing our findings.
I’ll be sharing weekly progress: issues, logs, architecture, and the AI model itself. If you've solved similar problems (like automated cloud optimization or building AI developer tools), I’d love to hear your insights.