GSoC Weeks 9 and 10: Enhancing Code with Refactoring

Anshuman Sahoo

These weeks taught me a humbling lesson that I'll carry throughout my career - I thought writing software from scratch was the hardest thing, but now I think maintaining and reading code that's not written by you is much harder and takes significantly more time. Reading code that others wrote requires a different mindset than writing new code. You need patience, curiosity, and the humility to admit when you don't understand something.

That's why the refactoring work has been taking so much time, but it's also why it's been so valuable for my growth as a developer.

Challenges Encountered

When you're building a plugin system from scratch, you have the luxury of clean interfaces and perfect knowledge of every component. But when you're integrating that system with existing code, you enter a different world entirely.

The Go Map Iteration Surprise

I discovered what looked like a race condition: the VM type would resolve differently depending on whether it was resolved during instance creation or by the host agent. The real cause was something I didn't know at the time: Go's map iteration order is unspecified and deliberately randomized. Iterating over the same map repeatedly can yield its entries in a different order each time, which is a problem for a plugin system where driver priority matters.

m := map[string]int{
    "apple":  1,
    "banana": 2,
    "cherry": 3,
    "date":   4,
}

for key, value := range m {
    fmt.Printf("%s: %d\n", key, value)
}

// First iteration       Second iteration (might produce)
// cherry: 3             date: 4
// date: 4               apple: 1
// apple: 1              banana: 2
// banana: 2             cherry: 3
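A common way to get a repeatable order is to collect the keys, sort them, and iterate over the sorted slice. The following is a minimal sketch of that idea, not the code used in Lima; `sortedKeys` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"sort"
)

// sortedKeys returns the keys of m in ascending order so that callers
// can walk the map in the same sequence on every run.
func sortedKeys(m map[string]int) []string {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

func main() {
	m := map[string]int{"apple": 1, "banana": 2, "cherry": 3, "date": 4}

	// Ranging over the sorted keys instead of the map itself makes the
	// output order independent of Go's randomized map iteration.
	for _, k := range sortedKeys(m) {
		fmt.Printf("%s: %d\n", k, m[k])
	}
}
```

When priority matters more than alphabetical order, the same approach works with a custom comparison (for example via `sort.Slice`) or with an explicit, ordered slice of driver names.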

One of the most significant architectural changes during this period was the implementation of a completely new VM type resolution system. Instead of Lima trying to "guess" the right driver based on user configuration, we now have a deterministic, OS-specific default system. Previously, Lima would analyse the user configuration and try to pick a compatible driver automatically. Now, Lima picks the best default driver for your OS and explicitly tells you whether your configuration is compatible with it.
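The "default per OS, then validate explicitly" idea can be sketched roughly as follows. Everything here is illustrative: the `defaultVMTypes` table, the `config` fields, the `acceptors` map, and the `resolve` function are assumptions made for the sketch and do not reflect Lima's actual code or its real defaults.

```go
package main

import (
	"errors"
	"fmt"
	"runtime"
)

// config is a hypothetical, heavily reduced instance configuration.
type config struct {
	VMType         string // empty means "use the OS default"
	NeedsEmulation bool   // illustrative flag a driver might reject
}

// defaultVMTypes maps a GOOS value to an illustrative default driver.
// The pairings are assumptions for this sketch only.
var defaultVMTypes = map[string]string{
	"darwin":  "vz",
	"linux":   "qemu",
	"windows": "wsl2",
}

// acceptors holds one hypothetical validation function per driver.
var acceptors = map[string]func(config) error{
	"qemu": func(config) error { return nil },
	"wsl2": func(config) error { return nil },
	"vz": func(c config) error {
		if c.NeedsEmulation {
			return errors.New("emulation is not supported")
		}
		return nil
	},
}

// resolve picks the explicit VM type if one is set, otherwise the OS
// default, and then reports clearly whether the driver accepts c.
func resolve(goos string, c config) (string, error) {
	vmType := c.VMType
	if vmType == "" {
		def, ok := defaultVMTypes[goos]
		if !ok {
			return "", fmt.Errorf("no default VM type for OS %q", goos)
		}
		vmType = def
	}
	accept, ok := acceptors[vmType]
	if !ok {
		return "", fmt.Errorf("unknown VM type %q", vmType)
	}
	if err := accept(c); err != nil {
		return "", fmt.Errorf("VM type %q rejects this configuration: %w", vmType, err)
	}
	return vmType, nil
}

func main() {
	vmType, err := resolve(runtime.GOOS, config{})
	fmt.Println(vmType, err)
}
```

The point of the design is that the choice never depends on iteration order or on guessing: the same OS and configuration always resolve to the same driver, or to an explicit error.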

Test Environment Challenges

Working with existing tests presented its own set of challenges. The tests were written with assumptions about the old architecture, and my plugin system changes broke many of these assumptions.

Driver Interface Evolution

As I integrated the plugin system deeper into Lima's core with modularity in mind, the driver interface continued to evolve. What started as a simple interface grew more sophisticated to handle real-world requirements (its final form is still uncertain and will become clearer as more drivers are added to Lima). Many thanks to my mentor, Anders, for testing the driver plugin system with new drivers (AC/DC) and giving his input. Seeing these proof-of-concept drivers validates that the plugin system architecture is working as intended. The interface changed along these lines:

type Driver interface {
    Create() error          // Renamed from Initialize() for clarity
    Delete() error          // New function for VM cleanup
    InspectStatus() string  // New function for inspection
    AcceptConfig() error    // New function to validate config by the driver
    FillConfig() error      // New function to fill in driver default values
    SSHAddress() string     // New function for dynamic addressing
}
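To make the shape of this interface concrete, here is a sketch of a tiny in-memory driver that satisfies it. The `fakeDriver` type, its fields, its default of one CPU, and the order in which `main` calls the methods are all hypothetical and only serve the example; a stub like this can also be handy when tests need a driver without a real VM.

```go
package main

import (
	"errors"
	"fmt"
)

// Driver mirrors the simplified interface shown above.
type Driver interface {
	Create() error
	Delete() error
	InspectStatus() string
	AcceptConfig() error
	FillConfig() error
	SSHAddress() string
}

// fakeDriver is a hypothetical driver that only tracks state in memory.
type fakeDriver struct {
	created bool
	cpus    int
}

// Compile-time check that *fakeDriver implements Driver.
var _ Driver = (*fakeDriver)(nil)

func (d *fakeDriver) Create() error { d.created = true; return nil }
func (d *fakeDriver) Delete() error { d.created = false; return nil }

func (d *fakeDriver) InspectStatus() string {
	if d.created {
		return "Created"
	}
	return "Missing"
}

// AcceptConfig rejects values this driver cannot work with.
func (d *fakeDriver) AcceptConfig() error {
	if d.cpus < 0 {
		return errors.New("cpus must not be negative")
	}
	return nil
}

// FillConfig sets driver defaults for values the user left empty.
func (d *fakeDriver) FillConfig() error {
	if d.cpus == 0 {
		d.cpus = 1 // illustrative default
	}
	return nil
}

func (d *fakeDriver) SSHAddress() string { return "127.0.0.1" }

func main() {
	var d Driver = &fakeDriver{}
	if err := d.FillConfig(); err != nil {
		fmt.Println("fill:", err)
		return
	}
	if err := d.AcceptConfig(); err != nil {
		fmt.Println("accept:", err)
		return
	}
	if err := d.Create(); err != nil {
		fmt.Println("create:", err)
		return
	}
	fmt.Println(d.InspectStatus(), d.SSHAddress())
}
```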

One elegant solution that emerged was the use of boolean feature flags to handle driver-specific capabilities:

type DriverFeatures struct {
    DynamicSSHAddress    bool // Some drivers can change SSH ports dynamically
    SkipSocketForwarding bool // Some drivers handle forwarding internally
}

This approach allows the core Lima system to adapt its behavior based on driver capabilities without hardcoding driver-specific logic everywhere.
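As a rough illustration of how core code might branch on these flags, the sketch below builds a list of startup steps from a `DriverFeatures` value. The `planStartup` function and the step descriptions are made up for this example and are not taken from Lima.

```go
package main

import "fmt"

// DriverFeatures mirrors the struct shown above.
type DriverFeatures struct {
	DynamicSSHAddress    bool
	SkipSocketForwarding bool
}

// planStartup returns the steps a hypothetical core would perform for a
// driver with the given features. The core only looks at capabilities,
// never at the driver's name.
func planStartup(f DriverFeatures) []string {
	steps := []string{"start VM"}
	if f.DynamicSSHAddress {
		steps = append(steps, "ask driver for SSH address")
	} else {
		steps = append(steps, "use configured SSH address")
	}
	if !f.SkipSocketForwarding {
		steps = append(steps, "set up socket forwarding")
	}
	return steps
}

func main() {
	// A driver that reports its own SSH address and forwards sockets itself.
	for _, step := range planStartup(DriverFeatures{
		DynamicSSHAddress:    true,
		SkipSocketForwarding: true,
	}) {
		fmt.Println(step)
	}
}
```

Adding a new driver then means declaring its capabilities rather than adding another driver-specific branch to the core.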

The Server Shutdown Bug Hunt

Perhaps the most challenging debugging session involved a server shutdown bug that took considerable time to track down:

Failed to check guest agent forwarding: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial unix /tmp/lima-driver-vz-1917.sock: connect: no such file or directory"

The issue was subtle: QEMU shared the context used to start the instance, so when the instance was stopped, limactl stop's SIGKILL would leave a waiter untriggered. The fix required giving QEMU its own context and using a buffered wait channel:

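func (l *LimaQemuDriver) Start() error {
    // Separate context for QEMU, independent of the start context
    ctx := context.Background()

    // ... qCmd is created and started here (omitted in this excerpt) ...

    // Buffered, so the send below succeeds even if nobody is receiving yet
    qWaitCh := make(chan error, 1)

    go func() {
        qWaitCh <- qCmd.Wait()
        close(qWaitCh) // Later receives return immediately instead of blocking
    }()

    return nil
}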

This bug taught me about the importance of proper context management in concurrent Go programs. I still have a lot to learn about contexts in Go.

Preparing for the Finish Line

As I write this, we're very close to the GSoC final submissions. I'm preparing the final PR for merge. The plugin system is now deeply integrated with Lima's core functionality, but there's still work to be done in polishing edge cases and ensuring backwards compatibility. The refactoring work took significantly longer than initially anticipated, which meant I wasn't able to implement some of the exciting features we discussed in earlier blog posts:

  • Implement the libkrun driver

  • Accelerate the ga.sock proxy connection

  • Document how to implement a new driver in Lima

However, these were always marked as optional add-ons to the core plugin system functionality. The fundamental goal of GSoC was to create a working plugin architecture that enables external drivers, and that goal has been achieved successfully. The libkrun driver addition and ga.sock optimisations remain excellent candidates for future contributions - either by me or by other community members who can now use the plugin system we've built.

What's left:

  • Polish, split, and merge the final PR (over 2,200 additions and 1,500 deletions, touching nearly every part of Lima's core architecture)

  • GitHub gist summarising the entire summer's work

The next update will be the final GSoC summary - the culmination of this incredible learning journey!
