Notes: Local LLM (Ollama) Co-pilot Alternative


Over the holidays and in the first week of 2025, I muddled through setting up an alternative to GitHub Copilot using Ollama on my Mac and Continue as the UI in VS Code. I was thinking about writing an article about it and posting it here. However, this article on the IBM Developer blog, Build a local AI co-pilot using IBM Granite Code, Ollama, and Continue, is very thorough. I'd say it covers pretty much everything, though I cannot vouch for their models, simply because I have not yet tried them (see update below). Here are some of the settings from the Continue config I'm using right now:
{
  "models": [
    {
      "title": "Llama3 Chat",
      "model": "llama3-8b",
      "provider": "ollama"
    },
    {
      "model": "AUTODETECT",
      "title": "Autodetect",
      "provider": "ollama"
    },
    {
      "model": "StarCoder2:15b",
      "title": "StarCoder2:15b",
      "provider": "ollama"
    }
  ],
  "tabAutocompleteModel": {
    "title": "StarCoder2:3b",
    "model": "StarCoder2:3b",
    "provider": "ollama"
  },
  "embeddingsProvider": {
    "title": "Nomic Embed Text",
    "provider": "ollama",
    "model": "nomic-embed-text"
  }
}
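Note that Continue can only talk to models Ollama already has, so each model referenced in the config needs to be pulled first. A minimal sketch, assuming the standard tags in the Ollama library (double-check the tag names, since they may not match the titles in the config exactly):

# fetch the chat, autocomplete, and embeddings models referenced above
ollama pull llama3:8b
ollama pull starcoder2:15b
ollama pull starcoder2:3b
ollama pull nomic-embed-text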
I started on this journey of discovery because of a confluence of events. First, GitHub made a serious pitch to get me to pay for Copilot, sending me lots of e-mails about an "expiring coupon". I did some research and figured out that "expiring coupon" meant they wanted me to start paying for Copilot. Then I read this post on Simon Willison's blog: I can now run a GPT-4 class model on my laptop. I was inspired to try my hand at the same, but discovered that the model Willison was using was still way too big for my Mac. Then GitHub started pinging me that "Copilot is free." So I gave up on local AI and turned Copilot back on… until I figured out that "free" for Copilot meant "free for a limited time."

So I gave it a week's worth of discovery time, and found some articles on models that would be better suited to a Copilot-esque, LLM-powered coding assistant. After a bunch of experimentation, I settled on Llama3-8b as my chat model and StarCoder2:3b as my autocomplete model. This has worked out fairly well: it feels pretty much like working with Copilot. That isn't as much of a compliment as you'd imagine, because Copilot (and autocomplete in general) can be pretty annoying at times.
Future experimentation
I plan to try out the IBM models to see if they work any better, and I'll update this article with my thoughts on that. I'd also like to experiment with writing system prompts using Continue's actions feature.
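I haven't written any of these yet, so the following is only a guess at the shape of it, based on Continue's documentation: config.json accepts a top-level systemMessage, and customCommands can wrap a reusable prompt in a slash command. The command name and prompt text below are made up for illustration:

{
  "systemMessage": "You are a concise coding assistant. Prefer small, idiomatic changes and briefly note trade-offs.",
  "customCommands": [
    {
      "name": "docstring",
      "description": "Write a docstring for the selected code (hypothetical example)",
      "prompt": "{{{ input }}}\n\nWrite a clear docstring for the code above. Output only the docstring."
    }
  ]
}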
UPDATE: the IBM models are really nice
First, I must send a shout-out to my colleague at UCSF, Eric Guerin, who first brought the IBM blog post to my attention. I've tried out the IBM models, and they feel much faster than the configuration I was using before (with StarCoder and Nomic Embed Text). I invite you to play around with different models and see what fits best for your work, but in the end, I think I'll stick with the IBM models.
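For the curious, swapping the Granite models into the earlier config looks roughly like the sketch below. This mirrors the IBM article's setup rather than quoting my exact config; granite-code is published in the Ollama library in 3b, 8b, 20b, and 34b variants, and which sizes run comfortably depends on your hardware:

{
  "models": [
    {
      "title": "Granite Code 8b",
      "model": "granite-code:8b",
      "provider": "ollama"
    }
  ],
  "tabAutocompleteModel": {
    "title": "Granite Code 3b",
    "model": "granite-code:3b",
    "provider": "ollama"
  },
  "embeddingsProvider": {
    "title": "Nomic Embed Text",
    "provider": "ollama",
    "model": "nomic-embed-text"
  }
}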
Written by Hardy Pottinger

Hardy Pottinger has been a DSpace Committer since 2011 and works for the California Digital Library as a Publishing Systems Developer. He is currently learning a lot about many different tech stacks, but especially enjoys working with the Janeway scholarly publishing platform. Before joining CDL, he worked for the UCLA Library as a developer on both their services team (back-end stuff) and their applications team (front-end stuff). He is keenly interested in DevOps tools, technologies, and cultures, loves working at home, and will talk your ear off about it.