Run an LLM on your computer with Ollama, Ngrok, and Cursor


Here’s a quick how-to guide for running a large language model (LLM) locally using three tools: Ollama, Ngrok, and Cursor IDE. This setup is perfect if you want to test AI features without relying on OpenAI’s cloud, or if you're just nerding out and want more control.
Tools We'll Be Using
Cursor (🔗 https://www.cursor.com)
Cursor is a developer-focused IDE with AI built right in. You can switch between different models and agents, all powered by your OpenAI API key (unless you’re paying Cursor directly to use theirs).
Some standout features:
Understands your codebase and adds context to prompts
Runs commands for you (with your approval)
Spots bugs and offers fixes
Autocompletes like a champ
Built-in smart reviews
It’s VS Code on steroids, with some real AI muscle.
Ngrok (🔗 https://ngrok.com)
Ngrok is an API gateway service that handles the infrastructure around your API so that you can focus on your business rules instead of secure connections and plumbing. While it offers a full suite of infrastructure tools, we’ll focus on its tunneling feature (the same one used for webhook testing), which gives us a public URL pointing at a local server.
Ollama (🔗 https://ollama.com)
Ollama is a fantastic open-source tool that lets you run LLMs locally. You can pull supported models, serve them through a local API, and even customize or train your own.
Browse available models at ollama.com/library
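As a taste of that customization (something to try once you’ve pulled a model in the steps below), here’s a minimal Modelfile sketch; the base model tag, file name, and custom model name are placeholders, so adjust them to whatever you actually use:
# Modelfile: a custom variant built on top of a model you've already pulled
FROM gemma3:4b
PARAMETER temperature 0.3
SYSTEM """You are a concise coding assistant."""
Build and run it with:
ollama create my-assistant -f Modelfile
ollama run my-assistant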
Let’s Get Started
1. Pull LLMs with Ollama
Once everything’s installed, the first step is to pick and download an LLM. Just a heads-up: LLMs are resource-hungry. Bigger models need more RAM and CPU, so make sure your machine can handle it.
To install a model:
ollama pull <model>
For example:
ollama pull gemma3:4b
You can check installed models with:
ollama list
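To sanity-check that a pulled model actually loads and answers (assuming you grabbed gemma3:4b as in the example above), run it once from the terminal:
ollama run gemma3:4b "Say hello in one sentence."
If you get a reply, the model is ready to be served.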
2. Set Up Ngrok
Create an Ngrok account and enable Multi-Factor Auth (highly recommended). Then, install the CLI and connect it with your account by running:
ngrok config add-authtoken <your-token>
The command above will set up your computer to connect securely with the Ngrok Web platform, but it won’t secure any endpoint you later expose.
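If you want to double-check the agent setup, the CLI can validate your local configuration file (subcommand names may vary slightly between agent versions):
ngrok config check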
🚧 Heads up: Your Ngrok endpoint exposes your local server to the internet. Even on the free plan, Ngrok gives you options like auth, IP filtering, and rate limits. Worth setting up later!
3. Run the Ollama Server with Ngrok
Ollama can serve your model through a local API. Start it like this:
OLLAMA_ORIGINS=* ollama serve
By default, it runs on port 11434.
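Before exposing anything, you can confirm the server answers locally; Ollama’s /api/tags route returns the models you have installed:
curl http://localhost:11434/api/tags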
Now expose that port with Ngrok:
ngrok http 11434 --host-header="localhost:11434"
Ngrok will generate a public HTTPS URL (called an “endpoint” in Ngrok terminology) that maps to your local server.
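A quick way to verify the endpoint is to hit the models route through it; the subdomain below is a placeholder for the one Ngrok prints in your terminal:
curl https://your-subdomain.ngrok-free.app/api/tags
It should return the same model list as the local check above.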
We now have a public URL that connects the internet to our local Ollama server. It’s time to tell Cursor how to use this URL to reach Ollama.
4. Connect Ollama to Cursor IDE
Add Your Model
In Cursor, go to Settings > Cursor Settings > Models.
You’ll see some defaults, but they likely won’t match your locally installed model names. Hit + Add Model and enter the exact model name from ollama list.
Override OpenAI API Base URL
Scroll to the OpenAI API Key section and expand Override OpenAI Base URL.
Paste your Ngrok HTTPS URL here. You can put anything in the API key field; it isn’t validated in this case, since our API doesn’t have any authentication method enabled yet.
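If you’re curious what Cursor will actually call, Ollama also speaks the OpenAI-compatible chat format under /v1, so you can reproduce a request by hand (the subdomain and model name are placeholders for yours):
curl https://your-subdomain.ngrok-free.app/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gemma3:4b", "messages": [{"role": "user", "content": "Hello"}]}'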
5. Test the Setup
To make sure everything works, you should:
✅ Be running the Ollama server
✅ Have Ngrok expose port 11434
✅ Have your LLM registered in Cursor
✅ Set the Ngrok URL in the API override
Now, toggle the Enable OpenAI API Key slider in Cursor’s settings. It’ll ping your API to confirm the connection.
If there's an error, Cursor will pop up the response — super useful for debugging.
Once it connects, you’re good to go! Start chatting with your local LLM in the left panel.
Make sure to:
Switch prompt mode to Ask or Manual
Disable automatic model selection
Manually pick one of your local models
Takeaways
Congrats — you’ve got a local LLM running and integrated into your dev workflow.
That said, performance varies depending on your hardware. Try a smaller model (1B or 3B parameters) if your machine struggles. Models of 7B parameters and up usually need at least 16GB of RAM.
And don’t be afraid to experiment. Pull multiple models and see which works best for your use case.
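For example, a couple of small models worth trying (tags change over time, so double-check them in the library first):
ollama pull llama3.2:1b
ollama pull qwen2.5:3b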
Also, remember: you don’t need Cursor or Ngrok specifically. All you need is:
Ollama to run the model locally
Some way to expose it via a fixed URL (Ngrok, localtunnel, etc.)
An IDE that lets you set a custom OpenAI API base URL
Next Steps
🔄 Automate it all
Right now, everything is manual. Consider scripting the startup so that one command spins up the server and the tunnel. A minimal sketch, reusing the port and commands from the steps above:
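#!/usr/bin/env bash
# start-local-llm.sh: start the Ollama server and the Ngrok tunnel in one go
set -euo pipefail

OLLAMA_ORIGINS="*" ollama serve &          # run the model server in the background
trap 'kill $(jobs -p) 2>/dev/null' EXIT    # stop the background server when the script exits
sleep 2                                    # give it a moment to bind port 11434
ngrok http 11434 --host-header="localhost:11434"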
🔐 Secure your endpoint
Even if your CLI is authenticated, your public API is wide open. Use Ngrok’s free auth and rate limits to avoid abuse, and once you do, set the matching credentials in Cursor so it can still connect to the API.
🔗 Claim a static Ngrok URL
Ngrok lets you claim a static domain for free. This means you won't have to update Cursor every time you restart the tunnel. Just be sure to lock it down properly if you go that route.
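Putting those last two ideas together, a locked-down tunnel on a reserved domain could look roughly like this; the flag names are from the current v3 agent and the credentials and domain are placeholders, so check Ngrok’s docs for your version:
ngrok http 11434 --host-header="localhost:11434" --basic-auth "me:a-strong-password" --domain your-reserved-name.ngrok-free.app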