Run an LLM on your computer with Ollama, Ngrok, and Cursor


Here’s a quick how-to guide for running a large language model (LLM) locally using three tools: Ollama, Ngrok, and Cursor IDE. This setup is perfect if you want to test AI features without relying on OpenAI’s cloud, or if you're just nerding out and want more control.
Tools We'll Be Using
Cursor (🔗 https://www.cursor.com)
Cursor is a developer-focused IDE with AI built right in. You can switch between different models and agents, all powered by your OpenAI API key (unless you’re paying Cursor directly to use theirs).
Some standout features:
Understands your codebase and adds context to prompts
Runs commands for you (with your approval)
Spots bugs and offers fixes
Autocompletes like a champ
Built-in smart reviews
It’s VS Code on steroids, with some real AI muscle.
Ngrok (🔗 https://ngrok.com)
Ngrok is an API gateway service that handles the infrastructure around your API so that you can focus on your business rules instead of secure connections and plumbing. While it offers a full suite of infrastructure tools, we’ll focus on its tunneling feature (the same one used for webhook testing), which gives us a public URL pointing at a local server.
Ollama (🔗 https://ollama.com)
Ollama is a fantastic open-source tool that lets you run LLMs locally. You can pull supported models, serve them through a local API, and even customize or train your own.
Browse available models at ollama.com/library
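As a taste of that customization (something to try once you’ve pulled a model in the steps below), here’s a minimal Modelfile sketch; the base model tag, file name, and custom model name are placeholders, so adjust them to whatever you actually use:
# Modelfile: a custom variant built on top of a model you've already pulled
FROM gemma3:4b
PARAMETER temperature 0.3
SYSTEM """You are a concise coding assistant."""
Build and run it with:
ollama create my-assistant -f Modelfile
ollama run my-assistant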
Let’s Get Started
1. Pull LLMs with Ollama
Once everything’s installed, the first step is to pick and download an LLM. Just a heads-up: LLMs are resource-hungry. Bigger models need more RAM and CPU, so make sure your machine can handle it.
To install a model:
ollama pull <model>
For example:
ollama pull gemma3:4b
You can check installed models with:
ollama list
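To sanity-check that a pulled model actually loads and answers (assuming you grabbed gemma3:4b as in the example above), run it once from the terminal:
ollama run gemma3:4b "Say hello in one sentence."
If you get a reply, the model is ready to be served.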
2. Set Up Ngrok
Create an Ngrok account and enable Multi-Factor Auth (highly recommended). Then, install the CLI and connect it with your account by running:
ngrok config add-authtoken <your-token>
The command above will set up your computer to connect securely with the Ngrok Web platform, but it won’t secure any endpoint you later expose.
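If you want to double-check the agent setup, the CLI can validate your local configuration file (subcommand names may vary slightly between agent versions):
ngrok config check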
🚧 Heads up: Your Ngrok endpoint exposes your local server to the internet. Even on the free plan, Ngrok gives you options like auth, IP filtering, and rate limits. Worth setting up later!
3. Run the Ollama Server with Ngrok
Ollama can serve your model through a local API. Start it like this:
OLLAMA_ORIGINS=* ollama serve
By default, it runs on port 11434.
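Before exposing anything, you can confirm the server answers locally; Ollama’s /api/tags route returns the models you have installed:
curl http://localhost:11434/api/tags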
Now expose that port with Ngrok:
ngrok http 11434 --host-header="localhost:11434"
Ngrok will generate a public HTTPS URL (called an “endpoint” in Ngrok terminology) that maps to your local server.
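A quick way to verify the endpoint is to hit the models route through it; the subdomain below is a placeholder for the one Ngrok prints in your terminal:
curl https://your-subdomain.ngrok-free.app/api/tags
It should return the same model list as the local check above.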
We now have a public URL that connects the internet to our local Ollama server. It’s time to tell Cursor how to use this URL to reach Ollama.
4. Connect Ollama to Cursor IDE
Add Your Model
In Cursor, go to Settings > Cursor Settings > Models.
You’ll see some defaults, but they likely won’t match your locally installed model names. Hit + Add Model and enter the exact model name from ollama list.
Override OpenAI API Base URL
Scroll to the OpenAI API Key section and expand Override OpenAI Base URL.
Paste your Ngrok HTTPS URL here. You can put anything in the API key field; it isn’t validated in this case, since our API doesn’t have any authentication method enabled yet.
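If you’re curious what Cursor will actually call, Ollama also speaks the OpenAI-compatible chat format under /v1, so you can reproduce a request by hand (the subdomain and model name are placeholders for yours):
curl https://your-subdomain.ngrok-free.app/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gemma3:4b", "messages": [{"role": "user", "content": "Hello"}]}'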
5. Test the Setup
To make sure everything works, you should:
✅ Be running the Ollama server
✅ Have Ngrok expose port 11434
✅ Have your LLM registered in Cursor
✅ Set the Ngrok URL in the API override
Now, toggle the Enable OpenAI API Key slider in Cursor’s settings. It’ll ping your API to confirm the connection.
If there's an error, Cursor will pop up the response — super useful for debugging.
Once it connects, you’re good to go! Start chatting with your local LLM in the left panel.
Make sure to:
Switch prompt mode to Ask or Manual
Disable automatic model selection
Manually pick one of your local models
Takeaways
Congrats — you’ve got a local LLM running and integrated into your dev workflow.
That said, performance varies depending on your hardware. Try a smaller model (1B or 3B parameters) if your machine struggles. Models of 7B parameters and up usually need at least 16GB of RAM.
And don’t be afraid to experiment. Pull multiple models and see which works best for your use case.
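For example, a couple of small models worth trying (tags change over time, so double-check them in the library first):
ollama pull llama3.2:1b
ollama pull qwen2.5:3b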
Also, remember: you don’t need Cursor or Ngrok specifically. All you need is:
Ollama to run the model locally
Some way to expose it via a fixed URL (Ngrok, localtunnel, etc.)
An IDE that lets you set a custom OpenAI API base URL
Next Steps
🔄 Automate it all
Right now, everything is manual. Consider scripting the startup so that one command spins up the server and the tunnel. A minimal sketch, reusing the port and commands from the steps above:
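#!/usr/bin/env bash
# start-local-llm.sh: start the Ollama server and the Ngrok tunnel in one go
set -euo pipefail

OLLAMA_ORIGINS="*" ollama serve &          # run the model server in the background
trap 'kill $(jobs -p) 2>/dev/null' EXIT    # stop the background server when the script exits
sleep 2                                    # give it a moment to bind port 11434
ngrok http 11434 --host-header="localhost:11434"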
🔐 Secure your endpoint
Even if your CLI is authenticated, your public API is wide open. Use Ngrok’s free auth and rate limits to avoid abuse, and once you do, set the matching credentials in Cursor so it can still connect to the API.
🔗 Claim a static Ngrok URL
Ngrok lets you claim a static domain for free. This means you won't have to update Cursor every time you restart the tunnel. Just be sure to lock it down properly if you go that route.
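Putting those last two ideas together, a locked-down tunnel on a reserved domain could look roughly like this; the flag names are from the current v3 agent and the credentials and domain are placeholders, so check Ngrok’s docs for your version:
ngrok http 11434 --host-header="localhost:11434" --basic-auth "me:a-strong-password" --domain your-reserved-name.ngrok-free.app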