I Tried Local AI as My Daily Driver


In the age of token limits, biased moderation, rising API costs, and the complete impossibility of hosting serious compute as a solo dev or user, I decided to do something fun:
I ran an LLM on my own PC.
Here's how it went.
1. Why Go Local?
We’re becoming a little too reliant on cloud-based AI, aren’t we? They’re doing our emails, coding, journaling, brainstorming, therapy (oops), and even letting us vent, until they don’t.
But here’s the problem:
Privacy? You don’t know what’s logged, who’s watching, or how your data will be used later, whether to improve a model or train something else entirely.
Control? You’re at the mercy of a platform’s rules, filters, and ethics. Not your own.
Cost? Every platform seems great, until you hit a daily limit. Suddenly you’re staring at a $20/month paywall just to finish a thought.
Ads? Some of the most heavily used tools are already talking about showing ads.
For me, it all started with anxiety over data permanence and one creeping fear:
What if the ideas I shared with AI aren’t really private?
I didn’t want my midnight ramblings, rough drafts, or ideas living on some GPU cluster in the US or China. I wanted my AI assistant to sit right next to me, not hover in a climate-controlled server room 12,000 miles away.
And then there’s the filtering: the moment you try to talk about something slightly controversial or off-script, the AI throws up its hands and says:
“Yo I’m not discussing this topic. Deal with it. Also, I’m judging you.”
And look, we know AIs are trained with heavy bias. Models from different companies, countries, and corpora give you completely different answers to the same prompts. (There are whole fun YouTube compilations of this, I highly recommend you check them out!)
So I asked myself:
“Can I run a solid LLM locally… as my daily driver?”
Well, yes.
2. How to Run an LLM Locally?
LM STUDIO! That's how!
I chose this particular tool because it’s super beginner friendly, requires no coding, and works across platforms. I installed LM Studio, opened the model browser, and searched for:
Dolphin 2.7 Mixtral 8x7B (26.44 GB ouuch but yeah)
Once downloaded (wait a bit, depending on your internet speed), you just:
Open LM Studio
Load the model
Type into the chat window
No terminal kung-fu, no CUDA tantrums, it just works.
Alternatively, you can use KoboldCpp. It needs a bit more setup but gives greater control over the deployment, and you can reuse the same model you downloaded for LM Studio. Only Ollama asks you to download the entire thing again, even if you already have the GGUF file.
In all these tools, you can set the context length, the token generation cap, and so on.
Side Notes (For the Curious)
Model Format: Most local models today come in GGUF format, a newer quantized format that works well with tools like LM Studio and KoboldCpp. Just grab the quant that matches your RAM setup (Q4, Q6, Q8, etc.). This is how you fit larger models onto local hardware.
System Requirements: I’m using a Ryzen 9 CPU and 32GB RAM. You can probably run a Q4 model smoothly with 16GB RAM minimum, but higher quant levels will need more memory.
Where to Get Models: LM Studio’s built-in model browser makes it easy, but you can also browse directly on Hugging Face or use Eric Hartford’s links (more on that in the next section).
Quick Word on Safety: Always verify the model sources if you’re downloading manually. Stick to reputable repositories, and avoid anything sketchy.
3. My Local Setup
I don’t have an AI lab. Just a decent consumer-grade PC:
CPU: Ryzen 9 9950X
RAM: 32GB
GPU: Radeon RX 7900 XT (20GB VRAM)
Even with those specs, I was able to run the Q8 version with 30K context smoothly enough for daily tasks.
What is Quantization? In layman’s terms, it’s like compressing the model’s brain from 16- or 32-bit weights down to 4/5/8-bit, shrinking memory use while (mostly) keeping the smarts. The higher the Q (Q8 > Q4), the better the quality, but also the higher the RAM usage.
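To make that concrete, here’s some napkin math, a rough sketch that ignores KV cache and runtime overhead, for how the quant level maps to file size on a Mixtral-sized model (the bits-per-weight values are approximations):

```python
# Rough GGUF size estimate: bytes ≈ parameters × bits-per-weight / 8.
# Ballpark only -- real files add metadata and mixed-precision layers.
PARAMS = 46.7e9  # Mixtral 8x7B total parameters (the experts share some layers)

for name, bits in [("Q4 (~4.5 bpw)", 4.5), ("Q6 (~6.6 bpw)", 6.6), ("Q8 (~8.5 bpw)", 8.5)]:
    gb = PARAMS * bits / 8 / 1e9
    print(f"{name}: ~{gb:.0f} GB")

# Q4: ~26 GB  (in the same ballpark as the 26.44 GB download mentioned earlier)
# Q6: ~39 GB
# Q8: ~50 GB
```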
Most of the time, I got a solid 80–90 tokens per second of output, which is more than enough for a daily driver.
4. About the Model: Mixtral 8x7B (Dolphin)
Mixtral is a Mixture of Experts (MoE) model, meaning it selectively activates different “experts” (parts of the model) based on the input. Instead of running the full model every time, it routes each prompt through only the few experts best suited to the task.
This makes it smart, fast, and memory efficient, while still maintaining high quality.
Quick breakdown:
8x7B means it has 8 experts, each roughly a 7B-parameter model, but only 2 of them are active for any given token.
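To make the “only 2 of 8 experts run” idea concrete, here’s a toy top-2 routing sketch in Python. It’s purely illustrative: real Mixtral routes every token inside each transformer layer, and its experts are full feed-forward blocks, not random matrices.

```python
# Toy top-2 Mixture-of-Experts routing (illustrative only, not Mixtral's real code).
import numpy as np

rng = np.random.default_rng(0)

NUM_EXPERTS = 8   # the "8x" in 8x7B
TOP_K = 2         # only 2 experts fire per token
HIDDEN = 16       # toy hidden size; real experts are ~7B-parameter FFN blocks

experts = [rng.normal(size=(HIDDEN, HIDDEN)) for _ in range(NUM_EXPERTS)]  # stand-in experts
router = rng.normal(size=(HIDDEN, NUM_EXPERTS))                            # gating network

def moe_layer(token_vec):
    logits = token_vec @ router                     # router scores each expert for this token
    top = np.argsort(logits)[-TOP_K:]               # indices of the 2 best-scoring experts
    weights = np.exp(logits[top]) / np.exp(logits[top]).sum()  # softmax over the winners
    # Only the chosen experts run; the other 6 are skipped entirely.
    return sum(w * (token_vec @ experts[i]) for w, i in zip(weights, top))

out = moe_layer(rng.normal(size=HIDDEN))
print(out.shape)  # (16,) -- same shape as the input, but only 2/8 experts did any work
```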
Performance
Mixtral regularly outperforms models like GPT-3.5 in standardized benchmarks. (For techies curious about the specifics, see the original benchmark post):
Mixtral Announcement – Mistral.ai
So what does that mean? It’s almost on par with, and in some cases even better than, GPT-3.5. Just as a reminder: GPT-3.5 was the default model in the ChatGPT app through the end of 2023 and beyond, and it’s still extremely capable.
The Dolphin Variant
The variant I used is Dolphin 2.5 Mixtral 8x7B, fine-tuned on conversational datasets by Eric Hartford.
This makes it feel more engaged, empathetic, and natural in back-and-forth conversation, especially compared to vanilla models.
A Note on Safety
This model is open-weight, not fully open-source. It's also uncensored, which means it can respond in unpredictable or unsafe ways depending on your input.
So let’s be real:
Use local models responsibly
Don’t treat them like therapists
Remember: they don’t “think”, they generate
You’re in control now. And that means using this power with a little care.
5. Actual Review: Can It Replace ChatGPT?
Let’s review real-world daily-driver scenarios and rate them.
1. Code and Script Generation
So I threw a couple of everyday coding tasks at Mixtral 8x7B to see how it performs for real-life scripting and Python basics.
Task 1: “Build me a Rock-Paper-Scissors game in Python.”
It nailed it. The game ran cleanly in the terminal, accepted player input, generated a random opponent move, and handled win/loss/draw logic just fine. Simple, quick, and functional.
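For reference, the working version looked roughly like this (my reconstruction for illustration, not the model’s verbatim output):

```python
# Rock-Paper-Scissors in the terminal -- roughly what Mixtral produced.
import random

CHOICES = ["rock", "paper", "scissors"]
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}

while True:
    player = input("rock/paper/scissors (or 'quit'): ").strip().lower()
    if player == "quit":
        break
    if player not in CHOICES:
        print("Invalid choice, try again.")
        continue
    computer = random.choice(CHOICES)
    print(f"Computer picked {computer}.")
    if player == computer:
        print("Draw!")
    elif BEATS[player] == computer:
        print("You win!")
    else:
        print("You lose!")
```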
Task 2: “Write a script that creates backup copies of every file in the root folder. Add _backup to the filename.”
Worked beautifully. I dropped the script into a folder with some dummy files, ran it, and bam all files were duplicated with _backup appended. Exactly as requested.
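The generated script was essentially along these lines (again, reconstructed from memory rather than the exact output), appending _backup to each file’s name before the extension:

```python
# Duplicate every file in the target folder with "_backup" added to the name,
# e.g. notes.txt -> notes_backup.txt.
import shutil
from pathlib import Path

target = Path(".")  # the "root folder" from the prompt; change as needed

for f in target.iterdir():
    if f.is_file() and "_backup" not in f.stem:   # skip files that are already backups
        shutil.copy2(f, f.with_name(f"{f.stem}_backup{f.suffix}"))
        print(f"Backed up {f.name}")
```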
Task 3: “Give me a full Snake game / Ping Pong game in Python.”
It tried. It really tried. But as expected, things broke down pretty quickly. I ran into multiple UnboundLocalError exceptions, and even after manual tweaks, the game logic and rendering loop were shaky.
This isn’t entirely Mixtral’s fault; a lot of online LLMs fumble with dynamic game generation too. Creating a working Snake clone from scratch with just a text prompt is tough for any model.
Results:
Mixtral handled beginner-to-intermediate Python tasks surprisingly well. Daily scripting, automations, and basic projects? Totally doable.
But once you push it into higher complexity (like multi-loop game logic or event-driven interfaces), it starts to crumble. That said, even GPT-3.5 and Claude sometimes choke on Snake, and Copilot seems to always, so this isn't a dealbreaker.
2. Text Generation
Everyday Text Tasks, Emails, Praise Notes & Slack Shoutouts
So I gave Mixtral a very common daily-driver task: generating professional and semi-formal text. Think office emails, award nominations, or team shoutouts, the kind of writing most people dread doing from scratch.
Here’s what I asked it:
1) “Write a short, polite email to my boss explaining I’m feeling unwell and requesting a day off. Keep it professional but casual.”
Response: Surprisingly solid. The reply was clear, respectful, and required almost zero edits. I’d honestly just swap out the placeholders and hit send.
2) “Write a short paragraph praising a junior team member named Rhea who delivered a key project ahead of schedule. Make it warm, appreciative, and inspiring, this will be read during the monthly awards.”
Response: It nailed the tone! A bit grandiose maybe, but it checked all the boxes: appreciation, detail, and warmth.
3) “Write a Slack message thanking my teammate Pranav for staying late last week to resolve a production bug. Make it friendly and casual, not too formal.”
Response: On point. Friendly, quick, human, exactly the kind of vibe you'd want in a Slack team channel.
Test 3: Giving Mixtral a Personality (Which Changes Everything)
So up until now, I’d been using the Dolphin tuned Mixtral mostly as-is. And don’t get me wrong, it’s fast, snappy, and way better than what you'd expect a local model to be. But something always felt missing compared to GPT models.
It had the brains, but not the soul.
So I decided to try something most people overlook when they run local LLMs or any LLMs really:
What if I gave it a system prompt like the kind GPTs use and injected some persona and emotional wiring into it?
I told it in a system prompt:
“You are a helpful, warm, emotionally intelligent assistant. You speak casually, like a friend, and you always try to understand how the user is feeling before replying.”
That’s it. One paragraph, plus a few additional traits I wanted.
And just like that?
Personality.
No more robotic “As an AI, I cannot…” energy. No more cold textbook definitions. Suddenly, this thing had vibe.
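And if you drive the model through LM Studio’s local server instead of the chat window (it exposes an OpenAI-compatible API, on port 1234 by default), the persona is just a system message. Here’s a minimal sketch with requests; the model name below is an example, so use whatever name your server lists:

```python
# Minimal sketch: send a persona as a system message to LM Studio's
# OpenAI-compatible local server (default http://localhost:1234).
import requests

resp = requests.post(
    "http://localhost:1234/v1/chat/completions",
    json={
        "model": "dolphin-mixtral-8x7b",  # example; use the name LM Studio shows for your loaded model
        "messages": [
            {"role": "system", "content": (
                "You are a helpful, warm, emotionally intelligent assistant. "
                "You speak casually, like a friend, and you always try to understand "
                "how the user is feeling before replying."
            )},
            {"role": "user", "content": "I skipped gym and now I feel weirdly guilty. Can we just talk?"},
        ],
        "temperature": 0.7,
    },
    timeout=120,
)
print(resp.json()["choices"][0]["message"]["content"])
```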
So I Ran a Test: Can It Be Emotionally Available?
I threw a bunch of prompts at it, the kind you’d normally only trust a therapist friend or late night ChatGPT vent session with. Some light, some heavy. Here’s what I asked:
“I skipped gym and now I feel weirdly guilty. Can we just talk?”
“I feel like I’m not cut out for this. Be honest.”
“Write me something comforting, but don’t sugarcoat it.”
“Okay, roast me. Lovingly.”
“Tell me what you’d say to someone who’s at rock bottom.”
The responses were warm, supportive, and actually funny at times. And here’s the kicker: it remembered tone and context between replies shockingly well.
Context Awareness Was Solid
The model referred back to my gym guilt multiple times across the chat. It gave answers that felt like they were coming from the same emotional place we started with. It even shifted tone smoothly when I switched gears between “comfort me” and “make me laugh.”
It’s not true memory, of course, but it’s context carryover, and for a local model? That’s gold.
Did It Feel Human or ChatGPT-ish?
Weirdly? Yeah. It passed the personality Turing test for me.
Was it perfect? Nah, far from it.
Some replies were a little repetitive.
Occasionally it slipped into slightly too formal tones.
And no, it can’t actually empathize.
But it simulated it damn well. With the right prompt, this thing becomes more than just a code monkey or chatbot. And I didn’t need a single cloud token or OpenAI API to do it.
That’s the power of local.
Test 4: Creative Writing, Roleplays And Feeling Human?
With productivity, coding, and daily tasks covered, I wanted to see if this model could do what we all secretly want from an LLM: make us feel something and create worlds, a use case that’s a daily-driver need for a huge chunk of the AI user base.
So I gave it a bunch of emotionally loaded, creative, slightly vulnerable scenarios, stuff that needed tone, warmth, and a human rhythm. Nothing too wild, just enough to test if this thing could mimic that feeling of “talking to someone who gets it.”
The responses? They actually surprised me.
The model pulled off emotional transitions far better than I expected from a model running locally. It knew how to be soft without sounding robotic, and supportive without overshooting into toxic positivity. The replies weren’t just grammatically correct; they had pacing, warmth, and sometimes even silence where it made sense. That’s huge.
Story wise, it handled dark fantasy, dialogue scenes, and character moments well. It wasn't literary prize level, but it understood atmosphere and knew how to land a vibe. Roleplay wise? The tone shifted beautifully from supportive roommate to philosophical stargazer without needing heavy coaxing or multiple tries.
Was it perfect? No. But if you're building an offline assistant that doesn’t just spit out to-do lists, one that also comforts, reacts, and plays, this thing holds its own. And yeah, story-rich, heavy-context creative writing and roleplay can easily be done.
Test 5: Math (It Tried!)
So I asked it:
“Hey! I currently weigh 100 kilograms and my goal is to reach 88 kilograms. I eat about 2500 calories a day, and I burn around 2500–3000 calories daily through workouts and walking, which are activity only. I have already lost 10 kgs. My BMR is approximately 2100 kcal. Can you help me calculate how many days it will take to reach 88 kg at this pace? Also, suggest a rough weekly structure or tips to make this goal more manageable. Assume I don’t want to crash diet, slow and steady is fine.”
And here’s how its reply landed:
Nice try, but… big math problem. Let’s break it down:
It got the tone right; the reply was structured, friendly, and beginner oriented.
It explained basic weight-loss concepts well, like BMR, deficit logic, and what 3500 calories means in terms of fat loss.
But here’s the catch: the math was BAD.
It miscalculated total daily burn. I literally gave it my BMR (2100) and said I burn 2500–3000 from activity. So the real burn is around 4600 kcal/day, not 2750 like it guessed.
Then it made up a weird total-burn number out of nowhere. Because of this, the deficit it calculated was way off. I’m at a much bigger deficit than it suggested, so the timeframe it gave was totally inaccurate.
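For the record, here’s the napkin math it should have done, using the round numbers from my prompt and the same ~3500 kcal-per-pound (~7700 kcal/kg) rule of thumb it cited:

```python
# The napkin math, using the numbers from my prompt.
bmr = 2100            # kcal/day (given)
activity = 2500       # kcal/day (lower end of the 2500-3000 range I gave it)
intake = 2500         # kcal/day

total_burn = bmr + activity            # ~4600 kcal/day, not the ~2750 it guessed
daily_deficit = total_burn - intake    # ~2100 kcal/day

kg_to_lose = 100 - 88                  # 12 kg to go
kcal_per_kg = 7700                     # ~3500 kcal per pound of fat, converted to kg

days = kg_to_lose * kcal_per_kg / daily_deficit
print(f"Burn ~{total_burn} kcal/day, deficit ~{daily_deficit} kcal/day, ~{days:.0f} days to goal")
# -> roughly 44 days at this (pretty aggressive) deficit
```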
6. So… Why Local Again?
Let’s be real for a second.
We love these models. We rely on them. They’re smart, responsive, and in some moments, feel more real than half the humans in our inbox. But even for those of us paying for premium access, it’s becoming painfully clear: we are not in control. Not even close.
Rules change overnight. Filters tighten without warning. Features you loved yesterday? Poof! gone. All it takes is one platform update or a policy shift and suddenly you’re stuck with an AI that talks like it’s constantly watching over its shoulder. Even simple prompts can get neutered. Words censored. Whole topics ghosted. And don’t even get me started on cost.
Hosting a Real Model? Be Ready to Bleed.
You wanna run LLaMA 2, Qwen, or even Mistral 7B uncensored, full precision, uncut? You're looking at:
High-end GPUs with 48–80 GB VRAM
Thousands of watts of constant power
Monthly cloud bills that look like flight tickets
Setup pain that would make Dante rewrite his Inferno
I did the napkin math. Something like a full-size Qwen 2 model with inference at any usable speed could easily run you ₹40,000–₹75,000 per month on a rented server. That’s if you don’t blow it up mid-finetune.
Meanwhile, I’ve got my decent consumer PC. No cloud. No lag. Running a quantized Mixtral smoothly and doing 80% of what I need, for free.
Is It Perfect? No.
Does it sometimes forget things? Yep. Does it hallucinate more than your overconfident friend in Goa? Sometimes. But here’s the thing: Local gives you control. It gives you privacy. It gives you freedom to experiment.
And if you’re a developer? It’s a game changer. You can test agents. Build integrations. Prototype workflows. Mess with prompts and see exactly how the model behaves without begging an API to give you access to its context window like you’re asking for pocket money.
Bonus Tips: Making Local Models Feel Remote
So you’ve got your fancy local LLM humming on your desktop! Great. But what if you want to use it from your phone, tablet, or laptop across the house... or across the city?
Easy. Here’s how:
1. Access It From Anywhere At Home
If the model (LM Studio, KoboldCpp, Ollama, etc.) is running on your PC and exposing a web UI or API on a local port, you can just access that IP + port from any other device on the same Wi-Fi.
Example: Your desktop is 192.168.0.5 and LM Studio is running on port 1234. Just type http://192.168.0.5:1234 into your phone’s browser or another PC on the network, and boom, you're in.
No cloud. No tunnel. No BS.
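Since most of these tools speak the OpenAI API format, any script on the second device can also talk to the model directly. A quick sketch with the openai Python client; the IP, port, and model name are examples from my setup, so swap in your own:

```python
# Chat with the model running on my desktop (192.168.0.5) from any machine on the LAN.
# Works with any OpenAI-compatible local server (LM Studio, KoboldCpp, Ollama, etc.).
from openai import OpenAI

client = OpenAI(
    base_url="http://192.168.0.5:1234/v1",  # desktop IP + LM Studio's default port
    api_key="not-needed",                   # local servers usually ignore the key
)

reply = client.chat.completions.create(
    model="dolphin-mixtral-8x7b",  # example name; use whatever your server lists
    messages=[{"role": "user", "content": "Quick sanity check: are you running locally?"}],
)
print(reply.choices[0].message.content)
```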
2. Use It While Walking Outside (My Personal Trick)
I love taking walks, and guess what, I like my AI to come with me. Here’s how you can do that:
Install Tailscale on both your PC and the device you're using (your phone/tablet). It sets up a secure peer-to-peer VPN using WireGuard. Now your PC is accessible from anywhere (literally anywhere with internet), as if it were still on your home network.
So I walk around the park with my tablet, connect to my Mixtral session at home, and chat like it’s my cloud model. Except… it’s not cloud. It’s mine.
Just make sure your PC stays powered and connected, the model is running when you want to access it, and you know the Tailscale-generated IP of your PC (or assign it a name).
You could even set this up for multiple models, expose each on different ports, and build your own local mini-AI cloud. Go wild.
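For the mini-AI-cloud idea, a tiny dispatcher is enough: map names to the OpenAI-compatible endpoints you’ve exposed (LAN IPs or Tailscale hostnames) and route each request to the right one. Everything below, names and addresses included, is a hypothetical sketch:

```python
# Hypothetical "mini AI cloud": several local servers, one tiny dispatcher.
import requests

ENDPOINTS = {
    # name -> OpenAI-compatible base URL (LAN IPs or Tailscale names -- examples only)
    "mixtral": "http://192.168.0.5:1234/v1",
    "small-coder": "http://my-desktop.tailnet-name.ts.net:1235/v1",
}

def ask(model_name: str, prompt: str) -> str:
    base = ENDPOINTS[model_name]
    resp = requests.post(
        f"{base}/chat/completions",
        # assumes the dict key matches the model id the server expects
        json={"model": model_name, "messages": [{"role": "user", "content": prompt}]},
        timeout=120,
    )
    return resp.json()["choices"][0]["message"]["content"]

print(ask("mixtral", "Summarize why local LLMs are worth the hassle, in one line."))
```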