Richland, WA  ·  Serving the Tri-Cities & Pacific Northwest
Field Notes

What Local AI Does That Cloud AI Can't (And Where Cloud Still Wins)

May 6, 2026

Most conversations about AI assume cloud. Open ChatGPT, open Claude, open Gemini. Pay $20 a month, get a chat box, done.

That covers most use cases. It doesn't cover all of them.

I run AI on my own hardware at home. Not as a hobby project, but as the way I do real work every day. That gives me a perspective most consultants don't have. Here's what local AI actually wins at, what cloud still wins at, and how to decide which problem belongs where.

What "local AI" actually means

When people say "local AI" they usually mean one of three things:

  1. An open-weight model (Llama, Qwen, Mistral, DeepSeek, IBM Granite, others) running on your own computer instead of in someone else's data center.
  2. A privately hosted version of a commercial model on infrastructure you control.
  3. An AI assistant that lives on your machine and never makes outbound calls.

For the rest of this post I'm talking about the first one: open-weight models running on hardware you own.

Where local wins

1. Your data never leaves your building

This is the obvious one, and it matters more than most people realize. Cloud providers all have privacy policies. Some are excellent. Some are vague. None of them can promise that your data physically does not exist on their servers.

For most businesses this is fine. For some businesses (healthcare with PHI, law firms with privileged matters, financial firms with client data, government-adjacent work, anything covered by export controls) it's the difference between using AI and not using AI.

Local AI doesn't make that problem easier. It removes it. The data was on your machine. The processing happened on your machine. The output stayed on your machine. There is no third party to subpoena.

2. No rate limits, no usage caps, no surprise bills

If you've ever burned through a month of Claude Max or hit an API rate limit mid-job, you know the feeling. Cloud AI is metered. Use too much and you wait, or you pay more.

Local AI runs at the speed of your hardware. The bill is electricity. You can run a model overnight processing 50,000 documents and the cost is whatever your power company charges per kilowatt-hour. For high-volume repetitive work, the math eventually flips in local AI's favor.
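A back-of-the-envelope sketch of that electricity math follows. The wattage, run time, and rate are placeholder assumptions for illustration, not measurements from any particular rig; swap in the draw of your own hardware and the rate on your own power bill.

```python
def batch_electricity_cost(watts: float, hours: float, usd_per_kwh: float) -> float:
    """Dollar cost of running hardware that draws `watts` for `hours`."""
    kwh = watts / 1000 * hours  # energy used, in kilowatt-hours
    return kwh * usd_per_kwh


# Assumed figures, for illustration only:
# a 450 W machine, a 10-hour overnight run, $0.10 per kWh.
cost = batch_electricity_cost(watts=450, hours=10, usd_per_kwh=0.10)
print(f"Overnight batch run: ${cost:.2f}")  # prints "Overnight batch run: $0.45"
```

The cost depends on how long the machine runs, not on how many documents it gets through in that time, which is why high-volume jobs favor local.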

3. Offline operation

The internet goes out. The cloud provider has a regional outage. You're on a job site with bad cell signal. You're flying. You're at a remote facility.

Cloud providers themselves go offline more than people realize. Here's Anthropic's last 90 days from their public status page:

Figure: Anthropic's public status page, 90-day view (status.anthropic.com). Uptime over the period: claude.ai 98.73%, Claude Console 99.15%, Claude API 99.05%. Yellow and red bars across the timeline mark days with partial or significant outages.

99% uptime sounds excellent until you do the math. 99% means about seven hours of downtime per month, and 98.73% (claude.ai) is closer to nine. The colored bars are the days the downtime actually landed on, and if your business runs on cloud AI, those are days you can't ship. OpenAI and Google have similar variance over any given 90-day window. This isn't a knock on any one provider; it's the reality of running services at scale.
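The conversion from an uptime percentage to hours of downtime is simple enough to check yourself. This sketch assumes a 30-day month and treats the 90-day percentage as a monthly average; the function name is just illustrative.

```python
HOURS_PER_MONTH = 30 * 24  # assumes a 30-day month (720 hours)


def downtime_hours_per_month(uptime_percent: float) -> float:
    """Average hours of downtime per month implied by an uptime percentage."""
    return (100 - uptime_percent) / 100 * HOURS_PER_MONTH


# Uptime figures as shown on the status page above, plus a flat 99% for reference.
for name, uptime in [
    ("flat 99%", 99.0),
    ("claude.ai", 98.73),
    ("Claude Console", 99.15),
    ("Claude API", 99.05),
]:
    print(f"{name}: {downtime_hours_per_month(uptime):.1f} hours/month")
# flat 99%: 7.2 hours/month
# claude.ai: 9.1 hours/month
# Claude Console: 6.1 hours/month
# Claude API: 6.8 hours/month
```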

Local AI works through all of that because it doesn't need a connection to anything. For mobile professionals (contractors, inspectors, field service, traveling consultants) and for businesses in places with unreliable internet, this is a real feature, not a theoretical one.

4. Full control over the model

With cloud AI you get whatever the provider gives you. They change the model behind the API, you adapt. They deprecate the version you depend on, you migrate. They add a content filter that blocks legitimate use cases for your industry, you work around it.

With local AI, the version you have is the version you have, forever. You can fine-tune it on your own data. You can shape its behavior at a level cloud APIs don't expose: more verbose or more brief, more thoughtful and methodical or faster and more direct, anchored to your voice from your own writing instead of the generic "AI assistant" tone, with sampling parameters and stop conditions tuned for the specific task. Cloud APIs give you a temperature slider and a system prompt. Local gives you everything underneath.

5. Genuine permanence

If Anthropic, OpenAI, or Google changes its pricing tomorrow, raises its price floor, or shuts down a product, you have no recourse. You either accept the new terms or you migrate.

The open-weight model running on your hardware will run on your hardware in five years just like it does today.

6. Use the right model for each job

Cloud AI is one model at a time, decided by the provider. Local AI is a library you swap from at will. A new open-weight model drops on a Tuesday and you can be running it Wednesday. No migration plan, no API contract change, no waiting on the provider's roadmap. You can run a small, fast model for routine work (summaries, categorization, simple drafts) and a bigger, heavier model for the hard tasks. You can run several at once and route each piece of work to whichever fits best. Cloud charges you per token on every request; local lets you compose models freely.
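Routing work between a small model and a large one can start out very simple. The sketch below is one hypothetical way to do it: the model labels and task kinds are made up for illustration, and a real setup would pass the chosen label to whatever inference engine you run.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    kind: str  # e.g. "summary", "categorization", "draft", "analysis"
    text: str


# Hypothetical labels; substitute the models you actually run locally.
SMALL_MODEL = "small-fast-model"
LARGE_MODEL = "large-heavy-model"

# Assumption: these kinds count as "routine" work, per the examples above.
ROUTINE_KINDS = {"summary", "categorization", "draft"}


def pick_model(task: Task) -> str:
    """Send routine work to the small model and everything else to the large one."""
    if task.kind in ROUTINE_KINDS:
        return SMALL_MODEL
    return LARGE_MODEL


print(pick_model(Task("summary", "Meeting notes from Tuesday...")))      # small-fast-model
print(pick_model(Task("analysis", "Compare these three contracts...")))  # large-heavy-model
```

Defaulting unknown task kinds to the larger model is a deliberate choice here: a misrouted hard task costs more than a slow easy one.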

Where cloud wins

1. Cloud is still the safer default at the top end

The biggest open-weight models are excellent. On many benchmarks they match or beat frontier cloud models. Qwen3.6-27B matches Claude Opus 4.5 on Terminal-Bench 2.0, and Qwen3-Max-Thinking outperforms Claude Opus 4.5 and GPT-5.2 on several agentic tasks. The "open-weight always lags" framing from 2024 is dead.

What cloud still has is consistency at the very top. For specific known workloads, open-weight often wins outright. For everyday work, the gap doesn't matter either way. The case for cloud is the catch-all: the hardest 5% of tasks where you want the maximum chance of success on the first try and don't yet know what kind of hard you're dealing with.

Benchmarks measure what they measure. They don't capture the harder-to-quantify feel of working with a model on open-ended problems: judgment calls, taste, whether it knows when to push back. On those, regular users often report frontier cloud still feels a step ahead even when specific benchmark numbers say otherwise. Treat the scores as one signal among several.

2. Zero setup

Cloud AI is one credit card and a browser tab. Local AI is hardware sizing, model selection, inference engine setup, prompt format quirks, GPU drivers, and ongoing maintenance.

For a business owner who just wants to use AI, cloud is almost always the right answer. Local AI is a real ongoing project, not a tool you install once and forget.

3. Lower upfront cost

$20 a month for ChatGPT Plus or Claude Pro covers most individual users. A capable local AI rig is $2,000 at the low end and $10,000+ if you want to run the bigger models well. The payback period on that hardware only makes sense above certain usage thresholds.

For most small businesses, the math doesn't justify local hardware. For some it absolutely does, but it's a real calculation, not a default answer.
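A rough version of that calculation, as a sketch: it compares hardware cost against the monthly cloud spend it replaces. The $400-a-month API figure is an assumed example, and the function deliberately ignores setup and maintenance time, which is a real cost.

```python
def payback_months(
    hardware_usd: float,
    cloud_usd_per_month: float,
    electricity_usd_per_month: float = 0.0,
) -> float:
    """Months until the hardware pays for itself against the cloud spend it replaces."""
    monthly_savings = cloud_usd_per_month - electricity_usd_per_month
    if monthly_savings <= 0:
        return float("inf")  # local never pays back on cost alone
    return hardware_usd / monthly_savings


# A $2,000 rig replacing a single $20/month subscription:
print(payback_months(2000, 20))   # 100.0 months, over eight years

# The same rig replacing an assumed $400/month of API usage:
print(payback_months(2000, 400))  # 5.0 months
```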

4. Ecosystem and polish, not the models themselves

The multimodal models tell the same story as #1: open-weight has caught up. On vision and document benchmarks (OmniDocBench, MMMU), open-weight scores now sit alongside the closed models' scores. What cloud still wins is the ecosystem around the model: voice with low-latency turn-taking, real-time APIs for live transcription, browser automation, dozens of pre-built integrations, and the polish on top. Open-weight equivalents exist for almost all of these (Whisper for speech-to-text, Ollama and vLLM for serving, Open WebUI for the chat surface, plenty more), but stitching them into one fluid experience is real engineering work, not a one-afternoon setup.

If you need a polished AI experience that handles voice, images, code, and text fluidly out of the box, cloud still delivers it today. If you're willing to assemble the pieces, local now competes on the model itself.

5. Updates happen automatically

The cloud provider improves the model and you get the improvement. With local AI, you choose to upgrade, you download the new model, you test it, you migrate. That's a feature for some businesses (you control when things change) and a burden for others (you're now in the loop forever).

How to decide which one fits which problem

Three questions. Run any AI use case in your business through them.

Question 1: Does the data have to stay private?

If the data is regulated (HIPAA, attorney-client privilege, financial records, government work) or under a contract that promises confidentiality, the answer points toward local AI or an enterprise cloud tier with proper contractual protections. If the data is generic or already public, cloud is fine.

Question 2: How much volume?

If you're doing a few hundred AI interactions a month, cloud is cheaper than the hardware and time investment in local. If you're doing tens of thousands of interactions a month, local starts to pay for itself. Somewhere between those two, the math flips.

Question 3: How hard is the task, and how predictable?

For everyday tasks (drafting emails, summarizing meetings, answering customer questions) any modern model handles it well. For specific complex workloads where you can pick the right model for the job, open-weight often matches or beats cloud. For open-ended hard problems where you don't yet know what you're up against, frontier cloud still has a slight consistency edge.
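The three questions can be written down as a small decision helper. Treat this as a sketch, not a rule: the order the questions are checked in and the volume thresholds (500 and 10,000 interactions a month) are assumptions chosen to match "a few hundred" and "tens of thousands" loosely.

```python
def suggest_side(
    data_is_sensitive: bool,
    interactions_per_month: int,
    open_ended_hard: bool,
    low_volume: int = 500,       # assumed stand-in for "a few hundred"
    high_volume: int = 10_000,   # assumed stand-in for "tens of thousands"
) -> str:
    """Apply the three questions in order: privacy, then difficulty, then volume."""
    if data_is_sensitive:
        return "local, or enterprise cloud with contractual protections"
    if open_ended_hard:
        return "frontier cloud"
    if interactions_per_month >= high_volume:
        return "local"
    if interactions_per_month <= low_volume:
        return "cloud"
    return "either: run the cost math"


# Summarizing patient notes, 2,000 times a month:
print(suggest_side(True, 2_000, False))
# Drafting a few hundred routine emails a month:
print(suggest_side(False, 300, False))
# Categorizing 50,000 public documents a month:
print(suggest_side(False, 50_000, False))
```

Privacy is checked first because it is the one question where the wrong answer isn't just more expensive; it can rule a tool out entirely.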

The realistic answer for most businesses

You'll use both.

Cloud AI for general productivity. Local AI for the specific workflows where privacy, volume, or control matter enough to justify the investment.

The answer is, as with most complicated things, complicated. It depends entirely on your needs, and in 2026, almost all businesses using AI have a mixed strategy.

The mistake is assuming one tool fits everything. Cloud-only businesses are leaving real privacy and cost wins on the table. Local-only businesses are making everyday work harder than it needs to be.

If you want help figuring out which problems in your business belong on which side of that line, that's part of what an AI Readiness Audit covers. Reach out and we'll talk.


Direct Line

(517) 917-0330
Office hours
Mon to Fri · 8 AM to 5 PM Pacific
In-person meetings by appointment, anywhere in the Tri-Cities.