Este artículo también está disponible en español.
Leer en ES →
Tutorial: How to Run Claude Code with Local and Cheap Models Using Claude Code Router
Technology
9 min ETA
🇬🇧 EN

Tutorial: How to Run Claude Code with Local and Cheap Models Using Claude Code Router

IA4

IA4PYMES

Research Team

The launch of Anthropic's Claude Code has redefined command-line interface (CLI) software engineering. Unlike traditional chat assistants, Claude Code operates as a local autonomous agent: it reads and edits files directly within your repository, executes bash commands, runs test suites, and fixes bugs in a continuous loop of planning and execution.

However, deploying this powerful tool in enterprise environments has faced two major barriers:

  1. Token Cost: Because the agent sends large chunks of codebase context, terminal history, and shell outputs in every step, active debugging sessions can consume millions of tokens of Anthropic's API, inflating costs quickly.
  2. Sovereignty and Privacy: Many companies have strict security guidelines that forbid transmitting proprietary source code to external third-party servers and APIs.

To address this bottleneck, the open-source community developed Claude Code Router (available via the npm package @musistudio/claude-code-router). This tool acts as a local middleware proxy that intercepts Claude Code's requests and redirects them to cheaper cloud APIs (like DeepSeek) or local open-source models running completely offline.

We analyze the potential of this architecture and how to set it up step-by-step in your software workflow.


1. The Potential of Reusing the Claude Code Harness

The true value of Claude Code lies not just in the underlying Claude 4.6 Sonnet model, but in its execution harness: the highly optimized system prompts, tool-calling structures, and state loop that allow it to safely interact with your local environment.

By using Claude Code Router, we decouple this execution framework from Anthropic's proprietary endpoints. This unlocks three major opportunities for SMEs:

  • Dramatic Cost Savings (Up to 95%): Route heavy code analysis tasks to cheaper endpoints like Gemini 3.5 Flash or DeepSeek-Coder-V4 at a fraction of the cost.
  • Complete Data Privacy (Sovereignty): Forward API requests to open-source models hosted locally on your company's hardware. Your code never leaves your network, guaranteeing full compliance with GDPR regulations.
  • Developer Flexibility: Swap models dynamically inside the console based on the complexity of the task without changing CLI tools.

2. Open-Source Models in 2026: Built for the Agentic Harness

A couple of years ago, using open-source models to run agentic loops resulted in parsing errors or infinite reasoning loops. In 2026, state-of-the-art open-weights models (such as Qwen 3.6 Coder, Mimo 2.5, and DeepSeek-Coder-V4) have reached complete maturity.

These modern models feature:

  • Native Tool-Calling Capabilities: They structure tool-execution calls (like reading a file or running a bash command) with error rates comparable to proprietary models.
  • Internal Reasoning (Reasoning Tokens): Models like Mimo 2.5 or DeepSeek-Coder-V4 execute logical thinking steps before outputting code, making them highly effective at running Claude Code's loops.
  • Massive Context Windows: They support large context limits, which are necessary to ingest files and terminal logs.

🔍 Want to implement secure, local AI developer workflows in your company?

Deploying local coding assistants and autonomous terminal agents reduces cloud billing and protects your software intellectual property. At IA4PYMES, we help you audit your development pipeline, deploy SOTA open-source models like Qwen 3.6 Coder on local servers, and configure secure inference proxies.

Book your 60-minute technical consultation here (100% refundable if you hire us for development, with a 15-minute feasibility guarantee).


3. Step-by-Step Guide: Setting Up Claude Code Router

To get this infrastructure running and begin routing Claude Code to external or local models, follow these steps:

Step 1: Install Global Dependencies

First, install the official Claude Code CLI and the open-source router via npm:

npm install -g @anthropic-ai/claude-code
npm install -g @musistudio/claude-code-router

Step 2: Configure Providers (Ollama and Cloud APIs)

Claude Code Router reads its provider configuration from your user directory (typically ~/.claude-code-router/config.json). You can set up various providers:

  • For Cloud DeepSeek (Ultra-Low Cost): Set up the DeepSeek API endpoint and your API key to route requests to the DeepSeek-Coder-V4 model.
  • For 100% Offline Models (Ollama): Ensure Ollama is running locally on your hardware with a powerful coding model downloaded, such as:
    ollama run qwen3.6-coder:32b
    
    Or alternatively Mimo 2.5.

Step 3: Run the Agent

Instead of starting the session with the default command (claude), launch it using the router command:

ccr code

This starts the local proxy server, translating Claude Code's requests into the correct API format for your target backend (Gemini, DeepSeek, or Ollama) transparently.

Step 4: Switch Models Dynamically

Once inside the agent console, you can change models on-the-fly:

/model deepseek

Or:

/model ollama/qwen3.6-coder

The router reconfigures the proxy instantly, letting the agent continue its work on your repository using the newly selected backend model.


4. ROI Analysis for Tech SMEs

Implementing this setup has a direct financial and operational impact on your engineering department:

Lower API Expenses

A 4-hour active debugging session with native Claude 4.6 Sonnet can cost between $8 and $12 in tokens due to context accumulation. Routing the exact same session to DeepSeek-Coder-V4 reduces the total cost to less than $0.40—an immediate 95% reduction in API bills.

Code Sovereignty and Compliance

Running local open-source models like Qwen 3.6 Coder or Mimo 2.5 on on-premise hardware ensures that no proprietary source code is transferred to third-party external servers. This allows SMEs in regulated sectors (such as fintech, healthcare, and public sector software) to leverage advanced terminal agents while complying with strict data privacy laws.


Conclusion

The potential of Claude Code is immense, but its high cost and code privacy concerns restricted its enterprise adoption. By integrating Claude Code Router with 2026's state-of-the-art open-source models (such as Qwen 3.6 Coder and Mimo 2.5), SMEs can democratize advanced terminal coding assistants across their development teams, protecting intellectual property and reducing infrastructure costs to near zero.

initiating_deployment...

From theory to execution

Knowledge without technical implementation is just entertainment. Book your 60-minute session: we refund 100% of the cost if within the first 15 minutes we see that AI is not feasible for your business, and if you choose to develop the project with us, we deduct the full session cost from the final budget.

Book Consultation