
How to run DeepSeek locally with Bodega One

Bodega One · 7 min read
Quick answer

To run DeepSeek locally with Bodega One: pull the model in Ollama (ollama pull deepseek-r1:14b), then in Bodega One go to Settings → Providers → Ollama. DeepSeek-R1 14B runs well on 12GB+ VRAM. See all 15+ supported providers.

DeepSeek attracted attention in early 2025 with models that matched frontier performance at a fraction of the training cost. DeepSeek-R1 in particular, a reasoning model trained with reinforcement learning, showed that the compute gap between US and Chinese AI labs was smaller than most had assumed.

For developers running local AI, DeepSeek's models are compelling for a specific reason: they are open-weight, quantization-friendly, and capable on coding and reasoning tasks. Here's how to run them with Bodega One.

Which DeepSeek model should you use?

DeepSeek has released several model families. For coding work:

  • DeepSeek-R1: A reasoning model. Slower (it “thinks” before answering), but noticeably stronger on complex tasks: debugging, architecture decisions, multi-step code generation. Available in 1.5B, 7B, 8B, 14B, 32B, and 70B parameter sizes.
  • DeepSeek-V3: A general-purpose model. Faster than R1, strong on code. The full V3 is a 671B-parameter mixture-of-experts (MoE) model and only runs well on multi-GPU setups; DeepSeek has not released small V3 variants, so on consumer hardware the distilled R1 sizes (7B, 8B, 14B) are the practical alternative.
  • DeepSeek-Coder-V2: An earlier coding-specific model. Still solid, but DeepSeek-R1 and V3 have largely superseded it for general coding tasks.

Which size to run by VRAM

  • 8GB VRAM: DeepSeek-R1 7B or 8B (Q4_K_M). Functional, good for everyday tasks.
  • 12GB VRAM: DeepSeek-R1 14B (Q4_K_M). Noticeably stronger reasoning.
  • 16-24GB VRAM: DeepSeek-R1 32B (Q4_K_M), strong on complex code.
  • 48GB+ VRAM: DeepSeek-R1 70B, approaches frontier performance locally.
  • Apple Silicon 16GB: DeepSeek-R1 7B or 8B MLX, good balance of speed and quality.

For a full hardware reference, see the GPU guide for local AI.
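The table above can be sketched as a small picker function. This is a hypothetical helper that just encodes the article's recommendations; the function name and thresholds are illustrative, not part of any Bodega One or Ollama API:

```python
def recommend_deepseek(vram_gb: float, apple_silicon: bool = False) -> str:
    """Suggest an Ollama tag for a given VRAM budget, per the table above."""
    if apple_silicon and vram_gb >= 16:
        return "deepseek-r1:8b"   # MLX builds balance speed and quality
    if vram_gb >= 48:
        return "deepseek-r1:70b"  # approaches frontier performance locally
    if vram_gb >= 16:
        return "deepseek-r1:32b"
    if vram_gb >= 12:
        return "deepseek-r1:14b"  # noticeably stronger reasoning
    return "deepseek-r1:7b"       # functional for everyday tasks

print(recommend_deepseek(12))  # deepseek-r1:14b
```

Treat the boundaries as soft: context length and other running applications eat into VRAM, so if a size barely fits, the next one down will be faster.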

Option 1: Run via Ollama (recommended)

Ollama has DeepSeek-R1 in its model library. Pull a specific size:

  • ollama pull deepseek-r1:7b (7B parameter model)
  • ollama pull deepseek-r1:14b (14B parameter model)
  • ollama pull deepseek-r1:32b (32B parameter model, needs ~20GB+ VRAM)

Ollama handles quantization automatically. The default pull gives you Q4_K_M, which is a good balance of quality and size.
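To see why Q4_K_M sizes land where they do, here is a back-of-envelope estimate. The bits-per-weight figure is an assumption (Q4_K_M averages roughly 4.5-5 bits per weight; exact figures vary by tensor layout), and it excludes KV-cache and activation overhead:

```python
def approx_size_gb(params_billion: float, bits_per_weight: float = 4.85) -> float:
    """Rough on-disk size of a quantized model in GB."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

# A 14B model at ~4.85 bits/weight is roughly 8.5 GB on disk,
# which is why it fits a 12GB card with room for context.
print(round(approx_size_gb(14), 1))  # 8.5
```

This is why the 14B tag suits 12GB cards while the 32B tag needs roughly 20GB.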

Once the model is pulled and Ollama is running, connect Bodega One: Settings → Providers → Ollama. The model will appear in the model selector.

Option 2: Run via LM Studio

LM Studio's model browser includes DeepSeek-R1 in multiple sizes. Search for “DeepSeek-R1” in the model browser, pick your size, and download. Load it and start the local server. Then connect Bodega One to LM Studio at http://localhost:1234/v1.
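Under the hood, LM Studio exposes an OpenAI-compatible server, so any client speaks to it with a standard chat-completions request. A minimal sketch of that request payload, assuming the model id `deepseek-r1-distill-qwen-14b` (the actual id is whatever LM Studio shows after loading):

```python
import json

BASE_URL = "http://localhost:1234/v1"  # LM Studio's default local server

# The shape of an OpenAI-compatible chat request; Bodega One builds
# the equivalent of this for you once the provider is configured.
payload = {
    "model": "deepseek-r1-distill-qwen-14b",  # illustrative model id
    "messages": [
        {"role": "user", "content": "Explain this stack trace."},
    ],
    "stream": False,
}

# POST this JSON to f"{BASE_URL}/chat/completions" with any HTTP client.
print(json.dumps(payload, indent=2))
```

Because the endpoint follows the OpenAI wire format, the same payload works against Ollama's OpenAI-compatible endpoint or a cloud DeepSeek API by changing only the base URL and model id.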

For the full LM Studio setup guide, see LM Studio + Bodega One setup.

A note on thinking tokens

DeepSeek-R1 is a reasoning model. It generates a chain of thought before giving its final answer. This shows up in responses as a <think>...</think> block before the actual answer. Some Ollama versions strip this automatically; others pass it through.

For the agentic coding loop in Bodega One, this usually isn't a problem. The agent extracts the final answer from the response. But if you see thinking tokens in unexpected places in the UI, it's worth checking whether your Ollama version handles the R1 reasoning format correctly.
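If your Ollama version passes the reasoning through, filtering it client-side is straightforward. A minimal sketch, assuming the standard R1 format of a single leading `<think>...</think>` block:

```python
import re

# Drop the <think>...</think> reasoning block and keep the final answer.
# DOTALL lets the reasoning span multiple lines; non-greedy so we stop
# at the first closing tag.
THINK_RE = re.compile(r"<think>.*?</think>\s*", re.DOTALL)

def strip_thinking(text: str) -> str:
    return THINK_RE.sub("", text).strip()

raw = "<think>User wants the final answer only...</think>\nUse re.sub with DOTALL."
print(strip_thinking(raw))  # Use re.sub with DOTALL.
```

Keeping the raw response around for debugging is still useful: the chain of thought often explains why the model got an answer wrong.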

Performance expectations

DeepSeek-R1 14B is a strong all-round model for coding. On a 12GB VRAM machine with Ollama, expect token generation in the 15-30 tokens/second range depending on GPU. That's fast enough for interactive chat and agent loops without feeling slow.

For comparison: Qwen2.5-Coder-32B at the same quality level requires ~22GB VRAM. If you have less than 16GB VRAM and want strong coding performance, DeepSeek-R1 14B is worth trying first.
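As a sanity check on those throughput numbers, here is the arithmetic for a typical agent reply (the 400-token reply length is an illustrative assumption):

```python
def response_seconds(tokens: int, tok_per_sec: float) -> float:
    """Wall-clock time to generate a reply at a given throughput."""
    return tokens / tok_per_sec

# A 400-token reply at 15-30 tok/s takes roughly 13-27 seconds.
print(round(response_seconds(400, 15), 1))  # 26.7
print(round(response_seconds(400, 30), 1))  # 13.3
```

Note that R1's thinking tokens count against this budget too, so reasoning-heavy prompts will feel slower than the raw tokens/second suggests.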

See the full BYOLLM provider list for all local and cloud options supported in Bodega One. If you want to try cloud DeepSeek (via API) for comparison, the custom provider preset supports any OpenAI-compatible endpoint.

Common questions

How do I set up DeepSeek locally with Bodega One?
Pull the model in Ollama using ollama pull deepseek-r1:14b, then open Bodega One and navigate to Settings → Providers → Ollama. The model will automatically appear in the model selector. The whole process takes under 10 minutes once Ollama is installed.
What's the difference between DeepSeek-R1 and DeepSeek-V3?
DeepSeek-R1 is a reasoning model that thinks before answering, making it slower but stronger on complex debugging and architecture decisions. DeepSeek-V3 is faster for general-purpose code, but its full 671B MoE version requires multi-GPU setups and has no official small variants; for consumer hardware, the distilled R1 sizes like 14B are the practical choice.
Which DeepSeek model should I run with 12GB of VRAM?
DeepSeek-R1 14B quantized as Q4_K_M is the recommended choice for 12GB VRAM. It offers noticeably stronger reasoning capability than the 7B versions and generates tokens at 15-30 tokens per second on typical GPUs, making it fast enough for interactive chat and agent loops.
What are thinking tokens and why do they appear in DeepSeek-R1 responses?
DeepSeek-R1 generates a chain-of-thought reasoning process before answering, which shows up as think blocks in responses. Some Ollama versions strip these automatically while others pass them through. For Bodega One's agent loop, this usually isn't a problem since the agent extracts the final answer from the response.

Ready to own your tools?

Beta is live now. Join the waitlist for full launch.