Tired of paying $20/month for GitHub Copilot? Or perhaps you are working with sensitive, proprietary code that you cannot upload to third-party cloud servers? In this step-by-step tutorial, we will show you how to set up a 100% free, fully offline, and highly capable local AI coding assistant directly inside Visual Studio Code using Ollama and Continue.dev.
Recent advances in open-source Large Language Models (LLMs) mean that compact models like Qwen2.5-Coder and DeepSeek-Coder-V2 can now run smoothly on standard consumer laptops, offering autocomplete, code refactoring, and chat features that rival commercial cloud alternatives. Let's get started!
Why Go Local? The Key Advantages
- 100% Free: No subscriptions, no token limits, and no usage fees. Once set up, it costs nothing.
- Complete Privacy: Your code never leaves your machine. Perfect for enterprise environments, sensitive client work, or proprietary personal projects.
- Offline Availability: Code on a plane, in a coffee shop with bad Wi-Fi, or in secure offline environments.
- Highly Customizable: Switch between different models (e.g., lightweight models for fast tab-completion, larger models for complex logic) at the click of a button.
Prerequisites
To run a local LLM comfortably, your system should meet the following minimum recommendations:
- RAM: At least 8 GB of RAM (16 GB or more recommended).
- Processor: Apple Silicon (M1/M2/M3/M4) Mac, or a Windows/Linux PC with an Intel/AMD CPU.
- GPU (Optional but highly recommended): A dedicated graphics card (NVIDIA RTX series or Apple Unified Memory) dramatically speeds up response times.
Model Selection Guide (Based on System Specs)
Running models that are too large for your hardware will result in extremely slow response times ("token generation speed"). Use this reference table to select the best model size according to your system's hardware configuration:
| Model Size | Min RAM | Recommended Specs | Recommended Model |
|---|---|---|---|
| 1.5B Parameters | 4 GB - 8 GB | Standard office laptops, Intel Core i3/i5, base Macbooks. | qwen2.5-coder:1.5b |
| 3B - 7B Parameters | 8 GB - 16 GB | Apple Silicon M1/M2/M3/M4 (8GB+ Unified Memory) or PCs with NVIDIA GTX/RTX dedicated GPU (4GB+ VRAM). | qwen2.5-coder:7b |
| 14B Parameters | 16 GB - 24 GB | Apple Silicon Mac (16GB+ Unified Memory) or Windows PC with RTX 3060/4060 GPU (8GB+ VRAM). | qwen2.5-coder:14b |
| 32B+ Parameters | 32 GB+ | High-end workstations, Apple Mac Studio/Pro (32GB+ Unified Memory) or dual GPU setups. | deepseek-coder-v2:16b |
Popular Local AI Coding Models: What Developers Prefer Today
The open-source AI community moves incredibly fast. While there are dozens of models available on the Ollama library, a few key options stand out as the absolute favorites among developers:
- Qwen 2.5 Coder (Recommended): Developed by Alibaba, this is currently the undisputed king of open-source coding assistants. It possesses state-of-the-art understanding of over 40 programming languages. The 1.5B version is super fast for tab completion, while the 7B version outperforms many models twice its size on coding benchmarks.
- DeepSeek-Coder-V2: An extremely popular Mixture-of-Experts (MoE) coding model. It is the first open-source model to match or beat GPT-4 on multiple code generation and math benchmarks. The 16B parameter version is a developer favorite for heavy-duty code generation.
- Meta Llama 3.1 & 3.2: While Llama is a general-purpose model rather than a code-only model, it has excellent reasoning and natural language capabilities. It is highly preferred by developers who want an assistant that can both write code and write excellent documentation.
- Codegemma (by Google): Google's lightweight open-source offering. It is optimized specifically for code completion and code infilling tasks, making it a very solid option for low-resource environments.
Step 1: Install Ollama on Your Machine
Ollama is an open-source tool that allows you to run open-source large language models locally.
- Go to the official download page at Ollama.com/download and download the installer for your Operating System (macOS, Windows, or Linux).
- Run the installer and follow the on-screen prompts.
- Once installed, open your system terminal (Command Prompt, PowerShell, or macOS Terminal) and type the following command to verify it is running:
ollama --version
Step 2: Download Your Coding Models
We need two types of models for a complete AI coding experience:
- Autocomplete Model: A very small, fast model designed to suggest code inline as you type.
- Chat Model: A slightly larger model designed for writing code, explaining logic, and debugging issues inside a chat panel.
For the best balance of speed and performance, we recommend using the Qwen2.5-Coder series:
1. Autocomplete Model (Qwen2.5-Coder 1.5B)
Run this command in your terminal to download the 1.5-billion parameter model (highly optimized for fast tab completion):
ollama pull qwen2.5-coder:1.5b
2. Chat Model (Qwen2.5-Coder 7B)
Run this command to download the 7-billion parameter model (an incredibly smart model for reasoning, chat, and large refactoring tasks):
ollama pull qwen2.5-coder:7b
Step 3: Install the Continue Extension in VS Code
Now, we need a user interface in VS Code to interact with Ollama. Continue is the leading open-source autopilot extension.
- Open VS Code.
- Click on the Extensions icon (or press
Ctrl+Shift+X/Cmd+Shift+X). - Search for "Continue" (published by Continue).
- Click Install.
Once installed, you will see a new **Continue logo** (a small play-button shape) in your VS Code sidebar.
Step 4: Configure Continue to Use Ollama
Let's connect Continue to the models running in your local Ollama instance.
- Click the **Continue icon** in your VS Code sidebar to open the panel.
- Click the **Gear/Settings icon** at the bottom right of the Continue panel. This will open a configuration file named
config.json. - Replace the entire contents of
config.jsonwith the following configuration:
{
"models": [
{
"title": "Qwen 2.5 Coder 7B",
"provider": "ollama",
"model": "qwen2.5-coder:7b"
}
],
"tabAutocompleteModel": {
"title": "Qwen 2.5 Coder 1.5B",
"provider": "ollama",
"model": "qwen2.5-coder:1.5b"
},
"customCommands": [
{
"name": "test",
"prompt": "{{{ input }}}\\n\\nWrite a complete suite of unit tests for the code above.",
"description": "Write unit tests"
}
],
"contextProviders": [
{
"name": "code",
"params": {}
},
{
"name": "docs",
"params": {}
}
]
}
Ctrl+S or Cmd+S).Step 5: How to Use Your Local AI Assistant
Your local assistant is now ready! Here is how to use it in your daily workflow:
1. Inline Code Autocomplete
As you open a file and start typing code, the small Qwen2.5-Coder 1.5B model will automatically suggest code in gray text. Simply press the Tab key to accept the suggestion.
2. Ask Questions in Chat
Highlight any block of code, press Ctrl+L (or Cmd+L on Mac), and the code will be loaded into the Continue chat window. Type your question (e.g., "How does this function work?" or "Convert this code to TypeScript") and press Enter. The smart Qwen2.5-Coder 7B model will answer locally.
3. Inline Code Refactoring
Highlight a piece of code and press Ctrl+I (or Cmd+I on Mac) to open the inline edit prompt. Type instructions like "Add error handling" or "Optimize this loop", and the model will write code edits directly into your active file. You can review the diff and accept or reject the changes!
Still Having Trouble? Connect with an Expert!
If you are stuck on a step, facing installation errors with Ollama, or need help configuring your config.json in VS Code, don't worry! You can reach out directly to our technical support team. Head over to our Contact Us page or email us at support@alerts24x7.com, and an expert will help you get set up.
Conclusion
Setting up a local AI coding assistant is a game-changer for developers looking to save money, secure their source code, and work independently of cloud infrastructure. You now have a state-of-the-art coding copilot running entirely on your own CPU/GPU!
Have you set up your local AI setup yet? Which open-source model is your favorite? Let us know in the comments below!

