# Ollama (Local)
Use Ollama to run AI models locally — no cloud API key required. This is ideal for air-gapped environments, on-premise deployments, or teams with strict data residency requirements.
## Prerequisites
- Ollama installed and running on a machine accessible from QA Hub
- At least one model pulled (e.g., `llama3`, `mistral`, `qwen2.5-coder`)
## Install and start Ollama
```bash
# macOS / Linux
curl -fsSL https://ollama.com/install.sh | sh

# Pull a recommended model
ollama pull qwen2.5-coder:14b

# Start the server (runs on port 11434 by default)
ollama serve
```
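Before configuring QA Hub, it helps to confirm the server is reachable and the model is present. The check below uses Ollama's standard REST endpoint for listing local models; adjust the host if Ollama runs on another machine:

```bash
# List the models this Ollama server has pulled (GET /api/tags).
# An empty "models" array means nothing has been pulled yet.
curl http://localhost:11434/api/tags
```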
## Configure in QA Hub
- Go to Settings → AI Model.
- Select Ollama as the provider.
- Set the Base URL to your Ollama server (e.g., `http://localhost:11434` or `http://192.168.1.50:11434`).
- Enter the model name exactly as it appears in `ollama list` (e.g., `qwen2.5-coder:14b`).
- Click Test connection, then Save. If the test fails, you can query the server directly, as shown below.
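The command below talks to Ollama's standard REST API directly, independent of QA Hub, so you can tell whether a connection problem is on the Ollama side. The host and model name are the examples from above; substitute whatever your server and `ollama list` actually show:

```bash
# Run a one-off, non-streaming generation against the same model name
# you entered in QA Hub (POST /api/generate).
curl http://192.168.1.50:11434/api/generate -d '{
  "model": "qwen2.5-coder:14b",
  "prompt": "Say hello in one word.",
  "stream": false
}'
```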
## Browser mode
If QA Hub runs in a browser and your Ollama server is on localhost, browser CORS restrictions may block the request. Turn off the browser toggle in Settings so Ollama calls are routed through QA Hub's server process instead of directly from the browser:

Settings → AI Model → Use browser for Ollama requests (toggle off to route via the server)
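If you would rather keep direct browser-to-Ollama requests, Ollama itself can be told to accept cross-origin calls via its standard `OLLAMA_ORIGINS` environment variable. The origin below is a placeholder; use the address QA Hub is actually served from:

```bash
# Allow a specific web origin to call the Ollama API from the browser.
# Replace the placeholder origin with QA Hub's URL; "*" allows any origin.
OLLAMA_ORIGINS="http://localhost:3000" ollama serve

# For a systemd-managed install on Linux, set it on the service instead:
#   sudo systemctl edit ollama
#   [Service]
#   Environment="OLLAMA_ORIGINS=http://localhost:3000"
#   sudo systemctl restart ollama
```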
## Recommended models
| Model | VRAM | Quality | Speed |
|---|---|---|---|
| `qwen2.5-coder:14b` | 8 GB | Excellent for structured output | Medium |
| `llama3.1:8b` | 5 GB | Good general purpose | Fast |
| `mistral:7b` | 4 GB | Lightweight, decent quality | Very fast |
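Whichever row you pick, pull the model by the exact tag and confirm it shows up in `ollama list`; the NAME column there is the value to enter in QA Hub:

```bash
ollama pull llama3.1:8b
ollama pull mistral:7b
ollama list   # the NAME column is what you enter as the model name in QA Hub
```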
## Thinking mode
Some models (e.g., `qwen3`) support extended thinking. Disable it in Settings if you experience slow responses:
Settings → AI Model → Disable thinking mode (Ollama)
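You can also reproduce the effect against Ollama directly. Recent Ollama releases expose a `think` field on the chat/generate API for thinking-capable models; older servers may ignore or reject it, so treat this as an optional check rather than a guaranteed knob:

```bash
# Ask a thinking-capable model with extended thinking turned off.
# The "think" field requires a recent Ollama version with thinking support.
curl http://localhost:11434/api/chat -d '{
  "model": "qwen3",
  "messages": [{"role": "user", "content": "Reply with a single word: ready"}],
  "think": false,
  "stream": false
}'
```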
## Limitations
- Generation quality depends on the model and hardware
- No billing quota — but hardware constraints apply
- Not recommended for production cloud deployments where latency matters