# Ollama (Local)
Use Ollama to run AI models locally — no cloud API key required. This is ideal for air-gapped environments, on-premise deployments, or teams with strict data residency requirements.
## Prerequisites
- Ollama installed and running on a machine accessible from QA Hub
- At least one model pulled (e.g., `llama3`, `mistral`, `qwen2.5-coder`)
## Install and start Ollama
```bash
# macOS / Linux
curl -fsSL https://ollama.com/install.sh | sh

# Pull a recommended model
ollama pull qwen2.5-coder:14b

# Start the server (runs on port 11434 by default)
ollama serve
```
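Before configuring QA Hub, it helps to confirm the server is reachable and the model is present. The check below uses Ollama's standard REST endpoint for listing local models; adjust the host if Ollama runs on another machine:

```bash
# List the models this Ollama server has pulled (GET /api/tags).
# An empty "models" array means nothing has been pulled yet.
curl http://localhost:11434/api/tags
```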
## Configure in QA Hub
- Go to Settings → AI Model.
- Select Ollama as the provider.
- Set the Base URL to your Ollama server (e.g., `http://localhost:11434` or `http://192.168.1.50:11434`).
- Enter the model name exactly as it appears in `ollama list` (e.g., `qwen2.5-coder:14b`).
- Click Test connection, then Save. If the test fails, you can query the server directly, as shown below.
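The command below talks to Ollama's standard REST API directly, independent of QA Hub, so you can tell whether a connection problem is on the Ollama side. The host and model name are the examples from above; substitute whatever your server and `ollama list` actually show:

```bash
# Run a one-off, non-streaming generation against the same model name
# you entered in QA Hub (POST /api/generate).
curl http://192.168.1.50:11434/api/generate -d '{
  "model": "qwen2.5-coder:14b",
  "prompt": "Say hello in one word.",
  "stream": false
}'
```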
## Browser mode
If QA Hub runs in a browser and your Ollama server is on localhost, browser CORS restrictions may block the request. Turn off the browser toggle in Settings so Ollama calls are routed through QA Hub's server process instead of directly from the browser:

Settings → AI Model → Use browser for Ollama requests (toggle off to route via the server)
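If you would rather keep direct browser-to-Ollama requests, Ollama itself can be told to accept cross-origin calls via its standard `OLLAMA_ORIGINS` environment variable. The origin below is a placeholder; use the address QA Hub is actually served from:

```bash
# Allow a specific web origin to call the Ollama API from the browser.
# Replace the placeholder origin with QA Hub's URL; "*" allows any origin.
OLLAMA_ORIGINS="http://localhost:3000" ollama serve

# For a systemd-managed install on Linux, set it on the service instead:
#   sudo systemctl edit ollama
#   [Service]
#   Environment="OLLAMA_ORIGINS=http://localhost:3000"
#   sudo systemctl restart ollama
```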
## Recommended models
| Model | VRAM | Quality | Speed |
|---|---|---|---|
| `qwen2.5-coder:14b` | 8 GB | Excellent for structured output | Medium |
| `llama3.1:8b` | 5 GB | Good general purpose | Fast |
| `mistral:7b` | 4 GB | Lightweight, decent quality | Very fast |
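Whichever row you pick, pull the model by the exact tag and confirm it shows up in `ollama list`; the NAME column there is the value to enter in QA Hub:

```bash
ollama pull llama3.1:8b
ollama pull mistral:7b
ollama list   # the NAME column is what you enter as the model name in QA Hub
```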
## Thinking mode
Some models (e.g., `qwen3`) support extended thinking. Disable it in Settings if you experience slow responses:
Settings → AI Model → Disable thinking mode (Ollama)
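You can also reproduce the effect against Ollama directly. Recent Ollama releases expose a `think` field on the chat/generate API for thinking-capable models; older servers may ignore or reject it, so treat this as an optional check rather than a guaranteed knob:

```bash
# Ask a thinking-capable model with extended thinking turned off.
# The "think" field requires a recent Ollama version with thinking support.
curl http://localhost:11434/api/chat -d '{
  "model": "qwen3",
  "messages": [{"role": "user", "content": "Reply with a single word: ready"}],
  "think": false,
  "stream": false
}'
```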
## Limitations
- Generation quality depends on the model and hardware
- No billing quota — but hardware constraints apply
- Not recommended for production cloud deployments where latency matters