Ollama (Local)

Use Ollama to run AI models locally — no cloud API key required. This is ideal for air-gapped environments, on-premise deployments, or teams with strict data residency requirements.

Prerequisites

  • Ollama installed and running on a machine accessible from QA Hub
  • At least one model pulled (e.g., llama3, mistral, qwen2.5-coder)

Install and start Ollama

# macOS / Linux
curl -fsSL https://ollama.com/install.sh | sh

# Pull a recommended model
ollama pull qwen2.5-coder:14b

# Start the server (runs on port 11434 by default)
ollama serve
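
Before pointing QA Hub at the server, you can confirm that the model is pulled and the HTTP API responds. The commands below assume the default port 11434:

# Confirm the model was pulled
ollama list

# The API should return the available models as JSON
curl http://localhost:11434/api/tags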

Configure in QA Hub

  1. Go to Settings → AI Model.
  2. Select Ollama as the provider.
  3. Set the Base URL to your Ollama server (e.g., http://localhost:11434 or http://192.168.1.50:11434).
  4. Enter the model name exactly as it appears in ollama list (e.g., qwen2.5-coder:14b).
  5. Click Test connection, then Save.
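
If Test connection fails, you can query the same Base URL from the machine running QA Hub to rule out a network or firewall issue. The address below reuses the example from step 3; substitute your own:

# Should return the models Ollama has pulled, as JSON
curl http://192.168.1.50:11434/api/tags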

Browser mode

If QA Hub runs in a browser and your Ollama server is on localhost, browser CORS restrictions may block the request. In that case, turn off browser mode in Settings so Ollama calls are routed through QA Hub's server process instead of being sent directly from the browser.

Settings → AI Model → Use browser for Ollama requests — toggle off to route via server.
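
If you keep browser-direct requests enabled instead, Ollama itself must accept the cross-origin request. Ollama reads allowed origins from the OLLAMA_ORIGINS environment variable; the origin below is a placeholder for wherever QA Hub is served from:

# Allow cross-origin requests from the QA Hub origin (placeholder URL)
OLLAMA_ORIGINS="https://qa-hub.example.com" ollama serve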

Recommended models

| Model | VRAM | Quality | Speed |
| --- | --- | --- | --- |
| qwen2.5-coder:14b | 8 GB | Excellent for structured output | Medium |
| llama3.1:8b | 5 GB | Good general purpose | Fast |
| mistral:7b | 4 GB | Lightweight, decent quality | Very fast |
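
To see which model is currently loaded and how much memory it is actually using on your hardware, ask Ollama directly:

# Show loaded models and their memory footprint
ollama ps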

Thinking mode

Some models (e.g., qwen3) support extended thinking. Disable it in Settings if you experience slow responses:

Settings → AI Model → Disable thinking mode (Ollama)
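
To check the effect outside QA Hub, recent Ollama versions also accept a think option on the chat API. The request below is a sketch; it assumes an Ollama version with thinking support and a pulled qwen3 model:

# Send a chat request with extended thinking turned off
curl http://localhost:11434/api/chat -d '{
  "model": "qwen3",
  "messages": [{"role": "user", "content": "List three smoke tests for a login form."}],
  "think": false,
  "stream": false
}'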

Limitations

  • Generation quality depends on the model and hardware
  • No usage billing or API quotas — but hardware constraints (VRAM, throughput) apply
  • Not recommended for production cloud deployments where latency matters