Most search engines now bolt an AI answer box onto the top of the results page. That can be useful, but it also means your query and whatever the model does with it are happening on somebody else’s infrastructure.
This project builds the same basic workflow locally:
- SearXNG handles normal web search.
- Ollama runs a local model.
- A tiny Flask wrapper shows search results immediately.
- AI answers are optional. You check a box when you want them.
- Apache (or another reverse proxy) can publish the whole thing under /search/ on an existing site.
What the final setup looks like
The public page is:
https://YOUR_DOMAIN/search/
The local services are:
- SearXNG: http://127.0.0.1:8080
- Ollama: http://127.0.0.1:11434
- AI search wrapper: http://0.0.0.0:5001
Of course, you can also use subdomains instead of directory-based paths; I started using directories ages ago and have too much momentum to care about changing now.
Not every search needs a summary. Sometimes you just want results. The browser hits the AI search wrapper, which loads normal SearXNG results and only calls Ollama when the user asks for a summary.
Assumptions
This guide is based on my experience setting this up on my own rig. It should apply broadly to current Ubuntu and derivatives, possibly with some tinkering:
- Ubuntu 25.10 or close enough.
- Docker Engine and Compose v2.
- Apache 2.4 as the reverse proxy (used in this doc; easily adapted to other reverse proxies).
- A machine that can run Ollama locally (mine, for reference: Ryzen 9 3900X, 128 GiB RAM, NVIDIA 4060 Ti with 16 GiB VRAM).
- A reverse proxy path of /search/.
The commands use /opt/ai-search. Change that path if you want, but don’t scatter the files around. Future-you will hate present-you.
Install Docker from the Docker repository
Ubuntu’s Docker packages and Docker’s official packages can conflict. Pick one lane. I use Docker’s repository here.
sudo apt update
sudo apt install -y ca-certificates curl gnupg apache2
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg \
-o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc
echo \
"deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] \
https://download.docker.com/linux/ubuntu \
$(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt update
sudo apt install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
sudo systemctl enable --now docker
Check it:
sudo docker version
docker compose version
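If you want a fuller smoke test than the version commands, the standard hello-world container confirms the daemon can actually pull and run images:

sudo docker run --rm hello-world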
Create the project directory
sudo mkdir -p /opt/ai-search/{searxng,ollama}
sudo chown -R "$USER:$USER" /opt/ai-search
cd /opt/ai-search
Docker Compose: SearXNG and Ollama
This setup uses Docker host networking. That is deliberate.
I use a userspace VPN and system-wide Tailscale on the same rig (getting my money’s worth out of a gaming machine when I’m not gaming). Docker bridge networking and Docker’s embedded DNS can get weird alongside them and create frustrating, time-wasting conflicts. Host networking removes that whole layer for this project. The tradeoff is that ports bind directly on the host, so do not run this blindly on a shared machine.
Create /opt/ai-search/docker-compose.yml:
services:
  searxng:
    image: docker.io/searxng/searxng:latest
    container_name: searxng
    restart: unless-stopped
    network_mode: host
    volumes:
      - ./searxng:/etc/searxng

  ollama:
    image: docker.io/ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    network_mode: host
    volumes:
      - ./ollama:/root/.ollama
Start it:
cd /opt/ai-search
sudo docker compose up -d
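Before moving on, it’s worth confirming both services actually came up. Something like this should show two running containers, an HTTP response from SearXNG, and a version string from Ollama:

sudo docker compose ps
curl -I http://127.0.0.1:8080
curl http://127.0.0.1:11434/api/version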
SearXNG settings
Create /opt/ai-search/searxng/settings.yml:
use_default_settings: true

general:
  debug: false
  instance_name: "Search"

search:
  safe_search: 0
  autocomplete: duckduckgo
  default_lang: en-US
  formats:
    - html
    - json

server:
  secret_key: "CHANGE_THIS_TO_A_LONG_RANDOM_VALUE"
  base_url: http://127.0.0.1:8080/
  limiter: false
  image_proxy: false
  method: GET

ui:
  infinite_scroll: false
  query_in_title: true
  results_on_new_tab: true

plugins:
  searx.plugins.hostnames.SXNGPlugin:
    active: true
  searx.plugins.tracker_url_remover.SXNGPlugin:
    active: true
  searx.plugins.calculator.SXNGPlugin:
    active: true

engines:
  - name: brave
    disabled: true
  - name: karmasearch
    disabled: true
  - name: karmasearch videos
    disabled: true
  - name: mojeek
    disabled: true
  - name: yahoo
    disabled: true
Generate a real secret key:
python3 - <<'PY'
import secrets
print(secrets.token_hex(32))
PY
Replace CHANGE_THIS_TO_A_LONG_RANDOM_VALUE with the generated value.
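If you’d rather not paste the key by hand, a one-liner can generate it and patch settings.yml in place (this assumes the placeholder text is still present, and that the file is owned by your user from the earlier chown):

SECRET=$(python3 -c 'import secrets; print(secrets.token_hex(32))')
sed -i "s/CHANGE_THIS_TO_A_LONG_RANDOM_VALUE/$SECRET/" /opt/ai-search/searxng/settings.yml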
Restart SearXNG:
cd /opt/ai-search
sudo docker compose restart searxng
curl -I http://127.0.0.1:8080
SearXNG must have JSON enabled because the AI wrapper reads search results through /search?q=...&format=json.
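You can verify that directly from the host. With json listed under formats, this should print parsed results rather than an error page:

curl -s "http://127.0.0.1:8080/search?q=test&format=json" | python3 -m json.tool | head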
Pull a local model with Ollama
This uses Qwen 2.5 7B because it is small enough for normal local hardware and good enough for short search summaries.
If you use a different model, swap the name in the commands below and in the MODEL constant in the wrapper later on.
curl http://127.0.0.1:11434/api/pull \
-d '{"model":"qwen2.5:7b","stream":false}'
curl http://127.0.0.1:11434/api/tags
Test generation:
curl http://127.0.0.1:11434/api/generate \
-d '{"model":"qwen2.5:7b","prompt":"Reply with exactly: model working","stream":false}'
Install Python dependencies
The wrapper is a small Flask app. For the service, use Gunicorn instead of Flask’s development server.
sudo apt update
sudo apt install -y python3-flask python3-requests gunicorn
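A quick sanity check that the distro packages landed where Python expects them:

python3 -c "import flask, requests; print('flask and requests import OK')"
gunicorn --version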
The AI search wrapper
Create /opt/ai-search/ai_search_app.py:
from flask import Flask, request
import html
import requests
app = Flask(__name__)
SEARX_UI = "http://127.0.0.1:8080"
SEARX_API = "http://127.0.0.1:8080/search"
OLLAMA_API = "http://127.0.0.1:11434/api/generate"
MODEL = "qwen2.5:7b"
STYLE = """
body {
  margin: 0;
  font-family: system-ui, sans-serif;
  background: #111;
  color: #eee;
}
.topbar {
  padding: 12px;
  background: #181818;
  border-bottom: 1px solid #333;
}
form {
  display: flex;
  gap: 8px;
}
input[type="text"] {
  flex: 1;
  padding: 10px;
  font-size: 16px;
}
button {
  padding: 10px 16px;
  font-size: 16px;
}
.ai-box {
  padding: 14px;
  margin: 12px;
  border: 1px solid #444;
  border-radius: 8px;
  background: #1b1b1b;
}
.ai-loading {
  opacity: 0.75;
}
#raw-results {
  background: #fff;
  color: #111;
  padding: 12px;
}
#raw-results a {
  color: #0645ad;
}
"""
SEARX_CSS = '<link rel="stylesheet" href="/search/raw/static/themes/simple/sxng-ltr.min.css" type="text/css">'
@app.route("/")
def index():
    q = request.args.get("q", "").strip()
    ai_enabled = request.args.get("ai") == "1"
    q_html = html.escape(q)
    checked = "checked" if ai_enabled else ""

    # Empty query: just show the search form.
    if not q:
        return f"""
        <html>
        <head><title>AI Search</title>{SEARX_CSS}<style>{STYLE}</style></head>
        <body>
        <div class="topbar">
          <form action="/search/" method="get">
            <input type="text" name="q" autofocus placeholder="Search..." />
            <label style="display:flex;align-items:center;gap:6px">
              <input type="checkbox" name="ai" value="1">
              include AI
            </label>
            <button type="submit">Search</button>
          </form>
        </div>
        </body>
        </html>
        """

    quoted_q = requests.utils.quote(q)
    raw_url = "/search/raw/search?q=" + quoted_q

    if ai_enabled:
        # Placeholder box; the answer is fetched asynchronously so results render first.
        ai_block = """
        <div id="ai" class="ai-box ai-loading">
          <b>AI Answer</b><br><br>
          Working...
        </div>
        """
        ai_script = f"""
        <script>
        fetch("/search/answer?q=" + encodeURIComponent({q!r}))
          .then(r => r.text())
          .then(t => {{
            document.getElementById("ai").classList.remove("ai-loading");
            document.getElementById("ai").innerHTML = t;
          }})
          .catch(() => {{
            document.getElementById("ai").innerHTML = "<b>AI Answer</b><br><br>Unavailable.";
          }});
        </script>
        """
    else:
        ai_block = f"""
        <div class="ai-box">
          <b>AI Answer</b><br><br>
          <a style="color:#9cf" href="/search/?q={quoted_q}&ai=1">Generate AI summary</a>
        </div>
        """
        ai_script = ""

    return f"""
    <html>
    <head><title>{q_html} - AI Search</title>{SEARX_CSS}<style>{STYLE}</style></head>
    <body>
    <div class="topbar">
      <form action="/search/" method="get">
        <input type="text" name="q" value="{q_html}" />
        <label style="display:flex;align-items:center;gap:6px">
          <input type="checkbox" name="ai" value="1" {checked}>
          include AI
        </label>
        <button type="submit">Search</button>
        <a style="color:#9cf;padding:10px" href="{raw_url}" target="_blank">Open raw SearXNG</a>
      </form>
    </div>
    {ai_block}
    <div id="raw-results">Loading search results...</div>
    <script>
    fetch("/search/raw-html?q=" + encodeURIComponent({q!r}))
      .then(r => r.text())
      .then(t => {{
        document.getElementById("raw-results").innerHTML = t;
      }})
      .catch(() => {{
        document.getElementById("raw-results").innerHTML = "Search results unavailable.";
      }});
    </script>
    {ai_script}
    </body>
    </html>
    """
@app.route("/raw-html")
def raw_html():
    q = request.args.get("q", "").strip()
    if not q:
        return ""
    r = requests.get(SEARX_UI + "/search", params={"q": q}, timeout=45)
    r.raise_for_status()
    page = r.text
    # Extract just the results <main> block from the SearXNG page.
    start = page.find('<main id="main_results"')
    if start == -1:
        return page
    end = page.rfind("</main>")
    if end == -1:
        return page[start:]
    return page[start:end + len("</main>")]
@app.route("/answer")
def answer():
    q = request.args.get("q", "").strip()
    if not q:
        return ""
    try:
        # Fetch the top results as JSON and build a grounded prompt from them.
        sx = requests.get(SEARX_API, params={"q": q, "format": "json"}, timeout=30)
        sx.raise_for_status()
        data = sx.json()
        lines = []
        for r in data.get("results", [])[:6]:
            title = r.get("title", "")
            content = r.get("content", "")
            url = r.get("url", "")
            if title or content:
                lines.append(f"{title}\n{content}\n{url}")
        prompt = (
            "User query:\n" + q +
            "\n\nSearch results:\n" + "\n\n".join(lines) +
            "\n\nWrite a concise answer in 3 bullet points. "
            "Use only the provided search results. "
            "If the results are weak or unrelated, say so."
        )
        ol = requests.post(
            OLLAMA_API,
            json={"model": MODEL, "prompt": prompt, "stream": False},
            timeout=120,
        )
        ol.raise_for_status()
        text = ol.json().get("response", "").strip()
        safe = html.escape(text).replace("\n", "<br>")
        return "<b>AI Answer</b><br><br>" + safe
    except Exception as e:
        return "<b>AI Answer</b><br><br>Unavailable: " + html.escape(str(e))
if __name__ == "__main__":
    # Dev entry point only; the systemd unit below runs this through Gunicorn.
    app.run(host="0.0.0.0", port=5001)
Two small design choices are doing a lot of work here:
- The search results load first.
- AI only runs when ai=1 is present.
That keeps normal searches quick.
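If you want to try it in the foreground before wiring up systemd, run it directly and hit it from another terminal (Ctrl+C to stop):

cd /opt/ai-search
gunicorn -w 2 -b 0.0.0.0:5001 ai_search_app:app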
Run it as a service
Create /etc/systemd/system/ai-search.service:
[Unit]
Description=Local AI Search Wrapper
After=network.target docker.service
Wants=docker.service
[Service]
Type=simple
User=YOUR_LOCAL_USER
WorkingDirectory=/opt/ai-search
ExecStart=/usr/bin/gunicorn -w 2 -b 0.0.0.0:5001 ai_search_app:app
Restart=always
RestartSec=3
[Install]
WantedBy=multi-user.target
Set your local username:
sudo sed -i "s/User=YOUR_LOCAL_USER/User=$USER/" /etc/systemd/system/ai-search.service
sudo systemctl daemon-reload
sudo systemctl enable --now ai-search.service
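If the service doesn’t come up, the journal usually says why:

systemctl status ai-search --no-pager
journalctl -u ai-search -n 20 --no-pager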
Test locally:
curl -I http://127.0.0.1:5001
curl -L "http://127.0.0.1:5001/?q=linux%20firewall" | head
curl -L "http://127.0.0.1:5001/?q=linux%20firewall&ai=1" | head
Apache reverse proxy
This example publishes the wrapper at /search/ and raw SearXNG at /search/raw/.
Set the backend IP before using the config:
AI_SEARCH_HOST="192.168.1.50"
Use the actual LAN or VPN IP of the machine running the AI search service.
Put this inside your existing Apache TLS vhost:
# Local AI Search
# Public paths:
# /search/ -> AI wrapper
# /search/raw/ -> raw SearXNG assets and search page
RedirectMatch 308 ^/search$ /search/
ProxyPreserveHost On
ProxyRequests Off
# RAW SEARXNG MUST COME FIRST
ProxyPass /search/raw/ http://AI_SEARCH_HOST:8080/
ProxyPassReverse /search/raw/ http://AI_SEARCH_HOST:8080/
# AI WRAPPER SECOND
ProxyPass /search/ http://AI_SEARCH_HOST:5001/
ProxyPassReverse /search/ http://AI_SEARCH_HOST:5001/
<Location /search/raw/>
  Require all granted
  RequestHeader set X-Forwarded-Proto "https"
  RequestHeader set X-Forwarded-Host "YOUR_DOMAIN"
  RequestHeader set X-Forwarded-Prefix "/search/raw"
  RequestHeader set X-Scheme "https"
  RequestHeader set X-Script-Name "/search/raw"
  RequestHeader set X-Real-IP %{REMOTE_ADDR}s
  RequestHeader append X-Forwarded-For %{REMOTE_ADDR}s
</Location>

<Location /search/>
  Require all granted
  RequestHeader set X-Forwarded-Proto "https"
  RequestHeader set X-Forwarded-Host "YOUR_DOMAIN"
  RequestHeader set X-Forwarded-Prefix "/search"
  RequestHeader set X-Scheme "https"
  RequestHeader set X-Script-Name "/search"
  RequestHeader set X-Real-IP %{REMOTE_ADDR}s
  RequestHeader append X-Forwarded-For %{REMOTE_ADDR}s
  SetEnvIf Request_URI "^/search/" dontlog
</Location>
Replace AI_SEARCH_HOST and YOUR_DOMAIN before reloading Apache.
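If you prefer to substitute them mechanically, sed works here too. Adjust the values to match your setup; your-vhost.conf is a stand-in for whatever vhost file you actually use:

sudo sed -i "s/AI_SEARCH_HOST/192.168.1.50/g; s/YOUR_DOMAIN/example.com/g" /etc/apache2/sites-available/your-vhost.conf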
The order matters. /search/raw/ must come before /search/, or Apache will send raw SearXNG requests to the wrapper.
Enable modules and reload:
sudo a2enmod proxy proxy_http headers rewrite ssl
sudo apache2ctl configtest
sudo systemctl reload apache2
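Before opening a browser, curl against the proxy confirms both paths route where they should; the wrapper and raw SearXNG should each answer on their own path:

curl -I https://YOUR_DOMAIN/search/
curl -I https://YOUR_DOMAIN/search/raw/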
Test the final page
Open:
https://YOUR_DOMAIN/search/
Search normally. Results should load without calling the model.
Then tick “include AI” and search again. Results should still load first, and the AI answer should appear after a few seconds.
Notes from the build
Do not start by hacking SearXNG plugins. That sounds cleaner than it is. The current SearXNG plugin system expects proper importable Python modules and fully qualified plugin class names. A wrapper avoids tying your project to SearXNG internals.
Do not put 127.0.0.1 URLs in HTML that will be loaded by another machine. The user’s browser interprets 127.0.0.1 as the user’s own computer, not your server. Use public paths like /search/raw/search?... and let Apache proxy them.
Do not run AI on every query unless you really want the latency (or you’re keeping warm in winter).
If you use a VPN and Docker networking explodes, try host networking for this stack. It is blunt, but it avoids a lot of route and DNS drama.
Useful references
- SearXNG Search API: https://docs.searxng.org/dev/search_api.html
- SearXNG JSON format (enabled via formats in settings.yml): https://docs.searxng.org/dev/search_api.html
- Ollama generate API: https://ollama.readthedocs.io/en/api/
- Ollama pull API: https://docs.ollama.com/api/pull
- Docker host networking: https://docs.docker.com/engine/network/drivers/host/
- Apache reverse proxy guide: https://httpd.apache.org/docs/2.4/howto/reverse_proxy.html
- Apache mod_proxy docs: https://httpd.apache.org/docs/current/mod/mod_proxy.html