Problem

Traditional full-text search matches literal keywords. If a user searches for “authentication” but the page only uses terms like “OAuth2” or “login”, a keyword-only search engine will not find it. Writers end up stuffing pages with unnatural keywords, and readers are left frustrated when they cannot find what they need.

Why it matters

Modern developers expect natural language interfaces that understand intent, synonyms, and context. Implementing semantic search server-side typically requires complex infrastructure: a vector database (e.g., Pinecone or pgvector), model hosting, and custom APIs. This increases maintenance overhead and monthly hosting costs, and it introduces security and privacy concerns.

Approach

Use docmd’s native Semantic Search Plugin. It runs entirely client-side in an optimized browser runtime. At build time it generates structured vector chunk indices using local Hugging Face model pipelines; at query time it re-ranks matches with a hybrid of BM25 keyword scoring and vector cosine similarity. No data is ever sent to third-party APIs.
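To make the hybrid re-ranking idea concrete, here is a minimal sketch in Python. It is not docmd's implementation: the function names, the `alpha` blending weight, the max-normalization of BM25 scores, and the toy two-dimensional vectors are all assumptions chosen for illustration. It only shows how a keyword score and a cosine similarity can be combined so that a page about "OAuth2" and "login" still ranks first for the query "authentication".

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score each tokenized document against the query with Okapi BM25."""
    n = len(docs)
    avg_len = sum(len(d) for d in docs) / n
    # Document frequency: how many documents contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue  # term absent from this document
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def hybrid_rank(query_terms, query_vec, docs, doc_vecs, alpha=0.5):
    """Return document indices, best first, blending BM25 and cosine scores.

    alpha is a hypothetical weight: 1.0 means keywords only, 0.0 vectors only.
    """
    kw = bm25_scores(query_terms, docs)
    top = max(kw) or 1.0  # avoid dividing by zero when no keyword matches
    blended = [
        alpha * (k / top) + (1 - alpha) * cosine(query_vec, v)
        for k, v in zip(kw, doc_vecs)
    ]
    return sorted(range(len(docs)), key=lambda i: blended[i], reverse=True)


# Toy data: neither page contains the literal word "authentication".
docs = [["oauth2", "login", "flow"], ["deploy", "static", "site"]]
doc_vecs = [[0.8, 0.2], [0.1, 0.9]]  # made-up embeddings, not model output
ranking = hybrid_rank(["authentication"], [0.9, 0.1], docs, doc_vecs)
```

Here the keyword score is zero for both pages, so the ordering is decided by the vector term alone. With a query such as "login", the BM25 term would contribute as well.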

Implementation

1. Enable Semantic Search in Configuration

Add the search plugin options to your docmd.config.json. Set semantic to true, and enable showConfidence to visually flag semantic matches in search results:

{
  "plugins": {
    "search": {
      "semantic": true,
      "showConfidence": true
    }
  }
}

2. Choose the Right Embedding Model

docmd supports both lightweight English-only models and comprehensive multilingual models. Update your model profile using docmd-search --settings or define it explicitly:

| Model ID | Dimensions | Size | Languages | Best For |
| --- | --- | --- | --- | --- |
| Xenova/all-MiniLM-L6-v2 | 384 | ~90 MB | English only | Fast, high-accuracy English docs |
| Xenova/LaBSE | 768 | ~470 MB | 100+ languages | Absolute best multilingual quality |
| Xenova/paraphrase-multilingual-MiniLM-L12-v2 | 384 | ~220 MB | 50+ languages | Excellent multi-language balance |
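The Dimensions column also affects the size of the generated index, not only the model download. As a rough back-of-the-envelope sketch, assuming each chunk stores one uncompressed float32 vector (an assumption: docmd's actual chunk format may quantize or compress vectors), the raw vector payload scales linearly with the dimension count. The helper name and the 2,000-chunk site below are hypothetical.

```python
def raw_vector_bytes(num_chunks: int, dimensions: int, bytes_per_value: int = 4) -> int:
    """Estimate raw embedding storage: one vector per chunk, float32 by default."""
    return num_chunks * dimensions * bytes_per_value


# Hypothetical site with 2,000 text chunks.
size_384 = raw_vector_bytes(2000, 384)  # 3,072,000 bytes, about 3.1 MB
size_768 = raw_vector_bytes(2000, 768)  # 6,144,000 bytes, about 6.1 MB
```

Under these assumptions, moving from a 384-dimension model to the 768-dimension LaBSE doubles the vector payload the client has to load.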

3. Pre-Build the Index in CI/CD

To avoid indexing overhead in the browser on first load, pre-generate the search chunks in your build or CI/CD pipeline using the CLI:

# Build the semantic search index
npx docmd-search --build

# Run docmd build afterwards
npx @docmd/core build

This generates highly optimized static Vecto-JSON chunks in .docmd-search/. When a user performs a search, the client progressively loads these chunks in the background, keeping the UI instantly interactive.

Trade-offs

Initial Asset Size

Client-side vector embeddings require the browser to download a WebAssembly runtime and the pre-trained ONNX model file on the first search. These assets are then cached persistently in the browser’s Cache Storage, but the first search may be slightly slower on slow connections (a delay of roughly 1-2 seconds).

Search Quality vs Payload Size

Larger models such as LaBSE (~470 MB) offer exceptional multilingual quality at the cost of larger downloads. For standard international documentation websites, the paraphrase-multilingual-MiniLM-L12-v2 model (~220 MB) is the recommended balance between accuracy and network payload.