Programmatic API: docmd-search

Import docmd-search as a library to build custom indexing pipelines, integrate with CI/CD workflows, or create your own search interfaces.

npm install docmd-search

Core pipeline

indexDirectory

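The primary function for indexing a directory. It returns a promise that resolves to a SearchIndex object containing all chunks and vectors. The first argument is the options object; the second is a progress callback.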

import { indexDirectory } from 'docmd-search';

const index = await indexDirectory(
  {
    rootDir: './docs',
    outDir: '.docmd-search',
    model: 'Xenova/all-MiniLM-L6-v2',
    include: ['**/*.md'],
    exclude: ['**/drafts/**'],
    chunkSize: 256,
    chunkOverlap: 32,
  },
  (progress) => {
    console.log(`${progress.phase}: ${progress.current}/${progress.total}`);
  }
);

console.log(`Indexed ${index.chunks.length} chunks`);

Options:

Parameter	Type	Description
`rootDir`	`string`	Directory to index
`outDir`	`string`	Output directory for the index
`model`	`string`	Embedding model identifier
`include`	`string[]`	Glob patterns for files to index
`exclude`	`string[]`	Glob patterns to skip
`chunkSize`	`number`	Max tokens per chunk
`chunkOverlap`	`number`	Token overlap between chunks
`config`	`SearchConfig`	Full config object (overrides individual options)
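To make the interaction between `chunkSize` and `chunkOverlap` concrete, here is a minimal sliding-window sketch. It is not the library's chunker: `chunkTokens` is a hypothetical helper, and it treats whitespace-style words as tokens, which is an assumption (a real tokenizer will count differently).

```typescript
// Hypothetical helper, not part of docmd-search: splits a token list into
// windows of at most `chunkSize` tokens, where consecutive windows share
// `chunkOverlap` tokens.
function chunkTokens(tokens: string[], chunkSize: number, chunkOverlap: number): string[][] {
  if (chunkSize <= 0) throw new RangeError('chunkSize must be positive');
  if (chunkOverlap < 0 || chunkOverlap >= chunkSize) {
    throw new RangeError('chunkOverlap must be in [0, chunkSize)');
  }
  const step = chunkSize - chunkOverlap; // how far each window advances
  const result: string[][] = [];
  for (let start = 0; start < tokens.length; start += step) {
    result.push(tokens.slice(start, start + chunkSize));
    if (start + chunkSize >= tokens.length) break; // this window reached the end
  }
  return result;
}

// Assumption: 600 placeholder words stand in for model tokens.
const sampleTokens = Array.from({ length: 600 }, (_, i) => `w${i}`);
const sampleChunks = chunkTokens(sampleTokens, 256, 32);
console.log(sampleChunks.map((c) => c.length)); // lengths: 256, 256, 152
```

With these values each window advances by 224 tokens, so the last 32 tokens of one chunk reappear at the start of the next.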

Progress callback phases:

Phase	Description
`crawling`	Discovering files
`chunking`	Splitting content into chunks
`downloading-model`	Downloading the ONNX model (first run only)
`embedding`	Generating vector embeddings
`saving`	Writing batches to disk
`complete`	Indexing finished
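The phases above can drive a friendlier progress display than the raw phase name. The sketch below assumes the callback argument has the `phase`, `current`, and `total` fields used in the example earlier; the `Phase` and `Progress` types and the `formatProgress` helper are local stand-ins, not the library's exported `IndexPhase` and `IndexProgress`, whose exact shape may differ.

```typescript
// Local mirror of the documented phases (an assumption about the exact shape).
type Phase =
  | 'crawling'
  | 'chunking'
  | 'downloading-model'
  | 'embedding'
  | 'saving'
  | 'complete';

interface Progress {
  phase: Phase;
  current: number;
  total: number;
}

// Labels taken from the phase table above.
const PHASE_LABELS: Record<Phase, string> = {
  'crawling': 'Discovering files',
  'chunking': 'Splitting content into chunks',
  'downloading-model': 'Downloading the ONNX model',
  'embedding': 'Generating vector embeddings',
  'saving': 'Writing batches to disk',
  'complete': 'Indexing finished',
};

// Hypothetical helper: turns one progress event into a single status line.
function formatProgress(p: Progress): string {
  if (p.phase === 'complete') return PHASE_LABELS.complete;
  const pct = p.total > 0 ? Math.round((p.current / p.total) * 100) : 0;
  return `${PHASE_LABELS[p.phase]}: ${p.current}/${p.total} (${pct}%)`;
}

console.log(formatProgress({ phase: 'embedding', current: 50, total: 200 }));
// Generating vector embeddings: 50/200 (25%)
```

A function like this could be passed to `indexDirectory` as `(progress) => console.log(formatProgress(progress))`, provided the real progress type is compatible.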

Index I/O

Loading indexes

import { loadAllBatches, loadBatch, loadManifest, hasSearchableIndex } from 'docmd-search';

// Check if a valid index exists
if (hasSearchableIndex('.docmd-search')) {
  // Load everything
  const index = await loadAllBatches('.docmd-search');

  // Or load individual batches
  const manifest = await loadManifest('.docmd-search');
  const batch0 = await loadBatch('.docmd-search', 0);
}

Creating indexes manually

import { createSearchIndex, saveBatch, saveManifest, createEmptyManifest } from 'docmd-search';

// Create an index from existing data
const index = createSearchIndex(chunks, vectors, {
  model: 'Xenova/all-MiniLM-L6-v2',
  dimensions: 384,
});

// Or build batches manually
const manifest = createEmptyManifest('Xenova/all-MiniLM-L6-v2', 384);
await saveBatch('.docmd-search', 0, chunks, vectors, 384);
await saveManifest('.docmd-search', manifest);

Compression

import { compressVectors, decompressVectors, getCompressionType } from 'docmd-search';

// Determine compression based on chunk count
const type = getCompressionType(chunkCount);
// Returns: 'none' | 'ternary' | 'pq'

// Compress vectors
const compressed = compressVectors(vectors, type);

// Decompress
const restored = decompressVectors(compressed, dimensions, type);
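For readers unfamiliar with the term, the sketch below illustrates the general idea behind ternary quantization: each component is reduced to -1, 0, or +1, which can then be packed into about two bits instead of a 32-bit float. This is an illustration only. The `ternarize` helper and its threshold rule are assumptions, and the encoding that `compressVectors` actually uses, as well as the chunk-count cutoffs in `getCompressionType`, may differ.

```typescript
// Hypothetical helper, not the library's implementation: maps each component
// to -1, 0, or +1 depending on whether it clears a magnitude threshold.
function ternarize(vector: ArrayLike<number>, thresholdRatio = 0.5): Int8Array {
  let sumAbs = 0;
  for (let i = 0; i < vector.length; i++) sumAbs += Math.abs(vector[i] ?? 0);
  const meanAbs = vector.length > 0 ? sumAbs / vector.length : 0;
  // Assumption: the threshold is a fixed fraction of the mean absolute value.
  const threshold = meanAbs * thresholdRatio;
  return Int8Array.from(vector, (x) => (x > threshold ? 1 : x < -threshold ? -1 : 0));
}

const quantized = ternarize([0.8, -0.05, -0.6, 0.1]);
console.log(Array.from(quantized)); // values: 1, 0, -1, 0
```

The trade-off is lossy: magnitudes are discarded, so similarity scores computed on ternary vectors only approximate those of the original floats.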

Configuration

Resolve config

import { resolveConfig, loadGlobalConfig, loadProjectConfig } from 'docmd-search';

// Full resolution: defaults → global → project → overrides
const config = await resolveConfig('./my-project', {
  chunkSize: 512,
});

// Or load individual layers
const global = await loadGlobalConfig();
const project = await loadProjectConfig('./my-project');
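The resolution order (defaults → global → project → overrides) means that later layers win. A minimal sketch of that precedence, assuming a shallow merge: `Settings`, `mergeLayers`, and all values below are illustrative, and the library's own merge (for example, how it treats nested objects) may behave differently.

```typescript
// Illustrative subset of settings; the real SearchConfig has its own shape.
type Settings = {
  model?: string;
  chunkSize?: number;
  chunkOverlap?: number;
};

// Hypothetical helper: later layers override earlier ones, key by key.
// Note that a key explicitly set to `undefined` in a later layer would
// also overwrite an earlier value in this simple version.
function mergeLayers(...layers: Settings[]): Settings {
  return layers.reduce<Settings>((acc, layer) => ({ ...acc, ...layer }), {});
}

// Made-up values for each layer.
const defaultLayer: Settings = { model: 'Xenova/all-MiniLM-L6-v2', chunkSize: 256, chunkOverlap: 32 };
const globalLayer: Settings = { chunkSize: 384 };
const projectLayer: Settings = { chunkOverlap: 64 };
const overrideLayer: Settings = { chunkSize: 512 };

const resolved = mergeLayers(defaultLayer, globalLayer, projectLayer, overrideLayer);
console.log(resolved);
// model comes from the defaults, chunkOverlap (64) from the project layer,
// and chunkSize (512) from the overrides.
```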

Model profiles

import { AVAILABLE_MODELS, getModelProfile, getDefaultModel } from 'docmd-search';

// List all available models
for (const model of AVAILABLE_MODELS) {
  console.log(`${model.name} (${model.dimensions}d, ${model.size})`);
}

// Get a specific model's profile
const profile = getModelProfile('Xenova/bge-small-en-v1.5');

// Get the default recommended model
const defaultModel = getDefaultModel();

Embedding model

Create and use the model manager

import { createModelManager, checkPeerDeps, formatMissingDepsMessage } from 'docmd-search';

// Check if peer dependencies are installed
const missing = checkPeerDeps();
if (missing) {
  console.error(formatMissingDepsMessage(missing.missing));
  process.exit(1);
}

// Create a model manager
const model = await createModelManager(
  'Xenova/all-MiniLM-L6-v2',
  (progress) => {
    console.log(`Model: ${progress.status} ${progress.progress}%`);
  }
);

// Generate embeddings
const vectors = await model.embed(['Hello world', 'Another text']);

Peer dependencies required

The model manager requires `@huggingface/transformers` and `onnxruntime-node` to be installed. Both are optional peer dependencies, so the rest of the API works without them.
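Once you have embeddings, a custom search interface typically ranks stored vectors against a query vector. The sketch below uses cosine similarity for that. It assumes `model.embed` yields one numeric array per input text (plain arrays or typed arrays); `cosineSimilarity` and `rankBySimilarity` are hypothetical helpers, not library exports, and the toy 3-dimensional vectors stand in for real embeddings.

```typescript
// Cosine similarity between two equal-length vectors; returns 0 if either is all zeros.
function cosineSimilarity(a: ArrayLike<number>, b: ArrayLike<number>): number {
  if (a.length !== b.length) throw new RangeError('vectors must have equal length');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Scores every stored vector against the query and keeps the best `topK`.
function rankBySimilarity(
  query: ArrayLike<number>,
  stored: ArrayLike<number>[],
  topK: number
): { index: number; score: number }[] {
  return stored
    .map((vec, index) => ({ index, score: cosineSimilarity(query, vec) }))
    .sort((p, q) => q.score - p.score)
    .slice(0, topK);
}

// Toy vectors; real embeddings from the model above would have 384 dimensions.
const storedVectors = [[1, 0, 0], [0, 1, 0], [0.9, 0.1, 0]];
const best = rankBySimilarity([1, 0, 0], storedVectors, 2);
console.log(best.map((r) => r.index)); // indexes: 0, 2
```

A linear scan like this is fine for small indexes; larger ones usually call for an approximate nearest-neighbor structure.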

Types

All types are exported from the main package:

import type {
  // Core
  SearchIndex,
  SearchResult,
  Chunk,
  VectorEntry,
  IndexOptions,

  // Config
  SearchConfig,
  ModelProfile,
  GlobalConfig,

  // Index I/O
  IndexManifest,
  BatchMeta,
  NavNode,
  CompressionType,
  FileRecord,

  // Pipeline
  IndexDirectoryOptions,
  IndexProgress,
  IndexPhase,

  // Model
  ModelManager,
  ModelProgress,
} from 'docmd-search';
