Search Document Pages
Search document pages by semantic similarity, keyword matching, or a weighted hybrid of the two. The alpha parameter selects the mode.
Search Modes (Alpha Parameter)
Pure Semantic Search (alpha=1.0, default)
- Uses embedding-based similarity matching
- Best for: conceptual queries, finding related topics, semantic understanding
- Example: “machine learning techniques” finds ML content regardless of exact wording
Pure Keyword Search (alpha=0.0)
- Uses PostgreSQL full-text search with ts_rank scoring
- Best for: exact terms, names, dates, codes, specific phrases
- Example: “Q3 2024 revenue” finds pages that contain those exact terms
Hybrid Search (0.0 < alpha < 1.0)
- Combines both methods with weighted scoring
- Best for: balanced results with both conceptual and exact matching
- Example: alpha=0.5 gives equal weight to semantic and keyword relevance
Score Normalization
All scores are returned in the [0, 1] range using stable, absolute normalization:
- Relevance Score: alpha × semantic_score + (1 - alpha) × keyword_score
  - This is your primary ranking score
  - Higher = better match
  - Stable across queries (doesn’t change when documents are added/removed)
- Semantic Score (0-1): Cosine similarity between query and content embeddings
  - Measures conceptual similarity
  - 1.0 = identical semantic meaning, 0.0 = completely unrelated
- Keyword Score (0-1): PostgreSQL ts_rank with built-in normalization
  - Measures exact term matching and frequency
  - Mapped to 0-1 range using rank/(rank+1) formula
  - Higher scores = more keyword matches, 0.0 = no matching terms
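The scoring formulas above can be sketched in a few lines of Python. This is only an illustration of the arithmetic as documented; the function names are hypothetical, the input values are made up, and the service's actual implementation (which runs at the database level) may differ in detail.

```python
def normalize_keyword_rank(rank: float) -> float:
    """Map a non-negative ts_rank value into [0, 1) using rank / (rank + 1)."""
    if rank < 0:
        raise ValueError("rank must be non-negative")
    return rank / (rank + 1.0)


def relevance(alpha: float, semantic_score: float, keyword_score: float) -> float:
    """Weighted combination: alpha * semantic + (1 - alpha) * keyword."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return alpha * semantic_score + (1.0 - alpha) * keyword_score


# Illustrative values, not taken from a real response.
keyword_score = normalize_keyword_rank(1.0)      # 1 / (1 + 1) = 0.5
combined = relevance(0.5, 0.75, keyword_score)   # 0.5 * 0.75 + 0.5 * 0.5 = 0.625
print(keyword_score, combined)
```

With alpha=1.0 the keyword term drops out entirely, and with alpha=0.0 the semantic term does, which matches the pure modes described earlier.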
Response Fields
Each result includes:
- relevance: Combined score (0-1) - use this for ranking
- semantic_score: Semantic component (0-1)
- keyword_score: Keyword component (0-1)
- content: Full page content
- page_number, document_id, id: Identifiers
Usage Guidelines
When to Use Each Mode
- Semantic (α=1.0): Exploratory research, concept discovery, broad topics
- Keyword (α=0.0): Specific data lookup, exact phrase matching, structured data
- Hybrid (α=0.5): General search, balanced relevance, unknown query types
- Custom (α=0.2-0.8): Fine-tuned weighting based on specific needs
Best Practices
- Start with hybrid search (α=0.5) for unknown query types
- Use keyword search (α=0.0) for exact terms, names, dates, codes
- Use semantic search (α=1.0) for research and concept exploration
- Combine with metadata filters for better performance and precision
- Set min_relevance (e.g., 0.3) to filter low-quality results
- Results are ordered by relevance score (highest first)
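One way to apply these guidelines in client code is to map a rough query intent to an alpha value before building the request body. The sketch below assumes hypothetical intent labels (`exact`, `concept`, `unknown`) and a hypothetical helper name; only the alpha values and the `min_relevance` example of 0.3 come from the guidelines above.

```python
# Hypothetical intent labels; the alpha values are the ones suggested above.
ALPHA_BY_INTENT = {
    "exact": 0.0,    # names, dates, codes, exact phrases
    "concept": 1.0,  # research and concept exploration
    "unknown": 0.5,  # balanced hybrid as a starting point
}


def build_request(query: str, intent: str = "unknown", min_relevance: float = 0.3) -> dict:
    """Return a request body using the alpha suggested for the given intent."""
    # Fall back to balanced hybrid search for unrecognized intents.
    alpha = ALPHA_BY_INTENT.get(intent, 0.5)
    return {"query": query, "alpha": alpha, "min_relevance": min_relevance}


print(build_request("SKU-12345", intent="exact"))
```

Falling back to 0.5 for unrecognized intents follows the first best practice: start with hybrid search when the query type is unknown.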
Input Parameters
- query (string): Text query (required when α < 1.0, or when no embedding is provided)
- embedding (array): Pre-computed 1536-dim vector (optional, used only when α=1.0; required if query is not provided)
- alpha (float): Search mode weighting (0.0-1.0, default: 1.0)
- top_k (int): Max results to return (default: 10)
- min_relevance (float): Minimum relevance score 0-1 (default: 0.0)
- metadata_filter (object): MongoDB-like filter syntax
- created_at_filter (object): Date range filter
- updated_at_filter (object): Date range filter
- optimize_query (bool): Enable query optimization (default: false)
- optimize_metadata (bool): Enable metadata optimization (default: false)
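The constraints in this parameter list can be checked on the client before sending a request. The following is a minimal sketch, assuming a hypothetical helper name and assuming that an embedding should be rejected when alpha is not 1.0 (the list says it is "only for α=1.0" but does not say how the server reacts otherwise). The 1 to 1000 bound on top_k is taken from the Body section of this page.

```python
from typing import Optional, Sequence

EMBEDDING_DIM = 1536  # dimension stated in the parameter list


def validate_search_params(
    query: Optional[str] = None,
    embedding: Optional[Sequence[float]] = None,
    alpha: float = 1.0,
    top_k: int = 10,
    min_relevance: float = 0.0,
) -> dict:
    """Check the documented constraints and return a request body dict."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if not 0.0 <= min_relevance <= 1.0:
        raise ValueError("min_relevance must be in [0, 1]")
    if not 1 <= top_k <= 1000:
        raise ValueError("top_k must be in [1, 1000]")
    if query is None and embedding is None:
        raise ValueError("either query or embedding is required")
    if alpha < 1.0 and not query:
        raise ValueError("query is required when alpha < 1.0")
    if embedding is not None:
        # Assumption: treat an embedding with alpha != 1.0 as a client error.
        if alpha != 1.0:
            raise ValueError("embedding is only used when alpha = 1.0")
        if len(embedding) != EMBEDDING_DIM:
            raise ValueError(f"embedding must have {EMBEDDING_DIM} dimensions")

    body = {"alpha": alpha, "top_k": top_k, "min_relevance": min_relevance}
    if query is not None:
        body["query"] = query
    if embedding is not None:
        body["embedding"] = list(embedding)
    return body


print(validate_search_params(query="Q3 revenue growth", alpha=0.5))
```

Validating locally is optional; the server remains the authority on what it accepts.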
Performance Features
- Single optimized database query for all search modes
- Workspace-based partitioning for scalability
- GIN indexes for fast full-text search
- HNSW indexes for fast vector similarity
- Database-level filtering and scoring
Example Use Cases
Finding exact product codes:
{"query": "SKU-12345", "alpha": 0.0, "top_k": 5}
Exploring concepts:
{"query": "customer retention strategies", "alpha": 1.0, "top_k": 10}
Balanced search:
{"query": "Q3 revenue growth", "alpha": 0.5, "min_relevance": 0.3}
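The examples above can also be combined with the filter parameters. The sketch below builds such a request body in Python. The exact filter shapes are assumptions: `department` is a hypothetical metadata field, and the `created_at_filter` layout (operators as keys, ISO date strings as values) is inferred from the operator list in the Body section rather than confirmed by it.

```python
import json
from datetime import datetime

# Balanced search restricted to pages created in Q3 2024.
body = {
    "query": "Q3 revenue growth",
    "alpha": 0.5,
    "min_relevance": 0.3,
    # Hypothetical metadata field, shown only to illustrate MongoDB-like syntax.
    "metadata_filter": {"department": {"$eq": "finance"}},
    # Assumed shape: operator -> ISO date string, e.g. '2024-07-01T00:00:00'.
    "created_at_filter": {
        "$gte": datetime(2024, 7, 1).isoformat(),
        "$lt": datetime(2024, 10, 1).isoformat(),
    },
}

print(json.dumps(body, indent=2))
```

Using a half-open range (`$gte` the first day of the quarter, `$lt` the first day of the next) avoids having to pick an end-of-day timestamp.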
Body
Request model for searching document pages with configurable search modes.
Supports three search modes controlled by the alpha parameter:
- alpha = 1.0: Pure semantic search using embeddings (default)
- alpha = 0.0: Pure keyword search using PostgreSQL full-text search
- 0.0 < alpha < 1.0: Hybrid search combining both methods with weighted scoring
Both component scores are normalized to the [0, 1] range so that the alpha weighting is meaningful. Each result returns the combined relevance score along with the individual semantic and keyword scores.
- query: The search query (required if embedding not provided)
- embedding: Pre-computed embedding vector for search (required if query not provided)
- metadata_filter: Optional metadata filters using MongoDB-like query syntax
- created_at_filter: Filter by created_at database field. Supports operators: $eq, $ne, $gt, $lt, $gte, $lte, $in, $nin. Use ISO date format (e.g., '2024-01-01T00:00:00')
- updated_at_filter: Filter by updated_at database field. Supports operators: $eq, $ne, $gt, $lt, $gte, $lte, $in, $nin. Use ISO date format (e.g., '2024-01-01T00:00:00')
- top_k: Number of results to return. Range: 1 <= x <= 1000
- min_relevance: Minimum relevance score (0-1) to filter results. Only results above this threshold will be returned. Range: 0 <= x <= 1
- optimize_metadata: Whether to optimize metadata filter (only works with query, not embedding)
- optimize_query: Whether to optimize search query (only works with query, not embedding)
- alpha: Search weighting: 0.0=pure keyword, 1.0=pure semantic, 0.5=balanced hybrid. Range: 0 <= x <= 1
Response
Successful Response