Redis 8.4 Hybrid Search — Combining Full-Text and Vector Search for AI

Posted on: 4/20/2026 9:08:53 AM

1. The Problem — Why a Single Search Mode Isn't Enough

In the era of AI and LLMs, accurate information retrieval is the foundation of every RAG (Retrieval-Augmented Generation) application. However, the two popular search methods each have their own weaknesses:

  • Full-text search (BM25): Excellent at exact keyword matching but "blind" to semantics. Searching "car" won't return results containing "motor vehicle."
  • Vector search: Deeply understands semantics via embeddings, but tends to "hallucinate" when you need to match proper names, product codes, or specific technical terms exactly.

  • 49% reduction in context failure rate with hybrid vs. single-mode search
  • 3.5× retrieval recall improvement in Blended RAG systems
  • 15% end-to-end accuracy gain on complex reasoning tasks
  • 87% faster command execution vs. Redis 7.x

Research from Anthropic (2025) and Apple ML Research (2024) suggests that combining both methods is the optimal answer. Redis 8.4 realizes this with the FT.HYBRID command — performing score fusion inside the engine, with no external post-processing needed.

2. Redis 8.x — The Evolution Journey

Before diving into FT.HYBRID, let's look at the bigger picture of Redis 8.x:

Redis 8.0 (2025)
"One Redis" — merging Redis Stack + Community Edition. Vector Sets (beta), hash field expiration (HGETEX/HSETEX/HGETDEL), 30+ performance optimizations, BM25 replacing TF-IDF as the default scorer.
Redis 8.4 (Q1/2026)
FT.HYBRID — native hybrid search command. Score fusion (RRF + Linear), atomic cluster slot migration for zero-downtime scaling, SIMD-driven optimizations.
Vector Sets (ongoing)
A new data type for vector similarity — lighter than the Redis Query Engine, suitable for simple use cases. Supports 8-bit quantization, dimensionality reduction, and JSON attribute filtering.

3. Vector Sets — A Brand-New Data Type

Vector Sets are an invention of Redis creator Salvatore Sanfilippo himself, inspired by Sorted Sets but storing string + multi-dimensional vector embedding instead of string + score.

3.1 Basic Commands

# Add a vector to a set (the element name follows the vector values)
VADD product-vectors VALUES 768 0.12 -0.34 0.56 ... "laptop-001"
    SETATTR '{"category":"electronics","price":1299}'

# Find the 5 most similar vectors
VSIM product-vectors VALUES 768 0.11 -0.33 0.55 ... COUNT 5

# Find with attribute filters
VSIM product-vectors VALUES 768 0.11 -0.33 0.55 ... COUNT 5
    FILTER '.category == "electronics" && .price < 2000'

# Update attributes
VSETATTR product-vectors "laptop-001"
    '{"category":"electronics","price":1199,"on_sale":true}'
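
From application code, these commands can be issued through redis-py's generic `execute_command`. The helper below is a minimal sketch that only builds the argument list (the key, element, and attribute names are illustrative, and the VADD argument order follows the pattern above, with the element name after the vector values):

```python
import json

def build_vadd_args(key, element, vector, attrs=None):
    """Build the argument list for:
    VADD key VALUES <dim> <v1> ... <vn> <element> [SETATTR <json>]
    """
    args = ["VADD", key, "VALUES", str(len(vector))]
    args += [str(v) for v in vector]
    args.append(element)
    if attrs is not None:
        args += ["SETATTR", json.dumps(attrs)]
    return args

# Usage with redis-py (assumes a running Redis 8 instance):
#   r.execute_command(*build_vadd_args("product-vectors", "laptop-001",
#                                      embedding, {"category": "electronics"}))
print(build_vadd_args("product-vectors", "laptop-001", [0.12, -0.34],
                      {"category": "electronics", "price": 1299}))
```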

3.2 Quantization and Dimensionality Reduction

Vector Sets support 3 quantization modes to balance memory against accuracy:

Mode                       Size/vector   Accuracy          Use case
Full precision (default)   100%          Highest           Maximum recall required
8-bit quantization         ~25%          Nearly lossless   Mainstream production
Binary quantization        ~3%           Slight drop       Massive datasets, coarse search
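
A quick back-of-the-envelope calculation shows where the table's percentages come from (a sketch, assuming 4 bytes per dimension for FLOAT32, 1 byte for 8-bit, and 1 bit for binary):

```python
def bytes_per_vector(dim, mode="full"):
    """Approximate in-memory size of one vector under each quantization mode."""
    if mode == "full":    # FLOAT32: 4 bytes per dimension
        return dim * 4
    if mode == "q8":      # 8-bit quantization: 1 byte per dimension
        return dim
    if mode == "binary":  # binary quantization: 1 bit per dimension
        return dim // 8
    raise ValueError(mode)

for mode in ("full", "q8", "binary"):
    size = bytes_per_vector(768, mode)
    print(f"{mode:>6}: {size:>5} B ({size / bytes_per_vector(768) * 100:.1f}%)")
```

For a 768-dimensional embedding this gives 3072 B, 768 B (25%), and 96 B (about 3%), matching the table.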

Additionally, dimensionality reduction via random projection lets you shrink vector dimensions while preserving similarity relationships — particularly useful when embedding models output 1536+ dimensional vectors but your application only needs moderate discriminative power.
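
The idea behind random projection can be illustrated in a few lines of NumPy (this is not the engine's internal implementation, just a sketch of the Johnson-Lindenstrauss-style technique; dimensions and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(42)

def random_projection(vectors, out_dim):
    """Project rows of `vectors` to out_dim with a Gaussian random matrix."""
    in_dim = vectors.shape[1]
    proj = rng.normal(size=(in_dim, out_dim)) / np.sqrt(out_dim)
    return vectors @ proj

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

x = rng.normal(size=(1, 1536))
y = 0.8 * x + 0.2 * rng.normal(size=(1, 1536))  # y is deliberately similar to x
pair = np.vstack([x, y])
reduced = random_projection(pair, 256)          # 1536 -> 256 dimensions

# Cosine similarity is approximately preserved after projection
print(cosine(pair[0], pair[1]), cosine(reduced[0], reduced[1]))
```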

3.3 Vector Sets vs Redis Query Engine

When to use what?

Vector Sets: When you only need pure similarity search, a minimal API, single-node. Think "Sorted Sets for vectors."
Redis Query Engine (FT.SEARCH/FT.HYBRID): When you need to combine full-text + vector + geo + numeric filtering, horizontal scaling, and enterprise features.

Feature              Vector Sets       Redis Query Engine
Vector search        Native, simple    Advanced, scalable
Full-text search     No                BM25, stemming, fuzzy
Hybrid queries       No                FT.HYBRID
Geo/Numeric filter   JSON attributes   Native indexing
Scalability          Single-node       Cluster, Active-Active
Complexity           Very low          Medium

4. FT.HYBRID — The Hybrid Search Command in Redis 8.4

FT.HYBRID is the new star of Redis 8.4. Rather than running 2 separate queries and merging results at the application layer, this command performs score fusion inside the Redis engine with O(N+M) complexity.

graph TD
    A["Input Query"] --> B["FT.HYBRID Engine"]
    B --> C["SEARCH Component<br/>BM25 Full-Text"]
    B --> D["VSIM Component<br/>Vector Similarity"]
    C --> E["Score Fusion<br/>RRF / Linear"]
    D --> E
    E --> F["Ranked Results"]
    F --> G["LOAD / APPLY / SORTBY<br/>Post-processing"]
    G --> H["Final Results"]
    style A fill:#e94560,stroke:#fff,color:#fff
    style B fill:#2c3e50,stroke:#fff,color:#fff
    style C fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style D fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style E fill:#e94560,stroke:#fff,color:#fff
    style F fill:#f8f9fa,stroke:#2c3e50,color:#2c3e50
    style G fill:#f8f9fa,stroke:#2c3e50,color:#2c3e50
    style H fill:#4CAF50,stroke:#fff,color:#fff
FT.HYBRID processing flow — score fusion happens in the engine, no external post-processing needed

4.1 Full Syntax

FT.HYBRID index
  SEARCH query
    [SCORER scorer]
    [YIELD_SCORE_AS name]
  VSIM vector_field $vector_param
    [KNN count K k [EF_RUNTIME ef]]
    [RANGE count RADIUS radius [EPSILON epsilon]]
    [YIELD_SCORE_AS name]
    [FILTER filter]
  [COMBINE RRF count [CONSTANT c] [WINDOW w] [YIELD_SCORE_AS name]]
  [COMBINE LINEAR count [ALPHA a] [BETA b] [WINDOW w] [YIELD_SCORE_AS name]]
  [LIMIT offset num]
  [SORTBY count field [ASC | DESC]]
  [LOAD count field ...]
  [GROUPBY ... REDUCE ...]
  [APPLY expression AS name]
  PARAMS nargs vector_param vector_blob [name value ...]
  [TIMEOUT timeout]

4.2 Main Components

Each FT.HYBRID command is built from three main parts. SEARCH and VSIM are required; COMBINE is optional and defaults to RRF when omitted:

graph LR
    A["SEARCH<br/>Full-text query<br/>BM25 scorer"] --> D["COMBINE<br/>Score Fusion"]
    B["VSIM<br/>Vector similarity<br/>KNN or RANGE"] --> D
    D --> E["Ranked<br/>Results"]
    style A fill:#e94560,stroke:#fff,color:#fff
    style B fill:#2c3e50,stroke:#fff,color:#fff
    style D fill:#4CAF50,stroke:#fff,color:#fff
    style E fill:#f8f9fa,stroke:#2c3e50,color:#2c3e50
The core components of FT.HYBRID

5. Score Fusion — RRF vs Linear Combination

This is the most important part of hybrid search: how do you combine scores from two different search systems into a single unified ranking?

5.1 Reciprocal Rank Fusion (RRF)

RRF is the default method, scoring based on rank position rather than absolute score values:

# RRF formula for each document d:
RRF_score(d) = 1/(rank_text(d) + k) + 1/(rank_vector(d) + k)

# Where k = CONSTANT (default 60)
# rank_text(d) = position of d in the full-text results
# rank_vector(d) = position of d in the vector results
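
The formula translates directly into a few lines of Python. This is a sketch of the fusion logic for intuition, not Redis's internal code; ranks are 1-based, and a document missing from one list simply contributes nothing from that list:

```python
def rrf_fuse(text_ranking, vector_ranking, k=60):
    """Fuse two ranked lists of doc ids with Reciprocal Rank Fusion."""
    scores = {}
    for ranking in (text_ranking, vector_ranking):
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (rank + k)
    # Highest combined score first
    return sorted(scores, key=scores.get, reverse=True)

# "b" is 2nd in text and 1st in vector, so it overtakes both single-list winners
print(rrf_fuse(["a", "b", "c"], ["b", "c", "d"]))  # → ['b', 'c', 'a', 'd']
```

Note how "b" wins without ever being first in the text ranking: appearing near the top of both lists beats dominating one of them, which is exactly the robustness property described above.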

Advantages of RRF

No score normalization needed: BM25 scores and cosine similarity live on completely different scales. RRF ignores absolute values and only looks at ranks — solving this problem entirely.
Robust to outliers: A document with an abnormally high BM25 score doesn't "dominate" the results.
Parameter k = 60: The default value based on Cormack et al. (2009), which favors balance between the two sources.

# Example: RRF with window = 40 and constant = 80
FT.HYBRID products-idx
  SEARCH "@category:electronics laptop"
  SCORER 4 BM25 1.5 0.8
  YIELD_SCORE_AS text_score
  VSIM @embedding $query_vec
  KNN 4 K 20 EF_RUNTIME 200
  YIELD_SCORE_AS vector_score
  COMBINE RRF 4 WINDOW 40 CONSTANT 80
  YIELD_SCORE_AS hybrid_score
  LOAD 3 @title @price @category
  LIMIT 0 10
  PARAMS 2 query_vec "\x00\x01..."

5.2 Linear Combination

When you want to control the weights between full-text and vector, Linear combination lets you set specific ratios:

# Formula:
Linear_score(d) = alpha * normalized_text_score(d) + beta * normalized_vector_score(d)

# Example: prioritize semantic search (70% vector, 30% text)
FT.HYBRID docs-idx
  SEARCH "machine learning optimization"
  VSIM @content_vector $query_vec
  KNN 2 K 15
  COMBINE LINEAR 4 ALPHA 0.3 BETA 0.7
  LOAD *
  PARAMS 2 query_vec "\x00\x01..."
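
Because ALPHA and BETA act on normalized scores, a normalization step is implied. The sketch below uses min-max normalization to illustrate the math (the engine's exact normalization is an assumption here; the scores and weights are made up):

```python
def min_max(scores):
    """Rescale a {doc: score} map to the [0, 1] range."""
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0
    return {d: (s - lo) / span for d, s in scores.items()}

def linear_fuse(text_scores, vector_scores, alpha=0.3, beta=0.7):
    """alpha * normalized BM25 score + beta * normalized vector similarity."""
    t, v = min_max(text_scores), min_max(vector_scores)
    docs = set(t) | set(v)
    fused = {d: alpha * t.get(d, 0.0) + beta * v.get(d, 0.0) for d in docs}
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)

# "a" wins on BM25, "b" wins on vector similarity; with BETA 0.7 the
# semantic side dominates and "b" comes out on top
print(linear_fuse({"a": 12.0, "b": 4.0}, {"a": 0.70, "b": 0.95}))
```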

5.3 Which Method to Choose?

Criterion                 RRF                                    Linear
Needs parameter tuning?   Minimal (constant, window)             A lot (alpha, beta need experimentation)
Robust to new data?       High — rank-based                      Medium — depends on score distribution
Weight control?           Not direct                             Yes — explicit alpha/beta
When to use               Don't know the optimal ratio upfront   Already A/B tested, know the weights
Recommendation            Start with RRF                         Optimize once you have metrics

6. Building a RAG Pipeline with FT.HYBRID

Hands-on: integrating Redis 8.4 hybrid search into a complete RAG pipeline.

sequenceDiagram
    participant U as User
    participant App as Application
    participant Emb as Embedding Model
    participant R as Redis 8.4
    participant LLM as LLM (Claude/GPT)

    U->>App: Question: "How to optimize PostgreSQL queries for large tables?"
    App->>Emb: Generate embedding
    Emb-->>App: query_vector [768d]
    App->>R: FT.HYBRID docs-idx<br/>SEARCH "optimize PostgreSQL queries"<br/>VSIM @embedding $vec<br/>COMBINE RRF
    R-->>App: Top 5 documents (ranked)
    App->>LLM: Prompt + context from 5 docs
    LLM-->>U: Answer with citations
RAG flow with FT.HYBRID — hybrid retrieval provides higher-quality context

6.1 Step 1 — Create the Index

# Create an index supporting both full-text and vector search
FT.CREATE docs-idx ON HASH PREFIX 1 doc:
  SCHEMA
    title TEXT WEIGHT 2.0
    content TEXT
    category TAG
    created_at NUMERIC SORTABLE
    embedding VECTOR HNSW 6
      TYPE FLOAT32
      DIM 768
      DISTANCE_METRIC COSINE

6.2 Step 2 — Index Documents

import time

import redis
from openai import OpenAI
import numpy as np

client = OpenAI()
r = redis.Redis(host='localhost', port=6379)

def index_document(doc_id, title, content, category):
    # Generate embedding
    response = client.embeddings.create(
        model="text-embedding-3-small",
        input=f"{title} {content}"
    )
    embedding = np.array(response.data[0].embedding, dtype=np.float32)

    # Store in a Redis Hash
    r.hset(f"doc:{doc_id}", mapping={
        "title": title,
        "content": content,
        "category": category,
        "created_at": int(time.time()),
        "embedding": embedding.tobytes()
    })

6.3 Step 3 — Hybrid Search

def hybrid_search(query, top_k=5):
    # Generate query embedding
    response = client.embeddings.create(
        model="text-embedding-3-small",
        input=query
    )
    query_vec = np.array(
        response.data[0].embedding, dtype=np.float32
    ).tobytes()

    # FT.HYBRID — combine BM25 + vector similarity
    result = r.execute_command(
        "FT.HYBRID", "docs-idx",
        "SEARCH", query,
        "SCORER", "4", "BM25", "1.2", "0.75",
        "YIELD_SCORE_AS", "text_score",
        "VSIM", "@embedding", "$query_vec",
        "KNN", "2", "K", str(top_k * 3),
        "YIELD_SCORE_AS", "vector_score",
        "COMBINE", "RRF", "4",
        "WINDOW", "30", "CONSTANT", "60",
        "YIELD_SCORE_AS", "hybrid_score",
        "LOAD", "4", "@title", "@content",
        "@text_score", "@vector_score",
        "LIMIT", "0", str(top_k),
        "PARAMS", "2", "query_vec", query_vec
    )
    return parse_results(result)
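
The snippet above calls a parse_results helper that isn't shown. Assuming FT.HYBRID replies in the same RESP2 shape as FT.AGGREGATE — a total count followed by flat field/value arrays — a minimal parser could look like this (the reply layout is an assumption; adjust it to what your Redis version actually returns):

```python
def parse_results(reply):
    """Turn [count, [f1, v1, f2, v2, ...], ...] into a list of dicts."""
    docs = []
    for row in reply[1:]:
        # redis-py returns bytes unless decode_responses=True
        fields = [f.decode() if isinstance(f, bytes) else f for f in row]
        docs.append(dict(zip(fields[0::2], fields[1::2])))
    return docs

sample = [2,
          [b"title", b"Intro to Redis", b"hybrid_score", b"0.032"],
          [b"title", b"Vector search 101", b"hybrid_score", b"0.028"]]
print(parse_results(sample))
```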

6.4 Step 4 — Combine with the LLM

import anthropic

client_ai = anthropic.Anthropic()

def rag_answer(question):
    # Hybrid retrieval
    docs = hybrid_search(question, top_k=5)

    # Build context
    context = "\n---\n".join(
        f"[{d['title']}] {d['content']}" for d in docs
    )

    # Call Claude with the context
    response = client_ai.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=2048,
        messages=[{
            "role": "user",
            "content": f"""Based on the following documents, answer the question.

Documents:
{context}

Question: {question}"""
        }]
    )
    return response.content[0].text

7. Advanced FT.HYBRID Use Cases

7.1 Boosting Recent Documents (Recency Boost)

# Combine hybrid search + time-based sorting
FT.HYBRID docs-idx
  SEARCH "kubernetes deployment strategy"
  VSIM @embedding $query_vec
  KNN 2 K 20
  COMBINE RRF 2 WINDOW 30
  SORTBY 2 created_at DESC
  LOAD 3 @title @created_at @category
  LIMIT 0 10
  PARAMS 2 query_vec "\x00\x01..."

7.2 Geo Filtering for Location-Based Search

# Hybrid search + geo filter for location-based apps
FT.HYBRID places-idx
  SEARCH "quiet cafe with wifi"
  VSIM @description_vec $query_vec
  KNN 2 K 15
  FILTER "@location:[106.6 10.7 5 km]"
  COMBINE LINEAR 4 ALPHA 0.4 BETA 0.6
  LOAD 4 @name @address @rating @distance
  PARAMS 2 query_vec "\x00\x01..."

7.3 Fuzzy Matching Combined with Semantic

# Find both typos and semantically similar terms
FT.HYBRID products-idx
  SEARCH "%%samsuung%% %%gallaxy%% ~phone ~smartphone"
  VSIM @features_vec $query_vec
  KNN 2 K 10
  COMBINE RRF 2 WINDOW 20
  LOAD *
  PARAMS 2 query_vec "\x00\x01..."

# %%word%% = fuzzy matching (Levenshtein distance)
# ~word = optional term (boost if matched, not excluded otherwise)
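
To see why %%samsuung%% reaches "samsung", recall that double percent signs allow an edit distance of up to 2. A quick check with the classic dynamic-programming edit distance (an illustration of the metric, not RediSearch internals):

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions, and substitutions from a to b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

print(levenshtein("samsuung", "samsung"))  # extra 'u': distance 1
print(levenshtein("gallaxy", "galaxy"))    # extra 'l': distance 1
```

Both typos sit at distance 1, comfortably inside the %%…%% budget of 2, so the fuzzy terms match the correctly spelled product names.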

8. Redis 8.4 vs Dedicated Vector Databases

Criterion           Redis 8.4                  Pinecone                 Weaviate            Milvus
Hybrid search       Native (FT.HYBRID)         Sparse + Dense vectors   BM25 + Vector       Sparse + Dense
Score fusion        RRF + Linear (in-engine)   Client-side              In-engine           RRF (in-engine)
Latency             Sub-ms (in-memory)         ~10-50ms                 ~10-100ms           ~10-50ms
Integrated caching  Native (it's a cache)      No                       No                  No
Full-text quality   BM25, stemming, fuzzy      Basic                    BM25                Basic
Data structures     Hash, JSON, Stream, TS...  Vectors only             Objects + vectors   Collections
Operational         Familiar, big ecosystem    Managed only             Self-host / Cloud   Self-host / Cloud
Pricing             Free + Enterprise          Freemium                 Free + Cloud        Free + Cloud

When is Redis the best choice?

You already use Redis: No new infrastructure needed. Vector search + caching + sessions + pub/sub in one system.
Ultra-low latency: In-memory engine for sub-millisecond responses — critical for real-time RAG, autocomplete, and recommendations.
Complex hybrid queries: FT.HYBRID with geo, time, and numeric filtering in a single command.

When should you go with a dedicated vector DB?

Billion-scale vectors: If the dataset exceeds RAM capacity, disk-based systems like Milvus are a better fit.
Vector search only: If you don't need full-text, caching, or other data structures, a dedicated vector DB is simpler.
Complex multi-tenancy: Pinecone has better namespace isolation for multi-tenant SaaS.

9. FT.HYBRID Performance Tuning

9.1 Tuning Parameters

Parameter        Default   Recommended                       Impact
EF_RUNTIME       10        50-200                            Higher = better recall, higher latency
K (KNN)          10        2-3× desired top_k                Candidate pool for score fusion
WINDOW           20        20-50                             Candidates considered during fusion
CONSTANT (RRF)   60        60 (leave as-is)                  Balance between top and lower ranks
ALPHA/BETA       n/a       Try 0.3/0.7 → 0.5/0.5 → 0.7/0.3   Weights between text and vector

9.2 Pre-filtering Strategies

# FILTER inside VSIM narrows the vector space BEFORE search
# → significantly reduces compute time
FT.HYBRID products-idx
  SEARCH "gaming laptop RTX"
  VSIM @features_vec $query_vec
  KNN 2 K 20
  FILTER "@category:{electronics} @price:[500 3000]"
  COMBINE RRF 2 WINDOW 30
  LOAD 3 @title @price @rating
  LIMIT 0 10
  PARAMS 2 query_vec "\x00\x01..."

# Two pre-filter policies:
# ADHOC_BF — brute-force scan, good for selective filters (<10% data)
# BATCHES — batch processing, good for less-selective filters
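
The 10% rule of thumb in the comment above can be captured in a tiny helper for deciding which policy to configure (the threshold is this article's heuristic, not an engine constant, and the counts would come from your own index statistics):

```python
def choose_prefilter_policy(matching_docs, total_docs, threshold=0.10):
    """Pick a pre-filter policy from estimated filter selectivity.

    ADHOC_BF (brute-force) tends to win when the filter keeps only a
    small slice of the index; BATCHES tends to win otherwise.
    """
    selectivity = matching_docs / total_docs
    return "ADHOC_BF" if selectivity < threshold else "BATCHES"

print(choose_prefilter_policy(5_000, 1_000_000))    # highly selective filter
print(choose_prefilter_policy(400_000, 1_000_000))  # broad filter
```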

10. Conclusion

Redis 8.4 with FT.HYBRID marks an important shift: from a pure cache / data-structure tool to a comprehensive AI-native platform. Integrating score fusion directly in the engine not only simplifies architecture but delivers superior performance — sub-millisecond latency for hybrid queries that previously required orchestrating multiple services.

For systems already using Redis for caching or sessions, extending into hybrid search for a RAG pipeline is nearly "free" — no new infrastructure, no complex migration. That's the power of the "One Redis" philosophy the development team has pursued since version 8.0.

Get Started Now

1. Upgrade to Redis 8.4
2. Create an index with a VECTOR field (HNSW or FLAT)
3. Start with FT.HYBRID ... COMBINE RRF — default parameters are already optimal for most use cases
4. A/B test RRF vs Linear once you have enough metrics
5. Tune EF_RUNTIME and WINDOW based on your latency budget

References: