Friday, 4 July 2025

ES search with embedding

 

How “Search by Embedding” Works

1. Index Phase

  1. Choose an embedding model, for example text-embedding-3-small.

  2. For each new question, compute its embedding and store both text and vector in Elasticsearch:

    go
    // Compute the embedding for the question text.
    emb := EmbedText("How do I apply for a mortgage in Canada?")
    // emb is []float32, e.g. [0.20, 0.80, -0.10, …]

    Then index both the text and the vector into ES:

    json
    PUT /qa_memory/_doc/<uuid>
    {
      "text": "How do I apply for a mortgage in Canada?",
      "embedding": [0.20, 0.80, -0.10, …]
    }
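
    Note: for cosineSimilarity to work at query time, the embedding field has to be mapped as a dense_vector before any documents are indexed. A minimal mapping sketch (assuming Elasticsearch 8.x; text-embedding-3-small produces 1536-dimensional vectors):

    json
    PUT /qa_memory
    {
      "mappings": {
        "properties": {
          "text":      { "type": "text" },
          "embedding": { "type": "dense_vector", "dims": 1536 }
        }
      }
    }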

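The EmbedText helper is assumed rather than taken from a library; one way to sketch it against the OpenAI /v1/embeddings REST endpoint (API key read from the environment, error handling trimmed):

    go
    package main

    import (
        "bytes"
        "encoding/json"
        "net/http"
        "os"
    )

    // EmbedText sends text to the OpenAI embeddings endpoint and returns
    // the resulting vector (text-embedding-3-small => 1536 floats).
    func EmbedText(text string) []float32 {
        payload, _ := json.Marshal(map[string]string{
            "model": "text-embedding-3-small",
            "input": text,
        })
        req, _ := http.NewRequest("POST",
            "https://api.openai.com/v1/embeddings", bytes.NewReader(payload))
        req.Header.Set("Authorization", "Bearer "+os.Getenv("OPENAI_API_KEY"))
        req.Header.Set("Content-Type", "application/json")

        resp, err := http.DefaultClient.Do(req)
        if err != nil {
            panic(err)
        }
        defer resp.Body.Close()

        // Response shape: {"data": [{"embedding": [0.20, 0.80, ...]}]}
        var out struct {
            Data []struct {
                Embedding []float32 `json:"embedding"`
            } `json:"data"`
        }
        json.NewDecoder(resp.Body).Decode(&out)
        return out.Data[0].Embedding
    }
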
2. Query Phase

  1. Embed the incoming question the same way:

    go
    // Embed the incoming question with the same model.
    qEmb := EmbedText("What's the process to get a home loan in Canada?")
    // qEmb might be [0.19, 0.79, -0.05, …]
  2. Ask Elasticsearch for the most similar vector using cosine similarity:

    json
    GET /qa_memory/_search
    {
      "size": 1,
      "query": {
        "script_score": {
          "query": { "match_all": {} },
          "script": {
            "source": "cosineSimilarity(params.vec, 'embedding') + 1.0",
            "params": { "vec": qEmb }
          }
        }
      }
    }

    Why + 1.0?
    Cosine similarity ranges from –1 to +1, and Elasticsearch rejects negative scores from a script_score query; adding 1.0 shifts the range to [0, 2], keeping every score positive.
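
    The same request from Go, as a sketch using only the standard library (SearchSimilar is a name invented here; it returns the top hit's stored text and its shifted score):

    go
    package main

    import (
        "bytes"
        "encoding/json"
        "net/http"
    )

    // SearchSimilar runs the script_score query above and returns the
    // best-matching stored question plus its shifted score (cosine + 1.0).
    func SearchSimilar(esURL string, qEmb []float32) (string, float64) {
        query := map[string]any{
            "size": 1,
            "query": map[string]any{
                "script_score": map[string]any{
                    "query": map[string]any{"match_all": map[string]any{}},
                    "script": map[string]any{
                        "source": "cosineSimilarity(params.vec, 'embedding') + 1.0",
                        "params": map[string]any{"vec": qEmb},
                    },
                },
            },
        }
        body, _ := json.Marshal(query)

        // ES accepts POST for _search requests.
        resp, err := http.Post(esURL+"/qa_memory/_search", "application/json",
            bytes.NewReader(body))
        if err != nil {
            panic(err)
        }
        defer resp.Body.Close()

        var out struct {
            Hits struct {
                Hits []struct {
                    Score  float64 `json:"_score"`
                    Source struct {
                        Text string `json:"text"`
                    } `json:"_source"`
                } `json:"hits"`
            } `json:"hits"`
        }
        json.NewDecoder(resp.Body).Decode(&out)

        if len(out.Hits.Hits) == 0 {
            return "", 0 // empty index: nothing to reuse
        }
        return out.Hits.Hits[0].Source.Text, out.Hits.Hits[0].Score
    }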

3. Example (3-Dimensional)

ID      Text                                  Embedding
doc1    “Apply for a mortgage in Canada?”     [0.20, 0.80, –0.10]
doc2    “Best restaurants in Vancouver”       [–0.50, 0.10, 0.70]
  • New query embedding: [0.19, 0.79, –0.05]

  • Cosine similarity with doc1 ≈ 0.99 → very high → reuse doc1

  • Cosine similarity with doc2 ≈ –0.07 → much lower → ignore doc2
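
A quick sketch to verify those numbers, computing raw cosine similarity directly over the 3-dimensional vectors above:

    go
    package main

    import (
        "fmt"
        "math"
    )

    // cosine returns dot(a, b) / (|a| * |b|) for equal-length vectors.
    func cosine(a, b []float64) float64 {
        var dot, na, nb float64
        for i := range a {
            dot += a[i] * b[i]
            na += a[i] * a[i]
            nb += b[i] * b[i]
        }
        return dot / (math.Sqrt(na) * math.Sqrt(nb))
    }

    func main() {
        query := []float64{0.19, 0.79, -0.05}
        doc1 := []float64{0.20, 0.80, -0.10}
        doc2 := []float64{-0.50, 0.10, 0.70}

        fmt.Printf("doc1: %.3f\n", cosine(query, doc1)) // doc1: 0.998
        fmt.Printf("doc2: %.3f\n", cosine(query, doc2)) // doc2: -0.072
    }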

4. Decision Threshold

  • If top score > 1.8 (i.e. cosine + 1.0)
    → consider it “already answered” and reuse that stored Q&A.

  • Otherwise, send the question to GPT, then embed & index its answer for future reuse.



1.8 (shifted) – 1.0 = 0.8 raw cosine, i.e. the reuse threshold demands roughly 80% similarity; the full decision flow is sketched below.
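
Putting the whole loop together, a sketch of the decision flow. It leans on the EmbedText and SearchSimilar sketches above; lookupStoredAnswer, askGPT, and indexQA are hypothetical helpers (the stored documents would also need an answer field, which the PUT example above doesn't show):

    go
    // threshold is on the shifted score: 1.8 shifted = 0.8 raw cosine.
    const threshold = 1.8

    // Answer reuses a cached answer when a stored question is close enough,
    // and otherwise falls back to GPT and caches the new Q&A pair.
    func Answer(esURL, question string) string {
        qEmb := EmbedText(question)

        matched, score := SearchSimilar(esURL, qEmb)
        if score > threshold {
            // "Already answered": reuse the stored Q&A.
            return lookupStoredAnswer(esURL, matched) // hypothetical helper
        }

        // No close match: ask GPT, then embed & index for future reuse.
        answer := askGPT(question)             // hypothetical chat-completion call
        indexQA(esURL, question, qEmb, answer) // hypothetical indexing helper
        return answer
    }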
