How “Search by Embedding” Works
1. Index Phase

- Choose an embedding model, for example `text-embedding-3-small`.
- For each new question, compute its embedding and store both the text and the vector in Elasticsearch:
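A minimal sketch of the index phase in Python, assuming the official `openai` and `elasticsearch` 8.x clients. The index name `qa_cache`, the local cluster URL, and the stored `answer` field (so a cached hit can be reused later) are illustrative choices; 1536 is the default output dimension of `text-embedding-3-small`:

```python
from elasticsearch import Elasticsearch
from openai import OpenAI

es = Elasticsearch("http://localhost:9200")  # assumed local cluster
ai = OpenAI()  # reads OPENAI_API_KEY from the environment

# One-time setup: an index with a dense_vector field for the embedding.
# text-embedding-3-small returns 1536-dimensional vectors by default.
es.indices.create(
    index="qa_cache",
    mappings={
        "properties": {
            "question": {"type": "text"},
            "answer": {"type": "text"},
            "embedding": {"type": "dense_vector", "dims": 1536},
        }
    },
)

def embed(text: str) -> list[float]:
    """Compute the embedding vector for a piece of text."""
    resp = ai.embeddings.create(model="text-embedding-3-small", input=text)
    return resp.data[0].embedding

def index_qa(question: str, answer: str) -> None:
    """Store the question text, its answer, and its vector."""
    es.index(
        index="qa_cache",
        document={
            "question": question,
            "answer": answer,
            "embedding": embed(question),
        },
    )
```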
2. Query Phase

- Embed the incoming question the same way.
- Ask Elasticsearch for the most similar vector using cosine similarity, as in the sketch below:
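Continuing the sketch above, a `script_score` query ranks every cached question by `cosineSimilarity` against the stored vector:

```python
def find_similar(question: str) -> dict | None:
    """Return the best-matching cached Q&A hit, or None if the cache is empty."""
    resp = es.search(
        index="qa_cache",
        query={
            "script_score": {
                "query": {"match_all": {}},
                "script": {
                    # cosineSimilarity is in [-1, 1]; + 1.0 shifts it to [0, 2].
                    "source": "cosineSimilarity(params.query_vector, 'embedding') + 1.0",
                    "params": {"query_vector": embed(question)},
                },
            }
        },
        size=1,
    )
    hits = resp["hits"]["hits"]
    return hits[0] if hits else None
```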
Why `+ 1.0`? Cosine similarity ranges from -1 to +1, but Elasticsearch rejects negative scores, so adding 1.0 shifts the score into [0, 2] and keeps everything positive.
3. Example (3-Dimensional)

| ID | Text | Embedding |
|---|---|---|
| doc1 | “Apply for a mortgage in Canada?” | [0.20, 0.80, -0.10] |
| doc2 | “Best restaurants in Vancouver” | [-0.50, 0.10, 0.70] |

- New query embedding: [0.19, 0.79, -0.05]
- Cosine similarity with doc1 ≈ 0.99 → very high → reuse doc1
- Similarity with doc2 is much lower → ignore doc2
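These numbers are easy to verify with a few lines of plain Python, no Elasticsearch required:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Plain cosine similarity: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

query = [0.19, 0.79, -0.05]
doc1 = [0.20, 0.80, -0.10]
doc2 = [-0.50, 0.10, 0.70]

print(cosine(query, doc1))  # ~0.998 -> near-duplicate, reuse doc1
print(cosine(query, doc2))  # ~-0.07 -> unrelated, ignore doc2
```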
4. Decision Threshold

- If the top score is above 1.8, i.e. a raw cosine above 1.8 - 1.0 = 0.8 (roughly 80% similarity), consider the question “already answered” and reuse the stored Q&A.
- Otherwise, send the question to GPT, then embed and index its answer for future reuse (see the sketch below).
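Tying the earlier sketches together, the whole reuse-or-ask loop is only a few lines; `ask_gpt` here is a hypothetical stand-in for whatever chat-completion call you already make:

```python
THRESHOLD = 1.8  # shifted score; equivalent to a raw cosine of 0.8

def answer(question: str) -> str:
    hit = find_similar(question)
    if hit is not None and hit["_score"] > THRESHOLD:
        # Close enough to a cached question: reuse the stored answer.
        return hit["_source"]["answer"]
    # Cache miss: ask the model, then index the new Q&A for next time.
    new_answer = ask_gpt(question)  # hypothetical chat-completion helper
    index_qa(question, new_answer)
    return new_answer
```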