How “Search by Embedding” Works
1. Index Phase

- Choose an embedding model, for example `text-embedding-3-small`.
- For each new question, compute its embedding and store both the text and the vector in Elasticsearch:
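A minimal sketch of the index phase in Python, assuming the official `openai` and `elasticsearch` 8.x clients. The index name `qa_cache`, the local cluster URL, and the stored `answer` field (so a cached hit can be reused later) are illustrative choices; 1536 is the default output dimension of `text-embedding-3-small`:

```python
from elasticsearch import Elasticsearch
from openai import OpenAI

es = Elasticsearch("http://localhost:9200")  # assumed local cluster
ai = OpenAI()  # reads OPENAI_API_KEY from the environment

# One-time setup: an index with a dense_vector field for the embedding.
# text-embedding-3-small returns 1536-dimensional vectors by default.
es.indices.create(
    index="qa_cache",
    mappings={
        "properties": {
            "question": {"type": "text"},
            "answer": {"type": "text"},
            "embedding": {"type": "dense_vector", "dims": 1536},
        }
    },
)

def embed(text: str) -> list[float]:
    """Compute the embedding vector for a piece of text."""
    resp = ai.embeddings.create(model="text-embedding-3-small", input=text)
    return resp.data[0].embedding

def index_qa(question: str, answer: str) -> None:
    """Store the question text, its answer, and its vector."""
    es.index(
        index="qa_cache",
        document={
            "question": question,
            "answer": answer,
            "embedding": embed(question),
        },
    )
```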
2. Query Phase

- Embed the incoming question the same way.
- Ask Elasticsearch for the most similar vector using cosine similarity, as in the sketch below:
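Continuing the sketch above, a `script_score` query ranks every cached question by `cosineSimilarity` against the stored vector:

```python
def find_similar(question: str) -> dict | None:
    """Return the best-matching cached Q&A hit, or None if the cache is empty."""
    resp = es.search(
        index="qa_cache",
        query={
            "script_score": {
                "query": {"match_all": {}},
                "script": {
                    # cosineSimilarity is in [-1, 1]; + 1.0 shifts it to [0, 2].
                    "source": "cosineSimilarity(params.query_vector, 'embedding') + 1.0",
                    "params": {"query_vector": embed(question)},
                },
            }
        },
        size=1,
    )
    hits = resp["hits"]["hits"]
    return hits[0] if hits else None
```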
Why `+ 1.0`? Cosine similarity ranges from -1 to +1, but Elasticsearch rejects negative scores, so adding 1.0 shifts the score into [0, 2] and keeps everything positive.
3. Example (3-Dimensional)

| ID | Text | Embedding |
|---|---|---|
| doc1 | “Apply for a mortgage in Canada?” | [0.20, 0.80, -0.10] |
| doc2 | “Best restaurants in Vancouver” | [-0.50, 0.10, 0.70] |

- New query embedding: [0.19, 0.79, -0.05]
- Cosine similarity with doc1 ≈ 0.99 → very high → reuse doc1
- Similarity with doc2 is much lower → ignore doc2
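These numbers are easy to verify with a few lines of plain Python, no Elasticsearch required:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Plain cosine similarity: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

query = [0.19, 0.79, -0.05]
doc1 = [0.20, 0.80, -0.10]
doc2 = [-0.50, 0.10, 0.70]

print(cosine(query, doc1))  # ~0.998 -> near-duplicate, reuse doc1
print(cosine(query, doc2))  # ~-0.07 -> unrelated, ignore doc2
```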
4. Decision Threshold

- If the top score is above 1.8, i.e. a raw cosine above 1.8 - 1.0 = 0.8 (roughly 80% similarity), consider the question “already answered” and reuse the stored Q&A.
- Otherwise, send the question to GPT, then embed and index its answer for future reuse (see the sketch below).
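Tying the earlier sketches together, the whole reuse-or-ask loop is only a few lines; `ask_gpt` here is a hypothetical stand-in for whatever chat-completion call you already make:

```python
THRESHOLD = 1.8  # shifted score; equivalent to a raw cosine of 0.8

def answer(question: str) -> str:
    hit = find_similar(question)
    if hit is not None and hit["_score"] > THRESHOLD:
        # Close enough to a cached question: reuse the stored answer.
        return hit["_source"]["answer"]
    # Cache miss: ask the model, then index the new Q&A for next time.
    new_answer = ask_gpt(question)  # hypothetical chat-completion helper
    index_qa(question, new_answer)
    return new_answer
```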