
An embedding is a relatively low-dimensional space into which you can translate high-dimensional vectors. Ideally, an embedding captures some of the semantics of the input by placing semantically similar inputs close together in the embedding space.

— Google Developers, 2019

In other words, an embedding is a vector that represents the semantic meaning of a text, so that semantically similar texts end up with similar vectors. Embeddings can be used for classification, Q&A, chatbots, and other purposes, and Retrieval-Augmented Generation (RAG) is among their most popular and practical use cases.
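To make "placing semantically similar inputs close together" concrete, closeness between two embeddings is usually measured with cosine similarity. Below is a minimal sketch using NumPy; the four-dimensional vectors are made-up toy values (real embedding models produce hundreds or thousands of dimensions):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity: close to 1.0 for vectors pointing the same way."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 4-dimensional embeddings; real models emit far more dimensions.
cat = np.array([0.8, 0.1, 0.6, 0.2])
kitten = np.array([0.7, 0.2, 0.5, 0.3])
car = np.array([0.1, 0.9, 0.0, 0.7])

print(cosine_similarity(cat, kitten))  # ~0.98: semantically close
print(cosine_similarity(cat, car))     # ~0.26: semantically distant
```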

Dense Embeddings vs Sparse Embeddings

Two types of embeddings are often used in NLP: dense and sparse embeddings.

Dense Embeddings

Dense embeddings use vectors in which most dimensions contain non-zero values (densely populated arrays, e.g., [0.2, 0.3, 0.12, 0, 0.3, 0.9, …, 0.1]) to represent the semantic meaning of the text.

Dense embeddings are often used for semantic similarity searches.
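As an illustration, here is a minimal semantic search sketch using the sentence-transformers library; the model name (all-MiniLM-L6-v2) and the example texts are assumptions chosen for demonstration, not part of the original text:

```python
from sentence_transformers import SentenceTransformer, util

# all-MiniLM-L6-v2 produces 384-dimensional dense vectors.
model = SentenceTransformer("all-MiniLM-L6-v2")

corpus = [
    "How do I reset my password?",
    "Shipping usually takes 3-5 business days.",
    "Our office is closed on public holidays.",
]
query = "I forgot my login credentials."

corpus_embeddings = model.encode(corpus)  # shape (3, 384), mostly non-zero values
query_embedding = model.encode(query)

# Rank corpus entries by cosine similarity to the query.
scores = util.cos_sim(query_embedding, corpus_embeddings)[0]
best = int(scores.argmax())
print(corpus[best])  # "How do I reset my password?" wins despite no shared keywords
```

Note that the query and the best match share almost no words; the dense vectors capture that both are about account access.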

Sparse Embeddings

Sparse embeddings use high-dimensional vectors that contain mostly zeros (e.g., [0, 0.3, 0, 0, 0, 0, …, 0]). They typically encode how often each word or sub-word appears in the text rather than the semantics of the words, so sparse embeddings capture the text's lexical form (which tokens it contains) rather than its meaning.

Sparse embeddings are used for keyword (token-based) search.
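As a sketch of how such vectors arise in practice, scikit-learn's TfidfVectorizer produces exactly this kind of mostly-zero, vocabulary-sized representation (the documents and query below are made up for illustration):

```python
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "The cat sat on the mat.",
    "Dogs are loyal companions.",
    "The mat was new.",
]

# One dimension per vocabulary term; most entries are zero because each
# document uses only a small fraction of the vocabulary.
vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(docs)  # SciPy sparse matrix

print(doc_vectors.shape)         # (3, vocabulary size)
print(doc_vectors.toarray()[0])  # mostly zeros; non-zero only for doc 0's terms

# Keyword-style search: score each document by token overlap with the query.
query_vector = vectorizer.transform(["cat on a mat"])
scores = (doc_vectors @ query_vector.T).toarray().ravel()
print(scores)  # highest score goes to the document sharing the query's tokens
```

A query only matches documents that literally contain its tokens, which is exactly the keyword-search behavior described above.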

References