
FastEmbed by Qdrant

FastEmbed from Qdrant is a lightweight, fast Python library built for embedding generation.

  • Quantized model weights
  • ONNX Runtime, no PyTorch dependency
  • CPU-first design
  • Data parallelism for encoding large datasets

Dependencies

To use FastEmbed with LangChain, install the fastembed Python package. The integration class itself ships in langchain-community, so install that as well if you don't already have it.

%pip install --upgrade --quiet fastembed langchain-community

Imports

from langchain_community.embeddings.fastembed import FastEmbedEmbeddings

API Reference: FastEmbedEmbeddings

Instantiating FastEmbed

Parameters

  • model_name: str (default: "BAAI/bge-small-en-v1.5")

    Name of the FastEmbedding model to use. You can find the list of supported models here.

  • max_length: int (default: 512)

The maximum number of tokens per input. Behavior is undefined for values greater than 512.

  • cache_dir: Optional[str]

    The path to the cache directory. Defaults to local_cache in the parent directory.

  • threads: Optional[int]

    The number of threads a single onnxruntime session can use. Defaults to None.

  • doc_embed_type: Literal["default", "passage"] (default: "default")

    "default": Uses FastEmbed's default embedding method.

    "passage": Prefixes the text with "passage" before embedding.

embeddings = FastEmbedEmbeddings()
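The constructor above accepts all of the parameters listed earlier as keyword arguments. A minimal sketch of explicit configuration; every value below is illustrative, not a recommendation:

```python
# Each key corresponds to a FastEmbedEmbeddings parameter documented above;
# the values shown here are examples, not required settings.
config = {
    "model_name": "BAAI/bge-small-en-v1.5",  # the default model
    "max_length": 512,                       # token limit; >512 is undefined
    "cache_dir": None,                       # None falls back to local_cache
    "threads": None,                         # None lets onnxruntime decide
    "doc_embed_type": "passage",             # prefix documents with "passage"
}

# With fastembed and langchain-community installed, the config unpacks
# directly into the constructor:
# embeddings = FastEmbedEmbeddings(**config)
```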

Usage

Generating document embeddings

document_embeddings = embeddings.embed_documents(
    ["This is a document", "This is some other document"]
)
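embed_documents returns one vector per input text, each a plain Python list of floats. A minimal structural check, using stand-in vectors (the default BAAI/bge-small-en-v1.5 model produces 384-dimensional embeddings):

```python
# Stand-in vectors mimicking the shape of embed_documents output for two
# inputs; the real call requires the downloaded fastembed model weights.
document_embeddings = [[0.1] * 384, [0.2] * 384]

assert len(document_embeddings) == 2                        # one vector per document
assert all(len(vec) == 384 for vec in document_embeddings)  # fixed dimensionality
```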

Generating query embeddings

query_embeddings = embeddings.embed_query("This is a query")
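Query and document embeddings live in the same vector space, so you can rank documents against a query by cosine similarity. A minimal stdlib sketch, with toy 3-dimensional vectors standing in for the real embed_query / embed_documents output:

```python
import math

def cosine_similarity(a, b):
    # Dot product divided by the product of the two vector norms.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Toy vectors; in practice these come from embed_query and embed_documents.
query_vec = [0.1, 0.2, 0.3]
doc_vecs = [
    [0.1, 0.2, 0.31],   # nearly parallel to the query -> high similarity
    [-0.3, 0.1, 0.0],   # points elsewhere -> low similarity
]

scores = [cosine_similarity(query_vec, d) for d in doc_vecs]
best = max(range(len(doc_vecs)), key=scores.__getitem__)
# doc_vecs[best] is the document most similar to the query
```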
