
Building a RAG System with Microsoft.Extensions.VectorData

This blog post is based on my talk at .NET Conf Tbilisi 2025 – Exploring the Next Era of .NET, where I explained how RAG works and how to build it in .NET.

The talk is also available on YouTube:
https://www.youtube.com/watch?v=17ynh2o21-M

What this post covers

In this post I talk about:

  • AI and LLMs in 2025: what they are good at and where they fail
  • Embeddings, vectors, and vector stores
  • Why RAG exists and how it works
  • Microsoft.Extensions.VectorData (ME.VD) and its main abstractions
  • A demo story about grounding the model with real context

AI & LLMs in 2025 — superpowers and limits

LLMs have changed the AI world dramatically. Today they can understand and generate text at near-human quality: they summarize documents, explain code, reformat content, and answer questions across many domains.

But there is an important limitation. LLMs are trained on data only up to a certain cutoff date. They are frozen snapshots of the books, websites, and code available at that time; they cannot see newer data, and they cannot see your private data.

Because of this, models can give generic answers or sometimes hallucinate information that sounds correct but is not.

This gap between strong language generation and limited knowledge is the main reason why RAG exists.

Embeddings, vectors, and vector stores

Before going deeper into RAG, we need to talk about the building blocks.

Computers do not understand words; they understand numbers, and in this case, vectors. This is a fundamental concept in AI and machine learning.

A vector is just a list of numbers. You can think of it as a point in a very high-dimensional space.

Another important concept is the embedding. An embedding is a vector representation of real data such as text, images, audio, or code, produced by an embedding model. The goal of an embedding is to capture meaning, not just individual words.

In this vector space, data with similar meaning is located close to each other. Because of that, we can search for “things that are most similar to this vector”. This is the basic idea behind semantic search.
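Similarity between two vectors is typically measured with cosine similarity: the closer the score is to 1, the more similar the meaning. A minimal sketch in C# (plain arrays, no vector-store library involved):

```csharp
// Cosine similarity between two embedding vectors.
// Returns a value near 1.0 for vectors pointing the same way
// (similar meaning) and near 0.0 for unrelated ones.
static double CosineSimilarity(float[] a, float[] b)
{
    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];      // dot product
        normA += a[i] * a[i];    // squared length of a
        normB += b[i] * b[i];    // squared length of b
    }
    return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
}
```

Vector databases run this kind of comparison (often an approximate version of it) across millions of stored vectors efficiently.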

Now we need a place to store all these vectors.

Vector databases are systems that store and search these high-dimensional vectors efficiently. Popular vector databases include Qdrant, Pinecone, and Milvus. At the same time, many traditional databases like Postgres, MongoDB, and Redis also support vector search today.

Retrieval-Augmented Generation (RAG)

RAG adds a retrieval step before the model generates an answer:

  1. A user asks a question
  2. The system searches a vector database for the most relevant documents
  3. The retrieved content is added to the prompt

Then the LLM generates an answer using this context, instead of relying only on its training data.

Why this is important:

  • Answers are more accurate because they are based on real data
  • Models can use up-to-date information without retraining
  • You can adapt the system to your own domain very quickly
  • Sensitive data stays inside your system

A simple mental model looks like this:

Query → Embedding Model → Vector Store → Relevant Context
Relevant Context + Prompt → LLM → Response
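In code, the same flow looks roughly like this. This is a hedged sketch: `IEmbeddingGenerator` and `IChatClient` come from Microsoft.Extensions.AI, `collection` is an ME.VD vector store collection, and the exact method names may differ slightly between package versions.

```csharp
// Assumes these already exist:
//   embeddingGenerator : IEmbeddingGenerator<string, Embedding<float>>
//   collection         : a populated VectorStoreCollection
//   chatClient         : IChatClient

// 1. Embed the user's question.
ReadOnlyMemory<float> queryEmbedding =
    await embeddingGenerator.GenerateVectorAsync(userQuestion);

// 2. Retrieve the most relevant chunks from the vector store.
var context = new StringBuilder();
await foreach (var match in collection.SearchAsync(queryEmbedding, top: 3))
    context.AppendLine(match.Record.Text);

// 3. Add the retrieved content to the prompt and let the LLM answer.
var prompt = $"""
    Answer using only the context below.

    Context:
    {context}

    Question: {userQuestion}
    """;
var response = await chatClient.GetResponseAsync(prompt);
Console.WriteLine(response.Text);
```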

Microsoft.Extensions.VectorData (ME.VD) — the .NET way

Microsoft.Extensions.VectorData provides .NET abstractions for working with vector stores. You can create collections, upsert records, and run similarity search.

The library has been generally available since May 2025. It integrates with Semantic Kernel and is designed to be extensible, so you can plug in different vector store implementations.

Core components

  • VectorStore
    This is the entry point for vector operations. It manages collections and handles creation, deletion, and connections.

  • VectorStoreCollection<TKey, TRecord>
    A strongly typed collection where you store and search records. A record usually contains an ID, some metadata, and one or more vector embeddings.

Attributes

  • [VectorStoreKey] — marks the primary key
  • [VectorStoreData] — marks the main data payload
  • [VectorStoreVector] — marks the embedding used for similarity search
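A record type using these attributes might look like the following. This is an illustrative sketch: the `DocumentChunk` name and the 1536-dimension choice are my examples, not fixed by the library.

```csharp
using Microsoft.Extensions.VectorData;

public sealed class DocumentChunk
{
    [VectorStoreKey]            // primary key of the record
    public ulong Id { get; set; }

    [VectorStoreData]           // payload returned with search results
    public string Text { get; set; } = string.Empty;

    [VectorStoreVector(1536)]   // embedding used for similarity search
    public ReadOnlyMemory<float> Embedding { get; set; }
}
```

You then get a typed collection from a concrete store, for example `vectorStore.GetCollection<ulong, DocumentChunk>("docs")`, and call `UpsertAsync` and `SearchAsync` on it.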

Demo story: “What LLMs don’t know”

In the demo I used Lacrimosa as the example topic. The idea was simple: even a very capable model does not know your internal facts or information newer than its training data.

Without RAG, the model answers based only on its training. With RAG, we first retrieve the relevant context and then let the model answer using that data. This makes the response grounded and more trustworthy.

Demo Repository

Resources
