RAG and knowledge systems
We build RAG systems that let language models answer from your own documents, with citations, quality evaluation, and an architecture your team can maintain.

Retrieval first, model second
How we build RAG that survives production
- Hybrid: vector + keyword + re-ranker
- Cite: every answer points back to the source
- ACL: permissions filtered at query time
- Eval: quality pipeline runs on every change
How we build RAG
Retrieval first — not prompts first.
RAG is the most practical AI architecture right now: it's where language models actually deliver on their promises. The prerequisite is that the retrieval layer does its job; if the model gets bad documents in, the user gets a confident but wrong answer. When a RAG system doesn't work, it usually fails at retrieval, not at the model. That's where we spend our time.
We build RAG over your own data: documentation, contracts, cases, support articles, internal wikis. Text is chunked intelligently (not just on character boundaries), embedded with a model suited to your language (we have good experience with multilingual models for Nordic languages), and indexed in a vector store: typically pgvector if you already have Postgres, otherwise Pinecone or Qdrant. Retrieval is combined with classic keyword search (BM25) to catch what semantic search alone can miss.
Just as important as the architecture is the evaluation. We set up an evaluation pipeline from day one: a collection of questions with known good answers, run regularly against the system, so you can measure when a change improves or worsens quality. Without that kind of measurement, RAG becomes a demo that sometimes works.
What we deliver
A RAG system that survives production.
Hybrid retrieval, citations, access control and an evaluation pipeline that measures quality over time.
Smart chunking and embedding
We chunk by structure (sections, paragraphs, tables), not by blind character boundaries. Chunks get metadata (source, title, date, author) so retrieval can filter by context. Embeddings are generated with models that handle all your languages equally well.
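To make that concrete, here is a minimal sketch of structure-aware chunking. The section format, the field names and the 1,500-character budget are illustrative, not a fixed recipe:

```python
from dataclasses import dataclass, field

@dataclass
class Chunk:
    text: str
    metadata: dict = field(default_factory=dict)

def chunk_by_structure(doc_title: str, sections: list[dict], max_chars: int = 1500) -> list[Chunk]:
    """Split on section boundaries first; fall back to paragraph splits
    only when a single section exceeds the size budget."""
    chunks: list[Chunk] = []
    for section in sections:
        buffer = ""
        for para in section["body"].split("\n\n"):
            if buffer and len(buffer) + len(para) > max_chars:
                chunks.append(Chunk(buffer.strip(), {"source": doc_title, "section": section["heading"]}))
                buffer = ""
            buffer += para + "\n\n"
        if buffer.strip():
            chunks.append(Chunk(buffer.strip(), {"source": doc_title, "section": section["heading"]}))
    return chunks
```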
Hybrid retrieval (vector + keyword)
Vector search catches semantic similarity; BM25/keyword search catches exact terms (product names, paragraph numbers, error codes). We combine both with a re-ranker so the top 5–10 chunks go to the model.
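One common way to merge the two rankings (and the one sketched below) is reciprocal rank fusion; the k = 60 constant comes from the original RRF paper, and the chunk IDs are made-up examples:

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists of chunk IDs; k dampens the weight of the very
    top ranks so one list cannot dominate the fused order."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=lambda c: scores[c], reverse=True)

vector_hits = ["c7", "c2", "c9"]  # IDs from the vector index
bm25_hits = ["c2", "c4", "c7"]    # IDs from the keyword index

# Fuse both lists, then hand the top candidates to a cross-encoder
# re-ranker that picks the final 5-10 chunks for the model.
candidates = reciprocal_rank_fusion([vector_hits, bm25_hits])[:25]
```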
Citations and source links
The model is instructed to always cite which document and which section the answer came from. Users can click back to the source and verify — critical when answers are used in cases, support or legal contexts.
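A sketch of how numbered sources can be fed to the model so each citation maps back to a real document; the instruction wording and source format are illustrative:

```python
def build_prompt(question: str, chunks: list[dict]) -> str:
    """Number each retrieved chunk so the model can cite [1], [2], ...
    and the UI can map each citation back to its source link."""
    sources = "\n\n".join(
        f"[{i + 1}] {c['source']}, {c['section']}\n{c['text']}"
        for i, c in enumerate(chunks)
    )
    return (
        "Answer using only the sources below. Cite every claim with its "
        "source number in brackets, e.g. [2]. If the sources do not "
        "contain the answer, say you do not know.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {question}"
    )
```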
Vector store: pgvector, Pinecone or Qdrant
pgvector if you already have Postgres (no extra system to operate). Pinecone if you want a fully managed service. Qdrant if you want to self-host with rich filtering. We choose based on your data volume and operations tolerance.
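As a sketch of the pgvector path: a minimal schema and nearest-neighbour query, assuming Postgres with the pgvector extension and the psycopg driver. Table and column names are illustrative, and vector(1024) must match your embedding model's dimension:

```python
import psycopg  # assumes the pgvector extension is available in Postgres

SCHEMA = """
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS chunks (
    id        bigserial PRIMARY KEY,
    source    text,
    section   text,
    body      text,
    embedding vector(1024)
);
CREATE INDEX IF NOT EXISTS chunks_embedding_idx
    ON chunks USING hnsw (embedding vector_cosine_ops);
"""
# Run once at setup: conn.execute(SCHEMA)

def nearest_chunks(conn: psycopg.Connection, query_embedding: list[float], limit: int = 25):
    vec = "[" + ",".join(str(x) for x in query_embedding) + "]"  # pgvector text format
    # <=> is pgvector's cosine-distance operator: smaller means closer.
    return conn.execute(
        "SELECT id, source, section, body FROM chunks "
        "ORDER BY embedding <=> %s::vector LIMIT %s",
        (vec, limit),
    ).fetchall()
```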
Indexing pipeline with versioning
When documents change, only the affected content is re-indexed. We version the index so you can test a new embedding model on a secondary index before switching — without taking the system down.
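A sketch of the change-detection half, using a content hash to skip unchanged documents; the index_state table and the v1/v2 naming are illustrative:

```python
import hashlib

def needs_reindex(conn, doc_id: str, body: str) -> bool:
    """Skip documents whose content hash is unchanged since the last run."""
    digest = hashlib.sha256(body.encode()).hexdigest()
    row = conn.execute(
        "SELECT content_hash FROM index_state WHERE doc_id = %s", (doc_id,)
    ).fetchone()
    return row is None or row[0] != digest

# Versioning: write embeddings from the new model to chunks_v2 while
# chunks_v1 keeps serving queries, compare eval scores on both, then
# point the query layer at v2. No downtime, and rollback is a rename.
```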
Evaluation, monitoring and quality metrics
An evaluation dataset with known good answers is run in CI on every change to prompts, retriever or model. Plus production monitoring: empty retrievals, low confidence scores and user complaints are logged so you find quality gaps before they become a pattern.
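A minimal sketch of what that CI gate can look like; the JSONL format, the judge function and the baseline are placeholders for whatever the project actually uses:

```python
import json

def run_eval(answer_fn, judge_fn, dataset_path: str) -> float:
    """Run every eval question through the system and score the answer
    against the known good reference; returns the pass rate CI gates on."""
    with open(dataset_path) as f:
        cases = [json.loads(line) for line in f]
    passed = sum(
        judge_fn(c["question"], answer_fn(c["question"]), c["reference"])
        for c in cases
    )
    return passed / len(cases)

# In CI, fail the build on regression:
# score = run_eval(rag_answer, llm_judge, "eval/questions.jsonl")
# assert score >= BASELINE, f"eval pass rate dropped to {score:.0%}"
```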
Before you start
What you should consider first.
RAG or fine-tuning?
RAG is almost always the right starting point. It's faster to build, easier to update when documents change, and gives verifiable citations. Fine-tuning becomes interesting only when you have a large, stable corpus of examples of good output form, and even then it's typically used together with RAG, not instead of it.
Data quality is the prerequisite
A RAG system over messy, contradictory or outdated data becomes a confident liar. We like to spend time in discovery looking at a sample of your documents — and if the quality requires cleanup, we say so before we start, not after the pilot phase.
Per-document access control
If not every user is allowed to see every document, retrieval needs to respect that. We index permissions alongside the document and filter at query time — so the model never sees a document the requesting user isn't allowed to see. Not an afterthought; an architectural choice from day one.
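A sketch of query-time filtering in the pgvector setup from above, assuming the indexing pipeline maintains an allowed_groups text[] column on every chunk:

```python
def retrieve_for_user(conn, query_embedding: list[float], user_groups: list[str], limit: int = 25):
    """The permission filter is part of the retrieval query itself, so a
    chunk the user may not see is never a candidate for the model."""
    vec = "[" + ",".join(str(x) for x in query_embedding) + "]"
    return conn.execute(
        "SELECT id, source, section, body FROM chunks "
        "WHERE allowed_groups && %s "  # && = array overlap: any shared group grants access
        "ORDER BY embedding <=> %s::vector LIMIT %s",
        (user_groups, vec, limit),
    ).fetchall()
```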
Language and Nordic-specific models
Most embedding models are trained primarily on English. For Danish we use multilingual models (E5-multilingual, BGE-M3) that have shown good results on Scandinavian languages. We test on your actual data in discovery — that's the only reliable way to know if a model is good enough.
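A small sketch of that test with multilingual E5 via sentence-transformers. Note the "query:"/"passage:" prefixes the E5 family expects; the Danish passages are made-up examples:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("intfloat/multilingual-e5-large")

# The E5 family expects these prefixes; omitting them hurts retrieval quality.
query = "query: Hvordan opsiger jeg mit abonnement?"
passages = [
    "passage: Opsigelse kan ske med en måneds varsel via selvbetjening.",
    "passage: Abonnementet fornyes automatisk hver måned.",
]

q_emb = model.encode(query, normalize_embeddings=True)
p_embs = model.encode(passages, normalize_embeddings=True)
scores = p_embs @ q_emb  # cosine similarity, since embeddings are normalized
print(scores)  # the cancellation passage should score highest
```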
FAQ
What people usually ask.
How long does it take to build a RAG system?
A focused prototype over a bounded corpus (1,000–10,000 documents, one language, one use case) is typically live in 4–6 weeks. A production RAG with access control, hybrid retrieval, evaluation pipeline and UI is typically 8–14 weeks. We always recommend starting with a prototype so you can see early whether quality is where it needs to be.
Which vector database should we choose?
pgvector if you already have Postgres — it's the simplest path and scales fine to millions of chunks. Pinecone or Qdrant Cloud if you want managed service and high scale. Self-hosted Qdrant or Weaviate if you have strict data residency requirements. We choose together in discovery based on your data volume, operations team and compliance.
Can you use OpenAI or does it need to be open source?
Both work. OpenAI (or Azure OpenAI for EU residency) gives state-of-the-art models and simple operations. Open-weight models served via vLLM or on AWS Bedrock give more control and can be cheaper at high volume. Embeddings and generation can be different models; we choose per layer based on quality and cost.
How do we know if the RAG system is good enough?
We build an evaluation dataset together with you in discovery: 50–200 questions with known good answers from a domain expert. That dataset is run against the system on every change and gives a measurable quality number. Plus production monitoring: empty retrievals, low confidence and user feedback are logged. If the numbers drop, we know before the users complain.
What about GDPR and personal data in the documents?
We don't index sensitive data without an explicit decision. Where necessary, we redact PII before embedding (names, national IDs, account numbers replaced with placeholders). Access control ensures the model only sees documents the requesting user is allowed to see. For especially sensitive corpora we run models self-hosted in the EU.
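As an illustration of the redaction step: two placeholder patterns (a Danish CPR number and an email address). Real redaction uses a proper PII detector; this only shows where the step sits in the pipeline:

```python
import re

# Illustrative patterns only; production redaction needs a real PII
# detection step, not a handful of regexes.
PII_PATTERNS = {
    "CPR": re.compile(r"\b\d{6}-?\d{4}\b"),  # Danish national ID
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def redact(text: str) -> str:
    """Replace PII with stable placeholders before the text is embedded."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text
```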
Ready to get started?
Let's have a no-pressure conversation.
We'll get back to you within one business day with concrete input, not a stock proposal.