ragnar 0.2
We’re happy to announce the release of ragnar 0.2, a new R package for building trustworthy Retrieval-Augmented Generation (RAG) workflows.
You can install it from CRAN with:
install.packages("ragnar")
What’s retrieval-augmented generation (RAG)?
Large language models (LLMs) tend to generate fluent, confident text that can be completely detached from facts and reality. We politely call untrue statements from an LLM hallucinations. RAG reduces the risk of hallucinations by grounding the LLM in your own factual, trusted documents.
With RAG, instead of asking an LLM to respond from its own memory, we:
- Retrieve relevant passages from trusted sources.
- Ask the model to answer using those passages.
RAG shifts the LLM's job from open-ended generation towards summarizing and paraphrasing, an easier task where LLMs make substantially fewer fabrications.
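Concretely, the second step just means splicing the retrieved passages into the prompt. Here is a minimal schematic in base R (not ragnar code; the passages and question below are made up purely for illustration):

# 'passages' stands in for text retrieved from your trusted documents
passages <- c(
  "Quarto renders .qmd files to HTML, PDF, and other formats.",
  "Executable code cells use a {python} or {r} chunk header."
)

# Ask the model to answer using only the retrieved passages
prompt <- paste0(
  "Answer the question using only the passages below.\n\n",
  paste0("- ", passages, collapse = "\n"),
  "\n\nQuestion: How do I make a code cell executable?"
)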
Meet ragnar
ragnar is a tidy interface for building a RAG pipeline. Use ragnar to:
- Convert documents from the web or local filesystem into Markdown.
- Chunk documents using meaningful semantic boundaries.
- Augment chunks with a short context string that situates each chunk.
- Embed chunks with commercial or open-source models.
- Store embeddings in DuckDB for fast, local queries.
- Retrieve relevant chunks using both vector and text search.
Quick start: collect, convert, chunk, embed, and store your documents
Here is how to build a RAG knowledge store from the Quarto docs.
- Create a knowledge store.

  store <- ragnar_store_create(
    "./quarto.ragnar.duckdb",
    embed = \(x) ragnar::embed_openai(x, model = "text-embedding-3-small"),
    name = "quarto_docs"
  )
- Generate a list of relevant web page URLs from quarto.org. We can consult the sitemap or, if one isn't available, crawl the site instead.

  pages <- ragnar_find_links("https://quarto.org/sitemap.xml")
- Convert, chunk, augment, embed, and store each page.

  for (page in pages) {
    chunks <- page |>
      # Convert to markdown
      read_as_markdown() |>
      # Split document into chunks and generate 'context' for each chunk
      markdown_chunk()

    # Embed and store chunks with context and metadata
    ragnar_store_insert(store, chunks)
  }
- Build the retrieval index.

  ragnar_store_build_index(store)
Once the store is built, you can access it for fast retrieval.
Retrieve relevant chunks
Pass a query string to ragnar_retrieve() to perform both semantic search using vector embeddings and conventional text search, and retrieve the most relevant chunks.
store <- ragnar_store_connect("./quarto.ragnar.duckdb", read_only = TRUE)
query <- "{.python} or {python} code chunk header"
ragnar_retrieve(store, query, top_k = 5)
#> # A tibble: 9 × 9
#> origin doc_id chunk_id start end cosine_distance bm25 context text
#> <chr> <list> <list> <int> <int> <list> <lis> <chr> <chr>
#> 1 https://quart… <int> <int> 14318 16132 <dbl [1]> <dbl> "# Dia… "###…
#> 2 https://quart… <int> <int> 869 2386 <dbl [1]> <dbl> "# ASA… "Hom…
#> 3 https://quart… <int> <int> 1 2497 <dbl [2]> <dbl> "" "# U…
#> 4 https://quart… <int> <int> 3156 4928 <dbl [1]> <dbl> "# v1.… "## …
#> 5 https://quart… <int> <int> 5365 7389 <dbl [1]> <dbl> "# Cre… "## …
#> 6 https://quart… <int> <int> 7319 8804 <dbl [1]> <dbl> "# HTM… "## …
#> 7 https://quart… <int> <int> 11096 12763 <dbl [1]> <dbl> "# HTM… "## …
#> 8 https://quart… <int> <int> 9426 11250 <dbl [1]> <dbl> "# Rev… "###…
#> 9 https://quart… <int> <int> 5236 6904 <dbl [1]> <dbl> "# Hel… "###…
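The next section shows how to hand retrieval over to the model as a tool. If you prefer to assemble the prompt yourself, you can splice the retrieved text into a one-off request. A minimal sketch, assuming the text column holds the chunk contents (as in the output above) and that ellmer is configured for OpenAI:

library(ellmer)

chunks <- ragnar_retrieve(store, query, top_k = 5)

# Combine the retrieved passages into a single context block
context <- paste0(chunks$text, collapse = "\n\n---\n\n")

chat <- chat_openai(system_prompt = "Answer using only the provided excerpts.")
chat$chat(paste0(
  "Excerpts from the Quarto docs:\n\n", context,
  "\n\nQuestion: ", query
))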
Equip an LLM chat with your store
You can equip an ellmer chat with a tool that lets the LLM search your knowledge store automatically.
library(ellmer)
chat <- chat_openai(
system_prompt = glue::trim("
You are a Quarto documentation search agent and summarizer.
You are concise.
For every user question, perform between one and three searches.
Include links to the source documents in your response.
")
) |>
ragnar_register_tool_retrieve(store, top_k = 10)
#> Using model = "gpt-4.1".
The model can now search the store on demand, rewriting the search query and performing repeated searches as needed. Its responses will also cite and link back to your source documents, so users can easily follow links to learn more.
chat$chat(
"What's the difference between {.python} and {python}
in a code chunk header?"
)
#> ◯ [tool call] rag_retrieve_from_store_001(text = "difference between {.python}
#> and {python} in a code chunk header")
#> ● #> [{"origin":"https://quarto.org/docs/authoring/diagrams.html","doc_id"…
#> ◯ [tool call] rag_retrieve_from_store_001(text = "chunk header options quarto
#> curly braces dot notation")
#> ● #> [{"origin":"https://quarto.org/docs/authoring/lipsum.html","doc_id":2…
#> The difference between `{.python}` and `{python}` in a code chunk header is:
#>
#> - `{python}`: This syntax is used for executable code blocks. Quarto will run
#> the Python code inside the block and include its output in the rendered
#> document.
#> ```markdown
#> ```{python}
#> print(1 + 1)
#> ```
#> ```
#> This is for running code, capturing output, figures, etc.
#>
#> - `{.python}`: This syntax (note the leading dot) is used for a code block that
#> is purely for display (not executed), with `.python` indicating the code should
#> be syntax-highlighted as Python. This is the Pandoc Markdown convention for
#> indicating the language for syntax highlighting only:
#> ```markdown
#> ```{.python}
#> # This code is just displayed, not executed by Quarto
#> print(1 + 1)
#> ```
#> ```
#> Or equivalently, you can use triple backticks followed by the language name:
#> ```
#> ```python
#> print(1 + 1)
#> ```
#> ```
#> In both forms, the code is not executed.
#>
#> To summarize:
#> - `{python}` → Executed code block.
#> - `{.python}` or ```python → Non-executed code block with syntax highlighting
#> only.
#>
#> Sources:
#> - [Quarto documentation: Using
#> Python](https://quarto.org/docs/computations/python.html)
#> - [Quarto documentation: HTML Code
#> Blocks](https://quarto.org/docs/output-formats/html-code.html)
Inspect and iterate
Use ragnar_store_inspect() to interactively preview which text chunks are retrieved for different search queries. This helps identify issues like poor document conversion, chunking, or context augmentation, so you can refine your store creation pipeline. By making retrieval results easy to explore, ragnar lets you iterate and tune your knowledge store before connecting it to an LLM.
You can also launch the store inspector with just a single chunked document using ragnar_chunks_view(). This is particularly useful when deciding which chunking approach is most appropriate for your content.
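For example (a short sketch; the Quarto page used here is just an illustration):

# Interactively preview retrieval over the connected store
ragnar_store_inspect(store)

# Or preview how a single document gets chunked before building a store
chunks <- read_as_markdown("https://quarto.org/docs/computations/python.html") |>
  markdown_chunk()
ragnar_chunks_view(chunks)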

Additional features
- Works with many document types: read_as_markdown() uses MarkItDown, which means it can ingest an extremely wide variety of files: HTML, PDF, docx, pptx, epub, compressed archives, and more.
- Flexible embeddings: Use embedding models from providers like OpenAI, Google Vertex or Gemini, Bedrock, Databricks, Ollama, or LM Studio, or easily supply your own embedding function (see the sketch after this list).
- DuckDB native: Extremely fast local indexing and retrieval. Native support for MotherDuck if you need to serve the store.
- Customizable chunk augmentation: Customize how chunks are augmented with context (headings, links, titles), and easily attach additional metadata to chunks.
- Not a black box: Easily inspect the store contents and retrieval results.
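Here is the sketch referenced above for supplying your own embedding function. It assumes the embed function takes a character vector and returns a numeric matrix with one embedding per row (as the built-in embed_*() helpers do); the stub below returns random vectors purely for illustration, where a real function would call your embedding service.

# Assumed contract: character vector in, numeric matrix out (one row per input).
# This stub returns random vectors purely for illustration.
my_embed <- function(x) {
  matrix(rnorm(length(x) * 384), nrow = length(x), ncol = 384)
}

store <- ragnar_store_create("./demo.ragnar.duckdb", embed = my_embed)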
Get started
- Install: install.packages("ragnar")
- Read the vignette: Getting Started
- Explore more examples: ragnar GitHub repository
Acknowledgements
A big thanks to all contributors who helped out with ragnar development through thoughtful discussions, bug reports, and pull requests.
@app2let, @arnavchauhan7, @atheriel, @bowerth, @cboettig, @Christophe-Regouby, @dfalbel, @dingying85, @gadenbuie, @hadley, @JCfly3000, @jrosell, @kaipingyang, @mattwarkentin, @PauloSantana2019, @pedrobtz, @RichardHooijmaijers, @schochastics, @sikiru-atanda, @SimonEdscer, @smach, @t-kalinowski, and @topepo.