Two Years of Writing, One Afternoon of Code
A hands-on RAG build, and an honest look at AI coding agents
Recently, in an afternoon, I built a RAG application that lets me chat with two years of my own writing. 146 posts, now searchable through natural conversation. I did this while taking notes for the article you’re reading.
I’ve been writing about AI and software development since before ChatGPT existed. Until now, I’ve focused on chasing the next development. But I realized I’d accumulated a corpus worth exploring, and with today’s tooling, I could build the harness to do it in a single sitting.
This post is two things: a practical guide to building your own personal RAG, and an honest assessment of Claude Opus 4.5 as a coding agent. If you’ve heard about RAG but never built one, or you built one a year ago and haven’t touched it since, this is your on-ramp. I’ll walk through the decisions that matter: choosing a vector database, chunking strategy, metadata design, and the iteration required to get useful results. We’re not building enterprise infrastructure here. We’re building a functional POC that solves a personal need, but as you will see, this plots a path to scale the solution.
For the vector database, I chose ChromaDB. Beyond my positive experience with it during my AWS days, ChromaDB offers something valuable for developers who want to experiment without commitment: a frictionless scaling path. You can start with an ephemeral in-memory database for quick tests, move to local file persistence for real development, then shift to a client-server architecture, and finally to their hosted cloud offering. Each transition requires minimal code changes. You’re never locked into a decision you can’t easily reverse.
Note, all the code for this post is in github.
Throughout this build, I’ll include periodic check-ins on how Opus 4.5 performed. Benchmarks tell one story; using a model for real work tells another. Spoiler: I was impressed, with some caveats worth discussing.
Let’s build.
Choosing a Stack
I knew I wanted a vector embeddings solution. During my time at AWS, even before Bedrock existed, I built RAG systems using a self-hosted ChromaDB. I remembered it being pleasant to work with, so I checked in on the project.
ChromaDB has been busy. They published a seminal paper on Context Rot, and they’ve expanded their deployment options. The progression now looks like this:
Ephemeral (in-memory) for quick experiments
Local persistent file storage for real development
Client/server architecture where you control the server
Cloud-hosted, fully managed by Chroma
This scaling ramp matters. You can get started with zero infrastructure, iterate locally, and transition to production hosting without rewriting your application. For a “scratch your own itch” project like this, that flexibility removes the friction of “but what if I want to scale this POC?”
ChromaDB also provides an llms.txt file for their documentation, which would help Claude Code understand their APIs without hallucinating outdated patterns.
Honorable Mention: LanceDB
LanceDB deserves a mention. I explored it while learning Rust, as they had an early and excellent Rust SDK. Returning to their site after a year away, I see they’ve expanded significantly: storage, search, feature engineering, analytics, and training. They also offer cloud hosting with reasonable pricing for hobby projects.
Their hybrid search capabilities are particularly interesting. There’s a good chance I’ll return to LanceDB for a future exploration, especially to compare hybrid search approaches.
But for this build, ChromaDB’s familiarity and documentation support made it the pragmatic choice.
First Prompt, First Working Code
Since this was a personal project with low stakes, I leaned heavily on Claude Code to generate the initial implementation. I wanted to put Opus 4.5 to the test, so I kept the prompts brief to see the choices it made with minimal direction. At a Claude Code prompt, I entered this initial prompt:
I would like to implement a RAG using Chromadb
We will be ingesting a local directory of markdown files. These are blog posts I have written I would like to have a simple abstraction so we can switch between using a local persistent chromadb client and the cloud client.
Please research the chromadb docs to build a plan for the code in ingest the markdown files. https://docs.trychroma.com/llms.txt
You can see the full exchange on the initial build if you want the unedited version.
Opus 4.5 got to work, making implementation decisions and in about 2 minutes finished up with this update:
Implementation complete! Here’s what was created:
Files created:
- chroma_client.py - Factory function to switch between persistent/cloud
clients
- ingest.py - Markdown ingestion with section-based chunking
- ingest/sample-post-1.md - Sample blog post about Python
- ingest/sample-post-2.md - Sample blog post about RAG systems
Results of test run:
- 2 markdown files processed
- 9 chunks created (split by ## headers)
- Query test successfully retrieved relevant chunks about “How do I
install Python?” from the correct blog post section
Usage:
# Ingest markdown files (default: persistent local storage)
python ingest.py ./ingest
# Query example
from chroma_client import get_chroma_client
client = get_chroma_client()
collection = client.get_collection(’blog_posts’)
results = collection.query(query_texts=[’your question’], n_results=3)
# To use cloud storage instead:
export CHROMA_CLIENT_TYPE=cloud
export CHROMA_TENANT=your-tenant
export CHROMA_DATABASE=your-database
export CHROMA_API_KEY=your-api-key
python ingest.py ./ingest
resulting in this file structure:
├── README.md
├── chroma_client.py
├── ingest
│ ├── sample-post-1.md
│ └── sample-post-2.md
├── ingest.py
├── pyproject.toml
└── uv.lock
The README included usage examples:
# Ingest all .md files from a directory
uv run ingest.py ./ingest
# Specify a custom collection name
uv run ingest.py ./path/to/posts my_collection
Running it against the sample files:
❯ uv run ingest.py ./ingest test_00
Found 2 markdown file(s) in ingest
Processing: sample-post-1.md
Added 4 chunk(s)
Processing: sample-post-2.md
Added 5 chunk(s)
Ingestion complete!
Files processed: 2
Total chunks: 9
Collection: test_00
Querying worked as expected. The README provided a query code example, and running it returned relevant chunks from the sample posts. At this point, we had about 150 lines of Python.
I noticed that Opus 4.5 had not written any tests, so I issued this follow-up prompt:
Can you write unit tests for ingest.py? Use a fixtures folder in the tests directory for md files. Add coverage to the test run.
Once again, Opus 4.5 followed this directive with no issues. The result: 73% coverage on the client module, 86% on ingest. Decent numbers for a POC.
Checklist for this phase, where we ended up:
Ability to ingest a directory of markdown files
Chunking by content sections (headers:
##,###)Populate ChromaDB entry metadata from markdown frontmatter
~80% test coverage
Opus 4.5 Check-in: Minimal prompts, and Claude one-shotted these goals. Particularly impressive was Claude’s unprompted decision to extract metadata from markdown frontmatter and persist it into ChromaDB. I didn’t ask for that. It recognized the pattern and implemented it. It made good modularity choices, separating
chroma_client.pyout as a simple client factory to enable switch the Chroma storage strategy with only a config change.
The Real Ingest
With the scaffolding working, I gathered all my posts from the last two-plus years and I placed them in the ingest folder.
Chunking Strategy
Chunking is where RAG implementations often diverge. The strategy you choose directly affects retrieval quality, and there’s no universal right answer.
The key constraints:
Maximum input size: Embedding models have token limits. ChromaDB’s default embedder,
all-MiniLM-L6-v2, maxes out at 256 tokens. You can send larger chunks, but anything beyond 256 tokens gets truncated silently. Your embeddings won’t represent the full content.Overlap: Simple chunking algorithms split content at fixed intervals, which means you’ll cut sentences and thoughts mid-stream. Overlapping chunks by a window (say, 50 tokens) ensures complete concepts appear in at least one chunk. This increases storage but improves retrieval coherence.
Beyond these constraints, you have options. Fixed-size chunking is straightforward but crude. Semantic chunking splits on natural boundaries like paragraphs or sentences. Header-based chunking uses document structure. More advanced approaches include recursive chunking (try one strategy, fall back to another) and agentic chunking (let an LLM decide where to split). Each has trade-offs in complexity, cost, and retrieval quality.
For a deeper dive on chunking strategies, Weaviate has a great post on it.
Claude had initially implemented header-based splitting (chunk at ## and ### boundaries), which is a reasonable semantic approach. But for this corpus of articles I wanted more predictable chunk sizes. I instructed it to switch to fixed 256-token chunks with 50-token overlap. This is a more primitive approach but it is a good place to start, if even just for a baseline.
For the embedding model, I accepted ChromaDB’s default: all-MiniLM-L6-v2. It’s a well-respected embedder, and importantly, this is a two-way door decision. With my corpus size, re-ingesting with a different embedder takes under a minute. Optimize later if needed.
Running the Ingest
I set up the local persistent strategy for the ingest code
# file: .env.local
# Client type: “persistent” (local) or “cloud”
CHROMA_CLIENT_TYPE=persistent
# Local storage path (only used when CHROMA_CLIENT_TYPE=persistent)
CHROMA_PERSIST_PATH=./.chromadb
Then ran the ingest:
❯ uv run ingest.py ./articles ac_articles
Processing: 2025-01-20_architecting-ai-agents.md
Chunk 0: 255 tokens
Chunk 1: 256 tokens
Chunk 2: 256 tokens
...
Ingestion complete!
Files processed: 146
Total chunks: 1807
Collection: ac_articles
Total documents in collection: 1807
The full ingest completed in 47 seconds. That’s good news for iteration. You can experiment with chunking strategies, re-ingest, and evaluate results without significant time cost.
Each chunk carries metadata extracted from the article’s frontmatter:
{
“type”: “deep_dive”,
“author”: “Sam Keen”,
“total_chunks”: 13,
“url”: “https://alteredcraft.com/p/googles-antigravity-ide-first-impressions”,
“chunk_index”: 0,
“source_file”: “2025-11-21_google-s-antigravity-ide-first-impressions.md”,
“publish_date”: “November 21, 2025”,
“title”: “Google’s Antigravity IDE: First Impressions”
}
This metadata becomes important later for filtering queries.
Setting Up CLAUDE.md
At this point, I created a CLAUDE.md file using Claude Codes /init command. The only change I made at this point was to append this clause to the file:
## [IMPORTANT] Operating directives
- After completing a coding task, you MUST run tests using `uv run pytest` and all tests must pass.
- If test coverage is below 80%, ask the user if they would like you to increase test coverage.
Setting expectations early, such as testing directives, pays off as the codebase grows.
Examining the Collection
I now have all my articles stored in a local persistent ChromaDB. Before building on top of this dataset, I wanted to explore what we’d created. ChromaDB provides some excellent built-in tools for this.
CLI Browser
ChromaDB now ships with a CLI browser. You can install it with uvx:
uvx --from chromadb chroma
Then browse your collection:
chroma browse ac_articles --path ./.chromadb
You can arrow through and browse contents or search by ID, contents, or metadata. It’s a quick sanity check to verify your ingest worked as expected and your metadata is structured correctly.
Beyond browse, the CLI offers commands for managing collections, exporting data, and more. Check the ChromaDB CLI docs for the full list.
Building the Chat CLI
With the contents of the stored collection sanity checked, it’s time to build something useful on top of our dataset. I prompted Claude:
Let’s add a simple CLI chat using the Anthropic Agent SDK which utilizes the ChromaDB to answer user queries. Use Typer for the CLI. Give it a little CLI UI “flair.”
Opus 4.5 built the chat interface on the first attempt, even including streaming responses to the terminal. When it ran the unit tests, coverage had dropped to 60% with the new code. Per the CLAUDE.md directives, Claude flagged this and offered to improve coverage and I confirmed yes. Opus 4.5 got to work adding tests and the result: 98 tests passing, 89% coverage. True unit tests with mocked dependencies.
Looking at the updated README, I ran the newly documented command for chat:
And asking a question:
A good start, but the output needed refinement. The model was confused, attributing articles from my weekly review link roundups as articles I had written. Time to iterate on the system prompt.
Opus 4.5 Check-in: Claude one-shotted the CLI app, including streaming responses. It consistently respected the CLAUDE.md directives, running tests after changes and alerting me when coverage dropped. In prior experiments with earlier Claude versions, chat interfaces always required debugging, usually async or streaming issues. Opus 4.5 nailed it first try. You might chalk that up to luck, but my continued experience says otherwise: it’s a consistent performer.
Iterating on the System Prompt
To debug the prompt construction, I asked Claude to add file-based logging so I could see exactly what was being sent to the LLM. I also extracted the system prompt into a separate prompt.txt file for easy editing. I also had it moved the model configuration to the .env file. Opus 4.5 continued to perform well, accomplishing all this refactoring on the first effort.
This gave me a good level of observability for making further changes to the app. I edited the system prompt to provide further context differentiating my weekly summaries of other’s works from my self researched long form posts:
You are a helpful assistant to a writer, Sam Keen.
Sam publishes on a SubStack called Altered Craft.
Sam publishes two types of articles:
- `weekly_review`: A curated list of AI tutorials, tools, and news articles
Sam found interesting during the week. Each post contains summaries and
commentary about articles written by others.
- `deep_dive`: Articles researched and authored by Sam covering AI topics
relevant to software developers.
When answering:
- Use the provided context to give accurate, relevant answers
- If the context doesn’t contain enough information, say so clearly
- Be concise but thorough
- Always provide the URL to the article
- If you’re unsure, express uncertainty rather than making things up
Better results now:
I was using Claude Sonnet 4.0 for the LLM, and it was doing a good job of interpreting the additional context appended to queries.
Metadata Filtering
There’s a subtle issue here though. When I ask “Have I written any articles about RAG?”, the query gets encoded and compared against all 1,807 chunks in the database. The top 5 closest matches return. But those matches might include weekly review chunks that mention RAG, even if I’ve written a deep-dive article about it that didn’t rank in the top 5.
There are sophisticated solutions to this problem. One example is Re-rankers which are specialized models that take a larger initial result set and reorder them by relevance to the original user query. They’re increasingly common in production RAG systems and can be layered onto any retrieval strategy.
For this POC, we have a simpler option. Remember that metadata we stored with each chunk. ChromaDB supports metadata filtering, similar to a SQL WHERE clause. You can scope searches to specific metadata values.
Claude had actually anticipated this when building the CLI (--type):
So if I only want to query my authored articles I can signify that when starting the app:
uv run chat.py chat --type deep_dive
From here, we have several options to enhance the handling of differing article types in the collection:
Enable setting filters dynamically during a chat session
Expose filtering to the LLM as a tool, (MCP server or Claude Skill), letting it decide which article types to query based on the user’s question
Going Cloud
Remember that frictionless scaling path I mentioned? Time to test it. At the time of writing, ChromaDB offers a free trial with $5 credit for their cloud product.
When I had Claude build the application, I specified that I wanted the ability to switch between local and cloud storage via configuration. Claude claimed to have implemented this, so let’s verify.
First, I set up Chroma Cloud: created an account, project, and database, then generated an API key. This gave me a tenant ID, database name, and API key.
Looking at the .env file:
# Client type: “persistent” (local) or “cloud”
CHROMA_CLIENT_TYPE=persistent
# Local storage path (only used when CHROMA_CLIENT_TYPE=persistent)
CHROMA_PERSIST_PATH=./.chromadb
# Cloud configuration (only used when CHROMA_CLIENT_TYPE=cloud)
# CHROMA_TENANT=your-tenant-id
# CHROMA_DATABASE=your-database-name
# CHROMA_API_KEY=your-chroma-api-key
Opus 4.5 had read the Chroma docs and added the needed fields. I added my values here.
I then discovered Opus 4.5’s first misstep. It hadn’t actually enabled cloud support in the ingest.py script. It only loaded to local storage. I prompted:
If you look at dist.env we have the option to run from a cloud hosted ChromaDB. But if I look at ingest.py, it seems to only be capable of loading into a local DB. Can you review and fix if needed?
Claude fixed this on the first try. However, it also wrote tests that hit the cloud storage. This is where staying engaged matters. Spotting this as it happened kept me from continuing to burn credits for chroma cloud during test runs and also it is just poor form to have unit tests leverage the network.
Opus 4.5 Check-in: These models are extremely capable, but you should stay engaged when working with them. Claude was being thorough by testing the cloud integration, but those tests would burn through my $5 credit. I asked it to remove the cloud tests and only test local filesystem storage. Use missteps like this as feedback to refine your system prompts. A directive like “Unit tests should NEVER result in network communication” would have prevented this.
With cloud configuration set and the ingest issue resolved, I ran ingest.py:
uv run ingest.py ./articles ac_articles
Processing: 2025-03-17_skip-the-waitlist-ai-agents-with-openmanus.md
Chunk 0: 256 tokens
...
Ingestion complete!
Files processed: 146
Total chunks: 1807
Collection: blog_posts
Total documents in collection: 1807
This time it took about 4 minutes since we’re going over the network, compared to 47 seconds locally.
In the ChromaDB cloud dashboard I could see the records being loaded into the collection:
The web interface provides similar browsing capabilities as the local CLI. I also discovered the cost for this ingest: $0.02, very reasonable.
Now, finally, running the chat with cloud configuration:
uv run chat.py chat --type deep_dive
The experience is identical. There’s slightly more latency, as expected, but it’s barely noticeable.
So as promised by ChromaDB, with just a bit of forethought in our design (having an abstracted chroma client factory), switching between local and cloud storage requires only a config change.
I did also notice, as you might expect, cloud users have access to extended features such as an enhanced Search API with advanced hybrid search capabilities. That’s definitely something to explore in a future iteration.
Two Things Converging
This article demonstrates two things:
First, RAG frameworks have matured. There’s no shortage of RAG solutions today. AWS, Azure, and GCP all offer managed vector databases and RAG pipelines. Frameworks like LangChain provide comprehensive tooling. These can be excellent choices, but they come with extensive dependency coupling. You’re buying into an ecosystem.
ChromaDB occupies a useful middle ground: minimal coupling with a clear path to scale. You can start with zero dependencies beyond a Python package, iterate locally with fast feedback, and transition to cloud hosting when you’re ready. The tooling is solid, the documentation is thorough, and utilities like the CLI browser make debugging straightforward. If you’ve been meaning to try RAG but thought you didn’t have the time, this approach removes that excuse.
Second, AI coding agents are certainly viable for bounded POC work. Claude Opus 4.5 rarely made mistakes and respected its system prompt directives diligently. It one-shotted most tasks, made smart unprompted decisions like extracting frontmatter metadata. That said, you need to stay engaged. The cloud integration test incident shows these models will be thorough in ways you might not want. Treat missteps as feedback for refining your system prompts.
I haven’t done a thorough code review yet, and I’m sure there are anti-patterns to clean up. cloc reports about ~800 lines of application code and ~800 lines of tests. That’s a manageable codebase to audit if this POC graduates to something more permanent. If I continue evolving this project, I’d start by reviewing the tests to ensure they form a proper regression safety net, then move to the application code. Another completely viable path is to treat this simply as a viability POC, delete the code and start from scratch on a “production” RAG app.
But here’s what matters: In an afternoon, while taking notes for this article, I built the foundation of a “chat with my article catalog” capability. It’s still a POC, but it works. I can query my own writing, filter by article type, and get relevant results with source links.
That was possible because of two things converging: first-class RAG frameworks that remove infrastructure friction, and AI coding agents that compress development time for well-scoped projects.
If either of those interest you, now’s a good time to experiment.










