I Built a RAG-Powered Search for My Blog — Here's Why and How
I have been writing on this blog for a while now. Posts about AWS, Terraform, GitOps, investing, health, and life in general. Over time the archive has grown to 54 posts — and I started running into a problem I didn't expect.
I couldn't find my own content.
If I wanted to reference something I wrote about S3 six months ago, I had to scroll. If someone asked me what I'd written about a topic, I had to think. A search bar would have helped, but Blogger's built-in search is basic at best.
So I built something better. A question-answering widget powered by RAG — Retrieval Augmented Generation — running entirely on AWS, embedded on my portfolio at jayanthkatta.com.
This post is about why I built it, what I had to learn, and how I put it together.
Why I Built It
The honest reason: I wanted to actually use what I was learning.
I had been reading about RAG for a while — how it lets you build AI that answers questions from your own content without fine-tuning a model or doing any training. My blog felt like the perfect test case. The content is mine. The domain is narrow. The stakes are low if something breaks.
The secondary reason: I wanted something on my portfolio that wasn't just a list of technologies. Something a visitor could actually interact with and immediately understand what I do.
What I Had to Learn
RAG — Retrieval Augmented Generation
This was the core concept. The idea is straightforward: instead of asking an AI to answer from general knowledge, you find the most relevant pieces of your own content first, then hand those pieces to the AI and ask it to answer only from what you've given it. No fine-tuning. No training data. Just your content, an embedding model, and a language model.
Two models are involved:
- An embedding model — converts text into a list of numbers that captures the meaning of that text. Two pieces of text about the same topic will produce similar numbers.
- A language model — reads the relevant chunks and generates a plain-English answer.
I used Amazon Titan Embed v2 for embeddings and Amazon Nova Micro on Bedrock for answer generation. Both are Amazon's own models — instantly available, no third-party access forms required.
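A minimal sketch of what the embedding call might look like, not the exact code behind the widget. It assumes a client that behaves like boto3's `bedrock-runtime` client and the model ID `amazon.titan-embed-text-v2:0`. The function name `embed_text` is hypothetical, and the client is passed in so the function can be exercised without AWS credentials.

```python
import json

# Assumed model ID for Titan Text Embeddings V2 on Bedrock.
TITAN_MODEL_ID = "amazon.titan-embed-text-v2:0"


def embed_text(client, text):
    """Return the embedding vector for `text` as a list of floats.

    `client` is expected to behave like boto3.client("bedrock-runtime").
    """
    response = client.invoke_model(
        modelId=TITAN_MODEL_ID,
        contentType="application/json",
        accept="application/json",
        body=json.dumps({"inputText": text}),
    )
    # The response body is a stream; Titan returns the vector under "embedding".
    payload = json.loads(response["body"].read())
    return payload["embedding"]
```

In a Lambda, the client would typically be created once at module level and reused across invocations.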
Vector Similarity
Once text is converted to vectors, finding the most relevant chunks comes down to cosine similarity, which compares the directions of two vectors and ignores their lengths. If your question and a blog chunk point in the same direction in that mathematical space, they're about the same thing; the smaller the angle between them, the more relevant the chunk. I implemented this with numpy in the Lambda function, comparing the question's vector against every chunk's vector and returning the top 4 matches.
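The ranking step can be sketched in a few lines of numpy. This is an illustration under assumptions, not the code from the Lambda: the function name `top_k_chunks` and the shape of the inputs (one query vector, one row per chunk) are made up for the example.

```python
import numpy as np


def top_k_chunks(query_vec, chunk_vecs, k=4):
    """Return (indices, scores) of the k chunks most similar to the query."""
    q = np.asarray(query_vec, dtype=float)
    m = np.asarray(chunk_vecs, dtype=float)
    # Cosine similarity = dot product divided by the product of the norms.
    norms = np.linalg.norm(m, axis=1) * np.linalg.norm(q)
    # Guard against zero-length vectors; their dot product is 0, so the score is 0.
    scores = (m @ q) / np.where(norms == 0, 1.0, norms)
    # Sort descending by score and keep the first k positions.
    order = np.argsort(scores)[::-1][:k]
    return order.tolist(), scores[order].tolist()
```

With a few dozen posts, a brute-force comparison against every chunk like this stays cheap, which is why no dedicated vector store is needed at this scale.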
Serverless Pipeline Design
I had worked with Lambda and API Gateway before, but this was the first time I built a two-Lambda pipeline where one feeds the other indirectly through S3. The indexer writes a vector index to S3. The query Lambda loads it into memory on cold start and serves from RAM on every subsequent request. No database. No vector store. Just a JSON file and cosine math.
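The load-once-per-cold-start pattern can be sketched like this. The names `get_index` and `fetch_bytes` are hypothetical; `fetch_bytes` stands in for whatever reads the JSON file from S3 (for example, a small wrapper around an S3 `get_object` call), so the sketch runs without AWS access.

```python
import json

# Module-level cache. Lambda keeps module state alive between warm
# invocations, so this is filled once per execution environment.
_CACHE = {}


def get_index(fetch_bytes):
    """Load the index on first use and serve it from memory afterwards.

    `fetch_bytes` is a zero-argument callable returning the raw JSON bytes.
    """
    if "index" not in _CACHE:
        _CACHE["index"] = json.loads(fetch_bytes())
    return _CACHE["index"]
```

One trade-off of this design: a warm execution environment keeps serving the index it loaded until it is recycled, so a freshly written index is only picked up on the next cold start.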
Windows + Git Bash Packaging Pain
Lambda runs on Linux. If you build your zip on Windows using Git Bash, pip will install Windows-specific wheels that crash immediately on Lambda with os.add_dll_directory errors.
The fix is a set of pip flags that force it to download Linux-compatible binary wheels regardless of your host OS:
pip install -r requirements.txt -t ./dist \
    --platform manylinux2014_x86_64 \
    --only-binary=:all: \
    --python-version 3.12
Small thing. Cost me more time than I'd like to admit.
How I Built It
The architecture has two flows — one that runs daily automatically, and one that runs on demand when someone asks a question.
Daily Indexing — Automatic
An EventBridge rule fires every day and triggers the indexer Lambda. It fetches all posts from my Blogger JSON feed, breaks each post into chunks of around 400 words with a 50-word overlap, sends each chunk to Titan Embed v2 to get a vector, and writes the full index — chunks plus vectors — to an S3 bucket as a single JSON file.
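The chunking step described above can be sketched as a sliding window over words. This is a minimal illustration, assuming whitespace-separated words and the 400/50 sizes mentioned in the post; the function name `chunk_words` is hypothetical.

```python
def chunk_words(text, size=400, overlap=50):
    """Split text into chunks of up to `size` words, where each chunk
    repeats the last `overlap` words of the previous one."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    words = text.split()
    step = size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reached the end of the text
    return chunks
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of embedding some words twice.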
The whole job takes about 30 seconds. New posts I publish show up in search automatically within 24 hours — no manual steps.
Query Flow — On Demand
When someone types a question on jayanthkatta.com, a POST request goes to API Gateway, which triggers the query Lambda. The Lambda loads the index from S3 into memory on cold start, embeds the question using Titan Embed v2, runs cosine similarity to find the 4 most relevant chunks, and passes them to Nova Micro on Bedrock with a prompt that says: answer only from this context. The response comes back with an answer and links to the source posts.
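The "answer only from this context" step is mostly string assembly. The sketch below shows one way it could look; the chunk schema (`title`, `url`, `text` keys), the prompt wording, and both function names are assumptions for illustration, not the actual index format or prompt.

```python
def build_prompt(question, chunks):
    """Assemble a grounded prompt from the retrieved chunks.

    Each chunk is assumed to be a dict with "title", "url", and "text" keys.
    """
    context = "\n\n".join(
        f"[{i}] {c['title']} ({c['url']})\n{c['text']}"
        for i, c in enumerate(chunks, start=1)
    )
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


def source_links(chunks):
    """Return the unique source URLs, preserving retrieval order."""
    return list(dict.fromkeys(c["url"] for c in chunks))
```

The resulting prompt string is what would be sent to the language model, and `source_links` gives the list of posts to show alongside the answer.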
The whole round trip takes under 2 seconds.
Infrastructure
Everything is Terraform. The S3 bucket, both Lambdas, API Gateway, IAM roles, EventBridge rule, and CloudWatch logs — all defined as code, reproducible from scratch with a single terraform apply. No console clicks involved in the setup.
What It Looks Like
A floating terminal-style button sits in the bottom-right corner of jayanthkatta.com. Click it and a dark terminal overlay opens. Type a question, press Run, and the answer appears with source links back to the original posts.
Try it yourself — go to jayanthkatta.com and click Ask my blog in the bottom right.
What's Next
This was a personal project but the pattern scales. The same architecture — embed your content, store vectors, query at runtime — works for documentation, internal wikis, and support knowledge bases. The only thing that changes is the content source.
If you're looking to build something similar, the full stack is: Lambda + S3 + API Gateway + EventBridge + Bedrock. All serverless. All pay-per-use. Monthly cost at my traffic: effectively zero.