RAG App

  • 4 Devlogs
  • 6 Total hours

An app that reads a document and answers users' questions using a RAG pipeline. It pairs a Google API with a small local embedding model, which saves both online and local storage without compromising performance. Currently, the app only accepts PDFs and certain types of images.

3h 51m 12s logged

For this project, I scrapped the UI and remade it. The base layout is still there: the sidebar and the main content area are both still present, but I removed the separate options to index and to clear history. In the new UI, the sidebar lets the user upload files, indexing happens automatically, and the UI shows how many sections were indexed. When a file is deleted, its history and related context are deleted as well. The QA part of the app now answers more cleanly and suggests follow-up questions. The chat also shows its sources (the relevant document text) when answering a question.
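To make the upload and delete behavior concrete, here is a minimal sketch of one way it could work. It assumes an in-memory store keyed by file name; the `DocumentSession` class and its method names are hypothetical and not taken from the project's code.

```python
from dataclasses import dataclass, field


@dataclass
class DocumentSession:
    """Hypothetical in-memory store: sections and chat history per file."""

    chunks: dict = field(default_factory=dict)   # file name -> list of sections
    history: dict = field(default_factory=dict)  # file name -> list of (question, answer)

    def add_file(self, name, sections):
        """Index a file's non-empty sections and return how many were stored."""
        kept = [s.strip() for s in sections if s.strip()]
        self.chunks[name] = kept
        self.history.setdefault(name, [])
        return len(kept)

    def record_turn(self, name, question, answer):
        """Append one question/answer pair to the file's chat history."""
        self.history[name].append((question, answer))

    def delete_file(self, name):
        """Remove the file together with its chat history and context."""
        self.chunks.pop(name, None)
        self.history.pop(name, None)
```

The count returned by `add_file` is the kind of number the sidebar could display, and `delete_file` clears the history in the same step so no stale context survives.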

Coming to the backend, I remade it as well. I first tried an OpenRouter API, which took a long time to debug because the file was not being converted to text properly, so the model's answers were incorrect. I then switched back to a free Gemini model for answer generation and used a local embedding model to save tokens.
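For readers unfamiliar with the retrieval step that a local embedding model feeds, here is a minimal sketch. It assumes the embeddings are plain lists of floats and ranks sections by cosine similarity; the function names and the toy vectors are illustrative only, and the project may rank sections differently.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, indexed, k=3):
    """Return the k section texts whose vectors are closest to the query.

    `indexed` is a list of (text, vector) pairs produced by the embedding model.
    """
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in indexed]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Toy example with 2-dimensional vectors standing in for real embeddings.
sections = [("cats", [1.0, 0.0]), ("dogs", [0.0, 1.0]), ("pets", [1.0, 1.0])]
best = top_k([1.0, 0.0], sections, k=2)
```

Only the selected sections need to be sent to the generation model, which is where the token savings come from.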

@revankotapati

40m 8s logged

Added an OCR feature so that handwritten text can be used. I also updated the user-query feature so that users can choose their own settings for the RAG pipeline. In addition, users can now upload files, and I improved the look of the UI.
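The post does not say which settings are exposed, so the following is only a sketch of what user-selectable RAG settings might look like. The `RagSettings` fields (`chunk_size`, `chunk_overlap`, `top_k`) and the `split_into_chunks` helper are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RagSettings:
    """Hypothetical user-adjustable settings; the field names are assumptions."""

    chunk_size: int = 500    # characters per chunk
    chunk_overlap: int = 50  # characters shared between neighbouring chunks
    top_k: int = 3           # number of chunks retrieved per question

    def __post_init__(self):
        if self.chunk_size <= 0:
            raise ValueError("chunk_size must be positive")
        if not 0 <= self.chunk_overlap < self.chunk_size:
            raise ValueError("chunk_overlap must be in [0, chunk_size)")
        if self.top_k <= 0:
            raise ValueError("top_k must be positive")


def split_into_chunks(text, settings):
    """Split text into overlapping character windows according to the settings."""
    step = settings.chunk_size - settings.chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + settings.chunk_size])
        if start + settings.chunk_size >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks
```

Validating the values once in `__post_init__` means a bad combination from the UI fails early instead of producing an endless or empty split.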

Ship #1

For this project, I built a containerized Retrieval-Augmented Generation (RAG) assistant deployed on Hugging Face Spaces. Users upload documentation and query it in real time through a Streamlit chat interface. I wrote directly against the google-genai and Pinecone Python SDKs instead of using an orchestration framework, which keeps the pipeline lightweight and free of local storage; it uses text-embedding-004 for embeddings and gemini-1.5-flash for generation. The main engineering challenges were resolving SDK endpoint routing mismatches and closing a vector dimension mismatch: I used types.EmbedContentConfig to truncate embedding outputs to 768 dimensions so that they match the Pinecone serverless index. I am proud of how fast, clean, and cost-effective this standalone architecture turned out, and of the strict system prompt guardrails, which guard against hallucinations by returning a fallback response whenever a query falls outside the uploaded context. To test the live application, verify your API keys, paste a technical text block into the ingestion sidebar to index it, and then ask a mix of in-domain and out-of-domain questions to see how retrieval and the fallback behave.
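The guardrail idea can be sketched without any SDK calls. The snippet below is an assumption about how such a prompt might be assembled, not the project's actual prompt: the fallback wording, `build_grounded_prompt`, and `answer_or_fallback` are hypothetical, and `generate` stands in for whatever function calls the Gemini model.

```python
FALLBACK = "I could not find that in the uploaded document."


def build_grounded_prompt(question, context_chunks):
    """Build a prompt that restricts the model to the retrieved context."""
    context = "\n\n".join(
        f"[{i}] {chunk}" for i, chunk in enumerate(context_chunks, start=1)
    )
    return (
        "Answer using only the context below. "
        f'If the context does not contain the answer, reply exactly: "{FALLBACK}"\n\n'
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


def answer_or_fallback(question, context_chunks, generate):
    """Return the model's answer, or the fallback when nothing was retrieved.

    `generate` is any callable that takes a prompt string and returns text;
    here it is a placeholder for the real model call.
    """
    if not context_chunks:
        return FALLBACK
    return generate(build_grounded_prompt(question, context_chunks)).strip()
```

Short-circuiting on an empty retrieval result avoids a model call entirely, and numbering the chunks makes it easier to show sources next to the answer.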

44m 1s logged

Finished connecting the backend to the frontend.

1h 0m 28s logged

I built the frontend and backend of this RAG pipeline using Python, Google GenAI, and Pinecone.
