An example project to learn how to build applications on top of LLMs.

🚀 My Learning Journey: Building the PDF Query App 📄💡

I'm thrilled to share my recent project, the PDF Query App! This web application lets users upload PDF files, ask questions about their content, and receive answers generated from the relevant passages of the document.

The Idea

The PDF Query App simplifies the extraction and comprehension of information from PDFs, enabling users to swiftly access the insights they need.

The Tech Stack

Next.js: Ensuring a responsive and intuitive frontend experience.
FastAPI: Powering a robust and high-performance backend infrastructure.
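LLM (Large Language Model): Mistral-7B-Instruct-v0.2, called through the Hugging Face Inference API, generates the answers.
LangChain: Connects the pipeline stages (PDF loading, text splitting, embeddings, a Chroma vector store, and retrieval-based QA).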

The Development Journey

Environment Setup: Loaded secrets such as the Hugging Face API token from a .env file using python-dotenv, keeping them out of the source code.
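
A minimal sketch of that setup, assuming python-dotenv is installed and that the token is stored in .env under the key HUGGINGFACEHUB_API_TOKEN (chosen to match the Python variable in the QA code; the project's actual key name may differ). get_hf_token is a hypothetical helper name.

```python
import os

try:
    from dotenv import load_dotenv  # provided by the python-dotenv package

    load_dotenv()  # loads KEY=value pairs from a .env file; existing variables are not overwritten
except ImportError:
    pass  # python-dotenv not installed: rely on variables exported in the shell


def get_hf_token() -> str:
    """Hypothetical helper: return the Hugging Face token or fail with a clear message."""
    token = os.getenv("HUGGINGFACEHUB_API_TOKEN")  # key name is an assumption
    if not token:
        raise RuntimeError("HUGGINGFACEHUB_API_TOKEN is not set; add it to .env or the environment")
    return token
```

Calling HUGGINGFACEHUB_API_TOKEN = get_hf_token() once at startup surfaces a missing token immediately, instead of as an authentication error on the first question.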

Loading PDF Documents: Used PyPDFLoader to extract the text of each uploaded PDF, producing one document per page.

Text Splitting

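```python
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import PyPDFLoader

# Load the uploaded PDF as one Document per page (file_path: where the upload was saved)
pages = PyPDFLoader(file_path).load()
# Split pages into chunks of up to 500 characters, overlapping by 50 characters
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=50,
    length_function=len,
    is_separator_regex=False,
)
texts = text_splitter.split_documents(pages)
```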
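
To make chunk_size and chunk_overlap concrete, here is a simplified fixed-window splitter. It is not the algorithm RecursiveCharacterTextSplitter uses (that splitter first tries to break on separators such as blank lines, newlines, and spaces). It only shows how consecutive chunks share chunk_overlap characters, so text near a boundary appears in both neighbouring chunks. The function name is hypothetical.

```python
def sliding_window_chunks(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> list[str]:
    """Split text into windows of chunk_size characters; neighbours share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window already reaches the end of the text
            break
    return chunks


sample = "".join(str(i % 10) for i in range(1000))  # 1,000 characters of dummy text
chunks = sliding_window_chunks(sample)
print([len(c) for c in chunks])  # [500, 500, 100]
```

Each window starts 450 characters after the previous one, so the last 50 characters of one chunk are the first 50 of the next.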

Generating Embeddings

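```python
from langchain_community.embeddings import SentenceTransformerEmbeddings
embedding_function = SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2")  # local model, 384-dim vectors
```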
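
Embeddings are useful because passages with similar meaning map to nearby vectors. all-MiniLM-L6-v2 produces 384-dimensional vectors. The sketch below uses invented 3-dimensional toy vectors purely for illustration and compares them with cosine similarity, a common measure for this kind of search (the vector store may be configured with a different distance metric).

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between a and b: 1.0 = same direction, 0.0 = orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0  # treat zero vectors as unrelated


# Toy vectors standing in for real 384-dimensional embeddings (values invented)
question_vec = [0.9, 0.1, 0.0]
refund_chunk_vec = [0.8, 0.2, 0.1]
shipping_chunk_vec = [0.0, 0.3, 0.9]

print(round(cosine_similarity(question_vec, refund_chunk_vec), 3))
print(round(cosine_similarity(question_vec, shipping_chunk_vec), 3))
```

The chunk whose vector points in nearly the same direction as the question scores close to 1 and would be retrieved first.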

Storing Embeddings

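```python
import os
from langchain_community.vectorstores import Chroma
persist_directory = os.path.join('db', filename)  # one Chroma directory per uploaded PDF
db = Chroma.from_documents(documents=texts, embedding=embedding_function, persist_directory=persist_directory)
```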
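
Conceptually, the vector store keeps each chunk next to its embedding and, at query time, returns the chunks whose embeddings are closest to the question's. A toy in-memory version makes that concrete. TinyVectorStore is a hypothetical class, not Chroma's API. It skips persistence entirely, ranks by cosine similarity, and uses invented example texts and vectors.

```python
import math
from dataclasses import dataclass, field


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


@dataclass
class TinyVectorStore:
    """Hypothetical in-memory store: parallel lists of chunk texts and their vectors."""
    texts: list[str] = field(default_factory=list)
    vectors: list[list[float]] = field(default_factory=list)

    def add(self, text: str, vector: list[float]) -> None:
        self.texts.append(text)
        self.vectors.append(vector)

    def search(self, query_vector: list[float], k: int = 3) -> list[str]:
        """Return the k stored texts most similar to query_vector, best first."""
        ranked = sorted(
            zip(self.texts, self.vectors),
            key=lambda pair: cosine_similarity(query_vector, pair[1]),
            reverse=True,
        )
        return [text for text, _ in ranked[:k]]


store = TinyVectorStore()
store.add("Refunds are issued within 14 days.", [0.8, 0.2, 0.1])
store.add("Orders ship from our warehouse.", [0.0, 0.3, 0.9])
store.add("Refund requests need a receipt.", [0.7, 0.3, 0.0])
print(store.search([0.9, 0.1, 0.0], k=2))
```

A real store like Chroma adds persistence and an index so it does not have to score every stored vector on each query, but the contract is the same: vectors in, nearest chunks out.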

Setting Up Retrieval and QA

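```python
from langchain.chains import RetrievalQA
from langchain_community.llms import HuggingFaceEndpoint

# Hosted Mistral-7B-Instruct model; HUGGINGFACEHUB_API_TOKEN comes from the environment setup
repo_id = "mistralai/Mistral-7B-Instruct-v0.2"
llm = HuggingFaceEndpoint(repo_id=repo_id, max_length=128, temperature=0.5, token=HUGGINGFACEHUB_API_TOKEN)

# Retrieve the 3 most similar chunks and "stuff" them into a single prompt
retriever = db.as_retriever(search_kwargs={"k": 3})
qa = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=retriever, return_source_documents=True)

# invoke() takes the question under "query" and returns a dict with
# "result" (the answer) and "source_documents" (the chunks used)
generated_text = qa.invoke({"query": question})
```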
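
With chain_type="stuff", the retrieved chunks are simply concatenated ("stuffed") into one prompt together with the question, and the LLM answers from that context in a single call. The sketch below shows the idea; the build_stuff_prompt name and the template wording are hypothetical, and LangChain's default prompt is worded differently.

```python
def build_stuff_prompt(question: str, chunks: list[str]) -> str:
    """Hypothetical 'stuff' step: place every retrieved chunk into one prompt."""
    context = "\n\n".join(chunks)  # all k chunks go in together, without summarising
    return (
        "Answer the question using only the context below. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


retrieved = [
    "Refunds are issued within 14 days.",
    "Refund requests need a receipt.",
]
print(build_stuff_prompt("How long do refunds take?", retrieved))
```

Because everything goes into one prompt, its total size is bounded by the model's context window, which is one practical reason to keep chunks small and k low.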

How It Works

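Users upload PDFs via the Next.js frontend, which sends them to the FastAPI backend.
The backend extracts the text with PyPDFLoader and splits it into chunks of up to 500 characters that overlap by 50.
Each chunk is converted into an embedding with the all-MiniLM-L6-v2 sentence-transformer model.
The embeddings are persisted in a Chroma database, one directory per uploaded file.
When a user asks a question, the three most similar chunks are retrieved, and RetrievalQA passes them, together with the question, to Mistral-7B-Instruct, which generates the answer.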

Key Takeaways

This project showed me how much can be built by combining existing tools: a PDF loader, a text splitter, an embedding model, a vector store, and a hosted LLM each handle one stage of the pipeline. It also underscored the value of modularity, since each stage can be developed, tested, and replaced on its own.

Why PDF Query App Matters

The PDF Query App is useful for anyone who needs to find information quickly in long PDF documents. From researchers to professionals, it streamlines information retrieval and enhances productivity.

Wrapping Up

You can find the complete code for this project on GitHub: https://github.com/arnab1656/pdfQuery

Excited about its potential applications and eager to witness its impact!

#WebDevelopment #NextJS #FastAPI #PDFProcessing #ContentAnalysis #SoftwareDevelopment #TechInnovation #ProjectShowcase #LearnInPublic

Exploring PDF Analysis with AI: Building the PDF Query App Using Next.js and FastAPI