Hasan-Atris3
committed on
Create README.md
README.md
ADDED
---
title: CedroPM Bot
emoji: 🤖
colorFrom: blue
colorTo: indigo
sdk: docker
pinned: false
app_port: 7860
---

# PM-RAG-ChatBot

A RAG (Retrieval-Augmented Generation) chatbot for querying Software Requirements Documents (SRDs) and project documentation. Built with Chainlit and Claude, with section-aware PDF processing, table extraction, and diagram understanding.

## Features

- **SRD Document Processing**: Upload and index PDF documents with intelligent section-aware text splitting
- **Diagram Understanding**: Process diagrams using Claude Vision and/or Qwen2-VL vision models
- **Interactive Chat Interface**: Web-based chat interface powered by Chainlit
- **Hybrid Search**: Combines dense vector search (semantic) and sparse search (BM25) with cross-encoder reranking
- **Multi-Project Support**: Manage multiple projects with isolated knowledge bases
- **Persistent Chat History**: SQLite database stores all conversations and messages
- **Learning from Feedback**: Optional learning mode that improves responses based on user corrections
- **User Isolation**: Multi-user support with strict data isolation per user, project, and chat
- **Smart Intent Detection**: Automatically detects enumeration queries vs. regular Q&A
- **Table Extraction**: Extracts and indexes tables from PDF documents

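To make "section-aware text splitting" concrete, here is a minimal sketch of one way it could work. It is not taken from this project's code: the heading pattern (numbered headings such as `3.1 Functional Requirements`), the `Chunk` type, and `split_by_section` are all hypothetical names chosen for illustration.

```python
import re
from dataclasses import dataclass
from typing import List

# Assumption: SRD headings look like "3.1 Functional Requirements".
# Any line that starts with a dotted number is treated as a heading.
HEADING_RE = re.compile(r"^(\d+(?:\.\d+)*)\s+(\S.*)$")


@dataclass
class Chunk:
    section: str  # heading the chunk belongs to
    text: str     # chunk body


def split_by_section(text: str, max_chars: int = 500) -> List[Chunk]:
    """Split text at headings so every chunk keeps its section title."""
    chunks: List[Chunk] = []
    section = "Preamble"
    buffer: List[str] = []

    def flush() -> None:
        body = " ".join(buffer).strip()
        buffer.clear()
        # Long sections are cut into fixed-size pieces; each piece
        # still carries the heading it came from.
        for start in range(0, len(body), max_chars):
            chunks.append(Chunk(section, body[start:start + max_chars]))

    for line in text.splitlines():
        stripped = line.strip()
        match = HEADING_RE.match(stripped)
        if match:
            flush()
            section = f"{match.group(1)} {match.group(2)}"
        elif stripped:
            buffer.append(stripped)
    flush()
    return chunks
```

Storing the heading alongside each chunk is what lets a retriever later tell a functional requirement from a non-functional one, even when the chunk text itself does not say which it is.
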
## Architecture

The system uses a hybrid RAG architecture:

1. **Document Ingestion**: PDFs are processed with section-aware splitting, preserving functional/non-functional requirement context
2. **Vector Storage**: ChromaDB stores embeddings with metadata for filtering and scoping
3. **Hybrid Retrieval**:
   - Dense retrieval using sentence transformers (all-MiniLM-L6-v2)
   - Sparse retrieval using BM25
   - Cross-encoder reranking (ms-marco-MiniLM-L-6-v2)
4. **Answer Generation**: Claude Sonnet 4.5 generates context-aware answers
5. **Vision Processing**: Optional diagram interpretation using Claude Vision or OCR

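The hybrid retrieval step can be sketched in plain Python. The README does not say how the dense and sparse results are merged, so the reciprocal rank fusion used here is an assumption, as are the function names `bm25_rank` and `reciprocal_rank_fusion`. The dense ranking is passed in as a plain list standing in for a vector-store query, and the cross-encoder reranking stage is left out.

```python
import math
from collections import Counter
from typing import Dict, List


def bm25_rank(query: str, docs: Dict[str, str],
              k1: float = 1.5, b: float = 0.75) -> List[str]:
    """Return doc IDs ordered by Okapi BM25 score (whitespace tokens)."""
    tokens = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    avg_len = sum(len(t) for t in tokens.values()) / len(tokens)
    scores: Dict[str, float] = {}
    for doc_id, toks in tokens.items():
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            df = sum(1 for t in tokens.values() if term in t)
            idf = math.log((len(tokens) - df + 0.5) / (df + 0.5) + 1.0)
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores[doc_id] = score
    return sorted(scores, key=scores.get, reverse=True)


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Merge ranked ID lists; IDs ranked high in several lists come first."""
    fused: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(fused, key=fused.get, reverse=True)


if __name__ == "__main__":
    docs = {
        "fr-1": "the system shall let users reset their password by email",
        "nfr-1": "pages shall load within two seconds",
        "fr-2": "admins can export audit reports",
    }
    sparse = bm25_rank("password reset", docs)
    dense = ["fr-1", "fr-2", "nfr-1"]  # placeholder for a vector-store ranking
    print(reciprocal_rank_fusion([sparse, dense]))
```

Rank fusion only needs the order of each list, so BM25 scores and embedding similarities never have to be put on a common scale. In a pipeline like the one described above, the fused top candidates would then be handed to the cross-encoder for reranking.
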
## Prerequisites

- **Python 3.8+**
- **System Dependencies**:
  - **Poppler** (for PDF to image conversion)
    - Windows: Download from [poppler-windows releases](https://github.com/oschwartz10612/poppler-windows/releases/)
    - Add `bin` folder to PATH or set `POPPLER_PATH` environment variable
  - **Tesseract OCR** (optional, for OCR fallback)
    - Windows: Download from [UB-Mannheim Tesseract](https://github.com/UB-Mannheim/tesseract/wiki)
- **Anthropic API Key**: Get one from [Anthropic Console](https://console.anthropic.com/)

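A quick way to confirm these prerequisites before the first run is a small check script. The sketch below is illustrative and not part of the project: `find_tool` and `check_prerequisites` are hypothetical helpers, `pdftoppm` is the Poppler executable it looks for, and `ANTHROPIC_API_KEY` is assumed to be the variable name the application reads (it is the Anthropic SDK's default, but the README does not confirm it).

```python
import os
import shutil
import sys
from typing import List, Optional


def find_tool(name: str, extra_dir: Optional[str] = None) -> Optional[str]:
    """Look for an executable on PATH, then in an optional extra directory."""
    found = shutil.which(name)
    if found is None and extra_dir:
        found = shutil.which(name, path=extra_dir)
    return found


def check_prerequisites() -> List[str]:
    """Return human-readable problems; an empty list means all checks passed."""
    problems: List[str] = []
    if sys.version_info < (3, 8):
        problems.append("Python 3.8+ is required")
    # POPPLER_PATH is expected to point at the Poppler `bin` folder.
    if find_tool("pdftoppm", os.environ.get("POPPLER_PATH")) is None:
        problems.append("Poppler (pdftoppm) not found on PATH or in POPPLER_PATH")
    if find_tool("tesseract") is None:
        problems.append("Tesseract not found (optional, OCR fallback only)")
    # Assumed variable name; adjust if the app reads a different one.
    if not os.environ.get("ANTHROPIC_API_KEY"):
        problems.append("ANTHROPIC_API_KEY is not set")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("WARNING:", problem)
```
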
## Installation

### 1. Clone the Repository

```bash
git clone <repository-url>
cd PM-RAG-ChatBot
```