Hasan-Atris3 committed on
Commit a3c5ef4 · unverified · 1 Parent(s): d3f62c9

Create README.md

Files changed (1)
  1. README.md +58 -0
README.md ADDED
@@ -0,0 +1,58 @@
+ ---
+ title: CedroPM Bot
+ emoji: 🤖
+ colorFrom: blue
+ colorTo: indigo
+ sdk: docker
+ pinned: false
+ app_port: 7860
+ ---
+
+ # PM-RAG-ChatBot
+
+ A powerful RAG (Retrieval Augmented Generation) chatbot for querying Software Requirements Documents (SRDs) and project documentation. Built with Chainlit, Claude AI, and advanced document processing capabilities.
+
+ ## 🚀 Features
+
+ - **📄 SRD Document Processing**: Upload and index PDF documents with intelligent section-aware text splitting
+ - **🎨 Diagram Understanding**: Process diagrams using Claude Vision and/or Qwen2-VL vision models
+ - **💬 Interactive Chat Interface**: Web-based chat interface powered by Chainlit
+ - **🔍 Hybrid Search**: Combines dense vector search (semantic) and sparse search (BM25) with cross-encoder reranking
+ - **📚 Multi-Project Support**: Manage multiple projects with isolated knowledge bases
+ - **💾 Persistent Chat History**: SQLite database stores all conversations and messages
+ - **🧠 Learning from Feedback**: Optional learning mode that improves responses based on user corrections
+ - **🔐 User Isolation**: Multi-user support with strict data isolation per user, project, and chat
+ - **⚡ Smart Intent Detection**: Automatically detects enumeration queries vs. regular Q&A
+ - **📊 Table Extraction**: Extracts and indexes tables from PDF documents
+
+ ## 🏗️ Architecture
+
+ The system uses a hybrid RAG architecture:
+
+ 1. **Document Ingestion**: PDFs are processed with section-aware splitting, preserving functional/non-functional requirement context
+ 2. **Vector Storage**: ChromaDB stores embeddings with metadata for filtering and scoping
+ 3. **Hybrid Retrieval**:
+    - Dense retrieval using sentence transformers (all-MiniLM-L6-v2)
+    - Sparse retrieval using BM25
+    - Cross-encoder reranking (ms-marco-MiniLM-L-6-v2)
+ 4. **Answer Generation**: Claude Sonnet 4.5 generates context-aware answers
+ 5. **Vision Processing**: Optional diagram interpretation using Claude Vision or OCR
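
The hybrid retrieval step above can be illustrated with a small, dependency-free sketch. This is not the project's code: the README does not say how the dense and sparse results are combined, so reciprocal rank fusion is assumed here, a term-count cosine similarity stands in for the sentence-transformer embeddings, and the optional `rerank` callable stands in for the cross-encoder. All function names are hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Lowercase and split on non-word characters.
    return re.findall(r"\w+", text.lower())

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of every document for the query (sparse retrieval)."""
    tokenized = [tokenize(d) for d in docs]
    avg_len = sum(len(t) for t in tokenized) / len(tokenized)
    df = Counter(term for t in tokenized for term in set(t))
    n = len(docs)
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

def cosine_scores(query, docs):
    """Stand-in for dense retrieval: cosine similarity of term-count vectors.
    A real system would compare sentence-transformer embeddings instead."""
    q = Counter(tokenize(query))
    q_norm = math.sqrt(sum(v * v for v in q.values()))
    scores = []
    for d in docs:
        v = Counter(tokenize(d))
        dot = sum(q[t] * v[t] for t in q)
        d_norm = math.sqrt(sum(x * x for x in v.values()))
        scores.append(dot / (q_norm * d_norm) if q_norm and d_norm else 0.0)
    return scores

def ranking(scores):
    """Document indices ordered from best to worst score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

def rrf_fuse(rankings, k=60):
    """Reciprocal rank fusion (assumed here): sum 1 / (k + rank) per document."""
    fused = Counter()
    for r in rankings:
        for rank, doc_id in enumerate(r, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    return [doc_id for doc_id, _ in fused.most_common()]

def hybrid_search(query, docs, top_k=3, rerank=None):
    """Fuse dense and sparse rankings, then optionally rerank the candidates."""
    candidates = rrf_fuse([ranking(cosine_scores(query, docs)),
                           ranking(bm25_scores(query, docs))])[:top_k]
    if rerank is not None:
        # rerank(query, doc_text) -> float, e.g. a cross-encoder relevance score.
        candidates = sorted(candidates, key=lambda i: rerank(query, docs[i]), reverse=True)
    return candidates

docs = [
    "The system shall allow users to reset their password by email.",
    "Non-functional requirement: pages load within two seconds.",
    "Users can upload PDF documents for indexing.",
]
top = hybrid_search("reset password", docs, top_k=2)
print(top)
```

Fusing on ranks rather than raw scores sidesteps the fact that BM25 and cosine scores live on different scales. A production setup would swap `cosine_scores` for embedding similarity from the vector store and pass a cross-encoder scoring function as `rerank`.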
+
+ ## 📋 Prerequisites
+
+ - **Python 3.8+**
+ - **System Dependencies**:
+   - **Poppler** (for PDF to image conversion)
+     - Windows: Download from [poppler-windows releases](https://github.com/oschwartz10612/poppler-windows/releases/)
+     - Add `bin` folder to PATH or set `POPPLER_PATH` environment variable
+   - **Tesseract OCR** (optional, for OCR fallback)
+     - Windows: Download from [UB-Mannheim Tesseract](https://github.com/UB-Mannheim/tesseract/wiki)
+ - **Anthropic API Key**: Get one from [Anthropic Console](https://console.anthropic.com/)
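
A quick way to confirm these prerequisites before installing is a small check script. The sketch below is hypothetical and not part of the repository. It assumes Poppler is detected through its `pdftoppm` utility, that `POPPLER_PATH` points at the folder containing the Poppler binaries, and that the API key is supplied through the `ANTHROPIC_API_KEY` environment variable (the name the Anthropic SDK reads by default; the README does not state which variable the app uses).

```python
import os
import shutil
import sys

def check_prerequisites(env=None):
    """Return a dict mapping each prerequisite to True (found) or False."""
    env = os.environ if env is None else env
    # Search PATH, plus the optional POPPLER_PATH directory if it is set.
    search_path = env.get("PATH", "")
    poppler_dir = env.get("POPPLER_PATH")
    if poppler_dir:
        search_path = poppler_dir + os.pathsep + search_path
    return {
        "python>=3.8": sys.version_info >= (3, 8),
        "poppler (pdftoppm)": shutil.which("pdftoppm", path=search_path) is not None,
        "tesseract (optional)": shutil.which("tesseract", path=search_path) is not None,
        # Assumed variable name; adjust if the project reads the key differently.
        "ANTHROPIC_API_KEY": bool(env.get("ANTHROPIC_API_KEY")),
    }

if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{'OK     ' if ok else 'MISSING'}  {name}")
```

Tesseract is listed as optional above, so a `MISSING` line for it only means the OCR fallback is unavailable.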
+
+ ## 🔧 Installation
+
+ ### 1. Clone the Repository
+
+ ```bash
+ git clone <repository-url>
+ cd PM-RAG-ChatBot