Hasan-Atris3 committed on
Commit 6f052a1 · unverified · 1 Parent(s): a3c5ef4

Delete README.md

Files changed (1)
  1. README.md +0 -58
README.md DELETED
@@ -1,58 +0,0 @@
- ---
- title: CedroPM Bot
- emoji: 🤖
- colorFrom: blue
- colorTo: indigo
- sdk: docker
- pinned: false
- app_port: 7860
- ---
-
- # PM-RAG-ChatBot
-
- A powerful RAG (Retrieval Augmented Generation) chatbot for querying Software Requirements Documents (SRDs) and project documentation. Built with Chainlit, Claude AI, and advanced document processing capabilities.
-
- ## 🚀 Features
-
- - **📄 SRD Document Processing**: Upload and index PDF documents with intelligent section-aware text splitting
- - **🎨 Diagram Understanding**: Process diagrams using Claude Vision and/or Qwen2-VL vision models
- - **💬 Interactive Chat Interface**: Web-based chat interface powered by Chainlit
- - **🔍 Hybrid Search**: Combines dense vector search (semantic) and sparse search (BM25) with cross-encoder reranking
- - **📚 Multi-Project Support**: Manage multiple projects with isolated knowledge bases
- - **💾 Persistent Chat History**: SQLite database stores all conversations and messages
- - **🧠 Learning from Feedback**: Optional learning mode that improves responses based on user corrections
- - **🔐 User Isolation**: Multi-user support with strict data isolation per user, project, and chat
- - **⚡ Smart Intent Detection**: Automatically detects enumeration queries vs. regular Q&A
- - **📊 Table Extraction**: Extracts and indexes tables from PDF documents
-
- ## 🏗️ Architecture
-
- The system uses a hybrid RAG architecture:
-
- 1. **Document Ingestion**: PDFs are processed with section-aware splitting, preserving functional/non-functional requirement context
- 2. **Vector Storage**: ChromaDB stores embeddings with metadata for filtering and scoping
- 3. **Hybrid Retrieval**:
- - Dense retrieval using sentence transformers (all-MiniLM-L6-v2)
- - Sparse retrieval using BM25
- - Cross-encoder reranking (ms-marco-MiniLM-L-6-v2)
- 4. **Answer Generation**: Claude Sonnet 4.5 generates context-aware answers
- 5. **Vision Processing**: Optional diagram interpretation using Claude Vision or OCR
-
- ## 📋 Prerequisites
-
- - **Python 3.8+**
- - **System Dependencies**:
- - **Poppler** (for PDF to image conversion)
- - Windows: Download from [poppler-windows releases](https://github.com/oschwartz10612/poppler-windows/releases/)
- - Add `bin` folder to PATH or set `POPPLER_PATH` environment variable
- - **Tesseract OCR** (optional, for OCR fallback)
- - Windows: Download from [UB-Mannheim Tesseract](https://github.com/UB-Mannheim/tesseract/wiki)
- - **Anthropic API Key**: Get one from [Anthropic Console](https://console.anthropic.com/)
-
- ## 🔧 Installation
-
- ### 1. Clone the Repository
-
- ```bash
- git clone <repository-url>
- cd PM-RAG-ChatBot
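
The hybrid retrieval step described in the deleted README's Architecture section (BM25 plus dense search, merged before reranking) can be sketched roughly as follows. This is an illustration only, not the project's code. The README does not say how the sparse and dense result lists are combined, so reciprocal rank fusion is an assumption here. The dense ranking is a hand-written placeholder standing in for all-MiniLM-L6-v2 embeddings, the cross-encoder reranking stage is omitted, and every function name is hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer; a real pipeline would likely do more.
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Return an Okapi BM25 score for each document against the query."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def rank(scores):
    """Document indices sorted by descending score (ties broken by index)."""
    return sorted(range(len(scores)), key=lambda i: (-scores[i], i))


def rrf_fuse(rankings, k=60):
    """Reciprocal rank fusion (assumed merge strategy, not from the README)."""
    fused = Counter()
    for ranking in rankings:
        for position, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + position)
    return [d for d, _ in sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))]


docs = [
    "the system shall allow users to reset their password",
    "non functional requirement response time under two seconds",
    "users can upload pdf documents to a project",
]
sparse_ranking = rank(bm25_scores("reset password", docs))
dense_ranking = [0, 2, 1]  # placeholder for an embedding-based ranking
candidates = rrf_fuse([sparse_ranking, dense_ranking])
print(candidates)
```

In a pipeline like the one the README outlines, the fused candidate list would then be passed to the cross-encoder for final reranking before answer generation.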