Nour committed on
Commit f32ae52 · 1 Parent(s): abd0cce

Update Dockerfile and add SETUP.md

Files changed (2)
  1. Dockerfile +12 -11
  2. SETUP.md +123 -0
Dockerfile CHANGED
@@ -28,19 +28,20 @@ RUN python -m spacy download en_core_web_sm
  # 4. Create a non-root user (Required for Hugging Face)
  RUN useradd -m -u 1000 user
 
- # 5. Copy the rest of the application code & Fix Permissions
- COPY --chown=user . .
-
- # 6. Ensure the user can write to the DB directories
- # We create the folder and give the user ownership
- RUN mkdir -p chroma_global_db && chown -R user:user chroma_global_db
- # We also ensure the /app directory is writable for the sqlite file
- RUN chown -R user:user /app
-
- # 7. Switch to the non-root user
+ # 5. Fix Permissions BEFORE switching user
+ # Create the folders as root, then give them to the user
+ RUN mkdir -p /app/chroma_global_db
+ RUN mkdir -p /data
+ RUN chown -R 1000:1000 /app
+ RUN chown -R 1000:1000 /data
+
+ # 6. Switch to the non-root user
  USER user
 
- # 8. Expose the port Chainlit runs on
+ # 7. Copy the application code
+ COPY --chown=user . .
+
+ # 8. Expose the port
  EXPOSE 7860
 
  # 9. Run the application
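
The reordered steps create `/app/chroma_global_db` and `/data` as root and hand them to UID 1000 before `USER user` takes effect. A startup check along the following lines could confirm the result from inside the container. It is a minimal sketch and not part of this commit: `find_unwritable` is a hypothetical helper name, and the only paths assumed are the two created in the Dockerfile above.

```python
import tempfile


def find_unwritable(paths):
    """Return the entries of `paths` where the current user cannot create a file."""
    failed = []
    for path in paths:
        try:
            # Creating a temporary file (removed again on close) probes the
            # real permissions instead of relying on mode bits alone.
            with tempfile.TemporaryFile(dir=path):
                pass
        except OSError:
            # Covers a missing directory as well as a permission error.
            failed.append(path)
    return failed
```

Calling `find_unwritable(["/app/chroma_global_db", "/data"])` as the non-root user should return an empty list if the `chown` steps worked as intended.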
SETUP.md ADDED
@@ -0,0 +1,123 @@
+ # PM-RAG-ChatBot Setup Guide
+
+ ## Prerequisites
+
+ ### 1. Python 3.8 or higher
+ Make sure Python is installed on your system.
+
+ ### 2. System Dependencies (for PDF processing)
+
+ #### Windows:
+ - **Poppler** (for PDF to image conversion):
+   - Download from: https://github.com/oschwartz10612/poppler-windows/releases/
+   - Extract the archive and add its `bin` folder to your system PATH
+   - Or set the `POPPLER_PATH` environment variable to the `bin` folder
+
+ - **Tesseract OCR** (optional, for OCR):
+   - Download from: https://github.com/UB-Mannheim/tesseract/wiki
+   - Install it and add the install directory to PATH
+
+ #### Alternative (using conda):
+ ```bash
+ conda install -c conda-forge poppler tesseract
+ ```
+
+ ### 3. API Keys
+
+ Create a `.env` file in the project root with:
+
+ ```env
+ ANTHROPIC_API_KEY=your_anthropic_api_key_here
+ POPPLER_PATH=C:\path\to\poppler\bin # Optional, if not in PATH
+ CLAUDE_VISION_MODEL=claude-sonnet-4-5-20250929 # Optional, defaults to this
+ ```
+
+ **Get your Anthropic API key:**
+ - Sign up at https://console.anthropic.com/
+ - Create an API key
+ - Add it to your `.env` file
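
How the application loads `.env` is not shown in this commit, so the following is only a standard-library sketch of reading such a file and checking that the required key is present. The helper names `parse_env_text` and `missing_required` are hypothetical, and treating ` #` as the start of a trailing comment is an assumption about the file format, not a documented rule of the project.

```python
def parse_env_text(text):
    """Parse KEY=VALUE lines, skipping blank lines and full-line comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Assumption: ' #' begins a trailing comment on unquoted values.
        value = value.split(" #", 1)[0].strip()
        values[key.strip()] = value
    return values


def missing_required(values, required=("ANTHROPIC_API_KEY",)):
    """Return the required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]
```

For example, `missing_required(parse_env_text(open(".env").read()))` returns `["ANTHROPIC_API_KEY"]` when the key has not been filled in, which gives a clearer failure than an error deep inside an API call.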
+
+ ## Installation Steps
+
+ ### 1. Navigate to the project directory
+ ```bash
+ cd PM-RAG-ChatBot
+ ```
+
+ ### 2. Create a virtual environment (recommended)
+ ```bash
+ python -m venv venv
+ ```
+
+ ### 3. Activate the virtual environment
+
+ **Windows (PowerShell):**
+ ```powershell
+ .\venv\Scripts\Activate.ps1
+ ```
+
+ **Windows (Command Prompt):**
+ ```cmd
+ venv\Scripts\activate.bat
+ ```
+
+ ### 4. Install Python dependencies
+ ```bash
+ pip install -r requirements.txt
+ ```
+
+ **Note:** Some packages may require additional setup:
+ - `camelot-py[cv]` requires `ghostscript` and `tcl-tk` on some systems
+ - `spacy` may need a language model: `python -m spacy download en_core_web_sm`
+
+ ### 5. Install spaCy language model (if needed)
+ ```bash
+ python -m spacy download en_core_web_sm
+ ```
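
Whether step 5 is "needed" can be checked before running the download. Models fetched with `python -m spacy download` are installed as ordinary Python packages, so a lookup like the one below works without importing spaCy itself. This is a sketch; `model_installed` is a hypothetical helper, not something defined in the repository.

```python
import importlib.util


def model_installed(name="en_core_web_sm"):
    """Return True if a top-level package called `name` is importable."""
    # find_spec only locates the package; it does not load the model.
    return importlib.util.find_spec(name) is not None
```

If `model_installed()` returns `False` inside the activated virtual environment, run the download command from step 5.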
+
+ ## Running the Application
+
+ ### Option 1: Web Interface (Chainlit) - Recommended
+ ```bash
+ chainlit run app.py
+ ```
+
+ This starts a web server, by default at http://localhost:8000.
+
+ ### Option 2: Command Line Interface
+ ```bash
+ python main_final.py
+ ```
+
+ ## Project Structure
+
+ - `app.py` - Main Chainlit web application
+ - `main_final.py` - Command-line interface
+ - `srd_engine_v2.py` - Main RAG engine with knowledge base
+ - `srd_engine_final.py` - Core SRD chatbot engine
+ - `db.py` - SQLite database models
+ - `requirements.txt` - Python dependencies
+
+ ## Features
+
+ - Upload SRD PDF documents
+ - Process diagrams with vision models (Qwen2-VL and/or Claude Vision)
+ - Chat interface for querying documents
+ - Persistent chat history
+ - Learning from user feedback
+
+ ## Troubleshooting
+
+ 1. **"ghostscript not found" error:**
+    - Install Ghostscript: https://www.ghostscript.com/download/gsdnld.html
+
+ 2. **"Poppler not found" error:**
+    - Make sure Poppler is installed and in PATH, or set `POPPLER_PATH` in `.env`
+
+ 3. **"ANTHROPIC_API_KEY not set" error:**
+    - Create a `.env` file with your API key
+
+ 4. **Import errors:**
+    - Make sure the virtual environment is activated
+    - Reinstall requirements: `pip install -r requirements.txt --force-reinstall`
+
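
The first two troubleshooting items come down to whether an external executable can be found. A small preflight script can report that up front. The sketch below is an illustration under stated assumptions: `find_tool` and `report` are hypothetical names, the executable names are the usual ones (`pdftoppm` for Poppler, `tesseract`, and `gs`, `gswin64c` or `gswin32c` for Ghostscript), and `POPPLER_PATH` is read from the process environment, so it must already be loaded from `.env` by whatever mechanism the application uses.

```python
import os
import shutil


def find_tool(names, extra_dir=None):
    """Return the path of the first executable in `names` that is found, else None."""
    for name in names:
        # Check the optional directory first (for example POPPLER_PATH).
        if extra_dir:
            found = shutil.which(name, path=extra_dir)
            if found:
                return found
        # Fall back to the regular PATH lookup.
        found = shutil.which(name)
        if found:
            return found
    return None


def report():
    """Map each external dependency to its resolved path, or None if missing."""
    return {
        "Poppler": find_tool(["pdftoppm"], os.environ.get("POPPLER_PATH")),
        "Tesseract": find_tool(["tesseract"]),
        "Ghostscript": find_tool(["gs", "gswin64c", "gswin32c"]),
    }
```

Printing `report()` before starting the app shows which of the three tools resolve; a `None` entry points to the matching troubleshooting item above.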