---
title: ML Starter MCP Server
emoji: 🧠
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: "6.0.0"
app_file: app.py
license: apache-2.0
pinned: true
short_description: MCP server that exposes problem-specific ML code examples
tags:
  - building-mcp-track-enterprise
  - gradio
  - mcp
  - retrieval
  - embeddings
  - python
  - knowledge-base
  - semantic-search
  - sentence-transformers
---

# ML Starter MCP Server

ML Starter Banner

Gradio-powered **remote-only** MCP server that exposes a curated ML knowledge base through deterministic, read-only tooling. Ideal for editors like Claude Desktop, VS Code (Kilo Code), or Cursor that want a trustworthy retrieval endpoint with **no side-effects**.

![Python](https://img.shields.io/badge/python-3.10%2B-blue) ![License](https://img.shields.io/badge/license-Apache%202.0-green) ![Status](https://img.shields.io/badge/Status-Active-success) ![MCP](https://img.shields.io/badge/MCP-enabled-brightgreen) ![Retrieval](https://img.shields.io/badge/Retrieval-pure-lightgrey) ![SentenceTransformers](https://img.shields.io/badge/Embeddings-all--MiniLM--L6--v2-6f42c1)

---

## 🧩 Overview

The **ML Starter MCP Server** indexes the entire `knowledge_base/` tree (audio, vision, NLP, RL, etc.) and makes it searchable through:

* `list_items` – enumerate every tutorial/script with metadata.
* `semantic_search` – vector search over docstrings and lead context to find the single best code example for a natural-language brief.
* `get_code` – return the full Python source for a safe, validated path.

The server is deterministic (seeded numpy/torch), write-protected, and designed to run as a **Gradio MCP SSE endpoint** suitable for Hugging Face Spaces or on-prem deployments.

---

## 📚 ML Starter Knowledge Base

* Root: `knowledge_base/`
* Domains:
  * `audio/`
  * `generative/`
  * `graph/`
  * `nlp/`
  * `rl/`
  * `structured_data/`
  * `timeseries/`
  * `vision/`
* Each file stores a complete, runnable ML example with docstring summaries leveraged during indexing.

### Features exposed via MCP

* ✅ Vector search via `sentence-transformers/all-MiniLM-L6-v2` with cosine similarity.
* ⚙️ Safe path resolution ensures only in-repo `.py` files can be fetched.
* 🧮 Metadata-first outputs (category, filename, semantic score) for quick triage.
* 🛡️ Read-only contract; zero KB mutations, uploads, or side effects.
* 🌐 Spaces-ready networking with auto `0.0.0.0` binding when environment variables are provided by the platform.

---

## 🎬 Demo

[![Watch the video](https://img.youtube.com/vi/THTQLhsiFl8/0.jpg)](https://www.youtube.com/watch?v=THTQLhsiFl8)

---

## 🚀 Quick Start

### Installation

```bash
pip install -r requirements.txt
```

### MCP Settings

```json
{
  "mcpServers": {
    "ML-Starter": {
      "url": "https://mcp-1st-birthday-ml-starter.hf.space/gradio_api/mcp/"
    }
  }
}
```

### Environment Variables

```bash
export TOKENIZERS_PARALLELISM=false
export PYTORCH_ENABLE_MPS_FALLBACK=1  # optional, improves macOS stability
```

---

## 🧠 MCP Usage

Any MCP-capable client can connect to the SSE endpoint to:

* Browse the full inventory of ML tutorials.
* Submit a markdown problem statement and receive the best-matching file path plus relevance score.
* Fetch the code immediately and render it inline (clients typically syntax-highlight the response).

The Gradio UI mirrors these capabilities via three tabs (List Items, Semantic Search, Get Code) for manual exploration.
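The retrieval step behind `semantic_search` (embed the brief, compare against L2-normalized document vectors, return the deterministic top-1) can be sketched with plain numpy. This is a minimal illustration, not the server's actual code: the embeddings are assumed precomputed, and `build_index` / `best_match` are illustrative names.

```python
import numpy as np

def build_index(embeddings: np.ndarray) -> np.ndarray:
    """L2-normalize document embeddings so cosine similarity
    reduces to a plain dot product."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    return embeddings / norms

def best_match(index: np.ndarray, query: np.ndarray) -> tuple[int, float]:
    """Return the row index and cosine score of the single best document."""
    q = query / np.linalg.norm(query)
    scores = index @ q          # cosine similarity per document
    i = int(np.argmax(scores))  # deterministic top-1
    return i, float(scores[i])

# Toy corpus of three "document" vectors plus one query vector.
docs = build_index(np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]))
idx, score = best_match(docs, np.array([1.0, 1.0]))
```

Because `argmax` over a fixed corpus is deterministic, the same brief always maps to the same file and score, which is what makes the server's responses cache-friendly.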
---

## 🔤 Supported Embeddings

* `sentence-transformers/all-MiniLM-L6-v2`

### Configuration Example

```yaml
embedding_model: sentence-transformers/all-MiniLM-L6-v2
batch_size: 32
similarity: cosine
```

---

## 🔍 Retrieval Strategy

| Component           | Description                                    |
|---------------------|------------------------------------------------|
| Index Type          | In-memory cosine index backed by numpy vectors |
| Chunking            | File-level (docstring + prefix)                |
| Similarity Function | Dot product on L2-normalized vectors           |
| Results Returned    | Top-1 match (deterministic)                    |

### Configuration Example

```yaml
retriever: cosine
max_results: 1
```

---

## 🧩 Folder Structure

```
ml-starter/
├── app.py                     # Optional Gradio hook
├── mcp_server/
│   ├── server.py              # Remote MCP entrypoint & UI builder
│   ├── loader.py              # KB scanning + safe path resolution
│   ├── embeddings.py          # MiniLM wrapper + cosine index
│   └── tools/
│       ├── list_items.py      # list_items()
│       ├── semantic_search.py # semantic_search()
│       └── get_code.py        # get_code()
├── knowledge_base/            # ML examples grouped by domain
├── requirements.txt
└── README.md
```

---

## 🔧 MCP Tools (`mcp_server/server.py`)

| MCP Tool          | Python Function                          | Description                                                                             |
|-------------------|------------------------------------------|-----------------------------------------------------------------------------------------|
| `list_items`      | `list_items()`                           | Enumerates every KB entry with category, filename, absolute path, and summary metadata. |
| `semantic_search` | `semantic_search(problem_markdown: str)` | Embeds the prompt and returns the single best match plus cosine score.                  |
| `get_code`        | `get_code(path: str)`                    | Streams back the full Python source for a validated KB path.                            |

`server.py` registers these functions with Gradio's MCP adapter, wires docstrings into tool descriptions, and ensures the SSE endpoint stays read-only.

---

## 📥 Inputs

### 1. `list_items`

No input parameters; returns the entire catalog.

### 2. `semantic_search`
**Input Model**

| Field            | Type | Description                                          | Example                                              |
|------------------|------|------------------------------------------------------|------------------------------------------------------|
| problem_markdown | str  | Natural-language description of the ML task or need. | "I need a transformer example for multilingual NER." |
### 3. `get_code`
**Input Model**

| Field | Type | Description                                   | Example                                                  |
|-------|------|-----------------------------------------------|----------------------------------------------------------|
| path  | str  | KB-relative or absolute path to a `.py` file. | "knowledge_base/nlp/text_classification_from_scratch.py" |
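The safe path resolution that backs this parameter (only in-repo `.py` files are served, per the feature list above) can be sketched as follows. This is a minimal illustration, assuming the repository root is the parent of `knowledge_base/`; `resolve_kb_path` is a hypothetical name, not the server's actual function.

```python
from pathlib import Path

KB_ROOT = Path("knowledge_base").resolve()

def resolve_kb_path(raw: str) -> Path:
    """Accept a KB-relative or absolute path, but only allow
    .py files that live inside the knowledge base."""
    candidate = Path(raw)
    if not candidate.is_absolute():
        candidate = KB_ROOT.parent / candidate
    resolved = candidate.resolve()
    # Reject anything that escapes the KB root (e.g. via "..").
    if KB_ROOT not in resolved.parents:
        raise ValueError(f"path outside knowledge base: {raw}")
    if resolved.suffix != ".py":
        raise ValueError(f"only .py files are served: {raw}")
    return resolved
```

Resolving before checking the parent chain is the key design point: traversal tricks like `knowledge_base/../secrets.py` normalize to a path outside `KB_ROOT` and are rejected before any file is read.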
---

## 📤 Outputs

### 1. `list_items`
**Response Example**

```json
[
  {
    "id": "nlp/text_classification_with_transformer.py",
    "category": "nlp",
    "filename": "text_classification_with_transformer.py",
    "path": "knowledge_base/nlp/text_classification_with_transformer.py",
    "summary": "Fine-tune a Transformer for sentiment classification."
  }
]
```
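Entries like the one above come from scanning the KB tree at startup. A minimal sketch of that indexing step, assuming summaries are taken from each file's module docstring (the `scan_kb` helper name and the one-level `category = parent directory` convention are illustrative, not the server's actual implementation):

```python
import ast
from pathlib import Path

def scan_kb(root: str) -> list[dict]:
    """Walk the knowledge base and collect per-file metadata,
    using the first line of each module docstring as the summary."""
    items = []
    for path in sorted(Path(root).rglob("*.py")):  # sorted => deterministic order
        tree = ast.parse(path.read_text(encoding="utf-8"))
        doc = ast.get_docstring(tree) or ""
        items.append({
            "id": path.relative_to(root).as_posix(),
            "category": path.parent.name,
            "filename": path.name,
            "path": str(path),
            "summary": doc.splitlines()[0] if doc else "",
        })
    return items
```

Parsing with `ast` rather than importing the files keeps the scan side-effect free, in line with the server's read-only contract.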
### 2. `semantic_search`
**Response Example**

```json
{
  "best_match": "knowledge_base/nlp/text_classification_with_transformer.py",
  "score": 0.89
}
```
### 3. `get_code`
**Response Example**

```json
{
  "path": "knowledge_base/vision/grad_cam.py",
  "source": ""
}
```
Each response is deterministic for the same corpus and embeddings, allowing MCP clients to trust caching and diffing workflows.

---

## 👥 Team

**Team Name:** Hepheon

**Team Members:**

- **Tutkum Akyildiz** - [@Tutkum](https://huggingface.co/Tutkum) - Product
- **Emre Atilgan** - [@emreatilgan](https://huggingface.co/emreatilgan) - Tech

---

## 📣 Social Media Post

- https://www.reddit.com/r/mcp/comments/1p8cqcv/built_an_mcp_server_that_semantically_searches/

---

## 🛠️ Next Steps

Today the knowledge base focuses on curated **Keras** walkthroughs. Upcoming updates will expand coverage to include:

* TensorFlow
* PyTorch
* scikit-learn
* ...

These additions will land in the same deterministic retrieval flow, making mixed-framework discovery as seamless as the current experience.

---

## 📘 License

This project is licensed under the Apache License 2.0. See the [LICENSE](LICENSE) file for full terms.

---

Built with ❤️ for the ML Starter knowledge base • Apache 2.0