Using Xet Storage
Python
To access a Xet-aware version of the huggingface_hub, simply install the latest version:
pip install -U huggingface_hub
As of huggingface_hub 0.32.0, this will also install hf_xet. The hf_xet package integrates huggingface_hub with xet-core, the Rust client for the Xet backend.
If you use the transformers or datasets libraries, you are already using huggingface_hub. As long as the installed version of huggingface_hub is >= 0.32.0, no further action is needed.
Where versions of huggingface_hub >= 0.30.0 and < 0.32.0 are installed, hf_xet must be installed explicitly:
pip install -U hf-xet
And that’s it! You now get the benefits of Xet deduplication for both uploads and downloads. Team members using a version of huggingface_hub < 0.30.0 will still be able to upload and download repositories through the backwards compatibility provided by the LFS bridge.
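The version rules above can be summarized in a small helper. This is only a sketch: `parse_version` and `xet_install_advice` are hypothetical names, and pre-release suffixes such as `rc1` are simply ignored.

```python
import re


def parse_version(version: str) -> tuple:
    """Turn a string such as '0.32.1' into a comparable (major, minor, patch) tuple."""
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)  # keep only the leading digits of each piece
        parts.append(int(match.group()) if match else 0)
    parts += [0] * (3 - len(parts))  # pad short versions such as '0.32'
    return tuple(parts)


def xet_install_advice(hub_version: str) -> str:
    """Map a huggingface_hub version to the action described in this section."""
    version = parse_version(hub_version)
    if version >= (0, 32, 0):
        return "hf_xet is installed automatically"
    if version >= (0, 30, 0):
        return "run: pip install -U hf-xet"
    return "no Xet support; transfers use the LFS bridge"


print(xet_install_advice("0.31.4"))
```

To check a real environment, the installed version string can be read with `importlib.metadata.version("huggingface_hub")` and passed to the helper.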
For more detailed usage, refer to the huggingface_hub documentation.
Git
Git users can access the benefits of Xet by downloading and installing the Git Xet extension. Once installed, simply use the standard workflows for managing Hub repositories with Git - no additional changes necessary.
Prerequisites
Install on macOS or Linux (amd64 or aarch64)
Install using an installation script with the following command in your terminal (requires curl and unzip):
curl --proto '=https' --tlsv1.2 -sSf https://raw.githubusercontent.com/huggingface/xet-core/refs/heads/main/git_xet/install.sh | sh
Or, install using Homebrew:
brew install git-xet
git xet install
To verify the installation, run:
git xet --version
Windows (amd64)
Using winget:
winget install git-xet
Using an installer:
- Download git-xet-windows-installer-x86_64.zip (available here) and unzip.
- Run the msi installer file and follow the prompts.
Manual installation:
- Download git-xet-windows-x86_64.zip (available here) and unzip.
- Place the extracted git-xet.exe under a PATH directory.
- Run git xet install in a terminal.
To verify the installation, run:
git xet --version
Using Git Xet
Once installed on your platform, using Git Xet is as simple as following the Hub’s standard Git workflows.
Make sure all prerequisites are installed and configured, follow the setup instructions for working with repositories on the Hub, then commit your changes, and push to the Hub:
# Create any files you like! Then...
git add .
git commit -m "Uploading new models" # You can choose any descriptive message
git push
Under the hood, the Xet protocol is invoked to upload large files directly to Xet storage, increasing upload speeds through chunk-level deduplication.
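To make the idea of chunk-level deduplication concrete, here is a toy illustration. It is not Xet's actual algorithm: the fixed 64 KiB chunk size, the SHA-256 hash, and the helper names are all assumptions chosen for simplicity.

```python
import hashlib

CHUNK_SIZE = 64 * 1024  # hypothetical chunk size, chosen only for this illustration


def chunk_hashes(data: bytes, chunk_size: int = CHUNK_SIZE) -> list:
    """Split data into fixed-size chunks and return one SHA-256 digest per chunk."""
    return [
        hashlib.sha256(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]


def chunks_to_upload(new_data: bytes, known_hashes: set) -> int:
    """Count the distinct chunks of new_data that storage has not seen yet."""
    return sum(1 for h in set(chunk_hashes(new_data)) if h not in known_hashes)


# Version 1: 1 MiB of non-repeating content (16 chunks of 64 KiB).
v1 = b"".join(i.to_bytes(4, "big") for i in range(262_144))
stored = set(chunk_hashes(v1))

# Version 2: the same file with a single byte flipped in the middle.
v2 = bytearray(v1)
v2[500_000] ^= 0xFF

print(len(stored), "chunks stored;", chunks_to_upload(bytes(v2), stored), "new chunk to upload")
```

In this sketch a one-byte edit touches a single chunk, so only that chunk would need to be transferred again rather than the whole file.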
Uninstall on macOS or Linux
Using Homebrew:
git xet uninstall
brew uninstall git-xet
If you used the installation script (for macOS or Linux), run the following in your terminal:
git xet uninstall
sudo rm $(which git-xet)
Uninstall on Windows
If you used winget:
winget uninstall git-xet
If you used the installer:
- Navigate to Settings -> Apps -> Installed apps
- Find “Git-Xet”.
- Select the “Uninstall” option available in the context menu.
If you manually installed:
- Run git xet uninstall in a terminal.
- Delete the git-xet.exe file from the location where it was originally placed.
Recommendations
Xet integrates seamlessly with all of the Hub’s workflows. However, there are a few steps you may consider to get the most benefits from Xet storage.
When uploading or downloading with Python:
- Make sure hf_xet is installed: While Xet remains backward compatible with legacy clients optimized for Git LFS, the hf_xet integration with huggingface_hub delivers optimal chunk-based performance and faster iteration on large files.
- Adaptive concurrency is on by default: hf_xet automatically adjusts the number of parallel transfer streams based on real-time network conditions, with no configuration required. The default settings will saturate most network paths without any tuning.
- Advanced tuning: For fine-grained control, HF_XET_FIXED_DOWNLOAD_CONCURRENCY and HF_XET_FIXED_UPLOAD_CONCURRENCY let you pin concurrency to a fixed value, bypassing the adaptive controller. See hf_xet's environment variables for the full list of options.
When uploading or downloading in Git or Python:
- Leverage frequent, incremental commits: Xet’s chunk-level deduplication means you can safely make incremental updates to models or datasets. Only changed chunks are uploaded, so frequent commits are both fast and storage-efficient.
- Be specific in .gitattributes: When defining patterns for Xet or LFS, use precise file extensions (e.g., *.safetensors, *.bin) to avoid unnecessarily routing smaller files through large-file storage.
- Prioritize community access: Xet substantially increases the efficiency and scale of large file transfers. Instead of structuring your repository to reduce its total size (or the size of individual files), organize it for collaborators and community users so they may easily navigate and retrieve the content they need.
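The effect of precise versus broad patterns can be previewed with a short script. This is a rough sketch: `matched_files` is a hypothetical helper, the file list is made up, and `fnmatchcase` only approximates .gitattributes matching for simple extension patterns like the ones above.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def matched_files(paths, patterns):
    """Return the paths whose file name matches at least one glob pattern."""
    return [
        path
        for path in paths
        if any(fnmatchcase(PurePosixPath(path).name, pattern) for pattern in patterns)
    ]


# Hypothetical repository contents.
repo_files = [
    "model.safetensors",
    "pytorch_model.bin",
    "config.json",
    "README.md",
    "tokenizer/vocab.txt",
]

precise = matched_files(repo_files, ["*.safetensors", "*.bin"])
broad = matched_files(repo_files, ["*"])

print("precise patterns route:", precise)
print("broad pattern routes:", broad)
```

With the precise patterns only the two weight files are selected, while the catch-all pattern would also send small text files through large-file storage.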
Environment Variables
Both hf_xet and Git Xet are powered by xet-core, which can be configured via environment variables. The tables below list the individual variables for fine-grained control. Most users will not need to change any of these — the defaults are tuned to saturate most network paths automatically.
HF_XET_HIGH_PERFORMANCE=1 is a convenience flag that adjusts several settings at once (concurrency bounds, buffer sizes, and parallel file limits). It is intended for machines with high bandwidth and at least 64 GB of RAM for buffering. On machines with less memory, it may degrade performance.
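One way to apply this guidance is to enable the flag only when the machine meets the memory threshold. The sketch below assumes the caller already knows the machine's total RAM; `transfer_env` is a hypothetical helper, and it does not check network bandwidth.

```python
MIN_RAM_GB_FOR_HP = 64  # memory threshold stated above


def transfer_env(total_ram_gb: float, base_env: dict) -> dict:
    """Return a copy of base_env, enabling high-performance mode only with enough RAM."""
    env = dict(base_env)  # copy so the caller's mapping is not mutated
    if total_ram_gb >= MIN_RAM_GB_FOR_HP:
        env["HF_XET_HIGH_PERFORMANCE"] = "1"
    return env


print(transfer_env(128, {}))
print(transfer_env(16, {}))
```

The resulting mapping can be merged into `os.environ` before starting a transfer, or passed as the `env` argument of `subprocess.run` when launching a separate process.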
Adaptive Concurrency
By default, xet-core uses adaptive concurrency — dynamically adjusting parallelism based on real-time network conditions. These are advanced settings that are unlikely to be needed in most cases. The variables below control the adaptive controller’s behavior:
| Environment Variable | Default | Description |
|---|---|---|
| HF_XET_CLIENT_ENABLE_ADAPTIVE_CONCURRENCY | true | Enable or disable adaptive concurrency control. When disabled, concurrency stays at the initial value. |
| HF_XET_CLIENT_AC_INITIAL_UPLOAD_CONCURRENCY | 1 | Starting number of concurrent upload streams. HP mode: 16. |
| HF_XET_CLIENT_AC_INITIAL_DOWNLOAD_CONCURRENCY | 1 | Starting number of concurrent download streams. HP mode: 16. |
| HF_XET_CLIENT_AC_MIN_UPLOAD_CONCURRENCY | 1 | Lower bound for upload concurrency. HP mode: 4. |
| HF_XET_CLIENT_AC_MIN_DOWNLOAD_CONCURRENCY | 1 | Lower bound for download concurrency. HP mode: 4. |
| HF_XET_CLIENT_AC_MAX_UPLOAD_CONCURRENCY | 64 | Upper bound for upload concurrency. HP mode: 124. |
| HF_XET_CLIENT_AC_MAX_DOWNLOAD_CONCURRENCY | 64 | Upper bound for download concurrency. HP mode: 124. |
| HF_XET_CLIENT_AC_TARGET_RTT | 60s | Target round-trip time. Concurrency increases as long as the predicted round-trip time for a full transfer is below this value. |
| HF_XET_CLIENT_AC_HEALTHY_SUCCESS_RATIO_THRESHOLD | 0.8 | Success ratio above which the controller increases concurrency. |
| HF_XET_CLIENT_AC_UNHEALTHY_SUCCESS_RATIO_THRESHOLD | 0.5 | Success ratio below which the controller decreases concurrency. |
| HF_XET_CLIENT_AC_LOGGING_INTERVAL_MS | 10000 | Interval (in ms) at which concurrency status is logged. |
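To give a feel for how the success-ratio thresholds and bounds in this table interact, here is a simplified model. It is not xet-core's controller: the step size of one stream per update is an assumption, the round-trip-time target is not modeled, and the class name is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AdaptiveConcurrency:
    """Toy controller using the documented defaults for bounds and thresholds."""

    current: int = 1                  # initial concurrency
    minimum: int = 1                  # lower bound
    maximum: int = 64                 # upper bound
    healthy_threshold: float = 0.8    # above this ratio, increase
    unhealthy_threshold: float = 0.5  # below this ratio, decrease

    def update(self, successes: int, attempts: int) -> int:
        """Adjust concurrency by one step (an assumption) based on the success ratio."""
        if attempts == 0:
            return self.current
        ratio = successes / attempts
        if ratio > self.healthy_threshold:
            self.current = min(self.maximum, self.current + 1)
        elif ratio < self.unhealthy_threshold:
            self.current = max(self.minimum, self.current - 1)
        return self.current


controller = AdaptiveConcurrency()
for successes in (10, 10, 7, 4):
    print(controller.update(successes, attempts=10))
```

Ratios between the two thresholds leave concurrency unchanged in this model, and the bounds clamp the value no matter how many healthy or unhealthy intervals occur in a row.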
To pin concurrency to a fixed value (bypassing the adaptive controller), use the convenience aliases HF_XET_FIXED_UPLOAD_CONCURRENCY and HF_XET_FIXED_DOWNLOAD_CONCURRENCY. These set the initial, minimum, and maximum concurrency to the same value.
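The equivalence described above can be spelled out as follows. `expand_fixed_concurrency` is a hypothetical helper written only to show which variables an alias stands for; in practice, setting the alias itself is enough.

```python
def expand_fixed_concurrency(direction: str, value: int) -> dict:
    """Return the three adaptive-concurrency settings that a fixed alias pins."""
    if direction not in ("UPLOAD", "DOWNLOAD"):
        raise ValueError("direction must be 'UPLOAD' or 'DOWNLOAD'")
    if value < 1:
        raise ValueError("concurrency must be at least 1")
    return {
        f"HF_XET_CLIENT_AC_{bound}_{direction}_CONCURRENCY": str(value)
        for bound in ("INITIAL", "MIN", "MAX")
    }


# Setting HF_XET_FIXED_UPLOAD_CONCURRENCY=8 corresponds to these three settings.
for name, value in expand_fixed_concurrency("UPLOAD", 8).items():
    print(f"{name}={value}")
```

Because the initial, minimum, and maximum values coincide, the controller has no room to move and concurrency stays at the pinned value.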
Network and Retry
| Environment Variable | Default | Description |
|---|---|---|
| HF_XET_CLIENT_RETRY_MAX_ATTEMPTS | 5 | Maximum number of retry attempts for failed requests. |
| HF_XET_CLIENT_RETRY_BASE_DELAY | 3000ms | Base delay between retries (with exponential backoff). |
| HF_XET_CLIENT_RETRY_MAX_DURATION | 360s | Maximum total time to spend retrying a request. |
| HF_XET_CLIENT_CONNECT_TIMEOUT | 60s | TCP connection timeout. |
| HF_XET_CLIENT_READ_TIMEOUT | 120s | Read timeout for HTTP responses. |
| HF_XET_CLIENT_IDLE_CONNECTION_TIMEOUT | 60s | Timeout before idle connections are closed. |
| HF_XET_CLIENT_MAX_IDLE_CONNECTIONS | 16 | Maximum number of idle connections in the pool. |
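As a rough illustration of how the three retry settings combine, the sketch below computes a backoff schedule from the default values. The doubling factor, the absence of jitter, and the way the duration cap is applied are assumptions; the table only states that the backoff is exponential.

```python
def retry_delays(max_attempts: int = 5,
                 base_delay_s: float = 3.0,
                 max_duration_s: float = 360.0) -> list:
    """Return the wait before each retry, doubling each time (an assumption)."""
    delays = []
    elapsed = 0.0
    for attempt in range(max_attempts):
        delay = base_delay_s * (2 ** attempt)
        if elapsed + delay > max_duration_s:
            break  # the total retry budget would be exceeded
        delays.append(delay)
        elapsed += delay
    return delays


print(retry_delays())
```

Under these assumptions the defaults allow all five retries, and the attempt limit is reached long before the 360-second budget is spent.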
Data Transfer
| Environment Variable | Default | Description |
|---|---|---|
| HF_XET_DATA_MAX_CONCURRENT_FILE_INGESTION | 8 | Maximum number of files processed concurrently during upload. HP mode: 100. |
| HF_XET_DATA_MAX_CONCURRENT_FILE_DOWNLOADS | 8 | Maximum number of files downloaded concurrently. |
| HF_XET_DATA_INGESTION_BLOCK_SIZE | 8mb | Size of blocks read during file ingestion. |
| HF_XET_DATA_PROGRESS_UPDATE_INTERVAL | 200ms | How often progress bars are updated. |
Download Buffers
These control memory usage during downloads. HF_XET_HIGH_PERFORMANCE=1 raises these significantly.
| Environment Variable | Default | HP Mode | Description |
|---|---|---|---|
| HF_XET_RECONSTRUCTION_MIN_RECONSTRUCTION_FETCH_SIZE | 256mb | 1gb | Minimum fetch size for reconstruction requests. |
| HF_XET_RECONSTRUCTION_MAX_RECONSTRUCTION_FETCH_SIZE | 8gb | 16gb | Maximum fetch size for reconstruction requests. |
| HF_XET_RECONSTRUCTION_DOWNLOAD_BUFFER_SIZE | 2gb | 16gb | Total download buffer size. |
| HF_XET_RECONSTRUCTION_DOWNLOAD_BUFFER_PERFILE_SIZE | 512mb | 2gb | Per-file download buffer size. |
| HF_XET_RECONSTRUCTION_DOWNLOAD_BUFFER_LIMIT | 8gb | 64gb | Hard limit on total download buffer memory. |
Logging
| Environment Variable | Default | Description |
|---|---|---|
| HF_XET_LOG_DEST | (none) | Log destination (e.g. a file path). When unset, logs go to stderr. |
| HF_XET_LOG_FORMAT | (none) | Log format. |
| HF_XET_LOG_PREFIX | xet | Prefix for log messages. |
Current Limitations
While Xet brings fine-grained deduplication and enhanced performance to Git-based storage, some features and platform compatibilities are still in development. As a result, keep the following constraints in mind when working with a Xet-enabled repository:
- 64-bit systems only: Both hf_xet and Git Xet currently require a 64-bit architecture; 32-bit systems are not supported.
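For the Python side, a quick way to check this constraint is to inspect the pointer size of the running interpreter. The helper name below is hypothetical, and the check covers the Python build rather than the operating system as a whole.

```python
import struct
import sys


def is_64bit_python() -> bool:
    """Return True when the running interpreter uses 64-bit pointers."""
    return struct.calcsize("P") * 8 == 64


if is_64bit_python():
    print("64-bit Python detected")
else:
    print("32-bit Python detected; hf_xet is not supported on this build", file=sys.stderr)
```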