fedora-copr/granite-4.0-h-tiny-quantized.w8a8

This is a W8A8 INT8 quantized version of ibm-granite/granite-4.0-h-tiny.

Model Details

  • Quantized by: Jiri Podivin jpodivin@redhat.com

  • Architecture: Granite-4.0 Hybrid MoE (Mamba + Transformer)

  • Quantization: INT8 Weight & Activation (W8A8)

  • Engine Support: vLLM (0.6.0+)

Performance & Accuracy

Results of a short benchmark run executed with lm_eval are stored in eval_results.json.

Implementation

The quantization was performed using the llm-compressor library.
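To make the scheme concrete, the sketch below illustrates the core idea behind W8A8: both weights (W8) and activations (A8) are mapped to 8-bit integers with a symmetric per-tensor scale, the matmul runs in integer arithmetic, and the result is rescaled to float. This is a toy illustration of the principle only, not the llm-compressor pipeline used for this model; all names and values here are made up for the example.

```python
# Toy illustration of symmetric INT8 (W8A8) quantization. Not the
# llm-compressor implementation; purely conceptual.

def quantize_int8(values):
    """Map floats to int8 [-128, 127] with a symmetric per-tensor scale."""
    scale = max(abs(v) for v in values) / 127.0  # largest magnitude -> 127
    q = [max(-128, min(127, round(v / scale))) for v in values]
    return q, scale

w = [0.5, -1.25, 2.0, -0.75]   # toy "weights"
a = [1.0, 0.25, -0.5, 1.5]     # toy "activations"

qw, sw = quantize_int8(w)  # W8: int8 weights
qa, sa = quantize_int8(a)  # A8: int8 activations

# Integer dot product (accumulated in a wider integer type in real kernels),
# then rescaled back to float by the product of the two scales.
y_int8 = sum(x * y for x, y in zip(qw, qa)) * (sw * sa)
y_fp = sum(x * y for x, y in zip(w, a))
print(y_int8, y_fp)  # int8 result closely tracks the float result
```

In the real model the scales are calibrated per channel/tensor over representative data rather than per toy vector, but the quantize–integer-matmul–rescale structure is the same.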

vLLM Serving

vllm serve fedora-copr/granite-4.0-h-tiny-quantized.w8a8 --quantization compressed-tensors
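Once the server is up, it can be queried through vLLM's OpenAI-compatible HTTP API (by default at http://localhost:8000/v1). The sketch below builds a chat-completions request payload for this model; the prompt and `max_tokens` value are arbitrary examples.

```python
# Build a request payload for vLLM's OpenAI-compatible
# /v1/chat/completions endpoint. Prompt and max_tokens are illustrative.
import json

def build_chat_request(prompt, model="fedora-copr/granite-4.0-h-tiny-quantized.w8a8"):
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128,
    }

payload = build_chat_request("What is W8A8 quantization?")
print(json.dumps(payload, indent=2))

# Send it to a running server with, e.g.:
#   curl http://localhost:8000/v1/chat/completions \
#     -H "Content-Type: application/json" \
#     -d "$(python this_script.py)"
```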
Format: Safetensors

Model size: 7B params

Tensor types: BF16 · I8
