# fedora-copr/granite-4.0-h-tiny-quantized.w8a8

This is a W8A8 (INT8 weights and activations) quantized version of [ibm-granite/granite-4.0-h-tiny](https://huggingface.co/ibm-granite/granite-4.0-h-tiny).
## Model Details

- **Quantized by:** Jiri Podivin (jpodivin@redhat.com)
- **Architecture:** Granite 4.0 hybrid MoE (Mamba + Transformer)
- **Quantization:** INT8 weights and activations (W8A8)
- **Engine support:** vLLM (0.6.0+)
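W8A8 stores both weights and activations as 8-bit integers plus a floating-point scale (per tensor or per channel). The following is a minimal pure-Python sketch of symmetric per-tensor INT8 quantization for illustration only; the actual model was produced with llm-compressor's calibrated kernels, not this code:

```python
def quantize_int8(values):
    """Symmetric per-tensor INT8 quantization: map floats onto [-127, 127]."""
    scale = max(abs(v) for v in values) / 127.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate float values from the INT8 codes."""
    return [q * scale for q in quantized]

weights = [0.02, -1.27, 0.635, 0.9]
q, s = quantize_int8(weights)
approx = dequantize(q, s)
# Rounding error per element is bounded by scale / 2.
```

At inference time the matmul runs on the INT8 codes and the scales are applied afterwards, which is where the memory and throughput savings come from.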
## Performance & Accuracy

Results of a short benchmark run executed with `lm_eval` are stored in `eval_results.json`.
## Implementation

The quantization was performed using the [llm-compressor](https://github.com/vllm-project/llm-compressor) library.
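The exact recipe is not included in this card. As a rough sketch only, a typical llm-compressor one-shot flow for INT8 W8A8 looks like the following; the calibration dataset, sample count, and modifier choice here are assumptions, not the settings actually used for this model:

```python
from llmcompressor.modifiers.quantization import GPTQModifier
from llmcompressor.transformers import oneshot

# Hypothetical recipe: INT8 weight + activation quantization of Linear
# layers, keeping the output head in full precision.
recipe = GPTQModifier(targets="Linear", scheme="W8A8", ignore=["lm_head"])

oneshot(
    model="ibm-granite/granite-4.0-h-tiny",
    dataset="open_platypus",        # assumed calibration set
    recipe=recipe,
    output_dir="granite-4.0-h-tiny-quantized.w8a8",
    max_seq_length=2048,            # assumed calibration settings
    num_calibration_samples=512,
)
```

The resulting checkpoint is saved in the `compressed-tensors` format that vLLM loads directly.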
## vLLM Serving

```shell
vllm serve fedora-copr/granite-4.0-h-tiny-quantized.w8a8 --quantization compressed-tensors
```
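Once the server is up, it exposes an OpenAI-compatible API (vLLM's default port is 8000), so any OpenAI client can query it. A sketch, assuming the server is running locally:

```python
from openai import OpenAI

# vLLM serves an OpenAI-compatible endpoint; no real API key is needed.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="fedora-copr/granite-4.0-h-tiny-quantized.w8a8",
    messages=[{"role": "user", "content": "Summarize W8A8 quantization in one sentence."}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```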