fedora-copr/granite-4.0-h-tiny-quantized.w8a8

This is a W8A8 INT8 quantized version of ibm-granite/granite-4.0-h-tiny.

Model Details

  • Quantized by: Jiri Podivin jpodivin@redhat.com

  • Architecture: Granite-4.0 Hybrid MoE (Mamba + Transformer)

  • Quantization: INT8 Weight & Activation (W8A8)

  • Engine Support: vLLM (0.6.0+)

Performance & Accuracy

Results of a short benchmark run executed with lm_eval are stored in eval_results.json.

Implementation

The quantization was performed using the llm-compressor library.
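To make the scheme concrete, the sketch below illustrates the core idea behind W8A8: both weights (W8) and activations (A8) are mapped to 8-bit integers with a symmetric per-tensor scale, the matmul runs in integer arithmetic, and the result is rescaled to float. This is a toy illustration of the principle only, not the llm-compressor pipeline used for this model; all names and values here are made up for the example.

```python
# Toy illustration of symmetric INT8 (W8A8) quantization. Not the
# llm-compressor implementation; purely conceptual.

def quantize_int8(values):
    """Map floats to int8 [-128, 127] with a symmetric per-tensor scale."""
    scale = max(abs(v) for v in values) / 127.0  # largest magnitude -> 127
    q = [max(-128, min(127, round(v / scale))) for v in values]
    return q, scale

w = [0.5, -1.25, 2.0, -0.75]   # toy "weights"
a = [1.0, 0.25, -0.5, 1.5]     # toy "activations"

qw, sw = quantize_int8(w)  # W8: int8 weights
qa, sa = quantize_int8(a)  # A8: int8 activations

# Integer dot product (accumulated in a wider integer type in real kernels),
# then rescaled back to float by the product of the two scales.
y_int8 = sum(x * y for x, y in zip(qw, qa)) * (sw * sa)
y_fp = sum(x * y for x, y in zip(w, a))
print(y_int8, y_fp)  # int8 result closely tracks the float result
```

In the real model the scales are calibrated per channel/tensor over representative data rather than per toy vector, but the quantize–integer-matmul–rescale structure is the same.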

vLLM Serving

vllm serve fedora-copr/granite-4.0-h-tiny-quantized.w8a8 --quantization compressed-tensors
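Once the server is up, it can be queried through vLLM's OpenAI-compatible HTTP API (by default at http://localhost:8000/v1). The sketch below builds a chat-completions request payload for this model; the prompt and `max_tokens` value are arbitrary examples.

```python
# Build a request payload for vLLM's OpenAI-compatible
# /v1/chat/completions endpoint. Prompt and max_tokens are illustrative.
import json

def build_chat_request(prompt, model="fedora-copr/granite-4.0-h-tiny-quantized.w8a8"):
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128,
    }

payload = build_chat_request("What is W8A8 quantization?")
print(json.dumps(payload, indent=2))

# Send it to a running server with, e.g.:
#   curl http://localhost:8000/v1/chat/completions \
#     -H "Content-Type: application/json" \
#     -d "$(python this_script.py)"
```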
Format: Safetensors

Model size: 7B params

Tensor types: BF16 · I8
