GLM-4.5-Iceblink-v2-106B-A12B-GGUF

This repository contains several custom GGUF quantizations of zerofata/GLM-4.5-Iceblink-v2-106B-A12B, to be used with llama.cpp.

The naming scheme for these custom quantizations is as follows:

ModelName-DefaultType-FFN-UpType-GateType-DownType.gguf

Where DefaultType refers to the default tensor type, and UpType, GateType, and DownType refer to the tensor types used for the ffn_up_exps, ffn_gate_exps, and ffn_down_exps tensors, respectively.
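To make the scheme concrete, here is a small illustrative Python snippet (not part of this repository) that splits one of these filenames into its component tensor types:

```python
# Minimal sketch: parse a filename that follows the naming scheme above.
def parse_quant_name(filename: str) -> dict:
    """Extract the default type and per-expert FFN types from a filename."""
    stem = filename.removesuffix(".gguf")
    if "-FFN-" not in stem:
        # No per-tensor overrides, e.g. "...-Q8_0.gguf" or "...-bf16.gguf"
        return {"default": stem.rsplit("-", 1)[-1]}
    prefix, ffn = stem.split("-FFN-")
    up, gate, down = ffn.split("-")
    return {
        "default": prefix.rsplit("-", 1)[-1],
        "ffn_up_exps": up,
        "ffn_gate_exps": gate,
        "ffn_down_exps": down,
    }

print(parse_quant_name(
    "GLM-4.5-Iceblink-v2-106B-A12B-Q8_0-FFN-IQ4_XS-IQ3_S-IQ4_NL.gguf"
))
# {'default': 'Q8_0', 'ffn_up_exps': 'IQ4_XS',
#  'ffn_gate_exps': 'IQ3_S', 'ffn_down_exps': 'IQ4_NL'}
```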

Quantizations

These quantizations use Q8_0 for all tensors by default, including the dense FFN block. Only the conditional experts are downgraded. The shared expert is always kept in Q8_0. They were quantized using my own imatrix (the calibration text corpus can be found here).
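If you want to verify which tensors received which types, the gguf Python package that ships with llama.cpp (gguf-py) can read the header of a downloaded file. This is a minimal sketch assuming that package is installed; the filename is just one of the quants from the table below, and the tensor-name filters reflect llama.cpp's usual naming for conditional (ffn_*_exps) and shared (ffn_*_shexp) experts:

```python
# Minimal sketch (assumes the `gguf` package from llama.cpp's gguf-py is installed).
from gguf import GGUFReader

# Placeholder: path to one of the quants from this repository.
reader = GGUFReader("GLM-4.5-Iceblink-v2-106B-A12B-Q8_0-FFN-IQ4_XS-IQ3_S-IQ4_NL.gguf")

for tensor in reader.tensors:
    # Conditional experts (ffn_*_exps) should show the downgraded types;
    # shared experts (ffn_*_shexp) and everything else should show Q8_0.
    if "exps" in tensor.name or "shexp" in tensor.name:
        print(f"{tensor.name:48s} {tensor.tensor_type.name}")
```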

| Filename | Size (GB) | Size (GiB) | Average BPW | Direct link |
|---|---|---|---|---|
| GLM-4.5-Iceblink-v2-106B-A12B-Q8_0-FFN-IQ4_XS-IQ3_S-IQ4_NL.gguf | 60.94 | 56.76 | 4.41 | Download |
| GLM-4.5-Iceblink-v2-106B-A12B-Q8_0-FFN-IQ4_XS-IQ4_XS-IQ4_NL.gguf | 64.39 | 59.97 | 4.66 | Download |
| GLM-4.5-Iceblink-v2-106B-A12B-Q8_0-FFN-IQ4_XS-IQ4_XS-Q5_0.gguf | 68.63 | 63.92 | 4.97 | Download |
| GLM-4.5-Iceblink-v2-106B-A12B-Q8_0-FFN-Q4_K-Q4_K-Q8_0.gguf | 83.49 | 77.76 | 6.05 | Download |
| GLM-4.5-Iceblink-v2-106B-A12B-Q8_0-FFN-Q5_K-Q5_K-Q8_0.gguf | 91.97 | 85.66 | 6.66 | Download |
| GLM-4.5-Iceblink-v2-106B-A12B-Q8_0-FFN-Q6_K-Q6_K-Q8_0.gguf | 100.99 | 94.06 | 7.31 | Download |
| GLM-4.5-Iceblink-v2-106B-A12B-Q8_0.gguf | 117.45 | 109.38 | 8.51 | Download |
| GLM-4.5-Iceblink-v2-106B-A12B-bf16.gguf | 220.98 | 205.81 | 16.00 | Download |
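As a quick start, the sketch below downloads one of the files and runs a short chat completion. It assumes the huggingface_hub and llama-cpp-python packages; you can just as well fetch the file manually and use llama.cpp's own binaries. The chosen filename, context length, and GPU offload settings are placeholders to adapt to your hardware:

```python
# Minimal usage sketch (assumes huggingface_hub and llama-cpp-python are installed).
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# Fetch one of the quants from this repository (large download).
model_path = hf_hub_download(
    repo_id="ddh0/GLM-4.5-Iceblink-v2-106B-A12B-GGUF",
    filename="GLM-4.5-Iceblink-v2-106B-A12B-Q8_0-FFN-IQ4_XS-IQ3_S-IQ4_NL.gguf",
)

# Context length and GPU offload are placeholders; tune them for your hardware.
llm = Llama(model_path=model_path, n_ctx=8192, n_gpu_layers=-1)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Introduce yourself in one sentence."}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```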