Sarvam-30B 8-Bit (BitsAndBytes)

This repository provides an 8-bit quantized version of the base model sarvamai/sarvam-30b using bitsandbytes.

8-bit quantization roughly halves memory usage relative to FP16 while preserving most of the model's output quality.

Base model: sarvamai/sarvam-30b

Architecture: SarvamMoEForCausalLM


Quantization Details

Quantization method: BitsAndBytes 8-bit

Configuration used:

  • load_in_8bit = True

Approximate GPU memory usage:

Model            GPU VRAM
FP16 (original)  ~60 GB
8-bit            ~30 GB

This version provides near-FP16 quality while using roughly half the memory.
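The figures above follow directly from parameter count times bytes per parameter. A minimal sketch of that arithmetic (weights-only accounting is an assumption — activations, KV cache, and quantization overhead are ignored):

```python
def approx_vram_gb(n_params: float, bits_per_param: int) -> float:
    """Weights-only VRAM estimate in GB (ignores activations and KV cache)."""
    return n_params * bits_per_param / 8 / 1e9

N_PARAMS = 30e9  # ~30B parameters

print(approx_vram_gb(N_PARAMS, 16))  # FP16  -> 60.0
print(approx_vram_gb(N_PARAMS, 8))   # 8-bit -> 30.0
```

Real-world usage will be somewhat higher once activations and the KV cache are counted, which is why a 40 GB card is a comfortable fit for the 8-bit weights.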


Installation

Install the required dependencies:

pip install transformers accelerate bitsandbytes torch safetensors
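Before loading the model, it can help to confirm that all packages are importable. A small sketch (the check_deps helper is hypothetical, not part of any library):

```python
from importlib import util

def check_deps(pkgs):
    """Return the packages from pkgs that are not importable."""
    return [p for p in pkgs if util.find_spec(p) is None]

missing = check_deps(["transformers", "accelerate", "bitsandbytes", "torch", "safetensors"])
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All dependencies are installed.")
```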

Loading the Model

from transformers import AutoModelForCausalLM, AutoTokenizer

# The 8-bit quantization config is stored in this repository,
# so no extra quantization arguments are needed at load time.
model = AutoModelForCausalLM.from_pretrained(
    "neuralnets/sarvam-30b-8bit",
    device_map="auto",
    trust_remote_code=True
)

tokenizer = AutoTokenizer.from_pretrained(
    "neuralnets/sarvam-30b-8bit",
    trust_remote_code=True
)

Example Inference

prompt = "Explain mixture of experts in simple terms."

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=200
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Hardware Requirements

Recommended GPUs:

  • A100 40GB or 80GB
  • RTX 4090 (24 GB; may require CPU offloading via device_map="auto")
  • RTX 3090 (24 GB; may require CPU offloading via device_map="auto")

CPU RAM recommendation:

  • 32 GB or more
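A quick feasibility check against the ~30 GB footprint (the 2 GB headroom for activations and KV cache is an assumed rule of thumb, not a measured value):

```python
def fits_on_gpu(vram_gb: float, model_gb: float = 30.0, headroom_gb: float = 2.0) -> bool:
    """True if the 8-bit weights plus a small headroom fit in VRAM."""
    return vram_gb >= model_gb + headroom_gb

print(fits_on_gpu(80))  # A100 80GB     -> True
print(fits_on_gpu(40))  # A100 40GB     -> True
print(fits_on_gpu(24))  # RTX 4090/3090 -> False; rely on CPU offload
```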

Notes

  • Uses bitsandbytes 8-bit quantization integrated with Hugging Face Transformers.
  • Requires trust_remote_code=True because of the custom SarvamMoEForCausalLM architecture.
  • Suitable for high-quality inference.

Base Model

Original model repository:

sarvamai/sarvam-30b

Refer to the base model page for detailed information about training and architecture.


License

This repository distributes a quantized derivative of the upstream model.

Users must comply with the license of the original model:

sarvamai/sarvam-30b
