# Sarvam-30B 8-Bit (BitsAndBytes)
This repository provides an 8-bit quantized version of the base model sarvamai/sarvam-30b using bitsandbytes.
8-bit quantization roughly halves GPU memory usage while keeping model quality close to the FP16 original.
- **Base model:** `sarvamai/sarvam-30b`
- **Architecture:** `SarvamMoEForCausalLM`
## Quantization Details
Quantization method: BitsAndBytes 8-bit
Configuration used:

- `load_in_8bit=True`
Approximate GPU memory usage:
| Model | GPU VRAM |
|---|---|
| FP16 original | ~60 GB |
| 8-bit | ~30 GB |
This version provides near-FP16 quality while using roughly half the memory.
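The figures in the table follow from simple parameter arithmetic: FP16 stores about 2 bytes per parameter and INT8 about 1 byte per parameter, with some extra for activations, the KV cache, and non-quantized layers. A back-of-the-envelope check:

```python
# Rough VRAM estimate from parameter count and bytes per parameter.
# 30e9 is taken from the model name; real usage runs somewhat higher
# due to activations, the KV cache, and non-quantized layers.
params = 30e9

fp16_gb = params * 2 / 1e9   # 2 bytes per parameter -> ~60 GB
int8_gb = params * 1 / 1e9   # 1 byte per parameter  -> ~30 GB

print(f"FP16: ~{fp16_gb:.0f} GB, 8-bit: ~{int8_gb:.0f} GB")
```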
## Installation
Install dependencies:

```bash
pip install transformers accelerate bitsandbytes torch safetensors
```
## Loading the Model
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "neuralnets/sarvam-30b-8bit",
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    "neuralnets/sarvam-30b-8bit",
    trust_remote_code=True,
)
```
## Example Inference
```python
# Assumes `model` and `tokenizer` were loaded as shown above.
prompt = "Explain mixture of experts in simple terms."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=200,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
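Generation can be tuned beyond `max_new_tokens` using the standard Transformers `GenerationConfig`. A hedged sketch — the sampling values below are illustrative, not tuned for this model:

```python
from transformers import GenerationConfig

# Illustrative sampling settings; adjust for your use case.
gen_config = GenerationConfig(
    max_new_tokens=200,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)
```

Pass it to generation with `model.generate(**inputs, generation_config=gen_config)`.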
## Hardware Requirements
Recommended GPUs:
- A100 40GB or 80GB
- RTX 4090
- RTX 3090
CPU RAM recommendation:
- 32 GB or more
## Notes
- Uses bitsandbytes 8-bit quantization integrated with Hugging Face Transformers.
- Requires `trust_remote_code=True` due to the custom Sarvam architecture.
- Suitable for high-quality inference.
## Base Model
Original model repository:
sarvamai/sarvam-30b
Refer to the base model page for detailed information about training and architecture.
## License
This repository distributes a quantized derivative of the upstream model.
Users must comply with the license of the original model:
sarvamai/sarvam-30b