Shipping/serving improvements

#22
by jbakerx - opened

If you plan a public demo, consider:
- providing quantized inference configs (int8/int4 where appropriate)
- adding streaming generation with max-length guards
- adding “safe default” decoding presets that trade off creativity vs. coherence
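For the quantization point, here is a minimal sketch of config-as-data that also decides when int8/int4 is "appropriate". The preset keys mirror real keyword arguments of `transformers.BitsAndBytesConfig` (`load_in_8bit`, `load_in_4bit`, `bnb_4bit_quant_type`, `bnb_4bit_use_double_quant`), but `QUANT_PRESETS`, `weight_memory_gb`, `pick_preset`, and the 0.7 headroom factor are hypothetical. The memory estimate counts weights only, so treat it as a lower bound.

```python
from typing import Any, Dict

# Hypothetical presets. Keys mirror transformers.BitsAndBytesConfig kwargs;
# transformers is not imported so the sketch stays self-contained.
QUANT_PRESETS: Dict[str, Dict[str, Any]] = {
    "fp16": {},  # no quantization config; serve in half precision
    "int8": {"load_in_8bit": True},
    "int4": {
        "load_in_4bit": True,
        "bnb_4bit_quant_type": "nf4",
        "bnb_4bit_use_double_quant": True,
    },
}

BITS_PER_WEIGHT = {"fp16": 16, "int8": 8, "int4": 4}


def weight_memory_gb(num_params: float, preset: str) -> float:
    """Rough lower bound on weight memory in GB (1e9 bytes).

    Ignores activations, KV cache, and per-block quantization constants,
    so real usage is higher.
    """
    return num_params * BITS_PER_WEIGHT[preset] / 8 / 1e9


def pick_preset(num_params: float, budget_gb: float, headroom: float = 0.7) -> str:
    """Return the highest-precision preset whose weights fit in headroom * budget."""
    for preset in ("fp16", "int8", "int4"):
        if weight_memory_gb(num_params, preset) <= budget_gb * headroom:
            return preset
    raise ValueError("model does not fit even at int4; use a larger GPU or a smaller model")
```

Under these assumptions, a 7B-parameter model on a 16 GB GPU gets the int8 preset (about 7 GB of weights), and the chosen dict could then be passed as `BitsAndBytesConfig(**QUANT_PRESETS[preset])`.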
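Streaming and max-length guards fit naturally together in one generator: each token is yielded as soon as it exists, and the loop is bounded so prompt plus output can never exceed the context window. A minimal sketch, assuming a hypothetical `next_token` callable that stands in for one decoding step of a real model (in a transformers-based server, `generate`'s `max_new_tokens` and a streamer such as `TextIteratorStreamer` play these roles). `stream_generate`, `PromptTooLong`, and the default limits are illustrative.

```python
from typing import Callable, Iterator, List


class PromptTooLong(ValueError):
    """Raised when the prompt leaves no room for generation in the context window."""


def stream_generate(
    prompt_ids: List[int],
    next_token: Callable[[List[int]], int],  # hypothetical single-step decoder
    max_new_tokens: int = 256,
    context_window: int = 4096,
    eos_id: int = 0,
) -> Iterator[int]:
    """Yield generated token ids one at a time, enforcing hard length limits."""
    if max_new_tokens <= 0:
        raise ValueError("max_new_tokens must be positive")
    # Guard 1: reject prompts that already fill the context window.
    if len(prompt_ids) >= context_window:
        raise PromptTooLong(
            f"prompt has {len(prompt_ids)} tokens; context window is {context_window}"
        )
    # Guard 2: cap output so prompt + output never exceeds the context window.
    budget = min(max_new_tokens, context_window - len(prompt_ids))
    ids = list(prompt_ids)
    for _ in range(budget):
        tok = next_token(ids)
        if tok == eos_id:  # model signalled end of sequence
            return
        ids.append(tok)
        yield tok  # stream each token to the client immediately


# Usage with a toy "model" that just counts upward:
print(list(stream_generate([1, 2, 3], lambda ids: ids[-1] + 1,
                           max_new_tokens=5, context_window=6)))  # [4, 5, 6]
```

Because this is a generator, both guards run when iteration starts, so a server should pull the first token (or wrap the call) before committing to a streaming response.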
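"Safe default" decoding presets can be a small table of named settings plus server-side clamping, so a demo user can pick "creative" or "coherent" but cannot push parameters into degenerate ranges. A minimal sketch: the parameter names match real `transformers` generation kwargs (`do_sample`, `temperature`, `top_p`, `repetition_penalty`, `max_new_tokens`), while the preset names, numeric values, clamp ranges, and `resolve_decoding` are assumptions to be tuned per model.

```python
from typing import Any, Dict, Optional

# Hypothetical preset values; tune per model.
PRESETS: Dict[str, Dict[str, Any]] = {
    "coherent": {"do_sample": True, "temperature": 0.3, "top_p": 0.85,
                 "repetition_penalty": 1.1, "max_new_tokens": 256},
    "balanced": {"do_sample": True, "temperature": 0.7, "top_p": 0.9,
                 "repetition_penalty": 1.05, "max_new_tokens": 384},
    "creative": {"do_sample": True, "temperature": 1.0, "top_p": 0.95,
                 "repetition_penalty": 1.0, "max_new_tokens": 512},
}

# Hypothetical bounds enforced regardless of what the client sends.
LIMITS = {
    "temperature": (0.05, 1.5),
    "top_p": (0.05, 1.0),
    "repetition_penalty": (1.0, 1.5),
    "max_new_tokens": (1, 512),
}


def resolve_decoding(preset: str = "balanced",
                     overrides: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Start from a named preset, then apply client overrides clamped to LIMITS."""
    if preset not in PRESETS:
        raise KeyError(f"unknown preset {preset!r}; choose from {sorted(PRESETS)}")
    params = dict(PRESETS[preset])  # copy so the shared preset is never mutated
    for key, value in (overrides or {}).items():
        if key not in LIMITS:
            continue  # silently drop knobs the demo does not expose
        lo, hi = LIMITS[key]
        # Clamp, then cast back to the preset's type (e.g. int for max_new_tokens).
        params[key] = type(params[key])(min(max(value, lo), hi))
    return params
```

Clamping rather than rejecting keeps the demo forgiving: a request for `temperature=5.0` on the "creative" preset is served at 1.5 instead of failing.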
