
Burkov

Andriy


AI & ML interests

None yet

Organizations

None yet

New activity in google/gemma-2-9b over 1 year ago

Issues with FSDP and DeepSpeed During Distributed Training for Gemma

#30 opened over 1 year ago by

New activity in mistralai/Mistral-7B-Instruct-v0.2 over 1 year ago

How does v0.2 manage to support a 32k-token context without Sliding Window Attention?

#85 opened over 1 year ago by

What is the max. context length of Mistral-7B-Instruct-v0.2?

#43 opened almost 2 years ago by

New activity in 1bitLLM/bitnet_b1_58-3B over 1 year ago

Longer inference time

#4 opened over 1 year ago by

New activity in 01-ai/Yi-34B-Chat over 1 year ago

What is the SFT data?

#7 opened about 2 years ago by

New activity in abacaj/phi-2-super almost 2 years ago

Dataset?

#1 opened almost 2 years ago by

New activity in abacusai/Smaug-72B-v0.1 almost 2 years ago

Questions about architecture (+ LoRA)

#16 opened almost 2 years ago by

New activity in OpenPipe/mistral-ft-optimized-1218 about 2 years ago

Can you tell us the original models that you merged to create this model?

#3 opened about 2 years ago by