Burkov
Andriy
AI & ML interests
None yet
Organizations
None yet
Issues with FSDP and DeepSpeed During Distributed Training for Gemma
2
5
#30 opened over 1 year ago
by
anandhperumal
How does v0.2 manage to support 32k token context without Sliding Window Attention?
4
#85 opened over 1 year ago
by
Andriy
What is the max. content length of Mistral-7B-Instruct-v0.2?
17
#43 opened almost 2 years ago
by
hanshupe
Longer inference time
2
#4 opened over 1 year ago
by
dittops
What is the SFT data?
3
5
#7 opened about 2 years ago
by
Ede-CH
Dataset?
5
#1 opened almost 2 years ago
by
0xbitches
Questions about architecture (+ LoRA)
2
#16 opened almost 2 years ago
by
alex0dd
Can you tell us the original models that you merged to create this model?
3
1
#3 opened about 2 years ago
by
Bruce001