admarcosai's Collections
Alignment: FineTuning-Preference
S-LoRA: Serving Thousands of Concurrent LoRA Adapters
Paper • 2311.03285 • 31 upvotes
Tailoring Self-Rationalizers with Multi-Reward Distillation
Paper • 2311.02805 • 6 upvotes
Ultra-Long Sequence Distributed Transformer
Paper • 2311.02382 • 6 upvotes
OpenChat: Advancing Open-source Language Models with Mixed-Quality Data
Paper • 2309.11235 • 15 upvotes
SiRA: Sparse Mixture of Low Rank Adaptation
Paper • 2311.09179 • 9 upvotes
ZipLoRA: Any Subject in Any Style by Effectively Merging LoRAs
Paper • 2311.13600 • 47 upvotes
Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model
Paper • 2311.13231 • 28 upvotes
Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch
Paper • 2311.03099 • 30 upvotes
Rethinking Compression: Reduced Order Modelling of Latent Features in Large Language Models
Paper • 2312.07046 • 15 upvotes
"I Want It That Way": Enabling Interactive Decision Support Using Large
Language Models and Constraint Programming
Paper
• 2312.06908
• Published
• 8
Federated Full-Parameter Tuning of Billion-Sized Language Models with Communication Cost under 18 Kilobytes
Paper • 2312.06353 • 7 upvotes
TOFU: A Task of Fictitious Unlearning for LLMs
Paper • 2401.06121 • 20 upvotes
Patchscope: A Unifying Framework for Inspecting Hidden Representations of Language Models
Paper • 2401.06102 • 22 upvotes
Tuning LLMs with Contrastive Alignment Instructions for Machine Translation in Unseen, Low-resource Languages
Paper • 2401.05811 • 8 upvotes
LLM Augmented LLMs: Expanding Capabilities through Composition
Paper • 2401.02412 • 38 upvotes
TrustLLM: Trustworthiness in Large Language Models
Paper • 2401.05561 • 69 upvotes
Contrastive Preference Learning: Learning from Human Feedback without RL
Paper • 2310.13639 • 25 upvotes
selfrag/selfrag_train_data
Dataset • 146k • 117 • 75
Efficient Exploration for LLMs
Paper • 2402.00396 • 22 upvotes
Structured Code Representations Enable Data-Efficient Adaptation of Code Language Models
Paper • 2401.10716 • 1 upvote
Secrets of RLHF in Large Language Models Part II: Reward Modeling
Paper • 2401.06080 • 28 upvotes
Secrets of RLHF in Large Language Models Part I: PPO
Paper • 2307.04964 • 30 upvotes
Transforming and Combining Rewards for Aligning Large Language Models
Paper • 2402.00742 • 12 upvotes
ReFT: Reasoning with Reinforced Fine-Tuning
Paper • 2401.08967 • 31 upvotes
SciGLM: Training Scientific Language Models with Self-Reflective Instruction Annotation and Tuning
Paper • 2401.07950 • 4 upvotes
Generative Representational Instruction Tuning
Paper • 2402.09906 • 54 upvotes
Self-Play Fine-Tuning of Diffusion Models for Text-to-Image Generation
Paper • 2402.10210 • 35 upvotes
RLVF: Learning from Verbal Feedback without Overgeneralization
Paper • 2402.10893 • 12 upvotes