“Predict the next token” not “Obey the instruction”

QLoRA Instruction Tuning on Pythia-1B

This repository provides a Hugging Face–compatible LoRA adapter trained via QLoRA (4-bit quantization + LoRA adapters) on the EleutherAI Pythia-1B-deduped base model.

The project focuses on producing and publishing a reusable LoRA adapter using a modern, memory-efficient instruction-tuning pipeline built with Hugging Face Transformers, PEFT, and BitsAndBytes. It is designed for learning, experimentation, and small-GPU environments (e.g. Colab).


✨ Key Features (Adapter-Centric)

  • 🔒 Frozen base model: Pythia-1B-deduped (not included in this repository)
  • 🧠 QLoRA training with 4-bit NF4 quantization
  • 🧩 Only the LoRA adapters are trainable (<1% of parameters)
  • 💾 Optimized for low GPU memory usage
  • 📚 Clear, minimal pipeline for understanding instruction tuning

🧠 What This Adapter Represents

This adapter demonstrates how to:

  • Load a 4-bit quantized causal language model
  • Prepare it for k-bit training
  • Apply LoRA adapters for parameter-efficient fine-tuning
  • Perform instruction tuning using causal LM loss
  • Train using the Hugging Face Trainer API

Schematically, training follows:

Frozen Base Model (4-bit)
+ Trainable LoRA ΔW
→ Instruction-following behavior
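
The schematic above can be made concrete with a small, dependency-free sketch. It assumes the standard LoRA formulation, in which the frozen weight W is augmented by a low-rank product B·A scaled by lora_alpha / r. The helper names (`matvec`, `lora_forward`) and the toy 2×2 shapes are illustrative only and are not taken from this repository.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, lora_alpha, r):
    """Return W x + (lora_alpha / r) * B (A x).

    W: frozen base weight, shape (d_out, d_in)
    A: trainable down-projection, shape (r, d_in)
    B: trainable up-projection, shape (d_out, r)
    """
    scaling = lora_alpha / r
    base_out = matvec(W, x)
    delta_out = matvec(B, matvec(A, x))
    return [b + scaling * d for b, d in zip(base_out, delta_out)]


# Toy example with d_in = d_out = 2 and rank r = 1.
W = [[1.0, 0.0], [0.0, 1.0]]   # frozen base weight
A = [[0.5, 0.5]]               # trainable
B_init = [[0.0], [0.0]]        # B starts at zero, so the adapter is initially a no-op
x = [2.0, 4.0]

print(lora_forward(W, A, B_init, x, lora_alpha=1, r=1))     # → [2.0, 4.0]

B_trained = [[1.0], [-1.0]]    # made-up values standing in for a trained B
print(lora_forward(W, A, B_trained, x, lora_alpha=1, r=1))  # → [5.0, 1.0]
```

With the values used for this adapter (r = 32, lora_alpha = 32), the scaling factor lora_alpha / r is 1.0 under this formulation.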

🏗️ Model & Training Setup

Base Model

  • Model: EleutherAI/pythia-1B-deduped
  • Architecture: Decoder-only Transformer
  • Quantization: 4-bit NF4 (BitsAndBytes)
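
As a rough illustration of why 4-bit storage matters on small GPUs, the sketch below estimates the memory needed for the weights alone at different precisions. The parameter count is rounded to one billion as an assumption, and the estimate ignores quantization constants, activations, optimizer state, and any layers kept in higher precision, so real usage will be higher.

```python
def weight_memory_gib(num_params, bits_per_param):
    """Approximate memory needed to store the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 1024**3


NUM_PARAMS = 1_000_000_000  # assumption: roughly 1B parameters, rounded for illustration

for label, bits in [("FP32", 32), ("FP16", 16), ("4-bit", 4)]:
    print(f"{label:>5}: {weight_memory_gib(NUM_PARAMS, bits):.2f} GiB")
```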

LoRA Configuration

| Parameter | Value | Description |
|---|---|---|
| r | 32 | LoRA rank (expressiveness) |
| lora_alpha | 32 | Scaling factor |
| lora_dropout | 0.05 | Regularization |
| bias | none | Only LoRA parameters are trained |
| task_type | CAUSAL_LM | Causal language modeling |

Only LoRA parameters are trainable; all base model weights remain frozen.
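
To give a feel for what the rank means in parameter terms, here is a minimal sketch that counts the weights LoRA adds to a single linear layer. It assumes the usual factorization into an (r × d_in) matrix and a (d_out × r) matrix; the function name and the 2048-wide layer are hypothetical and chosen only for illustration.

```python
def lora_param_count(r, d_in, d_out):
    """Trainable parameters LoRA adds to one linear layer of shape (d_out, d_in)."""
    # A has shape (r, d_in) and B has shape (d_out, r).
    return r * d_in + d_out * r


# Hypothetical square projection of width 2048, used purely as an example.
full = 2048 * 2048
added = lora_param_count(r=32, d_in=2048, d_out=2048)
print(f"LoRA adds {added} parameters next to {full} frozen ones")
```

The overall trainable fraction reported for a whole model depends on which modules receive adapters and on how many parameters (embeddings, for example) receive none.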


📦 Dataset

  • Type: Instruction-formatted text dataset

  • Format: Each example contains a text field

  • Tokenization:

    • Max length: 512
    • Padding: max_length
    • Truncation enabled

Loss is computed using standard causal language modeling, meaning the model learns to predict the full sequence (instruction + response).
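
The tokenization settings and the full-sequence loss can be sketched without any library. The example below pads or truncates a list of token ids to a fixed length and builds labels that copy the inputs, so every real token (instruction and response alike) contributes to the loss. Excluding padded positions with the value -100 is an assumption about the training setup, as are right-side padding and the helper names. Because the pad token is set to the EOS token, the sketch masks by attention mask rather than by token id, so a genuine EOS token is not hidden by accident.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss skips by default


def pad_or_truncate(token_ids, max_length, pad_id):
    """Truncate to max_length, then right-pad with pad_id; also return the attention mask."""
    ids = token_ids[:max_length]
    mask = [1] * len(ids)
    n_pad = max_length - len(ids)
    return ids + [pad_id] * n_pad, mask + [0] * n_pad


def causal_lm_labels(input_ids, attention_mask):
    """Labels equal the inputs; padded positions are excluded from the loss."""
    return [tok if keep else IGNORE_INDEX for tok, keep in zip(input_ids, attention_mask)]


# Made-up token ids standing in for a tokenized "instruction + response" string.
example = [11, 12, 13, 14, 15]
ids, mask = pad_or_truncate(example, max_length=8, pad_id=0)
labels = causal_lm_labels(ids, mask)

print(ids)     # → [11, 12, 13, 14, 15, 0, 0, 0]
print(mask)    # → [1, 1, 1, 1, 1, 0, 0, 0]
print(labels)  # → [11, 12, 13, 14, 15, -100, -100, -100]
```

Hugging Face causal LM models shift the labels by one position internally, so the labels passed in are aligned with the inputs as shown here.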


🚀 Adapter Training & Usage Pipeline

1. Load tokenizer and model

  • Load Pythia tokenizer
  • Set pad_token = eos_token
  • Load model with 4-bit quantization

2. Prepare for QLoRA training

  • Enable gradient checkpointing
  • Cast non-quantized layers (e.g. layer norms) to FP32 for numerical stability
  • Freeze base model parameters

3. Apply LoRA adapters

  • Inject LoRA modules into attention and MLP layers
  • Print trainable parameter count
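
Steps 1 to 3 might look roughly like the sketch below. It is not the repository's training script: the function name is hypothetical, the compute dtype is an assumption chosen to match FP16 training, and the `target_modules` list assumes the GPT-NeoX projection names used by Pythia, since the exact modules adapted are not listed here. The imports sit inside the function so that the heavy dependencies (and a CUDA GPU) are only needed when it is called.

```python
def build_qlora_model(base_model_name="EleutherAI/pythia-1B-deduped"):
    """Sketch of steps 1-3; needs a CUDA GPU plus transformers, peft and bitsandbytes."""
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
    from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

    # Step 1: tokenizer and 4-bit NF4 base model.
    tokenizer = AutoTokenizer.from_pretrained(base_model_name)
    tokenizer.pad_token = tokenizer.eos_token
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.float16,  # assumption, not stated in this card
    )
    model = AutoModelForCausalLM.from_pretrained(
        base_model_name, quantization_config=bnb_config, device_map="auto"
    )

    # Step 2: freeze the base weights, upcast the remaining non-quantized
    # parameters and enable gradient checkpointing.
    model = prepare_model_for_kbit_training(model, use_gradient_checkpointing=True)

    # Step 3: inject LoRA modules and report the trainable parameter count.
    lora_config = LoraConfig(
        r=32,
        lora_alpha=32,
        lora_dropout=0.05,
        bias="none",
        task_type="CAUSAL_LM",
        # assumption: GPT-NeoX attention and MLP projection names
        target_modules=["query_key_value", "dense", "dense_h_to_4h", "dense_4h_to_h"],
    )
    model = get_peft_model(model, lora_config)
    model.print_trainable_parameters()
    return tokenizer, model
```

Calling `tokenizer, model = build_qlora_model()` would then return objects ready to hand to a trainer.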

4. Training configuration

| Setting | Value |
|---|---|
| Epochs | 3 |
| Batch size | 6 |
| Gradient accumulation | 4 |
| Effective batch size | 24 |
| Learning rate | 2e-4 |
| Optimizer | paged_adamw_8bit |
| Precision | FP16 |
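
As a hedged sketch, the settings above map onto Hugging Face `TrainingArguments` fields roughly as follows. The dictionary name and the output directory are hypothetical, and the effective batch size calculation assumes a single GPU.

```python
training_kwargs = {
    "output_dir": "qlora-pythia-1b",  # hypothetical output path
    "num_train_epochs": 3,
    "per_device_train_batch_size": 6,
    "gradient_accumulation_steps": 4,
    "learning_rate": 2e-4,
    "optim": "paged_adamw_8bit",
    "fp16": True,
}

# With a single GPU (an assumption), each optimizer step sees
# per-device batch size x accumulation steps examples.
effective_batch_size = (
    training_kwargs["per_device_train_batch_size"]
    * training_kwargs["gradient_accumulation_steps"]
)
print(effective_batch_size)  # → 24
```

These keys can be unpacked with `transformers.TrainingArguments(**training_kwargs)` and the result passed to `Trainer`.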

5. Load the adapter and compare it with the base model

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model_name = "EleutherAI/pythia-1B-deduped"
lora_repo = "BEncoderRT/Pythia-QLoRA-Instruction-Tuning"

tokenizer = AutoTokenizer.from_pretrained(base_model_name)
tokenizer.pad_token = tokenizer.eos_token



# Load the base model in bfloat16 (no quantization config is applied here)
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    device_map="auto",
    dtype=torch.bfloat16,  # recent Transformers releases accept `dtype`; older ones expect `torch_dtype`
)

# Load the PEFT model (LoRA adapters)
model = PeftModel.from_pretrained(base_model, lora_repo)
# Ensure the base model is in evaluation mode
base_model.eval()

# Function to format prompts consistently with training data
def format_prompt(instruction, context=None):
    if context:
        return f"Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Input:\n{context}\n\n### Response:\n"
    else:
        return f"Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Response:\n"

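# Test prompts, reconstructed from the nine prompts whose outputs are shown below
test_prompts = [
    {"instruction": "Explain the concept of photosynthesis in simple terms.", "context": None},
    {"instruction": "What is the capital of France?", "context": None},
    {"instruction": "Summarize the main idea of the following text:", "context": "The quick brown fox jumps over the lazy dog. This sentence is often used to display all letters of the English alphabet, making it a pangram."},
    {"instruction": "List three benefits of regular exercise.", "context": None},
    {"instruction": "Write a short, imaginative story about a cat who discovers a secret portal to another dimension under its owner's bed.", "context": None},
    {"instruction": "If a train leaves New York at 10 AM traveling at 60 mph and another train leaves Chicago at 11 AM traveling at 50 mph, and the cities are 800 miles apart, at what time do they meet? (Assume they are traveling towards each other on the same track).", "context": None},
    {"instruction": "What is the capital of Australia?", "context": None},
    {"instruction": "Explain the difference between supervised and unsupervised learning in machine learning, and provide an example of when each would be used.", "context": None},
    {"instruction": "Summarize the following passage:", "context": "The advent of artificial intelligence has brought forth a new era of technological advancement, impacting various sectors from healthcare to finance. While AI promises increased efficiency and innovative solutions, it also raises ethical concerns regarding job displacement, privacy, and bias in algorithms. Societies worldwide are grappling with how to regulate and integrate AI responsibly, balancing progress with human values. This calls for a multidisciplinary approach involving policymakers, technologists, ethicists, and the public to shape a future where AI serves humanity's best interests."},
]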

print("\n--- Generating Responses from BASE MODEL ---\n")
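# PeftModel.from_pretrained injects the LoRA layers into base_model in place,
# so the adapter is temporarily disabled to obtain true base-model outputs.
with torch.no_grad(), model.disable_adapter():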
    for i, prompt_data in enumerate(test_prompts):
        instruction = prompt_data["instruction"]
        context = prompt_data["context"]

        formatted_input = format_prompt(instruction, context)

        # Tokenize the input prompt
        inputs = tokenizer(formatted_input, return_tensors="pt").to(base_model.device)

        # Generate response using the BASE MODEL
        outputs = base_model.generate(
            **inputs,
            max_new_tokens=150,
            do_sample=True,
            temperature=0.7,
            top_k=50,
            top_p=0.95,
            eos_token_id=tokenizer.eos_token_id,
            pad_token_id=tokenizer.pad_token_id
        )

        generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
        response = generated_text[len(formatted_input):].strip()

        print(f"### Test Prompt {i+1} (BASE MODEL): ###")
        print(f"Instruction: {instruction}")
        if context:
            print(f"Context: {context}")
        print(f"Base Model Response: {response}\n")


--- Generating Responses from BASE MODEL ---

### Test Prompt 1 (BASE MODEL): ###
Instruction: Explain the concept of photosynthesis in simple terms.
Base Model Response: Photosynthesis is a process that uses sunlight and a photosynthetic pigment called chlorophyll to convert carbon dioxide into organic molecules called carbohydrates.  In this process, oxygen is converted to water and carbon dioxide.  The oxygen and carbon dioxide are then split into two molecules called O2 and C2.  The two molecules are then used to split the water into hydrogen and oxygen.  The hydrogen and oxygen are then combined to form glucose, which is the sugar that is used as a source of energy.  The process of photosynthesis is extremely important in the life of plants.  It allows plants to convert sunlight into energy that can be used to make food.  The process of photosynthesis is extremely important to the survival of plants.  It allows

### Test Prompt 2 (BASE MODEL): ###
Instruction: What is the capital of France?
Base Model Response: Paris is the capital of France.

If you are traveling to France, you should be aware of the following facts:

1. The French language is spoken on both sides of the Atlantic Ocean.
2. Paris is a city of over 1 million people.
3. The capital of France is located in Paris, just south of the city of Lyon.
4. The French capital is home to the Eiffel Tower, the Grand Tour, the Louvre, the Champs Elysees, the Arc de Triomphe, and the Pont Neuf.
5. The first French colony was established in the New World in the 16th century.
6. The French Riviera is a beautiful and historic

### Test Prompt 3 (BASE MODEL): ###
Instruction: Summarize the main idea of the following text:
Context: The quick brown fox jumps over the lazy dog. This sentence is often used to display all letters of the English alphabet, making it a pangram.
Base Model Response: The quick brown fox jumps over the lazy dog. This sentence is often used to display all letters of the English alphabet, making it a pangram.

This is an example of a pangram. A pangram is a single-word, single-character word.

A pangram is composed of one or more of the following:

A pangram consists of only the letters in the English alphabet. The letters are always in the order in which they appear in the sentence.

A pangram consists of only the letters in the English alphabet, as in the example.

The quick brown fox jumps over the lazy dog. This sentence is often used to display all letters of the English

### Test Prompt 4 (BASE MODEL): ###
Instruction: List three benefits of regular exercise.
Base Model Response: - Improve metabolism
- Increase muscle mass
- Improve your immune system
- Increase your overall energy levels
- Improve your mood
- Reduce stress
- Improve your sleep
- Improve your memory
- Reduce your risk of chronic health problems
- Increase your longevity
- Lower your blood pressure
- Lower your cholesterol
- Increase your energy
- Increase your heart rate
- Improve your cognitive function
- Improve your ability to concentrate
- Improve your memory
- Improve your mood
- Improve your overall energy levels
- Improve your sleep
- Improve your memory
- Improve your ability to concentrate
- Improve your cognitive function
- Improve your memory
- Improve your ability to concentrate
- Improve your memory
-

### Test Prompt 5 (BASE MODEL): ###
Instruction: Write a short, imaginative story about a cat who discovers a secret portal to another dimension under its owner's bed.
Base Model Response: The cat, who had been living in the home for many years, was so used to being alone that when he was moved to a new room, he didn't know what to do. In the middle of the night, he heard the sound of something moving under the bed. When he looked under the bed, he saw a strange creature in the middle of the bed. He thought it was a ghost and was scared. He tried to get out of the bed, but his feet were too big. He tried to run, but he got stuck on the floor. He thought he would die if he didn't get out of the bed. So, he got out of bed and he ran around the room, looking for a way to get out

### Test Prompt 6 (BASE MODEL): ###
Instruction: If a train leaves New York at 10 AM traveling at 60 mph and another train leaves Chicago at 11 AM traveling at 50 mph, and the cities are 800 miles apart, at what time do they meet? (Assume they are traveling towards each other on the same track).
Base Model Response: The train from New York will arrive at the Chicago station around 10 AM. The train from Chicago will arrive at the New York station around 11 AM. At that time, the trains will meet. The exact time will depend on the speed of the train. If the speed of the train is 60 mph, then the time of arrival will be 11:30 AM. If the speed of the train is 50 mph, then the time of arrival will be 10:30 AM. 
The time of arrival is always the same, regardless of the speed of the train. The only time a train will arrive at the other side of the track is when the speed of the train is zero. 

The train from New York will arrive at the Chicago

### Test Prompt 7 (BASE MODEL): ###
Instruction: What is the capital of Australia?
Base Model Response: Sydney, Australia is the capital of Australia. It is also the largest city in Australia. The city is well known for having the tallest bridge in the world. The city is also home to the Australian Museum which is one of the world’s best museums. It is located in the suburb of Manly, a suburb in the city of Bondi Junction. The city is about 20 minutes away from the Gold Coast. It has a large beach and also has a famous surf board. The city has a lot of shopping opportunities like the famous shopping district of Bondi Junction. The city also has a big number of famous bars and restaurants. The city is a great place to visit when you are in Australia. It has many

### Test Prompt 8 (BASE MODEL): ###
Instruction: Explain the difference between supervised and unsupervised learning in machine learning, and provide an example of when each would be used.
Base Model Response: In supervised learning, a model is trained with labeled data and then used to predict new data. In unsupervised learning, data is not labeled but is learned based on some information. For example, if we are given a large dataset of customers and we have the ability to predict whether a customer is a good customer, we can use this information to train a model to predict whether a customer is a good customer. In addition, data from the training set can be used to determine which customers are good customers, and so on.

In supervised learning, the model is trained on a labeled dataset and then used to predict new data. In unsupervised learning, the data is not labeled but is learned based on some information. For example, if we

### Test Prompt 9 (BASE MODEL): ###
Instruction: Summarize the following passage:
Context: The advent of artificial intelligence has brought forth a new era of technological advancement, impacting various sectors from healthcare to finance. While AI promises increased efficiency and innovative solutions, it also raises ethical concerns regarding job displacement, privacy, and bias in algorithms. Societies worldwide are grappling with how to regulate and integrate AI responsibly, balancing progress with human values. This calls for a multidisciplinary approach involving policymakers, technologists, ethicists, and the public to shape a future where AI serves humanity's best interests.
Base Model Response: Artificial intelligence promises increased efficiency and innovative solutions, but also raises ethical concerns regarding job displacement, privacy, and bias in algorithms. Societies worldwide are grappling with how to regulate and integrate AI responsibly, balancing progress with human values. This calls for a multidisciplinary approach involving policymakers, technologists, ethicists, and the public to shape a future where AI serves humanity's best interests.

The main challenge for society today is to balance progress with human values, and AI is a major part of this effort. We will never be able to fully harness the potential of AI, and its applications will always be subject to the same ethical issues that we face today. AI is already changing the way we do things. Its impact is already being
# Ensure the LoRA-tuned model is in evaluation mode
model.eval()

# Reuse the format_prompt function and test_prompts

print("\n--- Generating Responses from LO-RA TUNED MODEL ---\n")
with torch.no_grad():
    for i, prompt_data in enumerate(test_prompts):
        instruction = prompt_data["instruction"]
        context = prompt_data["context"]

        formatted_input = format_prompt(instruction, context)

        # Tokenize the input prompt
        inputs = tokenizer(formatted_input, return_tensors="pt").to(model.device)

        # Generate response using the LO-RA TUNED MODEL
        outputs = model.generate(
            **inputs,
            max_new_tokens=150,
            do_sample=True,
            temperature=0.7,
            top_k=50,
            top_p=0.95,
            eos_token_id=tokenizer.eos_token_id,
            pad_token_id=tokenizer.pad_token_id
        )

        generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
        response = generated_text[len(formatted_input):].strip()

        print(f"### Test Prompt {i+1} (LO-RA MODEL): ###")
        print(f"Instruction: {instruction}")
        if context:
            print(f"Context: {context}")
        print(f"LoRA Model Response: {response}\n")

--- Generating Responses from LO-RA TUNED MODEL ---

### Test Prompt 1 (LO-RA MODEL): ###
Instruction: Explain the concept of photosynthesis in simple terms.
LoRA Model Response: Photosynthesis is the process of photosynthesis and the conversion of light energy into chemical energy in the form of ATP. Photosynthesis is a very complicated process and requires a lot of energy to work. It involves converting light energy into chemical energy in the form of ATP. There are many different types of photosynthesis including photosynthesis of plant cells and photosynthesis of living organisms.

Photosynthesis is also the process that converts energy of the sun into fuel. The most common way that photosynthesis is accomplished is through the use of a process called photosynthesis. It is this process that is responsible for photosynthesis. In photosynthesis, the energy of the sun is converted into chemical energy in the form of ATP. The energy of the sun is converted into

### Test Prompt 2 (LO-RA MODEL): ###
Instruction: What is the capital of France?
LoRA Model Response: Paris is the capital of France. It is also the largest city in the world by population. The city was founded in the 11th century and has been the seat of the French government since 1831. Paris has a population of around 3.7 million people. The city has over 100 museums, including the Louvre, the Eiffel Tower, the Orsay, the Sacré-Coeur, the Pompidou, and the Gare de Lyon. The French capital has a number of famous landmarks including the Arc de Triomphe, the Eiffel Tower, and the Petit Palais. Paris is the most visited city in the world by tourists. The city is home to some of the most famous artists, including

### Test Prompt 3 (LO-RA MODEL): ###
Instruction: Summarize the main idea of the following text:
Context: The quick brown fox jumps over the lazy dog. This sentence is often used to display all letters of the English alphabet, making it a pangram.
LoRA Model Response: The quick brown fox jumps over the lazy dog. This sentence is often used to display all letters of the English alphabet, making it a pangram.

The quick brown fox jumps over the lazy dog. This sentence is often used to display all letters of the English alphabet, making it a pangram.

The quick brown fox jumps over the lazy dog. This sentence is often used to display all letters of the English alphabet, making it a pangram.

The quick brown fox jumps over the lazy dog. This sentence is often used to display all letters of the English alphabet, making it a pangram.

The quick brown fox jumps over the lazy dog. This sentence is often used to display all

### Test Prompt 4 (LO-RA MODEL): ###
Instruction: List three benefits of regular exercise.
LoRA Model Response: Regular exercise is one of the most important things you can do for your health.  Exercise is important for your brain and nervous system.  Regular exercise is also important for your bones.  It will help you sleep better.  It will help you avoid many illnesses, including diabetes.  It will help you get a good night's sleep.  It will help you lose weight.  It will help you keep your muscles strong.  It will help you stay healthy.  It will help you avoid many diseases, including heart disease, cancer, and stroke.  It will help you avoid many illnesses, including asthma, emphysema, and some types of dementia.  It will help you avoid many illnesses, including diabetes, heart

### Test Prompt 5 (LO-RA MODEL): ###
Instruction: Write a short, imaginative story about a cat who discovers a secret portal to another dimension under its owner's bed.
LoRA Model Response: Cat's owner is reading a book on the couch, and her cat is watching her. Suddenly, the cat looks up from the book, and says, "I can see the portal!" The owner is so excited that she reaches for the book, and the cat opens it, revealing the portal. The owner is so excited that she falls into the portal and disappears. The cat is so sad that he missed the portal, but he is glad that he was able to get back. The owner runs through the portal, and the cat watches her disappear, and then he runs through the portal too. He is so happy that he was able to get back, but he is sad that he missed the portal. The cat is sad that he missed the

### Test Prompt 6 (LO-RA MODEL): ###
Instruction: If a train leaves New York at 10 AM traveling at 60 mph and another train leaves Chicago at 11 AM traveling at 50 mph, and the cities are 800 miles apart, at what time do they meet? (Assume they are traveling towards each other on the same track).
LoRA Model Response: The answer is: they meet at 8:05 AM.

The reason is because the time it takes for one train to travel a given distance is a function of the speed of the train.  If the speed of the train is 60 mph, then the time it takes for one train to travel 800 miles is 8 hours, or about 8 hours and 50 minutes.

If the speed of the train is 50 mph, then the time it takes for one train to travel 800 miles is 14 hours, or about 14 hours and 20 minutes.

So, by the time the train meets, it will have traveled 800 miles.

In this example, Chicago is at 8:05 AM, New York is at 10:05

### Test Prompt 7 (LO-RA MODEL): ###
Instruction: What is the capital of Australia?
LoRA Model Response: Canberra is the capital of Australia. It is located in the state of Australia's north east and is about 250 miles (400 km) from Sydney. The city of Sydney is the main city in Australia and the fourth most populous city. Australia is one of the largest countries in the world by area and has a population of about 24 million people.

The city of Canberra is known for being home to the Australian Parliament and the National Library of Australia. The Australian Parliament was moved to the National Library in 2016. It is also home to the Australian Capital Territory Library and the Australian National University Library. The National Library of Australia is the largest library in Australia. The National Library of Australia has more than 1.3 million books

### Test Prompt 8 (LO-RA MODEL): ###
Instruction: Explain the difference between supervised and unsupervised learning in machine learning, and provide an example of when each would be used.
LoRA Model Response: In supervised learning, the training data is used to train the model. In unsupervised learning, the model is not trained on the data, but rather, it learns to use data to predict unknown future events.  For example, if you were to use a machine learning algorithm to classify a person's age, you would use a supervised algorithm to train your model to predict when the person will be 30, and you would use an unsupervised algorithm to predict when the person will be 60.  You would then use the trained model to predict when the person will be 90, and you would use the trained model to predict when the person will be 120.  In this example, you would use the model to predict when the person will be 30,

### Test Prompt 9 (LO-RA MODEL): ###
Instruction: Summarize the following passage:
Context: The advent of artificial intelligence has brought forth a new era of technological advancement, impacting various sectors from healthcare to finance. While AI promises increased efficiency and innovative solutions, it also raises ethical concerns regarding job displacement, privacy, and bias in algorithms. Societies worldwide are grappling with how to regulate and integrate AI responsibly, balancing progress with human values. This calls for a multidisciplinary approach involving policymakers, technologists, ethicists, and the public to shape a future where AI serves humanity's best interests.
LoRA Model Response: AI has brought forth a new era of technological advancement, impacting various sectors from healthcare to finance. While AI promises increased efficiency and innovative solutions, it also raises ethical concerns regarding job displacement, privacy, and bias in algorithms. Societies worldwide are grappling with how to regulate and integrate AI responsibly, balancing progress with human values. This calls for a multidisciplinary approach involving policymakers, technologists, ethicists, and the public to shape a future where AI serves humanity's best interests.

Artificial intelligence has made it possible for computers to think. This has created a new generation of machines that are able to do things that humans are not. Some of these machines are capable of doing more than humans are able to do.

The first
