paligemma2_vizwiz_ft2

This model is a fine-tuned version of google/paligemma2-3b-pt-448. It achieves the following results on the evaluation set:

  • Loss: 0.6072
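
Because the base model is a PaliGemma 2 vision-language checkpoint, the fine-tuned weights can be loaded for visual question answering through the standard Transformers API. The sketch below is illustrative only: the repository id, image file, and "answer en ..." prompt are assumptions following the usual PaliGemma usage pattern, not instructions from this card.

```python
import torch
from PIL import Image
from transformers import PaliGemmaForConditionalGeneration, PaliGemmaProcessor

# Assumed repository id for the fine-tuned checkpoint.
model_id = "ebrukilic/paligemma2_vizwiz_ft2"

processor = PaliGemmaProcessor.from_pretrained(model_id)
model = PaliGemmaForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
).eval()

# Hypothetical input: an image plus a natural-language question.
image = Image.open("example.jpg").convert("RGB")
prompt = "answer en What is shown in the image?"

inputs = processor(text=prompt, images=image, return_tensors="pt")
inputs = inputs.to(torch.bfloat16).to(model.device)
input_len = inputs["input_ids"].shape[-1]

with torch.inference_mode():
    generation = model.generate(**inputs, max_new_tokens=32, do_sample=False)

# Decode only the newly generated tokens, i.e. the answer.
print(processor.decode(generation[0][input_len:], skip_special_tokens=True))
```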

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 1
  • eval_batch_size: 2
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 4
  • optimizer: paged_adamw_8bit (betas=(0.9, 0.999), epsilon=1e-08, no additional optimizer arguments)
  • lr_scheduler_type: linear
  • training_steps: 4000
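
For reference, these values map onto the standard Hugging Face TrainingArguments fields. The block below is a hedged reconstruction rather than the author's actual training script; the output directory, evaluation cadence, and BF16 flag are assumptions added for completeness.

```python
from transformers import TrainingArguments

# Hedged reconstruction of the hyperparameters listed above;
# output_dir and the evaluation cadence are illustrative assumptions.
training_args = TrainingArguments(
    output_dir="paligemma2_vizwiz_ft2",  # hypothetical output directory
    learning_rate=1e-5,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=4,       # total train batch size of 4
    max_steps=4000,                      # training_steps above
    lr_scheduler_type="linear",
    optim="paged_adamw_8bit",            # PAGED_ADAMW_8BIT, betas=(0.9, 0.999), eps=1e-08
    seed=42,
    bf16=True,                           # assumption: matches the BF16 checkpoint weights
    eval_strategy="steps",
    eval_steps=500,                      # matches the 500-step cadence in the results table
)
```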

Training results

Training Loss | Epoch  | Step | Validation Loss
0.7985        | 0.1308 |  500 | 0.7325
0.7331        | 0.2616 | 1000 | 0.7017
0.6924        | 0.3924 | 1500 | 0.6753
0.6626        | 0.5232 | 2000 | 0.6491
0.7161        | 0.6540 | 2500 | 0.6260
0.6387        | 0.7848 | 3000 | 0.6122
0.4991        | 0.9156 | 3500 | 0.6080
0.3359        | 1.0463 | 4000 | 0.6072

Framework versions

  • Transformers 4.57.3
  • PyTorch 2.9.0+cu126
  • Datasets 4.4.2
  • Tokenizers 0.22.1