# paligemma2_vizwiz_ft
This model is a fine-tuned version of [google/paligemma2-3b-pt-448](https://huggingface.co/google/paligemma2-3b-pt-448) on a dataset that is not identified in this card (the model name suggests VizWiz). It achieves the following results on the evaluation set:
- Loss: 1.3993
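
As a quick illustration of how the checkpoint could be loaded for VQA-style inference, here is a minimal sketch using the standard `transformers` PaliGemma classes. The image path and question are placeholders, and the `answer en` prefix follows PaliGemma's usual VQA prompt convention; neither is confirmed by this card:

```python
import torch
from PIL import Image
from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

model_id = "ebrukilic/paligemma2_vizwiz_ft"
processor = AutoProcessor.from_pretrained(model_id)
model = PaliGemmaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
).eval()

image = Image.open("example.jpg")                    # placeholder image
prompt = "<image>answer en What is in the picture?"  # assumed VQA prompt format

inputs = processor(text=prompt, images=image, return_tensors="pt")
inputs = inputs.to(model.device, dtype=torch.bfloat16)  # casts only float tensors

with torch.inference_mode():
    output = model.generate(**inputs, max_new_tokens=32)

# Decode only the tokens generated after the prompt.
prompt_len = inputs["input_ids"].shape[-1]
print(processor.decode(output[0][prompt_len:], skip_special_tokens=True))
```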
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a configuration sketch reproducing them follows the list):
- learning_rate: 2e-05
- train_batch_size: 1
- eval_batch_size: 2
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 4
- optimizer: paged_adamw_8bit with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- num_epochs: 3
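
For reference, these settings map onto `TrainingArguments` roughly as follows. This is a minimal sketch rather than the exact training script: model loading, VizWiz preprocessing, and the data collator are omitted, and the 1000-step evaluation interval is inferred from the results table below.

```python
from transformers import Trainer, TrainingArguments

args = TrainingArguments(
    output_dir="paligemma2_vizwiz_ft",
    learning_rate=2e-5,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=4,   # effective train batch size: 4
    num_train_epochs=3,
    lr_scheduler_type="linear",
    optim="paged_adamw_8bit",        # paged 8-bit AdamW via bitsandbytes
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    seed=42,
    eval_strategy="steps",           # eval cadence inferred from the table
    eval_steps=1000,
    logging_steps=1000,
)

# The wiring below assumes `model`, `train_ds`, and `eval_ds` are defined
# elsewhere (model loading and dataset preparation are not shown here):
# trainer = Trainer(model=model, args=args, train_dataset=train_ds, eval_dataset=eval_ds)
# trainer.train()
```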
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| 1.0059 | 0.2616 | 1000 | 0.9810 |
| 0.9879 | 0.5232 | 2000 | 0.9459 |
| 0.9642 | 0.7848 | 3000 | 0.8539 |
| 0.3278 | 1.0463 | 4000 | 0.9245 |
| 0.4549 | 1.3079 | 5000 | 0.8947 |
| 0.3203 | 1.5695 | 6000 | 0.9701 |
| 0.3636 | 1.8311 | 7000 | 0.8976 |
| 0.0711 | 2.0926 | 8000 | 1.1249 |
| 0.1172 | 2.3542 | 9000 | 1.3376 |
| 0.1118 | 2.6158 | 10000 | 1.3872 |
| 0.1114 | 2.8774 | 11000 | 1.3993 |
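
Note that validation loss bottoms out at 0.8539 around step 3000 (epoch ~0.78) and rises through the second and third epochs while training loss keeps falling, a pattern consistent with overfitting; the 1.3993 reported above is the final-step loss rather than the best one.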
### Framework versions
- Transformers 4.57.3
- Pytorch 2.9.0+cu126
- Datasets 4.4.2
- Tokenizers 0.22.1