LoRA Fine-tuning

Training Q-Align / OneAlign is resource-intensive. Although it performs well on many datasets, it still needs to be adapted whenever new datasets arrive.

Can we make this add-on adaptation more efficient?

Yes, we can.

We propose a more efficient LoRA that tunes fewer parameters than the LLaVA-style default LoRA. It only needs to tune 149M parameters (1.8% of the full Q-Align model) and requires only 2 RTX3090 GPUs (available to many independent researchers). To do this, simply run

sh scripts/${YOUR_DATASET}_lora.sh

The available template dataset options are agi (for AGIQA-3K), cgi (for CGIQA-6K), livec (for LIVE-Challenge), csiq (for CSIQ) and maxwell (for MaxWell, videos).
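As a small sketch of how the template name expands, the helper below maps a dataset key to its script path and rejects keys outside the list above. The function name lora_script is hypothetical (it is not part of the repository), and the snippet only prints the command instead of launching training.

```shell
# Hypothetical helper: map a dataset key to the matching LoRA script path.
# Only the five template keys listed above are accepted.
lora_script() {
  case "$1" in
    agi|cgi|livec|csiq|maxwell)
      echo "scripts/${1}_lora.sh"
      ;;
    *)
      echo "unknown dataset key: $1" >&2
      return 1
      ;;
  esac
}

YOUR_DATASET=agi
script="$(lora_script "$YOUR_DATASET")"

# Print the command only; run `sh "$script"` yourself to start fine-tuning.
echo "sh $script"
```

For your own dataset, you would presumably copy one of the template scripts and adjust it, then add its key to the list in the helper.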

Feel free to bring your own datasets! (See here for examples on dataset preparation.)

Note: we do not encourage fine-tuning on datasets that are very similar to the original training corpus of OneAlign (this might make your adapted model less robust). In that case, just use Q-Align as is.

To evaluate, please refer to the code below:

IQA (all excluding MaxWell)

python q_align/evaluate/iqa_eval_lora_split.py --model-path ${YOUR_MODEL_PATH} --model-base q-future/one-align

By default (if YOUR_MODEL_PATH is not specified), it will automatically evaluate on the test set of AGIQA-3K (split 1).
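The two ways of calling the evaluation script can be sketched as follows. The helper eval_cmd and the checkpoint directory ./checkpoints/my-lora are hypothetical; the sketch also assumes that calling the script with no arguments falls back to its built-in defaults, as described above. It only prints the commands rather than running them.

```shell
# Hypothetical helper: build (but do not run) the IQA evaluation command.
# With no argument, the script's own defaults are used.
eval_cmd() {
  cmd="python q_align/evaluate/iqa_eval_lora_split.py"
  if [ -n "${1:-}" ]; then
    # An explicit LoRA checkpoint is evaluated on top of the OneAlign base.
    cmd="$cmd --model-path $1 --model-base q-future/one-align"
  fi
  echo "$cmd"
}

# Default evaluation (assumed to fall back to AGIQA-3K, split 1).
eval_cmd

# Evaluation of your own adapter; the path is a placeholder.
eval_cmd ./checkpoints/my-lora
```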

VQA (MaxWell)

Please modify q_align/evaluate/vqa_eval.py so that it evaluates only on MaxWell.

python q_align/evaluate/vqa_eval.py --model-path q-future/q-align-maxwell-lora --model-base q-future/one-align

Performance Report

Dataset	AGIQA-3K	CGIQA-6K	LIVE-C	CSIQ	MaxWell
Before LoRA Fine-tuning	0.802/0.838	0.448/0.470	0.881/0.894	0.881/0.906	0.780/0.787
After LoRA Fine-tuning	0.880/0.920	0.847/0.849	0.920/0.934	0.929/0.949	0.803/0.816
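
To make the size of the improvement easier to read, the sketch below computes the per-dataset gain (after minus before) for both reported metrics. The numbers are copied from the table above; the function name gains and the one-row-per-dataset input layout are assumptions made for this example.

```shell
# Hypothetical helper: read "name before1/before2 after1/after2" rows
# and print the gain for each of the two metrics.
gains() {
  awk -F'[ /]' '{ printf "%s %+.3f/%+.3f\n", $1, $4 - $2, $5 - $3 }'
}

# Rows transcribed from the performance table (before, then after).
report="$(gains <<'EOF'
AGIQA-3K 0.802/0.838 0.880/0.920
CGIQA-6K 0.448/0.470 0.847/0.849
LIVE-C 0.881/0.894 0.920/0.934
CSIQ 0.881/0.906 0.929/0.949
MaxWell 0.780/0.787 0.803/0.816
EOF
)"

echo "$report"
```

The largest gain is on CGIQA-6K, where the model before fine-tuning scores lowest.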