A Beginner’s Guide to Fine-Tuning Gemma

Adithya S K
6 min read · Feb 21, 2024

A Comprehensive Guide to Fine-Tuning Gemma

Fine-tuning a state-of-the-art language model like Gemma can be an exciting journey. This guide will walk you through the process step by step, from setting up your environment to fine-tuning the model for your specific task. Whether you’re a seasoned machine learning practitioner or a newcomer to the field, this beginner-friendly tutorial will help you harness the power of Gemma for your projects.

Meet Gemma

Gemma is a family of lightweight, state-of-the-art open models built from the research and technology used to create the Gemini models. Gemma models demonstrate strong performance across academic benchmarks for language understanding, reasoning, and safety. Google has released two sizes (2 billion and 7 billion parameters) and provides both pretrained and instruction-tuned checkpoints. Gemma outperforms similarly sized open models on 11 out of 18 text-based tasks, and the release includes comprehensive evaluations of the models’ safety and responsibility aspects alongside a detailed description of model development. Google frames the responsible release of LLMs as critical for improving the safety of frontier models and for enabling the next wave of LLM innovations.

Colab Notebook for Fine-Tuning

GitHub Repository

Prerequisites

Before delving into the fine-tuning process, ensure that you have the following prerequisites in place:

1. GPU: gemma-2b can be fine-tuned on a T4 (available in free Google Colab), while gemma-7b requires an A100 GPU.

2. Python Packages: Ensure that you have the necessary Python packages installed. You can use the following commands to install them:


!pip3 install -q -U bitsandbytes==0.42.0
!pip3 install -q -U peft==0.8.2
!pip3 install -q -U trl==0.7.10
!pip3 install -q -U accelerate==0.27.1
!pip3 install -q -U datasets==2.17.0
!pip3 install -q -U transformers==4.38.0

3. Hugging Face Hub Account: You’ll need an account on the Hugging Face Hub. You can sign up at huggingface.co.

Getting Started

Checking GPU

Let’s start by checking if your GPU is correctly detected:

!nvidia-smi

If your GPU is not recognized or you encounter CUDA out-of-memory errors during fine-tuning, consider using a more powerful GPU.
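If you prefer to check from Python, this optional snippet reports the detected GPU and its total VRAM:

import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"GPU: {props.name} | VRAM: {props.total_memory / 1024**3:.1f} GB")
else:
    print("No CUDA device detected")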

Loading Required Libraries

We’ll load the necessary Python libraries for our fine-tuning process:

import json
import pandas as pd
import torch
from datasets import Dataset, load_dataset
from huggingface_hub import notebook_login
from peft import LoraConfig, PeftModel
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TrainingArguments,
    pipeline,
    logging,
)
from trl import SFTTrainer

Logging into Hugging Face Hub

Log in to the Hugging Face Model Hub using your credentials:

notebook_login()

Loading the Model
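
The from_pretrained call below expects a bitsandbytes quantization config (bnb_config) so the model is loaded in 4-bit and fits in GPU memory. The exact settings are not shown in this post, so the values below are a typical QLoRA-style configuration to use as a starting point:

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize the base model weights to 4-bit
    bnb_4bit_quant_type="nf4",              # NormalFloat4, the usual QLoRA choice
    bnb_4bit_compute_dtype=torch.bfloat16,  # use torch.float16 on a T4, which lacks bfloat16 support
)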

model_id = "google/gemma-7b-it"
# model_id = "google/gemma-7b"
# model_id = "google/gemma-2b-it"
# model_id = "google/gemma-2b"

model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config, device_map={"":0})
tokenizer = AutoTokenizer.from_pretrained(model_id, add_eos_token=True)

Loading the Dataset

For this tutorial, we will fine-tune Gemma Instruct for code generation.

We will be using a dataset curated by TokenBender (e/xperiments), which is excellent data for fine-tuning a model for code generation. It follows the Alpaca style of instructions, which is a good starting point for this task. The dataset structure should resemble the following:

{
    "instruction": "Create a function to calculate the sum of a sequence of integers.",
    "input": "[1, 2, 3, 4, 5]",
    "output": "# Python code def sum_sequence(sequence): sum = 0 for num in sequence: sum += num return sum"
}

Now let’s load the dataset using Hugging Face’s datasets library:

# Load your dataset (replace 'your_dataset_name' and 'split_name' with your actual dataset information)
# dataset = load_dataset("your_dataset_name", split="split_name")
dataset = load_dataset("TokenBender/code_instructions_122k_alpaca_style", split="train")

Formatting the Dataset

Now, let’s format the dataset in the required Gemma instruction format.

Many tutorials and blogs skip over this part, but I feel this is a really important step.

<start_of_turn>user What is your favorite condiment? <end_of_turn>
<start_of_turn>model Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavor to whatever I'm cooking up in the kitchen!<end_of_turn>
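
This tutorial builds the prompt strings manually, but as a quick optional sanity check you can also ask the Gemma tokenizer’s built-in chat template to emit the same turn markers:

chat = [{"role": "user", "content": "What is your favorite condiment?"}]
# tokenize=False returns the formatted prompt string instead of token ids
print(tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True))
# should print the user turn wrapped in <start_of_turn>/<end_of_turn>, followed by an opening model turn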

You can use the following code to process your dataset and build a prompt column in the correct format:

def generate_prompt(data_point):
    """Generate the training prompt from a task instruction, optional input context, and answer.

    :param data_point: dict: a single dataset record
    :return: str: the formatted prompt text
    """
    prefix_text = 'Below is an instruction that describes a task. Write a response that ' \
                  'appropriately completes the request.\n\n'
    # Samples with additional input context
    if data_point['input']:
        text = f"""<start_of_turn>user {prefix_text} {data_point["instruction"]} here are the inputs {data_point["input"]} <end_of_turn>\n<start_of_turn>model{data_point["output"]} <end_of_turn>"""
    # Samples without input context
    else:
        text = f"""<start_of_turn>user {prefix_text} {data_point["instruction"]} <end_of_turn>\n<start_of_turn>model{data_point["output"]} <end_of_turn>"""
    return text

# add the "prompt" column in the dataset
text_column = [generate_prompt(data_point) for data_point in dataset]
dataset = dataset.add_column("prompt", text_column)

We’ll need to tokenize our data so the model can understand it.

dataset = dataset.shuffle(seed=1234)  # Shuffle dataset here
dataset = dataset.map(lambda samples: tokenizer(samples["prompt"]), batched=True)

Split the dataset into 80% for training and 20% for testing:

dataset = dataset.train_test_split(test_size=0.2)
train_data = dataset["train"]
test_data = dataset["test"]

After formatting, we should get something like this:

{
    "text": "<start_of_turn>user Create a function to calculate the sum of a sequence of integers. here are the inputs [1, 2, 3, 4, 5] <end_of_turn>
<start_of_turn>model # Python code def sum_sequence(sequence): sum = 0 for num in sequence: sum += num return sum <end_of_turn>",
    "instruction": "Create a function to calculate the sum of a sequence of integers",
    "input": "[1, 2, 3, 4, 5]",
    "output": "# Python code def sum_sequence(sequence): sum = 0 for num in sequence: sum += num return sum",
    "prompt": "<start_of_turn>user Create a function to calculate the sum of a sequence of integers. here are the inputs [1, 2, 3, 4, 5] <end_of_turn>
<start_of_turn>model # Python code def sum_sequence(sequence): sum = 0 for num in sequence: sum += num return sum <end_of_turn>"
}

While using the SFTTrainer (Supervised Fine-tuning Trainer), we will only be passing in the “prompt” column of the dataset for fine-tuning.
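
A quick optional sanity check: print one formatted training example to confirm the “prompt” column looks as expected.

# Peek at a single formatted training example
print(train_data[0]["prompt"])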

Setting Model Parameters and LoRA

We need to set various parameters for our fine-tuning process, including the QLoRA (Quantized LoRA) parameters, the bitsandbytes quantization parameters (set earlier in bnb_config), and the training arguments:

Apply LoRA

Here comes the magic with PEFT! Let’s wrap the model with low-rank adapters (LoRA) using the get_peft_model utility function, optionally preparing the quantized model first with the prepare_model_for_kbit_training method from PEFT, as sketched below.
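
Since the base model is loaded in 4-bit, it is common to prepare it for k-bit training and enable gradient checkpointing before attaching the adapters. This step is optional, and the snippet below is a sketch of that preparation rather than code from the original notebook:

from peft import prepare_model_for_kbit_training

model.gradient_checkpointing_enable()           # trade extra compute for lower memory use
model = prepare_model_for_kbit_training(model)  # cast norms/embeddings for stable k-bit training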

Here is a tweet on how to pick the best LoRA config.

from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=64,
    lora_alpha=32,
    target_modules=['o_proj', 'q_proj', 'up_proj', 'v_proj', 'k_proj', 'down_proj', 'gate_proj'],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

Calculating the number of trainable parameters

trainable, total = model.get_nb_trainable_parameters()
print(f"Trainable: {trainable} | total: {total} | Percentage: {trainable/total*100:.4f}%")

expected output → Trainable: 200015872 | total: 8737696768 | Percentage: 2.2891%

Fine-Tuning with QLoRA and Supervised Fine-Tuning

We’re ready to fine-tune our model using QLoRA. For this tutorial, we’ll use the SFTTrainer from the trl library for supervised fine-tuning. Ensure that you’ve installed the trl library as mentioned in the prerequisites.

# Fine-tuning with the SFTTrainer from trl
import transformers

from trl import SFTTrainer

tokenizer.pad_token = tokenizer.eos_token
torch.cuda.empty_cache()

trainer = SFTTrainer(
    model=model,
    train_dataset=train_data,
    eval_dataset=test_data,
    dataset_text_field="prompt",
    peft_config=lora_config,
    args=transformers.TrainingArguments(
        per_device_train_batch_size=1,
        gradient_accumulation_steps=4,
        warmup_ratio=0.03,  # fraction of training steps used for LR warmup
        max_steps=100,
        learning_rate=2e-4,
        logging_steps=1,
        output_dir="outputs",
        optim="paged_adamw_8bit",
        save_strategy="epoch",
    ),
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False),
)

Let’s start the training process:

# Start the training process
trainer.train()

new_model = "gemma-Code-Instruct-Finetune-test"  # name of the model you will push to the Hugging Face Hub

# Save the fine-tuned model (LoRA adapter weights)
trainer.model.save_pretrained(new_model)

Merge and Share

After fine-tuning, if you want to merge the model with LoRA weights or share it with the Hugging Face Model Hub, you can do so. This step is optional and depends on your specific use case.

# Merge the base model with the LoRA weights
base_model = AutoModelForCausalLM.from_pretrained(
    model_id,
    low_cpu_mem_usage=True,
    return_dict=True,
    torch_dtype=torch.float16,
    device_map={"": 0},
)
merged_model = PeftModel.from_pretrained(base_model, new_model)
merged_model = merged_model.merge_and_unload()

# Save the merged model
merged_model.save_pretrained("merged_model", safe_serialization=True)
tokenizer.save_pretrained("merged_model")
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

# Push the model and tokenizer to the Hugging Face Model Hub
merged_model.push_to_hub(new_model, use_temp_dir=False)
tokenizer.push_to_hub(new_model, use_temp_dir=False)

Test the merged model

def get_completion(query: str, model, tokenizer) -> str:
    device = "cuda:0"
    prompt_template = """
<start_of_turn>user
Below is an instruction that describes a task. Write a response that appropriately completes the request.
{query}
<end_of_turn>
<start_of_turn>model

"""
    prompt = prompt_template.format(query=query)
    encodeds = tokenizer(prompt, return_tensors="pt", add_special_tokens=True)
    model_inputs = encodeds.to(device)
    generated_ids = model.generate(**model_inputs, max_new_tokens=1000, do_sample=True, pad_token_id=tokenizer.eos_token_id)
    # decoded = tokenizer.batch_decode(generated_ids)
    decoded = tokenizer.decode(generated_ids[0], skip_special_tokens=True)
    return decoded

result = get_completion(query="code the fibonacci series in python using recursion", model=merged_model, tokenizer=tokenizer)
print(result)

And that’s it! You’ve successfully fine-tuned Gemma Instruct for code generation. You can adapt this process for various natural language understanding and generation tasks. Keep exploring and experimenting with Gemma to unlock its full potential for your projects.

Happy Fine-Tuning!!

If you found this post valuable, make sure to follow me for more insightful content. I frequently write about the practical applications of Generative AI, LLMs, Stable Diffusion, and explore the broader impacts of AI on society.

Let’s stay connected on Twitter. I’d love to engage in discussions with you.

If you’re not a Medium member yet and wish to support writers like me, consider signing up through my referral link: Medium Membership. Your support is greatly appreciated!
