A Beginner’s Guide to Fine-Tuning Mistral 7B Instruct Model

Fine-Tuning for Code Generation Using a Single Google Colab Notebook

Adithya S K
8 min read · Oct 6, 2023

Updated: 10th December 2023

Fine-tuning a state-of-the-art language model like Mistral 7B Instruct can be an exciting journey. This guide will walk you through the process step by step, from setting up your environment to fine-tuning the model for your specific task. Whether you’re a seasoned machine learning practitioner or a newcomer to the field, this beginner-friendly tutorial will help you harness the power of Mistral 7B for your projects.

Meet Mistral 7B Instruct

The team at MistralAI has created an exceptional language model called Mistral 7B Instruct. It has consistently delivered outstanding results in a range of benchmarks, which positions it as an ideal option for natural language generation and understanding. This guide will concentrate on how to fine-tune the model for coding purposes, but the methodology can effectively be applied to other tasks.

Colab Notebook for Fine-Tuning Mistral-7B-Instruct

The code was updated on December 10th, 2023.

Prerequisites

Before diving into the fine-tuning process, make sure you have the following prerequisites in place:

  1. GPU: While this tutorial can run on a free Google Colab notebook with a GPU, it’s recommended to use more powerful GPUs like V100 or A100 for better performance.
  2. Python Packages: Ensure you have the required Python packages installed. You can run the following commands to install them:
!pip install -q torch
!pip install -q git+https://github.com/huggingface/transformers # Hugging Face Transformers for downloading model weights
!pip install -q datasets # Hugging Face Datasets to download and manipulate datasets
!pip install -q peft # Parameter-Efficient Fine-Tuning - for QLoRA fine-tuning
!pip install -q bitsandbytes # For model weight quantisation
!pip install -q trl # Transformer Reinforcement Learning - for supervised fine-tuning (SFT)
!pip install -q wandb -U # Used to monitor training metrics
  3. Hugging Face Hub Account: You’ll need an account on the Hugging Face Model Hub. You can sign up at huggingface.co.

Getting Started

Let’s start by checking if your GPU is correctly detected:

!nvidia-smi

If your GPU is not recognized or you encounter CUDA out-of-memory errors during fine-tuning, consider using a more powerful GPU.
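
If you prefer checking from Python instead of the shell, a small optional snippet using PyTorch confirms that CUDA is visible and which device will be used:

# Optional: verify that PyTorch can see the GPU
import torch
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))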

Loading Required Libraries

We’ll load the necessary Python libraries for our fine-tuning process:

import json
import pandas as pd
import torch
from datasets import Dataset, load_dataset
from huggingface_hub import notebook_login
from peft import LoraConfig, PeftModel
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TrainingArguments,
    pipeline,
    logging,
)
from trl import SFTTrainer

Logging into Hugging Face Hub

Log in to the Hugging Face Model Hub using your credentials:

notebook_login()

Loading the Dataset

For this tutorial, we will fine-tune Mistral 7B Instruct for code generation.

We will be using a dataset curated by TokenBender (e/xperiments), which is an excellent dataset for fine-tuning models for code generation. It follows the Alpaca style of instructions, which is a good starting point for this task. The dataset structure should resemble the following:

{
    "instruction": "Create a function to calculate the sum of a sequence of integers.",
    "input": "[1, 2, 3, 4, 5]",
    "output": "# Python code def sum_sequence(sequence): sum = 0 for num in sequence: sum += num return sum"
}

Now let's load the dataset using Hugging Face's datasets library:

# Load your dataset (replace 'your_dataset_name' and 'split_name' with your actual dataset information)
# dataset = load_dataset("your_dataset_name", split="split_name")
dataset = load_dataset("TokenBender/code_instructions_122k_alpaca_style", split="train")
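
Before formatting, it helps to take a quick look at the loaded dataset to confirm its size and fields (the field names below are the ones shown in the sample above):

# Quick sanity check: dataset size and one raw example
print(dataset)
print(dataset[0]["instruction"])
print(dataset[0]["input"])
print(dataset[0]["output"])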

Formatting the Dataset

Now, let’s format the dataset in the required Mistral-7B-Instruct-v0.1 format.

Many tutorials and blogs skip over this part, but I feel it is a really important step.

We'll wrap each instruction and input pair between [INST] and [/INST], with the output following after, like this:

<s>[INST] What is your favourite condiment? [/INST]
Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!</s>

You can use the following code to process your dataset and create a JSONL file in the correct format:

# This function formats each row of the dataset into the prompt format shown above
def create_text_row(instruction, input, output):
    text_row = f"""<s>[INST] {instruction} here are the inputs {input} [/INST] \n {output} </s>"""
    return text_row

# Iterate over all the rows, format each one, and store the result in a JSONL file
def process_jsonl_file(output_file_path):
    with open(output_file_path, "w") as output_jsonl_file:
        for item in dataset:
            json_object = {
                "text": create_text_row(item["instruction"], item["input"], item["output"]),
                "instruction": item["instruction"],
                "input": item["input"],
                "output": item["output"]
            }
            output_jsonl_file.write(json.dumps(json_object) + "\n")

# Provide the path where you want to save the formatted dataset
process_jsonl_file("./training_dataset.jsonl")

After Formatting

{
    "text": "<s>[INST] Create a function to calculate the sum of a sequence of integers. here are the inputs [1, 2, 3, 4, 5] [/INST]
 # Python code def sum_sequence(sequence): sum = 0 for num in sequence: sum += num return sum</s>",
    "instruction": "Create a function to calculate the sum of a sequence of integers",
    "input": "[1, 2, 3, 4, 5]",
    "output": "# Python code def sum_sequence(sequence): sum = 0 for num in sequence: sum += num return sum"
}

When fine-tuning with SFT (the Supervised Fine-tuning Trainer), we will only be passing in the "text" column of the dataset.

Loading the Training Dataset

Now, let’s load the training dataset from the JSONL file we created:

train_dataset = load_dataset('json', data_files='./training_dataset.jsonl', split='train')
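
Because the trainer will only consume the "text" column, it is worth printing one formatted example to confirm the [INST] ... [/INST] template came out as expected:

# Inspect one formatted training example
print(train_dataset[0]["text"])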

Setting Model Parameters

We need to set various parameters for our fine-tuning process, including QLoRA (Quantized LoRA) parameters, bitsandbytes parameters, and training arguments:

new_model = "mistralai-Code-Instruct" #set the name of the new model

################################################################################
# QLoRA parameters
################################################################################

# LoRA attention dimension
lora_r = 64

# Alpha parameter for LoRA scaling
lora_alpha = 16

# Dropout probability for LoRA layers
lora_dropout = 0.1

################################################################################
# bitsandbytes parameters
################################################################################

# Activate 4-bit precision base model loading
use_4bit = True

# Compute dtype for 4-bit base models
bnb_4bit_compute_dtype = "float16"

# Quantization type (fp4 or nf4)
bnb_4bit_quant_type = "nf4"

# Activate nested quantization for 4-bit base models (double quantization)
use_nested_quant = False

################################################################################
# TrainingArguments parameters
################################################################################

# Output directory where the model predictions and checkpoints will be stored
output_dir = "./results"

# Number of training epochs
num_train_epochs = 1

# Enable fp16/bf16 training (set bf16 to True with an A100)
fp16 = False
bf16 = False

# Batch size per GPU for training
per_device_train_batch_size = 4

# Batch size per GPU for evaluation
per_device_eval_batch_size = 4

# Number of update steps to accumulate the gradients for
gradient_accumulation_steps = 1

# Enable gradient checkpointing
gradient_checkpointing = True

# Maximum gradient norm (gradient clipping)
max_grad_norm = 0.3

# Initial learning rate (AdamW optimizer)
learning_rate = 2e-4

# Weight decay to apply to all layers except bias/LayerNorm weights
weight_decay = 0.001

# Optimizer to use
optim = "paged_adamw_32bit"

# Learning rate schedule (constant a bit better than cosine)
lr_scheduler_type = "constant"

# Number of training steps (overrides num_train_epochs)
max_steps = -1

# Ratio of steps for a linear warmup (from 0 to learning rate)
warmup_ratio = 0.03

# Group sequences into batches with same length
# Saves memory and speeds up training considerably
group_by_length = True

# Save checkpoint every X updates steps
save_steps = 25

# Log every X updates steps
logging_steps = 25

################################################################################
# SFT parameters
################################################################################

# Maximum sequence length to use
max_seq_length = None

# Pack multiple short examples in the same input sequence to increase efficiency
packing = False

# Load the entire model on GPU 0
device_map = {"": 0}

Loading the Base Model

Let’s load the Mistral 7B Instruct base model:

model_name = "mistralai/Mistral-7B-Instruct-v0.1"

# Load the base model with QLoRA configuration
compute_dtype = getattr(torch, bnb_4bit_compute_dtype)

bnb_config = BitsAndBytesConfig(
    load_in_4bit=use_4bit,
    bnb_4bit_quant_type=bnb_4bit_quant_type,
    bnb_4bit_compute_dtype=compute_dtype,
    bnb_4bit_use_double_quant=use_nested_quant,
)

base_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map={"": 0}
)

base_model.config.use_cache = False
base_model.config.pretraining_tp = 1

# Load the Mistral tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

Base model Inference

eval_prompt = """Print hello world in python c and c++"""

model_input = tokenizer(eval_prompt, return_tensors="pt").to("cuda")

base_model.eval()
with torch.no_grad():
    print(tokenizer.decode(base_model.generate(**model_input, max_new_tokens=256, pad_token_id=2)[0], skip_special_tokens=True))

The results from the base model tend to be of poor quality, and it doesn't always generate syntactically correct code.

Fine-Tuning with QLoRA and Supervised Fine-Tuning

We're ready to fine-tune our model using QLoRA. For this tutorial, we'll use the SFTTrainer from the trl library for supervised fine-tuning. Ensure that you've installed the trl library as mentioned in the prerequisites.

# Set LoRA configuration
peft_config = LoraConfig(
    lora_alpha=lora_alpha,
    lora_dropout=lora_dropout,
    r=lora_r,
    target_modules=[
        "q_proj",
        "k_proj",
        "v_proj",
        "o_proj",
        "gate_proj",
        "up_proj",
        "down_proj",
        "lm_head",
    ],
    bias="none",
    task_type="CAUSAL_LM",
)

# Set training parameters
training_arguments = TrainingArguments(
    output_dir=output_dir,
    num_train_epochs=num_train_epochs,
    per_device_train_batch_size=per_device_train_batch_size,
    gradient_accumulation_steps=gradient_accumulation_steps,
    optim=optim,
    save_steps=save_steps,
    logging_steps=logging_steps,
    learning_rate=learning_rate,
    weight_decay=weight_decay,
    fp16=fp16,
    bf16=bf16,
    max_grad_norm=max_grad_norm,
    max_steps=100,  # the total number of training steps to perform (overrides num_train_epochs)
    warmup_ratio=warmup_ratio,
    group_by_length=group_by_length,
    lr_scheduler_type=lr_scheduler_type,
    report_to="tensorboard"
)

# Initialize the SFTTrainer for fine-tuning
trainer = SFTTrainer(
    model=base_model,
    train_dataset=train_dataset,
    peft_config=peft_config,
    dataset_text_field="text",
    max_seq_length=max_seq_length,  # You can specify the maximum sequence length here
    tokenizer=tokenizer,
    args=training_arguments,
    packing=packing,
)
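
Because a peft_config was passed, SFTTrainer wraps the base model with the LoRA adapters, so trainer.model should expose PEFT's helper for counting trainable parameters. An optional quick check before launching training:

# Optional: confirm that only the LoRA adapter weights are trainable
trainer.model.print_trainable_parameters()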

Let's start the training process:

# Start the training process
trainer.train()

# Save the fine-tuned model
trainer.model.save_pretrained(new_model)
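
Since report_to is set to "tensorboard", training metrics are logged under the output directory. In Colab you can optionally inspect the loss curves with the TensorBoard extension (assuming the default results/runs log location):

# Optional: view training metrics logged by the trainer
%load_ext tensorboard
%tensorboard --logdir results/runs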

Inference with Fine-Tuned Model

Now that we have fine-tuned our model, let’s test its performance with some code generation tasks. Replace eval_prompt with your code generation prompt:

eval_prompt = """Print hello world in python c and c++"""

model_input = tokenizer(eval_prompt, return_tensors="pt").to("cuda")
trainer.model.eval()
with torch.no_grad():
    generated_code = tokenizer.decode(trainer.model.generate(**model_input, max_new_tokens=256, pad_token_id=2)[0], skip_special_tokens=True)
print(generated_code)

Merge and Share

After fine-tuning, if you want to merge the model with LoRA weights or share it with the Hugging Face Model Hub, you can do so. This step is optional and depends on your specific use case.

# Merge the model with LoRA weights
base_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    low_cpu_mem_usage=True,
    return_dict=True,
    torch_dtype=torch.float16,
    device_map={"": 0},
)
merged_model = PeftModel.from_pretrained(base_model, new_model)
merged_model = merged_model.merge_and_unload()

# Save the merged model
merged_model.save_pretrained("merged_model", safe_serialization=True)
tokenizer.save_pretrained("merged_model")
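
If you also want to share the model on the Hugging Face Model Hub, the merged weights and tokenizer can be pushed directly from the notebook (you are already logged in via notebook_login; the repository id below is only a placeholder, replace it with your own):

# Optional: push the merged model and tokenizer to the Hugging Face Hub
# "your-username/mistral-7b-code-instruct" is a placeholder repo id
merged_model.push_to_hub("your-username/mistral-7b-code-instruct")
tokenizer.push_to_hub("your-username/mistral-7b-code-instruct")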

Test the merged model

from random import randrange

sample = train_dataset[randrange(len(train_dataset))]

# Build the prompt in the same [INST] ... [/INST] format used during training
prompt = f"""<s>[INST] {sample['instruction']} here are the inputs {sample['input']} [/INST]"""

input_ids = tokenizer(prompt, return_tensors="pt", truncation=True).input_ids.cuda()
with torch.inference_mode():
    outputs = merged_model.generate(input_ids=input_ids, max_new_tokens=100, do_sample=True, top_p=0.9, temperature=0.5)

print(f"Prompt:\n{prompt}\n")
print(f"\nGenerated response:\n{tokenizer.batch_decode(outputs.detach().cpu().numpy(), skip_special_tokens=True)[0][len(prompt):]}")
print(f"\nGround truth:\n{sample['output']}")

And that’s it! You’ve successfully fine-tuned Mistral 7B Instruct for code generation. You can adapt this process for various natural language understanding and generation tasks. Keep exploring and experimenting with Mistral 7B to unlock its full potential for your projects.

All the code is available on my GitHub. Do drop by and give a follow and a star.

I also post content about Generative AI, LLMs, Stable Diffusion, and what I have been working on over on Twitter: AdithyaSK (@adithya_s_k) / X

