GPT from Scratch

Overview

GPT from Scratch is a project aimed at implementing the GPT (Generative Pre-trained Transformer) model from scratch. The project focuses on understanding the inner workings of the GPT model and providing a step-by-step guide for its implementation.

The model follows the Transformer architecture, a deep learning architecture that uses self-attention to capture long-range dependencies in sequential data. Transformers are widely used in natural language processing tasks such as machine translation and text generation.
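As a concrete illustration, a single head of causal self-attention can be sketched in a few lines of PyTorch. This is a minimal sketch for intuition only; the class name Head and its constructor arguments are assumptions and may not match main.py exactly:

import torch
import torch.nn as nn
from torch.nn import functional as F

class Head(nn.Module):
    """One head of causal self-attention (minimal sketch)."""

    def __init__(self, n_embd, head_size, block_size):
        super().__init__()
        self.key = nn.Linear(n_embd, head_size, bias=False)
        self.query = nn.Linear(n_embd, head_size, bias=False)
        self.value = nn.Linear(n_embd, head_size, bias=False)
        # Lower-triangular mask so each token only attends to earlier tokens
        self.register_buffer('tril', torch.tril(torch.ones(block_size, block_size)))

    def forward(self, x):
        B, T, C = x.shape
        k = self.key(x)    # (B, T, head_size)
        q = self.query(x)  # (B, T, head_size)
        # Scaled dot-product attention scores
        wei = q @ k.transpose(-2, -1) * k.shape[-1] ** -0.5  # (B, T, T)
        wei = wei.masked_fill(self.tril[:T, :T] == 0, float('-inf'))
        wei = F.softmax(wei, dim=-1)
        v = self.value(x)  # (B, T, head_size)
        return wei @ v     # (B, T, head_size)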

Note: This project follows Andrej Karpathy's "Let's build GPT" video, which provides a detailed explanation of the GPT model and its implementation.

Setup

To set up the project, follow these steps:

  1. Clone the repository and enter it:

    git clone https://github.com/Harsh-2909/gpt-from-scratch.git
    cd gpt-from-scratch
  2. Create a virtual environment using venv:

    python3 -m venv gpt-env
  3. Activate the virtual environment:

    source gpt-env/bin/activate
  4. Install the required dependencies:

    pip install -r requirements.txt
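To verify the installation, you can check that PyTorch imports correctly and whether a GPU is visible:

    python -c "import torch; print(torch.__version__, torch.cuda.is_available())"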

Running the Project

To run the project, execute the following command:

python main.py

This will train the GPT model on the provided training data and generate text using the trained model.
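Generation follows the usual autoregressive sampling loop from Karpathy's video: crop the context to block_size, predict the next token, sample it, and append. The sketch below is illustrative; the function name and the assumption that the model returns a (logits, loss) pair may differ from main.py:

import torch
from torch.nn import functional as F

@torch.no_grad()
def sample(model, idx, max_new_tokens, block_size):
    # idx is a (B, T) tensor of token indices forming the current context
    for _ in range(max_new_tokens):
        idx_cond = idx[:, -block_size:]           # crop context to block_size
        logits, _ = model(idx_cond)               # assumes model returns (logits, loss)
        logits = logits[:, -1, :]                 # keep only the last time step
        probs = F.softmax(logits, dim=-1)         # logits -> probabilities
        idx_next = torch.multinomial(probs, num_samples=1)  # sample next token
        idx = torch.cat((idx, idx_next), dim=1)   # append and continue
    return idx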

The parameters for the GPT model, such as the number of layers, the embedding size, and the number of attention heads, can be modified in the main.py file. To run the project on a machine without a capable GPU, for testing purposes, reduce these hyperparameters to smaller values; see the reduced configuration sketched after the block below.

# Hyperparameters
batch_size = 32      # How many sequences to process in parallel
block_size = 8       # Context length: number of tokens processed at a time
max_iters = 5000     # Total number of training iterations
eval_interval = 500  # Evaluate the loss every eval_interval iterations
learning_rate = 1e-3
device = 'cuda' if torch.cuda.is_available() else 'cpu'
eval_iters = 200     # Number of batches averaged when estimating the loss
n_embd = 32          # Embedding dimension
n_head = 4           # Number of heads in the multi-head attention
n_layer = 6          # Number of transformer blocks
dropout = 0.2        # Fraction of activations zeroed out during training
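For example, a reduced configuration for a quick CPU-only smoke test might look like the following. The exact values are illustrative, not tuned:

# Hyperparameters for a quick CPU-only smoke test (illustrative values)
batch_size = 16
block_size = 8
max_iters = 500       # far fewer iterations, just to confirm the pipeline runs
eval_interval = 100
learning_rate = 1e-3
device = 'cpu'        # force CPU
eval_iters = 50
n_embd = 32
n_head = 4
n_layer = 2           # fewer transformer blocks
dropout = 0.0         # dropout matters less for a short smoke test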

Code Structure

The project is structured as follows:

gpt-from-scratch/
│
├── data/
│   ├── data.txt
│
├── src/
│   ├── gpt.ipynb
│   ├── main.py
│
├── .gitignore
├── CONTRIBUTING.md
├── LICENSE
├── README.md
├── requirements.txt

The data directory contains the training data for the GPT model, and the src directory contains the source code. Within src, main.py holds the GPT model implementation, while gpt.ipynb documents the approach step by step, including notes on why the model is implemented the way it is.
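Following Karpathy's video, the text in data/data.txt is handled at the character level: the vocabulary is the set of distinct characters, and encoding maps each character to an integer id. The sketch below shows that scheme; the variable names are assumptions and may differ from main.py:

# Character-level tokenization sketch (assumes data/data.txt exists)
with open('data/data.txt', 'r', encoding='utf-8') as f:
    text = f.read()

chars = sorted(set(text))          # vocabulary: every distinct character
vocab_size = len(chars)
stoi = {ch: i for i, ch in enumerate(chars)}  # char -> integer id
itos = {i: ch for ch, i in stoi.items()}      # integer id -> char

encode = lambda s: [stoi[c] for c in s]             # string -> list of ids
decode = lambda ids: ''.join(itos[i] for i in ids)  # list of ids -> string

# Round trip: decode(encode(s)) == s for any s whose characters appear in the data
print(encode(text[:10]))  # the ids depend on the data's vocabulary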

Resources

Andrej Karpathy's "Let's build GPT: from scratch, in code, spelled out" video: https://www.youtube.com/watch?v=kCc8FmEb1nY

License

This project is licensed under the MIT License. See the LICENSE file for more information.

Contributing

Contributions are welcome! Please refer to the contribution guidelines for more information.
