GPT from Scratch is a project that implements the GPT (Generative Pre-trained Transformer) model from scratch. The project focuses on understanding the inner workings of the GPT model and providing a step-by-step guide to its implementation.

This project follows the Transformer architecture, a deep learning model that uses self-attention to capture long-range dependencies in sequential data. The Transformer has been widely used in natural language processing tasks such as machine translation and text generation.
Note: This project follows Andrej Karpathy's *Let's build GPT* video, which provides a detailed explanation of the GPT model and its implementation.
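To make the self-attention mechanism concrete, here is a minimal sketch of a single causal self-attention head in PyTorch, in the spirit of the approach from Karpathy's video; the class and variable names are illustrative rather than taken verbatim from this repository's code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Head(nn.Module):
    """One head of causal (masked) self-attention (illustrative sketch)."""

    def __init__(self, n_embd, head_size, block_size):
        super().__init__()
        self.key = nn.Linear(n_embd, head_size, bias=False)
        self.query = nn.Linear(n_embd, head_size, bias=False)
        self.value = nn.Linear(n_embd, head_size, bias=False)
        # Lower-triangular mask so each token only attends to earlier tokens
        self.register_buffer('tril', torch.tril(torch.ones(block_size, block_size)))

    def forward(self, x):
        B, T, C = x.shape
        k = self.key(x)                                      # (B, T, head_size)
        q = self.query(x)                                    # (B, T, head_size)
        # Scaled dot-product attention scores
        wei = q @ k.transpose(-2, -1) * k.shape[-1] ** -0.5  # (B, T, T)
        wei = wei.masked_fill(self.tril[:T, :T] == 0, float('-inf'))
        wei = F.softmax(wei, dim=-1)
        return wei @ self.value(x)                           # (B, T, head_size)
```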
To set up the project, follow these steps:

- Clone the repository:

  ```bash
  git clone https://github.com/username/gpt-from-scratch.git
  ```

- Create a virtual environment using `venv`:

  ```bash
  python3 -m venv gpt-env
  ```

- Activate the virtual environment:

  ```bash
  source gpt-env/bin/activate
  ```

- Install the required dependencies:

  ```bash
  pip install -r requirements.txt
  ```
To run the project, execute the following command:

```bash
python main.py
```
This will train the GPT model on the provided training data and generate text using the trained model.
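Generation is autoregressive: the model repeatedly predicts a distribution over the next token from at most the last `block_size` tokens, samples from it, and appends the sample to the context. As a rough sketch of such a loop, assuming the model's forward pass returns `(logits, loss)` as in Karpathy's video (the function name and signature here are illustrative):

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def generate(model, idx, max_new_tokens, block_size):
    """Extend the (B, T) tensor of token indices `idx` by max_new_tokens tokens."""
    for _ in range(max_new_tokens):
        idx_cond = idx[:, -block_size:]                     # crop context to the block size
        logits, _ = model(idx_cond)                         # (B, T, vocab_size)
        logits = logits[:, -1, :]                           # keep only the last time step
        probs = F.softmax(logits, dim=-1)                   # logits -> probabilities
        idx_next = torch.multinomial(probs, num_samples=1)  # sample the next token
        idx = torch.cat((idx, idx_next), dim=1)             # append and repeat
    return idx
```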
The parameters of the GPT model, such as the number of layers, the embedding size, and the number of attention heads, can be modified in the `main.py` file. To run the project on a machine without a capable GPU, e.g. for quick testing, reduce the hyperparameters in `main.py` to smaller values:
```python
import torch

# Hyperparameters
batch_size = 32       # How many sequences to process in parallel
block_size = 8        # Number of tokens processed at a time; the context length for predictions
max_iters = 5000      # Total number of training iterations
eval_interval = 500   # Evaluate the loss every eval_interval iterations
learning_rate = 1e-3  # Step size for the optimizer
device = 'cuda' if torch.cuda.is_available() else 'cpu'
eval_iters = 200      # Number of batches averaged when estimating the loss
n_embd = 32           # Embedding dimension
n_head = 4            # Number of heads in the multi-head attention
n_layer = 6           # Number of Transformer blocks
dropout = 0.2         # Drops out 20% of the neurons on every forward/backward pass
```
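For orientation, these hyperparameters drive a standard training loop. A condensed sketch follows, where `model` and the `get_batch` helper are assumptions standing in for whatever `main.py` actually defines:

```python
# Sketch of the training loop; `model` and `get_batch` are assumed
# to be defined elsewhere in main.py.
optimizer = torch.optim.AdamW(model.parameters(), lr=learning_rate)

for step in range(max_iters):
    xb, yb = get_batch('train')            # (batch_size, block_size) input/target tensors
    logits, loss = model(xb, yb)           # forward pass; the model also computes the loss
    optimizer.zero_grad(set_to_none=True)  # reset gradients
    loss.backward()                        # backpropagate
    optimizer.step()                       # update parameters
```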
The project is structured as follows:

```
gpt-from-scratch/
│
├── data/
│   └── data.txt
│
├── src/
│   ├── gpt.ipynb
│   └── main.py
│
├── .gitignore
├── CONTRIBUTING.md
├── LICENSE
├── README.md
└── requirements.txt
```
The `data` directory contains the training data for the GPT model, and the `src` directory contains the source code. Within `src`, `main.py` holds the GPT model implementation, while `gpt.ipynb` contains the notes and the step-by-step approach used to implement the model, including explanations of why it is built the way it is.
This project is licensed under the MIT License. See the LICENSE file for more information.
Contributions are welcome! Please refer to CONTRIBUTING.md for the contribution guidelines.