Reduced nanoGPT for fast experimentation with mechanisms and architectures.
Just create a venv and install the dependencies:

```shell
python -m venv venv
source venv/bin/activate
pip install datasets deepeval numpy tiktoken torch tqdm transformers wandb
```

You're ready to tinker with GPT-2-class models!
Dependencies:

- `datasets` for huggingface datasets <3 (if you want to download + preprocess OpenWebText)
- `deepeval` for benchmarks <3
- `numpy` <3
- `pytorch` <3
- `tiktoken` for OpenAI's fast BPE code <3
- `tqdm` for progress bars <3
- `transformers` for huggingface transformers <3 (to load GPT-2 checkpoints)
- `wandb` for optional logging <3
The main model is defined in `model.py`. Parameters and their defaults are defined in `train.py`, and are mostly the same as in the original repo but with these notable additions:

- `multiple_choice_benchmarks`: a list of DeepEval benchmark classes to run at each eval interval
- `train_datasets`, `val_datasets`: lists of `DatasetConfig` objects defining the training and eval dataset streams

Parameters can be overridden by files in `config/`.
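For illustration, an override file might look like the sketch below. This is a hedged example, not a file from the repo: the `DatasetConfig` field names (`name`, `weight`) and the DeepEval benchmark class used here are assumptions, so check `train.py` for the real signatures. Following the original nanoGPT convention, config files are executed in the scope of `train.py`, so names like `DatasetConfig` are assumed to already be available without imports.

```python
# config/my_experiment.py -- hypothetical override file
# (field names below are assumptions; see train.py for the actual defaults)
batch_size = 32
learning_rate = 3e-4

# DeepEval benchmark classes to run at each eval interval
# (MMLU is one benchmark DeepEval ships; swap in whichever you need)
from deepeval.benchmarks import MMLU
multiple_choice_benchmarks = [MMLU]

# dataset streams for training and eval -- assumed DatasetConfig fields
train_datasets = [DatasetConfig(name="openwebtext", weight=1.0)]
val_datasets = [DatasetConfig(name="openwebtext", weight=1.0)]
```

You would then pass this file on the command line, e.g. `python train.py config/my_experiment.py`.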
To begin training on a CPU or single-GPU machine, run:

```shell
python train.py [config/<optional_override_file>.py]
```

For multi-GPU training, run this or similar:

```shell
torchrun --standalone --nproc_per_node=8 train.py [config/<optional_override_file>.py]
```
Use the script `sample.py` to sample either from the pre-trained GPT-2 models released by OpenAI, or from a model you trained yourself. For example, here is a way to sample from the largest available `gpt2-xl` model:

```shell
python sample.py \
    --init_from=gpt2-xl \
    --start="What is the answer to life, the universe, and everything?" \
    --num_samples=5 --max_new_tokens=100
```
If you'd like to sample from a model you trained, use the `--out_dir` flag to point the code appropriately. You can also prompt the model with some text from a file, e.g. `python sample.py --start=FILE:prompt.txt`.