# Mixtral

Mixtral is a state-of-the-art AI model developed by Mistral AI, utilizing a sparse mixture-of-experts (MoE) architecture.

To get started, follow the instructions at [mistral-inference](https://github.com/mistralai/mistral-inference) to download the model. Once downloaded, run `llama_or_mistral_ckpt.py` to convert the checkpoint into a MaxText-compatible format. You can then proceed with decoding, pretraining, and finetuning; a complete Mixtral 8x7B example is provided in the `end_to_end/tpu/mixtral/8x7b` test scripts.
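
For illustration, here is a minimal sketch of the conversion step. Flag names and paths below are assumptions based on recent MaxText versions; the `end_to_end/tpu/mixtral/8x7b` scripts remain the authoritative reference, and the bucket/checkpoint paths are placeholders.

```bash
# Convert the downloaded Mixtral 8x7B checkpoint into MaxText's checkpoint format.
# Paths are placeholders; point them at your local download and output bucket.
python3 MaxText/llama_or_mistral_ckpt.py \
  --base-model-path /path/to/mixtral-8x7B-v0.1 \
  --model-size mixtral-8x7b \
  --maxtext-model-path gs://YOUR_BUCKET/mixtral-8x7b/converted_ckpt
```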

Additionally, Mixtral integrates with MegaBlocks, an efficient dropless MoE strategy, which can be enabled by setting both the `sparse_matmul` and `megablox` flags to `True` (their default values).
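
As a hedged illustration of how these flags are passed: MaxText accepts `key=value` config overrides on the command line, so the overrides below ride along with a normal training launch. The run name and output directory are placeholders, and `model_name=mixtral-8x7b` is assumed to match the bundled model config.

```bash
# Example training launch with the dropless MoE (MegaBlocks-style) path enabled.
# sparse_matmul and megablox already default to True; they are spelled out here only for clarity.
python3 MaxText/train.py MaxText/configs/base.yml \
  model_name=mixtral-8x7b \
  run_name=mixtral_8x7b_pretrain \
  base_output_directory=gs://YOUR_BUCKET/output \
  sparse_matmul=True megablox=True
```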

## MaxText supports pretraining and finetuning with high performance

Model flop utilization for training on v5p TPUs:

| Model size | Accelerator type | TFLOP/chip/sec | Model flops utilization (MFU) |
| --- | --- | --- | --- |
| Mixtral 8x7B | v5p-128 | 251.94 | 54.89% |