Mixtral is a state-of-the-art large language model developed by Mistral AI that uses a sparse mixture-of-experts (MoE) architecture.
To get started, follow the instructions at mistral-inference to download the model. Once downloaded, run llama_or_mistral_ckpt.py to convert the checkpoint into a MaxText-compatible format. You can then proceed with decoding, pretraining, and finetuning. A full Mixtral 8x7B example is available in the end_to_end/tpu/mixtral/8x7b test scripts.
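For orientation, a conversion step typically looks like the sketch below. This is a minimal, hedged example: it assumes llama_or_mistral_ckpt.py accepts --base-model-path, --maxtext-model-path, and --model-size arguments, and the paths shown are placeholders; consult the script itself and the end_to_end/tpu/mixtral/8x7b scripts for the exact interface.

```sh
# Sketch only: flag names are assumed and paths are placeholders.
# Convert the downloaded Mixtral weights into a MaxText-compatible checkpoint.
python3 MaxText/llama_or_mistral_ckpt.py \
  --base-model-path /path/to/downloaded/mixtral-8x7b \
  --maxtext-model-path gs://your-bucket/mixtral-8x7b/maxtext-ckpt \
  --model-size mixtral-8x7b
```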
Additionally, Mixtral integrates with MegaBlocks, an efficient dropless MoE strategy, which is activated by setting both the sparse_matmul and megablox flags to True (their default values).
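As an illustration, these flags can be passed as command-line overrides when launching a run. The sparse_matmul and megablox flags come from the description above; the remaining arguments (config file, run_name, output directory, synthetic dataset) are placeholder assumptions for a quick test, not required values.

```sh
# Sketch of a training launch with the dropless MegaBlocks path enabled.
# run_name, output directory, and dataset settings are illustrative placeholders.
python3 MaxText/train.py MaxText/configs/base.yml \
  model_name=mixtral-8x7b \
  sparse_matmul=True \
  megablox=True \
  run_name=mixtral-megablocks-test \
  base_output_directory=gs://your-bucket/output \
  dataset_type=synthetic
```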
Model FLOPs utilization (MFU) for training on v5p TPUs:

Model size | Accelerator type | TFLOP/chip/sec | Model FLOPs utilization (MFU) |
---|---|---|---|
Mixtral 8x7B | v5p-128 | 251.94 | 54.89% |