
Neuron SDK Release - September 15, 2023

@aws-mesharma released this 16 Sep 04:22

What’s New

This release introduces support for Llama-2-7B model training and T5-3B model inference using neuronx-distributed, and for Llama-2-13B model training using neuronx-nemo-megatron. Neuron 2.14 also adds support for Stable Diffusion XL (Refiner and Base) model inference using torch-neuronx, along with other new features, performance optimizations, minor enhancements, and bug fixes. This release introduces the following:

Note

This release deprecates the --model-type=transformer-inference compiler flag. Users are strongly encouraged to migrate to the --model-type=transformer compiler flag.
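As a migration sketch: transformers-neuronx commonly picks up extra compiler arguments from the NEURON_CC_FLAGS environment variable, so updating the deprecated flag can be as simple as rewriting that variable before loading a model. (Illustrative only; check the Transformers Neuron release notes for the authoritative migration path.)

```python
import os

# Assumed setup: any existing compiler flags live in NEURON_CC_FLAGS.
# Swap the deprecated flag for its replacement before compiling a model.
flags = os.environ.get("NEURON_CC_FLAGS", "--model-type=transformer-inference")
flags = flags.replace("--model-type=transformer-inference",
                      "--model-type=transformer")
os.environ["NEURON_CC_FLAGS"] = flags
print(os.environ["NEURON_CC_FLAGS"])
```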

| What’s New | Details | Instances |
| --- | --- | --- |
| AWS Neuron Reference for Nemo Megatron (neuronx-nemo-megatron) | Llama-2-13B model training support (tutorial). ZeRO-1 optimizer support that works with tensor parallelism and pipeline parallelism. See more at the AWS Neuron Reference for Nemo Megatron (neuronx-nemo-megatron) release notes and the neuronx-nemo-megatron GitHub repo. | Trn1/Trn1n |
| Neuron Distributed (neuronx-distributed) for training | `pad_model` API to pad attention heads that do not divide evenly by the number of NeuronCores, allowing users to use any supported tensor-parallel degree; see the API Reference Guide (neuronx-distributed). Llama-2-7B model training support (sample script, tutorial). See more at the Neuron Distributed release notes (neuronx-distributed). | Trn1/Trn1n |
| Neuron Distributed (neuronx-distributed) for inference | T5-3B model inference support (tutorial). `pad_model` API to pad attention heads that do not divide evenly by the number of NeuronCores, allowing users to use any supported tensor-parallel degree; see the API Reference Guide (neuronx-distributed). See more at the Neuron Distributed release notes (neuronx-distributed). | Inf2, Trn1/Trn1n |
| Transformers Neuron (transformers-neuronx) for inference | New `--model-type=transformer` compiler flag, which deprecates the `--model-type=transformer-inference` compiler flag. See more at the Transformers Neuron (transformers-neuronx) release notes. | Inf2, Trn1/Trn1n |
| PyTorch Neuron (torch-neuronx) | Performance optimizations in the `torch_neuronx.analyze` API; see PyTorch Neuron (torch-neuronx) Analyze API for Inference. Stable Diffusion XL (Refiner and Base) model inference support (sample script). | Trn1/Trn1n, Inf2 |
| Neuron Compiler (neuronx-cc) | New `--O` compiler option that enables different optimizations, trading off faster model compile time against faster model execution. See more at the Neuron Compiler CLI Reference Guide (neuronx-cc) and the Neuron Compiler (neuronx-cc) release notes. | Inf2, Trn1/Trn1n |
| Neuron Tools | Neuron SysFS support for showing connected devices on trn1.32xl, inf2.24xl, and inf2.48xl instances; see the Neuron Sysfs User Guide. See more at Neuron System Tools. | Inf1, Inf2, Trn1/Trn1n |
| Documentation updates | Neuron Calculator now supports multiple model configurations for tensor-parallel-degree computation; see Neuron Calculator. Announcement of the deprecation of the `--model-type=transformer-inference` compiler flag. See more at the Neuron Documentation release notes. | Inf1, Inf2, Trn1/Trn1n |
| Minor enhancements and bug fixes | See the Neuron Components Release Notes. | Trn1/Trn1n, Inf2, Inf1 |
| Release artifacts | See Release Artifacts. | Trn1/Trn1n, Inf2, Inf1 |
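The `pad_model` feature above exists because tensor parallelism requires each NeuronCore to own the same whole number of attention heads. The arithmetic behind that padding can be sketched as follows (plain Python for illustration; this is not the actual neuronx-distributed `pad_model` API — see its API Reference Guide for the real signature):

```python
def padded_head_count(num_heads: int, tp_degree: int) -> int:
    """Round num_heads up to the next multiple of tp_degree.

    Each of the tp_degree NeuronCores must hold an equal whole number of
    attention heads, so a head count that does not divide evenly is padded
    up with extra (inert) heads. Illustrative sketch only.
    """
    # Ceiling division via negation, then scale back up by tp_degree.
    return -(-num_heads // tp_degree) * tp_degree

# Example: 32 attention heads with a tensor-parallel degree of 6 pads the
# model to 36 heads (6 per core); a degree of 8 needs no padding.
print(padded_head_count(32, 6))  # -> 36
print(padded_head_count(32, 8))  # -> 32
```

This is why the release notes say padding lets users pick any supported tensor-parallel degree, not only divisors of the head count.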

For more detailed release notes of the new features and resolved issues, see Neuron Components Release Notes.

To learn about the model architectures currently supported on Inf1, Inf2, Trn1 and Trn1n instances, please see Model Architecture Fit Guidelines.
