Neuron SDK Release - October 26, 2023
What’s New
This release adds support for PyTorch 2.0 (Beta), improves performance for both training and inference workloads, and adds the ability to train models like Llama-2-70B using neuronx-distributed. With this release, we are also adding pipeline parallelism support to neuronx-distributed, enabling full 3D parallelism to easily scale training to large model sizes. Neuron 2.15 also introduces support for training resnet50, milesial/Pytorch-UNet and deepmind/vision-perceiver-conv models using torch-neuronx, as well as new sample code for flan-t5-xl model inference using neuronx-distributed, in addition to other performance optimizations, minor enhancements and bug fixes.
What’s New | Details | Instances |
---|---|---|
Neuron Distributed (neuronx-distributed) for Training | Pipeline parallelism support; see API Reference Guide (neuronx-distributed), pp_developer_guide and pipeline_parallelism_overview. Llama-2-70B model training script (sample script) (tutorial). Mixed precision support; see pp_developer_guide. Serialized checkpoint saving and loading via the save_xser and load_xser parameters (see the usage sketch after the table); see API Reference Guide (neuronx-distributed). See more at Neuron Distributed Release Notes (neuronx-distributed). | Trn1/Trn1n |
Neuron Distributed (neuronx-distributed) for Inference | flan-t5-xl model inference script (tutorial). See more at Neuron Distributed Release Notes (neuronx-distributed) and API Reference Guide (neuronx-distributed). | Inf2, Trn1/Trn1n |
Transformers Neuron (transformers-neuronx) for Inference | Serialization support for Llama, Llama-2, GPT2 and BLOOM models; see developer guide and tutorial. See more at Transformers Neuron (transformers-neuronx) release notes. | Inf2, Trn1/Trn1n |
PyTorch Neuron (torch-neuronx) | Introducing PyTorch 2.0 Beta support (see the sketch after the table); see Introducing PyTorch 2.0 Support (Beta) and the llama-2-7b training, bert training and t5-3b inference samples. Scripts for training resnet50 [Beta], milesial/Pytorch-UNet [Beta] and deepmind/vision-perceiver-conv [Beta] models. | Trn1/Trn1n, Inf2 |
AWS Neuron Reference for Nemo Megatron library (neuronx-nemo-megatron) | Llama-2-70B model training sample using pipeline parallelism and tensor parallelism (tutorial). GPT-NeoX-20B model training using pipeline parallelism and tensor parallelism. See more at AWS Neuron Reference for Nemo Megatron (neuronx-nemo-megatron) Release Notes and the neuronx-nemo-megatron GitHub repo. | Trn1/Trn1n |
Neuron Compiler (neuronx-cc) | New llm-training argument for the --distribution_strategy compiler option, enabling optimizations related to distributed training (see the example after the table); see Neuron Compiler CLI Reference Guide (neuronx-cc) and Neuron Compiler (neuronx-cc) release notes. | Inf2, Trn1/Trn1n |
Neuron Tools | The alltoall Collective Communication operation, previously released in Neuron Collectives v2.15.13, was added as a testable operation in nccom-test; see NCCOM-TEST User Guide. See more at Neuron System Tools. | Inf1, Inf2, Trn1/Trn1n |
Documentation Updates | New App Note and Developer Guide on activation memory reduction using sequence parallelism and activation recomputation in neuronx-distributed. Added a new Model Samples and Tutorials summary page; see Model Samples and Tutorials. Added a Neuron SDK Classification guide; see Neuron Software Classification. See more at Neuron Documentation Release Notes. | Inf1, Inf2, Trn1/Trn1n |
Minor enhancements and bug fixes | See Neuron Components Release Notes. | Trn1/Trn1n, Inf2, Inf1 |
Release Artifacts | See Release Artifacts. | Trn1/Trn1n, Inf2, Inf1 |
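The serialized checkpointing support in neuronx-distributed for training can be exercised roughly as follows. This is a minimal sketch, not the documented API surface: the save_xser and load_xser parameters are named in these release notes, while the neuronx_distributed.parallel_layers.save/load entry points, their argument order, and the toy model are assumptions to be checked against the API Reference Guide (neuronx-distributed).

```python
import torch
import neuronx_distributed as nxd

# Toy stand-ins; a real script would use tensor/pipeline-parallel modules
# built with neuronx-distributed and an initialized distributed environment.
model = torch.nn.Linear(16, 16)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

state = {"model": model.state_dict(), "optimizer": optimizer.state_dict()}

# save_xser / load_xser (named in these release notes) select the serialized
# checkpoint format; the parallel_layers.save/load entry points and their
# exact signatures are assumptions; confirm them in the API Reference Guide.
nxd.parallel_layers.save(state, "ckpt", save_xser=True)

loaded = nxd.parallel_layers.load("ckpt", load_xser=True)
model.load_state_dict(loaded["model"])
```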
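With the PyTorch 2.0 (Beta) support in torch-neuronx, training scripts keep the familiar XLA-device flow. Below is a minimal, generic sketch; the toy model, data and hyperparameters are illustrative only, and the referenced samples (llama-2-7b, bert, t5-3b) remain the authoritative starting points.

```python
import torch
import torch_xla.core.xla_model as xm  # provided via torch-neuronx's torch-xla dependency

# Place an illustrative model on the Neuron (XLA) device.
device = xm.xla_device()
model = torch.nn.Linear(128, 2).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
loss_fn = torch.nn.CrossEntropyLoss()

# Random batch standing in for a real dataloader.
inputs = torch.randn(8, 128).to(device)
labels = torch.randint(0, 2, (8,)).to(device)

optimizer.zero_grad()
loss = loss_fn(model(inputs), labels)
loss.backward()
optimizer.step()
xm.mark_step()  # flush the accumulated XLA graph so it is compiled and executed on Neuron
```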
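The new llm-training argument to the compiler's --distribution_strategy option is typically passed through the NEURON_CC_FLAGS environment variable before the first graph compilation in a torch-neuronx training script. This is a hedged sketch: the option name comes from these release notes, and the full set of accepted values and how they are passed is documented in the Neuron Compiler CLI Reference Guide (neuronx-cc).

```python
import os

# NEURON_CC_FLAGS is forwarded to neuronx-cc when graphs are compiled.
# Set it before the first compilation so the llm-training distribution
# strategy (distributed-training optimizations) takes effect.
os.environ["NEURON_CC_FLAGS"] = (
    os.environ.get("NEURON_CC_FLAGS", "") + " --distribution_strategy=llm-training"
).strip()
```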
For more detailed release notes of the new features and resolved issues, see Neuron Components Release Notes.
To learn about the model architectures currently supported on Inf1, Inf2, Trn1 and Trn1n instances, please see Model Architecture Fit Guidelines.