
Releases: aws-neuron/aws-neuron-sdk

Neuron SDK Release - April 1, 2024

02 Apr 01:34
af96728

What's New

The Neuron 2.18 release moves PyTorch 2.1 support to stable (out of beta), introduces new features and performance improvements for LLM training and inference, and updates the Neuron DLAMIs and Neuron DLCs to support this release.

Training highlights: The LLM training user experience in NeuronX Distributed (NxD) is improved by introducing asynchronous checkpointing. This release also adds support for auto-partitioning pipeline parallelism in NxD and introduces pipeline parallelism in the PyTorch Lightning Trainer (beta).
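Asynchronous checkpointing overlaps checkpoint I/O with ongoing training: the state is snapshotted on the main thread, then written out in the background. The NxD API itself is not shown in these notes; the following is a minimal, framework-agnostic sketch of the idea, with all names illustrative rather than the NxD API:

```python
import copy
import threading

def save_checkpoint_async(state_dict, path, save_fn):
    """Snapshot the state on the main thread, then persist it in the
    background so the training loop can continue during the slow I/O."""
    snapshot = copy.deepcopy(state_dict)  # snapshot is cheap vs. blocking on disk
    t = threading.Thread(target=save_fn, args=(snapshot, path))
    t.start()
    return t  # caller should join() before the next save or before exit

# Toy usage: "training" keeps mutating the state while the save runs.
saved = {}
def fake_writer(snap, path):
    saved[path] = snap

state = {"step": 10, "w": [1.0, 2.0]}
handle = save_checkpoint_async(state, "ckpt-10", fake_writer)
state["step"] = 11          # training continues immediately
handle.join()
print(saved["ckpt-10"]["step"])   # snapshot preserved the old step: 10
```

The key property is that the checkpoint reflects a consistent point in training even though the optimizer keeps moving while the write completes.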

Inference highlights: Speculative decoding support (beta) in the Transformers NeuronX (TNx) library improves LLM inference throughput and output token latency (TPOT) by up to 25% (for LLMs such as Llama-2-70B). TNx also improves weight-loading performance by adding support for the SafeTensors checkpoint format. Bucketing-based inference in PyTorch NeuronX and NeuronX Distributed is improved by a new auto-bucketing feature. This release also adds new TNx samples for Mixtral-8x7B-v0.1 and mistralai/Mistral-7B-Instruct-v0.2.
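Speculative decoding pairs a small draft model with the large target model: the draft cheaply proposes several tokens, the target verifies them (in practice in a single batched pass), and the longest agreed prefix is kept. A toy greedy-acceptance sketch of the idea, not the TNx API:

```python
def speculative_step(draft_next, target_next, prefix, k=4):
    """Propose k tokens with the draft model, keep the prefix the target
    model agrees with, and let the target contribute one final token."""
    proposed, ctx = [], list(prefix)
    for _ in range(k):                 # cheap draft proposals
        tok = draft_next(ctx)
        proposed.append(tok)
        ctx.append(tok)
    accepted, ctx = [], list(prefix)
    for tok in proposed:               # verification against the target
        if target_next(ctx) == tok:
            accepted.append(tok)
            ctx.append(tok)
        else:
            break
    accepted.append(target_next(ctx))  # target always yields one token
    return accepted

# Toy "models": the draft guesses the next integer; the target skips odd steps.
draft = lambda ctx: ctx[-1] + 1
target = lambda ctx: ctx[-1] + 1 if ctx[-1] % 2 == 0 else ctx[-1] + 2

print(speculative_step(draft, target, [0], k=3))   # [1, 3]
```

When the draft model agrees with the target often, several output tokens are produced per expensive target-model pass, which is where the latency gain comes from.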

Neuron DLAMI and Neuron DLC highlights: This release introduces a new multi-framework DLAMI for Ubuntu 22 that customers can use to easily get started with the latest Neuron SDK across the frameworks Neuron supports, along with SSM parameter support for DLAMIs to automate retrieval of the latest DLAMI ID in cloud automation flows. It also adds new Neuron training and inference Deep Learning Containers (DLCs) for PyTorch 2.1, a dedicated GitHub repository hosting the Neuron container Dockerfiles, and a public Neuron container registry hosting the Neuron container images.
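The SSM parameter support lets automation resolve the latest DLAMI ID without hard-coding it. A hedged sketch: the parameter path below is an assumption based on these notes, so check the Neuron documentation for the exact name.

```shell
# Hypothetical SSM parameter path for the Ubuntu 22.04 multi-framework DLAMI;
# verify the exact path in the Neuron DLAMI documentation.
PARAM="/aws/service/neuron/dlami/multi-framework/ubuntu-22.04/latest/image_id"

# Resolve the current AMI ID at launch time (requires AWS credentials).
IMAGE_ID=$(aws ssm get-parameter --name "$PARAM" \
           --query 'Parameter.Value' --output text)
echo "Latest Neuron multi-framework DLAMI: $IMAGE_ID"
```

The resolved `$IMAGE_ID` can then be passed to `aws ec2 run-instances --image-id`, so pipelines always launch the newest DLAMI.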

Neuron SDK Release - February 13, 2024

14 Feb 02:27

What's New

The Neuron 2.17 release improves small collective communication operators (smaller than 16 MB) by up to 30%, which improves large language model (LLM) inference performance by up to 10%. This release also includes improvements in the :ref:`Neuron Profiler <neuron-profile-ug>` and other minor enhancements and bug fixes.

For more detailed release notes of the new features and resolved issues, see :ref:`components-rn`.

To learn about the model architectures currently supported on Inf1, Inf2, Trn1 and Trn1n instances, please see :ref:`model_architecture_fit`.

Neuron Components Release Notes

Inf1, Trn1/Trn1n and Inf2 common packages

| Component | Instance/s | Package/s | Details |
|---|---|---|---|
| Neuron Runtime | Trn1/Trn1n, Inf1, Inf2 | Trn1/Trn1n: aws-neuronx-runtime-lib (.deb, .rpm); Inf1: runtime is linked into the ML framework packages | :ref:`neuron-runtime-rn` |
| Neuron Runtime Driver | Trn1/Trn1n, Inf1, Inf2 | aws-neuronx-dkms (.deb, .rpm) | :ref:`neuron-driver-release-notes` |
| Neuron System Tools | Trn1/Trn1n, Inf1, Inf2 | aws-neuronx-tools (.deb, .rpm) | :ref:`neuron-tools-rn` |
| Containers | Trn1/Trn1n, Inf1, Inf2 | aws-neuronx-k8-plugin (.deb, .rpm); aws-neuronx-k8-scheduler (.deb, .rpm); aws-neuronx-oci-hooks (.deb, .rpm) | :ref:`neuron-k8-rn`, :ref:`neuron-containers-release-notes` |
| NeuronPerf (inference only) | Trn1/Trn1n, Inf1, Inf2 | neuronperf (.whl) | :ref:`neuronperf_rn` |
| TensorFlow Model Server Neuron | Trn1/Trn1n, Inf1, Inf2 | tensorflow-model-server-neuronx (.deb, .rpm) | :ref:`tensorflow-modeslserver-neuronx-rn` |
| Neuron Documentation | Trn1/Trn1n, Inf1, Inf2 | — | :ref:`neuron-documentation-rn` |

Neuron SDK Release - January 18, 2024

18 Jan 23:51

Neuron SDK Release - December 21, 2023

22 Dec 03:34

What’s New

Neuron 2.16 adds support for Llama-2-70B training and inference, upgrades to PyTorch 2.1 (beta), adds new support for the PyTorch Lightning Trainer (beta), delivers performance improvements, and adds Amazon Linux 2023 support.

Training highlights: LLM training performance with the NeuronX Distributed library is improved by up to 15%. The LLM training user experience is improved by introducing support for the PyTorch Lightning Trainer (beta) and a new model/optimizer wrapper that minimizes the changes needed to partition models using NeuronX Distributed primitives.

Inference highlights: PyTorch inference now allows dynamically swapping different fine-tuned weights into an already loaded model, and overall LLM inference throughput and latency are improved with Transformers NeuronX. Two new reference model samples were added for Llama-2-70B and Mistral-7B inference.

User experience: This release introduces two new capabilities: Neuron Distributed Event Tracing (NDET), a new tool that improves debuggability, and support for profiling collective communication operators in the Neuron Profiler tool.

More release content can be found in the table below and in each component's release notes.

Transformers NeuronX (transformers-neuronx) for Inference (Inf2, Trn1/Trn1n)
- [Beta] Support for Grouped Query Attention (GQA). See developer guide.
- [Beta] Support for Llama-2-70b model inference using Grouped Query Attention. See tutorial.
- [Beta] Support for Mistral-7B-Instruct-v0.1 model inference. See sample code.
- See more at Transformers Neuron (transformers-neuronx) release notes.

NeuronX Distributed (neuronx-distributed) for Training (Trn1/Trn1n)
- [Beta] Support for PyTorch Lightning to train models using tensor parallelism and data parallelism. See API guide, developer guide, and tutorial.
- Support for Model and Optimizer Wrapper training APIs that handle the parallelization. See API guide and the developer guide for the model and optimizer wrapper (neuronx-distributed).
- New save_checkpoint and load_checkpoint APIs to save/load checkpoints during distributed training. See the developer guide for save/load checkpoint (neuronx-distributed).
- Support for a new Query-Key-Value (QKV) module that can replicate the key/value heads, adding the flexibility to use a higher tensor-parallel degree during training. See API guide and tutorial.
- See more at Neuron Distributed Release Notes (neuronx-distributed).

NeuronX Distributed (neuronx-distributed) for Inference (Inf2, Trn1/Trn1n)
- Support for weight deduplication among tensor-parallel shards by saving weights separately from the NEFF files. See developer guide.
- Llama-2-7B model inference script ([html] [notebook]).
- See more at Neuron Distributed Release Notes (neuronx-distributed) and API Reference Guide (neuronx-distributed).

PyTorch NeuronX (torch-neuronx) (Trn1/Trn1n, Inf2)
- [Beta] Support for PyTorch 2.1. See Introducing PyTorch 2.1 Support (Beta) and the llama-2-13b inference sample.
- Support for separating model weights from NEFF files, and a new replace_weights API to replace the separated weights. See PyTorch Neuron (torch-neuronx) Weight Replacement API for Inference and PyTorch NeuronX Tracing API for Inference.
- [Beta] Script for training stabilityai/stable-diffusion-2-1-base and runwayml/stable-diffusion-v1-5 models. See script.
- [Beta] Script for training the facebook/bart-large model. See script.
- [Beta] Script for stabilityai/stable-diffusion-2-inpainting model inference. See script.

Neuron Tools (Inf1, Inf2, Trn1/Trn1n)
- New Neuron Distributed Event Tracing (NDET) tool to help visualize execution trace logs and diagnose errors in multi-node workloads. See the Neuron Distributed Event Tracing (NDET) User Guide.
- Support for multi-worker jobs in neuron-profile. See the Neuron Profile User Guide.
- See more at Neuron System Tools.

Documentation Updates (Inf1, Inf2, Trn1/Trn1n)
- Added setup guide instructions for the AL2023 OS. See Setup Guide.
- Added an announcement of the name change for Neuron components. See Announcing Name Change for Neuron Components.
- Added an announcement of end of support for PyTorch 1.10. See Announcing End of Support for PyTorch Neuron version 1.10.
- Added an announcement of end of support for PyTorch 2.0 beta. See Announcing End of Support for PyTorch NeuronX version 2.0 (beta).
- See more at Neuron Documentation Release Notes.

Minor enhancements and bug fixes (Trn1/Trn1n, Inf2, Inf1): see Neuron Components Release Notes.
Known issues and limitations (Trn1/Trn1n, Inf2, Inf1): see 2.16.0 Known Issues and Limitations.

Neuron SDK Release - November 17, 2023

18 Nov 00:16

Patch release to fix performance-related issues when training through the neuronx-nemo-megatron library. Refer to the 2.15.2 compiler release notes for additional information.

Neuron SDK Release - November 9, 2023

09 Nov 21:44

Patch release to fix execution overhead issues in the Neuron Runtime that were inadvertently introduced in the 2.15 release. Refer to the 2.15.1 runtime release notes for additional information.

Neuron SDK Release - October 26, 2023

27 Oct 23:00

What’s New

This release adds support for PyTorch 2.0 (beta), increases performance for both training and inference workloads, and adds the ability to train models like Llama-2-70B using neuronx-distributed. With this release, we are also adding pipeline parallelism support to neuronx-distributed, enabling full 3D parallelism to easily scale training to large model sizes. Neuron 2.15 also introduces support for training resnet50, milesial/Pytorch-UNet, and deepmind/vision-perceiver-conv models using torch-neuronx, as well as new sample code for flan-t5-xl model inference using neuronx-distributed, in addition to other performance optimizations, minor enhancements, and bug fixes.
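Pipeline parallelism splits a model's layers into consecutive stages, one per group of devices; combined with tensor parallelism and data parallelism, this gives the "3D" layout mentioned above. A minimal, illustrative partitioning helper (not the neuronx-distributed API):

```python
def partition_layers(num_layers, num_stages):
    """Split layer indices into contiguous pipeline stages, spreading any
    remainder over the earliest stages so stage sizes differ by at most one."""
    base, extra = divmod(num_layers, num_stages)
    stages, start = [], 0
    for s in range(num_stages):
        size = base + (1 if s < extra else 0)
        stages.append(list(range(start, start + size)))
        start += size
    return stages

print(partition_layers(10, 4))   # [[0, 1, 2], [3, 4, 5], [6, 7], [8, 9]]
```

Each stage then runs on its own device group, with micro-batches flowing stage to stage; balanced stage sizes matter because the slowest stage sets the pipeline's throughput.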

Neuron Distributed (neuronx-distributed) for Training (Trn1/Trn1n)
- Pipeline parallelism support. See API Reference Guide (neuronx-distributed), pipeline-parallelism developer guide, and pipeline parallelism overview.
- Llama-2-70B model training script (sample script) (tutorial).
- Mixed precision support. See the pipeline-parallelism developer guide.
- Support for serialized checkpoint saving and loading using the save_xser and load_xser parameters. See API Reference Guide (neuronx-distributed).
- See more at Neuron Distributed Release Notes (neuronx-distributed).

Neuron Distributed (neuronx-distributed) for Inference (Inf2, Trn1/Trn1n)
- flan-t5-xl model inference script (tutorial).
- See more at Neuron Distributed Release Notes (neuronx-distributed) and API Reference Guide (neuronx-distributed).

Transformers Neuron (transformers-neuronx) for Inference (Inf2, Trn1/Trn1n)
- Serialization support for Llama, Llama-2, GPT2, and BLOOM models. See developer guide and tutorial.
- See more at Transformers Neuron (transformers-neuronx) release notes.

PyTorch Neuron (torch-neuronx) (Trn1/Trn1n, Inf2)
- Introducing PyTorch 2.0 beta support. See Introducing PyTorch 2.0 Support (Beta), plus the llama-2-7b training, bert training, and t5-3b inference samples.
- [Beta] Scripts for training resnet50, milesial/Pytorch-UNet, and deepmind/vision-perceiver-conv models.

AWS Neuron Reference for Nemo Megatron library (neuronx-nemo-megatron) (Trn1/Trn1n)
- Llama-2-70B model training sample using pipeline parallelism and tensor parallelism (tutorial).
- GPT-NeoX-20B model training using pipeline parallelism and tensor parallelism.
- See more at AWS Neuron Reference for Nemo Megatron (neuronx-nemo-megatron) Release Notes and the neuronx-nemo-megatron GitHub repo.

Neuron Compiler (neuronx-cc) (Inf2, Trn1/Trn1n)
- New llm-training argument to the --distribution_strategy compiler option for optimizations related to distributed training. See the Neuron Compiler CLI Reference Guide (neuronx-cc).
- See more at Neuron Compiler (neuronx-cc) release notes.

Neuron Tools (Inf1, Inf2, Trn1/Trn1n)
- The alltoall collective communication operation, previously released in Neuron Collectives v2.15.13, was added as a testable operation in nccom-test. See the NCCOM-TEST User Guide.
- See more at Neuron System Tools.

Documentation Updates (Inf1, Inf2, Trn1/Trn1n)
- New app note and developer guide on activation memory reduction using sequence parallelism and activation recomputation in neuronx-distributed.
- Added a new Model Samples and Tutorials summary page. See Model Samples and Tutorials.
- Added the Neuron SDK classification guide. See Neuron Software Classification.
- See more at Neuron Documentation Release Notes.

Minor enhancements and bug fixes (Trn1/Trn1n, Inf2, Inf1): see Neuron Components Release Notes.
Release artifacts (Trn1/Trn1n, Inf2, Inf1): see Release Artifacts.

For more detailed release notes of the new features and resolved issues, see Neuron Components Release Notes.

To learn about the model architectures currently supported on Inf1, Inf2, Trn1 and Trn1n instances, please see Model Architecture Fit Guidelines.


Neuron SDK Release - September 26, 2023

26 Sep 23:57
a390205

This is a patch release that fixes compiler issues in certain configurations of Llama and Llama-2 model inference using transformers-neuronx. Refer to 2.14.1 release notes for additional information.

Neuron SDK Release - September 15, 2023

16 Sep 04:22

What’s New

This release introduces support for Llama-2-7B model training and T5-3B model inference using neuronx-distributed. It also adds support for Llama-2-13B model training using neuronx-nemo-megatron, and for Stable Diffusion XL (Refiner and Base) model inference using torch-neuronx. This release also introduces other new features, performance optimizations, minor enhancements, and bug fixes. This release introduces the following:

Note

This release deprecates the --model-type=transformer-inference compiler flag. Users are strongly encouraged to migrate to the --model-type=transformer compiler flag.
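With transformers-neuronx, compiler flags are typically passed through the NEURON_CC_FLAGS environment variable, so the migration is a one-line change. A sketch under that assumption; consult the release notes for how flags are passed in your exact workflow:

```shell
# Before (deprecated as of Neuron 2.14):
# export NEURON_CC_FLAGS="--model-type=transformer-inference"

# After:
export NEURON_CC_FLAGS="--model-type=transformer"
```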

AWS Neuron Reference for Nemo Megatron library (neuronx-nemo-megatron) (Trn1/Trn1n)
- Llama-2-13B model training support (tutorial).
- ZeRO-1 optimizer support that works with tensor parallelism and pipeline parallelism.
- See more at AWS Neuron Reference for Nemo Megatron (neuronx-nemo-megatron) Release Notes and the neuronx-nemo-megatron GitHub repo.

Neuron Distributed (neuronx-distributed) for Training (Trn1/Trn1n)
- New pad_model API to pad attention heads that do not divide evenly by the number of NeuronCores, allowing users to use any supported tensor-parallel degree. See API Reference Guide (neuronx-distributed).
- Llama-2-7B model training support (sample script) (tutorial).
- See more at Neuron Distributed Release Notes (neuronx-distributed) and API Reference Guide (neuronx-distributed).

Neuron Distributed (neuronx-distributed) for Inference (Inf2, Trn1/Trn1n)
- T5-3B model inference support (tutorial).
- New pad_model API to pad attention heads that do not divide evenly by the number of NeuronCores, allowing users to use any supported tensor-parallel degree. See API Reference Guide (neuronx-distributed).
- See more at Neuron Distributed Release Notes (neuronx-distributed) and API Reference Guide (neuronx-distributed).

Transformers Neuron (transformers-neuronx) for Inference (Inf2, Trn1/Trn1n)
- Introduced the --model-type=transformer compiler flag, which deprecates the --model-type=transformer-inference compiler flag.
- See more at Transformers Neuron (transformers-neuronx) release notes.

PyTorch Neuron (torch-neuronx) (Trn1/Trn1n, Inf2)
- Performance optimizations in the torch_neuronx.analyze API. See PyTorch Neuron (torch-neuronx) Analyze API for Inference.
- Stable Diffusion XL (Refiner and Base) model inference support (sample script).

Neuron Compiler (neuronx-cc) (Inf2, Trn1/Trn1n)
- New --O compiler option that enables different optimizations, trading off faster model compile time against faster model execution. See the Neuron Compiler CLI Reference Guide (neuronx-cc).
- See more at Neuron Compiler (neuronx-cc) release notes.

Neuron Tools (Inf1, Inf2, Trn1/Trn1n)
- Neuron SysFS support for showing connected devices on trn1.32xl, inf2.24xl, and inf2.48xl instances. See the Neuron Sysfs User Guide.
- See more at Neuron System Tools.

Documentation Updates (Inf1, Inf2, Trn1/Trn1n)
- The Neuron Calculator now supports multiple model configurations for tensor-parallel-degree computation. See Neuron Calculator.
- Announced deprecation of the --model-type=transformer-inference flag. See Announcing deprecation for --model-type=transformer-inference compiler flag.
- See more at Neuron Documentation Release Notes.

Minor enhancements and bug fixes (Trn1/Trn1n, Inf2, Inf1): see Neuron Components Release Notes.
Release artifacts (Trn1/Trn1n, Inf2, Inf1): see Release Artifacts.
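The pad_model API described above addresses the case where the number of attention heads does not divide evenly by the tensor-parallel degree. The underlying arithmetic is just rounding the head count up to the next multiple; an illustrative helper, not the neuronx-distributed API:

```python
import math

def padded_heads(num_heads, tp_degree):
    """Round the head count up to the next multiple of the tensor-parallel
    degree, so every shard receives an equal number of (possibly dummy) heads."""
    return math.ceil(num_heads / tp_degree) * tp_degree

# 32 heads shard evenly at degree 8, but degree 12 requires padding
# 32 -> 36 (3 heads per shard, 4 of them padding).
print(padded_heads(32, 8), padded_heads(32, 12))   # 32 36
```

Padding adds a little wasted compute on the dummy heads in exchange for the freedom to pick any supported tensor-parallel degree.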

For more detailed release notes of the new features and resolved issues, see Neuron Components Release Notes.

To learn about the model architectures currently supported on Inf1, Inf2, Trn1 and Trn1n instances, please see Model Architecture Fit Guidelines.


Neuron SDK Release - September 01, 2023

02 Sep 01:30

This release adds support for Llama 2 model training (tutorial) using the neuronx-nemo-megatron library, and for Llama 2 model inference using the transformers-neuronx library (tutorial).

Please follow the instructions in the setup guide to upgrade to the latest Neuron release.

Note

Please install transformers-neuronx from https://pip.repos.neuron.amazonaws.com/ to get the latest features and improvements.

This release does not support the Llama 2 model with Grouped-Query Attention.