Retentive Network
The Retentive network (RetNet) architecture is a foundational architecture for large language models proposed as an alternative to transformers.

Is a: Technology

Technology attributes

  • Created/Discovered by: Microsoft Research, Yutao Sun, Tsinghua University
  • Related Industries: Artificial Intelligence (AI), Generative AI, Machine learning
  • Related Technology: Large language model, Transformer
  • Date Invented: July 17, 2023

Other attributes

  • Also Known As: RetNet
  • Creator: Yutao Sun
  • Published Date: July 25, 2023
Overview

The Retentive network (RetNet) architecture is a foundational architecture for large language models (LLMs) proposed as an alternative to transformers. RetNet was first proposed by researchers at Microsoft Research and Tsinghua University, Beijing, in a paper submitted on July 17, 2023. The paper, titled "Retentive Network: A Successor to Transformer for Large Language Models," was authored by Yutao Sun, Li Dong, Shaohan Huang, Shuming Ma, Yuqing Xia, Jilong Xue, Jianyong Wang, and Furu Wei. Alongside the paper, the researchers released code on GitHub that allows users to develop their own RetNet models. The code is available through TorchScale, a PyTorch library of foundation architectures.
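As a hedged illustration, the snippet below sketches how a RetNet decoder might be instantiated through TorchScale, following the usage pattern shown in the TorchScale repository's documentation. The module paths, class names (RetNetConfig, RetNetDecoder), and the vocab_size value are assumptions for illustration and may differ between library versions.

```python
# Assumed TorchScale entry points for RetNet; consult the library's current
# documentation, as these paths and names may have changed.
from torchscale.architecture.config import RetNetConfig
from torchscale.architecture.retnet import RetNetDecoder

# Build a small RetNet decoder; vocab_size here is an illustrative placeholder,
# not a value taken from the paper.
config = RetNetConfig(vocab_size=64000)
model = RetNetDecoder(config)
print(model)
```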

RetNet derives a connection between recurrence and attention (a key concept in the transformer architecture) and proposes the retention mechanism for sequence modeling, which supports three computation paradigms:

  • Parallel
  • Recurrent
  • Chunkwise recurrent

In particular, the parallel representation enables training parallelism. The recurrent representation enables low-cost O(1) inference, which improves decoding throughput, latency, and GPU memory usage. The chunkwise recurrent representation enables efficient long-sequence modeling with linear complexity: each chunk is encoded in parallel while the chunks themselves are summarized recurrently.
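To make the equivalence between the parallel and recurrent paradigms concrete, the following is a minimal PyTorch sketch of a single retention head. It is an illustrative simplification rather than the paper's full implementation: it omits the xPos-style rotations, multi-scale per-head decay values, gating, and normalization that RetNet uses, and the sequence length, head dimension, and decay value gamma are assumptions chosen for the example.

```python
import torch

def retention_parallel(q, k, v, gamma):
    # Parallel form: (Q K^T * D) V, where D[n, m] = gamma^(n - m) for n >= m
    # and 0 otherwise, so all positions are processed at once during training.
    seq_len = q.shape[0]
    idx = torch.arange(seq_len, dtype=torch.float)
    decay = torch.tril(gamma ** (idx[:, None] - idx[None, :]))
    return (q @ k.T * decay) @ v

def retention_recurrent(q, k, v, gamma):
    # Recurrent form: S_n = gamma * S_{n-1} + k_n^T v_n, output o_n = q_n S_n.
    # The state S has a fixed size, so each decoding step costs O(1).
    state = torch.zeros(k.shape[1], v.shape[1])
    outputs = []
    for n in range(q.shape[0]):
        state = gamma * state + k[n].unsqueeze(1) @ v[n].unsqueeze(0)
        outputs.append(q[n].unsqueeze(0) @ state)
    return torch.cat(outputs, dim=0)

# The two forms compute the same function, which is what lets RetNet train in
# parallel and decode recurrently. Sequence length 6 and head dim 4 are toy values.
torch.manual_seed(0)
q, k, v = torch.randn(3, 6, 4).unbind(0)
print(torch.allclose(retention_parallel(q, k, v, 0.9),
                     retention_recurrent(q, k, v, 0.9), atol=1e-5))  # True
```

The chunkwise recurrent paradigm combines the two: the parallel form is applied within each chunk, while a decayed state is carried recurrently from one chunk to the next.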

Transformers have become the primary architecture for LLMs, but their training parallelism comes at the cost of inefficient inference: during decoding, a transformer must maintain a key-value cache that grows with the sequence, so longer sequences increase GPU memory consumption and latency and reduce inference speed. RetNet is a potential next-generation architecture that aims to retain the training parallelism and competitive performance of transformers while offering efficient inference.
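As a rough illustration of this trade-off, the sketch below compares how a transformer's key-value cache scales with sequence length against RetNet's fixed-size recurrent state. Every model dimension used (layer count, hidden size, head dimension, fp16 storage) is an assumed placeholder for illustration, not a figure from the paper.

```python
NUM_LAYERS = 32      # assumed decoder depth
HIDDEN_SIZE = 4096   # assumed model width
HEAD_DIM = 64        # assumed per-head dimension
BYTES_PER_VALUE = 2  # fp16 storage

def transformer_kv_cache_bytes(seq_len: int) -> int:
    # A transformer caches one key vector and one value vector per token,
    # per layer, so memory grows linearly with the sequence length.
    return seq_len * NUM_LAYERS * 2 * HIDDEN_SIZE * BYTES_PER_VALUE

def retnet_recurrent_state_bytes() -> int:
    # RetNet's recurrent mode keeps a fixed-size state per layer that does not
    # grow with the sequence; HIDDEN_SIZE * HEAD_DIM is an illustrative
    # stand-in for its size, as the exact shape depends on the head setup.
    return NUM_LAYERS * HIDDEN_SIZE * HEAD_DIM * BYTES_PER_VALUE

for seq_len in (1_024, 8_192, 65_536):
    gib = transformer_kv_cache_bytes(seq_len) / 2**30
    print(f"transformer KV cache at {seq_len} tokens: {gib:.1f} GiB")
print(f"RetNet recurrent state (any length): "
      f"{retnet_recurrent_state_bytes() / 2**30:.2f} GiB")
```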

Performance

In their paper, the researchers at Microsoft Research and Tsinghua University conducted a series of experiments showing that RetNet is competitive with transformers and their variants in terms of both scaling curves and in-context learning. The paper also states that RetNet's inference cost is length-invariant. For a 7B-parameter model with an 8k sequence length, RetNet decoded 8.4x faster and saved 70% of GPU memory compared to transformers with key-value caches. During training, RetNet achieved 25-50% memory savings and 7x acceleration compared to standard transformers.


Further Resources

  • Retentive Network: A Successor to Transformer for Large Language Models. Yutao Sun, Li Dong, Shaohan Huang, Shuming Ma, Yuqing Xia, Jilong Xue, Jianyong Wang, and Furu Wei. July 17, 2023. https://arxiv.org/pdf/2307.08621.pdf
