Retentive Network
The Retentive network (RetNet) architecture is a foundational architecture for large language models proposed as an alternative to transformers.

Is a: Technology

Technology attributes

  • Created/Discovered by: Microsoft Research, Yutao Sun, Tsinghua University
  • Related Industries: Artificial Intelligence (AI), Generative AI, Machine learning
  • Related Technology: Large language model, Transformer
  • Date Invented: July 17, 2023

Other attributes

  • Also Known As: RetNet
  • Creator: Yutao Sun
  • Published Date: July 25, 2023
Overview

The Retentive network (RetNet) architecture is a foundational architecture for large language models (LLMs) proposed as an alternative to transformers. RetNet was first proposed by researchers at Microsoft Research and Tsinghua University, Beijing, in a paper submitted on July 17, 2023. The paper, titled "Retentive Network: A Successor to Transformer for Large Language Models," was authored by Yutao Sun, Li Dong, Shaohan Huang, Shuming Ma, Yuqing Xia, Jilong Xue, Jianyong Wang, and Furu Wei. Alongside the paper, the researchers released code on GitHub that allows users to develop their own RetNet models. The code is available through TorchScale, a PyTorch library of foundation architectures.
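As a hedged illustration, the snippet below sketches how a RetNet decoder might be instantiated through TorchScale, following the usage pattern shown in the TorchScale repository's documentation. The module paths, class names (RetNetConfig, RetNetDecoder), and the vocab_size value are assumptions for illustration and may differ between library versions.

```python
# Assumed TorchScale entry points for RetNet; consult the library's current
# documentation, as these paths and names may have changed.
from torchscale.architecture.config import RetNetConfig
from torchscale.architecture.retnet import RetNetDecoder

# Build a small RetNet decoder; vocab_size here is an illustrative placeholder,
# not a value taken from the paper.
config = RetNetConfig(vocab_size=64000)
model = RetNetDecoder(config)
print(model)
```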

RetNet derives a connection between recurrence and attention (a key concept in the transformer architecture) and proposes the retention mechanism for sequence modeling, which supports three computation paradigms:

  • Parallel
  • Recurrent
  • Chunkwise recurrent

In particular, the parallel representation enables training parallelism. The recurrent representation enables low-cost O(1) inference, which improves decoding throughput, latency, and GPU memory usage. The chunkwise recurrent representation enables efficient long-sequence modeling with linear complexity: each chunk is encoded in parallel while the chunks themselves are summarized recurrently.
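To make the equivalence between the parallel and recurrent paradigms concrete, the following is a minimal PyTorch sketch of a single retention head. It is an illustrative simplification rather than the paper's full implementation: it omits the xPos-style rotations, multi-scale per-head decay values, gating, and normalization that RetNet uses, and the sequence length, head dimension, and decay value gamma are assumptions chosen for the example.

```python
import torch

def retention_parallel(q, k, v, gamma):
    # Parallel form: (Q K^T * D) V, where D[n, m] = gamma^(n - m) for n >= m
    # and 0 otherwise, so all positions are processed at once during training.
    seq_len = q.shape[0]
    idx = torch.arange(seq_len, dtype=torch.float)
    decay = torch.tril(gamma ** (idx[:, None] - idx[None, :]))
    return (q @ k.T * decay) @ v

def retention_recurrent(q, k, v, gamma):
    # Recurrent form: S_n = gamma * S_{n-1} + k_n^T v_n, output o_n = q_n S_n.
    # The state S has a fixed size, so each decoding step costs O(1).
    state = torch.zeros(k.shape[1], v.shape[1])
    outputs = []
    for n in range(q.shape[0]):
        state = gamma * state + k[n].unsqueeze(1) @ v[n].unsqueeze(0)
        outputs.append(q[n].unsqueeze(0) @ state)
    return torch.cat(outputs, dim=0)

# The two forms compute the same function, which is what lets RetNet train in
# parallel and decode recurrently. Sequence length 6 and head dim 4 are toy values.
torch.manual_seed(0)
q, k, v = torch.randn(3, 6, 4).unbind(0)
print(torch.allclose(retention_parallel(q, k, v, 0.9),
                     retention_recurrent(q, k, v, 0.9), atol=1e-5))  # True
```

The chunkwise recurrent paradigm combines the two: the parallel form is applied within each chunk, while a decayed state is carried recurrently from one chunk to the next.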

Transformers have become the primary architecture for LLMs, but their training parallelism comes at the cost of inefficient inference: during decoding, a transformer must maintain a key-value cache that grows with the sequence, so longer sequences increase GPU memory consumption and latency and reduce inference speed. RetNet is a potential next-generation architecture that aims to retain the training parallelism and competitive performance of transformers while offering efficient inference.
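As a rough illustration of this trade-off, the sketch below compares how a transformer's key-value cache scales with sequence length against RetNet's fixed-size recurrent state. Every model dimension used (layer count, hidden size, head dimension, fp16 storage) is an assumed placeholder for illustration, not a figure from the paper.

```python
NUM_LAYERS = 32      # assumed decoder depth
HIDDEN_SIZE = 4096   # assumed model width
HEAD_DIM = 64        # assumed per-head dimension
BYTES_PER_VALUE = 2  # fp16 storage

def transformer_kv_cache_bytes(seq_len: int) -> int:
    # A transformer caches one key vector and one value vector per token,
    # per layer, so memory grows linearly with the sequence length.
    return seq_len * NUM_LAYERS * 2 * HIDDEN_SIZE * BYTES_PER_VALUE

def retnet_recurrent_state_bytes() -> int:
    # RetNet's recurrent mode keeps a fixed-size state per layer that does not
    # grow with the sequence; HIDDEN_SIZE * HEAD_DIM is an illustrative
    # stand-in for its size, as the exact shape depends on the head setup.
    return NUM_LAYERS * HIDDEN_SIZE * HEAD_DIM * BYTES_PER_VALUE

for seq_len in (1_024, 8_192, 65_536):
    gib = transformer_kv_cache_bytes(seq_len) / 2**30
    print(f"transformer KV cache at {seq_len} tokens: {gib:.1f} GiB")
print(f"RetNet recurrent state (any length): "
      f"{retnet_recurrent_state_bytes() / 2**30:.2f} GiB")
```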

Performance

In their paper, the researchers at Microsoft Research and Tsinghua University conducted a series of experiments showing that RetNet is competitive with transformers and their variants in terms of both scaling curves and in-context learning. The paper also states that RetNet's inference cost is length-invariant. For a 7B-parameter model with an 8k sequence length, RetNet decoded 8.4x faster and saved 70% of GPU memory compared to transformers with key-value caches. During training, RetNet achieved 25-50% memory savings and 7x acceleration compared to standard transformers.


Further Resources

  • Retentive Network: A Successor to Transformer for Large Language Models. Yutao Sun, Li Dong, Shaohan Huang, Shuming Ma, Yuqing Xia, Jilong Xue, Jianyong Wang, and Furu Wei. July 17, 2023. https://arxiv.org/pdf/2307.08621.pdf
