Software-Hardware Co-design for Fast and Scalable Training of Deep Learning Recommendation Models

Is a
Academic paper

Academic Paper attributes

arXiv ID
2104.05158
arXiv Classification
Computer science
Publication URL
arxiv.org/pdf/2104.0...58.pdf
Publisher
ArXiv
DOI
doi.org/10.48550/ar...04.05158
Paid/Free
Free
Academic Discipline
Computer performance
Computer science
Artificial Intelligence (AI)
Machine learning
Submission Date
April 27, 2022
September 3, 2021
April 12, 2021
April 13, 2021
April 15, 2021
September 15, 2021
February 27, 2023
Author Names
Tyler Graf
Rakesh Komuravelli
Serhat Yilmaz
Srinivas Sridharan
Vijay Rao
Whitney Zhao
Xiaodong Wang
Xing Liu
...
Paper abstract

Deep learning recommendation models (DLRMs) are used across many business-critical services at Facebook and are the single largest AI application in terms of infrastructure demand in its data centers. In this paper we discuss the SW/HW co-designed solution for high-performance distributed training of large-scale DLRMs. We introduce a high-performance scalable software stack based on PyTorch and pair it with the new evolution of the Zion platform, namely ZionEX. We demonstrate the capability to train very large DLRMs with up to 12 trillion parameters and show that we can attain a 40X speedup in terms of time to solution over previous systems. We achieve this by (i) designing the ZionEX platform with a dedicated scale-out network, provisioned with high bandwidth, optimal topology and efficient transport; (ii) implementing an optimized PyTorch-based training stack supporting both model and data parallelism; (iii) developing sharding algorithms capable of hierarchical partitioning of the embedding tables along the row and column dimensions and load balancing them across multiple workers; (iv) adding high-performance core operators while retaining flexibility to support optimizers with fully deterministic updates; and (v) leveraging reduced-precision communications, a multi-level memory hierarchy (HBM+DDR+SSD) and pipelining. Furthermore, we develop and briefly comment on distributed data ingestion and other supporting services that are required for robust and efficient end-to-end training in production environments.
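The embedding-table sharding described in item (iii) can be illustrated with a small sketch. It assumes a hypothetical cost model (a shard's cost is its parameter count, rows times columns), cuts a table into a grid of row and column shards, and then spreads the shards across workers with the greedy longest-processing-time (LPT) heuristic. The names Shard, split_table and balance are illustrative only; this is not the paper's sharding algorithm, and a production sharder would likely also weigh access frequency, pooling factors and communication cost.

```python
from __future__ import annotations

import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Shard:
    table: str
    row_range: tuple[int, int]  # [start, end) rows of the embedding table
    col_range: tuple[int, int]  # [start, end) embedding dimensions

    @property
    def cost(self) -> int:
        # Hypothetical cost model: number of parameters stored in the shard.
        rows = self.row_range[1] - self.row_range[0]
        cols = self.col_range[1] - self.col_range[0]
        return rows * cols


def split_table(name: str, rows: int, dim: int,
                row_parts: int = 1, col_parts: int = 1) -> list[Shard]:
    """Cut one table into a row_parts x col_parts grid of shards."""
    shards = []
    for r in range(row_parts):
        r0, r1 = rows * r // row_parts, rows * (r + 1) // row_parts
        for c in range(col_parts):
            c0, c1 = dim * c // col_parts, dim * (c + 1) // col_parts
            shards.append(Shard(name, (r0, r1), (c0, c1)))
    return shards


def balance(shards: list[Shard], num_workers: int) -> dict[int, list[Shard]]:
    """Greedy LPT: give each shard, largest first, to the least-loaded worker."""
    heap = [(0, w) for w in range(num_workers)]  # (current load, worker id)
    heapq.heapify(heap)
    placement: dict[int, list[Shard]] = {w: [] for w in range(num_workers)}
    for shard in sorted(shards, key=lambda s: s.cost, reverse=True):
        load, w = heapq.heappop(heap)
        placement[w].append(shard)
        heapq.heappush(heap, (load + shard.cost, w))
    return placement


# Hypothetical tables: one large table split 2 x 2, two small tables left whole.
shards = (split_table("big", 1_000_000, 128, row_parts=2, col_parts=2)
          + split_table("medium", 200_000, 64)
          + split_table("small", 50_000, 32))
for worker, assigned in balance(shards, num_workers=4).items():
    print(worker, sum(s.cost for s in assigned), [s.table for s in assigned])
```

With four workers, the four equal shards of the large table land on separate workers and the two smaller tables fill the least-loaded ones. Splitting the large table first is what makes this possible: left whole, it alone would outweigh every other worker's load.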
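Item (v) mentions reduced-precision communications. One simple form, assumed here purely for illustration, is casting FP32 values to IEEE 754 half precision (FP16) before sending them, which halves the payload at the cost of a bounded rounding error (at most 2^-11 relative error for values in FP16's normal range). The sketch uses only Python's struct module; encode_fp16, encode_fp32 and decode_fp16 are hypothetical helper names, and the paper's actual quantization scheme may differ.

```python
import struct


def encode_fp32(values: list[float]) -> bytes:
    """Baseline payload: 4 bytes per value (IEEE 754 single precision)."""
    return struct.pack(f"<{len(values)}f", *values)


def encode_fp16(values: list[float]) -> bytes:
    """Reduced-precision payload: 2 bytes per value (IEEE 754 half precision).

    Values outside the FP16 range (|x| > 65504) make struct raise OverflowError.
    """
    return struct.pack(f"<{len(values)}e", *values)


def decode_fp16(payload: bytes) -> list[float]:
    """Receiver side: widen the half-precision values back to Python floats."""
    return list(struct.unpack(f"<{len(payload) // 2}e", payload))


values = [0.0137 * (i + 1) for i in range(1000)]  # hypothetical embedding values
payload = encode_fp16(values)
decoded = decode_fp16(payload)
max_rel_err = max(abs(d - v) / abs(v) for d, v in zip(decoded, values))
print(len(encode_fp32(values)), len(payload), max_rel_err)  # sizes: 4000 vs 2000 bytes
```

The trade-off shown here is the general one: halving the bytes on the wire in exchange for a rounding error of roughly three decimal digits, which is why such casts are typically applied to communication payloads rather than to the master copy of the weights.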
