
High Performance Data Engineering Everywhere


Is a: Academic paper

Academic Paper attributes

arXiv ID: 2007.09589
arXiv Classification: Computer science
Publication URL: arxiv.org/pdf/2007.0...89.pdf
Publisher: arXiv
DOI: doi.org/10.48550/ar...07.09589
Paid/Free: Free
Academic Discipline: Computer science, Database
Submission Date: July 19, 2020
Author Names: Pulasthi Wickramasinghe, Vibhatha Abeykoon, Thejaka Amila Kanewala, Niranda Perera, Supun Kamburugamuve, Ahmet Uyar, Chathura Widanage, Geoffrey Fox, ...
Paper abstract

The amazing advances being made in the fields of machine and deep learning are a highlight of the Big Data era for both enterprise and research communities. Modern applications require resources beyond a single node's ability to provide. However, this is just a small part of the issues facing the overall data processing environment, which must also support a raft of data engineering for pre- and post-data processing, communication, and system integration. An important requirement of data analytics tools is to be able to easily integrate with existing frameworks in a multitude of languages, thereby increasing user productivity and efficiency. All this demands an efficient and highly distributed integrated approach for data processing, yet many of today's popular data analytics tools are unable to satisfy all these requirements at the same time. In this paper we present Cylon, an open-source high performance distributed data processing library that can be seamlessly integrated with existing Big Data and AI/ML frameworks. It is developed with a flexible C++ core on top of a compact data structure and exposes language bindings to C++, Java, and Python. We discuss Cylon's architecture in detail, and reveal how it can be imported as a library to existing applications or operate as a standalone framework. Initial experiments show that Cylon enhances popular tools such as Apache Spark and Dask with major performance improvements for key operations and better component linkages. Finally, we show how its design enables Cylon to be used cross-platform with minimum overhead, which includes popular AI tools such as PyTorch, TensorFlow, and Jupyter notebooks.
