Data lake

Is a: Technology, Industry

Industry attributes

Parent Industry: Data management

Other attributes

Wikidata ID: Q20707560

A data lake is a method of storing data of all types and schemas in raw, unstructured form at a central location. Data lakes retain structured, semi-structured, and unstructured (raw) data. They are typically used in data science and place fewer restrictions on data analysis than data warehouses do. The phrase 'data lake' is credited to James Dixon, CTO of Pentaho, who explains data lakes using the following analogy:

If you think of a datamart as a store of bottled water – cleansed and packaged and structured for easy consumption – the data lake is a large body of water in a more natural state. The contents of the data lake stream in from a source to fill the lake, and various users of the lake can come to examine, dive in, or take samples.

Advantages of data lakes over data warehouses include: they retain all data, they support all data types, they support all users, they are easier to change than data warehouses, and they generally yield insights from data analytics faster than data warehouses.
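The idea of retaining every data type in raw form at one central location can be illustrated with a toy example. The following is only a minimal sketch, assuming a local directory stands in for the lake; the function names (`ingest`, `read_records`, `demo`), the folder layout, and the sample files are hypothetical and not taken from any vendor's product. Files are stored byte-for-byte on arrival, and structure is applied only when a consumer reads them.

```python
import csv
import io
import json
import tempfile
from pathlib import Path


def ingest(lake_root: Path, source: str, name: str, payload: bytes) -> Path:
    """Store the payload unchanged under <lake_root>/<source>/<name>."""
    target = lake_root / source / name
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(payload)  # no parsing, cleansing, or schema check
    return target


def read_records(path: Path) -> list:
    """Interpret a raw file only at the moment a consumer asks for it."""
    raw = path.read_bytes()
    if path.suffix == ".json":  # semi-structured data
        return json.loads(raw)
    if path.suffix == ".csv":  # structured data
        return list(csv.DictReader(io.StringIO(raw.decode("utf-8"))))
    # Unstructured data is handed back as the bytes that were stored.
    return [{"raw_bytes": raw}]


def demo() -> dict:
    """Ingest three hypothetical files of different kinds, then read them."""
    with tempfile.TemporaryDirectory() as tmp:
        lake = Path(tmp)
        ingest(lake, "crm", "customers.csv", b"id,name\n1,Ada\n2,Lin\n")
        ingest(lake, "web", "events.json", b'[{"user": 1, "action": "login"}]')
        ingest(lake, "support", "call.bin", b"\x00\x01\x02")
        files = sorted(p for p in lake.rglob("*") if p.is_file())
        return {p.name: read_records(p) for p in files}


if __name__ == "__main__":
    for name, records in demo().items():
        print(name, records)
```

In this sketch nothing is rejected or reshaped at ingestion time, which is what distinguishes it from the cleansed and packaged data of the datamart in Dixon's analogy; a real data lake would typically sit on distributed or object storage rather than a local folder.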

Notable companies managing and offering data lakes include: Amazon Web Services (AWS), IBM, Informatica, Qlik, Unifi Software, and Zaloni.

