Data integration


Data integration is the process of unifying data from multiple sources across an organization to provide a comprehensive and accurate dataset. The field comprises the practices, tools, and architectural techniques used to achieve consistent access to data across different subject areas and structure types in the organization, meeting the requirements of all business applications and processes. Data integration includes data replication, ingestion, and transformation to combine different data types into standardized formats for storage in a target repository, such as a data warehouse, data lake, or data lakehouse.

...

Data integration aims to provide a range of benefits to organizations, enabling them to make better-informed decisions, streamline operations, and gain a competitive advantage. The process breaks down data silos (isolated data sources), eliminating redundancies and inconsistencies through a unified and comprehensive view of the organization's data. Transformation and cleansing processes associated with data integration improve data quality by identifying and correcting errors. Integrated data sets facilitate smoother business practices, reducing manual data entry. Data integration simplifies data access for analysis, leading to faster decision-making. Data integration is a fundamental part of business intelligence and data-driven innovation initiatives.

...

Traditionally, data integration tools have been delivered via a set of related markets, with vendors offering a specific style of tool. The most popular in recent years is the ETL (extract, transform, load) tool market. Vendors offering tools optimized for a particular style of data integration have led to fragmentation in the data integration market. This complicates data integration processes in large enterprises, where different teams rely on different tools, resulting in significant overlap and redundancy without common management of metadata. However, data integration submarkets have been converging at the vendor and technology level, enabling organizations to take a more holistic approach with a common set of data integration capabilities across the enterprise.

...

Data integration includes a combination of technical processes, tools, and strategies to bring data together from disparate sources, transforming it into a unified and usable format for meaningful analysis and decision-making. An overview of a typical data integration process can include the following steps (a simplified sketch appears after the list):

  • Data source identification—identifying the data sources that need to be integrated, such as databases, spreadsheets, cloud services, APIs, legacy systems, and others.
  • Data validation and quality assurance—validation checks for errors and inconsistencies and verifies data integrity to ensure accuracy and quality. Quality assurance processes are implemented to maintain data accuracy and reliability.
  • Data transformation—extracted data is converted and structured into a common format that ensures consistency, accuracy, and compatibility. This process may include data cleansing, data enrichment, and data normalization.
  • Metadata management—improving discoverability and usability so users can more easily understand the data’s context, source, and meaning.
...
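As a rough illustration of these steps, the Python sketch below (with hypothetical data and field names) extracts rows from a CSV source, validates them, and transforms them into a normalized common format:

```python
import csv
import io

# Hypothetical raw source: a CSV export with messy, inconsistent values.
RAW_CSV = "id,email,signup_date\n1, ALICE@EXAMPLE.COM ,2024-02-21\n2,,2024-02-22\n"

def extract(source: str) -> list[dict]:
    # Data source identification/extraction: read rows from the CSV source.
    return list(csv.DictReader(io.StringIO(source)))

def validate(rows: list[dict]) -> list[dict]:
    # Validation and quality assurance: drop rows missing required fields.
    return [r for r in rows if r["email"].strip()]

def transform(rows: list[dict]) -> list[dict]:
    # Transformation: cleanse and normalize into a common format.
    return [{"id": int(r["id"]),
             "email": r["email"].strip().lower(),
             "signup_date": r["signup_date"]} for r in rows]

records = transform(validate(extract(RAW_CSV)))
print(records)  # [{'id': 1, 'email': 'alice@example.com', 'signup_date': '2024-02-21'}]
```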

There are multiple approaches to data integration, each with its own strengths and weaknesses. Selecting the best data integration method depends on a number of factors, including the organization's data needs, technology landscape, performance requirements, and budget constraints. Common approaches include the following:

...

To implement these processes, data engineers, architects, and developers either manually code an architecture using SQL or set up and manage a data integration tool to streamline development and automate the system.

...

An ETL pipeline transforms the data before loading it into the storage system, converting raw data to match the new system via three steps: extract, transform, and load. The data transformation in the ETL process takes place outside of the data storage system, typically in a separate staging area. This allows for fast and accurate data analysis in the target system and is most appropriate for small datasets that require complex transformations, or in scenarios where data quality is the most important factor, as it can include rigorous data cleaning and validation steps. Change data capture (CDC) is a popular method of ETL and refers to the process of identifying and capturing changes made to a database.
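A minimal sketch of this flow, using Python's built-in sqlite3 module as a stand-in target warehouse; the source rows and table name are hypothetical, and the key point is that the transformation happens before the load:

```python
import sqlite3

# Hypothetical source rows pulled from an operational system.
source_rows = [("1", " Ada Lovelace "), ("2", "grace hopper")]

# Transform: happens in a staging step, *outside* the target system.
staged = [(int(i), name.strip().title()) for i, name in source_rows]

# Load: only cleaned, validated rows reach the target.
target = sqlite3.connect(":memory:")
target.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
target.executemany("INSERT INTO customers VALUES (?, ?)", staged)
print(target.execute("SELECT * FROM customers").fetchall())
# [(1, 'Ada Lovelace'), (2, 'Grace Hopper')]
```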

...

ELT is a more modern approach to data integration: the data is loaded immediately and then transformed within the target system. This can include cleaning, aggregating, or summarizing the data. ELT is more appropriate for large datasets that need to be integrated quickly. ELT operates on either a micro-batch or change data capture (CDC) timescale: micro-batch loads only the data modified since the last successful load, while CDC continually loads data as and when it changes on the source.
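The sketch below illustrates the micro-batch pattern with the same sqlite3 stand-in; the watermark value, table names, and rows are hypothetical. Raw rows are loaded first, and the transformation runs inside the target using its own SQL engine:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE raw_events (id INTEGER, amount TEXT, modified_at TEXT)")
db.execute("CREATE TABLE daily_totals (day TEXT, total REAL)")

last_loaded = "2024-02-20T00:00:00"  # watermark from the last successful load
new_rows = [(1, "19.99", "2024-02-21T09:00:00"), (2, "5.00", "2024-02-21T10:30:00")]

# Load: only rows modified since the last successful load (micro-batch).
db.executemany("INSERT INTO raw_events VALUES (?, ?, ?)",
               [r for r in new_rows if r[2] > last_loaded])

# Transform: runs *within* the target system, after the load.
db.execute("""
    INSERT INTO daily_totals
    SELECT substr(modified_at, 1, 10) AS day, SUM(CAST(amount AS REAL))
    FROM raw_events GROUP BY day
""")
print(db.execute("SELECT * FROM daily_totals").fetchall())
# [('2024-02-21', 24.99)]
```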

...

Streaming data integration continuously moves data in real time from the source to the target storage system. Streaming involves capturing and processing data as it becomes available in the source system and immediately integrating it into the target system. It is commonly used in scenarios that require up-to-date insights, such as real-time analytics.
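A simplified sketch of the streaming pattern: each event is transformed and written to the target the moment it arrives, rather than in batches. The event stream here is simulated with a Python generator; a production system would typically read from a message broker or a CDC feed (an assumption, not specified in this article):

```python
import time

def change_stream():
    # Simulated source: yields change events as they occur.
    for event in [{"user": "alice", "action": "login"},
                  {"user": "bob", "action": "purchase"}]:
        yield event
        time.sleep(0.1)  # stand-in for waiting on the source system

target = []  # stand-in for the target storage system

for event in change_stream():
    event["ingested"] = True  # per-event transformation
    target.append(event)      # immediate integration into the target
    print("integrated:", event)
```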

...

Data virtualization creates a virtual layer to provide a unified view of data from different sources, regardless of where the data resides. Organizations can access and query integrated data in real time without the need for physical data movement. It is well-suited to scenarios where agility and real-time access to integrated data are crucial, or where transactional systems need high-performance queries.
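A toy sketch of the idea: the virtual view below is computed at query time from two live in-memory stand-ins for source systems, so no data is physically copied into a central store:

```python
# Two hypothetical source systems, left in place.
crm = [{"id": 1, "name": "Acme"}]           # source A: customer records
billing = {1: {"id": 1, "balance": 250.0}}  # source B: billing records

def virtual_customers():
    # Unified view assembled on demand from the live sources.
    for row in crm:
        yield {**row, **billing.get(row["id"], {})}

print(list(virtual_customers()))
# [{'id': 1, 'name': 'Acme', 'balance': 250.0}]
```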
