Patent attributes
An integrated data pipeline can take advantage of a streaming service, which can handle tasks such as automated redelivery, as well as a processing service, which can allocate workers on a task- or event-specific basis. Event data is aggregated and compressed for delivery by the streaming service. The streaming service can deliver the data asynchronously to the processing service, which can disaggregate and decompress the data to obtain the original data records. The type of event for each record can be determined to determine whether the data should be processed using online and/or offline processing. For online processing the appropriate fields are determined and data extracted to be passed to the online processing services. For offline processing the record data is concatenated sequentially into mini-batches, then compacted into larger batch files that are stored for subsequent offline processing.