Patent 11531484 was granted by the United States Patent and Trademark Office and assigned to Twitter in December 2022.
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for data retention and modification. One of the methods includes dividing partitions of a distributed dataset into a set of generations according to a retention policy; accumulating modification and deletion events that define changes to be applied to data of the distributed dataset; and when a triggering event occurs for a triggered generation in the set of generations, rolling an oldest partition out of the triggered generation, the rolling comprising: if the oldest partition has reached the end of a retention period for the dataset, marking the oldest partition for deletion in the triggered generation; otherwise: creating a new partition corresponding to the data of the oldest partition, wherein the data is cleaned using a scrubbing process; adding the new partition to a next generation in the set of generations; and marking the oldest partition for deletion in the triggered generation.
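The rolling step described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `Partition`, `Generation`, `scrub`, and `roll_oldest`, the tuple format of the events, the integer `age` field, and the choice to raise an error when no next generation exists are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    name: str
    age: int                      # age in hypothetical retention-policy units
    rows: dict                    # key -> value; stands in for partition data
    marked_for_deletion: bool = False


@dataclass
class Generation:
    # Partitions ordered oldest first (an assumption of this sketch).
    partitions: list = field(default_factory=list)


def scrub(rows, events):
    """Return a cleaned copy of rows with the accumulated events applied.

    Events are assumed to be ("delete", key) or ("modify", key, value).
    """
    cleaned = dict(rows)
    for event in events:
        if event[0] == "delete":
            cleaned.pop(event[1], None)
        elif event[0] == "modify" and event[1] in cleaned:
            cleaned[event[1]] = event[2]
    return cleaned


def roll_oldest(generations, index, events, retention_period):
    """Roll the oldest live partition out of generations[index].

    Returns the new partition added to the next generation, or None if
    the oldest partition had reached the end of the retention period.
    """
    triggered = generations[index]
    # The oldest partition is the first one not already marked for deletion.
    oldest = next(p for p in triggered.partitions if not p.marked_for_deletion)

    if oldest.age >= retention_period:
        # End of retention: the data is not carried forward.
        oldest.marked_for_deletion = True
        return None

    if index + 1 >= len(generations):
        # Assumption: the last generation only holds partitions that age out.
        raise ValueError("no next generation to roll into")

    # Otherwise, copy the data through the scrubbing process into a new
    # partition, add it to the next generation, and mark the old one.
    new_partition = Partition(
        name=oldest.name + "-scrubbed",
        age=oldest.age,
        rows=scrub(oldest.rows, events),
    )
    generations[index + 1].partitions.append(new_partition)
    oldest.marked_for_deletion = True
    return new_partition
```

In this sketch the old partition is only marked, not removed, so a separate cleanup pass (not shown) would be responsible for physically deleting marked partitions. Calling `roll_oldest(generations, 0, events, retention_period)` when a trigger fires for the first generation would scrub that generation's oldest partition and move the cleaned copy into the second.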