Systems and methods are provided for the online maintenance, processing, and querying of large random samples of data drawn from a large or infinite data stream. In an illustrative implementation, an exemplary computing environment comprises at least one data store and a data storage and management engine operable to process and/or store data on a cooperating data store (e.g., flash media) according to a selected data processing and storage management paradigm. The exemplary data storage and management engine can deploy an exemplary sampling algorithm that provides one or more of the following operations/features: the algorithm operates on streaming data (or a single pass through the dataset); it allows semi-random data write operations; it avoids operations (e.g., in-place updates) that are expensive on flash storage media; and it is tunable to both the amount of flash storage and the amount of standard memory (DRAM) available to it.
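To make the listed properties concrete, here is a minimal sketch of one way to keep a uniform random sample of size k from a stream in a single pass without in-place updates. It is not the patented algorithm. It substitutes a standard technique, bottom-k priority sampling: each item gets a random priority and the sample is the k items with the smallest priorities. The class name `AppendOnlySampler`, the parameters `buffer_capacity` (a stand-in for the DRAM budget) and `flash_capacity` (the log size that triggers compaction), and the in-memory list `log` that simulates flash storage are all hypothetical illustrations. In this sketch, writes to the simulated flash are either batched appends or a sequential rewrite during compaction.

```python
import heapq
import random
from typing import Generic, List, Optional, Tuple, TypeVar

T = TypeVar("T")


class AppendOnlySampler(Generic[T]):
    """Hypothetical single-pass sampler using bottom-k priority sampling.

    Every stream item gets an independent random priority in [0, 1); the
    sample is the k items with the smallest priorities. The simulated flash
    log is only appended to, or rewritten sequentially during compaction;
    records are never updated one at a time in place.
    """

    def __init__(self, k: int, buffer_capacity: int, flash_capacity: int,
                 seed: Optional[int] = None) -> None:
        if k < 1 or buffer_capacity < 1 or flash_capacity <= k:
            raise ValueError("need k >= 1, buffer_capacity >= 1, flash_capacity > k")
        self.k = k
        self.buffer_capacity = buffer_capacity  # assumed DRAM budget, in records
        self.flash_capacity = flash_capacity    # assumed log size that triggers compaction
        self.rng = random.Random(seed)
        self.buffer: List[Tuple[float, T]] = []  # in-memory staging area
        self.log: List[Tuple[float, T]] = []     # stands in for flash storage
        self.threshold = float("inf")            # items at or above this are dropped

    def insert(self, item: T) -> None:
        priority = self.rng.random()
        if priority >= self.threshold:
            # k retained items already have priorities no larger than this one,
            # so it cannot enter the final bottom-k sample (ties are ~impossible).
            return
        self.buffer.append((priority, item))
        if len(self.buffer) >= self.buffer_capacity:
            self._flush()

    def _flush(self) -> None:
        # One batched, append-only write of the whole DRAM buffer.
        self.log.extend(self.buffer)
        self.buffer.clear()
        if len(self.log) >= self.flash_capacity:
            self._compact()

    def _compact(self) -> None:
        # Keep the k smallest priorities and write them out as a fresh log.
        self.log = heapq.nsmallest(self.k, self.log, key=lambda rec: rec[0])
        self.threshold = self.log[-1][0]  # only ever decreases

    def sample(self) -> List[T]:
        # Merge flushed and still-buffered candidates; return the bottom k.
        candidates = self.log + self.buffer
        best = heapq.nsmallest(self.k, candidates, key=lambda rec: rec[0])
        return [item for _, item in best]


if __name__ == "__main__":
    sampler: AppendOnlySampler[int] = AppendOnlySampler(
        k=100, buffer_capacity=32, flash_capacity=200, seed=7)
    for x in range(100_000):
        sampler.insert(x)
    print(len(sampler.sample()))  # 100
```

In this sketch the two knobs mirror the tunability described above. A larger `buffer_capacity` uses more memory but makes each flush a larger sequential write. A larger `flash_capacity` uses more (simulated) flash space but compacts less often. Dropping items whose priority is at or above the current threshold is safe because the threshold is the k-th smallest priority among items already seen, and it can only decrease as the stream continues.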