Patent attributes
A method for efficiently processing and storing large data sets associated with a multi-stage bioinformatics analysis of genomic data is disclosed. The method increases the efficiency of electronic storage of these data sets by automatically deleting or compressing intermediate data or a portion of the output data, and by compressing input data, where both deletion and compression are based on predetermined characteristics of said data. When necessary, deleted or compressed data can be recovered using metadata that is generated for and associated with the data. This approach not only improves the storage efficiency of very large genomic data sets, but also allows output data to be reproduced consistently by re-processing intermediate data according to the information stored in the metadata.
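
The storage policy described above can be illustrated with a minimal sketch. It is not the disclosed implementation: the `Artifact` record, the `decide` rule, the `delete_above` size threshold, the `.meta.json` sidecar file, and the `producer` field are all hypothetical names chosen for illustration. The sketch assumes that the "predetermined characteristics" are a file's role in the pipeline and its size, and that the metadata records a checksum together with the step that would regenerate the file.

```python
import gzip
import hashlib
import json
import os
import shutil
from dataclasses import dataclass, asdict


@dataclass
class Artifact:
    path: str      # location of the file on disk
    role: str      # "input", "intermediate", or "output"
    producer: str  # hypothetical: the pipeline step that can regenerate the file


def sha256(path):
    """Checksum a file in chunks so large files are never loaded whole."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def decide(artifact, size_bytes, delete_above):
    """Assumed rule: compress inputs, delete large intermediates,
    compress small intermediates, keep outputs untouched."""
    if artifact.role == "input":
        return "compress"
    if artifact.role == "intermediate":
        return "delete" if size_bytes > delete_above else "compress"
    return "keep"


def apply_policy(artifact, delete_above=1024):
    """Apply the rule to one file and write a metadata sidecar next to it."""
    size = os.path.getsize(artifact.path)
    action = decide(artifact, size, delete_above)
    meta = {**asdict(artifact), "size": size,
            "sha256": sha256(artifact.path), "action": action}
    if action == "compress":
        with open(artifact.path, "rb") as src, \
                gzip.open(artifact.path + ".gz", "wb") as dst:
            shutil.copyfileobj(src, dst)
        os.remove(artifact.path)
    elif action == "delete":
        os.remove(artifact.path)
    with open(artifact.path + ".meta.json", "w") as f:
        json.dump(meta, f)
    return meta


def recover(meta):
    """Restore a compressed file and verify its checksum. For deleted data,
    return the recorded producer step so the caller can re-run it."""
    path = meta["path"]
    if meta["action"] == "compress":
        with gzip.open(path + ".gz", "rb") as src, open(path, "wb") as dst:
            shutil.copyfileobj(src, dst)
        if sha256(path) != meta["sha256"]:
            raise ValueError("checksum mismatch after decompression")
        return None
    if meta["action"] == "delete":
        return meta["producer"]
    return None
```

In this sketch a deleted intermediate file is not restored directly. Instead, `recover` returns the recorded producer step, and re-running that step on the retained upstream data is what would reproduce the file. The stored checksum then gives a way to confirm that the regenerated data matches the original, provided the producing step is deterministic.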