Patent attributes
Methods for filtering sequence reads affected by index hopping are provided. Each read in a plurality of reads from a multiplexed reaction comprises an insert portion, a first non-insert portion (molecular identifier), and a second non-insert portion (sample index). For each of a plurality of hashes, a hash data structure is formed that holds a representation of each read, where each representation comprises a hash of the first non-insert portion of the corresponding read. Read pairs are then identified in the hash data structures; each pair consists of a first and a second read that share a common hash value but differ in index value. For each such pair, an entry comprising the first and second non-insert portions of both reads is added to a heterogeneous data structure. Reads whose first non-insert portion value appears more than a threshold number of times in the heterogeneous data structure are removed from the plurality of reads.
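The filtering steps described in the abstract can be illustrated with a minimal Python sketch. It is not the claimed implementation. The names `Read` and `filter_index_hopped`, the use of salted BLAKE2b digests as the "plurality of hashes", the plain tuple standing in for an entry of the heterogeneous data structure, and the choice to record each read pair only once across all hashes are assumptions made for illustration.

```python
import hashlib
from collections import Counter, defaultdict
from itertools import combinations
from typing import NamedTuple


class Read(NamedTuple):
    """Hypothetical read record; field names are illustrative."""
    insert: str
    molecular_id: str   # first non-insert portion
    sample_index: str   # second non-insert portion


def filter_index_hopped(reads, salts=(b"h0", b"h1"), threshold=1):
    """Drop reads whose molecular identifier appears in more than
    `threshold` cross-index pair entries (assumed interpretation)."""
    entries = {}  # (i, j) read positions -> entry tuple, one per pair

    for salt in salts:
        # One hash data structure per hash: digest -> positions of reads.
        buckets = defaultdict(list)
        for i, read in enumerate(reads):
            digest = hashlib.blake2b(
                read.molecular_id.encode(), digest_size=8, salt=salt
            ).digest()
            buckets[digest].append(i)

        # Pairs share a hash value but carry different sample indexes.
        for members in buckets.values():
            for i, j in combinations(members, 2):
                a, b = reads[i], reads[j]
                if a.sample_index != b.sample_index:
                    entries[(i, j)] = (
                        a.molecular_id, a.sample_index,
                        b.molecular_id, b.sample_index,
                    )

    # Count how many entries mention each molecular identifier.
    counts = Counter()
    for mid_a, _, mid_b, _ in entries.values():
        counts.update({mid_a, mid_b})

    return [r for r in reads if counts[r.molecular_id] <= threshold]
```

Calling `filter_index_hopped(reads, threshold=1)` on a list of `Read` records returns the reads that survive the filter. Pairwise comparison inside each bucket is quadratic in bucket size, which is acceptable for a sketch but would likely need a more economical scheme on real sequencing data.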