Data, including data transfers, is managed in a cloud computing environment. File-level and block-level similarities are identified among files, including archive and nested archive files, that reside in datacenters and regional repositories. In response to a replication instruction, a replication plan is generated based on similarity clusters; under the plan, unique data blocks and files are transferred from the best available sources, including regional repositories.
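
The block-level part of such a replication plan might be sketched as follows. This is only an illustration under stated assumptions, not the claimed method: fixed-size blocks, SHA-256 block fingerprints, a per-source numeric transfer cost, and the names `block_hashes`, `plan_replication`, `origin_dc`, and `regional_repo` are all hypothetical and do not come from the abstract.

```python
import hashlib
from typing import Dict, List, Set

BLOCK_SIZE = 4  # assumption: tiny fixed-size blocks, chosen only for illustration


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> List[str]:
    """Split data into fixed-size blocks and fingerprint each block with SHA-256."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def plan_replication(
    data: bytes,
    target_blocks: Set[str],
    sources: Dict[str, Set[str]],
    cost: Dict[str, int],
    block_size: int = BLOCK_SIZE,
) -> Dict[str, str]:
    """Map each block missing at the target to the cheapest source that holds it.

    target_blocks: fingerprints already present at the destination.
    sources: source name -> fingerprints that source holds.
    cost: source name -> hypothetical transfer cost (lower is better).
    """
    plan: Dict[str, str] = {}
    for h in block_hashes(data, block_size):
        if h in target_blocks or h in plan:
            continue  # already at the target, or already planned: nothing to transfer
        holders = [name for name, held in sources.items() if h in held]
        if holders:
            plan[h] = min(holders, key=lambda name: cost[name])
        # Blocks held by no source are left out of the plan in this sketch.
    return plan


if __name__ == "__main__":
    data = b"AAAABBBBCCCCAAAA"
    fp = lambda b: hashlib.sha256(b).hexdigest()
    plan = plan_replication(
        data,
        target_blocks={fp(b"AAAA")},
        sources={
            "origin_dc": {fp(b"AAAA"), fp(b"BBBB"), fp(b"CCCC")},
            "regional_repo": {fp(b"BBBB")},
        },
        cost={"origin_dc": 10, "regional_repo": 1},
    )
    for fingerprint, source in plan.items():
        print(fingerprint[:8], "<-", source)
```

In this sketch the duplicated block is planned at most once, blocks already at the destination are skipped, and a block available from the regional repository is fetched there rather than from the more expensive origin datacenter.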