Patent attributes
Deduplication is integrated with software building and chunk storing. A dedup module includes dedup software, a build graph interface, and a chunk store interface. A dedup graph includes a portion of the build graph and a portion that represents build artifact file chunks. The dedup software queries whether chunks are present in the chunk store, submits a chunk for storage when the chunk is not already present, and avoids submitting the chunk when it is present. Queries may use, for example, hash comparisons, a hash tree dedup graph, chunk expiration dates, content addressable chunk store memory, inference of a child node's presence, recursion, and a local cache of node hashes and node expiration dates. A change caused by the build affects fewer dedup graph nodes than directory graph nodes, so fewer storage operations are needed to update the chunk storage with new or changed build artifacts.
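
The query-then-submit behavior described above can be illustrated with a minimal sketch. It is not the patented implementation: the names Node, ChunkStore, and dedup are hypothetical, the chunk store is an in-memory dictionary standing in for content addressable storage, SHA-256 is an assumed hash, and expiration dates are omitted for brevity. The sketch assumes the store never holds a parent node without its children, which is what lets a present parent imply present children.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical dedup graph node: a leaf carries chunk bytes,
    an interior node carries child nodes (a hash tree)."""
    data: bytes = b""
    children: list = field(default_factory=list)

    def hash(self) -> str:
        h = hashlib.sha256()
        if self.children:
            # An interior node's hash covers the hashes of its children.
            h.update(b"node:")
            for child in self.children:
                h.update(child.hash().encode())
        else:
            h.update(b"leaf:")
            h.update(self.data)
        return h.hexdigest()

    def payload(self) -> bytes:
        # What gets stored: chunk bytes for a leaf, child hashes otherwise.
        if self.children:
            return "\n".join(c.hash() for c in self.children).encode()
        return self.data


class ChunkStore:
    """In-memory stand-in for a content addressable chunk store."""

    def __init__(self):
        self.blobs = {}
        self.queries = 0
        self.submissions = 0

    def contains(self, key: str) -> bool:
        self.queries += 1
        return key in self.blobs

    def put(self, key: str, payload: bytes) -> None:
        self.submissions += 1
        self.blobs[key] = payload


def dedup(node: Node, store: ChunkStore, known: set) -> None:
    """Store `node` and its descendants, skipping anything already present.
    `known` is a local cache of hashes known to be in the store."""
    key = node.hash()
    if key in known:
        return  # local cache hit: no query, no submission
    if store.contains(key):
        # Assumed invariant: a stored parent implies stored children,
        # so the subtree is skipped without further queries.
        known.add(key)
        return
    for child in node.children:
        dedup(child, store, known)  # children are stored before the parent
    store.put(key, node.payload())
    known.add(key)


store, known = ChunkStore(), set()
a, b = Node(data=b"artifact A"), Node(data=b"artifact B")
dedup(Node(children=[a, b]), store, known)
first = store.submissions  # two leaves plus the root
dedup(Node(children=[a, Node(data=b"artifact B v2")]), store, known)
print(first, store.submissions - first)  # prints "3 2"
```

In this toy run the second build changes one artifact, so only the changed leaf and the root above it are submitted; the unchanged leaf is skipped through the local cache.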