Patent attributes
The disclosed embodiments provide techniques for performing deduplication for a distributed filesystem. Two or more cloud controllers collectively manage distributed filesystem data that is stored in one or more cloud storage systems; the cloud controllers cache and ensure data consistency for the stored data. During operation, a cloud controller receives an incremental metadata snapshot that references new data that was added to the distributed filesystem by a remote cloud controller. The cloud controller extracts a set of deduplication information from this incremental metadata snapshot. Upon receiving a subsequent client write request (e.g., a file write that includes one or more data blocks), the cloud controller uses the extracted deduplication information to determine that one or more data blocks in the client write request have already been written to the distributed filesystem.
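The flow described above can be illustrated with a minimal sketch. It is not the disclosed implementation: the class `CloudController`, the methods `ingest_snapshot` and `handle_write`, the snapshot layout (a list of entries with `fingerprint` and `location` fields), and the use of SHA-256 as the block fingerprint are all assumptions chosen for illustration.

```python
import hashlib
from dataclasses import dataclass, field


def fingerprint(block: bytes) -> str:
    """Content hash used as the deduplication key (SHA-256 is an assumption)."""
    return hashlib.sha256(block).hexdigest()


@dataclass
class CloudController:
    """Hypothetical cloud controller holding a local deduplication table."""
    name: str = "cc"
    # Maps block fingerprint -> location of the block already stored in the cloud.
    dedup_table: dict = field(default_factory=dict)
    next_id: int = 0

    def ingest_snapshot(self, snapshot: list) -> None:
        """Extract deduplication information from an incremental metadata snapshot.

        Each entry is assumed to be a dict with 'fingerprint' and 'location' keys
        describing a block written by a remote cloud controller.
        """
        for entry in snapshot:
            # Keep the first known location for a given fingerprint.
            self.dedup_table.setdefault(entry["fingerprint"], entry["location"])

    def handle_write(self, blocks: list) -> tuple:
        """Handle a client write; return (block references, newly stored locations)."""
        refs, uploaded = [], []
        for block in blocks:
            fp = fingerprint(block)
            location = self.dedup_table.get(fp)
            if location is None:
                # Block is new to the filesystem: assign a location and record it.
                location = f"{self.name}/block-{self.next_id}"
                self.next_id += 1
                self.dedup_table[fp] = location
                uploaded.append(location)
            # Duplicate blocks simply reference the existing location.
            refs.append(location)
        return refs, uploaded


# A remote controller previously wrote a block containing b"hello".
snapshot = [{"fingerprint": fingerprint(b"hello"), "location": "remote/block-7"}]

local = CloudController(name="local")
local.ingest_snapshot(snapshot)
refs, uploaded = local.handle_write([b"hello", b"world"])
```

In this sketch, the block `b"hello"` resolves to the location reported by the remote controller's snapshot, so only `b"world"` is treated as new data. A real system would also need to handle the consistency and caching concerns mentioned above, which the sketch omits.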