Added KB page about duperemove

Michele Cereda
2022-05-16 02:09:09 +02:00
parent 47f004dcf5
commit 535cf70918

@@ -0,0 +1,44 @@
# Duperemove
Finds duplicated extents and submits them for deduplication.
When given a list of files, `duperemove` hashes their contents block by block and compares them.
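As a rough illustration of block-by-block comparison, the sketch below splits files into fixed-size blocks, hashes each block, and reports the blocks whose hash occurs more than once. It is not how `duperemove` is implemented: `block_hashes`, `duplicate_blocks` and the `BLOCK_SIZE` variable are hypothetical names, SHA-256 via `sha256sum` is an arbitrary stand-in for the tool's own hash, and the 128 KiB default is assumed to mirror `duperemove`'s default block size.

```shell
# Print "<hash> <file> <block index>" for every block of a file.
# Conceptual only: duperemove does this natively and far more efficiently.
block_hashes() {
  file=$1
  block_size=${2:-131072}  # 128 KiB
  size=$(wc -c < "$file")
  blocks=$(( (size + block_size - 1) / block_size ))
  i=0
  while [ "$i" -lt "$blocks" ]; do
    hash=$(dd if="$file" bs="$block_size" skip="$i" count=1 2>/dev/null | sha256sum | cut -d ' ' -f 1)
    printf '%s %s %s\n' "$hash" "$file" "$i"
    i=$((i + 1))
  done
}

# Print the blocks whose hash occurs more than once across the given files.
# BLOCK_SIZE is a hypothetical environment variable used by this sketch only.
duplicate_blocks() {
  for f in "$@"; do
    block_hashes "$f" "${BLOCK_SIZE:-131072}"
  done | sort | awk '{ count[$1]++; line[NR] = $0; key[NR] = $1 }
    END { for (n = 1; n <= NR; n++) if (count[key[n]] > 1) print line[n] }'
}
```

Running `duplicate_blocks path/to/file1 path/to/file2` lists every block that shares its hash with at least one other block, which is the kind of candidate `duperemove` would consider for deduplication.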
When given the `-d` option, `duperemove` also submits duplicated extents for deduplication using the Linux kernel extent-same ioctl.
`duperemove` can store the hashes it computes in a hashfile. If given an existing hashfile as input, it only computes hashes for files that have changed since the last run. This lets you run `duperemove` repeatedly on your data as it changes, without having to re-checksum unchanged data.
`duperemove` can also take input from `fdupes`, given the `--fdupes` option.
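A minimal sketch of the `fdupes` hand-off, assuming both tools are installed and that the output of `fdupes -r` is piped into `duperemove --fdupes` on standard input. The `dedupe_with_fdupes` function name is hypothetical; the function refuses to run when either tool is missing rather than failing midway.

```shell
# Deduplicate the identical files that fdupes finds in the given directory.
# Returns 127 without touching anything if fdupes or duperemove is missing.
dedupe_with_fdupes() {
  dir=${1:?usage: dedupe_with_fdupes path/to/directory}
  for tool in fdupes duperemove; do
    if ! command -v "$tool" > /dev/null 2>&1; then
      echo "$tool not found" >&2
      return 127
    fi
  done
  # fdupes lists groups of identical files; duperemove reads them from stdin.
  fdupes -r "$dir" | duperemove --fdupes
}
```

Call it as `dedupe_with_fdupes path/to/directory`.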
## TL;DR
```shell
# Recursively search for duplicated extents in a directory.
duperemove -hr path/to/directory
# Recursively deduplicate duplicated extents on a Btrfs or XFS filesystem.
# XFS deduplication is still experimental at the time of writing.
duperemove -Adhr path/to/directory
# Store extent hashes in a file.
# Uses less memory, and the hashfile can be reused on subsequent runs.
duperemove -Adhr --hashfile=path/to/hashfile path/to/directory
# List the files tracked by a hashfile.
duperemove -L --hashfile=path/to/hashfile
# Limit threads.
# I/O threads are used for hashing and in the deduplication stage.
# CPU threads are used in the duplicate extent finding stage.
duperemove -Adhr --hashfile=path/to/hashfile --io-threads=N --cpu-threads=N path/to/directory
```
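To make repeated runs convenient, the hashfile command above can be wrapped in a small function that always reuses the same hashfile. This is only a sketch: `dedupe_incremental`, the `DRY_RUN` variable and the default hashfile location under `~/.cache` are hypothetical choices, not anything `duperemove` prescribes.

```shell
# Deduplicate a directory, reusing one hashfile so that unchanged files are
# not hashed again on later runs. Set DRY_RUN=1 to only print the command.
dedupe_incremental() {
  dir=${1:?usage: dedupe_incremental path/to/directory [path/to/hashfile]}
  hashfile=${2:-$HOME/.cache/duperemove.hash}  # hypothetical default location
  set -- duperemove -dhr --hashfile="$hashfile" "$dir"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    mkdir -p "$(dirname "$hashfile")" && "$@"
  fi
}
```

`DRY_RUN=1 dedupe_incremental path/to/directory` prints the command that would run, which is a cheap way to check the paths before deduplicating anything.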
## Sources
- [Website]
- [cheat.sh]
- [manpage]
[cheat.sh]: https://cheat.sh/duperemove
[manpage]: https://markfasheh.github.io/duperemove/duperemove.html
[website]: https://markfasheh.github.io/duperemove/