Commit

Merge 'develop' into 'mfwitten/0007/prepend-paths'
mfwitten committed Jan 23, 2019
2 parents c0ba1b5 + 5a5cf89 commit 0d332c9
Showing 23 changed files with 364 additions and 162 deletions.
2 changes: 1 addition & 1 deletion .version
@@ -1 +1 @@
2.7.0 Toothless Taipan
2.8.0 Maidenly Moose
6 changes: 5 additions & 1 deletion CHANGELOG.md
@@ -4,7 +4,7 @@ All notable changes to this project will be documented in this file.

The format follows [keepachangelog.com]. Please stick to it.

## [2.8.0 Maidenly Moose] -- Unreleased
## [2.8.0 Maidenly Moose] -- 2018-10-30

Mostly a bugfix release with smaller functional changes.

@@ -23,6 +23,7 @@ Mostly a bugfix release with smaller functional changes.
This was done so as not to overwrite the results of long runs.
You can use --no-backup to disable this behaviour.
- Several internal cleanups and potential bug fixes (thanks to Michael Witten)
- Change the default optimization level for a build to -O2.

### Deprecated

@@ -43,6 +44,9 @@ Nothing was removed.
- Fix a bug when doing "rmlint --replay x.json" without an explicit path.
- Fix -f that did not really follow symbolic links.
- gui: locations are now stored persistently and survive restarts.
- scons should work now with both python2 and python3.
- Fix excessive memory allocation with slow CPUs.
- Do not use --remove-destination of cp, but use "rm + ln" to support non-GNU systems.

## [2.7.0 Toothless Taipan] -- 2017-04-25

4 changes: 1 addition & 3 deletions SConstruct
@@ -470,7 +470,7 @@ else:

# If the output is not a terminal, remove the COLORS
if not sys.stdout.isatty():
for key, value in COLORS.iteritems():
for key, value in COLORS.items():
COLORS[key] = ''

# Configure the actual colors to our liking:
@@ -663,8 +663,6 @@ else:

# check _mm_crc32_u64 (SSE4.2) support:
conf.check_mm_crc32_u64()
if conf.env['HAVE_MM_CRC32_U64']:
conf.env.Append(CCFLAGS=['-msse4.2'])

if 'clang' in os.path.basename(conf.env['CC']):
conf.env.Append(CCFLAGS=['-fcolor-diagnostics']) # Colored warnings
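The SConstruct hunk above swaps ``iteritems()`` for ``items()``, which is the standard Python 2/3 compatibility fix: ``dict.iteritems()`` was removed in Python 3, while ``dict.items()`` exists in both (a list in Python 2, a view in Python 3). A minimal sketch of the pattern the build script uses:

```python
# dict.iteritems() no longer exists in Python 3; dict.items() works in
# both major versions, which is why the SConstruct hunk switches to it.
COLORS = {'red': '\x1b[31m', 'green': '\x1b[32m'}

# Blank out the color codes, as the build script does when stdout is
# not a terminal. Reassigning existing keys during view iteration is
# safe because the dict's size does not change.
for key, value in COLORS.items():
    COLORS[key] = ''

print(COLORS)  # {'red': '', 'green': ''}
```

Iterating while *adding or removing* keys would raise a ``RuntimeError`` in Python 3; only in-place value updates like the one above are safe.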
2 changes: 1 addition & 1 deletion docs/gui.rst
@@ -73,7 +73,7 @@ to delete this file, a green checkmark will make it keep it.
The user can edit those to his liking.

Additionally, the view can be filtered after a search query. In the simplest
case this filters by a path element, in more complex usecases you can also
case this filters by a path element, in more complex use cases you can also
filter by size, mtime and twincount. The latter can be done by adding
``size:10K`` or ``size:1M-2M,3M-4M`` to the query (similar with ``mtime:`` and
``count:``)
6 changes: 2 additions & 4 deletions docs/install.rst
@@ -47,12 +47,10 @@ Here's a list of readily prepared commands for known operating systems:

.. code-block:: bash
$ dnf copr enable sahib/rmlint
$ dnf copr enable eclipseo/rmlint
$ dnf install rmlint
Those packages are built from master snapshots and might be outdated.

.. _`Fedora Copr`: https://copr.fedoraproject.org/coprs/sahib/rmlint/
.. _`Fedora Copr`: https://copr.fedorainfracloud.org/coprs/eclipseo/rmlint/

* **ArchLinux:**

69 changes: 37 additions & 32 deletions docs/rmlint.1.rst
@@ -23,7 +23,7 @@ It's main focus lies on finding duplicate files and directories.
It is able to find the following types of lint:

* Duplicate files and directories (and as a result unique files).
* Nonstripped Binaries (Binaries with debug symbols; needs to be explicityl enabled).
* Nonstripped Binaries (Binaries with debug symbols; needs to be explicitly enabled).
* Broken symbolic links.
* Empty files and directories (also nested empty directories).
* Files with broken user or group id.
@@ -33,26 +33,27 @@ output (for example a shell script) to help you delete the files if you want
to. Another design principle is that it should work well together with other
tools like ``find``. Therefore we do not replicate features of other well-known
programs, such as pattern matching and finding duplicate filenames.
However we provide many convinience options for common usecases that are hard
However we provide many convenience options for common use cases that are hard
to build from scratch with standard tools.

In order to find the lint, ``rmlint`` is given one or more directories to traverse.
If no directories or files were given, the current working directory is assumed.
By default, ``rmlint`` will ignore hidden files and will not follow symlinks (see
traversal options below). ``rmlint`` will first find "other lint" and then search
`Traversal Options`_). ``rmlint`` will first find "other lint" and then search
the remaining files for duplicates.

``rmlint`` tries to be helpful by guessing what file of a group of duplicates
is the **original** (i.e. the file that should not be deleted). It does this by using
different sorting strategies that can be controlled via the ``-S`` option. By
default it chooses the first-named path on the commandline. If two duplicates
come from the same path, it will also apply different fallback sort strategies (See the documentation of the ``-S`` strategy).
come from the same path, it will also apply different fallback sort strategies
(See the documentation of the ``-S`` strategy).

This behaviour can be also overwritten if you know that a certain directory
contains duplicates and another one originals. In this case you write the
original directory after specifying a single ``//`` on the commandline.
Everything that comes after is a preferred (or a "tagged") directory. If there
are duplicates from a unpreferred and from a preffered directory, the preferred
are duplicates from an unpreferred and from a preferred directory, the preferred
one will always count as original. Special options can also be used to always
keep files in preferred directories (``-k``) and to only find duplicates that
are present in both given directories (``-m``).
@@ -99,10 +100,10 @@ General Options
double quotes. In obscure cases argument parsing might fail in weird ways,
especially when using spaces as a separator.

Example:
Example::

``$ rmlint -T "df,dd" # Only search for duplicate files and directories``
``$ rmlint -T "all -df -dd" # Search for all lint except duplicate files and dirs.``
$ rmlint -T "df,dd" # Only search for duplicate files and directories
$ rmlint -T "all -df -dd" # Search for all lint except duplicate files and dirs.

:``-o --output=spec`` / ``-O --add-output=spec`` (**default\:** *-o sh\:rmlint.sh -o pretty\:stdout -o summary\:stdout -o json\:rmlint.json*):

@@ -118,10 +119,10 @@ General Options
specified multiple times to get multiple outputs, including multiple
outputs of the same format.

Examples:
Examples::

``$ rmlint -o json # Stream the json output to stdout``
``$ rmlint -O csv:/tmp/rmlint.csv # Output an extra csv fle to /tmp``
$ rmlint -o json # Stream the json output to stdout
$ rmlint -O csv:/tmp/rmlint.csv # Output an extra csv file to /tmp

:``-c --config=spec[=value]`` (**default\:** *none*):

@@ -131,10 +132,10 @@ General Options

If the value is omitted it is set to a value meaning "enabled".

Examples:
Examples::

``$ rmlint -c sh:link # Smartly link duplicates instead of removing``
``$ rmlint -c progressbar:fancy # Use a different theme for the progressbar``
$ rmlint -c sh:link # Smartly link duplicates instead of removing
$ rmlint -c progressbar:fancy # Use a different theme for the progressbar

:``-z --perms[=[rwx]]`` (**default\:** *no check*):

@@ -162,14 +163,14 @@ General Options

**highway**, **md**

**metro**, **murmur**, *xxhash**
**metro**, **murmur**, **xxhash**

The weaker hash functions still offer excellent distribution properties, but are potentially
more vulnerable to *malicious* crafting of duplicate files.

The full list of hash functions (in decreasing order of checksum length) is:

512-bit: **blake2b**, **blake2bp**, **sha3-512, **sha512**
512-bit: **blake2b**, **blake2bp**, **sha3-512**, **sha512**

384-bit: **sha3-384**,
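The manual groups rmlint's hash functions by checksum length. Several of the names it lists also exist in Python's standard ``hashlib`` module, which can illustrate the 512-bit vs. 384-bit digest sizes; note this is only an illustration, since rmlint ships its own hash implementations:

```python
import hashlib

# Digest lengths for some of the algorithms the manual lists, grouped
# by checksum size. hashlib is used here purely for illustration.
for name, bits in [('sha512', 512), ('sha3_512', 512),
                   ('blake2b', 512), ('sha3_384', 384)]:
    digest = hashlib.new(name, b'rmlint').digest()
    print(name, len(digest) * 8, 'bits')
```

A longer digest makes accidental collisions astronomically unlikelier, at the cost of a little hashing speed and output size.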

@@ -300,7 +301,7 @@ Traversal Options
2 Megabyte.

It's also possible to specify only one size. In this case the size is
interpreted as *"bigger or equal"*. If you want to to filter for files
interpreted as *"bigger or equal"*. If you want to filter for files
*up to this size* you can add a ``-`` in front (``-s -1M`` == ``-s 0-1M``).

**Edge case:** The default excludes empty files from the duplicate search.
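The size rules described above (a bare size means "bigger or equal", a leading ``-`` means "up to this size", ``low-high`` is an explicit range) can be sketched as follows. This is a hypothetical illustration of the documented semantics, not rmlint's actual parser, and the helper names are invented:

```python
# Hypothetical sketch of how a --size specification such as "1M-2M",
# "2G", or "-1M" can be interpreted (not rmlint's actual parser).
UNITS = {'': 1, 'K': 1024, 'M': 1024 ** 2, 'G': 1024 ** 3}

def parse_size(text):
    """Turn '1M' into 1048576; a bare number is taken as bytes."""
    unit = text[-1].upper() if text and text[-1].upper() in UNITS else ''
    number = text[:-1] if unit else text
    return int(number) * UNITS[unit]

def parse_size_range(spec):
    if spec.startswith('-'):                  # "-1M" == "0-1M"
        return 0, parse_size(spec[1:])
    if '-' in spec:                           # "1M-2M" is an explicit range
        low, high = spec.split('-', 1)
        return parse_size(low), parse_size(high)
    return parse_size(spec), float('inf')     # "1M" means "bigger or equal"

print(parse_size_range('1M-2M'))  # (1048576, 2097152)
```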
@@ -313,7 +314,7 @@ Traversal Options
:``-d --max-depth=depth`` (**default\:** *INF*):

Only recurse up to this depth. A depth of 1 would disable recursion and is
equivalent to a directory listing. A depth of 2 would also consider also all
equivalent to a directory listing. A depth of 2 would also consider all
children directories and so on.

:``-l --hardlinked`` (**default**) / ``--keep-hardlinked`` / ``-L --no-hardlinked``:
@@ -497,17 +498,17 @@ Caching
duplicates from previous runs are printed. Therefore specifying new paths
will simply have no effect. As a security measure, ``--replay`` will ignore
files whose mtime changed in the meantime (i.e. mtime in the ``.json`` file
differes from the current one). These files might have been modified and
differs from the current one). These files might have been modified and
are silently ignored.

By design, some options will not have any effect. Those are:

- `--followlinks`
- `--algorithm`
- `--paranoid`
- `--clamp-low`
- `--hardlinked`
- `--write-unfinished`
- ``--followlinks``
- ``--algorithm``
- ``--paranoid``
- ``--clamp-low``
- ``--hardlinked``
- ``--write-unfinished``
- ... and all other caching options below.

*NOTE:* In ``--replay`` mode, a new ``.json`` file will be written to
@@ -526,7 +527,7 @@ Caching
Also, this is a Linux-specific feature that does not work on all filesystems
and requires write permissions to the file.

Usage example: ::
Usage example::

$ rmlint large_file_cluster/ -U --xattr-write # first run.
$ rmlint large_file_cluster/ --xattr-read # second run.
@@ -556,13 +557,13 @@ Rarely used, miscellaneous options
Apply a maximum amount of memory to use for hashing and **--paranoid**.
The total amount of memory might still exceed this limit though, especially
when setting it very low. In general ``rmlint`` will however consume about this
amont of memory plus a more or less constant extra amount that depends on the
amount of memory plus a more or less constant extra amount that depends on the
data you are scanning.

The ``size``-description has the same format as for **--size**, therefore you
can do something like this (use this if you have 1GB of memory available):

``$ rmlint -u 512M # Limit paranoid mem usage to 512 MB```
``$ rmlint -u 512M # Limit paranoid mem usage to 512 MB``

:``-q --clamp-low=[fac.tor|percent%|offset]`` (**default\:** *0*) / ``-Q --clamp-top=[fac.tor|percent%|offset]`` (**default\:** *1.0*):

@@ -632,7 +633,7 @@ FORMATTERS
* ``clone``: For reflink-capable filesystems only. Try to clone both files with the
FIDEDUPERANGE ``ioctl(3p)`` (or BTRFS_IOC_FILE_EXTENT_SAME on older kernels).
This will free up duplicate extents. Needs at least kernel 4.2.
Use this option when you only have read-only acess to a btrfs filesystem but still
Use this option when you only have read-only access to a btrfs filesystem but still
want to deduplicate it. This is usually the case for snapshots.
* ``reflink``: Try to reflink the duplicate file to the original. See also
``--reflink`` in ``man 1 cp``. Fails if the filesystem does not support
@@ -647,7 +648,7 @@ FORMATTERS
* ``remove``: Remove the file using ``rm -rf``. (``-r`` for duplicate dirs).
This handler never fails.
* ``usercmd``: Use the provided user defined command (``-c
sh:cmd=something``). Never fails.
sh:cmd=something``). This handler never fails.

Default is ``remove``.

@@ -674,7 +675,7 @@ FORMATTERS

* ``py``: Outputs a python script and a JSON document, just like the **json** formatter.
The JSON document is written to ``.rmlint.json``, executing the script will
make it read from there. This formatter is mostly intented for complex use-cases
make it read from there. This formatter is mostly intended for complex use-cases
where the lint needs special handling that you define in the python script.
Therefore the python script can be modified to do things standard ``rmlint``
is not able to do easily.
Expand Down Expand Up @@ -793,7 +794,7 @@ OTHER STAND-ALONE COMMANDS
EXAMPLES
========

This is a collection of common usecases and other tricks:
This is a collection of common use cases and other tricks:

* Check the current working directory for duplicates.

@@ -853,6 +854,10 @@ This is a collection of common usecases and other tricks:

``$ rmlint -o sh -c sh:cmd='echo "original:" "$2" "is the same as" "$1"'``

* Use ``shred`` to overwrite the contents of a file fully:

``$ rmlint -c 'sh:cmd=shred -un 10 "$1"'``

* Use *data* as master directory. Find **only** duplicates in *backup* that are
also in *data*. Do not delete any files in *data*:

4 changes: 2 additions & 2 deletions docs/tutorial.rst
@@ -354,7 +354,7 @@ Here's the list of currently available formatters and their config options:
python script will find it there. The default python script produced by rmlint does
pretty much the same thing as the shell script described above (although not reflinking
or hardlinking or symlinking at the moment). You can customise the python script for
just about any usecase (Python is a simple and extremely powerful programming language).
just about any use case (Python is a simple and extremely powerful programming language).

**Example:**

@@ -465,7 +465,7 @@ a stronger hash function or to do a byte-by-byte comparison. While this might sound
slow, it's often only a few seconds slower than the default behaviour.

There are a bunch of other hash functions you can look up in the manpage.
We recommend never to use anythinh worse than the default.
We recommend never to use anything worse than the default.

.. note::

6 changes: 3 additions & 3 deletions gui/shredder/runner.py
@@ -175,9 +175,9 @@ def _create_rmlint_process(

min_size, max_size = cfg.get_value('traverse-size-limits')
extra_options += [
'--size', '{a}M-{b}M'.format(
a=min_size // (1024 ** 2),
b=max_size // (1024 ** 2)
'--size', '{a}-{b}'.format(
a=min_size,
b=max_size
)
]

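The ``runner.py`` hunk above stops pre-converting the GUI's traversal size limits to megabytes and passes the raw byte values to ``--size`` instead. A likely motivation (an inference, not stated in the commit): the old floor division truncated any limit below 1 MiB to ``0M``, silently losing it. A small demonstration of the two formats:

```python
# Old vs. new --size argument formatting from the runner.py hunk.
# A hypothetical 4 KiB minimum and 3 MiB maximum are used as sample limits.
min_size, max_size = 4096, 3 * 1024 ** 2

old_style = '{a}M-{b}M'.format(a=min_size // (1024 ** 2),
                               b=max_size // (1024 ** 2))
new_style = '{a}-{b}'.format(a=min_size, b=max_size)

print(old_style)  # 0M-3M  -- the 4096-byte minimum is truncated away
print(new_style)  # 4096-3145728
```

Since ``--size`` accepts plain byte counts as well as suffixed values, passing the raw numbers loses no precision.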