How I Built fylex: A Fast Python Tool to Clean, Organize, and De-Duplicate Files

Let me guess. You’ve got files named things like:

  • report_final(1).docx

  • vacation-final-final.jpg

  • copy_of_copy_of_final_draft.pdf

Sound familiar?

I’ve been there. My Downloads folder was a digital landfill, overflowing with duplicates, mismatched names, and backups of backups. I didn’t want to clean hundreds of files by hand, and I didn’t want to rely on clunky GUI tools that gave me no control.

So I built a Python tool that thinks before it copies, and cleans like it’s caffeinated. Meet fylex.


What Is fylex?

fylex is a Python-powered file management utility focused on speed, clarity, and control.

It helps you:

  • Detect and remove duplicates using fast hashing

  • Copy and move files with rules and safety checks

  • Categorize files by extension, size, or custom patterns

  • Handle conflicts intelligently (rename, skip, replace, etc.)

  • Use dry-run and interactive modes to stay safe

And it does all of this without bloated GUIs. It’s designed for devs, power users, and automation enthusiasts.


Why I Built It

I was tired of writing one-off scripts every time I needed to clean up a directory. The standard library’s shutil can copy and move files, but it didn’t help when I wanted to:

  • Avoid copying duplicates

  • Detect renamed but identical files

  • Organize large dumps of media or documents

So I turned this frustration into fylex, a CLI-optional library you can call directly from Python.


Detecting Duplicates — The Smart Way

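Many de-duplication scripts hash files with SHA-1 or MD5. fylex uses xxhash instead, a non-cryptographic hash that is much faster on large files. That trade-off suits duplicate detection, where the goal is to spot accidental copies rather than to resist deliberate collisions.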

Here’s how to use it:

import fylex as fx

fx.refine(
    target="~/Downloads",
    match_glob="*.mp3",
    recursive_check=True,
    dry_run=False,  # Set to True to preview
    verbose=True
)

This will:

  • Recursively scan for .mp3 files

  • Group by file size

  • Hash candidates using xxhash

  • Move duplicates to a backup folder instead of deleting them outright

Safety and speed, all in one.
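
To make the size-then-hash idea concrete, here is a minimal sketch of the steps above. It is not fylex’s actual implementation: the helper names file_digest and find_duplicates are hypothetical, and it swaps in hashlib.blake2b from the standard library in place of xxhash so that it runs without extra dependencies.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files never load fully into memory."""
    # Assumption: blake2b is a stdlib stand-in here; fylex itself uses xxhash.
    hasher = hashlib.blake2b()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def find_duplicates(root: Path, pattern: str = "*") -> list[list[Path]]:
    """Return groups of files under root whose contents are identical."""
    # Pass 1: group by size. A file with a unique size cannot be a duplicate.
    by_size: dict[int, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob(pattern)):
        if path.is_file():
            by_size[path.stat().st_size].append(path)

    # Pass 2: hash only the files that share a size with another file.
    groups: list[list[Path]] = []
    for candidates in by_size.values():
        if len(candidates) < 2:
            continue
        by_hash: dict[str, list[Path]] = defaultdict(list)
        for path in candidates:
            by_hash[file_digest(path)].append(path)
        groups.extend(group for group in by_hash.values() if len(group) > 1)
    return groups
```

Grouping by size first is what keeps this cheap: most files have a unique size and are never read at all. Because the comparison is on contents, a renamed copy still lands in the same group as its original.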


Organize Files by Extension or Size

Need to organize files in a directory dump?

fx.categorize(target="./Documents", categorize_by="ext")

Or, group by file size:

fx.categorize_by_size(
    target=".",
    grouping={
        (0, 1_000_000): "tiny/",
        (1_000_000, 100_000_000): "medium/",
        (100_000_000, "max"): "huge/"
    }
)

Clean organization in seconds — no manual sorting needed.
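
For readers curious how a size grouping like the one above could map files to folders, here is a rough sketch. It is an illustration rather than fylex’s internals: bucket_for, plan_moves, and apply_moves are hypothetical names, and treating each range as lower-inclusive and upper-exclusive is an assumption.

```python
import shutil
from pathlib import Path

# Same shape as the grouping above: (lower, upper) byte bounds -> folder name.
GROUPING = {
    (0, 1_000_000): "tiny",
    (1_000_000, 100_000_000): "medium",
    (100_000_000, "max"): "huge",
}


def bucket_for(size: int, grouping=GROUPING):
    """Return the folder for a size, treating each range as [lower, upper)."""
    for (lower, upper), folder in grouping.items():
        if size >= lower and (upper == "max" or size < upper):
            return folder
    return None  # No range matched; the caller leaves the file where it is.


def plan_moves(target: Path, grouping=GROUPING) -> list[tuple[Path, Path]]:
    """List (source, destination) pairs for files directly inside target."""
    moves = []
    for path in sorted(target.iterdir()):
        if path.is_file():
            folder = bucket_for(path.stat().st_size, grouping)
            if folder is not None:
                moves.append((path, target / folder / path.name))
    return moves


def apply_moves(moves, dry_run: bool = True) -> None:
    """Print the plan; only touch the filesystem when dry_run is False."""
    for source, destination in moves:
        print(f"{source} -> {destination}")
        if not dry_run:
            destination.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(source), str(destination))
```

Separating the plan from its execution is what makes a dry run trivial: the same list of moves is either printed or printed and applied.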


What Makes fylex Different?

  • Fast hashing: Uses xxhash, a non-cryptographic hash, instead of the slower SHA-1 or MD5

  • Multi-threaded operations: Runs file operations on multiple threads to speed up large batches

  • Conflict strategies: Choose from rename, skip, replace, larger, newer, and more

  • Dry-run mode: See what’ll happen before it does

  • Regex + glob support: Advanced filtering made simple

  • Safe deletion: Moves duplicates to a backup folder instead of deleting them outright

It’s not just a tool — it’s a defensive shield against digital clutter.
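
The conflict strategies are easiest to understand as a small decision function. The sketch below shows one plausible reading of rename, skip, replace, larger, and newer. The names resolve_conflict and unique_name are hypothetical, as is the "name(1).ext" renaming pattern; fylex’s own behavior may differ in the details.

```python
from pathlib import Path


def unique_name(destination: Path) -> Path:
    """Return destination, or name(1).ext, name(2).ext, ... if it is taken."""
    candidate, counter = destination, 1
    while candidate.exists():
        new_name = f"{destination.stem}({counter}){destination.suffix}"
        candidate = destination.with_name(new_name)
        counter += 1
    return candidate


def resolve_conflict(source: Path, destination: Path, strategy: str = "rename"):
    """Return the path to write to, or None when the source should be skipped."""
    if not destination.exists():
        return destination  # Nothing to resolve.
    if strategy == "skip":
        return None
    if strategy == "replace":
        return destination
    if strategy == "rename":
        return unique_name(destination)
    if strategy == "larger":
        # Overwrite only when the incoming file is strictly larger.
        keep_source = source.stat().st_size > destination.stat().st_size
        return destination if keep_source else None
    if strategy == "newer":
        # Overwrite only when the incoming file was modified more recently.
        keep_source = source.stat().st_mtime > destination.stat().st_mtime
        return destination if keep_source else None
    raise ValueError(f"unknown strategy: {strategy!r}")
```

A copy routine would call resolve_conflict before each write and skip the file whenever it returns None, which keeps the policy in one place and out of the copy loop.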


Real-World Use Cases

  • Cleaning up duplicate downloads

  • Tidying up music or photo libraries

  • Deduplicating backup drives

  • Automating file categorization in CI pipelines

  • Building data preprocessing scripts that avoid copying garbage


Installation

Get started in seconds:

pip install fylex

Full documentation and examples are available on the project’s PyPI page.


Who Should Use fylex?

  • Developers who want control over file ops

  • Researchers working with large datasets

  • Media professionals managing bulky files

  • Anyone tired of cleaning files manually

If you’ve ever written a Python script to move, rename, or delete files, you’ll appreciate what fylex can do out of the box.


Final Thoughts

I built fylex because I was tired of fighting my filesystem.

It turned into a fast, flexible utility I now use in almost every personal or work project involving file handling. It’s lightweight, safe by default, and battle-tested on real-world chaos like 20-year-old USB backups and messy project dumps.

If that sounds like your situation, fylex might be the utility you never knew you needed.

Give it a try. And if it cleans up even one layer of your file chaos, it’s done its job.




Written by

Sivaprasad Murali