How I Built Fylex: A Fast Python Tool to Clean, Organize, and De-Duplicate Files

Let me guess. You’ve got files named things like:
report_final(1).docx
vacation-final-final.jpg
copy_of_copy_of_final_draft.pdf
Sound familiar?
I’ve been there. My Downloads folder was a digital landfill—overflowing with duplicates, mismatched names, and backups of backups. The problem? I didn’t want to manually clean hundreds of files or rely on clunky GUI tools that didn’t give me control.
So I built a Python tool that thinks before it copies, and cleans like it’s caffeinated. Meet fylex.
What Is fylex?
fylex is a Python-powered file management utility focused on speed, clarity, and control.
It helps you:
Detect and remove duplicates using fast hashing
Copy and move files with rules and safety checks
Categorize files by extension, size, or custom patterns
Handle conflicts intelligently (rename, skip, replace, etc.)
Use dry-run and interactive modes to stay safe
And it does all of this without bloated GUIs. It’s designed for devs, power users, and automation enthusiasts.
Why I Built It
I was tired of writing one-off scripts every time I needed to clean up a directory. Tools like shutil didn’t help when I wanted to:
Avoid copying duplicates
Detect renamed but identical files
Organize large dumps of media or documents
So I turned this frustration into fylex, a CLI-optional library you can call directly from Python.
Detecting Duplicates — The Smart Way
Unlike traditional scripts that use SHA1 or MD5, fylex uses xxhash, a non-cryptographic hash that is much faster on large files.
Here’s how to use it:
import fylex as fx

fx.refine(
    target="~/Downloads",
    match_glob="*.mp3",
    recursive_check=True,
    dry_run=False,  # Set to True to preview
    verbose=True,
)
This will:
Recursively scan for .mp3 files
Group by file size
Hash candidates using xxhash
Move duplicates to a backup folder instead of deleting them outright
Safety and speed, all in one.
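To make the steps above concrete, here is a minimal standard-library sketch of the same idea: group by size first, hash only the files that share a size, then move the extras aside. This is not fylex's actual code. The function names (hash_file, find_duplicates, quarantine) are hypothetical, and hashlib.blake2b stands in for xxhash so the example runs without third-party packages.

```python
import hashlib
import shutil
from collections import defaultdict
from pathlib import Path


def hash_file(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large files never sit fully in memory."""
    h = hashlib.blake2b(digest_size=16)  # stand-in for xxhash (assumption)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root, pattern="*"):
    """Return groups of files under root with identical content."""
    by_size = defaultdict(list)
    for p in Path(root).rglob(pattern):
        if p.is_file():
            by_size[p.stat().st_size].append(p)

    groups = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a unique size cannot be a duplicate, so skip hashing
        by_hash = defaultdict(list)
        for p in same_size:
            by_hash[hash_file(p)].append(p)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups


def quarantine(groups, backup_dir, dry_run=True):
    """Keep the first file of each group and move the rest into backup_dir.

    Returns the planned (source, destination) pairs. With dry_run=True
    nothing on disk changes. Name clashes inside backup_dir are not
    handled in this sketch.
    """
    moves = []
    for group in groups:
        for extra in group[1:]:
            dest = Path(backup_dir) / extra.name
            moves.append((extra, dest))
            if not dry_run:
                dest.parent.mkdir(parents=True, exist_ok=True)
                shutil.move(str(extra), str(dest))
    return moves
```

The size pass is what keeps this cheap: most files have a unique size and are never read at all. Calling quarantine(find_duplicates("some_dir", "*.mp3"), "some_backup_dir") with the default dry_run=True only reports what would move.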
Organize Files by Extension or Size
Need to organize files in a directory dump?
fx.categorize(target="./Documents", categorize_by="ext")
Or, group by file size:
fx.categorize_by_size(
    target=".",
    grouping={
        (0, 1_000_000): "tiny/",
        (1_000_000, 100_000_000): "medium/",
        (100_000_000, "max"): "huge/",
    },
)
Clean organization in seconds — no manual sorting needed.
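For a sense of what that grouping dictionary implies, the lookup can be sketched in a few lines of plain Python. This is an illustration, not fylex's implementation. The helper names are hypothetical, and the sketch assumes each range includes its lower bound, excludes its upper bound, and treats "max" as unbounded.

```python
import math
from pathlib import Path


def bucket_for_size(size, grouping):
    """Return the folder whose (low, high) range contains size, or None."""
    for (low, high), folder in grouping.items():
        upper = math.inf if high == "max" else high  # "max" means no upper limit
        if low <= size < upper:  # half-open range (assumption)
            return folder
    return None


def folder_for_extension(path):
    """Return a folder name from the file's final extension, lower-cased."""
    ext = Path(path).suffix.lower().lstrip(".")
    return f"{ext}/" if ext else "no_extension/"


grouping = {
    (0, 1_000_000): "tiny/",
    (1_000_000, 100_000_000): "medium/",
    (100_000_000, "max"): "huge/",
}
```

With half-open ranges, a file of exactly 1_000_000 bytes lands in "medium/" rather than matching two buckets at once.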
What Makes fylex Different?
Fast hashing: Uses xxhash instead of slow SHA1 or MD5
Multi-threaded operations: Leverages your CPU for massive speedups
Conflict strategies: Choose from rename, skip, replace, larger, newer, and more
Dry-run mode: See what’ll happen before it does
Regex + glob support: Advanced filtering made simple
Safe deletion: Moves duplicates to a fallback folder, never deletes blindly
It’s not just a tool — it’s a defensive shield against digital clutter.
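The conflict strategies named above are easiest to picture as one small decision function. The sketch below is a guess at reasonable semantics, not fylex's API: resolve_conflict is a hypothetical helper, "larger" and "newer" are assumed to mean "overwrite only if the incoming file is larger or newer", and the name(1).ext pattern for renames is also an assumption.

```python
from pathlib import Path


def resolve_conflict(src: Path, dst: Path, strategy: str):
    """Return the path src should be written to, or None to leave it alone."""
    if not dst.exists():
        return dst  # no conflict, write straight to the destination

    if strategy == "skip":
        return None
    if strategy == "replace":
        return dst
    if strategy == "rename":
        # Try name(1).ext, name(2).ext, ... until a free name turns up.
        n = 1
        while True:
            candidate = dst.with_name(f"{dst.stem}({n}){dst.suffix}")
            if not candidate.exists():
                return candidate
            n += 1
    if strategy == "larger":
        return dst if src.stat().st_size > dst.stat().st_size else None
    if strategy == "newer":
        return dst if src.stat().st_mtime > dst.stat().st_mtime else None
    raise ValueError(f"unknown conflict strategy: {strategy!r}")
```

Returning a path instead of copying keeps the decision separate from the side effect, which is also what makes a dry-run mode straightforward: the caller can print the result rather than act on it.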
Real-World Use Cases
Cleaning up duplicate downloads
Tidying up music or photo libraries
Deduplicating backup drives
Automating file categorization in CI pipelines
Building data preprocessing scripts that avoid copying garbage
Installation
Get started in seconds:
pip install fylex
Full documentation and examples on PyPI
Who Should Use fylex?
Developers who want control over file ops
Researchers working with large datasets
Media professionals managing bulky files
Anyone tired of cleaning files manually
If you’ve ever written a Python script to move, rename, or delete files, you’ll appreciate what fylex can do out of the box.
Final Thoughts
I built fylex because I was tired of fighting my filesystem.
It turned into a fast, flexible utility I now use in almost every personal or work project involving file handling. It’s lightweight, safe by default, and battle-tested on real-world chaos like 20-year-old USB backups and messy project dumps.
If that sounds like your situation, fylex might be the utility you never knew you needed.
Give it a try. And if it cleans up even one layer of your file chaos, it’s done its job.
Links
📦 PyPI: https://pypi.org/project/fylex
🧠 Docs: built-in help(fylex) or explore the code
🤝 Contributions: GitHub
Written by Sivaprasad Murali
