DuplicateDuster
Find and eliminate duplicate files with 99.2% accuracy.
A high-performance enterprise deduplication tool that scans 500,000 files per pass with 99.2% detection accuracy. Combines MD5 fingerprinting, fuzzy filename matching and binary similarity analysis. DoD 5220.22-M secure deletion built in. 100% local — no data transmitted externally.
Available on
- Windows 10 / 11 / Server
- macOS 12 Monterey+
- Linux (Debian / RPM / AppImage)
What it does.
Three independent algorithms catch what single methods miss.
Stage one: MD5 fingerprinting catches byte-identical duplicates, whatever the files are called. Stage two: fuzzy filename matching flags files whose names are nearly the same (for example "report_final.pdf" and "report_final (1).pdf") as candidate pairs. Stage three: binary similarity catches near-duplicates with slight modifications (resaved, re-encoded, watermark added). Used together, they catch the ~30% of real duplicates that hash-only scanners silently miss.
- MD5 cryptographic hash for exact matches
- Fuzzy filename matching (Levenshtein + token-set)
- Binary similarity for modified copies
- All three weights adjustable per scan profile
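To make the three-stage idea concrete, here is a minimal Python sketch of how the signals could be combined into one weighted score. It is an illustration under stated assumptions, not DuplicateDuster's actual implementation: the function names and default weights are hypothetical, `difflib.SequenceMatcher` stands in for Levenshtein distance (the Python standard library has no Levenshtein function), and the binary-similarity measure is a placeholder because the product's own method is not described here.

```python
import hashlib
import re
from difflib import SequenceMatcher
from pathlib import PurePath


def md5_digest(data: bytes) -> str:
    """Stage one: fingerprint used to detect byte-identical content."""
    return hashlib.md5(data).hexdigest()


def name_similarity(name_a: str, name_b: str) -> float:
    """Stage two: blend a character-level ratio with token-set overlap."""
    stem_a = PurePath(name_a).stem.lower()
    stem_b = PurePath(name_b).stem.lower()
    # Character-level similarity (stand-in for a Levenshtein ratio).
    char_ratio = SequenceMatcher(None, stem_a, stem_b).ratio()
    # Token-set similarity: Jaccard overlap of alphanumeric tokens.
    tokens_a = set(re.findall(r"[a-z0-9]+", stem_a))
    tokens_b = set(re.findall(r"[a-z0-9]+", stem_b))
    union = tokens_a | tokens_b
    token_ratio = len(tokens_a & tokens_b) / len(union) if union else 0.0
    return (char_ratio + token_ratio) / 2


def binary_similarity(a: bytes, b: bytes) -> float:
    """Stage three: placeholder byte-sequence similarity (small inputs only)."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def duplicate_score(name_a, data_a, name_b, data_b, weights=(0.5, 0.2, 0.3)):
    """Weighted sum of the three signals; the weights are hypothetical defaults."""
    w_hash, w_name, w_bin = weights
    exact = 1.0 if md5_digest(data_a) == md5_digest(data_b) else 0.0
    return (
        w_hash * exact
        + w_name * name_similarity(name_a, name_b)
        + w_bin * binary_similarity(data_a, data_b)
    )
```

A scan profile would then amount to a choice of `weights` plus a threshold above which a pair is shown for review. A quadratic comparison like `SequenceMatcher` would not scale to large files; it is used here only to keep the sketch dependency-free.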
Scan 500,000 files in a single pass.
Optimised disk I/O patterns, parallel hash workers and an incremental cache. Index a 500K-file directory tree in minutes, not hours. Subsequent scans only re-hash files that changed — typical re-scans complete in seconds. Useful when nightly maintenance windows are short and audit-trail timestamps need to land before morning.
- Parallel hash workers tuned per CPU
- Incremental cache — only re-hash changed files
- Memory-bounded — works on modest hardware
- Resumable scans for unattended runs
- Bandwidth-aware on network shares (SMB / NFS)
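The incremental cache can be pictured with a short Python sketch. It assumes a file counts as "changed" when its size or modification time differs from the cached value; that rule, the in-memory dictionary and the function names are illustrative assumptions rather than a description of the product's internals.

```python
import hashlib
import os


def file_md5(path, chunk_size=1 << 20):
    """Hash a file in fixed-size chunks so memory use stays bounded."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def incremental_scan(paths, cache):
    """Return ({path: digest}, rehashed_paths).

    `cache` maps path -> ((size, mtime_ns), digest) and is updated in place.
    A file is re-hashed only when it is new or its size/mtime changed.
    """
    rehashed = []
    for path in paths:
        stat = os.stat(path)
        key = (stat.st_size, stat.st_mtime_ns)
        entry = cache.get(path)
        if entry is None or entry[0] != key:
            cache[path] = (key, file_md5(path))
            rehashed.append(path)
    return {path: cache[path][1] for path in paths}, rehashed
```

On a second call with the same `cache`, unchanged files cost one `os.stat` each instead of a full read, which is why a re-scan can be far faster than the first pass.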
Review every removal. Secure-erase when you mean it.
Nothing gets deleted without explicit confirmation. The review UI groups duplicates with previews, sizes and timestamps so you can pick which version to keep. Optional DoD 5220.22-M secure-erase overwrites file content before unlinking, for sensitive disposals where the data should not be forensically recoverable.
- Visual side-by-side comparison before delete
- Batch selection rules (keep newest / largest / specific path)
- DoD 5220.22-M three-pass overwrite
- Quarantine option — move to archive folder before permanent delete
- Session undo for non-secured deletions
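As a rough illustration of what a three-pass overwrite involves, the Python sketch below writes zeros, then ones, then random bytes over a file before unlinking it, which is how the DoD 5220.22-M three-pass scheme is commonly described. The function name is hypothetical and the sketch omits the verification step. In-place overwrites may also not reach the underlying blocks on SSDs or copy-on-write filesystems, so treat it as a teaching example rather than the product's erase routine.

```python
import os


def overwrite_and_unlink(path, chunk_size=1 << 20):
    """Overwrite a file's content in three passes, then remove it."""
    size = os.path.getsize(path)
    # Pass 1: zeros, pass 2: ones (0xFF), pass 3: random bytes.
    passes = [
        lambda n: b"\x00" * n,
        lambda n: b"\xff" * n,
        os.urandom,
    ]
    with open(path, "r+b") as handle:
        for make_bytes in passes:
            handle.seek(0)
            remaining = size
            while remaining > 0:
                n = min(chunk_size, remaining)
                handle.write(make_bytes(n))
                remaining -= n
            # Push this pass to the storage device before starting the next.
            handle.flush()
            os.fsync(handle.fileno())
    os.remove(path)
```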
Exportable before/after reports for storage reviews.
Detailed reports of every scan: files reviewed, duplicates found, space reclaimed, secure deletions logged. CSV / PDF / JSON exports suitable for compliance audits, storage chargeback systems and operational reviews. The report itself is signed and tamper-evident.
- Before/after storage breakdown by folder
- Per-action audit log (CSV / PDF / JSON)
- Tamper-evident report signing
- Per-user activity summaries for shared systems
- Storage chargeback-ready output
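One common way to make a report tamper-evident is to attach a keyed digest over its canonical serialisation. The Python sketch below uses HMAC-SHA256 over sorted-key JSON. The signing scheme DuplicateDuster actually uses is not specified here, so the algorithm, field names and key handling are all assumptions made for illustration.

```python
import hashlib
import hmac
import json


def _canonical_bytes(report: dict) -> bytes:
    """Serialise with sorted keys so equal reports yield equal bytes."""
    return json.dumps(report, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_report(report: dict, key: bytes) -> dict:
    """Wrap a report with an HMAC-SHA256 signature (hypothetical layout)."""
    signature = hmac.new(key, _canonical_bytes(report), hashlib.sha256).hexdigest()
    return {"report": report, "signature": signature}


def verify_report(signed: dict, key: bytes) -> bool:
    """Recompute the signature and compare in constant time."""
    expected = hmac.new(key, _canonical_bytes(signed["report"]), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signed["signature"])
```

Any edit to the report after signing, such as changing the reclaimed-space figure, makes `verify_report` return `False` unless the editor also holds the key.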
Everything else it does.
Live scan progress
Watch hashes complete in real time. Pause, resume, or cancel without losing partial work.
Smart auto-rules
Keep-newest, keep-largest, keep-in-path policies for unattended runs. All policy decisions audit-logged.
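The three policies can be sketched in a few lines of Python. The `FileInfo` record, the policy strings and the behaviour when no file matches the preferred path (leave the group untouched for manual review) are assumptions chosen for this example, not documented product behaviour.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileInfo:
    path: str
    size: int
    mtime: float  # modification time, seconds since the epoch


def choose_keeper(group, policy, preferred_prefix=""):
    """Pick the file to keep from one duplicate group, or None if undecided."""
    if policy == "keep-newest":
        return max(group, key=lambda f: f.mtime)
    if policy == "keep-largest":
        return max(group, key=lambda f: f.size)
    if policy == "keep-in-path":
        for info in group:
            if info.path.startswith(preferred_prefix):
                return info
        return None  # nothing under the preferred path: defer to manual review
    raise ValueError(f"unknown policy: {policy}")


def plan_removals(group, policy, preferred_prefix=""):
    """Return the files a policy would remove; empty if no keeper was chosen."""
    keeper = choose_keeper(group, policy, preferred_prefix)
    if keeper is None:
        return []
    return [info for info in group if info is not keeper]
```

Returning a removal plan, rather than deleting inside the policy function, keeps each decision easy to log before anything is touched.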
Archive instead of delete
Optionally move duplicates to a quarantine folder for a configurable retention period before final removal.
Custom exclusions
Glob patterns, file-size thresholds, age limits — all configurable per-job.
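A per-job exclusion filter along these lines might look like the following Python sketch. The parameter names are hypothetical, and it assumes that the size threshold skips files below a minimum size and that the age limit skips files older than a given number of days; the product's actual semantics may differ.

```python
import fnmatch
import time


def is_excluded(path, size, mtime, patterns=(), min_size=0, max_age_days=None, now=None):
    """Return True if a file should be skipped by the scan.

    patterns:      glob patterns matched against the full path
    min_size:      skip files smaller than this many bytes
    max_age_days:  skip files last modified more than this many days ago
    """
    if any(fnmatch.fnmatch(path, pattern) for pattern in patterns):
        return True
    if size < min_size:
        return True
    if max_age_days is not None:
        current = time.time() if now is None else now
        if current - mtime > max_age_days * 86400:
            return True
    return False
```

Note that `fnmatch` lets `*` match path separators as well, so a pattern such as `*.tmp` excludes matching files at any depth.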
Network-aware scanning
Handles network-mapped drives and SMB shares without thrashing. Backs off automatically under load.
Scheduled runs
Cron-style scheduling for unattended maintenance. Email summary report when complete.
Per-seat pricing for individuals and teams.
- Solo and team plans available
- Volume discounts above 25 seats
- Cross-platform: Windows, macOS, Linux
- DoD-spec secure-delete included on every tier
Talk to the team that actually builds the software.
Pilot deployments, volume licensing, product demos, security questionnaires — all handled by engineers and product leads, not a routing layer. We respond within one business day.