Storage Hygiene
Coming Soon

DuplicateDuster

Find and eliminate duplicate files with 99.2% accuracy.

A high-performance enterprise deduplication tool that scans 500,000 files per pass with 99.2% detection accuracy. Combines MD5 fingerprinting, fuzzy filename matching and binary similarity analysis. DoD 5220.22-M secure deletion built in. 100% local — no data transmitted externally.

  • 500K files per pass
  • 99.2% detection accuracy
  • DoD 5220.22-M secure delete
  • 0 bytes leave the machine

Available on

  • Windows
    10 / 11 / Server
  • macOS
    12 Monterey+
  • Linux
    Debian / RPM / AppImage
Core capabilities

What it does.

01
Multi-stage detection

Three independent algorithms catch what single methods miss.

Stage one: MD5 fingerprinting catches byte-identical duplicates, whatever the files are named. Stage two: fuzzy filename matching flags renamed and versioned copies whose names differ only slightly. Stage three: binary similarity catches near-duplicates with slight modifications (resave, re-encode, watermark added). Used together they catch the ~30% of real duplicates that hash-only scanners silently miss.

  • MD5 cryptographic hash for exact matches
  • Fuzzy filename matching (Levenshtein + token-set)
  • Binary similarity for modified copies
  • All three weights adjustable per scan profile
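
As a rough illustration of how the three stages could be combined into one weighted score, here is a minimal Python sketch. It is not the product's implementation: the function names, the default weights and the use of token-set overlap (Jaccard) and difflib's SequenceMatcher as stand-ins for the fuzzy and binary stages are all assumptions made for the example.

```python
import hashlib
import re
from difflib import SequenceMatcher


def md5_of(data: bytes) -> str:
    """Stage one: byte-identical content yields an identical digest."""
    return hashlib.md5(data).hexdigest()


def levenshtein(a: str, b: str) -> int:
    """Edit distance using the standard two-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def tokens(name: str) -> set:
    """Split a filename into lowercase tokens on punctuation and underscores."""
    return {t for t in re.split(r"[\W_]+", name.lower()) if t}


def name_similarity(a: str, b: str) -> float:
    """Stage two: mean of a Levenshtein ratio and token-set overlap."""
    a, b = a.lower(), b.lower()
    edit_score = 1 - levenshtein(a, b) / max(len(a), len(b), 1)
    ta, tb = tokens(a), tokens(b)
    token_score = len(ta & tb) / len(ta | tb) if (ta | tb) else 1.0
    return (edit_score + token_score) / 2


def binary_similarity(a: bytes, b: bytes) -> float:
    """Stage three stand-in: difflib ratio, practical only for small inputs."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def duplicate_score(name_a, data_a, name_b, data_b, weights=(0.5, 0.2, 0.3)):
    """Blend the three stages; the weights are hypothetical profile values."""
    w_hash, w_name, w_bin = weights
    exact = 1.0 if md5_of(data_a) == md5_of(data_b) else 0.0
    return (w_hash * exact
            + w_name * name_similarity(name_a, name_b)
            + w_bin * binary_similarity(data_a, data_b))
```

A scan profile in this sketch is just the `weights` tuple: raising the first weight favours exact matches, while raising the third makes modified copies score higher.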
02
Built for scale

Scan 500,000 files in a single pass.

Optimised disk I/O patterns, parallel hash workers and an incremental cache let it index a 500K-file directory tree in minutes, not hours. Subsequent scans only re-hash files that changed, so typical re-scans complete in seconds. Useful when nightly maintenance windows are short and audit-trail timestamps need to land before morning.

  • Parallel hash workers tuned per CPU
  • Incremental cache — only re-hash changed files
  • Memory-bounded — works on modest hardware
  • Resumable scans for unattended runs
  • Bandwidth-aware on network shares (SMB / NFS)
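
The incremental cache and parallel workers described above can be sketched roughly as follows. This is an illustrative Python example under stated assumptions, not the product's code: the cache layout (path mapped to size, modification time and digest), the `scan` function and the use of a thread pool are all hypothetical choices.

```python
import hashlib
import os
from concurrent.futures import ThreadPoolExecutor


def hash_file(path, chunk_size=1 << 20):
    """Stream the file in chunks so memory use stays bounded."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def scan(paths, cache, workers=None):
    """Re-hash only files whose size or mtime changed since the last scan.

    `cache` maps path -> (size, mtime_ns, md5) and is updated in place.
    Returns the list of paths that were actually re-hashed.
    """
    stale = []
    for path in paths:
        st = os.stat(path)
        entry = cache.get(path)
        if entry is None or entry[:2] != (st.st_size, st.st_mtime_ns):
            stale.append((path, st.st_size, st.st_mtime_ns))

    # Hash the stale files in parallel; unchanged files are never opened.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        digests = pool.map(hash_file, [p for p, _, _ in stale])
        for (path, size, mtime_ns), digest in zip(stale, digests):
            cache[path] = (size, mtime_ns, digest)
    return [p for p, _, _ in stale]
```

Persisting `cache` between runs (for example as JSON) is what would make a second scan of an unchanged tree nearly free in this sketch, since only `os.stat` is called per file.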
03
Safe by default

Review every removal. Secure-erase when you mean it.

Nothing gets deleted without explicit confirmation. The review UI groups duplicates with previews, sizes and timestamps so you can pick which version to keep. Optional DoD 5220.22-M secure-erase overwrites file content in three passes before unlinking, for sensitive disposals where the data must not be recoverable by forensic tools.

  • Visual side-by-side comparison before delete
  • Batch selection rules (keep newest / largest / specific path)
  • DoD 5220.22-M three-pass overwrite
  • Quarantine option — move to archive folder before permanent delete
  • Session undo for non-secured deletions
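
For readers curious what a three-pass overwrite looks like in practice, here is a minimal Python sketch. It assumes the pass pattern commonly attributed to DoD 5220.22-M (zeros, then ones, then random bytes); the function names are hypothetical and this is not the product's implementation. Overwriting in place is also only meaningful where the storage actually rewrites the same blocks, which SSDs and copy-on-write filesystems may not do.

```python
import os


def overwrite_passes(path, chunk_size=1 << 20):
    """Overwrite the file in place: zeros, then 0xFF, then random bytes."""
    size = os.path.getsize(path)
    patterns = [
        lambda n: bytes(n),      # pass 1: all zero bytes
        lambda n: b"\xff" * n,   # pass 2: all one bits
        os.urandom,              # pass 3: random data
    ]
    with open(path, "r+b") as fh:
        for make_chunk in patterns:
            fh.seek(0)
            remaining = size
            while remaining:
                n = min(chunk_size, remaining)
                fh.write(make_chunk(n))
                remaining -= n
            # Push each pass to the device before starting the next one.
            fh.flush()
            os.fsync(fh.fileno())


def secure_delete(path):
    """Overwrite the content, then unlink the file."""
    overwrite_passes(path)
    os.remove(path)
```

Keeping the overwrite and the unlink as separate steps mirrors the page's distinction between secure deletions and ordinary ones: once the passes have run there is nothing left for a session undo to restore.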
04
Reports for compliance

Exportable before/after reports for storage reviews.

Detailed reports of every scan: files reviewed, duplicates found, space reclaimed, secure deletions logged. CSV / PDF / JSON exports suitable for compliance audits, storage chargeback systems and operational reviews. The report itself is signed and tamper-evident.

  • Before/after storage breakdown by folder
  • Per-action audit log (CSV / PDF / JSON)
  • Tamper-evident report signing
  • Per-user activity summaries for shared systems
  • Storage chargeback-ready output
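
One common way to make a report tamper-evident is to attach a keyed digest of its canonical form. The Python sketch below uses HMAC-SHA256 over sorted-key JSON; that scheme, the field names and the key handling are assumptions for illustration, since the page does not say how DuplicateDuster signs its reports.

```python
import hashlib
import hmac
import json


def sign_report(report: dict, key: bytes) -> str:
    """Return a hex HMAC-SHA256 over a canonical JSON encoding of the report."""
    payload = json.dumps(report, sort_keys=True, separators=(",", ":"))
    return hmac.new(key, payload.encode("utf-8"), hashlib.sha256).hexdigest()


def verify_report(report: dict, key: bytes, signature: str) -> bool:
    """Recompute the signature and compare in constant time."""
    return hmac.compare_digest(sign_report(report, key), signature)


# Hypothetical report fields for the example.
report = {"files_reviewed": 500142, "duplicates_found": 83217}
key = b"example-signing-key"  # placeholder; a real key would be kept secret
signature = sign_report(report, key)
```

Any later change to a field, even a single digit, makes `verify_report` return False, which is what lets an auditor trust an exported copy.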
$ dustmatch report --scan latest
▸ Files reviewed: 500,142
▸ Duplicates found: 83,217
▸ Space reclaimed: 1.4 TB
▸ Audit hash: 0xab12...e9f
More capabilities

Everything else it does.

Live scan progress

Watch hashes complete in real time. Pause, resume, or cancel without losing partial work.

Smart auto-rules

Keep-newest, keep-largest and keep-in-path policies for unattended runs. All policy decisions are audit-logged.
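
The three policies can be pictured as a small selection function over a group of duplicates. This Python sketch is illustrative only: `FileInfo`, `plan` and the shape of the returned record are hypothetical names, and leaving a group untouched when no file sits under the protected path is an assumed behaviour.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileInfo:
    path: str
    size: int
    mtime: float


def choose_keeper(group, policy, keep_path=None):
    """Pick the file to keep from one duplicate group, or None to skip it."""
    if policy == "keep-newest":
        return max(group, key=lambda f: f.mtime)
    if policy == "keep-largest":
        return max(group, key=lambda f: f.size)
    if policy == "keep-in-path":
        if keep_path is None:
            raise ValueError("keep-in-path needs a keep_path")
        for f in group:
            if f.path.startswith(keep_path):
                return f
        return None  # nothing under the protected path: leave for review
    raise ValueError(f"unknown policy: {policy}")


def plan(group, policy, keep_path=None):
    """Return an audit-friendly record of what would be kept and removed."""
    keeper = choose_keeper(group, policy, keep_path)
    if keeper is None:
        return {"policy": policy, "keep": None, "remove": []}
    removals = [f.path for f in group if f is not keeper]
    return {"policy": policy, "keep": keeper.path, "remove": removals}
```

Returning a plan rather than deleting directly keeps the decision and the action separate, so every policy outcome can be written to the audit log before anything is removed.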

Archive instead of delete

Optionally move duplicates to a quarantine folder for a configurable retention period before final removal.

Custom exclusions

Glob patterns, file-size thresholds, age limits — all configurable per-job.
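
A per-job exclusion filter along these lines might look like the Python sketch below. The `ExclusionRules` fields and the reading of "age limits" as "skip files older than N days" are assumptions made for the example, not documented behaviour.

```python
import fnmatch
import time
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ExclusionRules:
    globs: Tuple[str, ...] = ()            # e.g. ("*.tmp", "*/node_modules/*")
    min_size: int = 0                      # skip files smaller than this (bytes)
    max_age_days: Optional[float] = None   # skip files older than this


def is_excluded(path, size, mtime, rules, now=None):
    """Return True if the file should be left out of the scan."""
    now = time.time() if now is None else now
    # fnmatchcase keeps matching case-sensitive on every platform.
    if any(fnmatch.fnmatchcase(path, pattern) for pattern in rules.globs):
        return True
    if size < rules.min_size:
        return True
    if rules.max_age_days is not None:
        if now - mtime > rules.max_age_days * 86400:
            return True
    return False
```

Passing `now` explicitly makes the age check deterministic, which is convenient when the same job definition is replayed for an audit.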

Network-aware scanning

Handles network-mapped drives and SMB shares without thrashing. Backs off automatically under load.

Scheduled runs

Cron-style scheduling for unattended maintenance. Emails a summary report when the run completes.

Licensing

Per-seat pricing for individuals and teams.

  • Solo and team plans available
  • Volume discounts above 25 seats
  • Cross-platform: Windows, macOS, Linux
  • DoD-spec secure-delete included on every tier
Frequently asked

Questions we hear often.

The 99.2% accuracy figure was measured across a 50-million-file test corpus combining personal media, business documents and software archives. The three-stage pipeline (MD5 + fuzzy + binary similarity) catches significantly more duplicates than hash-only tools.

Talk to the team that actually builds the software.

Pilot deployments, volume licensing, product demos, security questionnaires — all handled by engineers and product leads, not a routing layer. We respond within one business day.

Schedule a discovery call
Half-hour walkthrough with someone who built the product — no sales script.
Run a pilot deployment
Full-feature evaluation with guided install, configured for your environment.
Email us directly
sales@royalsoftworks.com — we respond within one business day.
