PDF Duplicate Finder

A powerful tool to find and manage duplicate PDF files on your computer. PDF Duplicate Finder helps you identify and remove duplicate PDF documents, saving disk space and organizing your files more efficiently.
Features
- Smart PDF Comparison: Find duplicate PDFs based on content, not just file names or sizes
- Text-based Comparison: Identify duplicates even with minor visual differences using advanced text analysis
- Built-in PDF Viewer: Preview PDFs directly within the application
- Dual-View Interface: View the file list and duplicate groups in separate tabs
- Advanced Filtering: Filter by file size, modification date, and name patterns
- Fast Scanning: Optimized algorithms for quick scanning of large PDF collections
- Intuitive UI: Clean and user-friendly interface with light/dark theme support
- Batch Processing: Process multiple files or entire folders at once
- Detailed Analysis: View file details, previews, and comparison results
- Advanced Tools: Multiple selection modes, filtering, and sorting options
- Multi-language Support: Available in multiple languages
- Progress Tracking: Real-time progress bar for file processing operations
- Recent Files: Quick access to recently opened files with context menu options
Installation
Prerequisites
- Python 3.8 or higher
- pip (Python package manager)
- Optional backends for PDF rendering (Auto falls back safely):
  - PyMuPDF (fitz): default and bundled via requirements
  - Ghostscript (for Wand): install Ghostscript and set its executable path in Settings
See PREREQUISITES.md for platform-specific setup.
Install from source
- Clone the repository:
  git clone https://github.com/Nsfr750/PDF_finder.git
  cd PDF_finder
- Create and activate a virtual environment (recommended):
  python -m venv venv
  .\venv\Scripts\activate # Windows
  source venv/bin/activate # Linux/macOS
- Install the required dependencies:
  pip install -r requirements.txt
Usage
- Launch the application.
- Click "Scan Folder" to select a directory to scan for duplicate PDFs.
- Review the results in the main window. After a scan completes, the file list is automatically populated with the scanned PDFs and duplicate groups.
- Use the tools to manage duplicates:
  - Mark files to keep
  - Delete unwanted duplicates
  - Preview files before taking action
Key Features in Detail
Smart PDF Comparison
- Compares PDF content using advanced hashing algorithms
- Detects similar documents even with different file names or metadata
- Configurable similarity threshold for fine-tuned results
- Multi-threaded scanning for faster processing
- Memory-efficient handling of large PDF files
- Progress tracking and cancellation support
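
The ideas in this list (hashing, a similarity threshold, multi-threaded and memory-efficient scanning) can be illustrated with a minimal sketch. It is not the application's actual code: the function names `file_digest`, `find_exact_duplicates`, and `is_similar` are hypothetical, and it uses only the Python standard library. It hashes raw file bytes, so it only catches byte-identical copies; matching PDFs that differ in metadata would require hashing or comparing extracted text instead.

```python
import hashlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from difflib import SequenceMatcher
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to bound memory use."""
    sha = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            sha.update(chunk)
    return sha.hexdigest()


def find_exact_duplicates(folder, max_workers=4):
    """Group PDFs under `folder` whose bytes are identical (hypothetical helper)."""
    paths = sorted(Path(folder).rglob("*.pdf"))
    groups = defaultdict(list)
    # Hashing is mostly I/O-bound, so a thread pool lets several files be read at once.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for path, digest in zip(paths, pool.map(file_digest, paths)):
            groups[digest].append(path)
    # Only digests shared by two or more files form a duplicate group.
    return [group for group in groups.values() if len(group) > 1]


def is_similar(text_a, text_b, threshold=0.9):
    """Treat two extracted texts as near-duplicates when their ratio meets the threshold."""
    return SequenceMatcher(None, text_a, text_b).ratio() >= threshold
```

Calling `find_exact_duplicates("some/folder")` returns a list of groups, each a list of paths with identical content. Raising or lowering `threshold` in `is_similar` trades stricter matching against more tolerant matching, which is the role a configurable similarity threshold plays.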
User Experience
- Modern, responsive interface
- Customizable view options
- Comprehensive keyboard shortcuts
- Detailed file information and previews
- Toolbar with improved spacing and visual clarity
- Settings dialog includes a "Test backends" button to validate PyMuPDF and Ghostscript availability
PDF Backends and Fallback
- Choose your preferred backend in Settings > PDF Rendering
- Use "Test backends" to verify that PyMuPDF and Ghostscript are correctly configured
- If the selected backend fails, the app falls back to an available backend and shows a status-bar warning (localized)
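
The fallback behaviour described above could look roughly like the following sketch. This is an assumption about the general approach, not the application's real implementation: `available_backends` and `choose_backend` are hypothetical names, the backend labels `"pymupdf"` and `"ghostscript"` are illustrative, and the Ghostscript check only looks for common executable names on the `PATH` rather than a path configured in Settings.

```python
import importlib.util
import shutil


def available_backends():
    """Probe which rendering backends look usable on this machine."""
    found = []
    # PyMuPDF is imported as `fitz`; find_spec checks for it without importing it.
    if importlib.util.find_spec("fitz") is not None:
        found.append("pymupdf")
    # Ghostscript is usually `gs` on Linux/macOS and `gswin64c`/`gswin32c` on Windows.
    if any(shutil.which(name) for name in ("gs", "gswin64c", "gswin32c")):
        found.append("ghostscript")
    return found


def choose_backend(preferred, available):
    """Return (backend, warning); warning is None when the preferred backend is used."""
    if preferred in available:
        return preferred, None
    if not available:
        raise RuntimeError("No PDF rendering backend is available")
    fallback = available[0]
    return fallback, f"Backend '{preferred}' is unavailable; using '{fallback}' instead"
```

For example, `choose_backend("ghostscript", available_backends())` returns the Ghostscript backend when it is installed, and otherwise the first available backend together with a message that a UI could show in its status bar.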
Version History
See CHANGELOG.md for a complete list of changes in each version.
Contributing
Contributions are welcome! Please read our Contributing Guidelines for details on how to contribute to this project.
License
This project is licensed under the GNU General Public License v3.0 - see the LICENSE file for details.
Acknowledgments
- Thanks to all contributors who have helped improve PDF Duplicate Finder
- Built with ❤️ using Python and PyQt6
Known Bugs
- Language selection doesn't work
Last Updated: August 2025
Python Version: 3.8+
License: GPL-3.0