Removing duplicated files in server

I have a WordPress install with heaps of duplicated images and file uploads; really, it's a mess.
I need to reduce the size of the website, and I thought I could run a bash script to remove the duplicated files and replace them with hard links.
Would this work? Has anyone attempted this before?

Comments

  • sha256sum your files and compare the hashes. Obviously check sizes etc. as well, but a collision is not terribly likely. I'd suggest a soft link.

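The hash-and-link approach above can be sketched in bash. This is a minimal sketch, not a drop-in tool: the function name `dedup_hardlink` is made up for illustration, it keeps the first copy of each file it sees and hard-links the rest to it, and you should run it against a backup copy of `wp-content/uploads` first. Remember the caveat below: once two names share an inode, editing one "file" edits both.

```shell
#!/usr/bin/env bash
# Hypothetical helper: replace duplicate files under a directory with
# hard links, keyed on sha256. Requires bash 4+ and GNU coreutils.
dedup_hardlink() {
    local dir=$1
    local f hash
    declare -A seen            # hash -> path of first file with that content
    while IFS= read -r -d '' f; do
        hash=$(sha256sum "$f" | awk '{print $1}')
        if [[ -n ${seen[$hash]:-} ]]; then
            # Duplicate: replace it with a hard link to the first copy.
            ln -f "${seen[$hash]}" "$f"
        else
            seen[$hash]=$f
        fi
    done < <(find "$dir" -type f -print0)
}
```

Usage would be something like `dedup_hardlink /var/www/html/wp-content/uploads` (path is an example; adjust to your install). Swapping `ln -f` for `ln -sf` gives the soft-link variant suggested above, at the cost of links that break if the original file moves.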

  • If you do not need the files to be able to diverge later, then yes, it would work - it's relatively straightforward; see above.

    Otherwise you need a proper dedup tool (or a deduplicating filesystem).

  • I have used fdupes in the past with some success. You could set up a script or one-liner to remove duplicates automatically, but I generally just have the output written to a text file to review manually before removing anything.

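The review-first fdupes workflow described above might look like this. The wrapper function `list_dupes` is made up for illustration; it only writes a report and deletes nothing, which matches the "review before removing" advice. fdupes must be installed (`apt install fdupes` or similar).

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: write fdupes duplicate groups to a report file
# for manual review. -r recurses into subdirectories; nothing is deleted.
list_dupes() {
    local dir=$1 report=$2
    fdupes -r "$dir" > "$report"
}
# After reviewing the report, you can delete interactively, e.g.:
#   fdupes -rd /var/www/html/wp-content/uploads
# (-d prompts you to pick which copy to keep in each group.)
```

The path above is an example; point it at your own uploads directory.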
  • Increase disk space and forget it.

  • InceptionHosting (Hosting Provider, OG)

    I remember going through a similar thing about 10 years ago, it did not end well.



  • Another possibility is to copy/rebase onto a btrfs filesystem and use bedup (extent-based dedup). Then you get copy-on-write if you need to make modifications later. ZFS is another option.

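A rough sketch of the btrfs route suggested above. The mount point `/mnt/data` and the site path are examples, and the exact bedup invocation is an assumption from memory (duperemove's `-dr` flags are shown as an alternative) - check the tool's own help before running. Unlike hard links, deduplicated extents are copy-on-write, so editing one file later silently un-shares it instead of changing both copies.

```shell
# Assumes /mnt/data is a mounted btrfs volume (example path).
# 1. Copy the site onto btrfs:
rsync -a /var/www/html/ /mnt/data/www/

# 2. Deduplicate identical data at the extent level, e.g. with bedup
#    (subcommand assumed from memory) or duperemove:
bedup dedup /mnt/data
# or: duperemove -dr /mnt/data/www    # -d dedupe, -r recurse
```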
  • I disagree.
    You need to increase website size.
    The bigger the wordpress, the stronger you become.

  • I have used fslint and it worked for me.
