I have a ton of music files in three folders: Archives, Sort and Sort2. These are the steps I followed:

  1. Put all the files into C:\Cauldron\Personal\rhythm\input
  2. Count the files. There are 1538 files in total, of which 1523 have one of the following extensions: mp3, MP3, m4a, aa
  3. Run through these files and move the non-music files into C:\Cauldron\Personal\rhythm\output_crap
  4. Now I have a sanitized input folder with 1524 files
  5. Now run the md5 function on every file; files with the same checksum are duplicates. There are 540 duplicates in my sanitized input folder. Move all the duplicate files to C:\Cauldron\Personal\rhythm\output_duplicates
  6. The effbot site is fantastic; that is where I learned how to extract the file name from a full path
  7. Now that all the duplicates have been moved to the duplicates folder, move the rest of the files to the output folder
  8. The number of files in the output folder should match the number of files left in the input folder. If they don’t match, it means that there are files with the same name but different checksums, so one overwrote the other
  9. There were 984 files to be copied and the program copied 978. This means that 6 files were overwritten. I don’t care
  10. So now, after removing the video files and the duplicates, there are 978 files that I need to sort into various albums
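
Steps 2 to 4 could be sketched roughly like this. This is not my original script, just a minimal illustration: the function name `split_non_music` and the `MUSIC_EXTENSIONS` set are made up for the example, the extensions come from the list in step 2 and are compared case-insensitively, and only files directly inside the input folder are looked at.

```python
import shutil
from pathlib import Path

# Assumption: the music extensions listed in step 2, lower-cased.
MUSIC_EXTENSIONS = {".mp3", ".m4a", ".aa"}


def split_non_music(input_dir, crap_dir):
    """Move non-music files out of input_dir; return (kept, moved) counts."""
    input_dir, crap_dir = Path(input_dir), Path(crap_dir)
    crap_dir.mkdir(parents=True, exist_ok=True)
    kept = moved = 0
    # sorted() builds the full list first, so moving files does not
    # disturb the directory listing while we iterate.
    for path in sorted(input_dir.iterdir()):
        if not path.is_file():
            continue
        if path.suffix.lower() in MUSIC_EXTENSIONS:
            kept += 1
        else:
            shutil.move(str(path), str(crap_dir / path.name))
            moved += 1
    return kept, moved
```

Calling `split_non_music(r"C:\Cauldron\Personal\rhythm\input", r"C:\Cauldron\Personal\rhythm\output_crap")` returns the number of files kept and the number moved, which can be checked against the counts from step 2.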

978 music files to be sorted. Someday I will do it, but for now Python has helped me remove the duplicates from my music directory.
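
For reference, the duplicate hunt in steps 5 to 9 might look something like the sketch below. It is a reconstruction under assumptions, not the script I actually ran: the names `md5_of` and `dedupe` are invented here, the input folder is walked recursively (on the guess that files with the same name sit in different subfolders), duplicates get a numeric prefix so they cannot clobber each other, and the unique files are copied rather than moved.

```python
import hashlib
import shutil
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def dedupe(input_dir, duplicates_dir, output_dir):
    """Return (duplicates moved, unique files copied, files in output)."""
    input_dir = Path(input_dir)
    duplicates_dir, output_dir = Path(duplicates_dir), Path(output_dir)
    duplicates_dir.mkdir(parents=True, exist_ok=True)
    output_dir.mkdir(parents=True, exist_ok=True)

    seen = set()  # checksums of files already copied to the output
    n_duplicates = n_unique = 0
    # Assumption: both target folders live outside input_dir.
    files = sorted(p for p in input_dir.rglob("*") if p.is_file())
    for path in files:
        checksum = md5_of(path)
        if checksum in seen:
            n_duplicates += 1
            # Numeric prefix (my choice) keeps duplicate names unique.
            target = duplicates_dir / f"{n_duplicates}_{path.name}"
            shutil.move(str(path), str(target))
        else:
            seen.add(checksum)
            # Only the base name is kept, so a same-named file with a
            # different checksum silently overwrites the earlier one.
            shutil.copy2(path, output_dir / path.name)
            n_unique += 1

    n_written = sum(1 for p in output_dir.iterdir() if p.is_file())
    return n_duplicates, n_unique, n_written
```

If the second and third numbers returned differ, that difference is the count of files that were overwritten because they shared a name, which is the check described in steps 8 and 9.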