Update 2011-05-28: Looks like someone else already did a very good job creating something like the (Download) File Organizer Tool. It's called "Sick Beard" and does a bit more on some parts and a bit less on other parts.

Checkout http://sickbeard.com

You can use it standalone or in combination with SABnzbd+ (they both share a look-and-feel so even that is a plus.

So the "(Download) File Organizer Tool" project is now cancelled (was it even started ;)) My NAS download discs are getting full so I tried some disc cleaning. As I've automated my downloads I often download multiple versions of files (eg. tv series in different formats), so while cleaning out those duplicated files...{insert very boring process}... I thought of a great tool that could handle all this for me.
/me programmer == /me lazy == /me wants to automate as much as possible
I got the idea of a "(Download) File Organizer Tool", a tool that will do everything you want with your files automatically, LOL. A bit much, so I cramped it to:
  • Define one or more directories to work on (incl. setting like recursive or not, how deep to go recursive, specifically including or excluding directories and/or files)
  • Define actions to handle, where an action can be something like:
    • (for tv-series) On directory ... find all files that loosely have the same filename. Order them by mkv,x264,hdtv,avi. Remove the duplicates and keep top ordered file. Delete all files that don't match one of the order keywords. Move all files into a directory structure based on pattern ... (eg. {title}/season {season#}/{title}.s{season#}e{episode#}.{extension}). When unsure what to do, add file to "problem files"-list for manual fixing (and improving action for the next run).
  • When creating the action, it should be possible to dry-run it and see the result of what it would look like if it was executed for real
  • It should be possible to run all or specific actions at specific times, intervals or be trigger by an external tool (eg. cron job, SABnzbd+, ...)
It looks to me that all this can be handled with basic *nix commands, but maybe Groovy will be a better fit because of it's cross-platformness. The steps to take should be easy (for the above mentioned action):
  1. Define base directory to work on
  2. Get list of all files (full paths) in directory (recursively)
  3. Find duplicated filenames (case-insensitive, loosely/fuzzy compare)
  4. Order group of duplicated filenames by the given order of keywords
  5. Remove all files except the first/top one
  6. Recognize tv serie title, season, episode and extention
  7. If not exists, create directory structure based on found params
  8. Move and rename file to new directory structure
  9. Show list of files that were not processed at the end of the process as "problem files"
/me happy with the idea, now trying to find some time to build it
I've started a public project on GitHub: http://github.com/pvdissel/dfot And a Pivotaltracker project: http://www.pivotaltracker.com/projects/132698 Update (2010-10-29 @21:00) : After another major HD cleanup to get some free space. I also came to the conclusion that, to get a filtering algorithm to be able to work correctly with all kind of situations, I probably has to put weights to keywords. eg. the higher the order of a keyword, the higher is score. " hdtv" and "x264"  always +10 as we want High Definition content when available. "pdtv" -10. etc. Also some keywords should be automaticly be recorgnized to be names of release groups, but only when they are on specific locations in the filename. Because the title of a tv serie could be the same as a release group, but release groups are mostly placed at the front or somewhere at the end of the filename.

Published

22 October 2010

Tags