Groovy scripting using multiple files

Mon May 14 00:00:00 -0700 2012

Today I completed the second evening of the free Groovy course at @InterAccess given by @Frans_van_Buul. It was very interesting to see how Groovy works in combination with Java code. But I want to use Groovy scripts as replacements for Bash/Batch scripts, and the one thing I couldn’t figure out until now was how to work with multiple Groovy files when using Groovy in its scripting context. I finally found it, and it couldn’t be easier!

main.groovy:

#!/usr/bin/env groovy
def sayer = new GhelloSayer()
println sayer.sayGHello()
println new Utils().gotTools()

GhelloSayer.groovy:

class GhelloSayer {
    def sayGHello() {
        return 'Ghello!'
    }
}

Utils.groovy:

def gotTools() {
    return 'Yes I haz'
}

If you look at these three groovy files, it’s very easy to see what Groovy does:

  • Groovy checks whether a .groovy file exists whose name equals the class name you want to instantiate; if so, it loads that file
  • Non-main Groovy scripts should have filenames that start with a capital letter, just like Java class source files. Groovy then treats the filename as the class name and the content of the file as the content of that class, so the file can be loaded like any other class
  • This also works with packages; package directories simply start from the directory of the main script
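
The filename-to-classname rule can be made visible with a minimal sketch. It compiles the text of Utils.groovy in memory with GroovyClassLoader.parseClass and passes the filename explicitly. Compiling from a string instead of from disk is an assumption made here to keep the example in a single file; it does not show how the groovy command itself locates sibling files.

```groovy
// Assumption: the content of Utils.groovy is held in a string instead of read from disk.
def source = '''
def gotTools() {
    return 'Yes I haz'
}
'''

def loader = new GroovyClassLoader()
// The second argument is the filename Groovy uses to name the generated script class.
Class utilsClass = loader.parseClass(source, 'Utils.groovy')

println utilsClass.name                      // class name derived from the filename
println Script.isAssignableFrom(utilsClass)  // a script file becomes a subclass of groovy.lang.Script

def utils = utilsClass.getDeclaredConstructor().newInstance()
println utils.gotTools()
```

This is why `new Utils().gotTools()` works in main.groovy even though Utils.groovy contains no class declaration.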

Magic@work, but this finally makes it possible for me to really replace Bash/Batch scripts with Groovy.

Btw. main.groovy can of course be named anything you like!

/me happy


Version Control != Dependency Management

Tue Jan 25 00:00:00 -0800 2011

This article was originally featured in the August 2010 issue of phpArchitect.

Are you using svn:externals, git-submodules or something similar with your Version Control System (VCS) of choice for connecting third-party libraries to your PHP projects? This article explores ways to handle dependencies in PHP projects, just like projects in other programming languages have done for ages.

Dependencies

Many projects have some kind of external dependency, e.g. libraries, language extensions, system tools or other applications. All these dependencies need to be available for the project to work. In PHP, you can check whether a language extension is loaded with the “extension_loaded” or “get_loaded_extensions” functions. The availability of system tools and other applications can be checked by testing the path of the command with, for example, the PHP file_exists function in combination with is_executable. The availability of a required library can be checked with the class_exists function when using auto-loading, or by simply calling require_once on the library and waiting for it to fail.

For language extensions, system tools and applications, availability should be checked rather than assumed. These dependencies should not be included in the project package, because they are often platform dependent and there is a fair chance that they are already available on the system running the project. The previous paragraph explained ways of checking them. These checks can easily be run from an install/release script, which lets the install fail when something is missing.

But how about library dependencies? The distributed project package should already contain the required libraries, or at least fetch them automatically. Fetching libraries automatically on install sounds good, because the project package for distribution is then smaller, saving storage and bandwidth. But during installation it makes your customers dependent on the availability of the required libraries, possibly from 3rd parties. Think about it. If the current version of your project is still used 5 years from now, can you guarantee that the 3rd parties still provide that specific library version? On the other hand, when the project is extendable with plugins and modules, fetching those only when the customer chooses to use them is a fair solution. But the distributed project package should include all the default plugins and modules, and all extensions should be fetched from your own repository, so you have control over their availability.
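
As an illustration of such an install-time check, here is a minimal sketch written in Groovy rather than PHP. It mirrors the file_exists plus is_executable combination for system tools. The helper name missingTools and the tool name in the example are hypothetical, and the sketch assumes tools are plain executables on the PATH (on Windows the file extension would need to be included).

```groovy
// Hypothetical helper: lists the tools that cannot be found as an
// executable file in any directory of the PATH environment variable.
def missingTools = { List<String> tools ->
    def dirs = (System.getenv('PATH') ?: '').split(File.pathSeparator).findAll { it }
    tools.findAll { tool ->
        !dirs.any { dir ->
            def candidate = new File(dir, tool)
            candidate.isFile() && candidate.canExecute()
        }
    }
}

// 'no-such-tool-for-demo' is a made-up name that is expected to be missing.
def missing = missingTools(['no-such-tool-for-demo'])
if (missing) {
    // A real install script would stop here with a non-zero exit code.
    println "Missing required tools: ${missing.join(', ')}"
} else {
    println 'All required tools are available'
}
```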

svn:externals

How and when are required libraries included in your project? Many PHP projects that use Subversion (svn) as their Version Control System (VCS) use the svn:externals functionality to bind libraries to their project. It’s an easy solution: with every svn checkout or svn update, the external libraries are also checked out and updated in the local working copy. This means it’s possible to edit the external libraries directly and commit those changes. And svn gives a lot of freedom:

  • Add an external pointing to the latest (HEAD) revision
  • Add an external pointing to a specific revision
  • Add an external from an external repository
  • Set the externals property anywhere in the project and include that external somewhere else in the project structure

This means that there is no central location to configure the externals. Everyone working on the project should just know where it is set or search through the svn properties of all the project directories. Dependency management becomes magic “that just works”, or does it?

Ever tried to branch or tag a project that uses svn:externals for dependency management? You first have to branch/tag all the externals, then update the svn:externals to point to those branched/tagged externals, and finally create the branch/tag for the project. That’s not workable.

Branching/tagging can be made easier by applying the convention that externals should always point to tags, because the externals will then always point to a stable tagged version. But svn still allows commits on an external that points to a tag. Specifying the revision that the external should point to is the safest way to work with externals; this way the external will always be at exactly that revision.

So what’s wrong with using a VCS for dependency management?

  • No overview of external configuration, they can be defined and placed anywhere in the project
  • Externals must be controlled in the same type of VCS. There’s no way to add a git repository as external in svn, or a simple .tar.bz2-file
  • Committing on externals is possible

How do other programming languages handle dependencies

The most common lines of an “INSTALL” file of Linux C/C++ projects are

$ ./configure
$ make
$ sudo make install

Ever tried that on a big project? Did “make” error out the first time it ran? And the second time? Did it tell you that a specific library was missing and that it therefore could not continue? Dependencies! The project could not be compiled without having the correct dependencies available.

Java build tools like Apache Maven and Apache Ant with Apache Ivy went a step further. Maven defines the project in a Project Object Model (pom.xml) file, which describes the software project being built. It contains the project’s dependencies on other projects/modules and external libraries, a list of the involved developers, links to the version control repository and issue tracker, and more. Maven has a central repository from which the project dependencies can be fetched automatically, when they exist there. Anyone can create their own Maven repository, which can also be used as a proxy to the central repository. This means the Java project can be built while all needed dependencies are automatically fetched from the central repository or from any other repository that is listed in the Maven global settings or the project POM configuration. Apache Ant with Apache Ivy works in a similar way. The result is that with a build tool like Maven, there is a single configuration file per project for defining dependencies. That’s easy and clear for everyone.

The C/C++/Java languages have the advantage of being compiled languages. If a required dependency is missing, the project won’t compile. PHP does not have this advantage. It will run any code until it fails. Therefore PHP projects need some checking for the availability of dependencies.

So, how to apply this on PHP projects?

There is nothing fully equivalent to Maven for PHP except Maven itself with a PHP plug-in. This is what the “Maven for PHP” project does. It’s probably also possible to use Apache Ant with Ivy, or build tools from other programming languages, to build PHP projects. But this has the disadvantage that the build tool is written in a language other than PHP, most probably has no support for PHP-specific tools by default, and can only be extended in the language the build tool itself is written in. This makes using a non-native PHP build tool a bit harder.

But there are native PHP tools that provide parts of the features that build tools like Apache Maven provide. We have Phing, a build tool based on Apache Ant, which by default has many tasks for executing filesystem-related commands. Optionally it also contains tasks for many commonly used tools and release methods in the PHP world, like DbDeploy, FtpDeploy, JsMin, PearPackage, PharPackage, PhpCodeSniffer, PHPUnit and many more. It’s also very easy to extend: just create a PHP class which extends the Phing Task class and implement a few functions. It can be made as complex as needed, but it’s all in PHP, your language of choice.

For the repository part of a Maven-like native PHP tool, we have PEAR for PHP 4 and 5 packages and PEAR2 for PHP 5.3.1+ packages. PEAR and PEAR2 are “a framework, packaging and distribution system for reusable PHP Components”. There is a PEAR and a PEAR2 framework-specific repository, a “Channel” in PEAR terms, at http://pear.php.net and http://pear2.php.net respectively. PEAR2 is the next-generation version of PEAR. It uses the new PHP 5+ language features, like namespaces, where possible and supports the Phar packaging format. PEAR2 also has a new installer for installing packages and dependencies, called Pyrus. Pyrus is an easy-to-use installer which can be used standalone from the command line without installation, or be integrated into a project. Just like with Maven, it’s possible to create your own PEAR channel as a repository for projects and dependencies.

Taken together, Phing, PEAR/PEAR2 and Pyrus have a combined feature list equal to that of Maven.

One drawback is that Phing does not (yet) have a task for installing dependencies from PEAR/PEAR2/Pyrus, but tasks doing just that can easily be written. When you write one, please contribute it to the Phing project. The same is true for Version Control Systems other than CVS and Subversion, and for everything else you want support for.

To conclude

For more information on the referenced tools and a direct link to the “Extending Phing” documentation of Phing, see the list of related URLs. I also want to point you to the article “Going Industrial” by Stéphen Périn, which was published in the January 2010 issue of phpArchitect, for more interesting insights and hints on which tools and practices could be used to improve your PHP development life cycle.

Requirements

  • PHP:

    • 5.0.2+ (for Phing)
    • 5.2+ (for Phar)
    • 5.3.1+ (for Pyrus)

(Download) File Organizer Tool

Fri Oct 22 00:00:00 -0700 2010

Update 2011-05-28: Looks like someone else already did a very good job creating something like the (Download) File Organizer Tool. It's called "Sick Beard" and does a bit more on some parts and a bit less on other parts.

Check out http://sickbeard.com

You can use it standalone or in combination with SABnzbd+ (they both share a look-and-feel, so even that is a plus).

So the "(Download) File Organizer Tool" project is now cancelled (was it even started ;))

My NAS download discs are getting full so I tried some disc cleaning. As I've automated my downloads I often download multiple versions of files (eg. tv series in different formats), so while cleaning out those duplicated files...{insert very boring process}... I thought of a great tool that could handle all this for me.
/me programmer == /me lazy == /me wants to automate as much as possible
I got the idea of a "(Download) File Organizer Tool", a tool that will do everything you want with your files automatically, LOL. A bit much, so I trimmed it down to:
  • Define one or more directories to work on (incl. settings like recursive or not, how deep to recurse, specifically including or excluding directories and/or files)
  • Define actions to handle, where an action can be something like:
    • (for tv-series) On directory ... find all files that loosely have the same filename. Order them by mkv,x264,hdtv,avi. Remove the duplicates and keep top ordered file. Delete all files that don't match one of the order keywords. Move all files into a directory structure based on pattern ... (eg. {title}/season {season#}/{title}.s{season#}e{episode#}.{extension}). When unsure what to do, add file to "problem files"-list for manual fixing (and improving action for the next run).
  • When creating the action, it should be possible to dry-run it and see the result of what it would look like if it was executed for real
  • It should be possible to run all or specific actions at specific times or intervals, or have them triggered by an external tool (eg. cron job, SABnzbd+, ...)
It looks to me that all this can be handled with basic *nix commands, but maybe Groovy will be a better fit because it is cross-platform. The steps to take should be easy (for the above mentioned action):
  1. Define base directory to work on
  2. Get list of all files (full paths) in directory (recursively)
  3. Find duplicated filenames (case-insensitive, loosely/fuzzy compare)
  4. Order group of duplicated filenames by the given order of keywords
  5. Remove all files except the first/top one
  6. Recognize TV series title, season, episode and extension
  7. If it does not exist yet, create the directory structure based on the found params
  8. Move and rename file to new directory structure
  9. Show list of files that were not processed at the end of the process as "problem files"
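
The steps above can be sketched in Groovy as a dry run that only builds a plan and touches no files. This is a minimal sketch, not the tool itself: the names parse, rank and planActions, the filename regex and the sample filenames are all assumptions made for illustration, and "loosely the same filename" is simplified to "same title, season and episode".

```groovy
def order = ['mkv', 'x264', 'hdtv', 'avi']   // preference order from the example action
def episodePattern = ~/(?i)^(.+?)[ ._-]+s(\d{1,2})e(\d{1,2}).*\.(\w+)$/

// Steps 3 and 6: loosely recognise title, season, episode and extension.
def parse = { String name ->
    def m = episodePattern.matcher(name)
    if (!m.matches()) return null
    [name     : name,
     title    : m.group(1).replaceAll(/[ ._-]+/, ' ').trim().toLowerCase(),
     season   : m.group(2).toInteger(),
     episode  : m.group(3).toInteger(),
     extension: m.group(4).toLowerCase()]
}

// Step 4: a lower rank means a more preferred file.
def rank = { String name ->
    def idx = order.findIndexOf { name.toLowerCase().contains(it) }
    idx < 0 ? order.size() : idx
}

// Steps 5, 7, 8 and 9 as a dry run: returns what would be moved, removed or left alone.
def planActions = { List<String> names ->
    def problems = names.findAll { parse(it) == null }
    def groups = names.collect { parse(it) }.findAll { it != null }
                      .groupBy { [it.title, it.season, it.episode] }
    def moves = [:]
    def removals = []
    groups.each { key, group ->
        def sorted = group.sort(false) { rank(it.name) }
        def keep = sorted.first()
        removals.addAll(sorted.tail()*.name)
        // Target pattern: {title}/season {season#}/{title}.s{season#}e{episode#}.{extension}
        moves[keep.name] = String.format('%s/season %d/%s.s%02de%02d.%s',
                keep.title, keep.season, keep.title.replace(' ', '.'),
                keep.season, keep.episode, keep.extension)
    }
    [moves: moves, removals: removals, problems: problems]
}

def plan = planActions([
    'Some.Show.S01E02.HDTV.XviD.avi',
    'some_show_s01e02_x264.mkv',
    'holiday-photos.zip'
])
plan.moves.each { from, to -> println "move    $from -> $to" }
plan.removals.each { println "remove  $it" }
plan.problems.each { println "problem $it" }
```

Returning a plan instead of acting on the filesystem also covers the dry-run requirement: the same plan can be shown to the user first and executed later.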
/me happy with the idea, now trying to find some time to build it
I've started a public project on GitHub: http://github.com/pvdissel/dfot
And a Pivotaltracker project: http://www.pivotaltracker.com/projects/132698

Update (2010-10-29 @21:00): After another major HD cleanup to get some free space, I also came to the conclusion that, for a filtering algorithm to work correctly in all kinds of situations, I probably have to assign weights to keywords, eg. the higher a keyword is in the order, the higher its score. "hdtv" and "x264" always get +10, as we want High Definition content when available; "pdtv" gets -10, etc. Some keywords should also automatically be recognized as names of release groups, but only when they appear at specific locations in the filename, because the title of a TV series could be the same as the name of a release group, while release groups are mostly placed at the front or somewhere near the end of the filename.
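
The weighting idea could look roughly like the sketch below. The weights for "hdtv", "x264" and "pdtv" come from the text above; everything else is an assumption: the helper names, the made-up release group "grp", and the choice to treat only the first token or one of the last three tokens as a possible release group position.

```groovy
def weights = [hdtv: 10, x264: 10, pdtv: -10]
def knownGroups = ['grp'] as Set   // hypothetical release group name

// Split a filename into lower-case alphanumeric tokens.
def tokenize = { String filename ->
    (filename.toLowerCase().split(/[^a-z0-9]+/) as List).findAll { it }
}

// Sum the weights of all keywords found in the filename; unknown tokens count as 0.
def score = { String filename ->
    tokenize(filename).sum { weights[it] ?: 0 }
}

// Only accept a release group at the front or near the end of the filename,
// so a title that happens to contain the same word is not misread.
def releaseGroup = { String filename ->
    def tokens = tokenize(filename)
    def hit = null
    tokens.eachWithIndex { token, i ->
        if (hit == null && token in knownGroups && (i == 0 || i >= tokens.size() - 3)) {
            hit = token
        }
    }
    hit
}

println score('Some.Show.S01E02.HDTV.x264-GRP.mkv')
println score('Some.Show.S01E02.PDTV.XviD.avi')
println releaseGroup('Some.Show.S01E02.HDTV.x264-GRP.mkv')
println releaseGroup('Some.Grp.Show.S01E02.HDTV.x264.mkv')
```

Within a group of duplicates, the file with the highest score would then be the one to keep.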