Extending the LIMPOPO MAGE-TAB Parser

Advanced Usage

The Handler concept

The Limpopo parser uses something like an XML push streaming model: handlers exist for aspects of the MAGE-TAB document, and data is pushed to them in turn.

There are 'read', 'validate' and 'conversion' handlers, and during parsing they are executed in that order. Read handlers parse the supplied MAGE-TAB documents into the Limpopo object model and are executed first. Once the full model has been read, validate handlers can validate all or part of this model. Finally, once the model has been read and validated, conversion handlers can take the Limpopo model and convert it into any other type of model object, or simply write out information about parts of the model.
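
The ordering described above can be sketched in plain Java. This is only an illustration of the idea, not Limpopo's implementation: the Phase enum, the Handler interface and the HandlerPipelineSketch class are hypothetical names, the "model" is just a map, and the accession value is made up.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these types belong to Limpopo.
class HandlerPipelineSketch {
    // Declaration order doubles as execution order.
    enum Phase { READ, VALIDATE, CONVERT }

    interface Handler {
        Phase phase();
        void handle(Map<String, String> model, List<String> log);
    }

    // Runs all handlers grouped by phase, whatever order they were registered in.
    static List<String> run(List<Handler> handlers) {
        List<Handler> ordered = new ArrayList<>(handlers);
        // Enums compare by ordinal, and List.sort is stable.
        ordered.sort(Comparator.comparing(Handler::phase));
        Map<String, String> model = new LinkedHashMap<>();
        List<String> log = new ArrayList<>();
        for (Handler h : ordered) {
            h.handle(model, log);
        }
        return log;
    }

    static List<String> demo() {
        List<Handler> handlers = new ArrayList<>();
        // Registered out of order on purpose.
        handlers.add(new Handler() {
            public Phase phase() { return Phase.CONVERT; }
            public void handle(Map<String, String> model, List<String> log) {
                log.add("convert: wrote " + model.get("accession"));
            }
        });
        handlers.add(new Handler() {
            public Phase phase() { return Phase.READ; }
            public void handle(Map<String, String> model, List<String> log) {
                model.put("accession", "E-TEST-1"); // made-up accession
                log.add("read: accession");
            }
        });
        handlers.add(new Handler() {
            public Phase phase() { return Phase.VALIDATE; }
            public void handle(Map<String, String> model, List<String> log) {
                String acc = model.get("accession");
                log.add(acc != null && !acc.isEmpty() ? "validate: ok" : "validate: missing accession");
            }
        });
        return run(handlers);
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

Even though the conversion handler is registered first, it runs last, so it can rely on the model having been fully read and validated.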

This design makes it possible to extend the set of handlers Limpopo uses by writing your own, to do custom validation, adapt MAGE-TAB into your own object model, or even provide support for parsing your own customised extensions to the MAGE-TAB format.

To create a handler, you simply need to extend the appropriate abstract handler class. For example, to create a simple handler that writes the parsed accession number to an output stream, you would need to do something like this:

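// Imports for the Limpopo classes and the @ServiceProvider annotation are omitted here
import java.io.IOException;
import java.io.OutputStream;

@ServiceProvider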
public class MyAccessionHandler extends IDFConversionHandler<OutputStream> {
    @Override
    protected boolean canConvertData(IDF data) {
        return data.accession != null && !data.accession.equals("");
    }

    @Override
    protected void convertData(IDF data, OutputStream out) throws ConversionException {
        try {
            out.write(data.accession.getBytes());
            out.flush();
        }
        catch (IOException e) {
            throw new ConversionException(e);
        }
    }
}

This is a trivial example: our new handler simply reports the accession of the parsed MAGE-TAB file whenever you run the parser with an OutputStream output resource passed to the constructor. But you can adapt this to a variety of purposes: for example, MAGE-TAB can be persisted to a database using Hibernate by generating database objects here. Or you could create handlers that write MAGE-TAB documents out in a different format. You could even override the readValues() method to "parse" MAGE-TAB files directly from a database, or from a dedicated format.

A couple of things to note about this class - first, the @ServiceProvider annotation. You MUST add this to all handlers - this is how Limpopo detects handlers on the classpath that can be used. There is zero configuration required apart from this - as long as you remember to add this, and include your new class on your classpath, this handler will always be triggered when a parse operation is run with the correct output type.

Secondly, note the typing - IDFConversionHandler<OutputStream>. This determines the type of object this handler converts to. The OutputStream type parameter could be any type, and an object of that type will be passed to the convertData() method. Limpopo uses this type to locate appropriate handlers during parsing - so, for example, if you create a parser with an output resource of a File, this handler would not be triggered during conversion.
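
One way such type-based matching could work is sketched below, using a class token and Class.isInstance(). How Limpopo actually resolves handlers is not described here, so treat this purely as an illustration: HandlerMatchingSketch, its nested ConversionHandler class and the matching() method are hypothetical and are not Limpopo types.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: not Limpopo's handler lookup.
class HandlerMatchingSketch {
    // A handler declares the output type it can write to.
    static abstract class ConversionHandler<O> {
        private final Class<O> outputType;

        ConversionHandler(Class<O> outputType) {
            this.outputType = outputType;
        }

        // True if the supplied output resource is of this handler's type.
        boolean accepts(Object outputResource) {
            return outputType.isInstance(outputResource);
        }

        abstract String name();
    }

    // Returns the names of all handlers that can write to the given resource.
    static List<String> matching(List<ConversionHandler<?>> handlers, Object outputResource) {
        List<String> names = new ArrayList<>();
        for (ConversionHandler<?> h : handlers) {
            if (h.accepts(outputResource)) {
                names.add(h.name());
            }
        }
        return names;
    }

    static List<ConversionHandler<?>> handlers() {
        List<ConversionHandler<?>> list = new ArrayList<>();
        list.add(new ConversionHandler<OutputStream>(OutputStream.class) {
            String name() { return "stream-handler"; }
        });
        list.add(new ConversionHandler<File>(File.class) {
            String name() { return "file-handler"; }
        });
        return list;
    }

    public static void main(String[] args) {
        // A stream resource selects only the stream handler.
        System.out.println(matching(handlers(), new ByteArrayOutputStream()));
        // A File resource selects only the file handler.
        System.out.println(matching(handlers(), new File("output.txt")));
    }
}
```

In this sketch a handler typed by OutputStream is selected for any OutputStream subclass, but never for a File.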

Finally, you should implement the canConvertData() method. This takes the appropriate object type as a parameter - in this case, our check determines whether there is a non-null, non-empty accession on this IDF, but you could do more complex checks if required. If this method returns false, nothing will be converted.

Validation

Using the default constructor, the parser will not do any validation. The document is read into memory verbatim, excluding any badly named tags or formatting errors. You could consider that the parser does a series of syntactic checks but no semantic validation.

Normally, however, you will wish to check the validity of this file, by checking that the references between entries are complete and correct, that named protocols exist, and so on. You are free to define validation handlers and plug them into the parser as you choose, as discussed above. To enable validation, pass a validator to the parser's constructor:

    MAGETABValidator validator = new MAGETABValidator();
    MAGETABParser parser = new MAGETABParser(validator);

When you do this, any listeners you register to the parser are automatically cascaded to the validator, so you don't need to register them again. You are of course free to register specific listeners to the validator only if you wish.
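
The cascading behaviour can be pictured with the following self-contained sketch. It does not use Limpopo's listener API: the ErrorListener interface and the Parser and Validator classes nested in ListenerCascadeSketch are hypothetical stand-ins, and the only point is that a listener registered on the parser is forwarded to the validator it wraps.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: these are stand-ins, not Limpopo classes.
class ListenerCascadeSketch {
    interface ErrorListener {
        void errorOccurred(String message);
    }

    static class Validator {
        private final List<ErrorListener> listeners = new ArrayList<>();

        void addListener(ErrorListener listener) {
            listeners.add(listener);
        }

        // Pretend validation found a problem and report it to every listener.
        void validate() {
            for (ErrorListener l : listeners) {
                l.errorOccurred("validation problem");
            }
        }
    }

    static class Parser {
        private final List<ErrorListener> listeners = new ArrayList<>();
        private final Validator validator;

        Parser(Validator validator) {
            this.validator = validator;
        }

        // Registering on the parser also registers on the validator.
        void addListener(ErrorListener listener) {
            listeners.add(listener);
            validator.addListener(listener);
        }

        void parse() {
            validator.validate();
        }
    }

    static List<String> demo() {
        List<String> seen = new ArrayList<>();
        Validator validator = new Validator();
        Parser parser = new Parser(validator);
        parser.addListener(msg -> seen.add("registered on parser: " + msg));
        validator.addListener(msg -> seen.add("registered on validator: " + msg));
        parser.parse();
        return seen;
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

Both listeners hear about the validation problem, although only one of them was registered on the validator directly.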

Adding handlers is the easiest way to create validation functionality. However, you can also directly extend the AbstractValidator class and supply it to the parser. In this case, any validation handlers will not be used and instead the code you supply will be performed after parsing is complete.

To create a custom validator, you just need to extend AbstractValidator, implement the validate method, and supply your validator to the parser. Your validator must be typed by MAGETABInvestigation:

public class MyValidator extends AbstractValidator<MAGETABInvestigation> {
    public void validate(MAGETABInvestigation investigation) throws ValidateException {
        try {
            // do custom validation here
            ...
        }
        catch (Exception e) {
            ErrorItem error = ... // create an error item to describe what went wrong
            fireErrorItemEvent(error);
            fireValidationFailedEvent(new ProgressEvent());
            // false indicates a non-critical exception, i.e. one that won't force the validator to stop immediately
            throw new ValidateException(error, false, e);
        }
    }
}

Note that you can use ErrorItem generation code and methods to provide listener feedback from any errors that may be encountered. Take a look at the ErrorItemFactory class for ways to generate error items, and once you have an item you can supply it to the listener using the method fireErrorItemEvent(ErrorItem item) on the Validator.

Finally, if you wish to provide new error codes for your validator to use, you can make them available to the factory by providing a properties file on the classpath. This file should be placed at the location "META-INF/magetab/errorcodes.properties" and should be formatted like a standard Java properties file, where the key is the integer value of the code and the value is a message string that describes the error. Note, though, that existing codes won't be overwritten.
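
To illustrate the file format and the "existing codes won't be overwritten" rule, the sketch below parses a properties snippet with java.util.Properties and merges it into a map of built-in codes. The codes 23 and 9001, their messages, and the ErrorCodeMergeSketch class are invented for the example; how Limpopo's factory actually loads the file is not shown here.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

// Hypothetical sketch: not the loading code used by Limpopo's ErrorItemFactory.
class ErrorCodeMergeSketch {
    // Merges custom codes into the built-in ones without replacing existing entries.
    static Map<Integer, String> merge(Map<Integer, String> builtIn, String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Map<Integer, String> merged = new TreeMap<>(builtIn);
        for (String key : props.stringPropertyNames()) {
            // Keys are the integer error codes; existing codes keep their message.
            merged.putIfAbsent(Integer.parseInt(key.trim()), props.getProperty(key));
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<Integer, String> builtIn = new TreeMap<>();
        builtIn.put(23, "built-in message"); // made-up code and message

        // What an errorcodes.properties file might contain (made-up entries).
        String custom = "23=attempted override\n"
                      + "9001=my custom validation failed\n";

        // prints {23=built-in message, 9001=my custom validation failed}
        System.out.println(merge(builtIn, custom));
    }
}
```

Picking codes well outside the range already in use avoids having a custom entry silently ignored.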

Conversion

As with validation, no conversion happens when you use the default constructor. But using exactly the same mechanism, you can add conversion handlers to the classpath, which will be run after parsing and validation (if any), as described above.

You can then use, for example:

    File outputFile = new File("output.txt");
    MAGETABParser<File> parser = new MAGETABParser<File>(outputFile);

to create a parser that adds object conversion. Note that you must pass your "output resource" to the constructor, and that this object is typed by the type parameter on the parser. When you do this, any "matched" conversion handlers (those that can write to File objects) are invoked after parsing and validation are complete.