The sample code below illustrates the main features you are likely to use when working with the LIMPOPO MAGE-TAB Parser. For more discussion of these features, see Extensions.
    // requires java.io.File plus the limpopo parser and listener classes

    // make a new parser
    MAGETABParser parser = new MAGETABParser();

    // add an error item listener to the parser; this one just reports parsing errors to stdout
    parser.addErrorItemListener(new ErrorItemListener() {
        public void errorOccurred(ErrorItem item) {
            // locate the error code from the enum, to check the generic message
            ErrorCode code = null;
            for (ErrorCode ec : ErrorCode.values()) {
                if (item.getErrorNum() == ec.getIntegerValue()) {
                    code = ec;
                    break;
                }
            }

            if (code != null) {
                // this just dumps out some info about the type of error
                System.out.println("Listener reported error...");
                System.out.println("\tError Code: " + item.getErrorNum() + " [" + code.getErrorMessage() + "]");
                System.out.println("\tError message: " + item.getMesg());
                System.out.println("\tCaller: " + item.getCaller());
            }
        }
    });

    // now, parse from a file
    File idfFile = new File("/path/to/MAGE-TAB/documents/file.idf.txt");

    // print some stdout info
    System.out.println("Parsing " + idfFile.getAbsolutePath() + "...");

    // do parse
    MAGETABInvestigation investigation = parser.parse(idfFile);
The above is the most basic form of parsing. Registering an error item listener is not compulsory, but you should always do so: without one, any errors encountered during parsing are silently ignored. The parser may complete without throwing an exception, but this does not mean that the full MAGE-TAB document was parsed correctly; some lines may have been skipped or ignored.
You can also parse only IDF or SDRF files, in exactly the same way:
    // make a new parser
    IDFParser parser = new IDFParser();

    // add an error item listener to the parser, that just reports parsing errors to stdout
    parser.addErrorItemListener(new ErrorItemListener() {
        public void errorOccurred(ErrorItem item) {
            // locate the error code from the enum, to check the generic message
            ErrorCode code = null;
            for (ErrorCode ec : ErrorCode.values()) {
                if (item.getErrorNum() == ec.getIntegerValue()) {
                    code = ec;
                    break;
                }
            }

            if (code != null) {
                // this just dumps out some info about the type of error
                System.out.println("Listener reported error...");
                System.out.println("\tError Code: " + item.getErrorNum() + " [" + code.getErrorMessage() + "]");
                System.out.println("\tError message: " + item.getMesg());
                System.out.println("\tCaller: " + item.getCaller());
            }
        }
    });

    // now, parse from a file
    File idfFile = new File("/path/to/MAGE-TAB/documents/file.idf.txt");

    // print some stdout info
    System.out.println("Parsing " + idfFile.getAbsolutePath() + "...");

    // do parse
    IDF idf = parser.parse(idfFile);
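Both listeners above find the matching error code with a linear scan over the enum each time an error is reported. If many errors are expected, the lookup could be built once and reused. The following is a minimal, self-contained sketch of that idea only: `SampleErrorCode`, its constants, and its values are hypothetical stand-ins invented for illustration and are not limpopo's `ErrorCode` enum.

```java
import java.util.HashMap;
import java.util.Map;

class ErrorCodeLookupSketch {

    // Hypothetical stand-in for an error code enum; the constants, numbers
    // and messages are invented for this example only.
    enum SampleErrorCode {
        UNKNOWN_TAG(1, "Unrecognised tag"),
        MISSING_FILE(2, "Referenced file not found");

        private final int value;
        private final String message;

        SampleErrorCode(int value, String message) {
            this.value = value;
            this.message = message;
        }

        int getIntegerValue() {
            return value;
        }

        String getErrorMessage() {
            return message;
        }
    }

    // Built once, so each lookup is a map access instead of a scan over values().
    private static final Map<Integer, SampleErrorCode> BY_VALUE = new HashMap<>();

    static {
        for (SampleErrorCode ec : SampleErrorCode.values()) {
            BY_VALUE.put(ec.getIntegerValue(), ec);
        }
    }

    // Returns the matching code, or null if the number is not known.
    static SampleErrorCode lookup(int errorNum) {
        return BY_VALUE.get(errorNum);
    }

    public static void main(String[] args) {
        SampleErrorCode code = lookup(2);
        if (code != null) {
            System.out.println("Error Code: 2 [" + code.getErrorMessage() + "]");
        }
    }
}
```

A listener could call a helper like `lookup` in place of the loop. For the handful of errors a typical document produces, the linear scan in the listings above is perfectly adequate.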
It is also possible to parse from a URL simply by using:
    // requires java.net.URL
    URL idfURL = new URL("http:///path/to/MAGE-TAB/documents/file.idf.txt");
    MAGETABInvestigation investigation = parser.parse(idfURL);
The other option is to parse from an input stream. This is slightly more complicated, because parsing an entire MAGE-TAB document set requires resolving links between IDF and SDRF files. You must therefore set the location of the IDF before parsing, so that the MAGETABParser can determine the location of the SDRF. For example:
    // get the URL of our idf to parse
    URL idfURL = new URL("http:///path/to/MAGE-TAB/documents/file.idf.txt");

    // create our investigation so we can set the IDF location
    MAGETABInvestigation investigation = new MAGETABInvestigation();
    investigation.IDF.setLocation(idfURL);

    // and parse a stream from the URL, into our investigation object
    parser.parse(idfURL.openStream(), investigation);
By default, the parser works in serial mode. However, it is also possible to parse MAGE-TAB documents in parallel. Parallel parsing rarely gives a significant speedup, as most MAGE-TAB documents parse very quickly anyway. For very large files, or where you have extended the parser to enable data file parsing or other custom operations that may be slow to run, parsing in parallel can be very desirable. Enable parallel parsing by passing an ExecutorService as a parameter when parsing: the parser will use the supplied service to execute all parsing tasks. Note that parsing in parallel requires that you pass an input stream containing the data to parse, and therefore also that you set the location as described above.
    // requires java.util.concurrent.ExecutorService and java.util.concurrent.Executors

    // create an executor service that uses 16 parallel threads
    ExecutorService service = Executors.newFixedThreadPool(16);

    // get the URL of our idf to parse
    URL idfURL = new URL("http:///path/to/MAGE-TAB/documents/file.idf.txt");

    // create our investigation so we can set the IDF location
    MAGETABInvestigation investigation = new MAGETABInvestigation();
    investigation.IDF.setLocation(idfURL);

    // and parse a stream from the URL, using our new service
    parser.parse(idfURL.openStream(), investigation, service);
When you invoke the parser in parallel mode, the parse() method is asynchronous: it does not wait for parsing to complete, but returns as soon as all parsing tasks have been submitted to the service. If you are using this form of the method, you should always register a progress listener to be informed when parsing is complete:
    // create progress listener
    ProgressListener pl = new ProgressListenerAdapter() {
        public void parsingStarted(ProgressEvent evt) {
            System.out.println("Started parsing...");
        }

        public void parsingCompleted(ProgressEvent evt) {
            System.out.println("Finished parsing!");
        }
    };

    // set listener on parser
    parser.addProgressListener(pl);

    // and parse in parallel, passing the executor service created earlier
    parser.parse(idfURL.openStream(), investigation, service);
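If the calling thread needs to wait for an asynchronous parse to finish, one common approach is to have the completion callback release a `java.util.concurrent.CountDownLatch`. The sketch below is self-contained and uses only the JDK: `CompletionCallback` and `startAsyncWork` are hypothetical stand-ins for a progress listener's `parsingCompleted` method and for the asynchronous `parse()` call, and the timeout is an arbitrary choice.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class AwaitCompletionSketch {

    // Hypothetical stand-in for the parsingCompleted(...) method of a progress listener.
    interface CompletionCallback {
        void completed();
    }

    // Hypothetical stand-in for an asynchronous parse(): it returns immediately
    // and invokes the callback from a worker thread once the work is done.
    static void startAsyncWork(ExecutorService service, CompletionCallback callback) {
        service.execute(() -> {
            // ... the actual work would happen here ...
            callback.completed();
        });
    }

    // Starts the work, then blocks until the callback fires or the timeout expires.
    // Returns true if completion was signalled in time.
    static boolean runAndWait(long timeoutSeconds) throws InterruptedException {
        ExecutorService service = Executors.newSingleThreadExecutor();
        CountDownLatch done = new CountDownLatch(1);
        try {
            startAsyncWork(service, done::countDown);
            return done.await(timeoutSeconds, TimeUnit.SECONDS);
        } finally {
            // no further tasks are needed, so let the worker thread exit
            service.shutdown();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("Completed in time: " + runAndWait(10));
    }
}
```

With the parser, the equivalent would be to call `countDown()` inside `parsingCompleted` and `await` on the latch after `parse()` returns.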
Register a progress listener if you need to know exactly when parsing completes. However, it is safe, and usually easier, to shut down the executor service as soon as the parse() method returns, because by then all new tasks will have been submitted. They will not necessarily have finished executing, so perform a graceful shutdown.
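A graceful shutdown of this kind can be sketched with the JDK alone. In the minimal, self-contained example below, the dummy counter tasks stand in for the tasks the parser would submit; the pool size, the 30-second timeout, and the class and method names are illustrative assumptions and not part of limpopo.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class GracefulShutdownSketch {

    // Submits taskCount dummy tasks, shuts the pool down gracefully,
    // and returns how many of the tasks ran to completion.
    static int runAndShutDown(int taskCount) throws InterruptedException {
        ExecutorService service = Executors.newFixedThreadPool(4);
        AtomicInteger completed = new AtomicInteger();

        // Stand-in for parser.parse(stream, investigation, service),
        // which would submit its parsing tasks to the service here.
        for (int i = 0; i < taskCount; i++) {
            service.execute(() -> completed.incrementAndGet());
        }

        // Graceful shutdown: no new tasks are accepted, but tasks that
        // were already submitted are still executed.
        service.shutdown();

        // Optionally wait for the queued work to drain.
        if (!service.awaitTermination(30, TimeUnit.SECONDS)) {
            // Only force termination if tasks appear to be stuck.
            service.shutdownNow();
        }
        return completed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("Completed tasks: " + runAndShutDown(10));
    }
}
```

The key point is to call `shutdown()` rather than `shutdownNow()`: the former lets already-submitted tasks finish, while the latter attempts to cancel them.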
You can add more advanced features, like validation and data conversion. Validation exists to validate the parsed MAGE-TAB object model, and conversion can be used to convert the parsed MAGE-TAB document from the limpopo object model (a MAGETABInvestigation) into any other set of objects (or simply print information to standard out). You should read the section on extending the parser to see how to add new validation and conversion functionality - you have to extend the parser by adding new handlers.
To create a validating parser you would do:
    MAGETABValidator validator = new MAGETABValidator();
    MAGETABParser parser = new MAGETABParser(validator);
To create a converting parser (without validation), you need to specify the object you want to convert to. In the example below, this is a file but it could be any type of object, depending on what conversion handlers are available - see Extensions for more on this.
    // create a file to receive the converted information about our MAGETABInvestigation,
    // e.g. we might just want to report some stats
    File f = new File("output.txt");

    // note that the parser is typed by the object being produced;
    // 'converter' is not declared in this snippet and must be set up beforehand
    MAGETABParser<File> parser = new MAGETABParser<File>(converter, f);
And finally, to create a validating, converting parser you would use:
    File f = new File("output.txt");
    MAGETABValidator validator = new MAGETABValidator();
    MAGETABParser<File> parser = new MAGETABParser<File>(validator, f);