Installation Instructions for the EAGER Pipeline¶

We provide three kinds of installation instructions: a VirtualBox image, a Singularity container, and a manual installation.

Note

We no longer provide a Docker image, as the much more flexible Singularity method has superseded both the deager application and our Docker-based approach.

VirtualBox¶

Warning

Use this for testing purposes only: the image is not updated and was only intended for trying out the pipeline.

Note

Virtualization costs some performance (typically ~20%).

We provide a VirtualBox-based operating system image that contains all the required software tools.

Download the corresponding VirtualBox Image

Unpack the image

Load the image in VirtualBox: click on File, Open, Image and select the unpacked image file

Click on Start in VirtualBox and wait a couple of seconds until you see a regular desktop environment in VirtualBox

You may now run the pipeline’s two components by typing either eager or eagercli.

Two small videos illustrating the whole setup process can be found online and here.

Singularity¶

Note

This is the default way to use EAGER in a containerized environment. It offers the best user experience with minimal performance drawbacks.

In order to use this approach, you will need a running Linux operating system at hand (e.g. ArchLinux, Ubuntu > 14.04, CentOS 7 or similar).

Warning

In theory, this should work on OSX as well. However, Singularity on OSX itself relies on a virtualization technique based on VirtualBox, so you could just as well use the VirtualBox image on such systems.

First, install Singularity on the machine that you would like to use for the setup. To do this, follow the instructions from the authors here. There are installation instructions for OSX and Windows, too, but these will have some performance drawbacks. Once you have a working Singularity installation, there are just three commands you will need to run to get EAGER to work:

Start by downloading the pipeline to the location where you want to run your analysis, e.g. /home/<username>/Downloads. Switch to that directory and type this on the command line:

singularity pull shub://apeltzer/EAGER-GUI:master
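Before moving on, it may help to confirm that the pull actually produced an image file. The following is a minimal sketch and not part of EAGER: the function name check_image is hypothetical, and the file name in the example call is simply the one used in the commands of this guide.

```shell
# Hypothetical helper (not part of EAGER or Singularity): succeeds only
# if the given image file exists and is not empty.
check_image() {
    if [ -s "$1" ]; then
        echo "image found: $1"
    else
        echo "image missing or empty: $1" >&2
        return 1
    fi
}

# Example call, using the file name from this guide:
# check_image /home/<username>/Downloads/apeltzer-EAGER-GUI-master.img
```

If the check fails, the download was probably interrupted or stored under a different name, and repeating the pull in the intended directory is the simplest remedy.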

Running the GUI¶

Now we can run the GUI:

singularity exec -B /path/to/your/data:/data /home/<username>/Downloads/apeltzer-EAGER-GUI-master.img eager
#/path/to/your/data = Path where you store RAW sequencing data, a reference genome in FastA format and the folder where you store your results in the end.
#/home/<username>/Downloads/apeltzer-EAGER-GUI-master.img is the name of the previously downloaded image file.

This will open the EAGER graphical user interface (GUI), which is required for configuring the pipeline. Make sure to remember the /path/to/your/data you used here, as you will need it for the pipeline execution later on. Within the GUI, you can find your data in /data. Navigate there when selecting input files, the reference genome or the results folder, and do not select any folders or files in other directories.

Note

The /path/to/your/data can be any path accessible from your workstation, so for example a department’s data storage on the network would work, too.

Warning

Please make sure that the path to your data storage is followed by :/data. Otherwise, you will not be able to run a configuration.
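Since a missing :/data suffix is easy to overlook, the bind argument can be built by a small helper instead of being typed by hand. This is only a sketch under assumptions: bind_spec is a hypothetical name and not part of EAGER or Singularity; it merely formats the string that follows -B in the commands of this guide.

```shell
# Hypothetical helper (not part of EAGER or Singularity): turns a host
# directory into the "host:/data" form used after -B in this guide.
bind_spec() {
    host=${1%/}                      # drop one trailing slash, if any
    [ -n "$host" ] || host=/         # keep "/" itself intact
    case $host in
        *:/data) printf '%s\n' "$host" ;;          # suffix already present
        *)       printf '%s\n' "${host}:/data" ;;  # append the missing suffix
    esac
}
```

It could then be used as singularity exec -B "$(bind_spec /path/to/your/data)" /home/<username>/Downloads/apeltzer-EAGER-GUI-master.img eager, so that the suffix is always present.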

Once you have finished the configuration, please close the graphical user interface.

Running the Analysis¶

You can now run the actual analysis procedure with eagercli by issuing the following command.

singularity exec -B /path/to/your/data:/data /home/<username>/Downloads/apeltzer-EAGER-GUI-master.img eagercli /data
# Again, keep the same path to your data and specify the ".img" path as before.

This will run the analysis procedure on your machine using the eagercli application inside the container.

Note

The results will be stored in the folder you selected in the configuration procedure. A good practice is to keep a separate folder inside /path/to/your/data just for this purpose.
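One possible way to follow this advice is to create a fixed set of subfolders before starting the GUI. The sketch below is an illustration only: make_layout is a hypothetical name, and the subfolder names raw, reference and results are assumptions, not something EAGER requires.

```shell
# Hypothetical layout helper: creates one folder per kind of data under
# the directory that is later bound to /data. The subfolder names are
# assumptions and can be changed freely.
make_layout() {
    mkdir -p "$1/raw" "$1/reference" "$1/results"
}

# Example call:
# make_layout /path/to/your/data
```

Inside the container, these folders would then appear as /data/raw, /data/reference and /data/results.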

Reproducibility¶

An important feature of this Singularity-based approach is that you can rerun both configuration and analysis whenever you want. Simply keep the downloaded (“pulled”) image file together with your whole analysis and you can reproduce everything in the future. For your convenience, we also created a small script that reports which version of each tool was used to produce a result, which is useful e.g. for a publication. You can see these versions by running

singularity exec -B /path/to/your/data:/data /home/<username>/Downloads/apeltzer-EAGER-GUI-master.img eagerVersions utilized_versions.txt

This will produce a text file listing the tools contained in the selected image, together with their version tags.
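To make it easier to tell later which image file belongs to which analysis, a checksum of the image can be stored next to the results. This is a sketch under assumptions and not an EAGER feature: record_image_checksum and the output file name image.sha256 are hypothetical, and the sha256sum tool from GNU coreutils is assumed to be installed.

```shell
# Hypothetical helper (not part of EAGER): writes the SHA-256 checksum
# of the image file into the given directory, so the exact image can be
# identified again later.
record_image_checksum() {
    image=$1
    outdir=$2
    sha256sum "$image" > "$outdir/image.sha256"
}

# Example call, using the paths from this guide:
# record_image_checksum /home/<username>/Downloads/apeltzer-EAGER-GUI-master.img /path/to/your/data
```

As long as the image stays at the same location, running sha256sum -c on the stored file confirms that it has not changed.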

Manual Installation¶

Note

This is the native installation of the EAGER pipeline. It requires you to download tools manually, compile them and set paths accordingly in order for the pipeline to work on your operating system.

Manual installation on an infrastructure without access to containers is a bit more complex than using a container image, as all the requirements and tools for EAGER need to be linked correctly on the system that runs the pipeline in the end. The prerequisites are:

Java 8 Environment, preferably the Oracle JDK8

GNU Bash
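Whether the installed Java is version 8 can be checked from its version string. The helper below is a minimal sketch: java_major is a hypothetical name, and it assumes the usual version string formats, i.e. "1.8.0_131" for Java 8 and "11.0.2" for later releases.

```shell
# Hypothetical helper: extracts the major Java version from a version
# string such as "1.8.0_131" (Java 8) or "11.0.2" (Java 11).
java_major() {
    case $1 in
        1.*) v=${1#1.}; echo "${v%%.*}" ;;   # old scheme: 1.<major>.x
        *)   echo "${1%%.*}" ;;              # new scheme: <major>.x.y
    esac
}

# Example call; java -version prints to stderr, and the version string
# is assumed to be the quoted part of the first line:
# java_major "$(java -version 2>&1 | head -n 1 | cut -d'"' -f2)"
```

A result of 8 indicates a Java 8 environment.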

After this, the following tools need to be installed, ideally system-wide or, if this is not possible due to access rights, by compiling them manually. In parentheses you can find the version(s) EAGER has been tested with.

Note

The EAGER-GUI, EAGER-CLI and all other components developed within the EAGER pipeline can be downloaded from Bintray as pre-compiled JAR files. You don’t need to re-compile these applications manually. If you prefer to compile them yourself, please use the IntelliJ IDEA IDE to do so.

List of Tools tested with EAGER:

ANGSD (v0.910)

AdapterRemoval (v2.2.1)

BAM2TDF (v14)

BGZip (depending on your Linux distribution, you may already have this installed)

Bowtie 2 (v2+)

BWA (v0.7.15+)

CircularMapper (latest)

Clip & Merge (latest)

Schmutzi (latest)

DeDup (latest)

EAGER (latest)

EAGER-CLI (latest)

FastX-Tools (v0.0.13)

FastQC (v0.11.4)

GATK (v3.7+)

LibraryComplexityPlotter (latest)

mapDamage (v2.0+)

MTNucRatioCalculator (latest)

Picard-Tools (v2+)

Preseq (v2.0+)

QualiMap (v2.3)

ReportTable (latest)

Samtools (v1.4.0+)

Stampy (current)

Tabix (v1.3.0)

VCF2Genome (latest)

To make installation easier, we provide installation files for linking the tools correctly. Open each file with a text editor and adjust the location of the corresponding executable. Once you have done this and installed all the tools required for EAGER, you can simply add the location of these scripts to your path, e.g.

PATH=/data/eager-links/:$PATH

This makes the links to the respective tools available so that EAGER can find them. If you already have working installations of, for example, BWA or samtools, you only need to install the missing tools. Please make sure that the installed versions are the ones EAGER needs; otherwise you might have to define the proper versions in your path as well.
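To see which tools are still missing after adjusting the path, the shell can be asked whether each command can be found. The following is a sketch only: missing_tools is a hypothetical name, and the executable names in the example call are assumptions that may differ from the names used on your system or in the link scripts.

```shell
# Hypothetical helper: prints each given command name that cannot be
# found on the current PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Example call; the executable names are assumptions:
# missing_tools java bwa samtools bowtie2
```

An empty output means that every listed command was found. It does not check whether the installed versions match the ones in the list of tested tools.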

Now you can check, e.g. by entering eager, whether you get a message about running EAGER. If you set EAGER up on a cluster infrastructure, you may need to have X11 forwarding enabled there to run the pipeline. For Windows clients, there is a howto available here. For Linux client machines, you will probably only have to run:

ssh you@yourheadnode.yourcluster -Y

If you are unsure how to run X11-forwarded applications on your local infrastructure, your IT department should be able to set this up for you or help you with it.
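A quick way to see whether X11 forwarding is active in the remote session is to look at the DISPLAY variable, which ssh normally sets on the remote side when forwarding is enabled. The helper below is a minimal sketch with a hypothetical name; it only checks that the variable is set, not that a window can actually be opened.

```shell
# Hypothetical helper: succeeds if DISPLAY is set to a non-empty value,
# which is what ssh -Y normally arranges on the remote side.
has_display() {
    [ -n "${DISPLAY:-}" ]
}

# Example call on the head node:
# has_display && echo "X11 forwarding looks active" || echo "DISPLAY is not set"
```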