Example 1: Mouse BAC M_BA0093H02

(Data from Washington University Genome Sequencing Center)

We ran EULER on this collection of files:

Input Files
DNA Reads Fasta File
Quality Values File
Mate-pair Naming Rules File
EULER Controls File

Like other assemblers, EULER produces a contig assembly (in fact, several assemblies at successive stages, as explained below). EULER also produces the EULER report and repeat graphs. All the input and output files are indexed in the file index.html. From there you may examine the files individually, download a gzipped tar file of the files important to most users, or download a gzipped tar file of all files, including those important only to developers.

EULER outputs 3 types of graphs in the format of the graph drawing software Graphviz.

  1. EULER repeat graph.
    EULER-ET generates the EULER repeat graph based on the input reads themselves. The files describing this graph are
    M_BA0093H02.RAW.fasta.screen.fil.edge
    M_BA0093H02.RAW.fasta.screen.fil.graph.
    The first file gives the sequence of each edge in the graph, and the second describes the topology of the graph. The contigs derived from this repeat graph are output to
    M_BA0093H02.RAW.fasta.screen.fil.contig.
    The connected components of the repeat graph are output to separate graphviz files:
    M_BA0093H02.RAW.fasta.screen.fil_et_comp1.gvz
    M_BA0093H02.RAW.fasta.screen.fil_et_comp3.gvz
    M_BA0093H02.RAW.fasta.screen.fil_et_comp4.gvz
    (Component 2 is the symmetric complement of component 1, so its graph is not output.)
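
As a rough illustration of why one component of each pair can be skipped, the sketch below computes the reverse complement of a DNA sequence. It assumes that "symmetric complement" here means the same graph read from the opposite strand; that reading, and the helper name reverse_complement, are assumptions and are not taken from the EULER documentation.

```python
# Assumption: "symmetric complement" refers to the reverse-complement strand.
# Map each base to its complement; N (unknown base) maps to itself.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    forward = "AACG"
    backward = reverse_complement(forward)
    print(forward, backward)
    # Applying the operation twice gives back the original sequence,
    # so the complementary component carries no extra information.
    print(reverse_complement(backward) == forward)
```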

    The graph of the first component can be downloaded here in PDF format.
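
If you would rather render the .gvz files yourself, something like the following sketch may work. It assumes the .gvz files are written in the Graphviz DOT language and that the Graphviz "dot" program is installed and on your PATH; the helper names dot_command and render are made up for this illustration.

```python
import shutil
import subprocess
from pathlib import Path


def dot_command(gvz_path, fmt="pdf"):
    """Build the Graphviz command line that renders one graph file.

    The output file sits next to the input, with the .gvz suffix
    replaced by the requested format (for example .pdf).
    """
    src = Path(gvz_path)
    out = src.with_suffix("." + fmt)
    return ["dot", f"-T{fmt}", str(src), "-o", str(out)]


def render(gvz_path, fmt="pdf"):
    """Run dot on one file and return the path of the rendered output."""
    if shutil.which("dot") is None:
        raise RuntimeError("Graphviz 'dot' was not found on PATH")
    cmd = dot_command(gvz_path, fmt)
    subprocess.run(cmd, check=True)
    return Path(cmd[-1])


if __name__ == "__main__":
    # Only prints the command; call render() to actually run Graphviz.
    print(" ".join(dot_command("M_BA0093H02.RAW.fasta.screen.fil_et_comp1.gvz")))
```

Large repeat graphs can take a while to lay out, so it may be worth trying a single component first.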

    Each edge in the graph is labeled with two numbers. The first number is the index of the edge, which can be used to retrieve its sequence from the corresponding edge file

    M_BA0093H02.RAW.fasta.screen.fil.edge.

    The second number shows the length of the sequence of this edge. For example, the middle edge in this component is labeled 62(3221), which means this edge corresponds to the sequence "edge62" in the edge file and its length is 3221.

    Each vertex in the graph represents a unique 20-mer, and the number labeling it corresponds to its index in the topology file. The graph topology file is currently used only by the developers, and ordinary users may ignore it.
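
The edge labels described above can be decoded with a few lines of Python. The sketch below splits a label such as 62(3221) into its index and length, then looks up the matching sequence. The layout of the edge file is not documented here, so the lookup assumes a FASTA-style file in which each record starts with a header line such as ">edge62"; that layout and both function names are assumptions.

```python
import re

# A label looks like "62(3221)": edge index, then sequence length in parentheses.
LABEL_RE = re.compile(r"^\s*(\d+)\((\d+)\)\s*$")


def parse_edge_label(label):
    """Split an edge label into (edge index, sequence length)."""
    match = LABEL_RE.match(label)
    if match is None:
        raise ValueError(f"not an edge label: {label!r}")
    return int(match.group(1)), int(match.group(2))


def read_edge_sequence(lines, index):
    """Return the sequence of edge<index> from FASTA-style lines.

    Assumption: records start with a header such as ">edge62", and the
    sequence may be wrapped over several lines.
    """
    wanted = f"edge{index}"
    found = False
    pieces = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if found:
                break  # the next record begins, so our sequence is complete
            fields = line[1:].split()
            found = bool(fields) and fields[0] == wanted
        elif found:
            pieces.append(line)
    if not found:
        raise KeyError(wanted)
    return "".join(pieces)


if __name__ == "__main__":
    index, length = parse_edge_label("62(3221)")
    print(index, length)
    # Toy records for illustration only, not real data from the edge file.
    toy = [">edge61", "AAAA", ">edge62", "ACGT", "TTGA", ">edge63", "CC"]
    print(read_edge_sequence(toy, 62))
```

With the real edge file, you could pass an open file object as the lines argument and compare the length of the returned sequence with the length given in the label.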

    EULER-DB may simplify the repeat graph, and it outputs a second set of Graphviz graphs in the same format. The names of these files are

    M_BA0093H02.RAW.fasta.screen.fil_db_comp1.gvz
    M_BA0093H02.RAW.fasta.screen.fil_db_comp3.gvz
    (In this example, components 2 and 4 are the symmetric complements of components 1 and 3, respectively, so the graphs are not output.)

    The corresponding graph description files are

    M_BA0093H02.RAW.fasta.screen.fil.mate.edge
    M_BA0093H02.RAW.fasta.screen.fil.mate.graph
    M_BA0093H02.RAW.fasta.screen.fil.mate.contig.

  2. EULER-Connect
    EULER-Connect connects some of the contigs in
    M_BA0093H02.RAW.fasta.screen.fil.mate.con
    generated by EULER-DB plus EULER-Consensus. It joins them into longer contigs by incorporating the unreliable regions of the reads. The resulting contigs are output to the file
    M_BA0093H02.RAW.fasta.screen.fil.mate.con.connt.
    It also outputs the connections between the input contigs in a graphviz format file
    M_BA0093H02.RAW.fasta.screen.fil.mate.con_connt.gvz
    so that the user can inspect the connections and identify suspicious links manually.

    This graph can be downloaded here in PDF format.

    In this graph, each vertex represents an input contig. For example, vertex 2 means "contig2" in the input contig file. Each edge represents a potential connection between two contigs, and the two numbers labeling it (e.g., 42(3)) give the interval length and the number of reads supporting the connection, respectively.
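
One way to look for suspicious links is to flag connections backed by very few reads. The sketch below decodes a connection label such as 42(3) and filters on read support. The Connection record, the function names, the vertex numbers in the demo, and the default threshold of 2 reads are all hypothetical and are not part of EULER; the pattern also accepts a negative interval length, which is an assumption about what the files may contain.

```python
import re
from typing import NamedTuple

# A label looks like "42(3)": interval length, then supporting read count.
# Assumption: the interval length might be negative, so a sign is accepted.
LABEL_RE = re.compile(r"^\s*(-?\d+)\((\d+)\)\s*$")


class Connection(NamedTuple):
    source: int            # index of the first contig (vertex number)
    target: int            # index of the second contig (vertex number)
    interval_length: int   # first number in the label
    supporting_reads: int  # second number in the label


def parse_connection(source, target, label):
    """Build a Connection from two vertex numbers and an edge label."""
    match = LABEL_RE.match(label)
    if match is None:
        raise ValueError(f"not a connection label: {label!r}")
    return Connection(source, target, int(match.group(1)), int(match.group(2)))


def weakly_supported(connections, min_reads=2):
    """Return the connections supported by fewer than min_reads reads."""
    return [c for c in connections if c.supporting_reads < min_reads]


if __name__ == "__main__":
    # Vertex numbers and the second label are invented for this demo.
    links = [
        parse_connection(2, 5, "42(3)"),
        parse_connection(5, 7, "10(1)"),
    ]
    for link in weakly_supported(links):
        print(f"check contig{link.source} -> contig{link.target}: "
              f"{link.supporting_reads} supporting read(s)")
```

A low read count does not prove a link is wrong; it only suggests which edges to examine first in the rendered graph.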
     

  3. EULER-SF
    EULER-SF uses mate-pairs to find potential connections between the input contigs and outputs those connections.

    In the current EULER pipeline, EULER-SF is run twice, with different inputs.

    In EULER-SF pass 1, the input is the set of contigs from EULER-DB:

    M_BA0093H02.RAW.fasta.screen.fil.mate.con
    This generates result files in graphviz format:
    M_BA0093H02.RAW.fasta.screen.fil.mate.con_sf_comp1.gvz
    through
    M_BA0093H02.RAW.fasta.screen.fil.mate.con_sf_comp4.gvz
    The graph of the first component can be downloaded here in PDF format.

    The labels in this graph are similar to those in the graph output by EULER-Connect.

    In EULER-SF pass 2, the input contigs are in the contig file generated by EULER-Connect:

    M_BA0093H02.RAW.fasta.screen.fil.mate.con.connt
    The resulting graphs are named
    M_BA0093H02.RAW.fasta.screen.fil.mate.con.connt_sf_comp1.gvz
    M_BA0093H02.RAW.fasta.screen.fil.mate.con.connt_sf_comp2.gvz

Example 2: Mouse BAC M_BA0294I17

(Data from Washington University Genome Sequencing Center)

We ran EULER on this collection of files:

Input Files
DNA Reads Fasta File
Quality Values File
Mate-pair Naming Rules File
EULER Controls File

In addition to a contig assembly, which is also produced by other assemblers, EULER produces the EULER report and repeat graphs. All the input and output files are indexed in the file index.html. From there you may examine the files individually, download a gzipped tar file of the files important to most users, or download a gzipped tar file of all files, including those important only to developers.

In this project, EULER resolved all the repeats, so each component in the resulting graph consists of a single edge.