Help
Introduction
Pergola is a tool that processes longitudinal behavioral data and converts it into genomic data formats. This makes
the data compatible with genomics software tools for analysis and visualization.
The required input to Pergola consists of:
- A CSV file containing behavioral data.
- A mapping file setting the correspondence between the input and the output.
Run a job
- Provide a sequence of behavioral events.
  a. Upload a CSV file from your local disk.
  b. Copy and paste the sequence of events into the text area.
- Provide a mapping file setting the correspondence between the fields in your input data and Pergola ontology terms.
  a. Upload the file from your local disk.
  b. Configure it using the mappings menu.
- Set your options as desired. Otherwise, default settings are used.
- Submit your job.
- Be patient. The execution time depends on the length of the input data.
Outputs
Pergola generates the following outputs:
- BEHAVIORAL TRAJECTORY: n files, each containing a track of behavior, where n is the number of different items under
  the field set as "track" in the mapping file.
- FASTA FILE: A FASTA file that can be used to load a genome to render data in a genome browser.
- CHROM SIZES: A txt file containing the length of your experiment assigned to a chromosome label.
  Some genomic tools need this file for downstream analyses.
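As a rough illustration of the CHROM SIZES output, the sketch below builds one line in the common chrom.sizes layout (a label, a tab, and a length). It assumes that the experiment length is the last end time minus the first start time and that the label is "chr1"; both are assumptions made for this example, and the values Pergola actually writes may differ. The function name chrom_sizes_line is hypothetical.

```python
def chrom_sizes_line(start_times, end_times, label="chr1"):
    """Return one chrom.sizes-style line: label, a tab, and the experiment length.

    Assumption: length = latest end time - earliest start time.
    """
    length = max(end_times) - min(start_times)
    return f"{label}\t{length}"


# Start and end times of the first and last events in the sample file of this page.
starts = [1335985200, 1335986845]
ends = [1335985232, 1335986947]

line = chrom_sizes_line(starts, ends)
print(line)  # chr1<TAB>1747
```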
Pergola can process data in different ways. The available options are listed below.
Data input options
- Input file: A sequence of behavioral events in CSV format in the text area or uploaded from a file. The minimum
  information required is a column providing a time stamp and a second column containing a numerical value
  (see example below).
  Time | Value
  1 | 0.02
  2 | 0.03
  3 | 0.05
  4 | 0.09
  A more complete example is our sample file:
  CAGE | StartT | Nature | EndT | Value
  1 | 1335985200 | food_sc | 1335985232 | 0.02
  1 | 1335986151 | food_sc | 1335986427 | 0.1
  1 | 1335986420 | water | 1335986451 | 0.08
  1 | 1335986541 | water | 1335986553 | 0.02
  1 | 1335986832 | water | 1335986844 | 0.02
  1 | 1335986845 | food_sc | 1335986947 | 0.02
- Field separator: The character used to separate the columns of the file. In our sample file it is Tab. The other
  possible characters are "," and ";".
- Header: A header row, in the first row of the input file, containing the heading of each column of data.
- Mapping file: This file sets the information contained in each column of the input file. The file is formatted
  following the External Mapping File Format specifications from the Gene Ontology Consortium (see below):
  The left part sets the column to read in the input file, while the right part, after the ">" symbol, sets the term in
  the Pergola ontology to which the input column corresponds. The available Pergola ontology terms are listed here.
  If the input file does not have a header, you can use the column order instead of the column header. For
  instance, in our example above the first column corresponds to track (the other columns are set in
  the same manner):
  behavioral_file:1 > pergola:track
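The mapping line format described above can be read with a few lines of Python. This is a minimal sketch, not Pergola's own parser: the function names split_term and parse_mapping are hypothetical, and skipping lines that start with "!" is an assumption based on the comment convention of Gene Ontology mapping files.

```python
def split_term(token):
    """Split 'namespace:name' and return the name part."""
    namespace, sep, name = token.strip().partition(":")
    if not sep or not name:
        raise ValueError(f"expected 'namespace:name', got {token!r}")
    return name


def parse_mapping(text):
    """Return {input column (header or 1-based position): Pergola term}."""
    mapping = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Assumption: blank lines and lines starting with '!' carry no mapping.
        if not line or line.startswith("!"):
            continue
        left, sep, right = line.partition(">")
        if not sep:
            raise ValueError(f"missing '>' in mapping line: {line!r}")
        mapping[split_term(left)] = split_term(right)
    return mapping


example = "behavioral_file:1 > pergola:track"
print(parse_mapping(example))  # {'1': 'track'}
```

With a header row, the same line would presumably use the column heading instead of the position, for example behavioral_file:CAGE for the first column of the sample file.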
Visualization/Processing options
NOTE: Visualization options matter in two ways: they set the behavior of the visualization and, when the data is
downloaded, they determine how the data is processed.
- Track types: Under the Discrete interval tracks menu, track types can be set to BED or GFF format.
  These options render the data as discrete intervals; for example, a meal is displayed
  as a bar spanning the duration of the event. The Continuous tracks option allows choosing between the BedGraph
  and bigWig file formats. Both formats are visualized as a continuous score along the behavioral trajectory, such
  as velocity or any statistical score derived from a longitudinal trajectory.
  More info here
- Track line: BED and GFF formats will be generated with a track line. Some genome browsers use this line to
  determine the default display options.
- Relative time points: Time points from the input data are transformed relative to the first time point
  in the file, which as a result becomes 0.
- Mean by window length: All values within a time window are averaged by the window length (sum of values/window length).
- Mean by window counts: All values within a time window are averaged by the number of samples in the window (sum of values/window counts).
- Min time / Max time: Minimum and maximum time to process. Data can be trimmed so that only data within the interval
  set by these values is processed. It is also possible to set only one of the options (only minimum time or only maximum time).
- Intervals generation: Generates intervals from an input file that contains only a time stamp for each recorded
  event, a situation that otherwise prevents the visualization of the data. When this option is checked, Pergola creates
  intervals that run from t_n to t_(n+1) - 1 and assigns the original value at t_n to the new interval.
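The processing options above can be illustrated with a short Python sketch. It is not Pergola's implementation: the function names are hypothetical, the time stamps in the usage example are made up, windows are assumed to start at time 0, and the last event is left without an interval because no following time point exists for it.

```python
from collections import defaultdict


def relative_times(times):
    """Shift time points so that the first one becomes 0."""
    first = times[0]
    return [t - first for t in times]


def mean_by_window(times, values, window, by="counts"):
    """Average values per time window.

    by="length": sum of values / window length
    by="counts": sum of values / number of samples in the window
    Assumption: windows are aligned to time 0.
    """
    if by not in ("length", "counts"):
        raise ValueError("by must be 'length' or 'counts'")
    sums = defaultdict(float)
    counts = defaultdict(int)
    for t, v in zip(times, values):
        start = (t // window) * window  # start of the window containing t
        sums[start] += v
        counts[start] += 1
    return {
        start: sums[start] / (window if by == "length" else counts[start])
        for start in sorted(sums)
    }


def generate_intervals(times, values):
    """Build (t_n, t_(n+1) - 1, value at t_n) for each pair of consecutive events."""
    return [
        (times[i], times[i + 1] - 1, values[i])
        for i in range(len(times) - 1)
    ]


# Hypothetical time stamps paired with the values of the minimal example table.
times = relative_times([100, 110, 125, 140])
values = [0.02, 0.03, 0.05, 0.09]

print(times)  # [0, 10, 25, 40]
print(generate_intervals(times, values))  # [(0, 9, 0.02), (10, 24, 0.03), (25, 39, 0.05)]
print(mean_by_window(times, values, window=20, by="counts"))
```

Min time and Max time would amount to filtering the (time, value) pairs before any of these steps.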
Access via webservice
It is possible to access this website programmatically and retrieve formatted files. We provide some example code for convenience in this Gist.
It can be run by simply typing: python pergola-webservice.py myconfig.json myoutput.zip
An example JSON configuration is shown below. You can learn about the available parameters by checking the '?' icons on the main page.
{
  "params": {
    "separator": "{TAB}",
    "outformatannot": "bed",
    "outformatcont": "bedGraph",
    "trackline": true,
    "relativecoord": true
  },
  "files": {
    "infile": {
      "filename": "input.tsv",
      "content-type": "text/csv",
      "content-file": "input.csv"
    },
    "mapfile": {
      "filename": "mapfile.txt",
      "content-type": "text/plain",
      "content-file": "b2p.txt"
    }
  }
}
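A configuration like the one above can also be generated from Python instead of being written by hand. The sketch below only builds and saves the JSON; it does not contact the webservice. The helper names build_config and write_config are hypothetical, and the reading of "filename" as the name given to the uploaded file and "content-file" as the local path is an assumption drawn from the example, not from documentation.

```python
import json


def build_config(infile, mapfile, infile_name="input.tsv",
                 mapfile_name="mapfile.txt", separator="{TAB}"):
    """Return a configuration dictionary shaped like the example above."""
    return {
        "params": {
            "separator": separator,
            "outformatannot": "bed",
            "outformatcont": "bedGraph",
            "trackline": True,
            "relativecoord": True,
        },
        "files": {
            "infile": {
                "filename": infile_name,      # assumed: name given to the upload
                "content-type": "text/csv",
                "content-file": infile,       # assumed: path of the local file
            },
            "mapfile": {
                "filename": mapfile_name,
                "content-type": "text/plain",
                "content-file": mapfile,
            },
        },
    }


def write_config(config, path):
    """Save the configuration as indented JSON."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(config, handle, indent=2)


config = build_config("input.csv", "b2p.txt")
print(json.dumps(config, indent=2))
```

Calling write_config(config, "myconfig.json") would then produce a file that can be passed to the command shown earlier.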