Help

Introduction

Pergola is a tool that processes and converts longitudinal behavioral data into genomic data formats. The converted data can then be analyzed and visualized with genomics software tools.

The required input to Pergola consists of:

A CSV file containing behavioral data
A mapping file setting the correspondence between the input and the output.

Run a job

Provide a sequence of behavioral events.
a. Upload a CSV file from your local disk.
b. Copy and paste the sequence of events into the text area.
Provide a mapping file that sets the correspondence between the fields in your input data and the Pergola ontology terms.
a. Upload the file from your local disk.
b. Configure it using the mappings menu.
Set your options as desired. Otherwise, default settings are used.
Submit your job.
Be patient. The execution time depends on the length of the input data.

Outputs

Pergola generates the following outputs:

BEHAVIORAL TRAJECTORY: n files, each containing one track of behavior, where n is the number of different items in the field set as "track" in the mapping file.
FASTA FILE: A FASTA file that can be loaded as a genome so that a genome browser can render the data.
CHROM SIZES: A txt file containing the length of your experiment assigned to a chromosome label. Some genomic tools need this file for downstream analyses.
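
As a rough illustration of what a chrom sizes entry expresses, the sketch below derives the experiment length from a set of (start, end) time pairs and pairs it with a chromosome label. The label "chr1", the two-column tab-separated layout (borrowed from the common chrom.sizes convention), and the use of last end minus first start as the length are all assumptions; the file Pergola actually writes may differ.

```python
def chrom_sizes_line(intervals, label="chr1"):
    """Build one "label<TAB>length" line from (start, end) pairs.

    The label and the length definition are assumptions of this sketch.
    """
    first = min(start for start, _ in intervals)
    last = max(end for _, end in intervals)
    return f"{label}\t{last - first}"


# First and last events of the sample file shown in this help page.
intervals = [(1335985200, 1335985232), (1335986845, 1335986947)]
print(chrom_sizes_line(intervals))
```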

Pergola can process data in different ways. The available options are listed below.

Data input options

Input file: A sequence of behavioral events in CSV format in the text area or uploaded from a file. The minimum information required is a column providing a time stamp and a second column containing a numerical value (see example below).

Time	Value
1	0.02
2	0.03
3	0.05
4	0.09

A more complete example is our sample file:

CAGE	StartT	Nature	EndT	Value
1	1335985200	food_sc	1335985232	0.02
1	1335986151	food_sc	1335986427	0.1
1	1335986420	water	1335986451	0.08
1	1335986541	water	1335986553	0.02
1	1335986832	water	1335986844	0.02
1	1335986845	food_sc	1335986947	0.02
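
To check that a file has the expected shape before uploading it, a sample like the one above can be parsed locally. This is a minimal sketch using Python's standard csv module, assuming a tab-separated file with a header row; the function name read_events and the record keys are hypothetical and not part of Pergola.

```python
import csv
import io

# First rows of the sample file, tab-separated.
SAMPLE = (
    "CAGE\tStartT\tNature\tEndT\tValue\n"
    "1\t1335985200\tfood_sc\t1335985232\t0.02\n"
    "1\t1335986151\tfood_sc\t1335986427\t0.1\n"
    "1\t1335986420\twater\t1335986451\t0.08\n"
)


def read_events(text, delimiter="\t"):
    """Parse delimited text with a header row into a list of typed records."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    events = []
    for row in reader:
        events.append({
            "track": row["CAGE"],
            "start": int(row["StartT"]),
            "end": int(row["EndT"]),
            "data_type": row["Nature"],
            "value": float(row["Value"]),
        })
    return events


events = read_events(SAMPLE)
# Number of events and duration (in time units) of the first one.
print(len(events), events[0]["end"] - events[0]["start"])
```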

Field separator: The character used to separate the columns of the file. In our sample file it is Tab. The other possible characters are "," and ";".
Header: Header row containing column headings for each column of data in the first row of the input file.
Mapping file: This file sets the information contained in each column of the input file. It is formatted following the External Mapping File Format specifications from the Gene Ontology Consortium (see below):

! Mapping of behavioural fields into pergola ontology terms

!
!
behavioral_file:CAGE	> pergola:track
behavioral_file:StartT	> pergola:start
behavioral_file:Nature	> pergola:data_types
behavioral_file:Value	> pergola:data_value

The left part sets the column to read from the input file, while the right part, after the ">" symbol, sets the Pergola ontology term to which that column corresponds. See here the available Pergola ontology terms.

If the input file does not have a header, you can use the column order instead of the column header. For instance, in our example above the first column corresponds to track (the other columns are set in the same manner):

behavioral_file:1	> pergola:track
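
The mapping lines above are simple enough to read with a few string operations. The sketch below is one way to turn them into a dictionary, assuming that lines starting with "!" are comments and that a purely numeric column name is a 1-based column position; parse_mapping is a hypothetical helper, not Pergola's own parser.

```python
def parse_mapping(text):
    """Return {input column: Pergola term} from mapping-file text.

    Lines starting with "!" are treated as comments; blank lines are skipped.
    """
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("!"):
            continue
        left, right = (part.strip() for part in line.split(">", 1))
        column = left.split(":", 1)[1]
        term = right.split(":", 1)[1]
        # A purely numeric column is taken as a 1-based position (no header).
        mapping[int(column) if column.isdigit() else column] = term
    return mapping


MAPPING_TEXT = """! Mapping of behavioural fields into pergola ontology terms
!
behavioral_file:CAGE\t> pergola:track
behavioral_file:StartT\t> pergola:start
behavioral_file:Nature\t> pergola:data_types
behavioral_file:Value\t> pergola:data_value
"""

print(parse_mapping(MAPPING_TEXT))
print(parse_mapping("behavioral_file:1\t> pergola:track"))
```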

Visualization/Processing options

NOTE: These options control both how the data is visualized and, if you download the data, how it is processed.

Track types: Track types can be set to BED or GFF format under the Discrete interval tracks menu. These options render the data as discrete intervals; for example, a meal is displayed as a bar spanning the duration of the event. The Continuous tracks option lets you choose between the BedGraph and bigWig file formats. Both formats are visualized as a continuous score along the behavioral trajectory, such as velocity or any statistical score derived from a longitudinal trajectory. More info here
Track line: BED and GFF formats will be generated with a track line. Some genome browsers use this line to determine the default display options.
Relative time points: Time points from the input data are transformed relative to the first time point in the file, which as a result becomes 0.
Mean by window length: All values within a time window are averaged by the window length (sum of values/window length).
Mean by window counts: All values within a time window are averaged by the number of samples in the window (sum of values/window counts).
Min time, Max time: Minimum and maximum time to process. Data can be trimmed so that only data within the interval set by these values is processed. It is also possible to set only one of the options (only minimum time or only maximum time).
Intervals generation: Generates intervals from an input file that contains only a time stamp for each recorded event, a situation that otherwise prevents the visualization of the data. When this option is checked, Pergola creates intervals that run from t_n to t_(n+1) - 1 and assigns the original value at t_n to the new interval.
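
The processing options above can be illustrated with a few small functions. This is only a sketch of the described behavior, not Pergola's code: the function names are hypothetical, trimming is assumed to include both bounds, and the last event in interval generation (which has no successor) is assumed to get a one-unit interval.

```python
def relative_times(timestamps):
    """Shift time points so that the first one becomes 0."""
    origin = timestamps[0]
    return [t - origin for t in timestamps]


def trim(events, min_time=None, max_time=None):
    """Keep (time, value) pairs within the bounds; either bound may be omitted."""
    return [
        (t, v) for t, v in events
        if (min_time is None or t >= min_time)
        and (max_time is None or t <= max_time)
    ]


def make_intervals(events):
    """Turn time-stamped events into (start, end, value) intervals.

    The event at t_n spans up to t_(n+1) - 1 and keeps the value at t_n.
    """
    intervals = []
    for (t, v), (t_next, _) in zip(events, events[1:]):
        intervals.append((t, t_next - 1, v))
    if events:
        # Assumption: the final event has no successor, so it spans one unit.
        last_t, last_v = events[-1]
        intervals.append((last_t, last_t, last_v))
    return intervals


def window_means(values, window_length):
    """Return (mean by window length, mean by window counts) for one window."""
    total = sum(values)
    return total / window_length, total / len(values)


events = [(10, 0.02), (15, 0.03), (30, 0.05)]
print(relative_times([t for t, _ in events]))
print(trim(events, min_time=12))
print(make_intervals(events))
print(window_means([1.0, 2.0, 3.0], window_length=4))
```

The last call shows how the two averaging options differ: the same three values sum to 6.0, which is divided by the window length (4) in one case and by the number of samples (3) in the other.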

Access via webservice

This website can also be accessed programmatically to retrieve formatted files. We provide some example code for convenience in this Gist.

It can be run by simply typing: python pergola-webservice.py myconfig.json myoutput.zip

An example JSON configuration is shown below. You can learn about the available parameters by checking the '?' icons on the main page.

{
	"params": {
		"separator": "{TAB}",
		"outformatannot": "bed",
		"outformatcont": "bedGraph",
		"trackline": true,
		"relativecoord": true
	},
	"files": {
		"infile": {
			"filename": "input.tsv",
			"content-type": "text/csv",
			"content-file": "input.csv"
		},
		"mapfile": {
			"filename": "mapfile.txt",
			"content-type": "text/plain",
			"content-file": "b2p.txt"
		}
	}
}
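
Before submitting a configuration, it can help to confirm that it is valid JSON and has the same overall shape as the example. The sketch below does that with Python's standard json module; the list of expected entries is inferred from the example above only, and missing_entries is a hypothetical helper, not part of the web service.

```python
import json

CONFIG_TEXT = """
{
    "params": {"separator": "{TAB}", "outformatannot": "bed",
               "outformatcont": "bedGraph", "trackline": true,
               "relativecoord": true},
    "files": {
        "infile": {"filename": "input.tsv", "content-type": "text/csv",
                   "content-file": "input.csv"},
        "mapfile": {"filename": "mapfile.txt", "content-type": "text/plain",
                    "content-file": "b2p.txt"}
    }
}
"""


def missing_entries(config):
    """List entries absent from config, judged against the example only."""
    missing = [s for s in ("params", "files") if s not in config]
    for name in ("infile", "mapfile"):
        entry = config.get("files", {}).get(name)
        if entry is None:
            missing.append(f"files.{name}")
            continue
        for key in ("filename", "content-type", "content-file"):
            if key not in entry:
                missing.append(f"files.{name}.{key}")
    return missing


config = json.loads(CONFIG_TEXT)  # raises json.JSONDecodeError on bad JSON
print(missing_entries(config))
```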