JaponicusDB welcomes submissions of published large-scale phenotype data sets. We have devised a tab-delimited text file format for bulk phenotype data. The downloadable file of single-allele phenotype data uses a similar format, with one additional column at the start of each line that identifies JaponicusDB as the source. Because Database is column 1 in the downloadable file, column numbers differ by one between the download and upload formats.
Include a header line that labels the columns – use the entry in the Contents column below as the column header text.
| Column | Contents | Example | Mandatory? | Multiple entries allowed? |
|---|---|---|---|---|
| 1 | Gene systematic ID | SPBC11B10.09 | Yes | No |
| 2 | FYPO ID | FYPO:0000001 | Yes | No |
| 3 | Allele description | G146D | Yes | No |
| 4 | Expression | overexpression | Yes | No |
| 5 | Parental strain | 975 h+ | Yes | No |
| 6 | Background strain name | SP286 | No | No |
| 7 | Background genotype description | h+ ura4-D18 leu1-32 ade6-M210 | No | No |
| 8 | Gene name | cdc2 | No | No |
| 9 | Allele name | cdc2-1w | No | No |
| 10 | Allele synonym | wee2-1 | No | Yes |
| 11 | Allele type | amino acid mutation | Yes | No |
| 12 | Evidence | ECO:0000336 | Yes | No |
| 13 | Condition | at high temperature | Yes | Yes |
| 14 | Penetrance | 85% | No | No |
| 15 | Severity | medium | No | No |
| 16 | Extension | assayed_using(PomBase:SPBC582.03) | No | Yes |
| 17 | Reference | PMID:23697806 | Yes | No |
| 18 | taxon | taxon:4897 | Yes | No |
| 19 | Date | 2012-01-01 | Yes | No |
| 20 | Ploidy | homozygous diploid | No | No |
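As an illustration of the layout described above, the following Python sketch writes a header line and one tab-delimited row. It is only a sketch under stated assumptions: the function name `write_phaf` is hypothetical, and the example row simply reuses the values from the Example column, so it does not describe a real experiment.

```python
import csv
import io

# Column headers copied from the Contents column of the table above.
PHAF_COLUMNS = [
    "Gene systematic ID", "FYPO ID", "Allele description", "Expression",
    "Parental strain", "Background strain name",
    "Background genotype description", "Gene name", "Allele name",
    "Allele synonym", "Allele type", "Evidence", "Condition", "Penetrance",
    "Severity", "Extension", "Reference", "taxon", "Date", "Ploidy",
]


def write_phaf(rows, handle):
    """Write a header line plus one tab-delimited line per annotation.

    `rows` is a list of dicts keyed by column header. Columns missing from
    a dict are written as empty fields, so every line has all 20 columns.
    """
    writer = csv.DictWriter(
        handle,
        fieldnames=PHAF_COLUMNS,
        delimiter="\t",
        restval="",
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)


# Illustrative row built from the Example column; optional columns that are
# left out here (for example Penetrance) end up blank in the output.
example_row = {
    "Gene systematic ID": "SPBC11B10.09",
    "FYPO ID": "FYPO:0000001",
    "Allele description": "G146D",
    "Expression": "overexpression",
    "Parental strain": "975 h+",
    "Gene name": "cdc2",
    "Allele name": "cdc2-1w",
    "Allele synonym": "wee2-1",
    "Allele type": "amino acid mutation",
    "Evidence": "ECO:0000336",
    "Condition": "at high temperature",
    "Reference": "PMID:23697806",
    "taxon": "taxon:4897",
    "Date": "2012-01-01",
}

buffer = io.StringIO()
write_phaf([example_row], buffer)
phaf_text = buffer.getvalue()
```

To produce a file on disk instead of an in-memory string, pass a handle opened with `open(path, "w", newline="")` in place of the `StringIO` buffer.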
Notes:
Please include all 20 columns. If you have nothing to put in one of the non-mandatory columns, include the header and leave the column blank in the remaining rows.
- Include the systematic ID for each gene. You can look up systematic IDs on gene pages, or refer to the file of all gene names from the dataset download page.
- For help finding suitable ontology (FYPO) terms to describe your phenotypes, see the FYPO summary page and the FAQ on browsing FYPO. If you can’t find a term you need, email the helpdesk for assistance; we can add new FYPO terms as needed.
- The allele description specifies the change in the allele; see the table of allele types and descriptions below.
- In the Expression column, use one of these values: ‘overexpression’, ‘knockdown’, ‘endogenous’, ‘null’, ‘not specified’. Deletions should always have ‘null’ expression.
- The Parental strain column is for the parental strain designation, such as 972 h-, 975 h+, etc. This column must be filled in, but you can use “unknown” if you don’t know the original background.
- Use the Background strain name column for a lab’s in-house name/ID/designation for the background strain (i.e. the derivative of the parental strain that has selectable marker alleles etc.). The description in the Background genotype description column should match this background strain.
- The Background genotype description column is for alleles in the background, such as selectable markers; these details are optional. To avoid redundancy, do not repeat the allele of interest (from column 3 or 9) in this column.
- Gene names are optional. If you include them, use standard names in column 8 (see gene pages or the file of All Gene Names from the dataset download page).
- Allele names are optional. If you include them, use column 9 for the preferred allele name, and put any alternative names in column 10.
- Allele synonyms are optional; see the preceding note on allele names (column 9). Separate multiple entries with pipes (|).
- Allowed allele types, example descriptions, etc. are shown in the table below.
- For the Evidence column, we use a small selection from the Evidence Ontology (ECO). You are welcome to enquire with us in advance to find out which ECO terms/IDs fit your experiments, but we can accept files with brief descriptions (such as those in the Canto phenotype evidence option list), which curators will convert to ECO IDs.
- Similarly, Conditions use a small ontology maintained in-house by PomBase curators, and we can either advise you about which terms/IDs to use, or convert from text to IDs when we receive your file. Use multiple entries in cases where more than one condition detail applies at the same time (e.g. high temperature, minimal medium). Separate multiple entries with commas (,). Use separate lines if a phenotype is observed under more than one set of conditions (e.g. high and low temperature).
- Penetrance describes the proportion of a population that shows a cell-level phenotype. Use decimals, percents, or “high” (above 80%), “medium” (20-80%), or “low” (less than 20%). We will convert to suitable IDs for loading. Penetrance data will be displayed as annotation extensions on gene pages.
- Severity (formerly designated “expressivity”) uses “high” (synonym: strong), “medium”, or “low” (synonym: weak). We will convert to suitable IDs for loading. Severity data will be displayed as annotation extensions on gene pages.
- The Extension column can be used to record when a mutation in one gene affects another gene or its product. For example, if a mutation in gene A decreases its ability to phosphorylate protein B, you can use the phenotype “decreased protein kinase activity” and put the ID for gene B in an extension. Multiple extensions can be included for a phenotype annotation. Separate extensions with a comma (,) if they combine to form a “compound” extension (two or more genes assayed together), or with a pipe (|) if they are independent. Most phenotype extensions will be independent and pipe-separated.
- The Reference column has the publication’s PubMed ID (PMID).
- The taxon will usually be 4897 (the NCBI taxon ID for Schizosaccharomyces japonicus), although if you have an NCBI taxon ID for a specific S. japonicus strain you are welcome to use it.
- The date is the date on which the annotations are created; you may use the paper publication date or the date on which you prepare your data file. Format: YYYY-MM-DD
- We can currently capture only haploid and homozygous diploid datasets via PHAF files. Allowed values for this column are “haploid” and “homozygous diploid”. If the column is empty, the dataset is assumed to be haploid. If you have a phenotype dataset for non-homozygous diploids, please contact the helpdesk.
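Several of the notes above can be checked before submission. The Python sketch below is an informal pre-submission check, not the rules the JaponicusDB loader actually applies: the function names (`check_row`, `split_entries`, `parse_extensions`) are hypothetical, the mandatory columns are read from the table above, and the extension parser assumes that individual extensions do not themselves contain commas or pipes.

```python
import re

# Columns marked as mandatory in the table above.
MANDATORY_COLUMNS = [
    "Gene systematic ID", "FYPO ID", "Allele description", "Expression",
    "Parental strain", "Allele type", "Evidence", "Condition",
    "Reference", "taxon", "Date",
]
ALLOWED_EXPRESSION = {
    "overexpression", "knockdown", "endogenous", "null", "not specified",
}
ALLOWED_PLOIDY = {"", "haploid", "homozygous diploid"}  # empty means haploid
DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")  # YYYY-MM-DD


def check_row(row):
    """Return a list of problems found in one annotation row (a dict)."""
    problems = []
    for column in MANDATORY_COLUMNS:
        if not row.get(column, "").strip():
            problems.append(f"{column}: mandatory column is empty")
    expression = row.get("Expression", "")
    if expression and expression not in ALLOWED_EXPRESSION:
        problems.append(f"Expression: unexpected value {expression!r}")
    if row.get("Allele type") == "deletion" and expression != "null":
        problems.append("Expression: deletions should have 'null' expression")
    if row.get("Ploidy", "") not in ALLOWED_PLOIDY:
        problems.append("Ploidy: use 'haploid' or 'homozygous diploid'")
    date = row.get("Date", "")
    if date and not DATE_PATTERN.fullmatch(date):
        problems.append("Date: expected YYYY-MM-DD")
    return problems


def split_entries(value, separator):
    """Split a multi-entry field, dropping empty pieces and extra spaces."""
    return [part.strip() for part in value.split(separator) if part.strip()]


def parse_extensions(value):
    """Pipes separate independent extensions; commas join a compound one."""
    return [split_entries(group, ",") for group in split_entries(value, "|")]


# Illustrative row reusing the example values from the table above.
example_row = {
    "Gene systematic ID": "SPBC11B10.09", "FYPO ID": "FYPO:0000001",
    "Allele description": "G146D", "Expression": "overexpression",
    "Parental strain": "975 h+", "Allele type": "amino acid mutation",
    "Evidence": "ECO:0000336", "Condition": "at high temperature",
    "Reference": "PMID:23697806", "taxon": "taxon:4897",
    "Date": "2012-01-01",
}
problems = check_row(example_row)
conditions = split_entries("high temperature, minimal medium", ",")
```

The same `split_entries` helper covers allele synonyms when called with `"|"` as the separator. Free-text Evidence and Condition values pass this check unchanged, since curators can convert them to IDs on receipt.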
Details for allele types and descriptions:
General note: Nucleotide and amino acid positions should reflect the current sequence data in JaponicusDB.
For protein-coding genes, number nucleotide residues from 1 starting with the A of the initiator ATG.
For histones, amino acid residue numbering assumes that the initiator methionine is removed.
| Allele type | Example description | Notes |
|---|---|---|
| amino acid mutation | S123A | use one-letter code; if more than one change, separate with comma(s) |
| deletion | deletion | use this description for complete deletions |
| nucleotide mutation | C123A | if more than one change, separate with comma(s) |
| disruption | pab1::ura4+ | expression will usually, but not always, be null |
| other | RGTPI inserted after I254 | include a brief text description |
| partial amino acid deletion | 1-100 or A123* | indicate deleted residues; use comma-separated ranges for discontinuous deleted segments; use * for nonsense mutations |
| partial nucleotide deletion | 500-800 | indicate deleted residues; use comma-separated ranges for discontinuous deleted segments |
| unknown | unknown | an allele name is required if the type and description are unknown |
| wild type | wild type | use with altered expression (overexpression or knockdown) for single-allele phenotypes |
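The description conventions above lend themselves to a rough consistency check between the Allele type and Allele description columns. The Python sketch below is one possible approach and not an official validator: the patterns are inferred only from the example descriptions shown above, the name `description_matches_type` is hypothetical, nucleotide changes are assumed to use A, C, G and T, and free-text types (disruption, other) are not checked at all.

```python
import re

AA = "ACDEFGHIKLMNPQRSTVWY"  # one-letter amino acid codes

# Patterns inferred from the example descriptions; these are assumptions.
PATTERNS = {
    # e.g. S123A, or S123A,T45E for more than one change
    "amino acid mutation": re.compile(
        rf"[{AA}]\d+[{AA}](,\s*[{AA}]\d+[{AA}])*"
    ),
    # e.g. C123A
    "nucleotide mutation": re.compile(
        r"[ACGT]\d+[ACGT](,\s*[ACGT]\d+[ACGT])*"
    ),
    # e.g. 1-100, 1-100,150-200, or A123* for a nonsense mutation
    "partial amino acid deletion": re.compile(
        rf"\d+-\d+(,\s*\d+-\d+)*|[{AA}]\d+\*"
    ),
    # e.g. 500-800
    "partial nucleotide deletion": re.compile(r"\d+-\d+(,\s*\d+-\d+)*"),
    "deletion": re.compile(r"deletion"),
    "unknown": re.compile(r"unknown"),
    "wild type": re.compile(r"wild type"),
}


def description_matches_type(allele_type, description):
    """Return True if the description fits the pattern for its allele type.

    Types without a pattern (disruption, other) are free text and always
    pass; the whole description must match, not just a prefix.
    """
    pattern = PATTERNS.get(allele_type)
    if pattern is None:
        return True
    return pattern.fullmatch(description.strip()) is not None


checks = [
    description_matches_type("amino acid mutation", "S123A"),
    description_matches_type("partial amino acid deletion", "A123*"),
    description_matches_type("partial nucleotide deletion", "500-800"),
]
```

A check like this does not confirm that positions agree with the current sequence data in JaponicusDB; it only catches formatting slips such as a missing residue letter or a range written without a hyphen.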
Please contact the JaponicusDB curators if you have any questions about what to use for Evidence, Conditions, etc., or anything else you need to represent your data in this format.