FlyBase Archive server: last updated November 2004

Drosophila melanogaster genome annotation release 3.2.0

date 16 March 2004

Annotated genes and genome data from the FlyBase Chado Annotation Database are available through searches, maps and reports at Genome Annotations & Sequences.

Notes


Drosophila melanogaster genome annotation release 3.2.0 date 03162004

DATA CONTENTS

  Feature counts in release 3.2 (r320, March 2004) 
  compared to release 3.1 (Dec 2003, r310d and Spring 2003, r310g)
  
  Feature                            r320      r310d     r310g
  -----------------------------------------------------------------
  BAC                                  949       949        --
  CDS                                18746     18109     18122
  DNA_motif                              5         0         0
  EST                               304257    302509        --
  aberration_junction                   87         0         0
  cDNA_clone                         10204     10197        --
  enhancer                              27         0         0
  gene                               13473     13369     13377
  insertion_site                       424         0         0
  mRNA                               18810     18153     18122
  mature_peptide                         8         0         0
  ncRNA                                 65        95        60
  oligonucleotide                   193813    193168        --
  point_mutation                       476         0         0
  polyA_site                           101         0         0
  processed_transcript               16748     14677        --
  protein                           233812    211135        --
  protein_binding_site                  85         0         0
  pseudogene                            39        17        19
  rRNA                                  85         0         0
  region                                28         0         0
  regulatory_region                    136         0         0
  repeat_region                       3390      3021        --
  rescue_fragment                      135         0         0
  segment                              437       437       437
  sequence_variant                     225         0         0
  signal_peptide                         1         0         0
  snRNA                                 28        28        28
  snoRNA                                28        28        28
  so                                 16244     14334         0
  tRNA                                 288       288       288
  transcription_start_site           16997     16832        --
  transposable_element                1567      1571      1508
  transposable_element_inserti..      4566      4346        --
  -----------------------------------------------------------------
          -- data unavailable for this feature
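The table can be read as per-feature changes between releases. A minimal Python sketch of that arithmetic, using a few rows copied from the table above (the `COUNTS` dictionary and `deltas` function are illustrative names, not part of any FlyBase software):

```python
# A few rows copied from the feature-count table: feature -> (r320, r310d).
COUNTS = {
    "gene": (13473, 13369),
    "mRNA": (18810, 18153),
    "CDS": (18746, 18109),
    "ncRNA": (65, 95),
    "pseudogene": (39, 17),
}


def deltas(counts):
    """Return {feature: (absolute change, percent change)} from r310d to r320."""
    out = {}
    for feature, (new, old) in counts.items():
        change = new - old
        # Feature types absent in r310d (count 0) have no finite percentage.
        pct = round(100.0 * change / old, 1) if old else float("inf")
        out[feature] = (change, pct)
    return out


if __name__ == "__main__":
    for feature, (change, pct) in sorted(deltas(COUNTS).items()):
        print(f"{feature:<12} {change:+6d} {pct:+7.1f}%")
```

Rows marked '--' have no r310d value at all and are simply left out of the dictionary.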


Data are taken from the Postgres Chado database, release 3.2.0, dated 03162004.
A copy of the database dump (Mar 16 2004) is at
ftp://flybase.net/genomes/Drosophila_melanogaster/dmel_r3.2.0_03162004/pgsql/chado_r3.2_19.gz
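Because the dump is gzip-compressed, it can be inspected without unpacking it to disk. A sketch in Python, assuming the file is a plain-text SQL dump (pg_dump's default output format) that has already been downloaded locally; `count_create_table` is a hypothetical helper:

```python
import gzip


def count_create_table(path):
    """Stream a gzip-compressed SQL dump and count its CREATE TABLE statements.

    Assumes a plain-text SQL dump; the file is read one line at a time,
    so the decompressed dump is never held in memory as a whole.
    """
    n = 0
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as fh:
        for line in fh:
            if line.startswith("CREATE TABLE"):
                n += 1
    return n
```

The same streaming approach works for any other quick check (grepping for a table name, counting COPY blocks) before committing to a full database restore.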


WEB FUNCTIONS

Data updates, with some software changes, for:

-- Gene annotation reports: updated and extended symbols, synonyms, IDs
   and annotation notes; other features added.

-- Genome maps (gbrowse): new feature types added.

-- Sequence reports: new features such as mature and signal peptides.

See http://flybase.net/annot/


SYMBOLS and IDS

Symbols and IDs for annotations in this release have been updated
to correspond more closely with gene data.

The transcript and translation/CDS symbols and IDs used by FlyBase
have changed somewhat over the last year.  An annotation has an ID
of the form CG00000 (with a corresponding FBan00000, which is being
de-emphasized).  Its mRNA and CDS have -Rx and -Px suffixes respectively,
where the letter 'x' (A, B, C, ...) extends to as many variants as are found.
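The suffix convention can be applied mechanically. A minimal Python sketch that splits a symbol into its base, product type and variant letter(s); `split_symbol` is a hypothetical helper, and it assumes variants are one or more uppercase letters:

```python
import re

# Base symbol, then "-R" (transcript) or "-P" (translation), then variant letter(s).
_SUFFIX = re.compile(r"^(?P<base>.+)-(?P<kind>[RP])(?P<variant>[A-Z]+)$")


def split_symbol(symbol):
    """Return (base, kind, variant) for a symbol such as 'CG8094-RA'.

    Symbols without an -Rx/-Px suffix are treated as plain gene symbols.
    """
    m = _SUFFIX.match(symbol)
    if m is None:
        return symbol, "gene", None
    kind = "transcript" if m.group("kind") == "R" else "translation"
    return m.group("base"), kind, m.group("variant")
```

For example, `split_symbol("CG8094-RA")` gives `("CG8094", "transcript", "A")`. A gene name that itself ended in something like "-PB" would be misread by this purely textual rule.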

In release 3.2, the standard symbols for gene annotation CG00000
have been replaced with the accepted gene name (where available); thus
CG8094, CG8094-RA and CG8094-PA become gene 'Hex-C', Hex-C-RA and Hex-C-PA.
The CG8094 ID is supported as a more computable alternative to this
symbolic name, but will be less visible than the more consistent and
memorable gene names.  Data files are not yet consistent about whether
to use 'Hex-C-PA' or 'CG8094-PA'.
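Since both forms appear in data files, it can help to normalize symbols one way or the other. A sketch in Python that swaps the gene part of a symbol while keeping any -Rx/-Px suffix; `swap_base` and the lookup tables are hypothetical, and real tables would have to be built from the release's gene data:

```python
import re

# Shortest base followed by an optional "-R<letters>" / "-P<letters>" suffix.
_SYMBOL = re.compile(r"^(?P<base>.+?)(?P<suffix>-[RP][A-Z]+)?$")


def swap_base(symbol, mapping):
    """Replace the gene part of `symbol` via `mapping`, keeping any product suffix.

    Symbols whose base is not in `mapping` are returned unchanged.
    """
    m = _SYMBOL.match(symbol)
    if m is None:  # empty string
        return symbol
    base, suffix = m.group("base"), m.group("suffix") or ""
    return mapping.get(base, base) + suffix


# Hypothetical lookup tables (one entry, taken from the example above).
NAME_TO_CG = {"Hex-C": "CG8094"}
CG_TO_NAME = {cg: name for name, cg in NAME_TO_CG.items()}
```

With these tables, `swap_base("Hex-C-PA", NAME_TO_CG)` yields the more computable "CG8094-PA", and `swap_base("CG8094-RA", CG_TO_NAME)` yields "Hex-C-RA".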



BULK FILE SET

See ftp://flybase.net/genomes/Drosophila_melanogaster/current/

blast/   - updated NCBI blast database set for transcripts, translations and transposons
dna/     - contains dna in fasta and/or raw format files per chromosome-arm; no change from release 3 data.

fasta/   - dna and protein data per chromosome and feature type
feats-all/ - intermediate files of all feature locations in tabular format
gff/     - GFF v2 standard feature files per chromosome
gnomap/  - Gnomap standard feature files per chromosome (drive genome map views)

pgsql/   - Postgres Chado database dump, source of most of these files
srs/     - SRS search indices

fbobs/     - Acode format annotation object data files for web services
xml-chado/ - Chado format XML database output of genes, dna and other features, per scaffold
xml-game/  - GAME format XML database output of genes, dna and other features, per scaffold
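The gff/ files use GFF version 2, in which each feature line has tab-separated fields: seqname, source, feature, start, end, score, strand, frame, and an optional group (attribute) field; lines starting with '#' are comments. A minimal Python sketch that tallies the feature column, which might be compared against the feature counts above (the helper name is hypothetical, and counts from GFF need not match the database counts exactly):

```python
from collections import Counter


def count_gff_features(lines):
    """Count feature types (column 3) in GFF v2 text, skipping comments."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 8:
            continue  # skip malformed lines rather than fail
        counts[fields[2]] += 1
    return counts
```

Any iterable of lines works, so a compressed file can be passed directly as `gzip.open(path, "rt")`.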


Bulk files compared to those of release 3.1:

  whole_genome_*    -- created by catenating each chr file set
  heterochromatin_* and (2h,3h,Xh,Yh,U) -- 'heterosomes' to be added
  euchromatin_*     -- created by catenating each chr file set, excluding 'heterosomes'  
  per chromosome set
    2L_3_UTR, 2L_5_UTR         == dmel_2L_three_prime_UTR, dmel_2L_five_prime_UTR
    2L_CDS                     == dmel_2L_CDS
    2L_annotation              == catenate dmel_2L_gene with (tRNA,miscRNA,transposon) set 
    2L_annotation_extend5000   == dmel_2L_gene_extended5000, minus (tRNA,miscRNA,transposon) set
    2L_annotation_extend2000   .. not planned
    2L_annotation_extend500    .. not planned
    2L_exon                    .. not planned
    2L_genomic                 == dmel_2L_chromosome (chromosome arm dna, same as rel3.1)
    2L_genomic_scaffolds       == dmel_2L_scaffolds  (segment dna, same as rel3.1)
    2L_intron                  .. not planned
    2L_masked_genomic          .. not planned
    2L_noncoding-gene          == catenate (tRNA,miscRNA,transposon,pseudogene) 
    2L_protein-coding-gene     == dmel_2L_gene
    2L_splice_site             .. not planned
    2L_tRNA                    == dmel_2L_tRNA
    2L_transcript              == dmel_2L_transcript   
    2L_translation             == dmel_2L_translation (curated translations)
    2L_transposable_element    == dmel_2L_transposon 
    2L_unique_intergenic       .. not planned  
    2L_unique_intron           .. not planned
  
  New since the previous release: dmel_2L_miscRNA  dmel_2L_pseudogene
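The whole_genome_* and euchromatin_* sets are described as catenations of the per-chromosome files, with euchromatin excluding the heterochromatic 'heterosomes' once those are added. A minimal Python sketch of that step, assuming per-arm files are available locally; `concatenate` and the file paths are hypothetical:

```python
import os
import shutil

HETEROSOMES = frozenset({"2h", "3h", "Xh", "Yh", "U"})


def concatenate(arms_and_paths, out_path, exclude=frozenset()):
    """Catenate per-arm files, in the order given, into out_path.

    arms_and_paths: sequence of (arm, path) pairs, e.g. ("2L", "dmel_2L.fasta").
    Arms listed in `exclude` are skipped.
    """
    with open(out_path, "wb") as out:
        for arm, path in arms_and_paths:
            if arm in exclude:
                continue
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)
                # Guard against a missing final newline gluing the next
                # file's first FASTA header onto this file's last line.
                if os.path.getsize(path) > 0:
                    src.seek(-1, os.SEEK_END)
                    if src.read(1) != b"\n":
                        out.write(b"\n")
```

Calling it with `exclude=HETEROSOMES` gives the euchromatin variant; with no exclusions it gives the whole-genome variant.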

  File name format:
    $org_$chr_$feature_$release.$format
  
  $org in (dmel)
  $chr in (2L 2R 3L 3R X 4), (2h 3h Xh Yh U)
  $feature in (
    gene, mRNA, CDS, CDS-translation,
    transposon/transposable_element, pseudogene,
    tRNA, miscRNA (= ncRNA, snRNA, snoRNA, rRNA),
    gene-extended5000,
    chromosome-arm,
    scaffold
  )
  
  $release in (
   r3.1.0g (gadfly, summer 2003)
   r3.1.0d (chado r3.1.0_12182003)
   r3.2.0a (chado r3.2.0_12052003)
   r3.2.0c (chado r3.2.0_03162004)
  )
  
  $format in (
    .fasta(.gz)
    .gff(.gz)
    .chado.xml(.gz)
    .game.xml(.gz)
  )
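Because some $feature values contain underscores (transposable_element), a file name cannot be parsed by splitting on every '_'. A sketch in Python that builds and parses names under the pattern above, assuming $org and $chr never contain '_' and $release is always the last '_'-separated part; `build_name` and `parse_name` are hypothetical helpers:

```python
# Known $format values (without the optional .gz), per the list above.
FORMATS = (".chado.xml", ".game.xml", ".fasta", ".gff")


def build_name(org, chr_, feature, release, fmt, gz=False):
    """Build $org_$chr_$feature_$release.$format, optionally with .gz."""
    name = f"{org}_{chr_}_{feature}_{release}.{fmt}"
    return name + ".gz" if gz else name


def parse_name(name):
    """Split a bulk file name back into its parts."""
    gz = name.endswith(".gz")
    if gz:
        name = name[: -len(".gz")]
    for fmt in FORMATS:
        if name.endswith(fmt):
            stem = name[: -len(fmt)]
            break
    else:
        raise ValueError(f"unknown format: {name!r}")
    org, chr_, rest = stem.split("_", 2)    # org and chr are the first two parts
    feature, release = rest.rsplit("_", 1)  # release is the last part
    return {"org": org, "chr": chr_, "feature": feature,
            "release": release, "format": fmt[1:], "gz": gz}
```

For example, `parse_name("dmel_2L_transposable_element_r3.2.0c.fasta.gz")` keeps "transposable_element" intact as the feature.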



ANNOTATION RELEASE 3.1 HOLD-OVERS

ftp://flybase.net/genomes/Drosophila_melanogaster/dmel_RELEASE3-1/
Annotations_and_Evidence/  GFF/                       blastdb/
FASTA/                     README

Annotations_and_Evidence/
------
>>> euchromatic scaffolds, updated in r3.2 release
AE002603.xml.gz
..
AE003847.xml.gz

>>> heterochromatin and centromere scaffolds - no r3.2 equivalent yet
AABU01000058.xml.gz
..
AABU01002775.xml.gz
2L_wgs3_centromere_extension.xml.gz
2R_wgs3_centromere_extension.xml.gz
3L_wgs3_centromere_extension.xml.gz
3R_wgs3_centromere_extension.xml.gz
X_wgs3_centromere_extensionB.xml.gz
linked_1.xml.gz
linked_2.xml.gz
linked_3.xml.gz
linked_4.xml.gz
linked_5.xml.gz
linked_6.xml.gz
linked_7.xml.gz
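When working through this directory, the listing suggests a simple split between files that have r3.2 replacements and files that do not. A Python sketch of that rule; the classification is inferred only from the listing above (AE* scaffolds euchromatic, AABU*, *_centromere_extension* and linked_* heterochromatic or centromeric), and `scaffold_group` is a hypothetical helper:

```python
def scaffold_group(filename):
    """Classify an Annotations_and_Evidence file name from release 3.1.

    Rule inferred from the directory listing only:
      'euchromatic'     -> updated in the r3.2 release
      'heterochromatic' -> no r3.2 equivalent yet
      'unknown'         -> anything else (e.g. README)
    """
    stem = filename[: -len(".xml.gz")] if filename.endswith(".xml.gz") else filename
    if stem.startswith("AE"):
        return "euchromatic"
    if (stem.startswith("AABU") or "centromere_extension" in stem
            or stem.startswith("linked_")):
        return "heterochromatic"
    return "unknown"
```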

FASTA 
-------------
Heterochromatin sections are not yet available for r3.2
  2h, 3h, Xh, Yh, U (heterochromatin, unclassified)

Block DNA (FASTA) sections are identical in r3.1 and r3.2:
   scaffolds, genomic, masked_genomic  
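If local copies from both releases are at hand, the claim that the DNA is unchanged can be checked by hashing only the sequence characters, so that differences in FASTA header text or line wrapping are ignored. A Python sketch; `sequence_digest` is a hypothetical helper:

```python
import hashlib


def sequence_digest(path):
    """SHA-256 over all sequence characters in a FASTA file.

    Header text and line breaks are ignored; each header still contributes a
    '>' marker so that record boundaries remain significant.
    """
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for line in fh:
            if line.startswith(b">"):
                h.update(b">")
                continue
            h.update(line.strip())
    return h.hexdigest()
```

Two files hold the same sequence if their digests are equal. Case is kept as-is, so lowercase (soft-masked) bases would count as different.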



CHADO DATABASE LOOKUP SERVICE

SERVICE URL 
  http://flybase.net/apollo-cgi/chado2apollo.cgi

Information and software at
  http://bugbane.bio.indiana.edu:7092/apollo/  

EXAMPLES
  http://flybase.net/apollo-cgi/chado2apollo.cgi?scaffold=AE003650
  http://flybase.net/apollo-cgi/chado2apollo.cgi?gene=cact
  http://flybase.net/apollo-cgi/chado2apollo.cgi?range=2L:300000-310000
  http://flybase.net/apollo-cgi/chado2apollo.cgi?band=34A

This provides support for Apollo genome browser/editor,
returning GAME XML gene and genome objects in response
to basic queries of 'gene' name/ID, 'scaffold' section,
genome base 'range' or cytological 'band'.

It currently works well for scaffold-sized chunks of data, which are
served from pre-generated XML.  It is very slow (5 to 10 minutes),
however, at generating XML for gene and region queries, so the default
operation now returns pre-generated scaffolds for any query.  We will
work to improve this.
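Queries to this service are plain HTTP GET requests with one of the four parameters shown in the examples. A minimal Python sketch that builds such URLs and fetches the result with a generous timeout, given the multi-minute generation times noted above; `lookup_url`, `fetch_game_xml` and the 900-second timeout are assumptions, and the 2004 service may no longer respond at all:

```python
from urllib.parse import urlencode
from urllib.request import urlopen

SERVICE = "http://flybase.net/apollo-cgi/chado2apollo.cgi"
QUERY_KEYS = {"gene", "scaffold", "range", "band"}


def lookup_url(key, value):
    """Build a lookup URL such as ...chado2apollo.cgi?range=2L:300000-310000."""
    if key not in QUERY_KEYS:
        raise ValueError(f"unsupported query key: {key!r}")
    # Leave ':' unescaped so range queries look like the documented examples.
    return SERVICE + "?" + urlencode({key: value}, safe=":")


def fetch_game_xml(url, timeout=900):
    """Return the raw GAME XML bytes; generated (non-scaffold) queries can be slow."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read()
```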



Send comments to us at flybase-help AT morgan.harvard.edu
FlyBase-NG uses Argos: A Replicable Genome infOrmation System