FlyBase Archive server: last updated November 2004

FlyBase Next Generation

As part of the integration of FlyBase services, the Gadfly annotation database and Berkeley Fly BLAST are being retired. Equivalent services are now available from the FlyBase server at Indiana University. This is now the primary public server at http://flybase.net/

We welcome your input. Please e-mail flybase-help AT morgan.harvard.edu with suggestions or questions.

FlyBase-NG includes more changes 'under the hood' than you see on its public web pages. These are part of an evolution toward a next generation of genome databases and information systems, and will be ongoing for some time. As FlyBase moves into its second decade, we want to ensure that the best new genome database methods from the collective wisdom of bioinformatics are used, without sacrificing the best existing parts, which have taken a decade of work to match to the needs of the Drosophila, genomics and biosciences research communities.


New database under construction

FlyBase got its name in part from SyBase, the commercial relational database software used since the early days of the project. We are moving to a database that is more completely publicly copyable and usable, that is shared with other genome database projects, and that can be used without commercial software.

This new database is named Chado (after "the Way of Tea", a Japanese tea ceremony, a pleasant name to be home to all those colorful Drosophila gene names). It includes a new design, or schema, for structuring Drosophila or any other genome information, which is still being worked out. It works with new database software, the freely usable PostgreSQL system (and, we hope, with other database software as well). It includes a new data exchange format (Chado XML). And significantly, it involves a much larger group of bioinformaticians sharing the effort of developing and using these parts: the Chado database parts are all available from the Generic Model Organism Database group at its web site, http://www.GMOD.org/ .

Over the coming year more Drosophila and other genome data will be moving to a home in Chado, and more options for database searching and data mining will become available.

For now, and for a while to come, FlyBase services straddle older and newer methods. The best of these methods will come into operation in steps, as we test and work on them.


Genome annotation updates

The initial work with the Chado database has been a migration of genome annotation data from the Gadfly database (MySQL based, with the GAME XML data exchange format). At the close of December 2003, we provided the first public use of Chado data in an update of these annotations. This data set has the same genome sequence and annotation locations as the last Gadfly release 3.1, but it contains updated IDs and gene names. It now forms the basis of annotation data searching and reporting at FlyBase.

Some statistics comparing this Chado release 3.1 (called r3.1.0_12182003) with the prior Gadfly database release 3.1, as well as with the incipient update release 3.2, are listed at http://flybase.net/annot/prerelease/

Overview of genome annotation web updates

Contents Current Retired
Main page /annot/ http://www.fruitfly.org/annot/
  generally equivalent options; NG main page is clean and fully functional.
Basic Query Form NG Query GadFly Query
  equivalent with some additions, deletions; NG is fully functional. The NG Query form moves some output format options to the Result lists that follow. Search results should be similar, but some differences reflect newer IDs, symbols, and related annotation data at NG. NG Batch query by symbol is still IN PROGRESS (see batch query by ID at /annot/).
Result lists NG Results: Arm X GadFly Results: Arm X
  roughly equivalent with some additions, deletions; NG is in good shape here, but still needs: Refine query; redirect/link to FBgn and other data classes. Gene Ontology terms have been dropped by design. NG includes batch download for results as both reports and sequences. (GenBank Scaffold and CG ID columns are now added.)
Basic Report NG report: Fas2 GadFly report: Fas2
  equivalent with some additions, deletions; NG is in good shape, now including correct mRNA lengths (intronless). Sequence feature coloring and hyperlinks in sequence reports (fasta, genbank, embl formats) are now available. For NG protein analyses (InterPro), the workaround is a hyperlink to the current Gadfly report; data reports are to be added in 2004.
Genome maps NG GBrowse GadFly GBrowse
  equivalent with some additions, deletions; NG is good now. The major additions in NG are cytologically mapped features (aberrations) and other GMOD GBrowse options. The loss in NG is map search by symbol (to be added; use the main FlyBase gene/annot search for now).
Fly BLAST NG BLAST BDGP BLAST
  equivalent data sets. Current Fly SwissProt has been added. Identification of "your BLAST hit" from BLAST reports to genome map views is now available (using whole chromosome arm database). Major change is to NCBI BLAST (NG) from WU-BLAST (BDGP); this will change some results. Replication of NG includes full replication of BLAST databases and functions. WU-BLAST will be offered later on at NG servers. Pattern Search service to be added later. NG may want some improvements with BLAST result hyperlinks.
Bulk Data ftp://flybase.net/genomes/Drosophila_melanogaster/current/ ftp://ftp.fruitfly.org/pub/download/current_release/
  Release 3.2 data is due now in February 2004. There is an interim update of Release 3.1.0 data at ftp://flybase.net/genomes/Drosophila_melanogaster/dmel_r3.1.0_12182003/ This includes Chado format XML annotation and evidence per scaffold, plus fasta sequence and gff feature format files. The DNA sequence and feature locations are identical to those of the earlier Release 3.1.0, but IDs and gene symbols have been updated.
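
For readers who want to work with the gff feature files from the bulk data area, here is a minimal Python sketch of reading tab-separated GFF feature lines. It is an illustration only, not FlyBase code: the Feature type, the read_gff function and the sample lines are all hypothetical, and the last column is left unparsed because its syntax differs between GFF versions.

```python
from typing import Iterable, Iterator, NamedTuple


class Feature(NamedTuple):
    """One GFF feature line (hypothetical helper type for this sketch)."""
    seqid: str
    source: str
    type: str
    start: int       # 1-based, inclusive
    end: int         # 1-based, inclusive
    strand: str
    attributes: str  # kept as raw text; syntax differs between GFF versions


def read_gff(lines: Iterable[str]) -> Iterator[Feature]:
    """Yield one Feature per data line, skipping comments and blank lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) < 8:
            continue  # too few columns to be a feature line
        attributes = cols[8] if len(cols) > 8 else ""
        yield Feature(cols[0], cols[1], cols[2],
                      int(cols[3]), int(cols[4]), cols[6], attributes)


# Made-up example lines, NOT taken from a FlyBase release.
SAMPLE = [
    "##gff-version 3",
    "X\texample\tgene\t1000\t2500\t.\t+\t.\tID=gene0001",
    "X\texample\tmRNA\t1000\t2500\t.\t+\t.\tID=mrna0001;Parent=gene0001",
]

features = list(read_gff(SAMPLE))
# Feature length in bases, keyed by the raw attribute text of each gene.
gene_lengths = {f.attributes: f.end - f.start + 1
                for f in features if f.type == "gene"}
```

To read a downloaded file instead of the sample list, pass an open file handle to read_gff; check the header of the actual file first, since the column conventions of a given release may differ from this sketch.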

FlyBase-NG Replication

FlyBase-NG can be copied and run on your local computer or at your informatics center, and is designed for automatic updates to keep it current. It works on popular Unix systems including MacOSX, Linux and Solaris.

We welcome help from bioinformatics centers, including those in industry and government, that wish to provide a copy of FlyBase for local and regional users.

The new genome database/web server infrastructure is called Argos, and is fully open-source, copyable and reusable. See the Argos Server documents and installation information (http://flybase.net:8081/) and the GMOD project (http://www.gmod.org/argos/).

Argos can be used for other organism genome databases. FlyBase-specific parts are separate from common genome web database parts, which include BLAST, GBrowse, Web server, database and informatics middleware. The euGenes multi-eukaryote genome database, a new Daphnia genome database and others are available using Argos infrastructure.

This Argos underpinning for the new FlyBase server provides a general method for making it robust to high volume usage: the server is automatically clonable, and compute intensive calls can be passed on to any of these clones, transparent to users who see only the main server URL. This is now done with data reports and BLAST calls. You will see at the bottom of many FlyBase web pages: "Run on computer xxxx". Some of our genome web/database software needs to be re-engineered to be used this way, an ongoing task. In time most of FlyBase's web database tasks will be distributed among several computers.
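
The idea of passing compute-intensive calls to clone servers can be sketched in a few lines of Python. This is not the Argos implementation: the CloneDispatcher class, the clone names and the round-robin selection rule are all assumptions made for illustration, since this page does not say how a clone is chosen.

```python
from itertools import cycle
from typing import Callable, Dict, List


class CloneDispatcher:
    """Hand compute-intensive requests to clone servers in turn.

    Round-robin selection is an assumption of this sketch, not a
    documented Argos behavior.
    """

    def __init__(self, clones: List[str]) -> None:
        if not clones:
            raise ValueError("at least one clone server is required")
        self._next_clone = cycle(clones)

    def dispatch(self, request: str,
                 run: Callable[[str, str], str]) -> Dict[str, str]:
        """Run the request on the next clone and label the page footer."""
        clone = next(self._next_clone)
        body = run(clone, request)
        return {"body": body, "footer": f"Run on computer {clone}"}


def fake_run(clone: str, request: str) -> str:
    """Stand-in for a real BLAST or report call on a clone (hypothetical)."""
    return f"result of {request}"


dispatcher = CloneDispatcher(["clone-a", "clone-b"])
pages = [dispatcher.dispatch(r, fake_run)
         for r in ("blast-1", "report-Fas2", "blast-2")]
footers = [p["footer"] for p in pages]
```

The caller only ever talks to the dispatcher, which mirrors the point made above: users see a single main server, and the footer is the only visible trace of which machine did the work.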

One obvious benefit of this is that many of the computed web pages appear much faster, and as usage increases these will be kept running fast by adding more clone servers. E.g., you now get gene reports about 5 times faster than before (1.2 seconds now versus 6.5 seconds for flybase-old).


Reviews and Previews

Some folks want old data: going back to check on your old work is common practice in science and industry. A new service at FlyBase is the maintenance of archives. We will periodically create frozen copies of the FlyBase database/server, and continue to provide these for public use. You can find archived FlyBase servers with older data at http://flybase.net:8081/flybase-archive/

Some folks want the newest data, even if it hasn't yet passed all quality checks. We have extended our long-running method for adding daily updates to include previews of major new releases of data that is still in the works. Find preview data servers at http://flybase.net:8081/flybase-preview/


Growing pains at FlyBase.net

FlyBase usage today (Jan 2004):
Hits per Day  52,000 (ave)  98,000 (max)
Usage groups: 23% commercial, 41% unresolved, 14% edu
FlyBase usage a year ago (Jan 2003):
Hits per Day  33,000 (ave)  58,000 (max)
Usage groups: 12% commercial, 16% unresolved, 30% edu

Statistics message: daily hits have grown by roughly 1.6x to 1.7x in a year, and the commercial share has about doubled, with many more commercial users, robots, data miners and other high-volume users. FlyBase is not as busy as the NASA Mars Rover landing day by a large factor yet, but as it grows we are ready to use similar methods of distributing usage among as many servers as needed.
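
As a quick check on the figures listed above, the year-over-year ratios can be worked out directly. The numbers in this Python snippet are the ones quoted on this page; the variable and function names are only for illustration.

```python
# Usage figures as quoted above (hits per day, commercial share in percent).
usage = {
    "Jan 2003": {"avg_hits": 33_000, "max_hits": 58_000, "commercial_pct": 12},
    "Jan 2004": {"avg_hits": 52_000, "max_hits": 98_000, "commercial_pct": 23},
}


def growth(key: str) -> float:
    """Ratio of the Jan 2004 value to the Jan 2003 value, to two decimals."""
    return round(usage["Jan 2004"][key] / usage["Jan 2003"][key], 2)


avg_growth = growth("avg_hits")               # average daily hits
max_growth = growth("max_hits")               # peak daily hits
commercial_growth = growth("commercial_pct")  # share of commercial clients
```

Average and peak daily hits both grew by somewhat more than one and a half times, while the commercial share of usage nearly doubled.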

It is annoying to find the main FlyBase web server taking coffee breaks in the middle of the morning. Most of the recent server outages have been caused by misbehaving, over-eager robots and data-miners, and Microsoft Explorer web-archive-everything calls. These are just a small percentage of clients, but a single misbehaving robot can bring a web server to its knees.

We switched in December from ancient Apache web software to the most widely used web server, and added complexity to handle a higher volume of compute intensive programs (blast, etc.). This came with the usual problems of "newer is better" software: it is slower, more complex, and more memory intensive. This new web server also has a greater tendency to tie up the entire computer when something goes wrong, due to a traffic-jam style backup of problems. The cure for this is attention to failure details, and adding various checks and blocks to keep it stable under a wide range of web client uses, including those valued data miners in biology who want lots of data right away.
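
One common kind of check against an over-eager robot is a per-client request limit. The Python sketch below shows a simple sliding-window limit; it is an assumption about what such a check could look like, not a description of what the FlyBase server does, and the ClientThrottle class and its limits are hypothetical.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class ClientThrottle:
    """Allow each client at most max_requests per window_seconds.

    A sliding-window limit chosen for illustration only.
    """

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._recent: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, client: str, now: float) -> bool:
        """Return True if the client may make a request at time `now`."""
        recent = self._recent[client]
        # Forget requests that have aged out of the window.
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_requests:
            return False  # over the limit: block this request
        recent.append(now)
        return True


throttle = ClientThrottle(max_requests=3, window_seconds=1.0)
# A robot firing requests in quick succession, then one a second later.
robot = [throttle.allow("robot", now=t) for t in (0.0, 0.1, 0.2, 0.3, 1.05)]
# An ordinary client is unaffected by the robot's behavior.
person = throttle.allow("person", now=0.3)
```

Because the limit is tracked per client, one misbehaving robot is slowed down without penalizing other users, and a well-behaved data miner that paces its requests is never blocked.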

----------
Don Gilbert ; 17 January 2004

Send comments to us at flybase-help AT morgan.harvard.edu
FlyBase-NG uses Argos: A Replicable Genome infOrmation System