FlyBase Archive server: last updated November 2004

FlyBase Reference Manual C. Using FlyBase on the Web
Last updated: 27 August 2004

 

C.1. Reaching the FlyBase Homepage

FlyBase includes data files, documents, indices, forms, and images and is maintained on a multi-protocol server that supports a variety of tools and formats. The best use of FlyBase is therefore made with a recent-release Web client. The URL for the primary FlyBase server is:

Full mirrors of FlyBase http services are available at:

FlyBase is partially mirrored at several sites outside the U.S.A. These servers provide faster access for users in Europe, Australia, Japan and the Pacific region. For copyright reasons, these mirrors do not contain Redbook files; all other FlyBase components are included. The URLs are:

C.2. Report Options - Viewing FlyBase Data

All of the search tools and query options described in this Reference Manual are intended to lead you to reports of FlyBase data on genes, aberrations, clones, transposons, or other objects of potential interest. Some reports are very large and will be shown to you initially in a shortened format. The format options for the report you are viewing are listed in the Available Reports bar. The current format will appear in italic type on a colored ground; other options appear as hyperlinked text. Click on one of these links to view a different report format.

C.3. FlyBase Search Tools

The search engine behind FlyBase text searches is SRS. FlyBase sections with searchable text include Genes, Aberrations, Annotation and Sequences, People, Stocks, Transposons, News archives, and References. Most data class pages include a relatively simple search option plus a link to a search form that supports more complex queries of the data in that class. The Genes, Sequences, and Aberrations simple searches find only D. melanogaster records; the advanced Genes and Aberrations query forms allow you to change the species default from D. melanogaster to all species, or to any individual species. Quick Search, the search tool included on the FlyBase homepage, allows multiple data sections to be searched at one time, but also finds only D. melanogaster records. The All Searches Tool provides access to all FlyBase searches from a single page.

The FlyBase Site Map gives a comprehensive listing of the searches and resources available on FlyBase. In addition to links to all FlyBase resources, the FlyBase Site Map also contains brief descriptions of each resource.

To search all FlyBase data sections for D. melanogaster records with Quick Search, highlight All Sections in the scrolling data sections window, place your search term in the query window and select Search Everything. Beware that this approach may produce a very large number of hits. Use the Search Symbols/Names option if your search term is a gene, allele, aberration, transposon or transposon insertion symbol or name (symbols, symbol synonyms and names will be searched).

In addition to text searches, FlyBase offers several special-purpose search tools such as CytoSearch, which are described in section C.3.6 below.

C.3.1. Simple Queries

CAUTION: Use the advanced All options queries for Genes and Aberrations if you are looking for information on non-melanogaster Drosophila species.

Simple or Quick queries consist of a single input field into which you type one or more search terms. 'Any field' searches data in all fields; select an alternative from the pull-down menu to limit your search to one field or a predefined group of fields. Press the Enter key on your keyboard or select the Submit button to run the search. The term(s) you provide will be compared to an index of words that appear in the data of the section being searched (such as Genes, or Clones). When multiple records are found, the page returned to your screen begins with a list of record titles - the hit list - each linked to a report. Navigation buttons that allow you to display additional pages of record titles, if applicable, follow this list. Below the navigation buttons are options that allow you to refine your query by searching your list of results for additional keywords. Batch download options, including options for report content, are also found beneath the results list.

Records can be viewed (and printed, downloaded, or e-mailed if supported by your browser) singly by selecting the link in the record title, or by placing the item number of the record to be viewed in the Items input field at the bottom of the page and then selecting the Fetch items button.

To view multiple records at once, toggle the Items option to All, or supply item numbers in the Items input field. Change the Report content if desired and then select the Fetch items button.

Conjunction options - the logical operators 'and', 'or', and 'but not' - and wild cards

When multiple terms are specified in a simple query without logical operators, FlyBase returns records that contain all of the terms. To find records that contain any of the terms, use 'or' between each pair of words. To find records that have one term but not another, use 'not' between pairs of words ('or' and 'not' operations cannot be combined).
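The matching rules for multiple terms can be illustrated with a small Python sketch. It only mirrors the semantics stated in this section (implicit 'and', explicit 'or', explicit 'not'); the function name `matches`, the word-set representation of a record, and the sample words are assumptions made for illustration and say nothing about how SRS is implemented.

```python
def matches(query: str, record_words: set) -> bool:
    """Evaluate a simple query against the set of words found in one record.

    Hypothetical helper: terms separated only by spaces must all be present,
    terms joined by 'or' need at least one present, and 'a not b' requires
    a to be present and b to be absent.
    """
    terms = query.lower().split()
    words = {w.lower() for w in record_words}
    if "or" in terms and "not" in terms:
        # The text states that 'or' and 'not' cannot be combined.
        raise ValueError("'or' and 'not' cannot be combined in one query")
    if "or" in terms:
        return any(t in words for t in terms if t != "or")
    if "not" in terms:
        split = terms.index("not")
        wanted = terms[:split]
        unwanted = [t for t in terms[split + 1:] if t != "not"]
        return all(t in words for t in wanted) and not any(t in words for t in unwanted)
    # No operator: every term must be present.
    return all(t in words for t in terms)


# Invented record content, for illustration only.
record = {"meiotic", "spindle", "kinase"}
print(matches("meiotic spindle", record))
print(matches("mitotic or meiotic", record))
print(matches("spindle not kinase", record))
```

In this simplified model every term in an 'or' query is treated as an alternative; a real query parser may group terms differently.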

The wild card * (asterisk) can be used in place of specific leading or trailing characters. Use wild cards to find all terms with a shared root. For example, a search of 'Any field' in Genes for the term *meio* will identify genes that include 'meiosis', 'meiotic', 'premeiotic', or 'meiocyte' in some field of the record. Similarly, a search of the 'Symbol' field in Stocks for *bxd* will find stocks that carry either aberrations or alleles that include bxd in the symbol.
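As a rough illustration of how a leading and trailing wild card behaves, the following Python sketch matches the pattern *meio* against a short word list using the standard-library `fnmatch` module. The word list and the helper name `wildcard_hits` are assumptions for illustration, and lower-casing both sides is an assumption about case handling rather than a documented property of the FlyBase search.

```python
from fnmatch import fnmatchcase


def wildcard_hits(pattern, indexed_words):
    """Return the indexed words matched by a '*' wild-card pattern."""
    # Lower-casing both sides makes the comparison case-insensitive (assumed).
    return sorted(w for w in indexed_words if fnmatchcase(w.lower(), pattern.lower()))


# Words taken from the example in the text, plus one non-matching word.
index = ["meiosis", "meiotic", "premeiotic", "meiocyte", "mitosis"]
print(wildcard_hits("*meio*", index))
# ['meiocyte', 'meiosis', 'meiotic', 'premeiotic']
```

Dropping the leading asterisk (meio*) would no longer match 'premeiotic', which is why the example pattern uses both.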

To search for specific alleles, superscripts should be placed in square brackets, as in bcd[1].

C.3.2. All Options Queries

All options search forms allow you to specify search criteria for more than one field at a time, including species where relevant. Most of these forms also let you search fields that the Simple searches do not offer. Very focused questions can be asked of the data by providing multiple field-specific search terms with appropriate logical operators. In some cases, such as the Genes complex search, pull-down menus of controlled vocabulary terms provide all possible choices of search terms for a field.

To perform a multiple-field-limited query, select the appropriate field(s) from the pull-down menu and type your search term(s) into one or more of the input boxes, or select a controlled vocabulary term where available. When using any of the extended query options the input boxes at the top of the form can be left blank (do not use a lone wild-card).

To search for multiple terms in the same field, use logical operators within the input box as described above for Simple queries. The default logical operator applied to multiple search terms in different fields is 'and', as it is within fields -- only records having all specified terms will be found. To apply 'or' or 'but not' logical operators to terms in different fields, select the desired operator from the pull-down menu that appears between the two field-option boxes.

Note about FlyBase Search strategies: The Body Parts search option in Gene/Allele Search uses SRS word-by-word indexing in retrieving records. Thus an Allele Search for "embryonic tracheal system" will return those alleles that include the statement "Phenotype manifest in: embryonic tracheal system", and, additionally, those that include "Phenotype manifest in: embryonic/larval tracheal system" as well as those that include both "Phenotype manifest in: tracheal system" and "Phenotype manifest in: cuticle | embryonic". In fact all Alleles that include "Phenotype manifest in" statements with "embryonic", "tracheal" and "system" will be returned. The Gene Expression Summary browser (see section C.3.5.2) uses exact matches in retrieving records. Thus were the number 12 to appear in the Allele column for "embryonic tracheal system" each of the 12 allele records would include the exact statement "Phenotype manifest in: embryonic tracheal system". Alleles that include "Phenotype manifest in: embryonic/larval tracheal system" are not represented in this 12.
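The difference between word-by-word indexing and exact matching can be made concrete with a short Python sketch. The two functions below are simplified stand-ins written for this illustration (they are not the SRS or Expression Summary code), and the four allele records are invented; only the "Phenotype manifest in" statements are taken from the note above.

```python
def word_index_match(query, statements):
    """True if every query word occurs somewhere in the record's statements."""
    words = set()
    for statement in statements:
        # Treat '/' and '|' as word separators, as a word index would.
        cleaned = statement.lower().replace("/", " ").replace("|", " ")
        words.update(cleaned.split())
    return all(q in words for q in query.lower().split())


def exact_match(query, statements):
    """True only if one statement is exactly the query phrase."""
    return any(s.lower() == query.lower() for s in statements)


# Invented allele records; values are 'Phenotype manifest in' statements.
alleles = {
    "allele A": ["embryonic tracheal system"],
    "allele B": ["embryonic/larval tracheal system"],
    "allele C": ["tracheal system", "cuticle | embryonic"],
    "allele D": ["larval tracheal system"],
}

query = "embryonic tracheal system"
word_hits = [name for name, s in alleles.items() if word_index_match(query, s)]
exact_hits = [name for name, s in alleles.items() if exact_match(query, s)]
print(word_hits)   # alleles A, B and C
print(exact_hits)  # allele A only
```

Allele C is returned by the word-by-word approach even though the three words come from two unrelated statements, which is the behavior the note warns about.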

To find all records with information -- any information -- in a given field, use '?*' as your search term. To find all records for which a certain field is blank use '!?*' as your search term. Note, however, that this search can be slow, and may not work well for very large datasets.
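The effect of the '?*' and '!?*' search terms can be sketched as a filter over records. This is a minimal illustration under stated assumptions: records are modeled as Python dictionaries, the field name `product_function` and the gene symbols are placeholders, and the helper `filter_by_field` is hypothetical.

```python
def filter_by_field(records, field, term):
    """Keep records by presence ('?*') or absence ('!?*') of data in a field."""
    def has_data(record):
        # A missing key, None, or a blank string all count as "no information".
        return bool((record.get(field) or "").strip())

    if term == "?*":
        return [r for r in records if has_data(r)]
    if term == "!?*":
        return [r for r in records if not has_data(r)]
    raise ValueError("this sketch only handles '?*' and '!?*'")


# Placeholder records, not real FlyBase data.
records = [
    {"symbol": "geneA", "product_function": "protein kinase"},
    {"symbol": "geneB", "product_function": ""},
    {"symbol": "geneC"},
]

with_function = [r["symbol"] for r in filter_by_field(records, "product_function", "?*")]
without_function = [r["symbol"] for r in filter_by_field(records, "product_function", "!?*")]
print(with_function)     # ['geneA']
print(without_function)  # ['geneB', 'geneC']
```

The '!?*' case has to inspect every record to confirm the field is empty, which is consistent with the warning that this search can be slow on large datasets.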

Select the Submit button to run your search. Select the Clear button to return all input boxes and pull-down menus to their default values before beginning a new search. As with Simple Query, an option to further refine your query is provided at the bottom of the page that lists records matching your first set of search terms.

C.3.2.1. Examples of Genes Queries

Similar complex query forms are available for most FlyBase data sections. The examples below of Genes queries illustrate some basic query strategies that can be applied to all of the data types that offer complex queries. All query results provide, at the bottom of the form, the option to refine the query by applying an additional round of criteria to the records found in the first query.

Find a gene record using the gene symbol or gene name

To find the record for the white gene ask for w or white in the Symbol/synonym/name field.

If you aren't sure of a gene name but know part of it, use a wild card to search for a partial word. A search of Symbol/synonym/name for rough* will find rough, Rough eye, rough deal, roughest, roughex, roughoid, and 12 other genes with this word or partial word in the name or synonym.

Find records of genes based on the presence or absence of information in a given field

Suppose you want to find all Drosophila genes for which a function of the gene product has been assigned. Use the pull-down menu in one of the field selection boxes to select the 'Product function' field and use ?* as your query string. This finds all gene records with information in the product function field. To find all records with no information about the product function, use !?* as your query string. Use the batch download option on the query result page to create a file of the gene reports identified by this search.

Find in vitro generated gene fusions

In addition to classical genes, the FlyBase genes list includes in vitro generated gene fusions. We define these as fusions that are produced by creating an open reading frame that includes protein coding sequences from at least two different genes. In some cases, only one of the components of the fusion gene is from D. melanogaster. In these cases, a modifier preceding the gene component indicates the other species of origin. For example, if you search for *sev* in the symbol field of Genes, then one of the items reported back to you is:

sev::Dvir\sev

This is a protein fusion in which one component comes from the Drosophila virilis sevenless gene (Dvir\sev) and the other comes from the D. melanogaster sevenless gene (sev). Note that the double colon (::) symbol indicates that this is a gene fusion.

Where the citation reports the exact species being used, the four letter abbreviation is used to identify the species of origin. FlyBase maintains a listing of valid four letter abbreviations for drosophilids and other species listed in our database. Occasionally, the citation is not exact about the origin of a heterologous gene. For example, if you search with btd*, one of the listings reported back to you is:

btd::mammal\Sp1

Here, the btd gene component comes from D. melanogaster, while the Sp1 component comes from an unspecified mammal.
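If you process fusion symbols locally, the naming convention described above (components joined by '::', with an optional species prefix separated by a backslash) is easy to take apart. The sketch below is illustrative only: the function name `parse_fusion` is hypothetical, and labeling unprefixed components as 'Dmel' is an assumption of this example.

```python
def parse_fusion(symbol):
    """Split a gene symbol into (species, gene) components.

    Components are separated by '::'. A component with no species prefix is
    assumed to come from D. melanogaster and is labeled 'Dmel' here.
    """
    components = []
    for part in symbol.split("::"):
        species, separator, gene = part.partition("\\")
        if separator:
            components.append((species, gene))
        else:
            components.append(("Dmel", part))
    return components


def is_fusion(symbol):
    """A double colon marks a gene fusion."""
    return "::" in symbol


print(parse_fusion("sev::Dvir\\sev"))    # [('Dmel', 'sev'), ('Dvir', 'sev')]
print(parse_fusion("btd::mammal\\Sp1"))  # [('Dmel', 'btd'), ('mammal', 'Sp1')]
print(is_fusion("Ubx"))                  # False
```

Note that the backslash is doubled only because it is written inside a Python string literal; the symbol itself contains a single backslash.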

If you want to be sure to capture all gene fusions including a segment of a given gene, we suggest that you bracket your query characters with wild-cards. Thus, only if you search with *Ubx* will you recover all possible gene fusions in addition to the classical Ubx gene:
abd-A::Ubx
Antp::Ubx
Dfd::Ubx
Scer\GAL4::Ubx
Ubx -- 89E1-89E3 (Ultrabithorax)

Note that FlyBase does not consider transgenes in which the coding region of one gene is fused to the regulatory region of another gene as gene fusions in this sense. Rather, in such cases, the transgenes are listed as alleles of the gene that contributes the protein coding information.

C.3.2.2. Greek Symbols in Queries

When you enter a term in one of the query forms, software tries to guess which parts of the terms should be converted from English to Greek codes. Terms that include alpha, beta, gamma and other English equivalents of Greek letters have the Greek codes substituted automatically. The software isn't, however, always perfect at guessing when to substitute codes. If you want to be sure to match Greek symbols, use the appropriate Greek code (formed as "&-letter-gr-;") rather than the English equivalent in your search. For example, alpha = &agr; and Alpha = &Agr;. See Greek codes for a complete table of codes.
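A naive version of this substitution can be sketched in Python to show both how the codes are formed and why automatic guessing can go wrong. Only the alpha codes (&agr; and &Agr;) are given in the text; the beta and gamma entries below are inferred from the stated "&-letter-gr-;" pattern and should be checked against the Greek codes table. The function name `to_greek_codes` is hypothetical.

```python
import re

# 'agr' is from the text; 'bgr' and 'ggr' are inferred from the same pattern.
GREEK_CODES = {"alpha": "agr", "beta": "bgr", "gamma": "ggr"}


def to_greek_codes(term):
    """Replace English names of Greek letters with '&...;' codes."""
    def replace(match):
        name = match.group(0)
        code = GREEK_CODES[name.lower()]
        if name[0].isupper():
            # A capitalized name maps to the capital letter code.
            code = code[0].upper() + code[1:]
        return "&" + code + ";"

    return re.sub("alpha|beta|gamma", replace, term, flags=re.IGNORECASE)


print(to_greek_codes("alphaTub67C"))  # &agr;Tub67C
print(to_greek_codes("Alpha"))        # &Agr;
```

Because this sketch replaces every occurrence of a letter name, it would also rewrite terms where the letters happen to appear inside an unrelated word, which is the kind of imperfect guess the paragraph above describes.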

C.3.2.3. Queries for Other Database Accession Identifiers

Both the Genes and Alleles query forms allow you to search the External accession number field (select this field from pull-down menu). This field carries the foreign database accession 'numbers' of an object included in the FlyBase Gene or Allele report. The full accession format used by FlyBase is Database/Accession or Database:Accession, for example TREMBL/Q24420, MGI:109617, DDBJ/GENBANK/EMBL/X95244. Accessions are searchable in several ways. Use the full format (see Reference Manual F.3. for a list of the database abbreviations used by FlyBase) for the most specific results (in many cases, using the full name of the database in place of the abbreviation will also work, for example SWP and SWISS-PROT work equally well). You can also search for the 'number' only, for example X95244. This approach will find multiple records if the same accession identifier is in use by different databases. You can also search on the database abbreviation/name alone, for example, SWP to identify all FlyBase records that include SWISS-PROT accession links, or EMBL to identify all FlyBase records that include EMBL accession links.
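When working with downloaded accession strings, the Database/Accession and Database:Accession formats can be split with a few lines of Python. This is a minimal sketch: the function name `parse_accession` is hypothetical, and it assumes that neither '/' nor ':' ever occurs inside the accession 'number' itself.

```python
import re


def parse_accession(text):
    """Return (databases, accession) for 'DB/ACC', 'DB:ACC' or 'DB1/DB2/ACC'."""
    parts = [p for p in re.split(r"[/:]", text.strip()) if p]
    if not parts:
        raise ValueError("empty accession string")
    # The last element is taken as the accession; anything before it names
    # one or more databases.
    return parts[:-1], parts[-1]


print(parse_accession("TREMBL/Q24420"))             # (['TREMBL'], 'Q24420')
print(parse_accession("MGI:109617"))                # (['MGI'], '109617')
print(parse_accession("DDBJ/GENBANK/EMBL/X95244"))  # (['DDBJ', 'GENBANK', 'EMBL'], 'X95244')
print(parse_accession("X95244"))                    # ([], 'X95244')
```

A bare database abbreviation such as SWP cannot be told apart from a bare accession by format alone, so this sketch returns it as the accession part.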

C.3.3. Searching Stock Lists

Stock lists present something of a special case for searching because only allele, aberration and insertion symbols are included in genotypes, not gene symbols alone or gene full names. All stock center stock lists contain stock numbers, genotypes, and stock center name. Some lists contain additional searchable information such as aberration breakpoints, transposon insertion sites, stock donors, and assorted comments (see section B.11.2. of this Reference Manual for the format of Bloomington and Szeged lists). Bloomington and Szeged stock information is available through Allele, Aberration and Transposon Insertion reports as well as directly from the Stocks data section. A direct search of Stocks is required to find information about private laboratory stocks, which are not linked to Allele, Aberration and Insertion reports. In addition, recently added stock center stocks may appear in the Stocks section before the links to Alleles, etc. have been updated (this will essentially always be the case for Bloomington, which updates its list on FlyBase every time new stocks are added to the collection). See Stocks Search Help for searching instructions and a set of examples that illustrate generally useful query strategies for stocks.

There are two primary search options:

A query may not always be the best way to find the information you need. Stock centers offer their own web sites that include browsing files for stocks useful for a specific purpose, such as mapping, or organized by map location, such as deficiency stocks (including deficiency kits), or by species. See:

C.3.4. Searching for Transgene Constructs and Transposon Insertions

The Transgene Constructs query tool allows you to search for constructs by symbol, by reference citation, or by characteristics such as associated gene or included alleles. Queries may be made for specific types of transgene constructs, such as cloning vectors, enhancer traps, reporter constructs, or constructs used to characterize a particular gene. Resulting transposon reports include information on the construct type (plasmid or synthetic transposable element) and size, its full genotype, its constituent segments, related constructs, insertions of the construct, references, and, if available, links to an annotated map and FlyBase-compiled sequence of the construct.

Example searches

A search of the Symbol/synonym field for 'P{lacW}' or 'lacW' will return the transposon record for P{lacW}. A search of this field for '*lac*' will return a list of all constructs that include 'lac' within the construct symbol or a synonym for that symbol. Note that regular brackets in a transposon symbol (such as P[lacW]) will not be accepted; curly brackets must be used (regular brackets are used to indicate superscripts).

A search with the two selectable-field query boxes at the top of the page left blank and 'vital | bcd' selected from the menu in the Reporter Construct box will return a list of constructs that carry GFP reporters of the bicoid gene. A search with the two selectable-field query boxes at the top of the page left blank and either 'general promoter | Ubi-p63E' or 'targeted expression | GMR' selected from the menu in the Cloning Vector box will return a list of cloning vectors with that promoter.

A search with the two selectable-field query boxes at the top of the page left blank and 'dsh' selected from the 'Rescue construct' menu box will return a rescue construct for the dishevelled gene.

Insertion Search allows you to search for specific insertions into the genome of a given transposon. Resulting insertion reports include references, a stock number if the insertion is available from the Bloomington or Szeged stock center and a link to the relevant transposon report.

Example searches

A search of the Symbol/synonym field for 'P{Car20y}*' will return a list of all insertions of the P{Car20y} transposon known to FlyBase. This type of search may be done using natural transposons, also; for example, 'gypsy*' will return a list of almost 200 characterized insertions of the gypsy natural transposon.

A search of the Symbol/synonym field for '*179Y' will return the record for the P{GawB}179Y insertion.

A search of the Allele symbol field for 'crp[*' will return a list of all crp alleles known to be caused by the insertion of a transposon.

A search of the Allele symbol or FB ID (FBal#) field for * will return a list of all alleles known to be caused by the insertion of a transposon (almost 12,000 at this writing).
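If you reproduce these wild-card searches on a downloaded list of symbols, note that curly and square brackets are part of the symbol and must be matched literally. Python's `fnmatch` module treats '[' as the start of a character set, so the sketch below builds a regular expression instead, escaping everything except the asterisk. The helper names are hypothetical, case-sensitive matching is an assumption, and apart from P{GawB}179Y the symbols in the list are invented placeholders.

```python
import re


def wildcard_to_regex(pattern):
    """Compile a '*' wild-card pattern, treating all other characters literally."""
    pieces = [re.escape(piece) for piece in pattern.split("*")]
    return re.compile(".*".join(pieces))


def search(pattern, symbols):
    """Return the symbols that match the whole wild-card pattern."""
    regex = wildcard_to_regex(pattern)
    return [s for s in symbols if regex.fullmatch(s)]


# Placeholder symbols, except P{GawB}179Y, which is mentioned in the text.
symbols = ["P{Car20y}x1", "P{Car20y}x2", "P{GawB}179Y", "crp[x1]", "gypsy-x1"]

print(search("P{Car20y}*", symbols))  # ['P{Car20y}x1', 'P{Car20y}x2']
print(search("*179Y", symbols))       # ['P{GawB}179Y']
print(search("crp[*", symbols))       # ['crp[x1]']
```

A lone '*' compiles to a pattern that matches every symbol, which corresponds to the "list everything" search described above.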

Report formats

Most transgene constructs are described briefly in terms of use or function, with a brief molecular description and links to component alleles, specific insertions, and references; for these constructs only a text report is provided. For most cloning vectors, enhancer traps, and other frequently used constructs, FlyBase provides a more complete description, which includes progenitor constructs, compiled sequence, and an annotated map.

For the most completely described transposons and constructs, three different types of reports are available:

C.3.5. Searching References

When using the "Any field" search field setting of the FlyBase References Query Form, the search engine will examine abstracts stored within FlyBase, such as those of the Drosophila Meeting Abstracts, but not abstracts reflected in Reference reports via a link to PubMed, whose content is not stored internal to FlyBase.

C.3.6. Multipurpose Search Tools

CytoSearch
Gene Expression Summaries
Anatomy Image Browser

C.3.6.1. CytoSearch

 

CytoSearch lists are regional maps of the Drosophila melanogaster genome incorporating both sequence-based and cytology-based map data. Sequence-based data trumps cytology when both are available, cytology trumps meiotic data when both are available, and estimated cytology is used when only meiotic data are available. The FlyBase correspondence tables for cytological and sequence level maps are used to estimate cytology from sequence range and sequence range from cytology, for both the underlying data and the query input.
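The precedence among map data types described above can be written as a simple fallback chain. The Python sketch below is only an illustration of that ordering: the function names are hypothetical, the meiotic position in the example is a placeholder, and `estimate_cytology` is a stand-in because the text does not describe how the estimate is made.

```python
def estimate_cytology(meiotic_position):
    """Stand-in for the estimate FlyBase makes; the method is not described here."""
    return "estimated from meiotic position " + meiotic_position


def best_location(sequence=None, cytology=None, meiotic=None):
    """Pick the map location to report, following the stated precedence."""
    if sequence is not None:
        return "sequence", sequence          # sequence trumps cytology
    if cytology is not None:
        return "cytology", cytology          # cytology trumps meiotic data
    if meiotic is not None:
        return "estimated cytology", estimate_cytology(meiotic)
    return "unmapped", None


print(best_location(sequence="2R:2206427..2374251", cytology="43"))
print(best_location(cytology="67C", meiotic="placeholder"))
print(best_location(meiotic="placeholder"))
```

The first call reports the sequence range even though a cytological location is also supplied, and the last call falls through to estimated cytology.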

 
 
Input options, with example input and notes:

Cytolocation: 67C, 12A-D, 43, 92A1
Range limit is 2 Mbp, but your browser may time out even within that range if the volume of data is too large. If you encounter a page-not-found error, try narrowing the range or selecting fewer features.

Sequence region: 2R:2206427..2374251
All features whose directly determined or estimated sequence range overlaps this range will be returned.

FlyBase gene ID#: FBgn0003174
All features whose directly determined or estimated sequence range overlaps the range of this gene will be returned.

Gene symbol: cnn, alphaTub67C
This is a case-insensitive search of valid gene symbols. If the gene symbol is unique without regard to case, all features whose sequence range overlaps the range of this gene will be returned. Use the case-sensitive option for valid symbols that are distinguished from other valid symbols only by case.

Gene symbol (case-sensitive): w, W, B, b
Use this option for gene symbols that differ from other valid symbols only by case.

Gene synonym: CG3619, omb, ri
This is a case-insensitive search of both valid symbols and synonyms. If your input is not a unique symbol or synonym, a list of options with valid gene symbols will be displayed; use the valid gene symbol with the ‘Gene symbol’ menu option in a new CytoSearch query.

Annotation DB ID/symbol/synonym: CG3757-RA, FBtr0088962, AAF59274.2, CG11101-PA, FBpp0088036, FBan0011101
Any valid symbol or ID in the annotation database can be used with this option. This is a direct query of the chado PostgreSQL database, primarily intended for debugging use by FlyBase (but knock yourself out).

  Other features  
 
  • Data on results page can be downloaded as a tab-separated-values file.
  • Sequence range on results page links to GBrowse, which provides a graphical display of annotation and aberrations data for that region.
  • Gene, insertion and aberration symbols on results page link to their respective FlyBase reports.

 

 

  Known problems  
 
  • Heterochromatic cytological locations are not supported. Heterochromatic bands will be supported when the heterochromatin sequence data are added to the annotation database.
  • Some features are present in duplicate, under different symbols (the invalid symbol is unlinked). This should be corrected with the next update of the annotation database.
  • Some transgene insertions for which FlyBase has sequence-based map data appear as cytologically mapped insertions. This should be corrected with the next update of the annotation database.
  • Only cytology-based aberration data and estimated cytology for sequence-based aberration data are currently available to CytoSearch. Observed sequence ranges will be added for the DrosDel and Exelixis deficiency collections in the near future.
  • Some genes without annotation data, e.g., Bl, are missing from the dataset. This should be corrected with the next update of the annotation database.
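The sequence region input format and the overlap rule used by CytoSearch ("all features whose sequence range overlaps this range will be returned") can be sketched as follows. This is a minimal illustration: the function names are hypothetical, the feature coordinates are invented, and the arm:start..end pattern is inferred from the single example 2R:2206427..2374251.

```python
import re


def parse_region(text):
    """Parse 'arm:start..end' into (arm, start, end) with integer coordinates."""
    match = re.fullmatch(r"(\w+):(\d+)\.\.(\d+)", text.strip())
    if match is None:
        raise ValueError("expected a region like 2R:2206427..2374251")
    arm, start, end = match.group(1), int(match.group(2)), int(match.group(3))
    if start > end:
        raise ValueError("start must not exceed end")
    return arm, start, end


def overlaps(a, b):
    """True if two regions are on the same arm and share at least one base."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]


query = parse_region("2R:2206427..2374251")

# Invented feature coordinates, for illustration only.
features = {
    "feature 1": ("2R", 2200000, 2210000),  # overlaps the start of the query
    "feature 2": ("2R", 2400000, 2410000),  # lies beyond the query range
    "feature 3": ("3L", 2300000, 2310000),  # same coordinates, different arm
}
hits = [name for name, region in features.items() if overlaps(query, region)]
print(hits)  # ['feature 1']
```

A feature only needs to share part of its range with the query to be returned, so results can extend beyond the boundaries you entered.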
 

C.3.6.2. Gene Expression Summaries

Gene Expression Summaries allow you to identify genes, alleles and gene products that are associated with specific Drosophila anatomical features (including subcellular structures) and/or stages of development. The association between gene and body-part may be based on the phenotype of a mutant allele or on the distribution of a transcript or polypeptide of either an endogenous or a transposon-borne gene. Note that FlyBase only captures information on wild-type expression patterns, and does not capture negative observations (i.e., FlyBase does not capture any information that a gene is not expressed in a particular tissue).

The Expression Summary entry page combines summaries for the two top-level terms 'whole organism' and 'developmental stage'. Every term in the hierarchy has a comparable Expression Summary that is available from a Term Report. Gene, allele and gene product records can be retrieved for one or more of the terms listed in an Expression Summary, and a Term Report for any term can be retrieved. You can choose to see data for all classes of genetic object or for any subset, and for all developmental stages, or a subset. The numbers in the columns next to each term are the counts of records available in each category for that term and all of its sub-parts.

Note about FlyBase Search strategies: The Gene Expression Summary browser uses exact matches in retrieving records. Thus were the number 12 to appear in the Allele column for "embryonic tracheal system" each of the 12 allele records would include the exact statement "Phenotype manifest in: embryonic tracheal system". Alleles that include "Phenotype manifest in: embryonic/larval tracheal system" are not represented in this 12. The Body Parts search option in Gene/Allele Search (see section C.3.2) uses SRS word-by-word indexing in retrieving records. Thus an Allele Search for "embryonic tracheal system" will return those alleles that include the statement "Phenotype manifest in: embryonic tracheal system", and, additionally, those that include "Phenotype manifest in: embryonic/larval tracheal system" as well as those that include both "Phenotype manifest in: tracheal system" and "Phenotype manifest in: cuticle | embryonic". In fact all Alleles that include "Phenotype manifest in" statements with "embryonic", "tracheal" and "system" will be returned.

Genes
includes all genes identified with an anatomical feature via mutant phenotype or wild-type gene product expression and returns a Gene Report.
Alleles
includes all alleles with a mutant phenotype associated with an anatomical feature and returns an Allele Report (the genes represented by these alleles are included in the Genes link).
Reporters
includes expression of common reporters Ecol\lacZ, Scer\GAL4 and Avic\GFP. Returns a Transcript Report for promoter:reporter fusion alleles and for the 'alleles' created by enhancer-trap insertions (genomic enhancers regulating basal promoters of reporter genes carried by transposon insertions).
Transcripts
includes all transcripts of endogenous genes associated with a specific stage of development or anatomical feature and returns a Transcript Report.
Polypeptides
includes all polypeptides associated with a specific stage of development or anatomical feature and returns a Polypeptide Report.

How to retrieve records from the Expression Summary

How to retrieve other Expression Summaries

C.3.6.3. Anatomy Image Browser

The Anatomy Image Browser is a set of clickable images of various anatomical structures and developmental stages linked to the FlyBase controlled vocabulary for Drosophila development and anatomy. The controlled vocabulary terms associated with an image are themselves linked to FlyBase Term Reports, which include links to gene, mutant allele, transcript and protein records that include phenotype or expression pattern information relevant to the linked anatomical structure.

C.4. Data Selection - Impact on Query Results

As FlyBase literature curation and genome annotation proceed, additional data items are added to the records for entities such as genes, alleles and chromosome aberrations. The entire set of data associated with a gene, allele or aberration is displayed in the corresponding Full Report. To facilitate some searches and report formats, an automated process chooses particular items of data for certain fields from the full set of values for those fields, and the search engine returns the chosen items only. This process is called data selection. Data items are selected if they represent the unique value for a specific field for the gene/allele/aberration, or, when the data items are not unique, if all curated values agree. Selection thus works against comprehensive data retrieval for items which have been heavily studied in a variety of systems, and are consequently richly annotated, and FlyBase intends to update the selection process when possible. In the meantime, searches which currently operate on selected data include:

    Batch Query , when the Select Fields feature is used in conjunction with
    requesting HTML/Text output format, or when the Spreadsheet output
    format is activated.

    Batch Download Select Fields option offered as a follow-on query from a
    Genes or Alleles search.
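The selection rule stated above (a value is selected when it is the only value for a field, or when all curated values agree) can be sketched in Python. This is a simplified reading of the rule, not the FlyBase implementation: the function name `select_value` is hypothetical, the example values are invented, and returning `None` for disagreeing values is an assumption about what "not selected" means.

```python
def select_value(curated_values):
    """Return the selected value for a field, or None if nothing is selected."""
    distinct = set(curated_values)
    if len(distinct) == 1:
        # A single value, or several curated values that all agree.
        return next(iter(distinct))
    # No values at all, or curated values that disagree.
    return None


# Invented cytological locations for three hypothetical genes.
print(select_value(["67C"]))          # unique value: selected
print(select_value(["67C", "67C"]))   # values agree: selected
print(select_value(["67C", "67D"]))   # values disagree: nothing selected
```

The third case shows why richly annotated genes are the ones most likely to return incomplete results in searches that operate on selected data.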

Data selection may be applied to the following data fields:

    Gene data:
    *e full name of gene or allele
    *b genetic location
    *c cytological location
    *w discoverer
    *B alternative genetic location
    *r information on wild-type biological role
    *s molecular information for genes and alleles
    *D comments on cytological location
    *M probable ortholog in reference species of drosophilid
    *l transposable element data
    *n aberrations causing position-effect variegation of gene
    *J protein domain information

    Allele data:
    *G insertion chromosome associated with allele
    *P aberration causing the allele
    *I transgene construct that carries allele
    *v information on availability
    *k phenotypic information on alleles

    Aberration data:
    *B breakpoints
    *C class of aberration
    *b genetic map position (for some small insertions and transposons/transgene constructs)
    *N new cytological order
    *w discoverer(s)
    *o origin/mutagen  
    *O progenitor genotype if relevant to aberration
    *c comments on cytology     
    *p phenotypic data     
    *q genetic data with respect to genes
    *s molecular data
    *S alleles
    *T genetic data with respect to other aberrations     
    *V position effect variegation information

When performing field-specific queries, e.g. with the 'select field' options of Batch Query/Download, bear in mind that you are not necessarily retrieving comprehensive listings for any fields that appear in the above list. If it is important that the listings you retrieve represent the full dataset, it is prudent to use Gene Query, Allele Query, Aberrations Query or Batch Query (Full Data options) to retrieve the full record(s) of interest, download the reports, and process them locally to obtain the terms sought. If you need help with this, contact flybase-help at morgan.harvard.edu (reformat as standard e-mail address).

Further documentation on Batch Queries can be found in Reference Manual D. Bulk FlyBase Data Retrieval.









