Metabolic Modeling Tutorial
discounted EARLY registration ends Dec 31, 2014
BioCyc websites down
12/28 - 12/31
for maintenance.

Web Site User’s Guide
for Pathway Tools-Based Web Sites

Contents

    1  Overview

    2  Selecting the Database to Search
        2.1  By Name
        2.2  By Taxonomy
        2.3  By Organism Properties

    3  Searching Pathway/Genome Databases
        3.1  Quick Search
        3.2  Search Menu: Object Searches
        3.3  Search Menu → BLAST search
        3.4  Search Menu → Google This Site
        3.5  Search Menu → Search Full-text Articles

    4  Ontology Searches

    5  Web Accounts

    6  Genome Browser
        6.1  Displaying External Tracks on the Genome Browser
        6.2  Comparative Genome Browser

    7  SmartTables
        7.1  SmartTable Structure and Display
        7.2  SmartTable Directory
        7.3  Creating a SmartTable
        7.4  Manipulating SmartTable Contents
        7.5  SmartTable Transformations
        7.6  Exporting and Sharing a SmartTable
        7.7  Browsing SmartTables and Users

    8  Omics Data Analysis

    9  Cellular Overview (Metabolic Map Diagram)
        9.1  Summary of Commands
        9.2  Searching and Highlighting
        9.3  Cellular Omics Viewer — Overlay Experimental Data

    10  Regulatory Overview
        10.1  Summary of Commands
        10.2  Regulatory Omics Viewer

    11  Comparative Analysis
        11.1  Compare Objects Across Databases
        11.2  Compare Individual Pathways and Reactions
        11.3  Comparative Analysis Tables

    12  Sequence Search and Alignment
        12.1  BLAST Search
        12.2  Alignment Viewer
        12.3  PatMatch Sequence Search

    13  Metabolic Route Search

    14  How to Learn More

1  Overview

This document describes how to use Web sites based on the Pathway Tools software from SRI International. Since multiple Web sites such as BioCyc, YeastCyc, AraCyc, and MouseCyc are all based on the same underlying software, the same usage instructions apply to all. (Note that differences in configuration and in software version may introduce some variability among sites.)

Please note that the desktop version of Pathway Tools, which you can install locally, provides some additional operations beyond the Web capabilities described here.

2  Selecting the Database to Search

Unless otherwise indicated, all Pathway/Genome Database searches are restricted to a single database. In most cases, a database describes a single organism – although a small number of multi-organism Pathway/Genome Databases exist (examples include MetaCyc and PlantCyc). The database against which searches will be conducted is indicated below the Quick Search box in the page banner.

To search a different database, click on the ‘change organism database’ link below the Quick Search box. In the dialog that pops up, you can find the organism of interest by typing its name, by browsing the organism taxonomy, or by querying organism properties.

If the site supports user accounts, and you are logged in, you may select one database as your preferred database. This database will be your default selection when starting a new web session.

Once you have selected the desired database from one of the tabs described below, click OK to exit the dialog. This will navigate to the page of summary statistics for the selected database.

Note that if you follow a link to a page for a different organism database, then the selected database for searching will change to match the organism of the currently displayed page.

2.1  By Name

By default, the By Name tab is initially selected. If only a small number of databases is available, a full scrollable list of databases is shown to select from. When a large number of databases is available, you must start typing, or select a starting letter from the alphabetical index to the left of the database list, in order to see the list of matching databases. Once you start typing an organism name or select a starting letter, the full list of databases (if shown) is replaced by a list of databases matching the typed string or starting with the selected letter; you can use the mouse or the up/down arrow keys to select the desired database. An organism name matches the string you type if any word in its name (i.e. genus, species or strain name) starts with that string.
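The name-matching rule can be illustrated with a short Python sketch. It only restates the rule given above (a name matches when any of its words starts with the typed string); the case-insensitive comparison, splitting words on whitespace, the function names, and the example organism list are assumptions, not the site's actual code.

```python
def matches_typed_prefix(organism_name: str, typed: str) -> bool:
    """True if any word of the organism name starts with the typed string.

    Assumptions: comparison is case-insensitive; words are split on whitespace.
    """
    typed = typed.strip().lower()
    if not typed:
        return False
    return any(word.lower().startswith(typed) for word in organism_name.split())


def filter_databases(names: list[str], typed: str) -> list[str]:
    """Keep only the database names that match the typed string."""
    return [name for name in names if matches_typed_prefix(name, typed)]


# Hypothetical list of database names.
names = ["Escherichia coli K-12 substr. MG1655", "Bacillus subtilis 168", "Salmonella enterica"]
print(filter_databases(names, "sub"))
# ['Escherichia coli K-12 substr. MG1655', 'Bacillus subtilis 168']
```

Note that "sub" matches the strain word "substr." as well as the species word "subtilis", while a string that occurs only in the middle of a word (such as "til") would not match.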

In the list of matching databases, some database names may be displayed with a gray background; these indicate databases that have had some level of manual review and/or curation. Tier 1 databases, i.e. those that have received at least a year of literature-based curation, will have a dark gray background. Tier 2 databases, i.e. those with a lower level of manual curation, will have a light gray background. All others are Tier 3 databases, which means they have been computationally generated with little or no manual review.

Lists of your recently used databases and the site’s most popular databases provide shortcuts for selecting those databases.

2.2  By Taxonomy

The By Taxonomy tab allows you to select an organism by browsing for it. Each class of organisms is listed with the number of organism databases in that class. The taxonomy tree does not include all taxonomy classes, only those that contain at least one organism database; if a particular taxon does not appear in the tree, there is no database available for it or for any of its descendants. Clicking on a class name shows or hides its list of child taxa. Clicking on an organism name selects that database and shows its name at the top.
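The counting and pruning behavior of the taxonomy tree can be sketched as a recursive walk. The `Taxon` class and the example tree are hypothetical; the sketch only shows that a class's count is the number of databases at or below it, and that classes with a count of zero are left out of the displayed tree.

```python
from dataclasses import dataclass, field


@dataclass
class Taxon:
    name: str
    has_database: bool = False                        # a database exists for this taxon itself
    children: list["Taxon"] = field(default_factory=list)


def database_count(taxon: Taxon) -> int:
    """Number of organism databases at or below this taxon."""
    return int(taxon.has_database) + sum(database_count(c) for c in taxon.children)


def pruned_tree(taxon: Taxon, depth: int = 0) -> list[str]:
    """Render only taxa containing at least one database, each followed by its count."""
    count = database_count(taxon)
    if count == 0:
        return []
    lines = ["  " * depth + f"{taxon.name} ({count})"]
    for child in taxon.children:
        lines.extend(pruned_tree(child, depth + 1))
    return lines


# Hypothetical tree: only E. coli and B. subtilis have databases.
bacteria = Taxon("Bacteria", children=[
    Taxon("Proteobacteria", children=[Taxon("Escherichia coli", has_database=True)]),
    Taxon("Firmicutes", children=[Taxon("Bacillus subtilis", has_database=True),
                                  Taxon("Listeria innocua")]),
    Taxon("Aquificae"),
])
print("\n".join(pruned_tree(bacteria)))
```

Here "Aquificae" and "Listeria innocua" do not appear in the output because no database exists at or below them.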

You may search for any taxon by starting to type its name in the text box. If you select one of the options from the resulting auto-complete box, the taxonomy will automatically expand to show the selected taxon (you must still click on the organism name in the taxonomy to select that database, however).

2.3  By Organism Properties

The By Organism Properties tab allows you to query for all organisms that have (or do not have) some property. The types of properties that can be queried (known as the organism “metadata”) include such attributes as when and where and from what host the sample was collected, whether or not the organism is a pathogen, its relationship to oxygen (e.g. aerobic or anaerobic), etc. Not all organism databases contain data for each of these attributes. In the list of properties from which to select, the number of databases that have values for that property is listed in parentheses.

After selecting a property, you can constrain its value, or just select all databases that have (or do not have) any value for that property. To select from a list of all available values, click in the text box. In the resulting list of possibilities, the number in parentheses after each value is the total number of organisms that match that value. If you start to type, the list of visible options will be limited to those that match the string you have typed. Multiple options may be selected by clicking in the text box again after selecting a value – in that case, an organism will satisfy the constraint if it matches any of the selected values (i.e. the values are connected by an implicit OR). For properties whose values consist of free text, you may also query by substring. The first few values that match your substring are shown, but you are not obligated to select any of them. For properties whose values are numeric, a variety of numeric operators are available, as well as the option to select from all available values. If you specify an = constraint, an organism will satisfy the constraint if its value falls within a small range on either side of the specified value – the size of this range depends on the property, and is indicated below with the description of each property. To specify a different range, use a combination of < and > constraints.

Up to six different constraints may be specified (use the “Add Constraint” button to add a new constraint, up to the limit). These may be connected by either AND (an organism must satisfy both constraints) or OR (an organism may satisfy either constraint). Since there is no way to group constraints, ordering becomes very important if you are building a query that combines both ANDs and ORs. Queries are processed in left-to-right order, so X AND Y OR P AND Q is interpreted as ((X AND Y) OR P) AND Q, which may not match what was intended. If the ordering of constraints does not allow for a desired query, you may be better off splitting it into multiple queries and searching for the desired organism one part of the query at a time.
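The effect of left-to-right processing can be seen in a small Python sketch. Each constraint is modeled as a predicate over a hypothetical organism record (a dict mapping property names to sets of values), and a constraint with several selected values matches if any of them matches, mirroring the implicit OR described earlier. The record layout, property values, and function names are assumptions for illustration only.

```python
from typing import Callable

Organism = dict[str, set[str]]            # property name -> values (hypothetical record)
Constraint = Callable[[Organism], bool]


def any_of(prop: str, *values: str) -> Constraint:
    """Match if the organism has any of the selected values (implicit OR)."""
    return lambda org: bool(org.get(prop, set()) & set(values))


def evaluate_left_to_right(first: Constraint, rest: list[tuple[str, Constraint]],
                           org: Organism) -> bool:
    """Combine constraints strictly left to right, with no AND-before-OR precedence."""
    result = first(org)
    for connector, constraint in rest:
        if connector == "AND":
            result = result and constraint(org)
        elif connector == "OR":
            result = result or constraint(org)
        else:
            raise ValueError(f"unknown connector: {connector}")
    return result


X = any_of("Relationship to Oxygen", "aerobe")
Y = any_of("Temperature Range", "mesophile")
P = any_of("Pathogenicity", "human")
Q = any_of("Host", "Homo sapiens")
org = {"Relationship to Oxygen": {"aerobe"}, "Temperature Range": {"mesophile"}}

print(evaluate_left_to_right(X, [("AND", Y), ("OR", P), ("AND", Q)], org))  # False
print((X(org) and Y(org)) or (P(org) and Q(org)))                          # True
```

The second line shows the conventional reading, (X AND Y) OR (P AND Q), which this organism satisfies; under left-to-right processing the final AND Q rejects it.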

The following properties are available for searching:

  • Environment: This property encompasses terms that describe the environmental features and habitats where the sample was taken. This can include biome-level terms, such as desert, deciduous woodland, coral reef; geographic features such as harbor, cliff, lake; and/or environmental material such as air, soil, water. It can also include terms related to host environment (e.g. blood, skin, oral cavity, gut). This slot combines the MIGS concepts biome, feature, material, body_habitat, body_site and body_product. Ideally, terms should be taken from the EnvO or the FMA ontologies, but can also be free text. An organism may have multiple different values for this property.

  • Geographic Location: The geographical origin of the sample, defined by country or sea name, and/or specific region name. This property can have multiple values, e.g. one might be a country name, another a region name, and another text describing the specific location.

  • Latitude: The latitude of the geographical origin of the sample. Values are reported in decimal degrees, in the WGS84 system. Positive numbers are North, negative numbers are South. If you specify an = constraint for this property, all organisms whose latitude is within 10 degrees of the requested value will be included in the result. If you wish a different size range, you will need to specify it explicitly by combining < and > constraints.

  • Longitude: The longitude of the geographical origin of the sample. Values are reported in decimal degrees, in the WGS84 system. Positive numbers are East, negative numbers are West. If you specify an = constraint for this property, all organisms whose longitude is within 10 degrees of the requested value will be included in the result. If you wish a different size range, you will need to specify it explicitly by combining < and > constraints.

  • Depth/Altitude: The depth or altitude in meters at which the sample was collected. Negative numbers are depths, positive numbers are altitudes. If you specify an = constraint for this property, all organisms whose depth or altitude is within 20% of the requested value will be included in the result. If you wish a different size range, you will need to specify it explicitly by combining < and > constraints.

  • Collection Date: The year the sample was collected.

  • Relationship to Oxygen: Whether the organism is an aerobe or anaerobe, and what form.

  • Trophic Level: The position of the organism in a food chain.

  • Temperature Range: A qualitative description of the temperature range in which the organism grows best. A mesophile grows best in moderate temperatures, typically between 20 and 45 degrees Celsius. A psychrophile prefers colder environments, whereas a thermophile prefers warmer ones, and a hyperthermophile thrives in extremely hot environments of 60 degrees Celsius and higher.

  • Biotic Relationship: Whether the organism is free-living or in a host, and if the latter, what type of relationship is observed.

  • Pathogenicity: The general class of organisms to which the organism is pathogenic.

  • Host: The host from which the sample was isolated.

  • Health/Disease State: The health or disease state of the specific host at the time of collection.

  • Ploidy: The ploidy level of the genome, e.g. haploid, diploid, triploid, allopolyploid.
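The tolerance applied by an = constraint on the numeric properties above can be sketched as follows. The tolerances (10 degrees for Latitude and Longitude, 20% for Depth/Altitude) come from the property descriptions; treating the bounds as inclusive, taking 20% of the absolute requested value, and ignoring longitude wrap-around at ±180 degrees are assumptions of this sketch, as are the function names.

```python
# Tolerances for "=" constraints, taken from the property descriptions.
ABSOLUTE_TOLERANCE = {"Latitude": 10.0, "Longitude": 10.0}   # decimal degrees
RELATIVE_TOLERANCE = {"Depth/Altitude": 0.20}                 # fraction of the requested value


def equals_constraint(prop: str, requested: float, value: float) -> bool:
    """True if value satisfies an '=' constraint on prop.

    Assumptions: bounds are inclusive; the relative tolerance uses |requested|;
    longitude wrap-around at +/-180 degrees is not handled.
    """
    if prop in ABSOLUTE_TOLERANCE:
        tolerance = ABSOLUTE_TOLERANCE[prop]
    elif prop in RELATIVE_TOLERANCE:
        tolerance = RELATIVE_TOLERANCE[prop] * abs(requested)
    else:
        return value == requested
    return abs(value - requested) <= tolerance


def range_constraint(value: float, greater_than: float, less_than: float) -> bool:
    """An explicit range, built by combining a '>' and a '<' constraint with AND."""
    return greater_than < value < less_than


print(equals_constraint("Depth/Altitude", -1000.0, -1150.0))  # True: within 200 m
print(equals_constraint("Longitude", -122.0, -100.0))        # False: 22 degrees away
```

For a narrower or wider window than the built-in tolerance, `range_constraint` corresponds to the combination of < and > constraints suggested above.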

Once you have specified the desired constraints, use the “Find Organisms” button to search for all matching organisms. In the resulting table, which includes all properties for which at least one of the matching organisms has a value, you may click on any column heading to sort by that column. Click on a row to select that organism.

3  Searching Pathway/Genome Databases

3.1  Quick Search

The Quick Search box in the upper right-hand corner of every page is useful if you know the name (or part of the name) or database identifier of the object you are searching for. You may use this box to search for genes, proteins, compounds, RNAs, reactions, pathways, operons, and GO terms. If the query string matches a single object, the page for that object will be displayed immediately. If there are multiple matches, the full list of matches will be shown, organized by the type of object (e.g. gene, protein, etc.).

Some examples of what can be entered into the Quick Search box include:

  • The name of a compound, gene, protein, pathway or other object. Spaces, punctuation and capitalization are ignored. An object will be returned if the query string matches either its common name or one of its synonyms.
    Examples: pyruvate, trpA

  • Any substring of one of the above names that is 3 or more characters in length.
    Examples: kinase, pyr

  • An EC number (full or partial).
    Examples: 1.2.3.3, 1.3.99

  • A PGDB internal object identifier for any compound, gene, protein, pathway, reaction, transcription-unit or schema class. Correct capitalization may be required.
    Examples: CPLX0-3661, HEMN-RXN

  • A PGDB internal object identifier for any compound, gene, protein, pathway, reaction, transcription-unit or schema class in some other PGDB served at the same site, followed by ‘@’ and the PGDB identifier (no spaces).
    Examples: trp@ecoo157, HEMN-RXN@META

  • An identifier from some external database to which we maintain links, e.g., a UniProt identifier. Correct capitalization and punctuation are required. Note that our set of links is not complete; a search for an external ID that returns no result does not necessarily mean that the object is absent from our database.
    Examples: P00561, NP_414543, C00047
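Two of the query forms listed above have a simple structure that can be sketched in Python: an identifier qualified with ‘@’ and a PGDB identifier, and a partial EC number that matches every EC number beginning with the same components. The function names are hypothetical, and comparing EC numbers by whole dot-separated components is an assumption about how partial EC numbers are matched.

```python
from typing import Optional


def split_qualified_id(query: str) -> tuple[str, Optional[str]]:
    """Split 'OBJECT-ID@PGDB' into (object ID, PGDB ID); the PGDB part is None without '@'."""
    object_id, sep, pgdb = query.partition("@")
    return (object_id, pgdb) if sep else (query, None)


def ec_matches(query: str, ec_number: str) -> bool:
    """True if a full or partial EC number matches ec_number.

    Assumption: whole components are compared, so '1.3.9' does not match '1.3.99.1'.
    """
    wanted = query.split(".")
    actual = ec_number.split(".")
    return len(wanted) <= len(actual) and actual[:len(wanted)] == wanted


print(split_qualified_id("HEMN-RXN@META"))  # ('HEMN-RXN', 'META')
print(ec_matches("1.3.99", "1.3.99.1"))     # True
```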

A few additional rules govern searches:

  • To match several words or text fragments simultaneously, separate them with spaces to find objects with all of the words in their names, or with commas to find objects with any of the words in their names. For example, if you enter nitrate camphor, the program will search for objects that have both nitrate and camphor in their names, whereas entering nitrate, camphor will search for objects that have either nitrate or camphor in their names.

  • If your query text is one or two characters in length, only exact text matches will be returned because of the many matches that would otherwise result. For longer text fragments, the search will return all objects that contain the text rather than match it exactly.

  • Searches may be qualified. Currently we allow two qualifiers:

    1. search:exact Example: trpa search:exact
      This search will be limited to exact matches. In the example given, assuming the current organism is E. coli K-12, without the qualifier there will be several matches, including genes, proteins and transcription units. With the qualifier you will be taken directly to the trpA gene page.

    2. type:<type-qualifier> Example: atp type:compound
      This search will be limited to the specified type. In the example given, assuming the current organism is E. coli K-12, without the qualifier a large number of results will be returned of various types. With the qualifier, just the seven compounds with ATP in the name will be returned.
      Allowable type-qualifiers include pathway, gene, enzyme, rna, go-terms, compound, reaction, operon, and organism.
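The rules above for combining words, handling short queries, and applying qualifiers can be sketched as a small parser plus a name matcher. Everything here is an illustrative assumption rather than the site's code: the `ParsedQuery` structure, applying the one-or-two-character rule to each word separately, comparing type qualifiers directly against an object-type string, and checking terms against a single name (synonyms and identifiers are ignored for brevity). Names are normalized by dropping spaces, punctuation and capitalization, as described for Quick Search.

```python
import re
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ParsedQuery:
    groups: list[list[str]] = field(default_factory=list)  # OR across groups, AND within one
    exact: bool = False                  # set by search:exact
    object_type: Optional[str] = None    # set by type:<type-qualifier>


def normalize(text: str) -> str:
    """Drop spaces, punctuation and capitalization before comparing."""
    return re.sub(r"[\W_]+", "", text.lower())


def parse_query(query: str) -> ParsedQuery:
    parsed = ParsedQuery()
    words = []
    for token in query.split():
        if token.lower() == "search:exact":
            parsed.exact = True
        elif token.lower().startswith("type:"):
            parsed.object_type = token.split(":", 1)[1].lower()
        else:
            words.append(token)
    # Commas separate alternatives; the space-separated words in one alternative must all match.
    for alternative in " ".join(words).split(","):
        terms = alternative.split()
        if terms:
            parsed.groups.append(terms)
    return parsed


def term_matches(term: str, name: str, exact: bool) -> bool:
    t, n = normalize(term), normalize(name)
    if exact or len(t) <= 2:   # one- or two-character terms: exact matches only
        return t == n
    return t in n              # longer terms: substring match


def matches(query: ParsedQuery, name: str, object_type: str) -> bool:
    if query.object_type is not None and query.object_type != object_type:
        return False
    return any(all(term_matches(t, name, query.exact) for t in group)
               for group in query.groups)


print(matches(parse_query("nitrate camphor"), "nitrate reductase", "protein"))        # False
print(matches(parse_query("nitrate, camphor"), "camphor 5-monooxygenase", "protein"))  # True
```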

3.2  Search Menu: Object Searches

The Search menu contains links to specialized search pages for Compounds, Genes/Proteins/RNAs, Reactions and Pathways. Each such page contains options for searching using a number of different criteria, either individually or in combination. When the page is initially loaded, only the name searches are active, but by clicking on the different search bars, you can enable or disable additional search criteria. If multiple search criteria are specified for a given search, then unless otherwise specified the results must satisfy all of them (that is, an AND connector is used to combine the different criteria).

The result of every object search is a table containing the names of all objects that satisfy the search, with hyperlinks to their corresponding data pages, along with any additional columns relevant to the particular search. The table is initially sorted alphabetically by name, but small triangles in the column headers allow you to sort by any column, in either ascending or descending order.

The sections below describe the different search criteria that are available for each object type.

3.2.1  Search Menu → Search genes, proteins or RNAs

  • Search by gene name or database identifier
    Enter a gene name, name fragment, or identifier (either the internal Pathway/Genome Database identifier, or an identifier from some other database). The software will attempt to do auto-completion on the string you have entered based on the contents of the database. If you select one of the auto-complete options, then when you submit the form you will be taken directly to the data page for the selected gene, regardless of any other search criteria you may have specified (i.e., other search criteria are ignored). If you do not select one of the auto-complete options, then the string you typed will be the target of a substring search, which may be combined with other search criteria.

  • Search by product name, database identifier or EC number
    Enter a protein or RNA name, name fragment, identifier (either the internal Pathway/Genome Database identifier or an identifier from some other database, such as UniProt), or a fully specified EC number. The software will attempt to do auto-completion, as for the gene name field.

  • Search/Filter by sequence length
    Enter a minimum and/or maximum sequence length, and specify whether the units referred to are nucleotides or amino acids. If either the minimum or maximum field is left blank, then the sequence length is unconstrained in that direction.

  • Search/Filter by replicon and/or gene map position
    Enter a minimum and/or maximum gene map position, where the units are the number of base pairs from the start of the replicon. The results will include any gene that overlaps any portion of the specified region. If either the minimum or maximum field is left blank, then the map position is unconstrained in that direction. If the selected organism has multiple replicons, then this search option will include a checkable list of replicons – you may select one or more replicons either instead of or in conjunction with the map position in order to constrain the search to genes on a particular replicon.

  • Search/Filter by product molecular weight
    Enter a minimum and/or maximum molecular weight for the gene product in kilodaltons. If either the minimum or maximum field is left blank, then the molecular weight is unconstrained in that direction.

  • Search/Filter by pI
    Enter a minimum and/or maximum pI (isoelectric point) for the gene product. (Typically little information about pI is available for databases other than EcoCyc or MetaCyc.)

  • Search/Filter by small molecule regulator, cofactor, substrate or ligand
    This search option is for retrieving all proteins affected by a specified small molecule in any of several ways. An example might be to search for all enzymes inhibited by ADP, or all enzymes that use Mg2+ as a cofactor. Enter the name of a small molecule. We recommend taking advantage of the auto-complete facility to select the correct small molecule, as only an exact match to a compound name can be accepted here. Check all roles that you are interested in for this compound. Note that we consider cofactors to include only compounds that are not modified in any way during the reaction. Molecules such as NAD, which are modified, are considered to be substrates, not cofactors. (Relatively little information about activators, inhibitors, etc. is typically available for databases other than EcoCyc or MetaCyc.)

  • Search/Filter by evidence code
    The evidence ontology appears here in browseable form. Each evidence code includes in parentheses after its name the number of gene products that have their function annotated with that code. Selecting one or more codes to filter on allows you to restrict your search, for example, to all proteins whose function has been established experimentally. The Pathway Tools evidence codes and ontology are described here.

  • Search/Filter by cell component
    The cell component ontology appears here in browseable form, along with the numbers of gene products associated with each cell component. Selecting one or more components allows you to restrict your search to proteins known to be present in those cellular locations. (Note that relatively little information about cellular locations of gene products is available for databases other than EcoCyc or MetaCyc.) The Pathway Tools cell component ontology is described here.

  • Search/Filter by Gene Ontology
    If the selected database has been annotated using Gene Ontology, then you will see a browseable ontology here. Only terms that have one or more gene products annotated to them or their children will be present, and the number in parentheses after each term name indicates the number of gene products annotated to that term or one of its children. You may browse this ontology to a particular term to see all gene products annotated with that term. Clicking on a gene product will then take you directly to the data page for that gene product, just as clicking on a term name will take you to the data page for that term. Alternatively, you can use the checkboxes to indicate that your search should be restricted to include only gene products annotated with the checked terms or their children. If you wish to filter by only a single term, and you know the name or ID for that term, you also have the option of typing it in the text box (using auto-completion to ensure you select the correct term).

  • Search/Filter by MultiFun term
    If the selected database has been annotated using the MultiFun ontology, then you will see a browseable ontology here. Only terms that have one or more genes annotated to them or their children will be present, and the number in parentheses after each term name indicates the number of genes annotated to that term or one of its children. You may browse this ontology to a particular term to see all genes annotated with that term. Clicking on a gene will then take you directly to the data page for that gene, just as clicking on a term name will take you to the data page for that term. Alternatively, you can use the checkboxes to indicate that your search should be restricted to include only genes annotated with the checked terms or their children.

  • Search/Filter by organism
    This search option will be available only if the selected database is a multi-organism database (such as MetaCyc), and allows you to browse directly for proteins from a particular organism, or to restrict your search to one or more taxonomic groups.

  • Search/Filter by publication
    This search option is useful for retrieving a list of all genes or gene products that cite a given publication or author. Enter either the PubMed ID, the author surname, or part or all of an article title.

  • Search/Filter by existence of protein features
    This search option generates a browsable ontology of protein features. Select one or more feature types to search for proteins annotated with those features.
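Several of the filters above share two rules: a blank minimum or maximum leaves that side unconstrained, and a gene passes the map-position filter if it overlaps any portion of the requested region. All active criteria are then combined with AND. The sketch below illustrates these rules over a hypothetical `Gene` record; its field names, the units (base pairs and kilodaltons), and the assumption that `left <= right` are choices made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Gene:
    name: str
    replicon: str
    left: int                 # base pairs from the start of the replicon
    right: int                # assumed left <= right
    product_mw_kd: float      # product molecular weight in kilodaltons


def in_range(value: float, minimum: Optional[float], maximum: Optional[float]) -> bool:
    """A blank (None) bound leaves the value unconstrained in that direction."""
    return (minimum is None or value >= minimum) and (maximum is None or value <= maximum)


def overlaps_region(gene: Gene, start: Optional[int], end: Optional[int]) -> bool:
    """True if any portion of the gene falls within [start, end]."""
    return (start is None or gene.right >= start) and (end is None or gene.left <= end)


def search(genes, replicons=None, start=None, end=None, mw_min=None, mw_max=None):
    """Keep genes that satisfy every active criterion (criteria are ANDed)."""
    return [
        g.name for g in genes
        if (replicons is None or g.replicon in replicons)
        and overlaps_region(g, start, end)
        and in_range(g.product_mw_kd, mw_min, mw_max)
    ]


# Hypothetical genes.
genes = [
    Gene("geneA", "chromosome", 100, 1300, 43.0),
    Gene("geneB", "chromosome", 1250, 2000, 27.5),
    Gene("geneC", "plasmid1", 300, 900, 12.0),
]
print(search(genes, replicons={"chromosome"}, start=1200, end=1260))  # ['geneA', 'geneB']
```

Both chromosomal genes are returned because each overlaps part of the region 1200 to 1260, even though neither lies entirely inside it.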

3.2.2  Search Menu → Search compounds

  • Search for compound by name or ID
    Enter a compound name, name fragment, or identifier (either the internal Pathway/Genome Database identifier, or an identifier from some other database such as PubChem or LIGAND). The software will attempt to do auto-completion on the string you have entered based on the contents of the database. If you select one of the auto-complete options, then when you submit the form you will be taken directly to the data page for the selected compound, regardless of other search criteria you may have specified (i.e., other search criteria will be ignored). If you do not select one of the auto-complete options, then the string you typed will be the target of a substring search, which may be combined with other search criteria.

  • Search/Filter by ontology
    This option allows you to browse the compound ontology. Each compound class includes in parentheses after its name the number of instance-level compound objects that are members of that class. Clicking a + icon shows the classes and compounds that belong to a particular class. The ontology may be used in one of two ways. By selectively clicking on + icons, you can browse to find a compound or compound class of interest, and click directly on its name to visit the data page for that compound. Alternatively, you can check the checkbox next to one or more class names to limit your search (which may also include other search criteria) so as to only include compounds that belong to one of the checked classes.

  • Search/Filter by molecular weight
    This option can be used to specify either a minimum molecular weight value, a maximum molecular weight value, or both. If either the minimum or maximum field is left blank, then the molecular weight is unconstrained in that direction.

  • Search/Filter by chemical formula (partial or full)
    If one or more element symbols are entered without a number, then the result will include any compound containing those elements (and possibly others). If an element symbol is followed by a number, then only compounds with exactly that number of atoms of that element in their chemical formula will be included in the result. For example, the query string C12N will retrieve all compounds with exactly 12 carbons, one or more nitrogens, and possibly some other elements. The search is case-insensitive unless case is needed to disambiguate. For example, either co or CO will retrieve all compounds containing both carbon and oxygen, but Co will instead retrieve all compounds containing cobalt.

  • Search by InChI string
    InChI is short for International Chemical Identifier, and offers a way to search for a molecule by its chemical structure. We support only exact string matching for InChI strings.
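The partial-formula rule can be sketched in Python. To keep the example short, it assumes that both the query and the compound formula use standard element capitalization (so CO is carbon plus oxygen and Co is cobalt) and that formulas are flat, with no parentheses or charges; the site's case-insensitive handling of queries such as co is not reproduced. The function names are hypothetical.

```python
import re
from collections import Counter

# One element symbol (capital letter plus optional lowercase letter) and an optional count.
ELEMENT = re.compile(r"([A-Z][a-z]?)(\d*)")


def element_counts(formula: str) -> Counter:
    """Count atoms per element in a flat formula such as 'C6H12O6'."""
    counts: Counter = Counter()
    for symbol, number in ELEMENT.findall(formula):
        counts[symbol] += int(number) if number else 1
    return counts


def formula_matches(query: str, formula: str) -> bool:
    """Bare symbol: at least one atom required. Symbol plus number: exactly that many."""
    have = element_counts(formula)
    for symbol, number in ELEMENT.findall(query):
        if number:
            if have[symbol] != int(number):
                return False
        elif have[symbol] < 1:
            return False
    return True


print(formula_matches("C12N", "C12H22O11"))       # False: sucrose has no nitrogen
print(formula_matches("Co", "C63H88CoN14O14P"))   # True: cyanocobalamin contains cobalt
```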

3.2.3  Search Menu → Search reactions

  • Search for reaction by EC number or name
    Enter a reaction EC number or name (typically an enzyme name). EC numbers can be either full or partial. The software will attempt to do auto-completion on the name or EC number. If you select one of the auto-complete options, then when you submit the form you will be taken directly to the data page for the selected reaction or reaction class, regardless of any other search criteria you may have specified (i.e., other search criteria will be ignored). If you do not select one of the auto-complete options, then the string you typed will be the target of a substring search, which may be combined with other search criteria.

  • Search/Filter by substrates or products
    Enter a compound name to retrieve all reactions in which that compound participates either as a substrate or product. Multiple compounds can be specified, separated by either OR, AND or AND NOT. We recommend taking advantage of the auto-complete facility to select the correct compound, as only an exact match to a compound name can be accepted here.

  • Search/Filter by whether or not reaction is catalyzed by an enzyme
    Specify whether to include only enzyme-catalyzed reactions for which an enzyme has been identified, enzyme-catalyzed reactions for which no enzyme has been identified, or spontaneous reactions.

  • Search/Filter by ontology
    This option allows you to browse the Pathway Tools reaction ontology. Each reaction class includes in parentheses after its name the number of reactions that are members of that class. The ontology may be used in one of two ways. By selectively clicking on + icons, you can browse to find a reaction of interest, and click directly on its name to visit the data page for that reaction. Alternatively, you can check the checkbox next to one or more class names to limit your search (which may also include other search criteria) so as to only include reactions that belong to one of the checked classes. Note that there are two parallel reaction classification systems, one in which reactions are classified by conversion type (this includes the entire EC hierarchy), and another in which the reactions are classified by substrate. Most reactions in the database have parents in both classification systems.
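The substrate/product filter's OR, AND and AND NOT connectors map naturally onto set operations over reaction identifiers, as in the Python sketch below. The reaction IDs and compound sets are hypothetical, substrates and products are not distinguished, and applying the connectors strictly left to right is an assumption, since the evaluation order is not specified above.

```python
# Hypothetical data: reaction ID -> compounds appearing as substrate or product.
REACTIONS: dict[str, set[str]] = {
    "RXN-A": {"ATP", "D-glucose", "ADP", "D-glucose 6-phosphate"},
    "RXN-B": {"ATP", "pyruvate", "ADP", "phosphoenolpyruvate"},
    "RXN-C": {"D-glucose 6-phosphate", "D-fructose 6-phosphate"},
}


def reactions_with(compound: str) -> set[str]:
    """Reactions in which the compound is a substrate or a product."""
    return {rxn for rxn, compounds in REACTIONS.items() if compound in compounds}


def search_by_compounds(first: str, rest: list[tuple[str, str]]) -> set[str]:
    """Combine per-compound results with OR, AND or AND NOT (assumed left to right)."""
    result = reactions_with(first)
    for connector, compound in rest:
        matching = reactions_with(compound)
        if connector == "OR":
            result |= matching
        elif connector == "AND":
            result &= matching
        elif connector == "AND NOT":
            result -= matching
        else:
            raise ValueError(f"unknown connector: {connector}")
    return result


print(sorted(search_by_compounds("ATP", [("AND NOT", "D-glucose")])))  # ['RXN-B']
```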

3.2.4  Search Menu → Search pathways

  • Search for pathway by name
    Enter a pathway name, name fragment, or internal Pathway/Genome Database identifier. The software will attempt to do auto-completion on the string you have entered based on the contents of the database. If you select one of the auto-complete options, then when you submit the form you will be taken directly to the data page for the selected pathway, regardless of any other search criteria you may have specified (i.e. other search criteria will be ignored). If you do not select one of the auto-complete options, then the string you typed will be the target of a substring search, which may be combined with other search criteria.

  • Search/Filter by ontology
    This option allows you to browse the Pathway Tools pathway ontology. Each pathway class includes in parentheses after its name the number of pathways that are members of that class. The ontology may be used in one of two ways. By selectively clicking on + icons, you can browse to find a pathway of interest, and click directly on its name to visit the data page for that pathway. Alternatively, you can check the checkbox next to one or more class names to limit your search (which may also include other search criteria) so as to only include pathways that belong to one of the checked classes.

  • Search/Filter by number of reactions
    Enter a minimum and/or maximum number of desired reactions in the pathway. If either the minimum or maximum field is left blank, then the number of reactions is unconstrained in that direction.

  • Search/Filter by substrates present
    Enter one or more compound names to retrieve all pathways in which those compounds participate as a reactant, a product, or an intermediate. If you enter more than one compound, then the pathway must involve all specified compounds in order to be included in the results. We recommend taking advantage of the auto-complete facility to select the correct compound, as only an exact match to a compound name can be accepted here.

  • Search/Filter by evidence code
    The Pathway Tools evidence ontology appears here in browseable form. Each evidence code includes in parentheses after its name the number of pathways that have their function annotated with that code. Selecting one or more codes to filter on allows you to restrict your search, for example, to all pathways whose presence has been established experimentally. The Pathway Tools evidence codes and ontology are described here.

  • Search/Filter by organism
    This search option is available only if a multi-organism database (such as MetaCyc) is the selected database. It allows you to browse for pathways that are curated as occurring in a particular organism based on experimental information. The fact that a pathway is not stated to be present in a given organism does not mean that the organism lacks the pathway; pathways are curated for only a small subset of the organisms in which they appear.

  • Search/Filter by expected taxonomic range
    This search option will be available only if a multi-organism database (such as MetaCyc) is the selected database. Each pathway in MetaCyc has been annotated with its expected taxonomic range. This search option allows you to restrict your search to include only those pathways you could reasonably expect to see for a given taxonomic grouping, for example, to restrict your search to pathways seen in plants.

  • Search/Filter by publication
    This search option is useful for retrieving a list of all pathways that cite (either directly or through one of the pathway’s enzymes, genes, subpathways or substrates) a given publication or author. Enter either the PubMed ID, the author surname, or part or all of an article title.
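The way these criteria combine can be sketched in Python. This is a minimal illustration assuming a simple in-memory list of pathway records; the `Pathway` class, the `search_pathways` function, and the example data are hypothetical and do not reflect how Pathway Tools stores or queries its databases. It shows substring matching on the name, reaction-count bounds that are open-ended when left blank, and the rule that every listed substrate must be present.

```python
from dataclasses import dataclass, field


@dataclass
class Pathway:
    # Hypothetical in-memory record, not the Pathway Tools schema.
    name: str
    num_reactions: int
    substrates: set = field(default_factory=set)


def search_pathways(pathways, name_fragment=None, min_rxns=None,
                    max_rxns=None, substrates=()):
    """Return names of pathways that satisfy every supplied criterion."""
    hits = []
    for p in pathways:
        # Substring search on the name (case-insensitivity is an assumption).
        if name_fragment and name_fragment.lower() not in p.name.lower():
            continue
        # A bound left as None is unconstrained in that direction.
        if min_rxns is not None and p.num_reactions < min_rxns:
            continue
        if max_rxns is not None and p.num_reactions > max_rxns:
            continue
        # Every requested compound must participate in the pathway.
        if not set(substrates) <= p.substrates:
            continue
        hits.append(p.name)
    return hits


# Illustrative records only.
PATHWAYS = [
    Pathway("example glycine pathway", 10, {"glycine", "L-serine", "ATP"}),
    Pathway("example citrate pathway", 8, {"citrate", "oxaloacetate"}),
    Pathway("example glycogen pathway", 3, {"glycogen", "ATP"}),
]
print(search_pathways(PATHWAYS, name_fragment="gly", min_rxns=5))
```

Here the name fragment "gly" matches two pathways, but the minimum of five reactions excludes the three-reaction one.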

3.2.5  Search Menu → Search growth media

Some databases may include sets of growth media, along with information about whether or not the organism can grow on a particular medium and under what conditions (for example, gene knockout studies can indicate whether the organism can grow on a particular medium in the absence of a particular gene). To see the full list of growth media for a database, including an indication of which media have associated knockout data, click on the All Growth Media for this Organism button. Use the other fields of this form to search for growth media that meet certain criteria.

  • Search for growth media by name
    Enter a growth medium name or name fragment. The software will attempt to auto-complete the string you have entered based on the contents of the database. If you select one of the auto-complete options, then when you submit the form you will be taken directly to the data page for the selected growth medium. This is true regardless of any other search criteria you may have specified (i.e., other search criteria will be ignored). If you do not select one of the auto-complete options, then the string you typed will be the target of a substring search, which may be combined with other search criteria.

  • Search/Filter by compounds present in the medium
    Enter up to four compound names to retrieve all growth media that contain either any or all of the specified compounds. We recommend taking advantage of the auto-complete facility to select the correct compound, as only an exact match to a compound name can be accepted here.

  • Search/Filter by compounds not present in the medium
    Enter up to four compound names to retrieve all growth media that do not contain any of the specified compounds. We recommend taking advantage of the auto-complete facility to select the correct compound, as only an exact match to a compound name can be accepted here.

  • Search/Filter by observed growth
    Select one or more growth levels to retrieve media on which any of the selected levels of growth have been observed. If no gene knockout is specified, then the growth levels refer to wild-type growth. If a gene is specified, then the growth levels refer to knockouts of that gene. When specifying a gene, we recommend using the auto-complete facility to select the correct gene, as only an exact name match can be accepted here.
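A minimal sketch of how these growth-media criteria combine, assuming a hypothetical dictionary of media records (the medium names, gene names, field names, and growth labels are illustrative only). It shows the any/all choice for required compounds, the rule that excluded compounds must be entirely absent, and how naming a knockout switches the growth check from wild type to that knockout.

```python
MAX_COMPOUNDS = 4  # the form accepts up to four compounds per field

# Hypothetical records: compounds in each medium, and observed growth keyed
# by knocked-out gene (None = wild type).
MEDIA = {
    "medium A": {"compounds": {"D-glucose", "ammonium", "phosphate"},
                 "growth": {None: "growth", "geneX": "no growth"}},
    "medium B": {"compounds": {"glycerol", "ammonium", "phosphate"},
                 "growth": {None: "growth", "geneY": "no growth"}},
    "medium C": {"compounds": {"tryptone", "yeast extract", "NaCl"},
                 "growth": {None: "growth"}},
}


def search_media(media, present=(), match="any", absent=(),
                 growth_levels=(), knockout=None):
    """Return names of media matching every supplied criterion."""
    present, absent = set(present), set(absent)
    if len(present) > MAX_COMPOUNDS or len(absent) > MAX_COMPOUNDS:
        raise ValueError("at most four compounds per field")
    hits = []
    for name, rec in media.items():
        comps = rec["compounds"]
        if present:
            # "any": at least one listed compound; "all": every one of them.
            check = any if match == "any" else all
            if not check(c in comps for c in present):
                continue
        # The medium must contain none of the excluded compounds.
        if absent & comps:
            continue
        if growth_levels:
            # knockout=None selects wild-type growth; otherwise that knockout.
            if rec["growth"].get(knockout) not in growth_levels:
                continue
        hits.append(name)
    return hits


print(search_media(MEDIA, present=["D-glucose", "glycerol"], match="any"))
```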

3.2.6  Search Menu → Advanced Search

The Advanced Search tool facilitates generation of queries that are more complex than those supported by the object search tools described above. Using the Advanced Search tool, you can write queries that combine data from multiple organisms or multiple types of objects, and you can search fields that are not supported by the individual object search pages. Detailed instructions for using the Advanced Search tool to construct complex queries are available here.

3.3  Search Menu → BLAST search

This facility (not available for MetaCyc) allows you to perform sequence-similarity searches using the BLAST program to compare your protein or nucleic acid sequence against the complete genome of the selected organism database.

3.4  Search Menu → Google This Site

The Search Menu → Google This Site command uses Google to perform a full text search over this entire Web site. Searches will not be restricted to the selected database, and can locate text strings found in page comments, help pages, and other page content not queryable by other means. Submitting this form will direct the user outside this Web site to a page generated by Google. A Google full text search is also offered as an option when a Quick Search fails to return any result (or does not return the desired result).

3.5  Search Menu → Search Full-text Articles

Textpresso is a package for indexing and searching a corpus of biological literature. Textpresso searches of a large Escherichia coli literature corpus are available only at the BioCyc Web site, and only when EcoCyc is the selected database.

4  Ontology Searches

An ontology is a carefully constructed vocabulary of terms, often called a controlled vocabulary. The terms are organized into a classification hierarchy (also called a taxonomy). Ontologies can be used to browse and search for objects by drilling down from more general categories to more specific ones. Each Pathway/Genome Database contains several ontologies. Those that can be searched are available from the Ontologies sub-menu in the Search menu. These ontologies can also be accessed from the object search page for their particular object type. The browseable ontologies are:

  • Genome → Browse Gene Ontology
    Not all databases contain Gene Ontology (GO) annotations, but for those that do, GO can be browsed to see which gene products are assigned to which GO terms. Each database only contains those terms to which one or more gene products are actually assigned, so a term may be missing from the browseable ontology even though it is a valid GO term. GO can also be browsed from the Search Menu → Genes/Proteins/RNAs page.

  • Metabolism → Browse Pathway Ontology
    The Pathway Tools pathway ontology classifies pathways into groups based on their biological functions, and based on the classes of metabolites that they produce and/or consume. It is also accessible from the Search Menu → Pathways page.

  • Metabolism → Browse Enzyme Commission Ontology
    Enzyme Commission numbers (EC numbers) form a classification scheme for enzymes, based on the chemical reactions they catalyze. Pathway/Genome Databases use EC numbers to organize enzyme-catalyzed reactions (rather than the enzymes themselves) based on type of transformation and class of substrates. The EC ontology can also be browsed from the Search Menu → Reactions page (as a child of Chemical-Reactions). Both Search Menu → Reactions and Search Menu → Genes/Proteins/RNAs pages allow searching by EC number.

  • Metabolism → Browse Compound Ontology
    The Pathway Tools compound ontology describes small molecules, that is, chemical compounds that are not macromolecules. It is also accessible from the Search Menu → Compounds page.
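To illustrate drilling down through a classification hierarchy, the sketch below walks a small hypothetical class tree and prints each class with a count in parentheses, similar to the browseable displays described above. Whether the counts shown on the Web site include instances of subclasses is an assumption; the sketch counts distinct instances in a class and all of its descendants, using sets so that an instance assigned to two subclasses is counted only once. The class names and member identifiers are made up.

```python
# Hypothetical class hierarchy (child -> parents) and direct class members.
PARENTS = {
    "Biosynthesis": ["Pathways"],
    "Degradation": ["Pathways"],
    "Amino-Acid-Biosynthesis": ["Biosynthesis"],
    "Cofactor-Biosynthesis": ["Biosynthesis"],
}
MEMBERS = {
    "Amino-Acid-Biosynthesis": {"PWY-1", "PWY-2"},
    "Cofactor-Biosynthesis": {"PWY-3"},
    "Degradation": {"PWY-4"},
}


def children_map(parents):
    """Invert child -> parents into parent -> children."""
    kids = {}
    for child, ps in parents.items():
        for p in ps:
            kids.setdefault(p, []).append(child)
    return kids


def members_under(cls, kids, members):
    """Distinct instances in cls or any of its descendant classes."""
    found = set(members.get(cls, ()))
    for child in kids.get(cls, ()):
        found |= members_under(child, kids, members)
    return found


def print_tree(cls, kids, members, depth=0):
    # Class name followed by its instance count in parentheses.
    print("  " * depth + f"{cls} ({len(members_under(cls, kids, members))})")
    for child in sorted(kids.get(cls, ())):
        print_tree(child, kids, members, depth + 1)


KIDS = children_map(PARENTS)
print_tree("Pathways", KIDS, MEMBERS)
```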

5  Web Accounts

Pathway Tools Web accounts give users the ability to customize their experience when accessing PGDBs via the Web, and to store SmartTables of objects in their account.

Web site accounts provide several benefits. Through your account you can:

  • Define SmartTables of genes, pathways, metabolites, and more for analysis and to share with colleagues

  • Customize the appearance of pages on this Web site

  • Store organism sets for comparative operations

  • Receive important email updates about this Web site

To create an account, click “Create New Account” at the top right of most Web pages. (If those words are missing it probably means that Web Accounts are not enabled for this Pathway Tools Web site. The Pathway Tools User Guide describes how to enable and configure Web Accounts for a Pathway Tools Web site.)

6  Genome Browser

The genome browser can be used to examine one replicon (chromosome or plasmid) at a time. Its tracks capability can be used to visualize high-throughput datasets in a genome context.

The genome browser can be invoked by

  • Selecting Genome → Genome Browser from the main menu

  • Clicking on a replicon listed in the organism summary page (that page can be displayed by selecting Analysis → Summary Statistics)

  • Clicking on the “Genome Browser” button in gene pages, on the Map Position line

At the top of the genome-browser page, the full length of the chromosome is shown at low resolution. A region of the chromosome can be selected for display at much higher magnification in the lower part of the screen. The selected region will be drawn using as many lines as will comfortably fit on the Web browser page. The full chromosome view at the very top indicates the magnified region by means of a red, rectangular cursor.

Selection of the magnified region can be achieved by the following methods:

  • Clicking on a vertical tick mark within the full chromosome line at the top will show the immediate neighborhood of that position. The tick marks in the magnified region can also be clicked on, to recenter the region around the selected tick mark quickly.

  • Start and end base-pair positions can be entered in the corresponding text entry boxes; clicking the Go button displays that region.

  • The region around a gene can be shown by entering the gene name in the corresponding text entry box and clicking on the Go button. The selected gene will be visually highlighted.

  • The panel of navigation arrows to the left of the legend can be used for moving to a nearby region. The panel allows lateral translation to the left or right, and also serves to zoom in or out.
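The window arithmetic behind these navigation controls can be sketched as follows, assuming 1-based inclusive coordinates. The step sizes (half-window pans, 2x zoom) and the default width used when centering on a tick mark are hypothetical; the genome browser's actual step sizes are not stated in this guide.

```python
def clamp_window(start, end, replicon_len):
    """Keep a 1-based inclusive window inside the replicon, preserving width."""
    width = min(end - start + 1, replicon_len)
    start = max(1, min(start, replicon_len - width + 1))
    return start, start + width - 1


def pan(start, end, replicon_len, fraction):
    """Shift by a fraction of the window width (negative = left)."""
    shift = int((end - start + 1) * fraction)
    return clamp_window(start + shift, end + shift, replicon_len)


def zoom(start, end, replicon_len, factor):
    """Scale the width about the window center (factor < 1 zooms in)."""
    center = (start + end) / 2
    new_width = max(1, round((end - start + 1) * factor))
    new_start = round(center - (new_width - 1) / 2)
    return clamp_window(new_start, new_start + new_width - 1, replicon_len)


def around(position, replicon_len, width=10_000):
    """Window centered on a clicked tick mark (default width is hypothetical)."""
    start = position - width // 2
    return clamp_window(start, start + width - 1, replicon_len)


print(zoom(1001, 2000, 100_000, 0.5))
```

Clamping after every step keeps the window inside the replicon, so panning past either end stops at the boundary instead of producing negative positions.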

The magnified section indicates the transcription direction of genes by rectangular blocks with an arrow at one end, pointing from the 5’ to the 3’ end. ORFs for actual or inferred proteins have symmetrical arrowheads (with the arrow apex in the center), whereas RNA genes have an asymmetrical arrowhead (with the apex at the top edge). Phantom- and pseudo-genes are crossed out with a big, diagonal X. When a gene wraps across more than one line, a zigzag at the end of the line indicates that the gene continues on the next line. Clicking on a gene brings up the corresponding gene description page.

Gene arrows filled with solid colors have transcription unit (operon) information available. All the adjacent genes that are part of a given operon are assigned the same color. Genes that have not been assigned to any transcription unit are not colored. Additionally, transcription units are indicated by a gray background area behind the genes, spanning the entire region of the operon.

Moving the mouse-cursor over the genes reveals their product name and the length in base pairs of the intergenic region between the chosen gene and its neighboring genes to the left and right. If the number of base pairs carries a minus sign, the genes overlap by that many bases. As an example:

  Gene: xdhB

  Product: putative xanthine dehydrogenase subunit, FAD-binding domain

  Intergenic distances (bp): xdhA< +11 xdhB -3 >xdhC

This means that there are 11 bp to the left of xdhB before xdhA is reached, but to the right, xdhC overlaps with xdhB by 3 bp.
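The distances in this example can be reproduced with a short calculation, assuming 1-based inclusive gene coordinates. The coordinates below are hypothetical values chosen to reproduce the xdhA/xdhB/xdhC example above, not the real positions of those genes, and `neighbor_report` is an illustrative helper.

```python
def intergenic_distance(left_end, right_start):
    """Bases between two genes (1-based, inclusive coordinates).

    A negative result means the genes overlap by that many bases."""
    return right_start - left_end - 1


def neighbor_report(genes, name):
    """genes: list of (name, start, end) sorted by start position."""
    i = [g[0] for g in genes].index(name)
    _, start, end = genes[i]
    parts = []
    if i > 0:
        left_name, _, left_end = genes[i - 1]
        parts.append(f"{left_name}< {intergenic_distance(left_end, start):+d}")
    parts.append(name)
    if i + 1 < len(genes):
        right_name, right_start, _ = genes[i + 1]
        parts.append(f"{intergenic_distance(end, right_start):+d} >{right_name}")
    return " ".join(parts)


# Hypothetical coordinates: xdhC starts 3 bp before xdhB ends.
GENES = [("xdhA", 1000, 1999), ("xdhB", 2011, 3000), ("xdhC", 2998, 3500)]
print(neighbor_report(GENES, "xdhB"))  # xdhA< +11 xdhB -3 >xdhC
```

With xdhB ending at 3000 and xdhC starting at 2998, bases 2998 to 3000 are shared, which is the 3 bp overlap reported as -3.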

If the overlap between adjacent genes is more than a small amount, the shorter gene is drawn above the longer gene to avoid visual clashes.

When zooming in to a great level of detail, transcription start sites and terminators are drawn. Transcription start sites are indicated by small arrows that point toward the 3’ end of the transcript. Moving the mouse-cursor over a transcription start site reveals the operon it is part of. The transcription factors controlling the operon are also shown, with a plus sign meaning activation and a minus sign meaning inhibition. Clicking on a transcription start site brings up the corresponding transcription unit description page.

6.1  Displaying External Tracks on the Genome Browser

External datasets can be shown alongside the display of a replicon region, in the form of additional tracks uploaded by the user. The supported track file format is GFF version 2. A short description of this format can be found on the help page, reached by clicking on the green icon containing a question mark, on the far right side of the genome browser’s navigational controls.

The GFF file allows definition of segments on the chromosome that are denoted by a start and stop base-pair position. In an attribute field of the file, a name can be assigned to the segment, and in a score field, a numerical value (such as an expression value) can be supplied. This allows a broad range of different data types to be shown in the genome browser, aligned with the genes and transcription units that a PGDB already describes. This could include alternate gene predictions, or the results of expression experiments. Each specified segment can state a source and feature value, allowing different segment types to be supplied in one file. The external track mode of the genome browser displays each distinct source/feature combination as a separate group. If segments within a group overlap in their base-pair positions, they are displayed on separate lines to avoid visual clashes.
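A minimal GFF version 2 reader illustrates the file structure described above: eight required tab-separated fields (seqname, source, feature, start, end, score, strand, frame) followed by an optional attribute field. The sketch groups segments by their (source, feature) pair, the way the external track mode groups them, and keeps `##` header lines. How Pathway Tools extracts the segment name from the attribute field is not specified here; taking the value of the first tag-value pair is an assumption, and the sample lines are made up.

```python
from collections import defaultdict


def parse_gff2(text):
    """Return (headers, tracks): '##' header lines, and segments grouped by
    (source, feature)."""
    headers, tracks = [], defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith("##"):
            headers.append(line[2:].strip())
            continue
        if line.startswith("#"):
            continue  # ordinary comment line
        cols = line.split("\t")
        if len(cols) < 8:
            raise ValueError(f"expected at least 8 tab-separated fields: {line!r}")
        seqname, source, feature, start, end, score, strand, _frame = cols[:8]
        attrs = cols[8] if len(cols) > 8 else ""
        # Assumption: the segment name is the value of the first tag-value pair.
        first = attrs.split(";")[0].strip()
        name = first.split(None, 1)[1].strip('"') if " " in first else None
        tracks[(source, feature)].append({
            "seqname": seqname,
            "start": int(start),
            "end": int(end),
            "score": None if score == "." else float(score),  # "." = no score
            "strand": strand,
            "name": name,
        })
    return headers, dict(tracks)


SAMPLE = "\n".join([
    "##gff-version 2",
    "##color green",
    'chr1\tarray\texpression\t100\t250\t2.5\t+\t.\tProbe "p1"',
    'chr1\tarray\texpression\t200\t400\t-1.0\t-\t.\tProbe "p2"',
    'chr1\tpredictor\tCDS\t500\t900\t.\t+\t0\tGene "orfX"',
])
headers, tracks = parse_gff2(SAMPLE)
for (source, feature), segs in tracks.items():
    print(source, feature, [(s["start"], s["end"], s["score"]) for s in segs])
```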

To view data from such a GFF file in an external track, first open the genome browser. Next click the “Show Tracks” button to the right of the gene name dialog box. This will enter the external tracks mode, in which the magnified genome region will no longer wrap to fill the screen, instead making room for external tracks that will be displayed underneath. Vertical hair lines will be shown for easier visual alignment of features in external tracks with the magnified region.

Next, add track data from an external data file using the controls at the bottom of the page. The data file can be specified through a Web site URL (click the “Add Track” button to the right of “Load track data from GFF file via URL”), or from a file on your computer’s hard disk (click “Browse...” to find the file, then click its associated “Add Track” button). Depending on the size of your GFF file, the upload can take several minutes. During this time the page will not respond, and you should not click any other controls. When the file has been uploaded and parsed successfully, the page will refresh.

The external tracks display will show the feature name on the left, the sequence name if one is included, and the appropriate color to match the feature’s score, if a score value was found in the GFF file. Following the display of a track, you can continue to browse the genome normally, using the standard Left, Right, Zoom Out, and Zoom In controls, and the Gene Name box.

You can display data from more than one GFF file at the same time. Load each file individually using the procedure described above. Tracks from the first file loaded will appear just below the gene line. Tracks from the second file loaded will appear below those from the first, and so on. The order of the tracks can be changed by left-clicking on the underlined track titles on the left side, which name the feature type. The pop-up menu allows the chosen track to be moved up or down by one step relative to the current ordering.

The horizontal bars represent the feature data found in the GFF track file. These are arranged in rows distributed vertically, so as to help prevent overlapping features from running into each other and being indistinguishable. The number of distributed rows may vary with the zoom scale, so that features can fit; there is no other meaning to the number of lines. The length of each horizontal bar shows the extent of each individual feature reading. The color is drawn from a spectrum that shows the magnitude of a score. In order to get a better feel for this magnitude, a graph of the same track feature data is also plotted above the horizontal bars. In the default graph mode, each feature score is represented by a horizontal line spanning the feature’s start and end base-pair coordinates. The magnitude of the score is represented as the height on the graph. This offers an intuitive method of viewing trends and anomalies in the data at a glance.
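One common way to keep overlapping features from running into each other is greedy row assignment, sketched below together with a linear score-to-color ramp. Both are assumptions for illustration: the genome browser's actual layout algorithm and color spectrum are not described in this guide, and the blue-to-red palette is hypothetical.

```python
def pack_rows(features):
    """Greedy row assignment for (start, end) pairs, 1-based inclusive.

    Returns a row index per feature; features in one row never overlap."""
    order = sorted(range(len(features)), key=lambda i: features[i][0])
    row_ends = []              # last occupied base in each row
    rows = [0] * len(features)
    for i in order:
        start, end = features[i]
        for r, last in enumerate(row_ends):
            if start > last:   # fits after the last feature in this row
                row_ends[r] = end
                rows[i] = r
                break
        else:                  # no existing row has room: open a new one
            row_ends.append(end)
            rows[i] = len(row_ends) - 1
    return rows


def score_to_rgb(score, lo, hi):
    """Linear blue (low) to red (high) ramp; scores are clamped to [lo, hi]."""
    t = 0.0 if hi == lo else (min(max(score, lo), hi) - lo) / (hi - lo)
    return (round(255 * t), 0, round(255 * (1 - t)))


print(pack_rows([(100, 250), (200, 400), (300, 350), (500, 900)]))
```

As the guide notes, the number of rows only reflects how many features overlap at the current scale; it carries no other meaning.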

In the bar graph mode, the rectangular area between the feature’s horizontal line and the baseline (corresponding to a score of zero) is filled by a solid color. This is useful for features that tend to be very short, which may otherwise be hard to see.

For each of multiple tracks viewed simultaneously, you can show or hide the horizontal bars, the graph plot, or both. Use the pull-down selector next to the listing of the track at the bottom of the page, which switches between “Show both graph and horizontal”, “Show both bar graph and horizontal”, “Show only graph”, “Show only bar graph”, “Show only horizontal”, and “Both invisible”. This control allows you to stack graphs from different tracks close to each other, so that you can compare them and see fine differences between them.

It is also possible to shift the plotted range of the graph for each track file viewed. Beside the listing of the track there is also a line reading “graph Y range from [ ] to [ ]” with a “Set” button. Fill in the desired lower and upper Y coordinates of the range and press the “Set” button; that graph will be redisplayed with the new setting. Entries may be integers or decimals. The lower bound must be less than the upper bound. Score values that fall outside the range are displayed as a horizontal line just outside the graph range, to indicate the over- or underflow condition visually.
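The range handling described above can be sketched as a mapping from a score to a vertical pixel position. The plot height, the offset used to flag out-of-range values, and the function name are hypothetical.

```python
def graph_y(score, y_min, y_max, height=100, margin=3):
    """Map a score to a pixel row (0 = bottom of the plot).

    Scores outside [y_min, y_max] are pinned just outside the plot area so
    the over- or underflow is visible."""
    if not y_min < y_max:
        raise ValueError("lower bound must be less than upper bound")
    if score > y_max:
        return height + margin, "overflow"
    if score < y_min:
        return -margin, "underflow"
    return round((score - y_min) / (y_max - y_min) * height), "in range"


print(graph_y(5, 0, 10))
```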

In graph mode, the entire track is assigned a color from a predefined set of colors. However, it is possible for the user to choose the color of a track, by adding a new header comment line close to the top of the GFF file, before uploading the file. An example line looks like this:

##color green

Several common color names can be substituted for "green".
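If you prepare GFF files with a script, the color header can be added or replaced programmatically. This sketch assumes the header should follow a leading `##gff-version` line when one is present; the function name is hypothetical, and it does not check whether the color name is one the genome browser accepts.

```python
def set_track_color(gff_text, color):
    """Return GFF text with exactly one '##color <name>' header near the top."""
    # Drop any existing color header so only one remains.
    lines = [ln for ln in gff_text.splitlines() if not ln.startswith("##color")]
    # Keep a leading ##gff-version line first, if present.
    insert_at = 1 if lines and lines[0].startswith("##gff-version") else 0
    lines.insert(insert_at, f"##color {color}")
    return "\n".join(lines) + "\n"


print(set_track_color("##gff-version 2\nchr1\tsrc\tfeat\t1\t10\t.\t+\t.\n", "red"))
```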

6.2  Comparative Genome Browser

The comparative genome browser can be used to examine several replicons (chromosomes or plasmids) simultaneously, side by side. This view facilitates comparison of related organisms to observe similarities and differences in their gene arrangements. For the alignment to work, ortholog links must exist among genes of the organisms to be compared.

The comparative genome browser is usually entered from a page describing a gene. To invoke it, select Align in Multi-Genome Browser from the operations box on the right side of the page. You will first be asked to specify the organisms whose genome regions you wish to compare. The selected set of organisms is remembered for some time by the Web browser. If you wish to change them, use the command Change organisms/databases for comparison operations.

When the comparative genome browser is invoked from a gene page, that gene and its organism determine the rest of the alignment. In the display, the top-most replicon is the reference, against which the comparisons are made by following the ortholog links for every gene of the top replicon in its visible section. The selected gene that is the focus of the comparison is highlighted on each replicon by a thick outline and a slanted hashed background. These selected genes are lined up at the center position of their lengths. The magnified region can be adjusted by the following methods:

  • An alignment for a new gene can be displayed by entering the gene name in the gene entry box, then clicking the “Go” button.

  • The panel of navigation arrows can be used to translate the view left or right, and to zoom in and out.

Genes with solid colors have links to orthologs. Corresponding orthologs are assigned the same color, out of a set of a dozen colors that will be reused repeatedly. Genes for which no ortholog links were found in the PGDB are not colored. The other display features are the same as described for the regular genome browser.
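The alignment and coloring rules above can be sketched as two small functions: one that shifts each replicon so that the midpoints of the selected genes line up with the reference (top) replicon, and one that gives each ortholog group a color index that wraps around after twelve colors. The data structures are hypothetical, and the order in which the real display hands out colors is an assumption.

```python
PALETTE_SIZE = 12  # "a set of a dozen colors that will be reused"


def alignment_offsets(selected):
    """selected maps replicon -> (start, end) of the focus gene, with the
    reference (top) replicon first. Returns the shift to add to each
    replicon's coordinates so that all gene midpoints line up."""
    mids = {rep: (start + end) / 2 for rep, (start, end) in selected.items()}
    ref = next(iter(mids.values()))
    return {rep: ref - mid for rep, mid in mids.items()}


def ortholog_colors(groups):
    """groups: lists of orthologous genes in display order. Every gene in a
    group gets the same color index; indices wrap after PALETTE_SIZE."""
    return {gene: i % PALETTE_SIZE
            for i, group in enumerate(groups) for gene in group}


print(alignment_offsets({"top": (1000, 2000), "second": (5000, 6000)}))
```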

7  SmartTables

A SmartTable is a collection of Pathway Tools objects, such as genes or pathways, together with associated data. SmartTables allow you to store experimental results (e.g., a set of genes of interest from an experimental study), analyze those results (e.g., perform an enrichment analysis to learn if those genes share common biological processes, or paint those genes into a metabolic map diagram), and share SmartTables with colleagues.

SmartTables can be created from tabular data files, and from query results, and SmartTables can be exported to files. Transformations, enrichment analysis, filtering, and other operations on SmartTables can be performed. Example transformations include:

  • Transform a gene SmartTable to a SmartTable of pathways in which the genes participate

  • Transform a SmartTable of genes to a SmartTable of the promoters, transcription factor binding sites, or transcriptional regulators that control those genes

  • Transform a SmartTable of pathways to a SmartTable of metabolites that are substrates in those pathways

Web SmartTables are stored in a user’s Web account, so to create SmartTables you must have an account and be logged in. Users who are not logged in can view and download SmartTables that others have made public. Each SmartTable has a persistent URL, so SmartTables can be used as a data publishing and sharing platform. SmartTables can be private, public, or shared with a selected group of users.

Firefox is the recommended browser for SmartTables. Other browsers will work but have not been as thoroughly tested with SmartTables, so minor issues may arise. Use of Internet Explorer is discouraged, although for the most part it will also work.

A number of SmartTables operations can also be invoked via web services.

7.1  SmartTable Structure and Display

Some terminology: A SmartTable consists of a set of rows and columns. A cell is the intersection of a row and a column, and can contain one or more values, which may be Pathway Tools objects (such as genes or pathways), numbers, or strings.

A SmartTable is displayed on its own web page (see the figure below). The URL of this page is persistent and may be bookmarked or shared. At the top of this page are some metadata about the SmartTable, such as its title and a textual description (these can both be edited by clicking on them). Information about the SmartTable’s contents and sharing status is also displayed.

In this example, we started with a SmartTable of genes (in the first column after the checkboxes), and added some properties.

Typically the first column of a SmartTable will be a set of related Pathway Tools frames (eg, a set of genes from a search or from an experimental result) and other columns will be properties or other values derived from the first column (eg, the products of the genes in the first column). The blue column headings are clickable and can be used to select individual columns for certain operations. A SmartTable must always have at least one column present in order to be valid.

If a SmartTable has more elements than will fit on a page, paging controls will be displayed above the column headings. All rows can also be displayed on one page.

The checkboxes on the left are used to select subsets of the SmartTable’s rows for deleting or copying to a new SmartTable. Checkboxes work across multiple pages: you can check some rows, navigate to another page and check more, and the rows checked on the first page will still be considered checked. Checking or unchecking the checkbox in the header checks or unchecks all rows in the SmartTable (not just the ones on the current page). The same checkbox behavior applies to lists of SmartTables.

7.2  SmartTable Directory

The SmartTable directory page provides a list of accessible SmartTables. It may be accessed via any of the items under the SmartTables menu. The directory is composed of several tabs:

By default the SmartTable directory is ordered by update time (most recently changed first), but it can be resorted using the sort arrows in column headings.

7.3  Creating a SmartTable

There are a number of different ways to create a SmartTable (note that you must be registered and logged-in before these commands will be visible):

7.3.1  From a Search

The results of web searches (e.g., from the Search → Search compounds page) can be converted to a SmartTable by means of the “Turn into a SmartTable” button.

7.3.2  From Scratch

An empty SmartTable can be created and filled in by hand. To do this:

  1. Go to the SmartTables directory page (SmartTables → My SmartTables)

  2. Select the New → Empty SmartTable action from the operations box on the right. This creates a SmartTable with a single column and no rows.

  3. Add a row by clicking the “Add row” link at the bottom of the display.

  4. The row has an autocompleting text field. Enter an object name (e.g., a gene or metabolite name) and hit Enter.

  5. Repeat steps 3 and 4 for the rest of the SmartTable.

7.3.3  Via Tab-Separated File Import

A SmartTable can be created by importing a text file in tab-separated value format.

  1. Go to the SmartTable directory page.

  2. Select the New → SmartTable from Uploaded File... action from the operations box on the right.

  3. A panel will appear that will prompt for a file to be selected and uploaded.

Unless “Try to make objects” is selected in the upload menu, values in uploaded files are initially just strings. To turn them into recognized database objects (e.g., genes) after importing, select the appropriate column and use the Column → Set Type... action.
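The following sketch shows what a tab-separated file looks like after import and what turning strings into objects means. It assumes a hypothetical name index (`NAME_INDEX`) in place of the database's real name lookup, and the object identifiers are made up. Cells that match a known name become object references and everything else stays a string, either at import time (as with “Try to make objects”) or afterward with a Set Type step.

```python
import csv
import io

# Hypothetical stand-in for the database's name lookup; IDs are made up.
NAME_INDEX = {"trpA": "GENE-1", "trpB": "GENE-2", "pyruvate": "CPD-1"}


def import_tsv(text, try_make_objects=False):
    """Parse tab-separated text into rows of cells (strings by default)."""
    rows = []
    for record in csv.reader(io.StringIO(text), delimiter="\t"):
        if not record:
            continue  # skip blank lines
        cells = []
        for value in record:
            obj = NAME_INDEX.get(value) if try_make_objects else None
            cells.append(("object", obj) if obj else value)
        rows.append(cells)
    return rows


def set_type(rows, col):
    """Resolve strings in one column to objects, after import."""
    out = []
    for row in rows:
        row = list(row)
        if isinstance(row[col], str) and row[col] in NAME_INDEX:
            row[col] = ("object", NAME_INDEX[row[col]])
        out.append(row)
    return out


rows = import_tsv("trpA\t1.5\nfoo\t2\n")
print(set_type(rows, 0))
```

Unmatched values such as "foo" stay strings, which is why checking a column after setting its type is worthwhile.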

7.3.4  Via Replicon Coordinates File Import

A SmartTable can also be created by importing a text file that specifies the coordinates of replicons in a tab-separated file format. The format is as follows:

  • Column 1: replicon name, as listed in the organism summary page; a blank or invalid value defaults to the first replicon stored in the PGDB

  • Column 2: start position

  • Column 3 (optional): end position; defaults to the start position

  • Column 4 (optional): name for the region

Replicons can be specified in the file by either frame name or common name. Nucleotide coordinates for the start and end positions are relative to the specified replicon. If only one of the start and end positions is given, the region is a single nucleotide. Invalid data may produce a row containing “NIL”, and such a row may cause other unexpected results.

The resulting SmartTable will contain one or two columns: the first column holds the specified regions, and the second holds the region names, if any were supplied.
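The parsing rules above can be sketched as follows. The replicon frame and common names are hypothetical, and returning `None` stands in for a “NIL” row.

```python
# Hypothetical replicons: frame name -> common name, in PGDB order.
REPLICONS = {"CHROM-1": "chromosome", "PLASMID-1": "plasmid pX"}


def resolve_replicon(value):
    """Accept a frame name or common name; blank or unknown -> first replicon."""
    for frame, common in REPLICONS.items():
        if value in (frame, common):
            return frame
    return next(iter(REPLICONS))


def parse_region(line):
    """Return (replicon, start, end, name), or None for an invalid row."""
    cols = line.rstrip("\n").split("\t") + [""] * 4  # pad optional columns
    replicon, start, end, name = (c.strip() for c in cols[:4])
    try:
        start = int(start) if start else None
        end = int(end) if end else None
    except ValueError:
        return None
    if start is None and end is None:
        return None
    # A single given position defines a one-nucleotide region.
    if start is None:
        start = end
    if end is None:
        end = start
    return resolve_replicon(replicon), start, end, name or None


print(parse_region("chromosome\t100\t200\tregionA"))
```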

To perform an import via a file of replicon coordinates, do the following:

  1. Go to the SmartTable directory page.

  2. Select the New → SmartTable from Replicon Coordinates... action from the operations box on the right.

  3. A panel will appear that will prompt for a file to be specified and uploaded.

7.3.5  From an Existing SmartTable

There are a number of ways to create new SmartTables from existing SmartTables.

A SmartTable can be copied via the New → Copy of this SmartTable action. Additionally, if the SmartTable can only be viewed but not edited, such as “Special SmartTables”, a message will appear prompting the user to create a writeable copy of the SmartTable.

The contents of a column can be turned into a new SmartTable using the + icon that appears in column headings, or using the New → SmartTable from Column action (these operations are equivalent).

Rows of a SmartTable can be used to create a new SmartTable that shares the same column headings by selecting the desired rows using the checkboxes at the beginning of each row, then using the New → SmartTable from Selected Rows action.

See also the Filtering operation which has the option of creating a new SmartTable based on a filtered subset of rows.

7.4  Manipulating SmartTable Contents

SmartTables can be manipulated in a large number of ways, both at a fine level of granularity (such as editing individual cells), and by applying transformations to an entire SmartTable.

7.4.1  Adding a Property Column

Property columns show attributes (slot values) of an object, such as the molecular weight of a compound or the pI of a protein. The most common situation is to add a property column for the objects listed in the first column of the SmartTable, but the Add Property Column dropdown menu will list available properties to show for the currently selected column. Frequently used properties include Common-Name, Comment, Citations, and Creation-Date. Creating a property column or an enrichment column from another property column may not be supported.

7.4.2  Adding an Empty Column

Columns can be added to a SmartTable from the Add → Column action (which creates an empty editable column), or by using the transform and property selectors (see below).

7.4.3  Editing a Column

Editable columns (which are those that are not defined by a transform or other computation) can be edited by clicking the edit icon in the column header. This changes the cells to editable fields. Clicking the icon a second time will turn off editing for that column.

7.4.4  Adding a Row

A row can be added by means of the link at the bottom of a SmartTable, or using the Add → Row action (they are equivalent). Any editable cells in the new row are displayed in edit mode, so values can be entered.

Additionally, certain object pages, such as those for a gene or protein, have an “Add to SmartTable” button, which places the object in an existing SmartTable.

7.4.5  Deleting Rows

Rows can be deleted by selecting them using the checkboxes on the left of the display, then choosing the Delete → Delete checked rows action.

7.4.6  Moving and Deleting Columns

Columns can be rearranged with the Column → Move ... menu items and deleted with the Column → Delete menu item. These operations apply to the selected column. A column can also be deleted by clicking on the “–” icon in the column header. This icon will not be present if deleting the column is not currently a valid action, such as when the SmartTable has only one column.

7.4.7  Sorting

SmartTables can be resorted on the values of any column by means of the sorting controls (triangles) in column headers.

7.4.8  Filtering

Filtering means selecting a subset of rows from a SmartTable according to some criterion. The available filter options depend on the column type. For example, numeric columns offer range conditions such as greater than, equal to, and less than, while string columns offer various substring conditions. To filter, select the appropriate column and choose the Filter action. A dialog appears that allows for selection based on the filtering criterion.

The filter can either modify the SmartTable in place or create a new SmartTable with a specified name. In either case, if the resulting SmartTable is empty, an error is displayed instead of completing the operation.
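The filtering behavior described above can be modeled in a few lines of Python. This is a minimal sketch, not the SmartTable implementation: the row representation (a list of dicts), the function name `filter_rows`, and the condition names are hypothetical, and only a few of the dialog's conditions are modeled.

```python
import operator
from typing import Any, Callable

# Hypothetical stand-ins for a few Filter dialog conditions.
# operator.contains(a, b) tests "b in a", i.e. a substring test for strings.
CONDITIONS: dict[str, Callable[[Any, Any], bool]] = {
    "greater than": operator.gt,
    "equal to": operator.eq,
    "less than": operator.lt,
    "contains": operator.contains,
}


def filter_rows(rows: list[dict], column: str, condition: str, value: Any) -> list[dict]:
    """Keep the rows whose value in `column` satisfies `condition` against `value`.

    Mirrors the documented rule that an empty result is reported as an error
    rather than producing an empty SmartTable.
    """
    test = CONDITIONS[condition]
    kept = [row for row in rows if test(row[column], value)]
    if not kept:
        raise ValueError("the filter would produce an empty SmartTable")
    return kept


table = [
    {"gene": "trpA", "fold change": 3.2},
    {"gene": "phoA", "fold change": 1.1},
]
print(filter_rows(table, "fold change", "greater than", 2.0))
print(filter_rows(table, "gene", "contains", "pho"))
```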

7.4.9  Set Type

The values in cells have a type, which may be either a Pathway Tools object (e.g., a gene) or a string or number. Generally values in a single column will all be of the same type, but this is not required. The type can be controlled by means of the Column → Set Type... action. In general this is used after importing data from a file, to turn string values into Pathway Tools objects.

7.4.10  Set Operations

Under the Set Operations... action, set operations such as union, intersection, and difference can be performed between the current SmartTable and a second SmartTable. The result can either be placed in a new SmartTable or replace the contents of the current SmartTable. For example, the intersection of two SmartTables contains the items common to both.
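The three set operations can be sketched with Python sets, treating each SmartTable as a list of object identifiers. The function name `set_operation`, the result ordering, and the direction of "difference" (current table minus second table) are assumptions of this sketch.

```python
def set_operation(first: list[str], second: list[str], operation: str) -> list[str]:
    """Combine the objects of two SmartTables with a set operation.

    Objects are compared by their identifiers. The result lists members in the
    order they first appear (first table, then second) purely for readability.
    """
    a, b = set(first), set(second)
    if operation == "union":
        members = a | b
    elif operation == "intersection":
        members = a & b
    elif operation == "difference":
        members = a - b  # assumed direction: in the first table but not the second
    else:
        raise ValueError(f"unknown set operation: {operation}")
    ordered = dict.fromkeys(first + second)  # de-duplicated, first-seen order
    return [item for item in ordered if item in members]


genes_a = ["trpA", "trpB", "phoA"]
genes_b = ["phoA", "lacZ"]
print(set_operation(genes_a, genes_b, "intersection"))
```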

7.5  SmartTable Transformations

Transformations apply a function (a computational procedure) to every cell of a selected column to generate a new column in that SmartTable. To perform a transformation, select a column, then click the Transformations drop-down menu. Different transformations are available depending on the type of the selected column. The difference between properties and transformations is that properties of an object are stored in the database containing that object, whereas transformations are computed by the software.

For example, when a column of compounds is selected, the transformations available in the menu are those that apply to chemical compounds. The “Pathways of compound” transformation generates a new column in which each cell contains the set of metabolic pathways in which the compound in the corresponding starting cell occurs. The available transformations are generally grouped by related function, such as the transformations that generate different sets of metabolic pathways related to a compound.

Suppose we want to create a new SmartTable containing all the pathways in which the metabolites of the preceding SmartTable occur, that is, a SmartTable built from the result of the preceding transformation. We can do so by clicking the “+” at the top of the column containing the pathways. This operation creates a new SmartTable containing all pathways in that column, with duplicates removed.
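The transformation followed by the “+” operation amounts to mapping a function over a column and then flattening the resulting sets with duplicates removed. In this sketch the lookup table `PATHWAYS_OF_COMPOUND` and both function names are hypothetical stand-ins for data that actually comes from the PGDB.

```python
from typing import Callable, Iterable

# Hypothetical lookup standing in for the "Pathways of compound" transformation;
# real results come from the PGDB.
PATHWAYS_OF_COMPOUND = {
    "L-tryptophan": ["tryptophan biosynthesis", "tryptophan degradation"],
    "L-serine": ["serine biosynthesis", "tryptophan biosynthesis"],
}


def transform(column: list[str], fn: Callable[[str], list[str]]) -> list[list[str]]:
    """Apply a transformation to every cell of a column, yielding a new column."""
    return [fn(cell) for cell in column]


def table_from_column(column: Iterable[Iterable[str]]) -> list[str]:
    """Collect every item from a column of sets, dropping duplicates (first occurrence kept)."""
    return list(dict.fromkeys(item for cell in column for item in cell))


compounds = ["L-tryptophan", "L-serine"]
pathway_column = transform(compounds, lambda c: PATHWAYS_OF_COMPOUND.get(c, []))
print(table_from_column(pathway_column))
```

"tryptophan biosynthesis" appears in both cells but only once in the new table.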

The easiest way to see what transformations are available for a column type in question is to view a SmartTable containing that type of column and examine the transformations drop-down menu.

Other example transformations include computing an amino-acid sequence from a nucleotide sequence, transforming a gene to its gene product, and transforming a gene to a list of the genes that regulate that gene.

7.5.1  Enrichment Analysis of SmartTables

Enrichment analysis is a computational technique for identifying known categories of objects (e.g., pathways) that are statistically over-represented in a set of objects (e.g., genes that are significantly up-regulated in an expression experiment). For example, enrichment analysis allows us to ask whether a set of genes contains more genes regulated by a given transcriptional regulator than one would expect to occur by chance, or more metabolites in a given metabolic pathway than one would expect to occur by chance. Please see the Pathway Tools Users Manual for more information on enrichment, including a description of the parameters available on the web.

Enrichment analysis can be invoked on a column of objects in a SmartTable by:

  1. Selecting the column to be operated on (such as a column of genes or a column of compounds)

  2. Choosing an item from the Enrichments selector and clicking the button

  3. Choosing parameters from the dialog

This operation always creates a new SmartTable, which contains three columns: the enriched objects, the p-value, and the matched objects from the original SmartTable. The new SmartTable will be sorted by p-value, lowest (most significant matches) first.
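To make the idea of over-representation concrete, the sketch below scores each category with a one-sided hypergeometric (Fisher exact) test and returns rows shaped like the result SmartTable: enriched category, p-value, and matched objects, lowest p-value first. The choice of statistic here is an assumption, and no multiple-testing correction is applied; the parameters the web site actually offers are described in the Pathway Tools manual. The gene and category names are hypothetical.

```python
from math import comb


def upper_tail_p(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) for hypergeometric X: n objects drawn from N, K of them in the category."""
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1)) / comb(N, n)


def enrichment(sample: set[str], categories: dict[str, set[str]],
               population: set[str]) -> list[tuple[str, float, list[str]]]:
    """Return (category, p-value, matched objects) rows, lowest p-value first.

    No multiple-testing correction is applied in this sketch.
    """
    sample = sample & population
    rows = []
    for name, members in categories.items():
        members = members & population
        matched = sorted(sample & members)
        if matched:
            p = upper_tail_p(len(matched), len(sample), len(members), len(population))
            rows.append((name, p, matched))
    rows.sort(key=lambda row: row[1])
    return rows


# Hypothetical organism with 10 genes and two categories (e.g. pathways).
genome = {f"g{i}" for i in range(1, 11)}
pathways = {"A": {"g1", "g2", "g3"}, "B": {"g4", "g5", "g6", "g7", "g8", "g9"}}
for row in enrichment({"g1", "g2", "g4"}, pathways, genome):
    print(row)
```

Category A (2 of its 3 genes in a 3-gene sample) gets a far smaller p-value than category B, which covers 6 of the 10 genes and would be hit by chance most of the time.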

7.6  Exporting and Sharing a SmartTable

Once a SmartTable is defined, there are several things that can be done with it besides browsing it on the web: it can be exported in a variety of ways or shared with others.

7.6.1  Export to a Spreadsheet File

SmartTables can be exported to tab-separated value format files using the SmartTables → Export → to Spreadsheet File ... menu command. When selected, the option is given whether to export the frame names of objects stored in the SmartTable or to use the common name of the objects. Keep in mind that, generally, it’s easier to re-import data by using frame names in the generated file, but the file will also be more difficult to read.
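The trade-off between frame names and common names can be seen in a small sketch of tab-separated export using Python's standard `csv` module. The function `export_tsv`, the frame IDs, and the name mapping are hypothetical; the actual exporter runs on the server.

```python
import csv
import io


def export_tsv(header: list[str], rows: list[list[str]],
               common_names: dict[str, str], use_frame_names: bool = True) -> str:
    """Render a SmartTable as tab-separated text.

    Cells whose value is a key of `common_names` are treated as objects
    (frame IDs); when use_frame_names is False they are written as their
    common names instead. Other cells are written unchanged.
    """
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    for row in rows:
        if use_frame_names:
            writer.writerow(row)
        else:
            writer.writerow([common_names.get(cell, cell) for cell in row])
    return out.getvalue()


# Hypothetical frame IDs and names, for illustration only.
names = {"G-0001": "trpA", "G-0002": "phoA"}
rows = [["G-0001", "3.2"], ["G-0002", "1.1"]]
print(export_tsv(["Gene", "Value"], rows, names, use_frame_names=False))
```

Frame IDs map back to database objects unambiguously on re-import, whereas common names are easier to read but may need to be matched against synonyms.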

7.6.2  Export to a FASTA File

SmartTables with a gene column can be exported to FASTA format files using the Export → to FASTA File... action. The sequences used will be the currently selected column and the names used will be a string representation of the values in the first column.
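The mapping from table columns to FASTA records can be sketched as follows: the header comes from the first column and the sequence from the selected column. The function `to_fasta`, the 60-character line width, and the sequences are illustrative assumptions, not the exporter's actual output settings.

```python
def to_fasta(rows: list[list[str]], sequence_column: int, width: int = 60) -> str:
    """Render rows as FASTA records.

    The header of each record is the string form of the row's first column and
    the sequence comes from `sequence_column`, wrapped at `width` characters.
    """
    lines = []
    for row in rows:
        lines.append(f">{row[0]}")
        sequence = row[sequence_column]
        lines.extend(sequence[i:i + width] for i in range(0, len(sequence), width))
    return "\n".join(lines) + "\n"


# Made-up sequence fragments, for illustration only.
rows = [["geneX", "ATGCCGTTAGCA"], ["geneY", "ATGAAA"]]
print(to_fasta(rows, sequence_column=1, width=5), end="")
```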

7.6.3  Export to PortEco

A column of genes selected in a SmartTable can be exported to PortEco using the Export → Export Genes to PortEco Cluster My Genes command. From there, the genes are given as input to PortEco’s gene expression clustering tool. Visit PortEco’s Gene Expression web site for additional information.

7.6.4  Paint Data (on Cellular Overview)

Objects of the appropriate types (any types that have frame representations in the current PGDB, such as compounds, reactions, or genes) can be displayed over the Cellular Overview using the Paint Data → On Cellular Overview command. Be sure to select the appropriate column first.

If the first column of the SmartTable contains objects (e.g., genes, compounds), and one or more other columns contain numerical data values, then the SmartTable can be displayed on the Cellular Overview Omics Viewer using the command Paint Data → On Cellular Overview Omics Viewer. You will be asked to select the data columns you wish to display, and to specify what kinds of values they are (e.g., absolute or relative, log or linear). Another way to paint data from a SmartTable on the Cellular or Regulatory Overview is to navigate to the desired overview and use the command Overlay Experimental Data → From SmartTable.

7.6.5  Sharing a SmartTable

By default, SmartTables are readable and writable only by their creator. Access can be granted to other users by means of the Sharing dialog, available via the Sharing... command.

Access by the general public is controlled by the first two checkboxes. “Public?” means that anyone can view the contents of the SmartTable; “Public and writable?” means that anyone can view and edit the contents of the SmartTable (editing is restricted to logged-in users).

Access can also be controlled on a per-user level using the “Share with users” boxes, which accept email addresses of registered Pathway Tools users.

7.7  Browsing SmartTables and Users

7.7.1  User Pages and Directory

As part of SmartTables, an enhanced public user page has been created, which can be accessed by clicking on any user name in the SmartTable directory (try the Public SmartTables tab). A user page displays the user’s name, an optional user-settable graphic picture, and a list of the user’s public SmartTables. There is also a user directory available.

7.7.2  Browsing a SmartTable

Under the Browse this SmartTable command, the current SmartTable can be browsed one row at a time. Depending on the type of data in the SmartTable, various text and image elements will be displayed in a single page for a row. In the upper-left corner of the page, a grey box displays the name of the SmartTable being browsed, as well as a Next link to move to the next row’s page. The Clear link can be used to stop browsing and stay on the current page.

8  Omics Data Analysis

Pathway Tools-based Web sites offer multiple tools for analysis of gene expression, metabolomics, and other large-scale datasets.

The omics data file format is described in Section 9.3.3; example files are listed in Section 9.3.1.

A number of these capabilities are also available as web services.

Multi-Omics Analysis

The following tools can be used for analysis of combined datasets from multiple high-throughput technologies.

  • Paint multi-omics data onto metabolic map — Colors reaction arrows in the metabolic-map diagram according to gene-expression and/or protein-expression levels, and colors metabolite nodes according to metabolomics data. Data can be uploaded from a file or imported from a recently visited SmartTable. The uploaded data can contain a mixture of rows describing genes, proteins, and metabolites.
    [documentation]
    [To start: Metabolism → Cellular Overview then Right Operations Menu → Overlay Experimental Data]
    When uploading a file that contains multiple types of data, be sure to specify that the items in the first column can be any of genes, proteins, compounds, etc.

  • Paint multi-omics data onto pathway diagram — Allows visualization of large-scale datasets on individual pathways.
    [file format documentation]
    [To start: Visit a pathway page, then select Right Operations Menu → Customize or Overlay Omics Data on Pathway Diagram]
    In the pop-up window, in addition to customizing which pathway elements appear in the diagram, you may specify a file of Omics data to be displayed. If the file contains multiple types of data, be sure to specify that the items in the first column can be any of genes, proteins, compounds, etc.

Gene Expression and Proteomics Analysis

Many of the following tools can accept proteomics as well as gene-expression data.

  • Paint gene-expression data onto metabolic map — Colors reaction arrows in the metabolic-map diagram with colors indicating gene-expression and/or protein-expression levels. Data can be uploaded from a file, imported from PortEco or GEO, or imported from a recently visited SmartTable.
    [documentation]
    [To start: Metabolism → Cellular Overview then Right Operations Menu → Overlay Experimental Data]

  • Table of Highly Over/Under-Expressed Pathways — When painting a dataset onto the metabolic map, the upload dialogue offers the option of generating a table of those pathways with one or more genes whose data value exceeds a user-specified threshold. If multiple data columns are specified, then the table will include all pathways that exceed the threshold for at least one of the data columns (the table will include a separate column for each data column).
    [To start: Use previous tool but for the Show data: field, select either As a table of pathway diagrams or Both on this diagram and as a table in a new tab and specify the desired threshold.]

  • Paint gene-expression data onto single pathway diagram.
    [file format documentation]
    [To start: Visit a pathway page, then select Right Operations Menu → Customize or Overlay Omics Data on Pathway Diagram]
    In the pop-up window, in addition to customizing which pathway elements appear in the diagram, you may specify a file of Omics data to be displayed.

  • Paint gene-expression data onto regulatory map — Colors genes in the regulatory overview diagram with colors indicating gene-expression levels. Data can be uploaded from a file, imported from PortEco or GEO, or imported from a recently visited SmartTable.
    [documentation]
    [To start: Genome → Regulatory Overview then Right Operations Menu → Overlay Experimental Data]

  • Paint gene-expression data onto genome map diagram — Colors genes in the genome map with colors indicating gene-expression levels. This tool is not yet available for Web sites, but does function in the desktop version of Pathway Tools.

  • Enrichment Analysis — Given a SmartTable of genes, determines whether that gene set is statistically over-represented for genes within certain metabolic pathways, or for genes in certain Gene Ontology categories, or for genes that are regulated by shared regulators.
    [documentation]
    [To start: Visit a SmartTable page]

  • SmartTable Transformations — Given a SmartTable of genes or proteins (e.g., the highly expressed genes from an expression dataset), transform those genes to the set of pathways containing the genes, or to the set of regulators that regulate those genes.
    [documentation]
    [To start: Visit a SmartTable page]

  • Genome Browser Tracks — Superimpose positional datasets such as ChIP-chip or RNA-seq data on genome regions for visual interpretation.
    [documentation]
    [To start: Genome → Genome Browser then, if not following this direct link, click Show Tracks button.]
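The rule behind the Table of Highly Over/Under-Expressed Pathways in the list above (keep a pathway if at least one of its genes exceeds the threshold in at least one data column, with one result column per data column) can be sketched as follows. The function name, the pathway and gene data, and the use of a simple true/false flag per column are hypothetical; on the web site each cell shows a pathway diagram rather than a flag. The sketch assumes every gene has the same number of data columns.

```python
def pathways_over_threshold(pathway_genes: dict[str, set[str]],
                            gene_values: dict[str, list[float]],
                            threshold: float) -> dict[str, list[bool]]:
    """Flag, per data column, whether any gene in each pathway exceeds the threshold.

    A pathway is kept only if it exceeds the threshold in at least one column.
    Genes without data are ignored.
    """
    n_columns = max((len(v) for v in gene_values.values()), default=0)
    table = {}
    for pathway, genes in pathway_genes.items():
        flags = [
            any(gene in gene_values and gene_values[gene][col] > threshold for gene in genes)
            for col in range(n_columns)
        ]
        if any(flags):
            table[pathway] = flags
    return table


# Hypothetical pathways and two data columns per gene.
pathway_genes = {"P1": {"trpA", "trpB"}, "P2": {"phoA"}, "P3": {"lacZ"}}
gene_values = {"trpA": [3.2, 0.5], "trpB": [1.0, 2.5], "phoA": [1.1, 4.2]}
print(pathways_over_threshold(pathway_genes, gene_values, threshold=2.0))
```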

Metabolomics Analysis

  • Monoisotopic mass search — Enables searching of multiple monoisotopic masses against all metabolites in the selected PGDB.
    To start: Search → Search Compounds.

  • Paint metabolomics data onto metabolic map — Colors metabolite nodes in the metabolic-map diagram with colors indicating observed metabolite levels. Data can be uploaded from a file, or imported from a recently visited SmartTable.
    [documentation]
    To start: Metabolism → Cellular Overview then Right Operations Menu → Overlay Experimental Data
    When uploading a file, be sure to specify that the items in the first column are compound names and/or identifiers.

  • Paint metabolomics data onto single pathway diagram.
    [file format documentation]
    To start: Visit a pathway page, then select Right Operations Menu → Customize or Overlay Omics Data on Pathway Diagram
    In the pop-up window, in addition to customizing which pathway elements appear in the diagram, you may specify a file of metabolomics data to be displayed. Be sure to specify that the items in the first column are compound names and/or identifiers.

  • Metabolite Enrichment Analysis — Given a set of metabolites, determines whether that metabolite set is statistically over-represented for metabolites within certain metabolic pathways.
    [documentation]
    To start: Visit a SmartTable page.

  • SmartTable Transformations — Given a SmartTable of metabolites (e.g., the metabolites with the highest measured levels in a metabolomics dataset), transform those metabolites to the set of pathways containing the metabolites, or to the set of reactions containing those metabolites.
    [documentation]
    To start: Visit a SmartTable page.
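The monoisotopic mass search in the list above matches each query mass against the masses of the metabolites in the PGDB. The sketch below shows that idea with a fixed tolerance in daltons. The tolerance value, the function name, and the three-compound table are assumptions, not the site's search settings. The masses are computed from molecular formulas and rounded to four decimals.

```python
def mass_search(query_masses: list[float], compound_masses: dict[str, float],
                tolerance: float = 0.005) -> dict[float, list[str]]:
    """Map each query mass to the compounds whose mass is within +/- tolerance (Da)."""
    return {
        query: sorted(name for name, mass in compound_masses.items()
                      if abs(mass - query) <= tolerance)
        for query in query_masses
    }


# Monoisotopic masses from molecular formulas (C6H12O6, C11H12N2O2), rounded.
MASSES = {"D-glucose": 180.0634, "D-fructose": 180.0634, "L-tryptophan": 204.0899}
print(mass_search([180.063, 204.09, 300.0], MASSES))
```

Isomers such as glucose and fructose share a monoisotopic mass, so a single query mass can match several metabolites.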

Omics Pop-Ups for Cellular Overview

The Cellular Overview enables the user to drill down to see the data available for specific genes or metabolites. First, mouse over a reaction or metabolite in the Cellular Overview and lock the resulting tooltip in place to create a caption window.

Omics Pop-Ups enable users to see bar charts, X–Y plots, or heat maps of omics data for single genes or metabolites, or for all genes or metabolites within a pathway. The pop-ups can be customized for a publication or otherwise made more legible. To view an omics pop-up for a single gene or metabolite, first examine the associated caption. The caption pop-up includes an “Omics” button if there is omics data associated with the selected node; selecting that button transforms the pop-up into a graphic display of the data. To view the data for a whole pathway, right-click on a reaction node in a pathway that has omics data; this exposes a menu that includes the item “Display Omics Data for Every Node in Pathway: <pathway name>”. The resulting graphics include the omics data for every gene or metabolite in the pathway to which this reaction belongs.

9  Cellular Overview (Metabolic Map Diagram)

The Cellular Overview diagram depicts the biochemical machinery of an organism as described in a PGDB. Each node in the diagram (such as the small circles and triangles) represents a single metabolite, and each blue line represents a single bioreaction. This page describes the organization of the Cellular Overview and the operations users can perform to interrogate it. Different PGDBs will have different components of the diagram present or absent depending on what was included by the PGDB authors.

Note: The Cellular Overview has been tested on Internet Explorer 8.0, Firefox 3.5, Safari 4.0 and Chrome 2.0. We recommend not using Internet Explorer for the Cellular Overview, since its performance can be very poor; the other three browsers perform much better.

Note: The desktop version of Pathway Tools that you can install locally provides different and additional operations on the Web Overview. Click here for more details.

Organization of the Cellular Overview: Within the cytoplasmic membrane, the small-molecule metabolism of the organism is depicted in several regions. The glycolysis and the TCA cycle pathways, if present, will be placed in the middle of the diagram to separate predominantly catabolic pathways on the right from pathways of anabolism and intermediary metabolism on the left. The existence of anaplerotic pathways prevents rigid classification. The majority of pathways operate in the downward direction. Signal transduction pathways, if present, run along the bottom of the diagram. Pathways are grouped into related clusters as indicated by the shaded regions.

The large group of individual reactions at the right of the diagram represents reactions of small-molecule metabolism that have not been assigned to any pathway.

The shapes of the metabolite icons represent various compound classes. The different shapes used are as follows:

  • Triangle: Amino Acids

  • Square: Carbohydrates and Derivatives

  • Diamond: Proteins and Modified Proteins

  • Vertical Ellipse: Purines

  • Horizontal Ellipse: Pyrimidines

  • T: tRNAs

  • Circle: All other compounds

  • Filled shape: Phosphorylated compound

The organism's cellular membrane or membranes are depicted according to its cellular architecture, provided that architecture was specified when the PGDB was created. Transporters are depicted in the membrane in which they reside, as blue lines whose arrowheads indicate the direction of transport. For Gram-negative bacteria, periplasmic proteins are depicted when identified in the PGDB.

Getting Started: The Cellular Overview is accessible from the command Metabolism → Cellular Overview. The current selected organism, as displayed on the right in the banner of the Web page, is used to generate the Cellular Overview diagram. The generation of the diagram can take some time if it was not previously generated by the Web server.

Once the Cellular Overview diagram is displayed, the most common operation is to move it left, right, up, or down, since sometimes the entire overview cannot fit in the Web page. To do so, hold down the left mouse button in a blank area and move the mouse in the desired direction; this is called panning. You can also pan in small increments by clicking the arrows on the widget at the top left of the screen.

To zoom in or out, use the icon in the form of a ladder on the left of the overview Web page. Each step of the ladder is a zoom level, and you can select any of them at any time. You can also click the plus or minus sign (displayed at the top and bottom of the ladder) to zoom in (increase size) or zoom out (decrease size). As the zoom level increases (i.e., going up the ladder), names of compounds, enzymes, reactions, and pathways are eventually displayed.

Note that depending on the speed of the server, generating large Cellular Overviews (i.e., a zoom-in near the top of the ladder) might require some time.

Mousing over a Cellular Overview icon (e.g., a ‘tee’ icon for a tRNA) displays information about the object in a small tooltip popup. Click the ‘Keep Open’ button to keep that informational window open; drag the window by its title to re-position it.

Note for Mac users with a one-button mouse: left-click is the usual click, and right-click is the Mac control-click (i.e., you hold down the control key and click). But the exact keys can be customized on your Mac via the system preferences panel.

All the commands for the Cellular Overview are available from the right-click menu or from the operations box on the right side of the page.

The Cellular Overview can display your experimental data — see Section 9.3.

MetaCyc, which is a multi-organism database, has no cellular diagram.

9.1  Summary of Commands

9.1.1  Summary of Mouse Commands

  • Left-click on an object opens a tooltip (i.e., a small window) that displays basic data about the object. The tooltip contains further Web links to display more data about the object or about related objects.

  • Double-left-click in a blank area zooms in, centered at that location.

  • Left-click and hold in a blank area to pan (i.e., move) the entire Cellular Overview left, right, up, and down. The mouse button must be held down while panning.

  • Right-Click in a blank area opens a menu to invoke general commands applicable to the entire Cellular Overview. These commands are also available in the top menu bar under the menu ‘Cellular Overview’. All searching (highlighting) commands are under these menus. See the following list for an explanation of the general commands.

9.1.2  Summary of Menu Commands

The commands in the Cellular Overview menu are:

  • Overlay Experimental Data (Omics Viewer) provides several options for uploading experimental data (such as gene expression or metabolomics data) to overlay, as colors, on the cellular diagram.

  • Highlight Pathway(s) provides two searching mechanisms for pathways in the cellular diagram: by name or frame ID, or by a substring search. The substring search is based on the name, synonyms, and frame ID of the pathways. Highlighting is done on the reaction(s) of the pathway(s) found.

  • Highlight Reaction(s) provides four searching mechanisms for reactions in the cellular diagram: by name or frame ID, by substring, by EC number, or by enzyme name. The substring search is based on the name, synonyms, and frame ID of the reactions. Highlighting is done on the reaction(s) found.

  • Highlight Gene(s) provides three searching mechanisms for genes in the cellular diagram: by name or frame ID, by substring, or from a file. The substring search is based on the name, synonyms, and frame ID of the genes. The searching based on a file uses the gene names provided in a file located on your computer. Highlighting is done on the reactions and proteins corresponding to the gene(s) found.

  • Highlight Enzyme(s) provides two searching mechanisms for enzymes in the cellular diagram: by name or frame ID, or by a substring search. The substring search is based on the name, synonyms, and frame ID of the enzymes. This substring search is identical to the reaction search based on enzymes. Highlighting is done on the reactions and proteins corresponding to the enzyme(s) found.

  • Highlight Compound(s) provides two searching mechanisms for compounds in the cellular diagram: by name or frame ID, or by a substring search. The substring search is based on the name, synonyms, and frame ID of the compounds. Highlighting is done on the compound(s) found.

  • Clear All Highlighting removes all the highlighting from the cellular diagram.

  • Show Legend opens a small window to show a legend of the icons used in the cellular diagram.

  • Help opens a new Web page presenting documentation on the Cellular Overview.
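Several of the Highlight commands above offer a substring search over an object's name, synonyms, and frame ID. A minimal sketch of that matching rule follows; the `Entry` record and the function name are hypothetical, and case-insensitive matching is an assumption rather than documented behavior.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """Hypothetical minimal record for a searchable object."""
    frame_id: str
    name: str
    synonyms: list[str] = field(default_factory=list)


def substring_search(entries: list[Entry], query: str) -> list[str]:
    """Return frame IDs of entries whose name, any synonym, or frame ID contains the query.

    Case-insensitive matching is an assumption of this sketch.
    """
    q = query.lower()
    return [e.frame_id for e in entries
            if any(q in text.lower() for text in (e.frame_id, e.name, *e.synonyms))]


pathways = [
    Entry("TRPSYN-PWY", "L-tryptophan biosynthesis"),
    Entry("GLYCOLYSIS", "glycolysis I", ["Embden-Meyerhof pathway"]),
]
print(substring_search(pathways, "embden"))
```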

The following sections describe these and some other operations in more detail.

9.2  Searching and Highlighting

In this document, ‘Searching’ and ‘Highlighting’ are synonymous terms. There are several commands to search for reactions, pathways, enzymes, genes, and compounds. The search commands are available from the right-click menu and from the Cellular Overview menu in the top menu bar.

When a search is done, the objects found are highlighted in the Cellular Overview diagram which also creates a new overlay. The list of overlays is shown in the Layer Switcher panel on the right of the Overview Web page. This panel might be minimized, in which case a small icon with a plus-sign is shown. Click on the plus-sign icon to open the panel. From this panel you can activate or deactivate specific overlays. You cannot delete an individual overlay. But all highlighting, i.e., all overlays, can be removed by using the command Clear All Highlighting.

Since each overlay corresponds to a search operation, an overlay is identified by the keyword you entered for the search; this is the name of the overlay. Next to each name is a button labeled ‘List’. Clicking ‘List’ opens a small dialog window listing the objects found by the corresponding search. Each object name is a hyperlink: clicking it centers the Overview on the corresponding object, and a red marker emphasizes its location.

Highlighting operations can also be applied via web services.

9.3  Cellular Omics Viewer — Overlay Experimental Data

The Pathway Tools Omics Viewer uses the Cellular Overview for an organism to visualize data from high-throughput experiments in a global metabolic pathway context. The input to the Cellular Omics Viewer is a set of gene, protein, and/or reaction names or identifiers, and data values for each gene, protein, and reaction. The Omics Viewer generates a new version of the Cellular Overview in which the reaction steps identified by the input genes, proteins, and reactions are colored according to the provided data values. For example, for a gene expression experiment, the software identifies the reactions catalyzed by the product of each supplied gene, and colors that reaction with a color value computed from the data point provided for each gene. The data values in the provided dataset are mapped to a spectrum of colors. Similarly, for metabolomics experiments, compound nodes in the Cellular Overview are colored according to the data values for the specified compounds. This facility enables the user to see which pathways are active or inactive under some set of experimental conditions.

The Omics Viewer can be used for:

  • Microarray Expression Data: Reaction lines (and protein icons, where present) are color-coded according to the relative or absolute expression level of the gene that codes for the enzyme that catalyzes that reaction step. The Omics Viewer allows a scientist to interpret the results of gene-expression experiments in a pathway context.

  • Proteomics Data: Reaction lines (and protein icons, where present) are color-coded according to the concentration of the enzyme that catalyzes that reaction step.

  • Metabolomics Data: Compound icons are color-coded according to the concentration of the compound.

  • Reaction Flux Data: Reaction lines are color-coded according to reaction flux values.

  • Other Experimental Data: Any experiment, high-throughput or otherwise, in which data values are assigned to genes, proteins, reactions or metabolites can be viewed in a pathway context using the Cellular Omics Viewer.

The Regulatory Overview also has an omics viewer, but it can display gene data only.

The Cellular Omics Viewer can show absolute data values (such as the concentration of a metabolite or protein, or the absolute expression level of a gene), or it can be used to compare two sets of experimental data by computing a ratio and mapping the ratios onto a color spectrum.

The superposition of multiple sets of experimental data on the Cellular Overview can also be animated to show, for example, how gene expression levels of enzymes change with time over the course of an experiment.

The Cellular Omics Viewer can also be invoked via web services.

9.3.1  Example Omics Data Files

Single gene expression experiment: sample data file and brief description. See the Cellular Overview for this data, using the ratio of columns 11 and 12.
Time series gene expression animation: sample data file and brief description. See the Cellular Overview for this data, using columns 6 to 9.

9.3.2  Getting Started with Omics Data Display

The commands under Overlay Experimental Data (Omics Viewer), available from the right-click menu and the operations box on the right side, overlay experimental data on the Cellular Overview diagram.

Once the Overlay Experimental Data command is invoked, a window will open, called the Omics Form, where you can specify a data file to upload and various parameters to control the interpretation of the data. The parameters are documented in the window but more details follow on the file format and the parameters to specify.

9.3.3  Omics Dataset File Format

Experimental data is imported from a file provided by the user that is stored on the user’s computer. Each line of the file contains data for a single gene, protein, reaction or metabolite, and is of the form:

<names‑or‑IDs> <other‑columns> <data‑column1>...<data‑columnN>

Columns are separated by the tab character. Lines that start with # or ; are taken to be comment lines and are ignored by the program. The first column is called column 0, the second column is called column 1, etc. The program pays attention to column 0 and to the columns you tell it contain your data; the other columns are ignored.

Short examples (see 9.3.1 for full example files):

    # In this file the data columns are columns 2-4.
    #
    # The first two lines specify genes.
    trpA        tryptophan synthetase        3.2        3.8        4.3        This line identifies the gene by a gene name
    # This next line identifies the gene by an accession number that is
    # listed on the EcoCyc gene page, hence we can be sure that EcoCyc
    # will recognize it.
    b0383        alkaline phosphatase        1.1        4.2        2.9
    #
    # The next two lines specify metabolites.
    #
    TRP        L-tryptophan        6.3        2.3        4.3        Column 0 specifies the EcoCyc ID for this metabolite
    # This next line specifies spermidine by its name and KEGG ID and PubChem ID
    spermidine$KEGG:C00315$PubChem:6992097        spermidine        1.1        2.8        5.1

<names‑or‑IDs> can be a list of one or more of the following fields, separated by the “$” character. These alternatives give you multiple ways to identify a gene, protein, metabolite, or reaction:

  • A name for the object that is known to BioCyc (each BioCyc object typically includes extensive synonym lists; the software tries to match a name to the appropriate target).

  • BioCyc IDs. Gene IDs from sequencing projects (such as the E. coli B-numbers) are generally acceptable and unambiguous. For protein or reaction data, EC numbers may be used. BioCyc pages (e.g., gene pages, metabolite pages) typically list the ID for the object toward the top of the page, and in the URL field of the page. Please verify that the IDs you are using are known to BioCyc.

  • IDs in external databases. Many BioCyc DBs contain links to external databases such as UniProt and PubChem; the identifiers in those links can be used in column 0 if prefixed by the name of the database, e.g., “UniProt:P00634”.
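The file format rules above (tab-separated columns numbered from 0, comment lines starting with # or ;, "$"-separated alternatives in column 0, and only the requested data columns used) can be summarized in a small parser. This is a sketch for checking a file before uploading it, not the Omics Viewer's own reader; the function name is hypothetical, and skipping blank lines is an assumption.

```python
def parse_omics_file(text: str, data_columns: list[int]) -> list[tuple[list[str], list[float]]]:
    """Parse omics data lines into (identifiers, values) pairs.

    identifiers: the "$"-separated alternatives from column 0.
    values: the requested data columns (numbered from 0) as floats;
    all other columns are ignored.
    """
    records = []
    for line in text.splitlines():
        if not line.strip() or line.startswith(("#", ";")):
            continue  # comment lines; blank lines are also skipped (assumption)
        fields = line.split("\t")
        identifiers = fields[0].split("$")
        values = [float(fields[col]) for col in data_columns]
        records.append((identifiers, values))
    return records


sample = (
    "# In this file the data columns are columns 2-4.\n"
    "trpA\ttryptophan synthetase\t3.2\t3.8\t4.3\n"
    "; a semicolon also starts a comment\n"
    "spermidine$KEGG:C00315$PubChem:6992097\tspermidine\t1.1\t2.8\t5.1\n"
)
for identifiers, values in parse_omics_file(sample, data_columns=[2, 4]):
    print(identifiers, values)
```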

The numbers in the data columns can represent either absolute or relative (e.g., ratios or log ratios) values. If the data values represent absolute numbers, you may choose to visualize either a single column of absolute data values (select “Absolute” and one data column), or the ratio of two data columns as relative data values (select “Relative” and two data columns). If the data values themselves represent relative numbers, then you need to supply only a single column number, and select “Relative.” An entry (a row of data for a gene or other object) may contain any number of data columns (for example, if you want to compile measurements from several experiments or time points into a single file), but only the data columns specified are visualized at a time; all other columns are ignored.
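The three combinations described in the preceding paragraph reduce each row to one displayed value. In this sketch the function name is hypothetical, and so is the orientation of the two-column ratio (first column divided by second), since the text does not specify which column is the numerator.

```python
def display_value(values: list[float], mode: str) -> float:
    """Reduce the selected data columns of one row to the single value to be colored.

    "absolute" with one column: the value itself.
    "relative" with one column: the column already holds a ratio or log ratio.
    "relative" with two columns: ratio of the first column to the second
    (the orientation of the ratio is an assumption of this sketch).
    """
    if len(values) == 1 and mode in ("absolute", "relative"):
        return values[0]
    if len(values) == 2 and mode == "relative":
        return values[0] / values[1]
    raise ValueError(f"{mode!r} display with {len(values)} column(s) is not supported")


print(display_value([6.0, 2.0], "relative"))
```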

9.3.4  Color Scale

The color scale used depends on the type and, by default, the range of the data. Thus, a particular color may correspond to one gene expression level for one dataset, and a different gene expression level for another dataset, depending on the range of values or the supplied maximum cutoff value for each dataset. We use the spectrum from yellow/green to red, with yellow representing the lowest expression levels or ratios in the dataset, blue representing values in the middle, and red representing the highest values. Reactions for which no data was provided are drawn in black. The legend for mapping colors to data values is shown in the key, which is drawn to the right of the overview for a single experiment, or to the left for an animation.

A maximum cutoff value is chosen. By default, this is computed from the data. Alternatively, the user may supply a maximum cutoff value to use. Supplying the same maximum cutoff value for multiple experiments ensures that the same color scale is used for each one, so that the displays are directly comparable.

The minimum cutoff value is determined based on the maximum cutoff value and the other parameters. For absolute data values, we use a minimum cutoff value of zero. For relative data values that are not logs, we use the inverse of the maximum cutoff. For relative data values that are logs, we use the negative of the maximum cutoff. The color spectrum is then mapped evenly along a log scale between the maximum cutoff and the minimum cutoff.
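For relative data, the minimum-cutoff rules and the even log-scale mapping can be written out as follows. This is a sketch under stated assumptions: `min_cutoff` and `spectrum_position` are hypothetical names, “relative-log” marks data that are already log ratios, and the position is returned as a fraction from 0 (minimum cutoff) to 1 (maximum cutoff) rather than as an actual color. Absolute data are not mapped here, because a log scale cannot reach a minimum cutoff of zero and the text does not say how that case is handled.

```python
import math


def min_cutoff(max_cutoff: float, kind: str) -> float:
    """Minimum cutoff implied by the maximum cutoff for each kind of data."""
    if kind == "absolute":
        return 0.0
    if kind == "relative":        # plain ratios: inverse of the maximum
        return 1.0 / max_cutoff
    if kind == "relative-log":    # log ratios: negative of the maximum
        return -max_cutoff
    raise ValueError(f"unknown data kind: {kind!r}")


def spectrum_position(value: float, max_cutoff: float, is_log: bool) -> float:
    """Fraction (0..1) along the color spectrum for a relative data value."""
    # Work in log space: for plain ratios, 1/max .. max becomes -log(max) .. log(max);
    # log ratios are already on that scale.
    x = value if is_log else math.log(value)
    hi = max_cutoff if is_log else math.log(max_cutoff)
    if hi <= 0:
        raise ValueError("maximum cutoff must exceed 1 (ratios) or 0 (log ratios)")
    lo = -hi
    x = min(max(x, lo), hi)  # clamp values beyond the cutoffs
    return (x - lo) / (hi - lo)


print(min_cutoff(4.0, "relative"))          # 0.25
print(spectrum_position(1.0, 4.0, False))   # 0.5 (an unchanged ratio sits mid-spectrum)
print(spectrum_position(2.0, 2.0, True))    # 1.0
```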

In many cases, several genes or proteins, each with their own expression level or concentration, will map to a single reaction. This is because the reaction might be catalyzed by an enzyme complex made up of several gene products, or the reaction might be catalyzed by several isozymes, each with its own gene or genes. Since a reaction can only be colored a single color, we must choose which data value to use. For absolute data values, we choose the maximum. For relative data values, we choose the value whose log has the greatest deviation from zero, under the assumption that the user is primarily interested in identifying the entities whose behavior differs most between the two datasets.
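The rule for collapsing several gene or protein values onto one reaction might look like this. It is a hypothetical helper, not Pathway Tools code; for data that are already log ratios, it assumes the value itself is the log, so it picks the value with the largest magnitude.

```python
import math


def reaction_value(values: list[float], kind: str) -> float:
    """Pick the single value used to color a reaction that maps to several genes/proteins."""
    if not values:
        raise ValueError("no data values for this reaction")
    if kind == "absolute":
        return max(values)
    if kind == "relative":        # plain ratios: largest |log(value)|
        return max(values, key=lambda v: abs(math.log(v)))
    if kind == "relative-log":    # assumption: values are already logs, so largest |value|
        return max(values, key=abs)
    raise ValueError(f"unknown data kind: {kind!r}")


# Three isozymes with ratios 2.0, 0.25 and 1.5: 0.25 deviates most (|log 0.25| > |log 2|).
print(reaction_value([2.0, 0.25, 1.5], "relative"))  # 0.25
```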

9.3.5  Omics Viewer Results

Once the data-upload form is submitted by clicking the Submit button at the bottom of the Omics Form, the Web server processes the data. The processing time depends on the speed of the server and the amount of data in the file. The results are returned to your browser as highlighted objects (e.g., reactions). If several experiments are loaded from the same file (i.e., several data columns from the uploaded file are specified), an animation is created in which each step corresponds to one experiment (i.e., one column).

A small dialog window opens to display the color scale for the experiment(s) and, if there is an animation, buttons to control it. From this window you can pause, restart, go forward or backward, and increase or decrease the animation speed.

Experimental data can be overlaid at any zoom level. Once the data are uploaded and overlaid, you can zoom in or out, and the highlighting is adjusted accordingly.

The tooltips for highlighted objects show the experimental data. The data displayed changes during an animation.

10  Regulatory Overview

The Regulatory Overview enables you to visually analyze the regulatory relationships between the genes of a specific organism. These relationships are based on the regulatory data available in the organism's database (i.e., its PGDB). Currently, the relationships are based on transcriptional regulatory data (future versions may cover other types of regulation). Note: The Regulatory Overview has been tested on Internet Explorer 7.0, Firefox 3.3, Safari 4.0, and Chrome 2.0. We recommend not using Internet Explorer for the Regulatory Overview, since its performance can be very slow when manipulating a large number (more than 100) of highlighted genes; the other three browsers perform much better.

The Regulatory Overview is represented as a network with nodes and arrows (i.e., arcs). Each node represents a gene of a specific organism. There is an arrow from gene A to gene B if and only if A regulates B.
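The network described above is a directed graph. As a minimal sketch (the class name and the toy gene names are hypothetical, not data from any PGDB), storing each arrow in both directions makes the direct regulatees and direct regulators shown in gene tooltips cheap to look up:

```python
from collections import defaultdict


class RegulatoryNetwork:
    """Directed graph in which an arrow A -> B means gene A regulates gene B."""

    def __init__(self, arrows):
        self.regulatees = defaultdict(set)   # gene -> genes it regulates
        self.regulators = defaultdict(set)   # gene -> genes that regulate it
        self.genes = set()
        for a, b in arrows:
            self.regulatees[a].add(b)
            self.regulators[b].add(a)
            self.genes.update((a, b))

    def direct_regulatees(self, gene):
        return set(self.regulatees.get(gene, ()))

    def direct_regulators(self, gene):
        return set(self.regulators.get(gene, ()))


# Toy arrows for illustration only.
net = RegulatoryNetwork([("geneA", "geneB"), ("geneB", "geneC"), ("geneA", "geneD")])
print(sorted(net.direct_regulatees("geneA")))  # ['geneB', 'geneD']
print(sorted(net.direct_regulators("geneC")))  # ['geneB']
```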

When first displayed, the overview does not show any regulatory arrow relationships since, typically, their great number would clutter the overview. These arrows can be selectively added by using the highlighting commands. See the sections below for more information on highlighting commands.

Not all organisms have regulatory data in their PGDB. If the command Genome → Regulatory Overview is grayed out, no Regulatory Overview can be displayed for the selected organism. Otherwise, selecting the command Genome → Regulatory Overview opens a Regulatory Overview Web page that displays the complete Regulatory Overview of the selected organism. The operations box on the right has several commands specifically for the Regulatory Overview.

You can display a regulatory subnetwork of a specific organism by performing a series of highlighting operations and then using the command Redisplay Highlighted Genes Only. This command creates a new, smaller layout of the regulatory network that contains only the highlighted genes. Genes that do not regulate, or are not regulated by, any highlighted genes are not included in the subnetwork. Further operations can be performed on this subnetwork as on the complete overview. See the section Redisplay Highlighted Genes Only below for more details.

The most common operation is to move the Regulatory Overview left, right, up, or down, since the entire network sometimes does not fit in the Web page. To do this, hold down the left mouse button in a blank area and move the mouse in the desired direction. This is called panning. You can also pan in small increments by clicking the arrows on the panning widget, the graphic at the top left of the screen.

To zoom in or out, use the ladder-shaped icon on the left of the overview Web page. Each step of the ladder is a zoom level, and you can select any of them at any time. You can also click the plus or minus sign (displayed at the top and bottom of the ladder) to zoom in (increase size) or zoom out (decrease size) on the regulatory network. At lower zoom levels, gene names might overlap the network nodes; increasing the zoom level (i.e., going up the ladder) should remove such overlaps. The highest zoom level (i.e., the last step of the ladder) always forces the display of all gene names in the network.

Note that, depending on the speed of the server, generating large regulatory network overviews (i.e., zoom levels near the top of the ladder) may take some time. A given overview may already have been generated, or the server may need to generate it, so response times can vary.

Mousing over a gene node displays a tooltip with data about the gene, its product, the possible ligand, and the gene's direct regulatees and regulators. Left-clicking the gene node opens a new Web page with more data specific to the gene.

Other more complex visual commands can be reached by right-clicking on genes or in a blank area. This is discussed in detail in the following sections.

Note for Mac users with a one-button mouse: left-click is the usual click, and right-click is the Mac control-click (i.e., you hold down the control key and click). But the exact keys to use may be customized on your Mac via the preferences panel.

Organism Selection: Selecting a new organism through the organism selector does not immediately switch the Regulatory Overview to that organism; the next operation, such as zooming in or out, applies to the newly selected organism. At any time, you can display the complete regulatory overview of the selected organism with the command Display Complete Regulatory Overview in the menu obtained by right-clicking in a blank area, or with Redisplay Complete Regulatory Overview in the operations box on the right.

10.1  Summary of Commands

10.1.1  Mouse Commands

  • Left-Click on a gene node opens a new browser window with information about the gene.

  • Left-Click (and holding) in a blank area lets you pan (i.e., move) the entire regulatory network left, right, up, and down. You need to hold down the mouse button while panning.

  • Right-Click on a gene node opens a menu to select a command to apply to this gene. The commands highlight the direct and/or indirect regulatees and/or regulators of this gene and show highlighted arcs between regulators and regulatees.

  • Right-Click in a blank area opens a menu to select general commands applicable to the entire regulatory network. These commands are also available in the top menu bar under the menu ‘Regulatory Overview’.

  • Double-Left-Click in a blank area does a zoom-in operation.

The following sections describe these and other operations in more detail.

10.1.2  Layout Selection

For any organism, there are two layouts available: nested ellipses or top to bottom.

The nested ellipses layout uses up to three ellipses to display the gene nodes. The innermost ellipse contains the genes that have the largest number of regulatees, in alphabetical order of gene name. The middle ellipse contains the other genes that regulate at least one gene. The outer ellipse contains the genes that have no regulatees. These may be displayed as groups of genes regulated by the same set of genes (a multi-regulon), typically drawn as triangles, or as a short straight line if the group is small.
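The assignment of genes to the three ellipses can be sketched as below. The text does not define how many genes count as having “the largest number of regulatees,” so the `top_k` cutoff is a hypothetical parameter, the function name is illustrative, and the grouping of outer genes into multi-regulons is omitted.

```python
def assign_ellipses(regulatees, top_k):
    """Split genes into (inner, middle, outer) ellipse lists.

    regulatees: dict mapping every gene to the set of genes it directly regulates.
    top_k: hypothetical cutoff for "the genes with the largest number of regulatees".
    """
    regulators = [g for g, targets in regulatees.items() if targets]
    # Most regulatees first; the first top_k regulators go in the inner ellipse.
    by_count = sorted(regulators, key=lambda g: len(regulatees[g]), reverse=True)
    inner = sorted(by_count[:top_k])                # alphabetical order, as described
    middle = sorted(set(regulators) - set(inner))   # other genes with at least one regulatee
    outer = sorted(g for g, targets in regulatees.items() if not targets)
    return inner, middle, outer


network = {"A": {"B", "C", "D"}, "B": {"C"}, "C": set(), "D": set(), "E": {"C", "D"}}
print(assign_ellipses(network, top_k=1))  # (['A'], ['B', 'E'], ['C', 'D'])
```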

The top to bottom layout uses several straight rows to display the gene nodes. Each row contains genes that do not directly regulate each other. The top row contains the genes that regulate the largest number of genes, and the bottom row contains genes that do not regulate any genes. The intermediate rows contain genes that regulate some other genes. As in the nested ellipses layout, genes in a row may be grouped in straight lines or triangles.

10.1.3  Highlighting Genes and Regulatory Relationship Arrows

There are several commands to highlight genes and show the regulatory relationship arrows between them.

Two commands use a gene name, a substring of gene names, or a gene frame ID. Both commands are available by right-clicking in a blank area, or from the top menu bar under Regulatory Overview. The command Highlight Gene By Name or Frame ID highlights at most one gene. It is essentially a search command, since you might not know the location of that gene in the regulatory network; once the gene is found, the regulatory network is centered on its location. The command Highlight Genes By Substring may highlight several genes. Selecting the command opens a panel in which you can enter a string of characters. When you click the button labeled Highlight in the panel, all genes whose names contain the given string are highlighted (the search is case-insensitive). This command can also include the regulatory relationships between the genes found.
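The matching rule of Highlight Genes By Substring (case-insensitive containment) amounts to the following. The function is a hypothetical illustration, not the Web site's code.

```python
def genes_matching_substring(gene_names, query):
    """Return the gene names that contain query, ignoring case (hypothetical helper)."""
    needle = query.casefold()
    return [name for name in gene_names if needle in name.casefold()]


print(genes_matching_substring(["trpA", "trpB", "TrpR", "lacZ"], "TRP"))
# ['trpA', 'trpB', 'TrpR']
```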

The command Highlight Genes By Gene Ontology Terms, accessible from the right-click menu, enables you to select one or more Gene Ontology (GO) terms. The genes whose protein products are annotated with the selected GO terms are highlighted. The option Include Relationships Arrows enables you to add relationship arrows between the highlighted genes. Note that if you are displaying a subnetwork, the organism may have genes with such products that are not in the subnetwork; if none of the matching genes are in the subnetwork, a warning is given that no genes have been highlighted.

Right-clicking on a gene opens a menu of highlighting commands specific to that gene. The menu may contain from one to seven commands; since some genes have no regulators and/or no regulatees, the list of commands varies from gene to gene. Here is the list of all possible commands available from this menu, where name is the name of the gene that was right-clicked (e.g., trpA). Each highlighting command uses a specific color, and that color changes from one executed highlighting command to the next.

  • Highlight Gene name Highlights only the gene selected.

  • Highlight Gene name and its Direct Regulatees The gene selected and all its direct regulatees are highlighted and relationship arrows are displayed from the selected gene to its regulatees.

  • Highlight Gene name and its Direct Regulators The gene selected and all its direct regulators are highlighted and relationship arrows are displayed from the regulator genes to the selected gene.

  • Highlight Gene name and its Direct Regulatees and Regulators This command combines the two previous commands.

  • Highlight Gene name and its Direct and Indirect Regulatees The selected gene and all its direct regulatees and indirect regulatees are highlighted and relationship arrows are displayed from regulators to regulatees.

  • Highlight Gene name and its Direct and Indirect Regulators The selected gene and all its direct regulators and indirect regulators are highlighted and relationship arrows are displayed from regulators to regulatees.

  • Highlight Gene name and its Direct and Indirect Regulatees and Regulators This command combines the two previous commands.
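The direct-and-indirect commands above correspond to a graph traversal. A minimal sketch, assuming the network is held as a hypothetical dict from each gene to the set of genes it regulates: a breadth-first search collects all regulatees and the arrows among them, and running the same search on the reversed graph collects the regulators while keeping every arrow pointed from regulator to regulatee. The combined command is the union of the two results.

```python
from collections import deque


def downstream(regulatees, gene):
    """Gene plus all direct and indirect regulatees, and the arrows among them."""
    reached = {gene}
    arrows = set()
    queue = deque([gene])
    while queue:
        current = queue.popleft()
        for target in regulatees.get(current, ()):
            arrows.add((current, target))
            if target not in reached:
                reached.add(target)
                queue.append(target)
    return reached, arrows


def reverse(regulatees):
    """Turn a gene -> regulatees map into a gene -> regulators map."""
    regulators = {}
    for source, targets in regulatees.items():
        for target in targets:
            regulators.setdefault(target, set()).add(source)
    return regulators


def upstream(regulatees, gene):
    """Gene plus all direct and indirect regulators; arrows still point regulator -> regulatee."""
    reached, arrows = downstream(reverse(regulatees), gene)
    return reached, {(regulator, target) for (target, regulator) in arrows}


# Toy network with a feedback loop A -> B -> C -> A, plus D -> A.
net = {"A": {"B"}, "B": {"C"}, "C": {"A"}, "D": {"A"}}
print(sorted(downstream(net, "A")[0]))  # ['A', 'B', 'C']
print(sorted(upstream(net, "A")[0]))    # ['A', 'B', 'C', 'D']
```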

When a highlighting operation is done, a new overlay is created. The list of overlays is shown in the Layer Switcher panel on the right of the overview Web page. This panel may be minimized, in which case a small icon with a plus-sign is shown. Click on the plus-sign icon to open the panel. From this panel you can activate or deactivate specific overlays. This is particularly useful if you use the command Redisplay Highlighted Genes Only.

All highlighting can be removed by using the command Clear All Highlighting.

For more information about highlighting, see Section Redisplay Highlighted Genes Only.

10.1.4  Redisplay Highlighted Genes Only

The command Redisplay Highlighted Genes Only displays a regulatory network that considers only the highlighted genes. The layout is changed to “top to bottom,” since it is usually a better layout for a small set of genes. This command is typically used after a series of highlighting operations that select a set of genes to analyze closely. The currently displayed regulatory network is removed and a new regulatory network is displayed. The active highlighting remains active, and all overlays (active or not) also remain. Keeping the deactivated overlays is useful, since you may return to the complete regulatory network and reactivate them to create a new regulatory subnetwork. Note that genes that do not regulate, or are not regulated by, any highlighted genes are not included in the subnetwork.

To redisplay the complete regulatory network, use the command Display Complete Regulatory Overview accessible when right-clicking in a blank area. The current active overlays remain active and the deactivated overlays are not removed.

The information in tooltips within a subnetwork display (produced when mousing over gene nodes) is restricted to that subnetwork. That is, the tooltip's lists of regulatees and regulators cover the subnetwork, not the entire regulatory network of the organism. However, when you go from a subnetwork display back to the display of the entire network, any highlighting done on the subnetwork is expanded to show relationships within the full network. For example, if gene A has four direct regulatees in a subnetwork but twenty in the entire network, applying Highlight Gene A and its Direct Regulatees in the subnetwork highlights only the four regulatees, but once you redisplay the entire network, all twenty regulatees are highlighted.
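The difference between subnetwork tooltips and the full network can be pictured as restricting a gene-to-regulatees map to the highlighted genes. This is a hypothetical sketch: it keeps every highlighted gene and every arrow whose two ends are both highlighted, and it does not attempt to reproduce the exact inclusion rule used by the Web site.

```python
def subnetwork(regulatees, highlighted):
    """Restrict a gene -> regulatees map to the highlighted genes (hypothetical helper)."""
    keep = set(highlighted)
    return {gene: {t for t in regulatees.get(gene, ()) if t in keep} for gene in keep}


full = {"A": {"B", "C", "D", "E"}, "B": {"C"}}
sub = subnetwork(full, {"A", "B", "C"})
print(sorted(sub["A"]))   # ['B', 'C']            regulatees listed in the subnetwork tooltip
print(sorted(full["A"]))  # ['B', 'C', 'D', 'E']  regulatees in the full network
```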

10.2  Regulatory Omics Viewer

The Pathway Tools Regulatory Omics Viewer illustrates the results of high-throughput experiments in the context of gene regulation. Genes that are involved in regulation are mapped to gene nodes of the Regulatory Overview, and the expression level of each gene in a given experimental dataset is mapped to a spectrum of colors. This facility enables the user to see instantly which genes are active or inactive under some set of experimental conditions.

The Omics Viewer for the Regulatory Overview is very similar to the Omics Viewer for the Cellular Overview. Data files submitted to the Regulatory Omics Viewer must contain gene names or frame IDs in their first column. To start the Regulatory Omics Viewer, use the command Overlay Experimental Data (Omics Viewer) under the Regulatory Overview menu. See Section 9.3 for details of how to use the Regulatory Omics Viewer.

11  Comparative Analysis

Several types of comparative operations are available within Pathway Tools Web sites. Note that all of the PGDBs to be compared must be resident within a single Pathway Tools Web site.

Start a comparative analysis by specifying the organism(s) you want to compare. In many cases this can be done from the menu command Select organisms/databases for comparison operations, which is accessible through the Gene, Pathway, Reaction, and Compound menus. It is also accessible through the Choose Organisms button in the Analysis → Comparative Analysis page. This tool supports multi-organism selection using the following three modes. In each mode, a list of organisms for comparison is built up on the right side; you can add to, remove from, or clear that entire list using the buttons in the middle.

  • By Name: Select individual organisms by name on the left

  • By Taxonomy: Select a taxonomic group by clicking through the tree or entering a search term. All genomes under that taxonomic group can be added to the selection by clicking “Add”

  • My Lists: Choose organism lists that were previously saved in your online account, or create a new organism list from the current selection

11.1  Compare Objects Across Databases

Most object pages in Pathway Tools Web sites contain commands for navigating to that same object in one or more other PGDBs. For example, the command Show this gene in another database on a gene page will find the same gene in a specified PGDB. The command Show this compound in another database on a compound page will show the same metabolite in a specified PGDB. Similarly, Search for this gene in multiple databases on a gene page will generate a table showing information about that gene in multiple specified PGDBs.

Pathway Tools finds “the same object” using different mechanisms for different types of objects:

  • For genes and proteins, the software uses orthology information when available. If no equivalent object is found using orthology information, then the software searches for a gene or protein of the same name (note that name-based searches sometimes yield incorrect results).

  • For compounds, reactions, and pathways, the software relies on the fact that when the PathoLogic component of Pathway Tools constructs new PGDBs, it does so by selectively copying information about compounds, reactions, and pathways from the MetaCyc PGDB to the new PGDB. When performing this copy operation, the software maintains the same unique identifier for each compound, reaction, and pathway in the new PGDB as it had in MetaCyc. Thus, when comparing compounds, reactions, and pathways, the software looks for objects with the same unique identifiers in other PGDBs. Note that compounds, reactions, or pathways created by a user in an individual PGDB will have new unique identifiers that will not match identifiers in other PGDBs.
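The two matching strategies above can be sketched as lookups. The maps `orthologs` (gene ID to orthologous gene ID in the target PGDB) and `target_genes_by_name`, and the IDs used in the example, are hypothetical stand-ins for whatever the software consults; the sketch only shows the order of the checks.

```python
def find_same_gene(gene_id, gene_name, orthologs, target_genes_by_name):
    """Equivalent gene in another PGDB: orthology first, then a same-name fallback."""
    if gene_id in orthologs:
        return orthologs[gene_id], "orthology"
    if gene_name in target_genes_by_name:
        # Name matches can be wrong, as noted above.
        return target_genes_by_name[gene_name], "name"
    return None, None


def find_same_frame(frame_id, target_frame_ids):
    """Compounds, reactions, pathways: match on the shared MetaCyc-derived unique ID."""
    return frame_id if frame_id in target_frame_ids else None


# Hypothetical IDs for illustration.
print(find_same_gene("G1", "trpA", {"G1": "X7"}, {"trpA": "X9"}))  # ('X7', 'orthology')
print(find_same_frame("TRP", {"TRP", "SPERMIDINE"}))               # TRP
```

A compound created by a user in one PGDB gets a new unique ID, so `find_same_frame` returns `None` for it in every other PGDB, mirroring the caveat in the text.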

The following comparison commands are all available under the Gene, Compound, Reaction, and Pathway menus:

  • Show this object in another database

  • Show this object in multiple databases

  • Show this object in MetaCyc (not available for genes)

In addition, the following command will generate a table comparing the operon context of a gene across multiple organisms: Show orthologs (with operon diagrams) in multiple databases.

The comparative genome browser described in Section 6.2 supports more powerful viewing of genome regions around orthologous genes.

11.2  Compare Individual Pathways and Reactions

The “Species Comparison” operation in the operations box for pathway and reaction pages generates tables comparing a pathway or reaction across multiple PGDBs. If you wish to change the organisms being compared, use the command Change organisms/databases for comparison operations.

The reaction comparison table lists the enzyme(s) that catalyze the reaction; activators, inhibitors, and cofactors for those enzymes; and the one or more pathway(s) containing the reaction in that organism.

The pathway comparison table includes a graphic of the pathway showing which reactions in the pathway have enzymes present in each organism; a list of the enzymes catalyzing each reaction; and operon diagrams for each gene in the pathway.

11.3  Comparative Analysis Tables

Analysis → Comparative Analysis allows users to generate summaries of individual PGDBs and to compare statistics between PGDBs. Currently we support comparative analysis of reactions, pathways, compounds, proteins, orthologs, transporters, and transcription units; select the type(s) of reports you wish to generate.

Next select one or more PGDBs for which to perform the analysis.

Please experiment with these commands to see the detailed reports generated by each comparison.

12  Sequence Search and Alignment

12.1  BLAST Search

Pathway Tools has an optional feature that allows Pathway/Genome Databases (PGDBs) that have sequence data to be searched using NCBI BLAST.

To access the Web interface for BLAST searches, go to: Search Menu → BLAST search.

Documentation on the use of the Web interface for NCBI BLAST can be found here.

12.2  Alignment Viewer

12.3  PatMatch Sequence Search

PatMatch [1, 2] allows you to search for a short nucleotide or amino-acid sequence within a specific genome, using an exact search or using degenerate nucleotide or amino-acid symbols. The minimum length of the input string is 3 residues.
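To illustrate what a degenerate nucleotide search does (not how PatMatch implements it), the sketch below expands the standard IUPAC ambiguity codes into a regular expression and reports every 0-based match start on one strand, including overlapping matches. The function name is hypothetical; amino-acid patterns and reverse-strand matches are not handled.

```python
import re

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def degenerate_search(pattern: str, sequence: str) -> list[int]:
    """0-based start positions of a degenerate pattern in sequence (forward strand only)."""
    if len(pattern) < 3:
        raise ValueError("pattern must be at least 3 residues long")
    try:
        regex = "".join(IUPAC[symbol] for symbol in pattern.upper())
    except KeyError as err:
        raise ValueError(f"unsupported symbol {err.args[0]!r}") from None
    # A zero-width lookahead lets overlapping occurrences be reported.
    return [m.start() for m in re.finditer(f"(?={regex})", sequence.upper())]


print(degenerate_search("TAN", "ATAGTACTAA"))  # [1, 4, 7]
```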

The results are displayed initially as a simple Web-page table, with the option of displaying them as a SmartTable if there are fewer than 5000 results. If there are more than 5000 results, a file download link is provided.

To access the PatMatch search, go to: Search → Sequence Pattern Search.

For each genome, the user can search several alternative sequence databases:

  • Complete peptide database

  • Nucleotide database: whole genome

  • Nucleotide database: coding regions (the nucleotide sequence of the coding region of each protein- and RNA-coding gene)

  • Nucleotide database: intergenic regions (the nucleotide sequence of the regions between adjacent genes)

  • Nucleotide database: intergenic regions, extended (the nucleotide sequence of the regions between adjacent genes, plus an additional 400 bases upstream and 250 bases downstream, so as to include possible regulatory regions)

13  Metabolic Route Search

The Metabolic Route Search (new in version 17.0, March 2013) is a software tool to search and analyze routes in the metabolic reaction network of an organism. Given a starting compound, a target compound, and other parameters, the tool finds the best (least-cost) routes between these compounds, taking into account atom conservation and path length, and adding a minimum number of foreign reactions from MetaCyc.

The tool is activated by first selecting the organism to search, using the “change organism database” link at the top right corner of the Web page, and then selecting the command Metabolism → Metabolic Route Search from the menu bar. This command is available for single-organism databases only; for example, it is available for E. coli but not for MetaCyc. If the Web server was started with option -metaroute-metacyc (in which case the Web server is not publicly accessible), MetaCyc can be used, not as a native organism, but as a library of additional reactions. In that case, MetaCyc serves only as a set of foreign reactions that can be added to a selected single-organism database.

The parameters to specify before clicking the “Search Routes” button are as follows (defaults are provided for most of them):

  • Start Compound The starting compound for the search. That compound can be entered by name or by using a unique id (i.e., frame id). A suggested list of compounds is given underneath the input text box when you start typing a compound name. You may also select the compound from that list.

  • Goal Compound The ending compound for the search. That compound can be entered by name or by using a unique id (i.e., frame id). A suggested list of compounds is given underneath the input text box when you start typing a compound name. You may also select the compound from that list.

  • Number of Routes An integer that specifies the maximum number of the best routes to find and display. The larger that number, the longer it takes to receive an answer.

  • Maximum Time The maximum number of seconds to use for the search. You may limit the search by entering a small number. If the tool times out, the best routes found so far are displayed and a text message states that a suboptimal solution is displayed.

  • Maximum Route Length The maximum number of reactions that the routes found can contain. The larger this number, the longer it takes to receive an answer.

  • MetaCyc Reaction Cost This input box is shown only if MetaCyc is available as a foreign library of reactions to search. This box is not provided from publicly available Web servers such as BioCyc.org. If available, the value entered, which must be nonnegative, is the cost to assign to a reaction from MetaCyc that is included in a route. This option may be obtained by installing Pathway Tools locally at your site and running it in Web server mode on your intranet. See command-line option -metroute-metacyc.

  • Native Reaction Cost The value entered, which must be nonnegative, is the cost to assign to a reaction from the native organism that is included in a route.

  • Atom Loss Cost The value entered, which must be nonnegative, is the cost to assign to an atom that is lost between the source compound and the target compound. This cost applies to all tracked atom species (C, O, P, N, and S). To choose which atom species to track, click the selector on the left of that box and select “Selected atom species”; a new input box opens in which you can type the desired atom species, separated by spaces.

A summary of what each parameter means is provided online by clicking the green question mark located on the left of each labelled input box.

The cost of a route is the sum of all costs: reaction costs from the native database and, if available, the MetaCyc database; and the cost of atom losses.
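The cost definition above can be made concrete with a small calculation. The sketch assumes that each native reaction, each MetaCyc reaction, and each lost atom of a tracked species contributes its configured cost once; the `Route` record, its field names, and the numbers are hypothetical, and the search that finds the routes is not shown.

```python
from dataclasses import dataclass


@dataclass
class Route:
    name: str
    native_reactions: int    # reactions from the selected organism's PGDB
    metacyc_reactions: int   # foreign reactions added from MetaCyc
    atoms_lost: int          # tracked atoms lost between start and goal compounds


def route_cost(route: Route, native_cost: float, metacyc_cost: float, atom_loss_cost: float) -> float:
    """Sum of reaction costs and atom-loss costs for one route."""
    return (route.native_reactions * native_cost
            + route.metacyc_reactions * metacyc_cost
            + route.atoms_lost * atom_loss_cost)


# Hypothetical routes and costs: r1 costs 3 + 0 + 10 = 13, r2 costs 2 + 10 + 0 = 12.
routes = [Route("r1", 3, 0, 2), Route("r2", 2, 1, 0)]
# Results are listed in ascending order of cost, best first.
ranked = sorted(routes, key=lambda r: route_cost(r, 1.0, 10.0, 5.0))
print([r.name for r in ranked])  # ['r2', 'r1']
```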

Once the parameters are entered, clicking the “Search Routes” button initiates the search on the Web server. The solution, that is, the set of routes found, is displayed under the parameters. The routes are sorted in ascending order of cost (best routes first). Depending on the number of paths found, and on whether MetaCyc is involved, it may take some time before the solution is displayed. The browser itself may also take some time to display a large list of reactions, because of the complexity of formatting all compound structures and atom mappings.

Each route found is displayed horizontally across the Web page, with the start compound on the left, the target compound on the right, and intermediate compounds in between. You may need to scroll the window to see some of the compounds, since the whole route may not fit the width of your browser window. On the left of each route is a text summary of its characteristics: the cost of the route, the number of atoms kept from the source compound to the target compound, and its number of reactions.

The chemical structure of each compound involved in the route is displayed and its name appears underneath the structure. If the compound is from the native database, its name is in grey; if the compound is from MetaCyc, its name is in red. Clicking the compound opens a new browser tab to display a complete description of the compound.

Each reaction is shown as a right arrow. If the reaction is from MetaCyc, the arrow is red; if it is from the native organism, the arrow is grey. The protein name is displayed underneath the arrow. Clicking the arrow stem opens a new browser tab with a complete description of the reaction.

For each route, the atom mapping (i.e., atom tracing) is displayed using colors on atoms and bonds from compound to compound. A moiety that is conserved across several compounds is colored with a specific color. Mousing over an atom highlights that atom in all compounds that conserve it. For example, mousing over an atom of the source compound that is conserved all the way to the target compound highlights the corresponding atoms in all intermediate compounds and in the target compound. By mousing over each atom of the source compound in turn, you can quickly find out which atoms are lost, and by which reaction.

A new search can be initiated by changing any parameters and clicking the “Search Routes” button. The current solution will be erased and a new solution will be displayed.

14  How to Learn More

References

[1]   PatMatch home page. ftp://ftp.arabidopsis.org/home/tair/Software/Patmatch/.

[2]   T. Yan, D. Yoo, T. Z. Berardini, L. A. Mueller, D. C. Weems, S. Weng, J. M. Cherry, and S. Y. Rhee. PatMatch: a program for finding patterns in peptide and nucleotide sequences. Nucleic Acids Res, 33(Web Server issue):W262–6, 2005.