Web Site User’s Guide for Pathway Tools-Based Web Sites

Contents

    1  Overview

    2  Selecting the Database to Search

    3  Searching Pathway/Genome Databases
        3.1  Quick Search
        3.2  Search Menu: Object Searches
        3.3  Search Menu → Cross Organism Search
        3.4  Search Menu → BLAST search
        3.5  Search Menu → Google This Site
        3.6  Search Menu → Search Full-text Articles

    4  Web Accounts

    5  Genome Browser
        5.1  Displaying External Tracks on the Genome Browser
        5.2  Comparative Genome Browser

    6  SmartTables
        6.1  SmartTable Structure and Display
        6.2  SmartTable Directory
        6.3  Creating a SmartTable
        6.4  Manipulating SmartTable Contents
        6.5  SmartTable Transformations
        6.6  Enrichment Analysis of SmartTables
        6.7  Exporting and Sharing a SmartTable
        6.8  Browsing SmartTables and Users

    7  Omics Data Analysis

    8  Cellular Overview (Metabolic Map Diagram)
        8.1  Summary of Commands
        8.2  Searching and Highlighting
        8.3  Cellular Omics Viewer — Overlay Experimental Data

    9  Metabolic Models
        9.1  How to Use the Web-MetaFlux Modeling Tool
        9.2  Selecting a Model of Interest
        9.3  Executing a Model
        9.4  Inspecting and Modifying a Metabolic Model

    10  Metabolic Route Search

    11  Pathway Collages

    12  Regulatory Overview (Regulatory Network Diagram)

    13  Comparative Analysis
        13.1  Compare Objects Across Databases
        13.2  Compare Individual Pathways and Reactions
        13.3  Comparative Analysis Tables

    14  Sequence Search and Alignment
        14.1  BLAST Search
        14.2  PatMatch Sequence Search
        14.3  Alignment Viewer

    15  How to Learn More

1  Overview

This document describes how to use Web sites based on the Pathway Tools software from SRI International. Since multiple Web sites such as BioCyc, YeastCyc, AraCyc, and MouseCyc are all based on the same underlying software, the same usage instructions apply to all. (Note that differences in configuration and in software version may introduce some variability among sites.)

Please note that the desktop version of Pathway Tools that you can install locally provides some additional operations compared to the Web capabilities described here. Click here for more details.

2  Selecting the Database to Search

Unless otherwise indicated, all Pathway/Genome Database searches are restricted to a single database. In most cases, a database describes a single organism – although a small number of multi-organism Pathway/Genome Databases exist (examples include MetaCyc and PlantCyc). The database against which searches will be conducted is indicated below the Quick Search box in the page banner.

To search a different database, click on the ‘change organism database’ link below the Quick Search box. In the dialog that pops up, you can find the organism of interest by starting to type its name, by browsing the organism taxonomy, or by querying various properties.

If the site supports user accounts, and you are logged in, you may select one database as your preferred database. This database will be your default selection when starting a new web session.

Once you have selected the desired database from one of the tabs described below, click OK to exit the dialog. This will navigate to the page of summary statistics for the selected database.

Note that if you follow a link to a page for a different organism database, then the selected database for searching will change to match the organism of the currently displayed page.

By Name

By default, the By Name tab will be initially selected. If a small number of databases is available, a full scrollable list of databases is present to select from. When a large number of databases is available, you must start typing or select a starting letter from the alphabetical index to the left of the database list in order to see the list of matching databases. If you start typing an organism name or select a starting letter, the full list of databases (if available) will be replaced by a list of databases matching the typed string or starting with the selected letter — you can use the mouse or the up/down arrows on your keyboard to select the desired database. An organism name will match the string you type if any word in its name (i.e. genus, species or strain name) starts with the string you type.
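
The word-prefix matching rule described above can be sketched as follows. This is only an illustration in Python, not the site's actual implementation: the function name `organism_name_matches`, the case-insensitive comparison, the splitting of names on whitespace, and the sample organism names are all assumptions made for the example.

```python
def organism_name_matches(organism_name: str, typed: str) -> bool:
    """Return True if any word in the organism name starts with the typed string.

    Assumption: matching ignores case and words are separated by whitespace.
    """
    typed = typed.strip().lower()
    if not typed:
        return False
    words = organism_name.lower().split()
    return any(word.startswith(typed) for word in words)


# Hypothetical list of database names, for illustration only.
names = [
    "Escherichia coli K-12 substr. MG1655",
    "Bacillus subtilis subtilis 168",
    "Saccharomyces cerevisiae S288c",
]

# "sub" matches the word "substr." in the first name and "subtilis" in the second.
matches = [name for name in names if organism_name_matches(name, "sub")]
```

Note that a fragment from the middle of a word (for example "oli") would not match under this rule, because only the start of each word is compared.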

In the list of matching databases, some database names may be displayed with a gray background – these indicate databases that have had some level of manual review and/or curation. Tier 1 databases, i.e. those that have received at least a year of literature-based curation, will have a dark gray background. Tier 2 databases, i.e. those with a lower level of manual curation, will have a light gray background. All others are Tier 3 databases, which means they have been computationally generated with little or no manual review.

Lists of your recently used databases and the site’s most popular databases provide shortcuts for selecting those databases.

By Taxonomy

The By Taxonomy tab allows you to select an organism by browsing for it. The number of organism databases in each class of organisms is listed after the class name. The taxonomy tree does not include all taxonomy classes, only those that contain at least one organism database – if a particular taxon does not appear in the tree, it means there is no database available for it or its children. Clicking on a class name will show or hide its list of child taxa. Clicking on an organism name will select that database and show its name at the top.

You may search for any taxon by starting to type its name in the text box. If you select one of the options from the resulting auto-complete box, the taxonomy will automatically expand to show the selected taxon (you must still click on the organism name in the taxonomy to select that database, however).

By Organism Properties

The By Organism Properties tab allows you to query for all organisms that have (or do not have) some property. The types of properties that can be queried (known as the organism “metadata”) include such attributes as when and where and from what host the sample was collected, whether or not the organism is a pathogen, its relationship to oxygen (e.g. aerobic or anaerobic), etc. Not all organism databases contain data for each of these attributes. In the list of properties from which to select, the number of databases that have values for that property is listed in parentheses.

After selecting a property, you can constrain its value, or just select all databases that have (or do not have) any value for that property. To select from a list of all available values, click in the text box. In the resulting list of possibilities, the number in parentheses after each value is the total number of organisms that match that value. If you start to type, the list of visible options will be limited to those that match the string you have typed. Multiple options may be selected by clicking in the text box again after selecting a value – in that case, an organism will satisfy the constraint if it matches any of the selected values (i.e. the values are connected by an implicit OR). For properties whose values consist of free text, you may also query by substring. The first few values that match your substring are shown, but you are not obligated to select any of them. For properties whose values are numeric, a variety of numeric operators are available, as well as the option to select from all available values. If you specify an = constraint, an organism will satisfy the constraint if its value falls within a small range on either side of the specified value – the size of this range depends on the property, and is indicated below with the description of each property. To specify a different range, use a combination of < and > constraints.

Up to six different constraints may be specified (use the “Add Constraint” button to add a new constraint, up to the limit). These may be connected by either AND (an organism must satisfy both constraints) or OR (an organism may satisfy either constraint). Since there is no way to group constraints, if you are building a query that combines both ANDs and ORs, ordering becomes very important. Queries are processed in a left-to-right order, so X AND Y OR P AND Q is interpreted as ((X AND Y) OR P) AND Q, which may not match what was intended. If the ordering of constraints does not allow for a desired query, you may be better off splitting your query into multiple queries and searching for the desired organism one part of the query at a time.
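
To make the left-to-right rule concrete, here is a small Python sketch that folds a chain of constraint results in the order they are written. The function name `evaluate_left_to_right` and the data layout (a first truth value followed by connector/value pairs) are assumptions made for the example; they do not describe how the site stores a query.

```python
from typing import List, Tuple

MAX_CONSTRAINTS = 6  # limit stated in the text above


def evaluate_left_to_right(first: bool, rest: List[Tuple[str, bool]]) -> bool:
    """Combine constraint results strictly left to right, with no precedence.

    `rest` holds (connector, value) pairs, where connector is "AND" or "OR".
    """
    if 1 + len(rest) > MAX_CONSTRAINTS:
        raise ValueError("at most six constraints may be combined")
    result = first
    for connector, value in rest:
        if connector == "AND":
            result = result and value
        elif connector == "OR":
            result = result or value
        else:
            raise ValueError(f"unknown connector: {connector!r}")
    return result


# An organism that satisfies X and Y but neither P nor Q.
X, Y, P, Q = True, True, False, False

# X AND Y OR P AND Q, read left to right: ((X AND Y) OR P) AND Q
left_to_right = evaluate_left_to_right(X, [("AND", Y), ("OR", P), ("AND", Q)])

# The same expression under the usual precedence: (X AND Y) OR (P AND Q)
usual_precedence = (X and Y) or (P and Q)
```

For this organism the left-to-right reading rejects it (`left_to_right` is `False`), whereas the conventional reading would accept it (`usual_precedence` is `True`), which is why mixed AND/OR queries need care.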

The following properties are available for searching:

  • Environment: This property encompasses terms that describe the environmental features and habitats where the sample was taken. This can include biome-level terms, such as desert, deciduous woodland, coral reef; geographic features such as harbor, cliff, lake; and/or environmental material such as air, soil, water. It can also include terms related to host environment (e.g. blood, skin, oral cavity, gut). This slot combines the MIGS concepts biome, feature, material, body_habitat, body_site and body_product. Ideally, terms should be taken from the EnvO or the FMA ontologies, but can also be free text. An organism may have multiple different values for this property.

  • Geographic Location: The geographical origin of the sample, defined by country or sea name, and/or specific region name. This property can have multiple values, e.g. one might be a country name, another a region name, and another text describing the specific location.

  • Latitude: The latitude of the geographical origin of the sample. Values are reported in decimal degrees, in the WGS84 system. Positive numbers are North, negative numbers are South. If you specify an = constraint for this property, all organisms whose latitude is within 10 degrees of the requested value will be included in the result. If you wish a different size range, you will need to specify it explicitly by combining < and > constraints.

  • Longitude: The longitude of the geographical origin of the sample. Values are reported in decimal degrees, in the WGS84 system. Positive numbers are East, negative numbers are West. If you specify an = constraint for this property, all organisms whose longitude is within 10 degrees of the requested value will be included in the result. If you wish a different size range, you will need to specify it explicitly by combining < and > constraints.

  • Depth/Altitude: The depth or altitude in meters at which the sample was collected. Negative numbers are depths, positive numbers are altitudes. If you specify an = constraint for this property, all organisms whose depth or altitude is within 20% of the requested value will be included in the result. If you wish a different size range, you will need to specify it explicitly by combining < and > constraints.

  • Collection Date: The year the sample was collected.

  • Relationship to Oxygen: Whether the organism is an aerobe or anaerobe, and what form.

  • Trophic Level: The position of the organism in a food chain.

  • Temperature Range: A qualitative description of what kind of temperature range the organism grows best in. A mesophile grows best in moderate temperatures, typically between 20 and 45 degrees Celsius. A psychrophile prefers colder environments, whereas a thermophile prefers warmer ones, and a hyperthermophile thrives in extremely hot environments of 60 degrees Celsius and higher.

  • Biotic Relationship: Whether the organism is free-living or in a host, and if the latter, what type of relationship is observed.

  • Pathogenicity: The general class of organisms to which the organism is pathogenic.

  • Host: The host from which the sample was isolated.

  • Health/Disease State: The health or disease state of the specific host at the time of collection.

  • Ploidy: The ploidy level of the genome, e.g. haploid, diploid, triploid, allopolyploid.
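
The ranges applied to an = constraint on the numeric properties above can be sketched in Python as follows. The property keys, the function names, the inclusive bounds, and the use of the absolute value when computing the 20% margin are assumptions made for the example; the sketch also ignores longitude wrap-around at ±180 degrees.

```python
from typing import Tuple


def equals_range(prop: str, requested: float) -> Tuple[float, float]:
    """Return the (low, high) range that an '=' constraint is assumed to cover."""
    if prop in ("latitude", "longitude"):
        # Within 10 degrees on either side of the requested value.
        return (requested - 10.0, requested + 10.0)
    if prop == "depth/altitude":
        # Within 20% of the requested value (negative values are depths).
        margin = 0.20 * abs(requested)
        return (requested - margin, requested + margin)
    raise ValueError(f"no '=' range defined for property: {prop!r}")


def satisfies_equals(prop: str, requested: float, actual: float) -> bool:
    """Check whether an organism's value falls inside the '=' range."""
    low, high = equals_range(prop, requested)
    return low <= actual <= high


# A sample collected at latitude 49.5 satisfies "Latitude = 40".
near = satisfies_equals("latitude", 40.0, 49.5)

# A sample collected at a depth of 700 m does not satisfy "Depth/Altitude = -1000".
too_shallow = satisfies_equals("depth/altitude", -1000.0, -700.0)
```

A narrower or wider window than these defaults corresponds to replacing the single = constraint with a pair of < and > constraints.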

Once you have specified the desired constraints, use the “Find Organisms” button to search for all matching organisms. In the resulting table, which includes all properties for which at least one of the matching organisms has a value, you may click on any column heading to sort by that column. Click on a row to select that organism.

3  Searching Pathway/Genome Databases

3.1  Quick Search

The Quick Search box in the upper right-hand corner of every page is useful if you know the name (or part of the name) or database identifier of the object you are searching for. You may use this box to search for genes, proteins, compounds, RNAs, reactions, pathways, operons, and GO terms. If the query string matches a single object, the page for that object will be displayed immediately. If there are multiple matches, the full list of matches will be shown, organized by the type of object (e.g. gene, protein, etc.).

Some examples of what can be entered into the Quick Search box include:

  • The name of a compound, gene, protein, pathway or other object. Spaces, punctuation and capitalization are ignored. An object will be returned if the query string matches either its common name or one of its synonyms.
    Examples: pyruvate, trpA

  • Any substring of one of the above names that is 3 or more characters in length.
    Examples: kinase, pyr

  • An EC number (full or partial).
    Examples: 1.2.3.3, 1.3.99

  • A compound InChI-key (full or partial).
    Examples: CKLJMWTZIZZHCS-REOHCLBHSA-M, CKLJMWTZIZZHCS-REOHCLBHSA, CKLJMWTZIZZHCS

  • A PGDB internal object identifier for any compound, gene, protein, pathway, reaction, transcription-unit or schema class. Correct capitalization may be required.
    Examples: CPLX0-3661, HEMN-RXN

  • A PGDB internal object identifier for any compound, gene, protein, pathway, reaction, transcription-unit or schema class in some other PGDB served at the same site, followed by ‘@’ and the PGDB identifier (no spaces).
    Examples: trp@ecoo157, HEMN-RXN@META

  • An identifier from some external database to which we maintain links, e.g., a UniProt identifier. Correct capitalization and punctuation are required. Note that our set of links is not complete – just because a search for an external ID returns no result does not mean that we do not have the object in our database.
    Examples: P00561, NP_414543, C00047

A few additional rules govern searches:

  • To match several words or text-fragments simultaneously, type in the words separated by spaces to find an object with all the words in its name, or separated by commas to find objects with any of the words in its name. For example, if you enter nitrate camphor, the program will search for a single object that has both nitrate and camphor in its name. However, entering nitrate, camphor would result in a search for objects which have either nitrate or camphor in their names.

  • If your query text is one or two characters in length, only exact text matches will be returned because of the many matches that would otherwise result. For longer text fragments, the search will return all objects that contain the text rather than match it exactly.

  • Searches may be qualified. Currently we allow two qualifiers:

    1. search:exact Example: trpa search:exact
      This search will be limited to exact matches. In the example given, assuming the current organism is E. coli K-12, without the qualifier there will be several matches including genes, proteins and transcription units. With the qualifier you will be taken directly to the trpA gene page.

    2. type:<type-qualifier> Example: atp type:compound
      This search will be limited to the specified type. In the example given, assuming the current organism is E. coli K-12, without the qualifier a large number of results will be returned of various types. With the qualifier, just the seven compounds with ATP in the name will be returned.
      Allowable type-qualifiers include pathway, gene, enzyme, rna, go-terms, compound, reaction, operon, and organism.
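
The name-matching rules and qualifiers above can be summarized in a short Python sketch. It is a simplified model rather than the site's implementation: the names `ParsedQuery`, `parse_quick_search` and `quick_search_matches` are hypothetical, the short-query rule is applied per term, and lookups by EC number, InChI key, or identifier are not modeled.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

ALLOWED_TYPES = {"pathway", "gene", "enzyme", "rna", "go-terms",
                 "compound", "reaction", "operon", "organism"}


@dataclass
class ParsedQuery:
    terms: List[str]            # normalized words or fragments
    mode: str                   # "all" (space-separated) or "any" (comma-separated)
    exact: bool                 # True when search:exact was given
    type_filter: Optional[str]  # value of type:<type-qualifier>, if any


def normalize(text: str) -> str:
    """Ignore spaces, punctuation and capitalization, as the text describes."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def parse_quick_search(query: str) -> ParsedQuery:
    exact, type_filter, kept = False, None, []
    for token in query.split():
        lower = token.lower()
        if lower == "search:exact":
            exact = True
        elif lower.startswith("type:"):
            type_filter = lower[len("type:"):]
            if type_filter not in ALLOWED_TYPES:
                raise ValueError(f"unknown type qualifier: {type_filter!r}")
        else:
            kept.append(token)
    text = " ".join(kept)
    # Commas mean "any of these words"; spaces mean "all of these words".
    mode, raw_terms = ("any", text.split(",")) if "," in text else ("all", text.split())
    terms = [t for t in (normalize(raw) for raw in raw_terms) if t]
    return ParsedQuery(terms, mode, exact, type_filter)


def quick_search_matches(query: ParsedQuery, name: str) -> bool:
    """Test one object name (or synonym) against the parsed query."""
    target = normalize(name)

    def hit(term: str) -> bool:
        # One- or two-character terms, and search:exact, require exact matches.
        if query.exact or len(term) <= 2:
            return term == target
        return term in target

    combine = all if query.mode == "all" else any
    return bool(query.terms) and combine(hit(term) for term in query.terms)
```

With this model, `nitrate camphor` does not match an object named "nitrate reductase" because both words are required, while `nitrate, camphor` does match it because either word suffices.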

3.2  Search Menu: Object Searches

The Search menu contains links to specialized search pages for Compounds, Genes/Proteins/RNAs, Reactions and Pathways. Each such page contains options for searching using a number of different criteria, either individually or in combination. When the page is initially loaded, only the name searches are active, but by clicking on the different search bars, you can enable or disable additional search criteria. If multiple search criteria are specified for a given search, then unless otherwise specified the results must satisfy all of them (that is, an AND connector is used to combine the different criteria).

The result of every object search is a table containing the names of all objects that satisfy the search, with hyperlinks to their corresponding data pages, along with any additional columns relevant to the particular search. The table will initially be sorted alphabetically by name, but small triangles in the column headers allow the user to sort by any column, in either ascending or descending order.

The sections below describe the different search criteria that are available for each object type.

3.2.1  Search Menu → Search Genes, Proteins or RNAs

  • Search by gene name or database identifier
    Enter a gene name, name fragment, or identifier (either the internal Pathway/Genome Database identifier, or an identifier from some other database). The software will attempt to do auto-completion on the string you have entered based on the contents of the database. If you select one of the auto-complete options, then when you submit the form you will be taken directly to the data page for the selected gene, regardless of any other search criteria you may have specified (i.e., other search criteria are ignored). If you do not select one of the auto-complete options, then the string you typed will be the target of a substring search, which may be combined with other search criteria.

  • Search by product name, database identifier or EC number
    Enter a protein or RNA name, name fragment, identifier (either the internal Pathway/Genome Database identifier or an identifier from some other database, such as UniProt), or a fully specified EC number. The software will attempt to do auto-completion, as for the gene name field.

  • Search/Filter by sequence length
    Enter a minimum and/or maximum sequence length, and specify whether the units referred to are nucleotides or amino acids. If either the minimum or maximum field is left blank, then the sequence length is unconstrained in that direction.

  • Search/Filter by replicon and/or gene map position
    Enter a minimum and/or maximum gene map position, where the units are the number of base pairs from the start of the replicon. The results will include any gene that overlaps any portion of the specified region. If either the minimum or maximum field is left blank, then the map position is unconstrained in that direction. If the selected organism has multiple replicons, then this search option will include a checkable list of replicons – you may select one or more replicons either instead of or in conjunction with the map position in order to constrain the search to genes on a particular replicon.

  • Search/Filter by product molecular weight
    Enter a minimum and/or maximum molecular weight for the gene product in kilodaltons. If either the minimum or maximum field is left blank, then the molecular weight is unconstrained in that direction.

  • Search/Filter by pI
    Enter a minimum and/or maximum pI (isoelectric point) for the gene product. (Typically little information about pI is available for databases other than EcoCyc or MetaCyc.)

  • Search/Filter by small molecule regulator, cofactor, substrate or ligand
    This search option is for retrieving all proteins affected by a specified small molecule in any of several ways. An example might be to search for all enzymes inhibited by ADP, or all enzymes that use Mg²⁺ as a cofactor. Enter the name of a small molecule. We recommend taking advantage of the auto-complete facility to select the correct small molecule, as only an exact match to a compound name can be accepted here. Check all roles that you are interested in for this compound. Note that we consider cofactors to include only compounds that are not modified in any way during the reaction. Molecules such as NAD, which are modified, are considered to be substrates, not cofactors. (Relatively little information about activators, inhibitors, etc. is typically available for databases other than EcoCyc or MetaCyc.)

  • Search/Filter by evidence code
    The evidence ontology appears here in browseable form. Each evidence code includes in parentheses after its name the number of gene products that have their function annotated with that code. Selecting one or more codes to filter on allows you to restrict your search, for example, to all proteins whose function has been established experimentally. The Pathway Tools evidence codes and ontology are described here.

  • Search/Filter by cell component
    The cell component ontology appears here in browseable form, along with the numbers of gene products associated with each cell component. Selecting one or more components allows you to restrict your search to proteins known to be present in those cellular locations. (Note that relatively little information about cellular locations of gene products is available for databases other than EcoCyc or MetaCyc.) The Pathway Tools cell component ontology is described here.

  • Search/Filter by Gene Ontology
    If the selected database has been annotated using Gene Ontology, then you will see a browseable ontology here. Only terms that have one or more gene products annotated to them or their children will be present, and the number in parentheses after each term name indicates the number of gene products annotated to that term or one of its children. You may browse this ontology to a particular term to see all gene products annotated with that term. Clicking on a gene product will then take you directly to the data page for that gene product, just as clicking on a term name will take you to the data page for that term. Alternatively, you can use the checkboxes to indicate that your search should be restricted to include only gene products annotated with the checked terms or their children. If you wish to filter by only a single term, and you know the name or ID for that term, you also have the option of typing it in the text box (using auto-completion to ensure you select the correct term). Select one or more GO evidence codes to restrict the search results to GO term matches with one of the selected evidence codes.

  • Search/Filter by MultiFun term
    If the selected database has been annotated using the MultiFun ontology, then you will see a browseable ontology here. Only terms that have one or more genes annotated to them or their children will be present, and the number in parentheses after each term name indicates the number of genes annotated to that term or one of its children. You may browse this ontology to a particular term to see all genes annotated with that term. Clicking on a gene will then take you directly to the data page for that gene, just as clicking on a term name will take you to the data page for that term. Alternatively, you can use the checkboxes to indicate that your search should be restricted to include only genes annotated with the checked terms or their children.

  • Search/Filter by organism
    This search option will be available only if the selected database is a multi-organism database (such as MetaCyc), and allows you to browse directly for proteins from a particular organism, or to restrict your search to one or more taxonomic groups.

  • Search/Filter by publication
    This search option is useful for retrieving a list of all genes or gene products that cite a given publication or author. Enter either the PubMed ID, the author surname, or part or all of an article title.

  • Search/Filter by existence of protein features
    This search option generates a browseable ontology of protein features. Select one or more feature types to search for proteins annotated with those features.
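
Two of the range filters above, blank minimum/maximum fields and gene map position overlap, can be sketched in Python as follows. The function names and the inclusive treatment of the bounds are assumptions made for the example; a blank field is represented by `None`.

```python
from typing import Optional


def in_range(value: float, minimum: Optional[float], maximum: Optional[float]) -> bool:
    """A blank (None) bound leaves the value unconstrained in that direction."""
    return ((minimum is None or value >= minimum) and
            (maximum is None or value <= maximum))


def gene_overlaps_region(gene_start: int, gene_end: int,
                         region_min: Optional[int],
                         region_max: Optional[int]) -> bool:
    """True if any portion of the gene lies inside the requested region.

    Positions are base pairs from the start of the replicon. The gene's
    coordinates are sorted first so that either strand orientation works.
    """
    left, right = sorted((gene_start, gene_end))
    return ((region_max is None or left <= region_max) and
            (region_min is None or right >= region_min))


# A gene spanning 100..200 overlaps the region 150..300 even though it is not
# fully contained in it.
partial = gene_overlaps_region(100, 200, 150, 300)

# With the maximum left blank, any gene ending at or after position 150 is included.
open_ended = gene_overlaps_region(100, 200, 150, None)
```

The overlap test is deliberately looser than a containment test, matching the statement that results include any gene overlapping any portion of the region.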

3.2.2  Search Menu → Search Compounds

  • Search for compound by name or ID
    Enter a compound name, name fragment, or identifier (either the internal Pathway/Genome Database identifier, or an identifier from some other database such as PubChem or LIGAND). The software will attempt to do auto-completion on the string you have entered based on the contents of the database. If you select one of the auto-complete options, then when you submit the form you will be taken directly to the data page for the selected compound, regardless of other search criteria you may have specified (i.e., other search criteria will be ignored). If you do not select one of the auto-complete options, then the string you typed will be the target of a substring search, which may be combined with other search criteria.

  • Search/Filter by ontology
    This option allows you to browse the compound ontology. Each compound class includes in parentheses after its name the number of instance-level compound objects that are members of that class. Clicking a + icon shows the classes and compounds that belong to a particular class. The ontology may be used in one of two ways. By selectively clicking on + icons, you can browse to find a compound or compound class of interest, and click directly on its name to visit the data page for that compound. Alternatively, you can check the checkbox next to one or more class names to limit your search (which may also include other search criteria) so as to only include compounds that belong to one of the checked classes.

  • Search/Filter by monoisotopic molecular mass
    To search for matches to mass spectrometry results, enter one or more monoisotopic molecular masses, and specify the desired tolerance.

  • Search/Filter by molecular weight
    This option can be used to specify either a minimum molecular weight value, a maximum molecular weight value, or both. If either the minimum or maximum field is left blank, then the molecular weight is unconstrained in that direction.

  • Search/Filter by chemical formula (partial or full)
    If one or more element symbols are entered without a number, then the result will include any compound containing those elements (and possibly some others). If an element symbol is followed by a number, then only compounds with exactly that number of that element in its chemical formula will be included in the result. For example, the query string C12N will retrieve all compounds with exactly 12 carbons, one or more nitrogens, and possibly some other elements. The search is case-insensitive unless case is needed to disambiguate. For example, either co or CO will retrieve all compounds containing both carbon and oxygen, but Co will instead retrieve all compounds containing cobalt.

  • Search by InChI string
    InChI is short for International Chemical Identifier, and offers a way to search for a molecule by its chemical structure. We support only exact string matching for InChI strings.

  • Search by InChI key
    An InChI key is a compressed formulation of the InChI string. You may enter either the full InChI key, or a partial InChI key that omits either the charge or the isomer and charge information.
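
The chemical formula rules described in this list can be illustrated with a Python sketch. It is one possible reading of those rules, not the site's parser: the function names are hypothetical, only a handful of element symbols are included, and the way ambiguous lowercase input is resolved (single-letter symbols are tried before case-insensitive two-letter symbols) is an assumption.

```python
import re
from typing import Dict, List, Optional, Tuple

# Small illustrative subset of element symbols (assumption: not a full table).
ONE_LETTER = {"H", "C", "N", "O", "P", "S", "K"}
TWO_LETTER = {"Co", "Cl", "Ca", "Mg", "Fe", "Na", "Zn"}


def parse_formula_query(query: str) -> List[Tuple[str, Optional[int]]]:
    """Split a query such as 'C12N' into (element, count) constraints.

    A count of None means "one or more atoms of this element".
    """
    constraints = []
    i = 0
    while i < len(query):
        pair = query[i:i + 2]
        if len(pair) == 2 and pair[0].isupper() and pair[1].islower() and pair in TWO_LETTER:
            symbol, i = pair, i + 2                 # case disambiguates: "Co" is cobalt
        elif query[i].upper() in ONE_LETTER:
            symbol, i = query[i].upper(), i + 1     # "co" and "CO" read as C then O
        elif len(pair) == 2 and pair.capitalize() in TWO_LETTER:
            symbol, i = pair.capitalize(), i + 2    # e.g. "mg" can only be Mg
        else:
            raise ValueError(f"cannot interpret formula query at position {i}")
        digits = re.match(r"\d+", query[i:])
        count = None
        if digits:
            count = int(digits.group())
            i += len(digits.group())
        constraints.append((symbol, count))
    return constraints


def formula_matches(formula: Dict[str, int],
                    constraints: List[Tuple[str, Optional[int]]]) -> bool:
    """Elements with a number must match exactly; others need at least one atom."""
    for symbol, count in constraints:
        present = formula.get(symbol, 0)
        if count is None and present < 1:
            return False
        if count is not None and present != count:
            return False
    return True


# 'C12N' requires exactly 12 carbons and at least one nitrogen.
query = parse_formula_query("C12N")
hit = formula_matches({"C": 12, "H": 22, "N": 2, "O": 2}, query)
miss = formula_matches({"C": 6, "H": 12, "O": 6}, query)
```

Elements not mentioned in the query are ignored by `formula_matches`, which corresponds to the "and possibly some other elements" behavior described above.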

3.2.3  Search Menu → Search Reactions

  • Search for reaction by EC number or name
    Enter a reaction EC number or name (typically an enzyme name). EC numbers can be either full or partial. The software will attempt to do auto-completion on the name or EC number. If you select one of the auto-complete options, then when you submit the form you will be taken directly to the data page for the selected reaction or reaction class, regardless of any other search criteria you may have specified (i.e., other search criteria will be ignored). If you do not select one of the auto-complete options, then the string you typed will be the target of a substring search, which may be combined with other search criteria.

  • Search/Filter by substrates or products
    Enter a compound name to retrieve all reactions in which that compound participates either as a substrate or product. Multiple compounds can be specified, separated by either OR, AND or AND NOT. When multiple compounds are specified, they can appear anywhere in the reaction equation, or they can be restricted to being on either the same or opposite sides of the reaction relative to each other. We recommend taking advantage of the auto-complete facility to select the correct compound, as only an exact match to a compound name can be accepted here.

  • Search/Filter by whether or not reaction is catalyzed by an enzyme
    Specify whether to include only enzyme-catalyzed reactions for which an enzyme has been identified, enzyme-catalyzed reactions for which no enzyme has been identified, or spontaneous reactions.

  • Search/Filter by ontology
    This option allows you to browse the Pathway Tools reaction ontology. Each reaction class includes in parentheses after its name the number of reactions that are members of that class. The ontology may be used in one of two ways. By selectively clicking on + icons, you can browse to find a reaction of interest, and click directly on its name to visit the data page for that reaction. Alternatively, you can check the checkbox next to one or more class names to limit your search (which may also include other search criteria) so as to only include reactions that belong to one of the checked classes. Note that there are two parallel reaction classification systems, one in which reactions are classified by conversion type (this includes the entire EC hierarchy), and another in which the reactions are classified by substrate. Most reactions in the database have parents in both classification systems.

  • Search/Filter by cellular location
    Select one or more cell compartments to filter the result to only include reactions that occur in those compartments. Transport reactions will not be included.
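
The same-side and opposite-side restriction available when searching by substrates or products can be sketched in Python as follows. The representation of a reaction as a pair of compound sets, the function names, and the placement keywords are assumptions made for the example. Only the case of two compounds that must both be present is covered; the OR and AND NOT connectors are not modeled.

```python
from typing import FrozenSet, Set, Tuple

# A reaction is modeled as (compounds on the left side, compounds on the right side).
Reaction = Tuple[FrozenSet[str], FrozenSet[str]]


def sides_of(reaction: Reaction, compound: str) -> Set[str]:
    """Return the side(s) of the reaction equation on which the compound appears."""
    left, right = reaction
    sides = set()
    if compound in left:
        sides.add("left")
    if compound in right:
        sides.add("right")
    return sides


def involves_both(reaction: Reaction, a: str, b: str, placement: str = "anywhere") -> bool:
    """Check that both compounds participate, optionally constraining their sides."""
    sides_a, sides_b = sides_of(reaction, a), sides_of(reaction, b)
    if not sides_a or not sides_b:
        return False
    if placement == "anywhere":
        return True
    if placement == "same":
        return bool(sides_a & sides_b)
    if placement == "opposite":
        return any(x != y for x in sides_a for y in sides_b)
    raise ValueError(f"unknown placement: {placement!r}")


# Simplified glucose phosphorylation reaction, used only as sample data.
hexokinase: Reaction = (
    frozenset({"ATP", "D-glucose"}),
    frozenset({"ADP", "D-glucose 6-phosphate"}),
)

atp_adp_opposite = involves_both(hexokinase, "ATP", "ADP", "opposite")
atp_adp_same = involves_both(hexokinase, "ATP", "ADP", "same")
```

Here ATP and ADP sit on opposite sides, so the "opposite" query succeeds and the "same" query fails; the direction in which the reaction is written does not affect either result.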

3.2.4  Search Menu → Search Pathways

  • Search for pathway by name
    Enter a pathway name, name fragment, or internal Pathway/Genome Database identifier. The software will attempt to do auto-completion on the string you have entered based on the contents of the database. If you select one of the auto-complete options, then when you submit the form you will be taken directly to the data page for the selected pathway. This is true regardless of any other search criteria you may have specified (i.e. other search criteria will be ignored). If you do not select one of the auto-complete options, then the string you typed will be the target of a substring search, which may be combined with other search criteria.

  • Search/Filter by ontology
    This option allows you to browse the Pathway Tools pathway ontology. Each pathway class includes in parentheses after its name the number of pathways that are members of that class. The ontology may be used in one of two ways. By selectively clicking on + icons, you can browse to find a pathway of interest, and click directly on its name to visit the data page for that pathway. Alternatively, you can check the checkbox next to one or more class names to limit your search (which may also include other search criteria) so as to only include pathways that belong to one of the checked classes.

  • Search/Filter by number of reactions
    Enter a minimum and/or maximum number of desired reactions in the pathway. If either the minimum or maximum field is left blank, then the number of reactions is unconstrained in that direction.

  • Search/Filter by substrates present
    Enter one or more compound names to retrieve all pathways in which those compounds participate as a reactant, a product, or an intermediate. If you enter more than one compound, then the pathway must involve all specified compounds in order to be included in the results. We recommend taking advantage of the auto-complete facility to select the correct compound, as only an exact match to a compound name can be accepted here.

  • Search/Filter by evidence code
    The Pathway Tools evidence ontology appears here in browseable form. Each evidence code includes in parentheses after its name the number of pathways that are annotated with that code. Selecting one or more codes to filter on allows you to restrict your search, for example, to all pathways whose presence has been established experimentally. The Pathway Tools evidence codes and ontology are described here.

  • Search/Filter by organism
    This search option will be available only if a multi-organism database (such as MetaCyc) is the selected database, and allows you to browse for pathways that are curated as occurring in a particular organism based on experimental information. The fact that a pathway is not stated to be present in a given organism does not mean that the organism lacks the pathway: pathways are curated for only a small subset of the organisms in which they appear.

  • Search/Filter by expected taxonomic range
    This search option will be available only if a multi-organism database (such as MetaCyc) is the selected database. Each pathway in MetaCyc has been annotated with its expected taxonomic range. This search option allows you to restrict your search to include only those pathways you could reasonably expect to see for a given taxonomic grouping, for example, to restrict your search to pathways seen in plants.

  • Search/Filter by publication
    This search option is useful for retrieving a list of all pathways that cite (either directly or through one of the pathway’s enzymes, genes, subpathways or substrates) a given publication or author. Enter either the PubMed ID, the author surname, or part or all of an article title.
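
How the reaction-count and substrate filters above combine can be sketched in Python. This is an illustrative model only, not Pathway Tools code: the Pathway record, the filter_pathways function, and the sample pathways are all hypothetical, and a blank form field is represented by None.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Pathway:
    # Hypothetical record: pathway name, reaction count, and substrate names.
    name: str
    n_reactions: int
    substrates: set = field(default_factory=set)


def filter_pathways(pathways, min_rxns: Optional[int] = None,
                    max_rxns: Optional[int] = None, required=()):
    """Keep pathways inside the reaction-count bounds that involve ALL required compounds."""
    wanted = set(required)
    hits = []
    for p in pathways:
        if min_rxns is not None and p.n_reactions < min_rxns:
            continue  # below the minimum
        if max_rxns is not None and p.n_reactions > max_rxns:
            continue  # above the maximum
        if not wanted <= p.substrates:
            continue  # at least one required compound is missing
        hits.append(p)
    return hits


# Made-up sample data for illustration.
pathways = [
    Pathway("pathway A", 10, {"pyruvate", "ATP", "NADH"}),
    Pathway("pathway B", 8, {"pyruvate", "NADH"}),
    Pathway("pathway C", 3, {"ATP"}),
]

# Minimum of 5 reactions, no maximum, and both compounds must be present.
result = filter_pathways(pathways, min_rxns=5, required=["pyruvate", "ATP"])
print([p.name for p in result])
```

Leaving both bounds as None and passing no required compounds returns every pathway, which mirrors a form in which those fields are left blank.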

3.2.5  Search Menu → Search DNA or mRNA sites

Many databases include information about DNA or mRNA sites other than genes. The kinds of sites that can be searched here include transcription units, promoters, terminators, transcription-factor binding sites, riboswitches, REP elements, transposons, phage attachment sites, etc., although most databases will not include all of these site types.

  • Search/Filter by Site Type
    Choose one or more site types from among those available in the currently selected database. You must specify at least one site type.

  • Search/Filter by replicon and/or map position
    Enter a minimum and/or maximum map position, where the units are the number of base pairs from the start of the replicon. The results will include any site that overlaps any portion of the specified region. If either the minimum or maximum field is left blank, then the map position is unconstrained in that direction. If the selected organism has multiple replicons, then this search option will include a checkable list of replicons – you may select one or more replicons either instead of or in conjunction with the map position in order to constrain the search to sites on a particular replicon.

  • Search/Filter by regulatory protein or RNA
    Enter a transcription factor, sigma factor or regulatory protein or RNA name. Use the autocomplete functionality to select a full name, as no substring matching is done on the regulator name. If no match is found, then the database contains no regulatory interactions or sites involving that regulator. This filter is compatible only with searches for transcription units, promoters, transcription factor binding sites, attenuators, or mRNA binding sites.

  • Search/Filter by evidence code
    The evidence ontology appears here in browseable form. Selecting one or more codes to filter on allows you to restrict your search, for example, to all promoters whose location has been established experimentally. The Pathway Tools evidence codes and ontology are described here.
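
The map-position filter described above ("any site that overlaps any portion of the specified region", with a blank field leaving that side unconstrained) amounts to an interval-overlap test. The following is a minimal sketch; the function name overlaps_region is hypothetical, None stands for a blank field, and inclusive base-pair coordinates are an assumption.

```python
from typing import Optional


def overlaps_region(site_start: int, site_end: int,
                    region_min: Optional[int] = None,
                    region_max: Optional[int] = None) -> bool:
    """Return True if the site overlaps any portion of the region.

    A bound of None means the region is unconstrained in that direction.
    Coordinates are assumed to be inclusive base-pair positions.
    """
    if region_min is not None and site_end < region_min:
        return False  # site lies entirely to the left of the region
    if region_max is not None and site_start > region_max:
        return False  # site lies entirely to the right of the region
    return True


# A site that straddles the lower bound still counts as a hit.
print(overlaps_region(900, 1100, region_min=1000, region_max=2000))
```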

3.2.6  Search Menu → Search Growth Media

Some databases may include sets of growth media, along with information about whether or not the organism can grow on a particular medium and under what conditions (for example, gene knockout studies can indicate whether the organism can grow on a particular medium in the absence of a particular gene). To see the full list of growth media for a database, including an indication of which media have associated knockout data, click on the All Growth Media for this Organism button. Use the other fields of this form to search for growth media that meet certain criteria.

  • Search for growth media by name
    Enter a growth medium name or name fragment. The software will attempt to auto-complete the string you have entered based on the contents of the database. If you select one of the auto-complete options, then when you submit the form you will be taken directly to the data page for the selected growth medium. This is true regardless of any other search criteria you may have specified (i.e., other search criteria will be ignored). If you do not select one of the auto-complete options, then the string you typed will be the target of a substring search, which may be combined with other search criteria.

  • Search/Filter by compounds present in the medium
    Enter up to four compound names to retrieve all growth media that contain either any or all of the specified compounds. We recommend taking advantage of the auto-complete facility to select the correct compound, as only an exact match to a compound name can be accepted here.

  • Search/Filter by compounds not present in the medium
    Enter up to four compound names to retrieve all growth media that do not contain any of the specified compounds. We recommend taking advantage of the auto-complete facility to select the correct compound, as only an exact match to a compound name can be accepted here.

  • Search/Filter by observed growth
    Select one or more growth levels to retrieve media on which any of the selected levels of growth have been observed. If no gene knockout is specified, then the growth levels refer to wild-type growth. If a gene is specified, then the growth levels refer to knockouts of that gene. When specifying a gene, we recommend using the auto-complete facility to select the correct gene, as only an exact name match can be accepted here.
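
The two compound filters above (compounds present, with "any" or "all" semantics, and compounds not present) can be modeled with set operations. This is a sketch under stated assumptions: the media_matching function and the sample media are hypothetical, and the limit of four names simply mirrors the form.

```python
def media_matching(media, present=(), mode="all", absent=()):
    """Return names of media that satisfy the compound filters.

    media   -- dict mapping a medium name to a set of exact compound names
    present -- up to four compounds that must appear ("all") or of which
               at least one must appear ("any")
    absent  -- up to four compounds, none of which may appear
    """
    if mode not in ("any", "all"):
        raise ValueError("mode must be 'any' or 'all'")
    present, absent = set(present), set(absent)
    if len(present) > 4 or len(absent) > 4:
        raise ValueError("the form accepts at most four compound names")
    hits = []
    for name, compounds in media.items():
        if present and mode == "all" and not present <= compounds:
            continue  # some required compound is missing
        if present and mode == "any" and not (present & compounds):
            continue  # none of the listed compounds is present
        if absent & compounds:
            continue  # an excluded compound is present
        hits.append(name)
    return hits


# Made-up media for illustration.
media = {
    "medium 1": {"glucose", "ammonium chloride"},
    "medium 2": {"glycerol", "ammonium chloride"},
    "medium 3": {"glucose"},
}
print(media_matching(media, present=["glucose"], absent=["ammonium chloride"]))
```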

3.2.7  Search Menu → Search DNA or mRNA Sites

Some databases include DNA or mRNA sites that are not genes, such as transcription-units, promoters, terminators, binding-sites, extragenic-sites, etc. This page includes a checklist of all types of such sites that are present in the current database. Select one or more types that you wish to search. The other fields of this form allow you to further constrain your search.

  • Search/Filter by replicon and/or map position
    Enter a minimum and/or maximum map position, where the units are the number of base pairs from the start of the replicon. The results will include any site that overlaps any portion of the specified region. If either the minimum or maximum field is left blank, then the map position is unconstrained in that direction. If the selected organism has multiple replicons, then this search option will include a checkable list of replicons – you may select one or more replicons either instead of or in conjunction with the map position in order to constrain the search to sites on a particular replicon.

  • Search/Filter by regulatory protein or RNA
    This option allows you to search for all sites that bind to or are regulated by the specified protein or RNA. Possible proteins or RNAs can include transcription factors, sigma factors, sRNAs, sRNA accessory proteins, and other proteins or RNAs that regulate transcription or translation. As you start typing in the textbox, a menu of possible completions will appear. This menu will only include proteins and RNAs that are known to regulate transcription or translation; you must select the appropriate value from the auto-complete menu.

  • Search/Filter by small molecule ligand
    This option allows you to search for all sites that are regulated in some way by the specified small molecule. The small molecule can bind directly to or otherwise directly regulate a site (as in the case of riboswitches), or can bind to a transcription factor to either enable or prevent it from binding to a site. As you start typing in the textbox, a menu of possible completions will appear. This menu will only include small molecules that are known to regulate transcription or translation — you must select the appropriate value from the auto-complete menu.

  • Search/Filter by evidence code
    The evidence ontology appears here in browseable form. Selecting one or more codes to filter on allows you to restrict your search, for example, to all promoters whose location has been established experimentally. The Pathway Tools evidence codes and ontology are described here.

3.2.8  Search Menu → Advanced Search

The Advanced Search tool facilitates generation of queries that are more complex than those supported by the object search tools described above. Using the Advanced Search tool, you can write queries that combine data from multiple organisms or multiple types of objects, and you can search fields that are not supported by the individual object search pages. Detailed instructions for using the Advanced Search tool to construct complex queries are available here.

3.3  Search Menu → Cross Organism Search

The Cross Organism Search tool is only available on the BioCyc.org web servers. It enables queries across all the organisms on the BioCyc.org website.

  • Search Terms
    Enter the term(s) you wish to search for. The search matches on substrings, so “trp” will match “trpA”, “trpB”, etc. If you enter multiple terms, you can select whether all terms must be present, or just any one (or more) of them. For example, searching for “trp yersinia” with “any” selected will yield all entries for “Yersinia” and all entries for “trp”, an enormous number of entries; selecting “all” instead will limit the search to a smaller, more manageable number of results.

  • Fields to Search
    Select “Names” if you want the search performed only on the names of the entities you are interested in. Selecting “Summary” means that your search will include looking for matches in the Summary string. The latter may be less useful. For example, if the summary says “X is not in any way similar to Y” and you search on “Y”, you will retrieve a reference to the “X” entity, though you are likely not interested in it.

  • Types to Restrict Search To
    This enables you to select the types of entities you’d like to search on.

  • Number of Results Per Page
    The results are presented in a “paged” table; that is, not all the results are returned in a single table (unless the result set is smaller than this value), and one can page backwards and forwards through the results.

  • Choose Organisms
    You can choose a set of organisms individually by name or property. You can also select all members of a taxonomically-related group, for example all Bacteria.

Search results are presented sorted by relevance (or match strength) in a table with clickable links, which link to the details for each matched entity. Each column in the table can be used to sort the results, with the relevance being used as the default. Re-sorting the table re-sorts all of the results, and this sorting is preserved as you navigate through the results table, from one page to the next.
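
The difference between “any” and “all” in the Search Terms option can be illustrated with a small substring matcher. This is a sketch, not the search engine behind the site: the matches function and the sample entry names are hypothetical, and case-insensitive matching is an assumption.

```python
def matches(entry_name: str, terms: str, mode: str = "all") -> bool:
    """Substring-match whitespace-separated terms against an entry name.

    mode="all" requires every term to occur; mode="any" requires at least one.
    Matching is case-insensitive here, which is an assumption of this sketch.
    """
    haystack = entry_name.lower()
    found = [term.lower() in haystack for term in terms.split()]
    return all(found) if mode == "all" else any(found)


# Made-up entry names for illustration.
entries = ["trpA Yersinia pestis", "trpB Escherichia coli", "lacZ Yersinia pestis"]

any_hits = [e for e in entries if matches(e, "trp yersinia", mode="any")]
all_hits = [e for e in entries if matches(e, "trp yersinia", mode="all")]
print(len(any_hits), len(all_hits))
```

With “any”, every entry containing either term is returned; with “all”, only entries containing both remain.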

3.4  Search Menu → BLAST search

This facility (not available for MetaCyc) allows you to perform sequence-similarity searches using the BLAST program to compare your protein or nucleic acid sequence against the complete genome of the selected organism database.

3.5  Search Menu → Google This Site

The Search Menu → Google This Site command uses Google to perform a full text search over this entire Web site. Searches will not be restricted to the selected database, and can locate text strings found in page comments, help pages, and other page content not queryable by other means. Submitting this form will direct the user outside this Web site to a page generated by Google. A Google full text search is also offered as an option when a Quick Search fails to return any result (or does not return the desired result).

3.6  Search Menu → Search Full-text Articles

Textpresso is a package for indexing and searching a corpus of biological literature. Textpresso searches are available for searching a large Escherichia coli literature corpus only at the BioCyc Web site, and are available only when EcoCyc is the selected database.

Ontology Searches

An ontology is a carefully constructed vocabulary of terms, often called a controlled vocabulary. The terms are organized into a classification hierarchy (also called a taxonomy). Ontologies can be used to browse and search for objects by drilling down from more general categories to more specific ones. Each Pathway/Genome Database contains several ontologies. Those that can be searched are available from the Ontologies sub-menu in the Search menu. These ontologies can also be accessed from the object search page for their particular object type. The browseable ontologies are:

  • Genome → Browse Gene Ontology
    Not all databases contain Gene Ontology (GO) annotations, but for those that do, GO can be browsed to see which gene products are assigned to which GO terms. Each database only contains those terms to which one or more gene products are actually assigned, so a term may be missing from the browseable ontology even though it is a valid GO term. GO can also be browsed from the Search Menu → Genes/Proteins/RNAs page.

  • Metabolism → Browse Pathway Ontology
    The Pathway Tools pathway ontology classifies pathways into groups based on their biological functions, and based on the classes of metabolites that they produce and/or consume. It is also accessible from the Search Menu → Pathways page.

  • Metabolism → Browse Enzyme Commission Ontology
    Enzyme Commission numbers (EC numbers) form a classification scheme for enzymes, based on the chemical reactions they catalyze. Pathway/Genome Databases use EC numbers to organize enzyme-catalyzed reactions (rather than the enzymes themselves) based on type of transformation and class of substrates. The EC ontology can also be browsed from the Search Menu → Reactions page (as a child of Chemical-Reactions). Both Search Menu → Reactions and Search Menu → Genes/Proteins/RNAs pages allow searching by EC number.

  • Metabolism → Browse Compound Ontology
    The Pathway Tools compound ontology describes small molecules, that is, chemical compounds that are not macromolecules. It is also accessible from the Search Menu → Compounds page.

4  Web Accounts

Pathway Tools Web accounts give users the ability to customize their experience when accessing PGDBs via the Web, and to store SmartTables of objects in their account.

Web site accounts provide several benefits. Through your account you can:

  • Define SmartTables of genes, pathways, metabolites, and more for analysis and to share with colleagues

  • Customize the appearance of pages on this Web site

  • Store organism sets for comparative operations

  • Receive important email updates about this Web site

To create an account, click “Create New Account” at the top right of most Web pages. (If those words are missing it probably means that Web Accounts are not enabled for this Pathway Tools Web site. The Pathway Tools User Guide describes how to enable and configure Web Accounts for a Pathway Tools Web site.)

5  Genome Browser

The genome browser can be used to examine one replicon (chromosome or plasmid) at a time. Its tracks capability can be used to visualize high-throughput datasets in a genome context.

The genome browser can be invoked by

  • Selecting Genome → Genome Browser from the main menu

  • Clicking on a replicon listed in the organism summary page (that page can be created by selecting Analysis → Summary Statistics)

  • Clicking on the “Genome Browser” button in gene pages, on the Map Position line

At the top of the genome-browser page, the full length of the chromosome is shown at low resolution. A region of the chromosome can be selected for display at much higher magnification in the lower part of the screen. The selected region will be drawn using as many lines as will comfortably fit on the Web browser page. The full chromosome view at the very top indicates the magnified region by means of a red, rectangular cursor.

Selection of the magnified region can be achieved by the following methods:

  • Clicking on a vertical tick mark within the full chromosome line at the top will show the immediate neighborhood of that position. The tick marks in the magnified region can also be clicked on, to recenter the region around the selected tick mark quickly.

  • Start and end base-pair positions can be entered in the corresponding text entry boxes; clicking the Go button displays that region.

  • The region around a gene can be shown by entering the gene name in the corresponding text entry box and clicking on the Go button. The selected gene will be visually highlighted.

  • The panel of navigation arrows to the left of the legend can be used for moving to a nearby region. The panel allows lateral translation to the left or right, and also serves to zoom in or out.

The magnified section indicates the transcription direction of genes by rectangular blocks with an arrow at one end, pointing from the 5’ to the 3’ end. ORFs for actual or inferred proteins have symmetrical arrowheads (with the arrow apex in the center), whereas RNA genes have an asymmetrical arrowhead (with the apex at the top edge). Phantom- and pseudo-genes are crossed out with a big, diagonal X. When a gene wraps across more than one line, a zigzag at the end of the line indicates that the gene continues on the next line. Clicking on a gene brings up the corresponding gene description page.

Gene arrows filled with solid colors have transcription unit (operon) information available. All the adjacent genes that are part of a given operon are assigned the same color. Genes that have not been assigned to any transcription unit are not colored. Additionally, transcription-units are indicated by a gray background area behind the genes, spanning the entire region of the operon.

Moving the mouse-cursor over the genes reveals their product name and the length in base pairs of the intergenic region between the chosen gene and its neighboring genes to the left and right. If the number of base pairs carries a minus sign, the genes overlap by that many bases. As an example:

  Gene: xdhB

  Product: putative xanthine dehydrogenase subunit, FAD-binding domain

  Intergenic distances (bp): xdhA< +11 xdhB -3 >xdhC

This means that there are 11 bp to the left of xdhB before xdhA is reached, but to the right, xdhC overlaps with xdhB by 3 bp.
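
The sign convention above can be reproduced from gene coordinates. The following sketch assumes 1-based inclusive coordinates and genes sorted left to right; the function names and the coordinates are made up so that they yield the same distances as the example, and are not taken from any database.

```python
def signed_gap(left_end: int, right_start: int) -> int:
    """Bases between two adjacent genes; a negative value is the overlap length."""
    return right_start - left_end - 1


def describe_neighbors(left, gene, right) -> str:
    """Format distances in the style of the tooltip; each gene is (name, start, end)."""
    to_left = signed_gap(left[2], gene[1])
    to_right = signed_gap(gene[2], right[1])
    return f"{left[0]}< {to_left:+d} {gene[0]} {to_right:+d} >{right[0]}"


# Hypothetical coordinates chosen to reproduce the example distances.
xdhA = ("xdhA", 1, 100)
xdhB = ("xdhB", 112, 500)
xdhC = ("xdhC", 498, 900)
print(describe_neighbors(xdhA, xdhB, xdhC))
```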

If the overlap between adjacent genes is more than a small amount, the shorter gene is drawn above the longer gene to avoid visual clashes.

When zooming in to a great level of detail, transcription start sites and terminators are drawn. Transcription start sites are indicated by small arrows that point toward the 3’ end of the transcript. Moving the mouse-cursor over a transcription start site reveals the operon it is part of. The transcription factors controlling the operon are also shown, with a plus sign meaning activation and a minus sign meaning inhibition. Clicking on a transcription start site brings up the corresponding transcription unit description page.

5.1  Displaying External Tracks on the Genome Browser

External datasets can be shown alongside the display of a replicon region, in the form of additional tracks that are uploaded by the user. The supported tracks file format is GFF, version 2. A short description of this format can be found on the help page, reached by clicking on the green icon containing a question mark, on the far right side of the genome browser’s navigational controls.

The GFF file allows definition of segments on the chromosome that are denoted by a start and stop base-pair position. In an attribute field of the file, a name can be assigned to the segment, and in a score field, a numerical value (such as an expression value) can be supplied. This allows a broad range of different data types to be shown in the genome browser, aligned with the genes and transcription units that a PGDB already describes. This could include alternate gene predictions, or the results of expression experiments. Each specified segment can state a source and feature value, allowing different segment types to be supplied in one file. The external track mode of the genome browser will display different combinations of source/feature values grouped together. If in these groups some of the shown segments overlap due to their base-pair positions, such horizontal segments will be displayed on separate lines, to avoid visual clashes.
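
To make the file layout concrete, the sketch below parses a few made-up GFF version 2 records and groups the segments by their source/feature combination, as described above. GFF version 2 records are tab-separated with the columns seqname, source, feature, start, end, score, strand, frame, and an optional attribute column. The parse_gff2 function and the sample values are hypothetical; how the genome browser reads a segment name out of the attribute column is covered by the help page and is not assumed here, so the attribute is kept as a raw string.

```python
from collections import defaultdict

# Made-up example records, tab-separated as in GFF version 2.
GFF_TEXT = "\n".join([
    "##gff-version 2",
    "\t".join(["chr", "arrayA", "expression", "1000", "1500", "2.5", "+", ".", "probe1"]),
    "\t".join(["chr", "arrayA", "expression", "1400", "1900", "-1.2", "+", ".", "probe2"]),
    "\t".join(["chr", "predictorB", "gene", "2000", "2600", ".", "-", ".", "orf7"]),
])


def parse_gff2(text):
    """Group GFF2 segments by their (source, feature) pair."""
    tracks = defaultdict(list)
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        cols = line.split("\t")
        if len(cols) < 8:
            continue  # not a complete GFF2 record
        seqname, source, feature, start, end, score = cols[:6]
        tracks[(source, feature)].append({
            "seqname": seqname,
            "start": int(start),
            "end": int(end),
            "score": None if score == "." else float(score),  # "." means no score
            "attribute": cols[8] if len(cols) > 8 else "",
        })
    return dict(tracks)


tracks = parse_gff2(GFF_TEXT)
for (source, feature), segments in tracks.items():
    print(source, feature, len(segments))
```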

To view data from such a GFF file in an external track, first open the genome browser. Next click the “Show Tracks” button to the right of the gene name dialog box. This will enter the external tracks mode, in which the magnified genome region will no longer wrap to fill the screen, instead making room for external tracks that will be displayed underneath. Vertical hair lines will be shown for easier visual alignment of features in external tracks with the magnified region.

Next, add track data from an external data file using the controls at the bottom of the page. The data file can be specified through a Web site URL (click the “Add Track” button to the right of “Load track data from GFF file via URL”), or from a file on your computer’s hard disk (click “Browse...” to find the file, then click its associated “Add Track” button). Depending upon the size of your GFF file, the upload can take several minutes. During this time, the page will not respond, and you should not click any other controls. Once the file has been uploaded and parsed successfully, the page refreshes to show the new track.

The external tracks display will show the feature name on the left, the sequence name if one is included, and the appropriate color to match the feature’s score, if a score value was found in the GFF file. Following the display of a track, you can continue to browse the genome normally, using the standard Left, Right, Zoom Out, and Zoom In controls, and the Gene Name box.

You can display data from more than one GFF file at the same time. Load each file individually using the procedure described above. Tracks from the first file loaded will appear just below the gene line. Tracks from the second file loaded will appear below those from the first, and so on. The order of the tracks can be changed, by left-clicking on the underlined track titles on the left side, which name the feature type. The popup menu allows the chosen track to be moved up or down by one step relative to the current ordering.

The horizontal bars represent the feature data found in the GFF track file. These are arranged in rows distributed vertically, so as to help prevent overlapping features from running into each other and being indistinguishable. The number of distributed rows may vary with the zoom scale, so that features can fit; there is no other meaning to the number of lines. The length of each horizontal bar shows the extent of each individual feature reading. The color is drawn from a spectrum that shows the magnitude of a score. In order to get a better feel for this magnitude, a graph of the same track feature data is also plotted above the horizontal bars. In the default graph mode, each feature score is represented by a horizontal line spanning the feature’s start and end base-pair coordinates. The magnitude of the score is represented as the height on the graph. This offers an intuitive method of viewing trends and anomalies in the data at a glance.

In the bar graph mode, the rectangular area between the feature’s horizontal line and the baseline (corresponding to a score of zero) is filled by a solid color. This is useful for features that tend to be very short, which may otherwise be hard to see.
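
The row layout of the horizontal bars, in which overlapping features are placed on separate rows, can be approximated with a greedy interval-packing pass. This is one possible approach, shown as a sketch; it is not claimed to be the algorithm the genome browser uses, and the assign_rows function and sample segments are hypothetical.

```python
def assign_rows(segments):
    """Assign each (start, end) segment to the first row where it does not overlap.

    Returns a list of row indices in the same order as the input segments.
    """
    order = sorted(range(len(segments)), key=lambda i: segments[i])
    row_ends = []                 # end coordinate of the last segment on each row
    rows = [0] * len(segments)
    for i in order:
        start, end = segments[i]
        for row, last_end in enumerate(row_ends):
            if start > last_end:  # fits to the right of what is already on this row
                row_ends[row] = end
                rows[i] = row
                break
        else:
            row_ends.append(end)  # no existing row fits, so open a new one
            rows[i] = len(row_ends) - 1
    return rows


# Made-up segments: the first two overlap, as do the last two.
segments = [(100, 200), (150, 250), (210, 300), (260, 400)]
print(assign_rows(segments))
```

As in the display described above, the number of rows carries no meaning beyond keeping overlapping features apart.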

For each of multiple tracks viewed simultaneously, you can show or hide the horizontal bars, the graph plot, or both. Use the pull-down selector control next to the listing of the track at the bottom of the page, which switches between “Show both graph and horizontal”, “Show both bar graph and horizontal”, “Show only graph”, “Show only bar graph”, “Show only horizontal”, and “Both invisible”. This control allows you to stack graphs from different tracks close to each other, so that you can compare them and see fine differences between them.

It is also possible to shift the plotted range of this graph for each track file viewed. Beside the listing of the track there is also a line saying “graph Y range from [ ] to [ ]” with a “Set” button. Fill in the desired lower and upper Y coordinates of the range, press the “Set” button, and that particular graph will be redisplayed with that setting. Entries may be in integers or decimals. The lower range must be less than the upper range coordinate. Score values that fall outside the range will result in the display of a horizontal line just a little bit outside the graph range, to visually indicate this over- or underflow condition.

In graph mode, the entire track is assigned a color from a predefined set of colors. However, it is possible for the user to choose the color of a track, by adding a new header comment line close to the top of the GFF file, before uploading the file. An example line looks like this:

##color green

Several common color names can be substituted for "green".

5.2  Comparative Genome Browser

The comparative genome browser can be used to examine several replicons (chromosomes or plasmids) simultaneously, side by side. This view facilitates comparison of related organisms to observe similarities and differences in their gene arrangements. For the alignment to work, ortholog links must exist among genes of the organisms to be compared.

The comparative genome browser is usually entered from a page describing a gene. To invoke it, select Align in Multi-Genome Browser from the operations box on the right side of the page. You will first be asked to specify the organisms whose genome regions you wish to compare. The selected set of organisms is remembered for some time by the Web browser. If you wish to change them, use the command Change organisms/databases for comparison operations.

When the comparative genome browser is invoked from a gene page, that gene and its organism orchestrate the rest of the alignment. In the display, the top-most replicon is the reference, against which the comparisons are made by following the ortholog links for every gene of the top replicon in its visible section. The selected gene that is the focus of the comparison is highlighted on each replicon by a thick outline and a slanted hashed background. These selected genes are lined up at the center position of their lengths. The magnified region can be adjusted by the following methods:

  • An alignment for a new gene can be displayed by entering the gene name in the gene entry box, then clicking the “Go” button.

  • The panel of navigation arrows can be used to translate the view left or right, and to zoom in and out.

Genes with solid colors have links to orthologs. Corresponding orthologs are assigned the same color, out of a set of a dozen colors that will be reused repeatedly. Genes for which no ortholog links were found in the PGDB are not colored. The other display features are the same as described for the regular genome browser.

6  SmartTables

A SmartTable is a collection of PGDB objects, such as genes or pathways, together with associated data, that can be displayed in tabular form. SmartTables (formerly called “Web Groups”) allow you to store experimental results (e.g., a set of genes of interest from an experimental study), analyze those results (e.g., perform an enrichment analysis to learn if those genes share common biological processes, or paint those genes into a metabolic map diagram), and share SmartTables with colleagues. SmartTables can be created from tabular data files, and from query results, and SmartTables can be exported to files. Transformations, filtering, and set operations on SmartTables can be performed. Example transformations include:

  • Transform a gene SmartTable to a SmartTable of pathways in which the genes participate

  • Transform a SmartTable of genes to a SmartTable of promoters, or transcription-factor binding sites, or transcriptional regulators, that control those genes

  • Transform a SmartTable of pathways to a SmartTable of metabolites that are substrates in the pathway

Web SmartTables are stored in a user’s web account, so to create SmartTables you must have an account and be logged in. Users who aren’t logged in can view and download SmartTables that others have made public. Each SmartTable has a persistent URL, so SmartTables can be used as a data publishing and sharing platform. SmartTables can be private, public, or shared with a selected set of users.

Firefox is the recommended browser to use with SmartTables. Other browsers will work but have not been as thoroughly tested with SmartTables and thus minor issues may arise. Use of Internet Explorer is discouraged, but, for the most part, will work as well.

A number of SmartTables operations can also be invoked via web services.

6.1  SmartTable Structure and Display

Some terminology: A SmartTable consists of a set of rows and columns. A cell is the intersection of a row and a column, and can contain one or more values, which may be Pathway Tools objects (such as genes or pathways), numbers, or strings.

A SmartTable is displayed on its own web page (see the figure below). The URL of this page is persistent and may be bookmarked or shared. At the top of this page are some metadata about the SmartTable, such as its title and a textual description (these can both be edited by clicking on them). Information about the SmartTable’s contents and sharing status is also displayed.

In this example, we started with a SmartTable of genes (in the first column after the checkboxes), and added some properties.

Typically the first column of a SmartTable will be a set of PGDB frames (e.g., a set of genes from a search or from an experimental result) and other columns will be properties or other values derived from the first column (e.g., the products of the genes in the first column). The blue column headings are clickable and can be used to select individual columns for certain operations. A SmartTable must always contain at least one column.

If a SmartTable has more elements than will fit on a page, paging controls will be displayed above the column headings. All rows can also be displayed on one page.

The checkboxes on the left are used to select subsets of the SmartTable’s rows for deleting or copying to a new SmartTable. Note that checkboxes work properly over multiple pages: you can check some rows, navigate to a new page and check some more, and the rows checked on the first page will still be considered checked. Checking/unchecking the checkbox in the header will check or uncheck all rows in the SmartTable (not just the ones on the current page). This checkbox behavior also applies to any lists of SmartTables.

6.2  SmartTable Directory

The SmartTable directory page provides a list of accessible SmartTables. It may be accessed via any of the items under the SmartTables menu. The directory is composed of several tabs:

By default the SmartTable directory is ordered by update time (most recently changed first), but it can be resorted using the sort arrows in column headings.

6.3  Creating a SmartTable

There are a number of ways to create a SmartTable. To create a saved SmartTable you must be logged in to the PGDB website; otherwise the SmartTable will be temporary.

6.3.1  Creating a SmartTable From a Search

The results of web searches (e.g., from the Search → Search compounds page) can be converted to a SmartTable by means of the “Turn into a SmartTable” button.

6.3.2  Creating a SmartTable Manually

An empty SmartTable can be created and filled in by hand. To do this:

  1. Go to the SmartTables directory page (SmartTables → My SmartTables)

  2. Select the New → Empty SmartTable action from the operations box on the right. This creates a SmartTable with a single column and no rows.

  3. Add a row by clicking the “Add row” link at the bottom of the display.

  4. The row has an autocompleting text field. Enter an object name (e.g., a gene or metabolite name) and hit Enter.

  5. Repeat steps 3 and 4 for the rest of the SmartTable.

6.3.3  Creating a SmartTable Via Tab-Separated File Import

A SmartTable can be created by importing a text file in tab-separated value format.

  1. Go to the SmartTable directory page.

  2. Select the New → SmartTable from Uploaded File... action from the operations box on the right.

  3. A panel will appear that will prompt for a file to be selected and uploaded.

Unless “Try to make objects” is selected in the upload menu, values in uploaded files are initially just strings. To turn them into recognized database objects (e.g., genes) after importing, select the appropriate column and use the Column → Set Type... action.
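As a minimal sketch, a tab-separated file of the kind described above could be generated with Python's standard csv module. The gene names, numeric values, and the helper name to_tsv are illustrative assumptions; this guide does not state whether the importer expects a header row, so none is written here.

```python
import csv
import io


def to_tsv(rows):
    """Serialize a list of rows (lists of values) as tab-separated text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical example data: an object name followed by a numeric value.
rows = [
    ["trpA", 1.8],
    ["trpB", -0.4],
]
tsv_text = to_tsv(rows)
print(tsv_text, end="")
```

The resulting text can be saved to a file and selected in the upload panel. The values arrive as strings unless “Try to make objects” is selected or the column type is set afterwards.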

6.3.4  Creating a SmartTable Containing Chromosomal Regions and Sequence Variation Data

A SmartTable can be created by importing a text file that specifies the coordinates of replicon regions, and associated sequence variants, in a tab-separated file format. A special transformation supports further analysis and interpretation of sequence-variant data; see Section 6.5.2.

To perform an import via a file of replicon coordinates, do the following:

  1. Select the organism with which the SmartTable will be associated.

  2. Go to the SmartTable directory page.

  3. Select the New → SmartTable from Replicon Coordinates... action from the operations menu on the right.

  4. A panel will appear that will prompt for a file to be specified and uploaded.

The input file format is as follows (an example file is available at http://brg.ai.sri.com/ptools/replicon-coords.dat):

  • Column 1: replicon name (as listed in organism summary) – defaults to first replicon stored in PGDB, invalid/blank value uses default

  • Column 2: region start coordinate

  • Column 3 (optional): region end coordinate — defaults to start

  • Column 4 (optional): nucleotide letter(s) for the substitution at this region

  • Column 5 (optional): comment describing the region

Replicons can be specified in the file by either frame name or common name. Nucleotide coordinates for the start and end positions are relative to the replicon specified. If only a start or only an end position is given, the region is defined as that single nucleotide. Invalid data may produce a row containing “NIL”, and that row may produce other unexpected results.
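The column layout above can be sketched as a small Python helper that builds one line of such a file. The function name format_region and the replicon name and coordinates in the example are hypothetical. The helper writes the end coordinate explicitly (equal to the start when none is given) and drops trailing empty optional columns; whether the importer accepts an empty column 4 followed by a comment is an assumption.

```python
def format_region(replicon, start, end=None, substitution="", comment=""):
    """Build one tab-separated line: replicon, start, end, substitution, comment."""
    # Column 3 defaults to the start coordinate (a single-nucleotide region).
    end_value = start if end is None else end
    fields = [replicon, str(start), str(end_value), substitution, comment]
    # Drop trailing empty optional columns (columns 4 and 5).
    while len(fields) > 3 and fields[-1] == "":
        fields.pop()
    return "\t".join(fields)


# Hypothetical replicon name and coordinates, for illustration only.
lines = [
    format_region("CHROMOSOME-1", 4500),
    format_region("CHROMOSOME-1", 12000, 12002, "AGT", "example variant"),
]
print("\n".join(lines))
```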

The resulting SmartTable will contain either one or two columns — the first column will contain the specified regions; the second column will contain region comments, if supplied; see example below. Clicking on a cell in the first column will open the genome browser around that region.

6.3.5  Creating a SmartTable From an Existing SmartTable

There are a number of ways to create new SmartTables from existing SmartTables.

A SmartTable can be copied via the New → Copy of this SmartTable action. Additionally, if the SmartTable can only be viewed but not edited, such as “Special SmartTables”, a message will appear prompting the user to create a writeable copy of the SmartTable.

The contents of a SmartTable column can be turned into a new SmartTable, using the + icon that appears in column headings, or using the New → SmartTable from Column action (these are equivalent operations).

Rows of a SmartTable can be used to create a new SmartTable that shares the same column headings by selecting the desired rows using the checkboxes at the beginning of each row, then using the New → SmartTable from Selected Rows action.

See also the Filtering operation which has the option of creating a new SmartTable based on a filtered subset of rows.

6.4  Manipulating SmartTable Contents

SmartTables can be manipulated in a large number of ways, both at a fine level of granularity (such as editing individual cells), and by applying transformations to an entire SmartTable.

6.4.1  Adding a Property Column

Property columns show attributes (slot values) of an object, such as the molecular weight of a compound or the pI of a protein. The most common situation is to add a property column for the objects listed in the first column of the SmartTable, but the Add Property Column dropdown menu will list available properties to show for the currently selected column. Frequently used properties include Common-Name, Comment, Citations, and Creation-Date. The ability to create a property column or an enrichment column from another property column may not be available.

6.4.2  Adding an Empty Column

Columns can be added to a SmartTable from the Add → Column action (which creates an empty editable column), or by using the transform and property selectors (see below).

6.4.3  Editing a Column

Editable columns (which are those that are not defined by a transform or other computation) can be edited by clicking the edit icon in the column header. This changes the cells to editable fields. Clicking the icon a second time will turn off editing for that column.

6.4.4  Adding a Row

A row can be added by means of the link at the bottom of a SmartTable, or using the Add → Row action (they are equivalent). Any editable cells in the new row are displayed in edit mode, so values can be entered.

Additionally, certain object pages, such as those for a gene or protein, have an “Add to SmartTable” button, which places the object in an existing SmartTable.

6.4.5  Deleting Rows

Rows can be deleted by selecting them using the checkboxes on the left of the display, then choosing the Delete → Delete checked rows action.

6.4.6  Moving and Deleting Columns

Columns can be rearranged with the Column → Move ... menu items. They can be deleted with the Columns → Delete menu item. These operations apply to the selected column. A column can also be deleted by clicking on the “–” icon in the column header. This icon will not be present if deleting the column is not currently a valid action, such as when the SmartTable has only one column.

6.4.7  Sorting

SmartTables can be resorted on the values of any column by means of the sorting controls (triangles) in column headers.

6.4.8  Filtering

Filtering means selecting a subset of rows from a SmartTable according to some criterion. The filter menu context may differ between column types. For example, numeric columns will be given options to specify a range value condition, such as greater than, equal to, less than, and so on. Likewise, string columns have options to filter based on various substring conditions. To filter, select the appropriate column and choose the Filter action. A dialog appears that allows for selection based on the filtering criterion.

The filter can either modify the SmartTable in place or create a new SmartTable with a specified name. In either case, if the resulting SmartTable is empty, an error is displayed instead of completing the operation.

6.4.9  Set Type

The values in cells have a type, which may be either a Pathway Tools object (e.g., a gene) or a string or number. Generally values in a single column will all be of the same type, but this is not required. The type can be controlled by means of the Column → Set Type... action. In general this is used after importing data from a file, to turn string values into Pathway Tools objects.

6.4.10  Set Operations

Under the Set Operations... action, set operations such as union, intersection, and difference can be performed between the current SmartTable and a second SmartTable. A new SmartTable can be created or the current SmartTable can be modified in place. For example, these operations can compute the intersection (items common to both) of two SmartTables.
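The three operations named above behave like ordinary set operations. As an illustration only (this is not how the web site implements them), Python's built-in set type shows the expected results for two hypothetical SmartTables of gene names:

```python
# Hypothetical contents of two SmartTables, reduced to their first columns.
table_a = {"trpA", "trpB", "trpC"}
table_b = {"trpB", "trpC", "trpD"}

union = table_a | table_b          # items in either table
intersection = table_a & table_b   # items common to both tables
difference = table_a - table_b     # items in the first table only

print(sorted(union))
print(sorted(intersection))
print(sorted(difference))
```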

6.5  SmartTable Transformations

Transformations apply a computational procedure to all cells within a selected SmartTable column to generate a new column in that SmartTable. To perform a transformation, select a column, then click on the Transformations drop-down menu. Depending on the type of objects contained within the selected column, different transformations will be available, e.g., different transformations are available for genes than for metabolites. Overall, the difference between properties and transformations is that properties of an object are stored in the database containing that object, whereas transformations are computed by the software.

The easiest way to see what transformations are available for a column type in question is to view a SmartTable containing that type of column and examine the transformations drop-down menu.

Example transformations include: transforming a column of genes to their upstream binding sites, to their promoters, to their Gene Ontology terms, to their orthologous genes within another PGDB, or to the set of genes regulated by those genes; transforming a column of pathways to the genes within the pathways, to the metabolites within the pathways, or to the reactions within the pathways. The following subsections present transformations on metabolites, and a transformation for analyzing sequence variant information.

6.5.1  Transformations on Metabolite Columns

The menu below shows the transformations available when a column of metabolites is selected. For example, the “Pathways of compound” transformation will generate a new column where each cell contains the set of metabolic pathways in which the compound in the selected cell of the same row occurs. Imagine that we want to create a new SmartTable consisting of all pathways that contain the metabolites in the preceding SmartTable, that is, a new SmartTable consisting of the result of the preceding transformation. We can do so by clicking the “+” at the top of the column containing the pathways. That operation will create a new SmartTable with two columns: Column 1 contains a non-duplicative list of all pathways in the preceding column; Column 2 lists the metabolites from Column 1 of the previous SmartTable that are present in each pathway.

The transformation “Compare – remove objects present in other species PGDB” will generate a new column containing those metabolites not present in another specified PGDB. The transformation “Compounds – proteins that bind compound” will generate a new column containing all proteins known to bind each corresponding metabolite (e.g., as an enzyme activator or transcription-factor ligand).

6.5.2  Transformations on Chromosomal Regions Containing Sequence-Variant Information

This transformation takes as its starting point a SmartTable of genome regions and sequence substitutions within those regions, as described in Section 6.3.4. The transformation “Sequence – nearest gene to DNA region” adds several new computed columns to such a SmartTable, shown here:

Column 3 lists the gene whose coding region is nearest to the DNA region in the first column.

Columns 4 and 5: If the coding region of the nearest gene overlaps the DNA region in the first column, then Column 4 says “intragenic” followed by the DNA strand from which the gene is transcribed; Column 5 lists the amino-acid change caused by the substitution at the given region (the column is empty for RNA-coding genes). If the coding region of the nearest gene does not overlap the region in the first column, Column 4 states the distance from the region in the first column to the coding region of the nearest gene, and Column 5 is blank.

A natural next analysis step is to click on the top of the Nearest Gene column and then perform an enrichment analysis (described in the next section) to determine what these genes have in common.

6.6  Enrichment Analysis of SmartTables

Enrichment analysis is a computational technique for identifying known categories of objects (e.g., pathways) that are statistically over-represented in a set of objects (e.g., genes that are significantly up-regulated in an expression experiment). For example, enrichment analysis allows us to ask whether a set of genes contains more genes regulated by a given transcriptional regulator than one would expect to occur by chance, or more metabolites in a given metabolic pathway than one would expect to occur by chance. Please see the Pathway Tools Users Manual for more information on enrichment, including a description of the parameters available on the web.

Enrichment analysis can be invoked on a column of objects in a SmartTable by:

  1. Selecting the column to be operated on (such as a column of genes or a column of compounds)

  2. Choosing an item from the Enrichments selector and clicking the button

  3. Choosing parameters from the dialog

This operation always creates a new SmartTable, which contains three columns: the enriched objects, the p-value, and the matched objects from the original SmartTable. The new SmartTable will be sorted by p-value, lowest (most significant matches) first.
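To make the idea of statistical over-representation concrete, the sketch below computes a p-value as the upper tail of the hypergeometric distribution, which is equivalent to a one-sided Fisher exact test. This guide does not state which statistic or multiple-testing correction the web site applies, so the choice of test, the function name, and the numbers in the example are all assumptions.

```python
from math import comb


def enrichment_p_value(population, category_size, sample_size, hits):
    """Probability of seeing at least `hits` category members when drawing
    `sample_size` objects without replacement from `population` objects,
    of which `category_size` belong to the category."""
    max_hits = min(category_size, sample_size)
    total = comb(population, sample_size)
    tail = sum(
        comb(category_size, k) * comb(population - category_size, sample_size - k)
        for k in range(hits, max_hits + 1)
    )
    return tail / total


# Hypothetical numbers: 4000 genes in the PGDB, 20 in a pathway,
# 50 genes in the SmartTable column, 5 of which are in the pathway.
p = enrichment_p_value(4000, 20, 50, 5)
print(p)
```

Smaller p-values indicate that the overlap is less likely to have occurred by chance, which is why the result table is sorted with the lowest p-value first.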

6.7  Exporting and Sharing a SmartTable

Once a SmartTable is defined, there are a few things that can be done with it (other than browse it on the web). The SmartTable can be exported in a variety of ways or shared with others.

6.7.1  Export to a Spreadsheet File

SmartTables can be exported to tab-separated value format files using the SmartTables → Export → to Spreadsheet File ... menu command. When selected, the option is given whether to export the frame names of objects stored in the SmartTable or to use the common name of the objects. Keep in mind that, generally, it’s easier to re-import data by using frame names in the generated file, but the file will also be more difficult to read.

6.7.2  Export to a FASTA File

SmartTables with a gene column can be exported to FASTA format files using the Export → to FASTA File... action. The sequences are taken from the currently selected column, and the sequence names are string representations of the values in the first column.

6.7.3  Paint Data (on Cellular Overview)

Objects of the appropriate types (any types that have frame representations in the current PGDB, such as compounds, reactions, or genes) can be displayed over the cellular overview using the Paint Data → On Cellular Overview command. Be sure to select the appropriate column first.

If the first column of the SmartTable contains objects (e.g. genes, compounds), and one or more other columns contain numerical data values, then the SmartTable can be displayed on the Cellular Overview Omics Viewer using the command Paint Data → On Cellular Overview Omics Viewer. You will be asked to select the data columns you wish to display, and to specify what kinds of values they are (e.g. absolute or relative, log or linear). Another way to paint data from a SmartTable on the Cellular or Regulatory Overview is to navigate to the desired overview and use the command Overlay Experimental Data → From SmartTable.

6.7.4  Sharing a SmartTable

By default, SmartTables are readable and writeable only by their creator. Access can be granted to other users by means of the Sharing dialog, available via the Sharing... command.

Access by the general public is controlled by the first two checkboxes. “Public?” means that anyone can view the contents of the SmartTable; “Public and writable?” means that anyone can view and edit the contents of the SmartTable (editing is restricted to logged-in users).

Access can also be controlled on a per-user level using the “Share with users” boxes, which accept email addresses of registered Pathway Tools users.

6.8  Browsing SmartTables and Users

6.8.1  User Pages and Directory

As part of SmartTables, an enhanced public user page has been created, which can be accessed by clicking on any user name in the SmartTable directory (try the Public SmartTables tab). A user page displays the user’s name, an optional user-settable graphic picture, and a list of the user’s public SmartTables. There is also a user directory available.

6.8.2  Browsing a SmartTable

Under the Browse this SmartTable command, the current SmartTable can be browsed one row at a time. Depending on the type of data in the SmartTable, various text and image elements will be displayed in a single page for a row. In the upper-left corner of the page, a grey box will be shown that displays the name of the SmartTable being browsed as well as a Next link to move to the next row’s page. The Clear link can be used to stop browsing and stay in the current page.

7  Omics Data Analysis

Pathway Tools-based Web sites offer multiple tools for analysis of gene expression, metabolomics, and other large-scale datasets.

Omics data file format is described in Section 8.3.1.

A number of these capabilities are also available as web services.

Multi-Omics Analysis

The following tools can be used for analysis of combined datasets from multiple high-throughput technologies.

  • Paint multi-omics data onto metabolic map — Colors reaction arrows in the metabolic-map diagram with colors indicating gene-expression and/or protein-expression levels; color metabolite nodes in the diagram with colors indicating metabolomics data. Data can be uploaded from a file or imported from a recently visited SmartTable. The uploaded data can contain a mixture of rows describing genes, proteins, and metabolites.
    [documentation]
    [To start: Metabolism → Cellular Overview then Right Operations Menu → Overlay Experimental Data]
    When uploading a file that contains multiple types of data, be sure to specify that the items in the first column can be any of genes, proteins, compounds, etc.

  • Paint multi-omics data onto a Pathway Collage — Generate a user-customizable diagram containing a set of pathways of interest, overlaid with multi-omics data. There are multiple ways to specify the pathways to be included.
    [documentation]
    [To start: Metabolism → Pathway Collages]

  • Paint multi-omics data onto pathway diagram — Allows visualization of large-scale datasets on individual pathways.
    [file format documentation]
    [To start: Visit a pathway page, then select Right Operations Menu → Customize or Overlay Omics Data on Pathway Diagram]
    In the pop-up window, in addition to customizing which pathway elements appear in the diagram, you may specify a file of Omics data to be displayed. If the file contains multiple types of data, be sure to specify that the items in the first column can be any of genes, proteins, compounds, etc.

Gene Expression and Proteomics Analysis

Many of the following tools can accept proteomics as well as gene-expression data.

  • Paint gene-expression data onto metabolic map — Colors reaction arrows in the metabolic-map diagram with colors indicating gene-expression and/or protein-expression levels. Data can be uploaded from a file, imported from PortEco or GEO, or imported from a recently visited SmartTable.
    [documentation]
    [To start: Metabolism → Cellular Overview then Right Operations Menu → Overlay Experimental Data]

  • Table of highly over/under-expressed pathways — When painting a dataset onto the metabolic map, the upload dialogue offers the option of generating a table of the N most highly perturbed pathways. Each pathway is assigned a Pathway Perturbation Score (PPS), which attempts to measure the overall extent to which a pathway is up- or down-regulated, by averaging the level of deviation from zero (in either direction) over all the reactions in the pathway. If multiple data columns are specified, a differential score (DPPS) is computed, which attempts to measure the extent to which a pathway exhibits change between timepoints.
    [documentation]
    [To start: Use previous tool but for the Show data: field, select either As a table of pathway diagrams or Both on this diagram and as a table in a new tab and specify the number of pathways to include in the table.]

  • Paint gene-expression data onto a Pathway Collage — Generate a user-customizable diagram containing a set of pathways of interest, overlaid with gene expression data. There are multiple ways to specify the pathways to be included.
    [documentation]
    [To start: Metabolism → Pathway Collages]

  • Paint gene-expression data onto single pathway diagram.
    [file format documentation]
    [To start: Visit a pathway page, then select Right Operations Menu → Customize or Overlay Omics Data on Pathway Diagram]
    In the pop-up window, in addition to customizing which pathway elements appear in the diagram, you may specify a file of Omics data to be displayed.

  • Paint gene-expression data onto regulatory map — Colors genes in the regulatory overview diagram with colors indicating gene-expression levels. Data can be uploaded from a file, imported from GEO, or imported from a recently visited SmartTable.
    [documentation]
    [To start: Genome → Regulatory Overview then Right Operations Menu → Overlay Experimental Data]

  • Paint gene-expression data onto genome map diagram — Colors genes in the genome map with colors indicating gene-expression levels. This tool is not yet available for Web sites, but does function in the desktop version of Pathway Tools.

  • Enrichment Analysis — Given a SmartTable of genes, determines whether that gene set is statistically over-represented for genes within certain metabolic pathways, or for genes in certain Gene Ontology categories, or for genes that are regulated by shared regulators.
    [documentation]
    [To start: Visit a SmartTable page]

  • SmartTable Transformations — Given a SmartTable of genes or proteins (e.g., the highly expressed genes from an expression dataset), transform those genes to the set of pathways containing the genes, or to the set of regulators that regulate those genes.
    [documentation]
    [To start: Visit a SmartTable page]

  • Genome Browser Tracks — Superimpose positional datasets such as ChIP-chip or RNA seq data on genome regions for visual interpretation.
    [documentation]
    [To start: Genome → Genome Browser then, if not following this direct link, click Show Tracks button.]

Metabolomics Analysis

  • Monoisotopic mass search — Enables searching of multiple monoisotopic masses against all metabolites in the selected PGDB.
    To start: Search → Search Compounds.

  • Paint metabolomics data onto metabolic map — Colors metabolite nodes in the metabolic-map diagram with colors indicating observed metabolite levels. Data can be uploaded from a file, or imported from a recently visited SmartTable.
    [documentation]
    To start: Metabolism → Cellular Overview then Right Operations Menu → Overlay Experimental Data
    When uploading a file, be sure to specify that the items in the first column are compound names and/or identifiers.

  • Table of highly perturbed pathways — When painting a dataset onto the metabolic map, the upload dialogue offers the option of generating a table of the N most highly perturbed pathways. Each pathway is assigned a Pathway Perturbation Score (PPS), which attempts to measure the overall extent to which a pathway is up- or down-regulated, by averaging the level of deviation from zero (in either direction) over all the reactions in the pathway. If multiple data columns are specified, a differential score (DPPS) is computed, which attempts to measure the extent to which a pathway exhibits change between timepoints.
    [documentation]
    [To start: Use previous tool but for the Show data: field, select either As a table of pathway diagrams or Both on this diagram and as a table in a new tab and specify the number of pathways to include in the table.]

  • Paint metabolomics data onto a Pathway Collage — Generate a user-customizable diagram containing a set of pathways of interest, overlaid with metabolomics data. There are multiple ways to specify the pathways to be included.
    [documentation]
    [To start: Metabolism → Pathway Collages]

  • Paint metabolomics data onto single pathway diagram.
    [file format documentation]
    To start: Visit a pathway page, then select Right Operations Menu → Customize or Overlay Omics Data on Pathway Diagram
    In the pop-up window, in addition to customizing which pathway elements appear in the diagram, you may specify a file of metabolomics data to be displayed. Be sure to specify that the items in the first column are compound names and/or identifiers.

  • Metabolite Enrichment Analysis — Given a set of metabolites, determines whether that metabolite set is statistically over-represented for metabolites within certain metabolic pathways.
    [documentation]
    To start: Visit a SmartTable page.

  • SmartTable Transformations — Given a SmartTable of metabolites (e.g., the highly expressed metabolites from a metabolomics dataset), transform those metabolites to the set of pathways containing the metabolites, or to the set of reactions containing those metabolites.
    [documentation]
    To start: Visit a SmartTable page.

Omics Pop-Ups for Cellular Overview

The Cellular Overview enables the user to drill-down to see the data available for specific genes or metabolites. First, mouse over a reaction or metabolite in the Cellular Overview and lock the resulting tooltip in place to create a caption window.

Omics Pop-Ups enable users to see bar charts, X–Y plots, or heat maps of omics data for single genes or metabolites, or for all genes or metabolites within a pathway. The pop-ups can be customized for a publication or to otherwise make them more legible. To view an omics pop-up for single genes or metabolites, first examine the associated caption. The caption pop-up will include an “Omics” button if there is omics data associated with the selected node. Selecting the “Omics” button transforms the pop-up into a graphic display of the data. To view data for a whole pathway, right-click on a reaction node in a pathway for which there is omics data to expose a menu including the item “Display Omics Data for Every Node in Pathway: <pathway name>”. The graphics will include the omics data for every gene or metabolite in the pathway to which this reaction belongs.

Generating a Table of Most Highly Perturbed Pathways

Uploaded omics data can be used to generate a table of most highly perturbed pathways. Pathways are ranked by Pathway Perturbation Score (PPS) for single-column datasets, and by Differential Pathway Perturbation Score (DPPS) for multi-column datasets. The user can select how many of the highest-scoring pathways are included in the table. To generate this table, start from the Cellular Overview Diagram (Metabolism → Cellular Overview) for the organism of interest. Use the Upload Data from File command to enter your datafile information. By default, the “Show data” option will overlay the data onto the Cellular Overview Diagram. However, you can instead request that the data be shown either “As a table of pathway diagrams” or “Both on this diagram and as a table in a new tab” — either one of these options will cause a table to be generated. You must specify how many pathways should be included in the table.

The Pathway Perturbation Scores and Differential Pathway Perturbation Scores are computed as follows:

PPS: The PPS attempts to measure the overall extent to which a pathway is up- or down-regulated, by combining the activation levels of all reactions in the pathway. A Reaction Perturbation Score (RPS) is computed for each reaction as the maximum absolute value of all data values for objects associated with the reaction (e.g. genes for gene expression data, compounds for metabolomics data). If the data values are not already in log format, they are first converted to log values. For example, if a reaction has three associated genes with gene expression values -1.5, .3 and 1.2, the RPS would be 1.5.

To compute the PPS, we sum the squares of the RPSs for all reactions in the pathway for which data are available, divide by the number of reactions for which data are available, and take the square root of the result (we use the square of the RPSs instead of a traditional average in order to weight larger RPSs more heavily). For a pathway containing N reactions: $\mathrm{PPS} = \sqrt{(\mathrm{RPS}_1^2 + \mathrm{RPS}_2^2 + \cdots + \mathrm{RPS}_N^2)/N}$.
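The RPS and PPS definitions above can be sketched in a few lines of Python. The function names are hypothetical, the input values are assumed to be on a log scale already, and returning None for a pathway with no data is an assumption made for this sketch rather than documented behavior.

```python
import math


def reaction_perturbation_score(values):
    """RPS: maximum absolute data value among the objects of one reaction."""
    return max(abs(v) for v in values)


def pathway_perturbation_score(reactions):
    """PPS: root mean square of the RPS over reactions that have data.

    `reactions` is a list with one list of data values per reaction;
    reactions without data are given as empty lists and are skipped.
    """
    scores = [reaction_perturbation_score(vals) for vals in reactions if vals]
    if not scores:
        return None  # assumption: no score when no reaction has data
    return math.sqrt(sum(s * s for s in scores) / len(scores))


# First reaction uses the gene-expression values from the text (RPS 1.5);
# the second has one value; the third has no data and is ignored.
pps = pathway_perturbation_score([[-1.5, 0.3, 1.2], [0.5], []])
print(pps)
```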

DPPS: For multi-column datasets, the Differential PPS (DPPS) computes the maximum extent to which a pathway is perturbed between columns. The DPPS is computed the same way as the PPS, by combining RPS values for each reaction. However, when computing the RPS from the entities (e.g. genes, metabolites) associated with a reaction, the data value we use is not the entity’s expression value for any single column, but rather the difference between its maximum and minimum values across all columns. For example, if a single gene in a three-column series has values .1, 2, -1.5, the value for that gene used in the RPS computation would be 2 - (-1.5) = 3.5. The differential RPS (DRPS) is then computed as the maximum of these difference values for all entities associated with the reaction. The DPPS is computed from these DRPS values as above, using DRPS values in place of single-column RPS values, i.e. $\mathrm{DPPS} = \sqrt{(\mathrm{DRPS}_1^2 + \mathrm{DRPS}_2^2 + \cdots + \mathrm{DRPS}_N^2)/N}$.

Because PPS measures perturbation in either direction, the DPPS is not a simple difference between PPS values – a pathway can have a high DPPS even if its PPS is relatively similar for each column if either (a) the value for some object swings between a large positive value and a similar magnitude negative value between columns, or (b) if different reactions in the pathway experience their large perturbations in different columns.
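The differential scores can be sketched the same way. As before, the function names are hypothetical, the values are assumed to be on a log scale, and returning None when no reaction has data is an assumption of this sketch.

```python
import math


def differential_rps(entity_series):
    """DRPS: largest (max - min) range across columns among the entities
    of one reaction. `entity_series` holds one list of column values per entity."""
    return max(max(series) - min(series) for series in entity_series)


def differential_pps(reactions):
    """DPPS: root mean square of the DRPS over reactions that have data."""
    scores = [differential_rps(entities) for entities in reactions if entities]
    if not scores:
        return None  # assumption: no score when no reaction has data
    return math.sqrt(sum(s * s for s in scores) / len(scores))


# The gene from the text, with values .1, 2, -1.5, gives a range of 3.5.
gene_range = differential_rps([[0.1, 2, -1.5]])

# Two reactions: DRPS values 3.5 and 0.5, so DPPS = sqrt((12.25 + 0.25) / 2).
dpps = differential_pps([
    [[0.1, 2, -1.5]],
    [[1.0, 1.0, 1.0], [0.0, 0.5, 0.0]],
])
print(gene_range, dpps)
```

A gene whose value swings from a large positive to a similarly large negative value contributes a range of roughly twice its magnitude, which illustrates case (a) above.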

Note that for metabolomics datasets, the RPS value for a reaction is the maximum data value for all metabolites (reactants and products) in the reaction. Because side metabolites (those metabolites not shared between adjacent reactions in a pathway) are omitted from the pathway diagrams in the table, and because the colored circles showing metabolite expression levels are shown for main (shared) metabolites only, some data values may not be visible on the diagram.

For multi-omics datasets, the RPS calculation for a reaction will be the maximum of all data values associated with the reaction, whether those are values for a metabolite, gene, protein, or the reaction itself. This is really only useful if all data values are normalized, such that a given value for a metabolite is of roughly equal consequence as that same value for a gene. Otherwise, the RPS and therefore the PPS computations will be distorted. Thus, we do not recommend using this tool with most multi-omics data.

8  Cellular Overview (Metabolic Map Diagram)

The Cellular Overview diagram depicts the biochemical machinery of an organism as described in a PGDB. Each node in the diagram (such as the small circles and triangles) represents a single metabolite, and each blue line represents a single bioreaction. This page describes the organization of the Cellular Overview and the operations users can perform to interrogate it. Different PGDBs will have different components of the diagram present or absent depending on what was included by the PGDB authors.

Note: The Cellular Overview has been tested on Internet Explorer 8.0, Firefox 3.5, Safari 4.0 and Chrome 2.0. We recommend not using Internet Explorer for the Cellular Overview, since its performance can be very poor; the other three browsers perform much better.

Note: The desktop version of Pathway Tools that you can install locally provides different and additional operations on the Web Overview. Click here for more details.

Organization of the Cellular Overview: Within the cytoplasmic membrane, the small-molecule metabolism of the organism is depicted in several regions. The glycolysis and the TCA cycle pathways, if present, will be placed in the middle of the diagram to separate predominately catabolic pathways on the right from pathways of anabolism and intermediary metabolism on the left. The existence of anaplerotic pathways prevents rigid classification. The majority of pathways operate in the downward direction. Signal transduction pathways, if present, run along the bottom of the diagram. Pathways are grouped into related clusters as indicated by the shaded regions.

The large group of individual reactions at the right of the diagram represents reactions of small-molecule metabolism that have not been assigned to any pathway.

The shapes of the metabolite icons represent various compound classes. The different shapes used are as follows:

  • Triangle: Amino Acids

  • Square: Carbohydrates and Derivatives

  • Diamond: Proteins and Modified Proteins

  • Vertical Ellipse: Purines

  • Horizontal Ellipse: Pyrimidines

  • T: tRNAs

  • Circle: All other compounds

  • Filled shape: Phosphorylated compound

One or more cellular membranes of the organism are depicted, depending on the cellular architecture of the organism and on whether that architecture was specified when the PGDB was created. Transporters are depicted in the membrane in which they reside as blue lines whose arrowheads indicate the direction of transport. For gram-negative bacteria, periplasmic proteins are depicted when identified in the PGDB.

Getting Started: The Cellular Overview is accessible from the command Metabolism → Cellular Overview. The currently selected organism, as displayed on the right in the banner of the Web page, is used to generate the Cellular Overview diagram. Generating the diagram can take some time if it was not previously generated by the Web server.

Once the Cellular Overview diagram is displayed, the most common operation is to move it left, right, up or down, since the entire overview sometimes cannot fit in the Web page. To do so, hold down the left mouse button in a blank area and move the mouse in the desired direction. This is called a panning operation. You can also pan in small increments by clicking the arrows on the widget located at the top left of the screen.

To zoom in or zoom out, you can use the ladder-shaped icon on the left of the overview Web page. Each step of the ladder is a zoom level, and you can select any of them at any time. You can also click the plus or minus sign (displayed at the top and bottom of this ladder) to enlarge (zoom in) or shrink (zoom out) the Cellular Overview. As the zoom level increases (i.e., going up the ladder), names of compounds, enzymes, reactions, and pathways are eventually displayed.

Note that depending on the speed of the server, generating large Cellular Overviews (i.e., a zoom-in near the top of the ladder) might require some time.

Mousing over a Cellular Overview icon (e.g., a ‘tee’ icon for a tRNA) displays information about the object in a small tooltip popup. Click the ‘Keep Open’ button to keep that informational window open; drag the window by its title to re-position it.

Note for Mac users with a one-button mouse: left-click is the usual click, and right-click is the Mac control-click (i.e., you hold down the control key and click). But the exact keys can be customized on your Mac via the system preferences panel.

All the commands for the Cellular Overview are available from the right-clicking menu or the operations box on the right side of the page.

The Cellular Overview can display your experimental data — see Section 8.3.

MetaCyc, which is a multi-organism database, has no cellular diagram.

8.1  Summary of Commands

8.1.1  Summary of Mouse Commands

  • Left-Click on an object opens a tooltip (i.e., a small window) that displays basic data about the object. The tooltip contains further Web links to display more data about the object or about objects related to the clicked object.

  • Double-Left-Click in a blank area zooms in, centered at that location.

  • Left-Click (and hold) in a blank area lets you pan (i.e., move) the entire Cellular Overview left, right, up and down. You need to hold down the mouse button while panning.

  • Right-Click in a blank area opens a menu to invoke general commands applicable to the entire Cellular Overview. These commands are also available in the top menu bar under the menu ‘Cellular Overview’. All searching (highlighting) commands are under these menus. See the following list for an explanation of the general commands.

8.1.2  Summary of Menu Commands

The commands in the Cellular Overview menu are:

    Overlay Experimental Data (Omics Viewer)

    • Upload Data from File

    • Enter/Paste Data from Keyboard

    • Import Data from PortEco

    • From Recent Datasets (PortEco or GEO only)

    • From SmartTable

    Highlight Pathway(s)

    • By name or Frame ID : This operation lets the user search for a Pathway by its name or Frame ID.

    • By Substring : This operation lets the user search for a Pathway by substring.

    • By Curation : This operation lets the user search for Pathways according to their curation status. There are three types of curation information:

      • Comments

      • Citations

      • Evidence

    • By Evidence : This operation lets the user search for Pathways according to their evidence code. The evidence codes are presented in the drop-down menu.

    Highlight Reaction(s)

    • By name or Frame ID

    • By Substring

    • By EC number : This operation lets the user search for reactions based on their EC number.

    • By Enzyme name

    • By Curation : This operation lets the user search for Reactions according to their curation status. There are three types of curation information:

      • Comments

      • Citations

      • Evidence

    • By Evidence

    Highlight Gene(s)

    • By name or Frame ID

    • By Substring

    • From File

    • By Replicon : This operation colors the reactions according to the replicon - chromosome or plasmid - on which their genes are located

    • By Regulon : This operation lets the user select a transcription factor, and highlights all reactions whose genes are in operons that are regulated by that transcription factor.

    • By Pan-Genome Core Genes : In a Pan-Genome PGDB, shows all the reactions of the genes that are shared among all the strain PGDBs; in other words, genes that have orthologs in all the other strains.

    • By Pan-Genome Unique Genes : In a Pan-Genome PGDB, shows all the reactions of the genes that have no orthologs at all, and are thus uniquely contributed by only one single strain.

    Highlight Enzyme(s)

    • By name or Frame ID

    • By Substring

    Highlight Compound(s)

    • By name or Frame ID

    • By Substring

  • Clear All Highlighting removes all the highlighting from the cellular diagram.

  • Show Legend opens a small window to show a legend of the icons used in the cellular diagram.

  • Help opens a new Web page presenting documentation on the Cellular Overview.

The following sections describe these operations, and some others, in more detail.

8.2  Searching and Highlighting

In this document, ‘Searching’ and ‘Highlighting’ are synonymous terms. There are several commands to search for reactions, pathways, enzymes, genes, and compounds. The search commands are available from the right-click menu and from the Cellular Overview menu in the top menu bar.

When a search is done, the objects found are highlighted in the Cellular Overview diagram, and a new overlay is created. The list of overlays is shown in the Layer Switcher panel on the right of the Overview Web page. This panel might be minimized, in which case a small icon with a plus-sign is shown. Click on the plus-sign icon to open the panel. From this panel you can activate or deactivate specific overlays. You cannot delete an individual overlay, but all highlighting, i.e., all overlays, can be removed by using the command Clear All Highlighting.

Since each overlay corresponds to a search operation, an overlay is identified by the keyword you entered to do the search. This is the name of the overlay. Next to each name is a button labeled ‘List.’ Clicking ‘List’ opens a small dialog window listing the objects found for the corresponding search. Each object name is a hyperlink; clicking any of these links centers the Overview on the corresponding object, and a red marker emphasizes its location.

Highlighting operations can also be applied via web services.

8.3  Cellular Omics Viewer — Overlay Experimental Data

The Pathway Tools Omics Viewer uses the Cellular Overview for an organism to visualize data from high-throughput experiments in a global metabolic pathway context. The input to the Cellular Omics Viewer is a set of gene, protein, and/or reaction names or identifiers, and data values for each gene, protein, and reaction. The Omics Viewer generates a new version of the Cellular Overview in which the reaction steps identified by the input genes, proteins, and reactions are colored according to the provided data values. For example, for a gene expression experiment, the software identifies the reactions catalyzed by the product of each supplied gene, and colors that reaction with a color value computed from the data point provided for each gene. The data values in the provided dataset are mapped to a spectrum of colors. Similarly, for metabolomics experiments, compound nodes in the Cellular Overview are colored according to the data values for the specified compounds. This facility enables the user to see which pathways are active or inactive under some set of experimental conditions.

The Omics Viewer can be used for:

  • Microarray Expression Data: Reaction lines (and protein icons, where present) are color-coded according to the relative or absolute expression level of the gene that codes for the enzyme that catalyzes that reaction step. The Omics Viewer allows a scientist to interpret the results of gene-expression experiments in a pathway context.

  • Proteomics Data: Reaction lines (and protein icons, where present) are color-coded according to the concentration of the enzyme that catalyzes that reaction step.

  • Metabolomics Data: Compound icons are color-coded according to the concentration of the compound.

  • Reaction Flux Data: Reaction lines are color-coded according to reaction flux values.

  • Other Experimental Data: Any experiment, high-throughput or otherwise, in which data values are assigned to genes, proteins, reactions or metabolites can be viewed in a pathway context using the Cellular Omics Viewer.

The Regulatory Overview also has an omics viewer, but it can display gene data only.

The Cellular Omics Viewer can show absolute data values (such as the concentration of a metabolite or protein, or the absolute expression level of a gene), or it can be used to compare two sets of experimental data by computing a ratio and mapping the ratios onto a color spectrum.

The superposition of multiple sets of experimental data on the Cellular Overview can also be animated to show, for example, how gene expression levels of enzymes change with time over the course of an experiment.

The Cellular Omics Viewer can also be invoked via web services.

8.3.1  Example Omics Data Files

Single gene expression experiment: Sample datafile and brief description. See Cellular Overview for this data using the ratio of columns 11 and 12.
Time series gene expression animation: Sample datafile and brief description. See Cellular Overview for this data using columns 6 to 9.

8.3.2  Getting Started with Omics Data Display

The commands under Overlay Experimental Data (Omics Viewer), available from the right-click menu and the operations box on the right side of the page, overlay experimental data on the Cellular Overview diagram.

Once the Overlay Experimental Data command is invoked, a window will open, called the Omics Form, where you can specify a data file to upload and various parameters to control the interpretation of the data. The parameters are documented in the window but more details follow on the file format and the parameters to specify.

8.3.3  Omics Dataset File Format

Experimental data is imported from a file provided by the user that is stored on the user’s computer. Each line of the file contains data for a single gene, protein, reaction or metabolite, and is of the form:

<names‑or‑IDs> <other‑columns> <data‑column1>...<data‑columnN>

Columns are separated by the tab character. Lines that start with # or ; are taken to be comment lines and are ignored by the program. The first column is called column 0, the second column is called column 1, etc. The program pays attention to column 0 and to the columns you tell it contain your data; the other columns are ignored.

Short examples (see 8.3.1 for full example files):

# In this file the data columns are columns 2-4.
#
# The first two lines specify genes.
trpA        tryptophan synthetase        3.2        3.8        4.3        This line identifies the gene by a gene name
# This next line identifies the gene by an accession number that is
# listed on the EcoCyc gene page, hence we can be sure that EcoCyc
# will recognize it.
b0383        alkaline phosphatase        1.1        4.2        2.9
#
# The next two lines specify metabolites.
#
TRP        L-tryptophan        6.3        2.3        4.3        Column 0 specifies the EcoCyc ID for this metabolite
# This next line specifies spermidine by its name and KEGG ID and PubChem ID
spermidine$KEGG:C00315$PubChem:6992097        spermidine        1.1        2.8        5.1

<names‑or‑IDs> can be a list of one or more of the following fields separated by the “$” character. These alternatives give you multiple ways to identify a gene, protein, metabolite, or reaction.

  • A name for the object that is known to BioCyc (each BioCyc object typically includes extensive synonym lists; the software tries to match a name to the appropriate target).

  • BioCyc IDs. Gene IDs from sequencing projects (such as the E. coli B-numbers) are generally acceptable and unambiguous. For protein or reaction data, EC numbers may be used. BioCyc pages (e.g., gene pages, metabolite pages) typically list the ID for the object toward the top of the page, and in the URL field of the page. Please verify that the IDs you are using are known to BioCyc.

  • IDs in external databases. Many BioCyc DBs contain links to external databases such as UniProt and PubChem; the identifiers in those links can be used in column 0 if prefixed by the name of the database, e.g., “UniProt:P00634.”

The numbers in the data columns can represent either absolute or relative (e.g., ratios or log ratios) values. If the data values represent absolute numbers, you may choose to visualize either a single column of absolute data values (select “Absolute” and one data column), or the ratio of two data columns as relative data values (select “Relative” and two data columns). If the data values themselves represent relative numbers, then you need supply only a single column number, and select “Relative.” An entry (a row of data for a gene or other object) may contain any number of data columns (for example, if you want to compile measurements from several experiments or time points into a single file), but only those data columns specified will be visualized at a time — all other columns will be ignored.
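
To make the file format concrete, here is a minimal Python sketch of how such a file could be read. It is not the code used by the Omics Viewer: the function name `parse_omics_lines` and the sample rows are hypothetical, and skipping blank lines is an assumption not stated in the format description.

```python
def parse_omics_lines(lines, data_columns):
    """Parse tab-delimited omics rows into (identifiers, values) pairs.

    lines: iterable of text lines from the dataset file.
    data_columns: zero-based column numbers holding the data to visualize.
    """
    records = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        # Comment lines start with # or ; (blank lines are skipped by assumption).
        if not line.strip() or line[0] in "#;":
            continue
        fields = line.split("\t")
        identifiers = fields[0].split("$")   # alternative names/IDs for one object
        values = [float(fields[c]) for c in data_columns]
        records.append((identifiers, values))
    return records

sample = [
    "# data columns are 2-4",
    "trpA\ttryptophan synthetase\t3.2\t3.8\t4.3\tidentified by gene name",
    "; another comment style",
    "spermidine$KEGG:C00315$PubChem:6992097\tspermidine\t1.1\t2.8\t5.1",
]
records = parse_omics_lines(sample, data_columns=[2, 3, 4])
for identifiers, values in records:
    print(identifiers, values)
```

Only column 0 and the requested data columns are read; the descriptive text in column 1 and the trailing remark on the first row are ignored, as the format description specifies.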

8.3.4  Color Scale

The color scale used depends on the type and, by default, the range of the data. Thus, a particular color may correspond to one gene expression level for one dataset, and a different gene expression level for another dataset, depending on the range of values or the supplied maximum cutoff value for each dataset. We use the spectrum from yellow/green to red, with yellow representing the lowest expression levels or ratios in the dataset, blue representing values in the middle, and red representing the highest values. Reactions for which no data was provided are drawn in black. The legend for mapping colors to data values is shown in the key, which is drawn to the right of the overview for a single experiment, or to the left for an animation.

A maximum cutoff value is chosen. By default, this is computed from the data. Alternatively, the user may supply a maximum cutoff value to use. Supplying the same maximum cutoff value for multiple experiments ensures that the same color scale is used for each one, so that the displays are directly comparable.

The minimum cutoff value is determined based on the maximum cutoff value and the other parameters. For absolute data values, we use a minimum cutoff value of zero. For relative data values that are not logs, we use the inverse of the maximum cutoff. For relative data values that are logs, we use the negative of the maximum cutoff. The color spectrum is then mapped evenly along a log scale between the maximum cutoff and the minimum cutoff.
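
The three rules for the minimum cutoff can be written as a small Python function. This is a sketch of the rules as stated above, with a hypothetical function name and parameters; it does not reproduce how the maximum cutoff is derived from the data or how colors are assigned.

```python
def minimum_cutoff(max_cutoff, relative, log_values=False):
    """Return the minimum cutoff implied by the maximum cutoff.

    relative:   True for relative data (ratios), False for absolute data.
    log_values: True if the relative values are logs of ratios.
    """
    if not relative:
        return 0.0                # absolute data: minimum cutoff is zero
    if log_values:
        return -max_cutoff        # log ratios: negative of the maximum
    return 1.0 / max_cutoff       # plain ratios: inverse of the maximum

print(minimum_cutoff(8.0, relative=False))
print(minimum_cutoff(8.0, relative=True))
print(minimum_cutoff(3.0, relative=True, log_values=True))
```

For example, a maximum cutoff of 8 on plain ratio data gives a minimum cutoff of 1/8, so an 8-fold increase and an 8-fold decrease sit at opposite ends of the scale.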

In many cases, several genes or proteins, each with its own expression level or concentration, will map to a single reaction. This is because the reaction might be catalyzed by an enzyme complex made up of several gene products, or the reaction might be catalyzed by several isozymes, each with its own gene or genes. Since a reaction can only be colored a single color, we must choose which data value to use. For absolute data values, we choose the maximum. For relative data values, we choose the value whose log has the greatest deviation from zero, under the assumption that the user is primarily interested in identifying the entities whose behavior differs most between the two datasets.
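
The selection rule for a reaction with several data values can be sketched as follows. The function name is hypothetical, and the sketch assumes that plain ratios are positive and that values flagged as logs are compared directly by absolute value.

```python
import math

def reaction_value(values, relative, log_values=False):
    """Pick the single data value used to color a reaction."""
    if not relative:
        return max(values)                      # absolute data: use the maximum
    if log_values:
        return max(values, key=abs)             # values are already logs
    # Plain ratios: compare by the magnitude of the log (ratios assumed > 0).
    return max(values, key=lambda v: abs(math.log(v)))

print(reaction_value([3.2, 7.1, 0.4], relative=False))
print(reaction_value([0.2, 1.5, 3.0], relative=True))
print(reaction_value([-2.5, 1.0, 2.0], relative=True, log_values=True))
```

In the second call the ratio 0.2 is chosen over 3.0, because a 5-fold decrease deviates further from "no change" on a log scale than a 3-fold increase.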

8.3.5  Omics Viewer Results

Once the form to upload the data is submitted, by clicking the Submit button at the bottom of the Omics Form, the data are processed by the Web server. The time to process the file depends on the speed of the server and the amount of data in the file. The results are returned to your browser in the form of highlighted objects (e.g., reactions). If several data experiments are loaded from the same file (i.e., several data columns are provided from the uploaded file), an animation is created where each step of the animation corresponds to one experiment (i.e., one column).

A small dialog window opens to display the color scale for the experiment(s) and buttons to control the animation, if any. From this window you can pause, restart, go forward or backward, and increase or decrease the animation speed.

Overlaying experimental data can be done at any zoom level. Once the data are uploaded and overlaid, you can zoom in or out, and the corresponding highlighting will be adjusted accordingly.

The tooltips for highlighted objects show the experimental data. The data displayed changes during an animation.

9  Metabolic Models

Flux Balance Analysis (FBA) is a computational method for simulating an organism’s metabolic network. Metabolic models based on FBA depict a steady-state condition of a cell. Among the components of the simulation are the biochemical reactions in the organism’s metabolic network, the metabolites utilized by the organism as nutrients, the compounds secreted by the organism, and the biomass metabolites synthesized by the metabolic network. The nutrients are the inputs to the metabolic machinery, and the secretions and biomass metabolites are the outputs of that machinery.

For a quick overview of how to run a metabolic model through this web interface, please execute the following steps.

  • Select a database within this website for which a metabolic model exists, such as E. coli K-12 MG1655.

  • Enter the metabolic modeling area of this website via Metabolism → Run Metabolic Model.

  • Select a model from the table for execution by clicking the “Select” button for that model.

  • Click the “Execute” button. Once the model has been executed, results will be provided in the Results tab. You can visualize the resulting fluxes on a zoomable metabolic-map diagram by clicking the button “Show Fluxes on Cellular Overview”.

  • You can view more details of the model from the tabs labeled Biomass, Secretions, Nutrients, and Reactions.

9.1  How to Use the Web-MetaFlux Modeling Tool

The modeling tool available from this interface, called Web-MetaFlux, allows you to modify, execute, and store FBA-based metabolic models for organisms available on this website. The Web-MetaFlux interface provides a subset of the functionality of the MetaFlux tool available from the Pathway Tools desktop software. More precisely, Web-MetaFlux provides the ability to execute models for single organisms only (“solving mode”), whereas the desktop version provides several other modes: development mode aids creation of new metabolic models, knockout mode enables modeling of gene and reaction knockouts, and another mode enables modeling of organism communities.

The models on this website can be designated as public or private. You cannot directly modify a public model that you do not own, but you can copy such a model under your user account, and then modify the copy. Modifying a model can include adding or removing nutrients, secretions, or biomass metabolites, or adding or removing reactions. These modifications allow you to study the behavior of an organism for different growth conditions (e.g., anaerobic), or under different reaction availability. Note then that we use the term “model” to include parameters such as the nutrients on which the organism is to be grown.

As you make modifications to a model, those modifications are automatically saved permanently on the web server. Therefore, there is no save button. However, when you modify any entry, you must clearly indicate that you have finished modifying that entry by pressing Tab, pressing Enter, selecting an autocomplete choice, or clicking on any other entry.

9.2  Selecting a Model of Interest

Begin by finding an existing metabolic model that you want to execute, or an existing model that you want to modify and then execute. If you want to create a metabolic model de novo, install a local copy of the Pathway Tools software; this website does not support de novo model creation.

To find all organisms in this website having metabolic models, enter the organism selector (click “change organism database”), and select the tab “Having Metabolic Models.” Click on the organism you are interested in modeling to select that organism.

To see the metabolic models available for that organism, run the command Metabolism → Run Metabolic Model.

Click the “Select” button for a given model to select it for execution. Click “Copy” to make your own copy of the model in order to modify the model or its parameters.

9.3  Executing a Model

Once you have selected or copied a model, you are on the model summary page, which summarizes the state of the current model, and provides tabs near the bottom of the page for viewing the components of the model.

Click the “Execute” button to run the model. The results of execution will appear in the Results tab. If a biomass flux of 0.0 is obtained, then no cellular growth was obtained for the model given its specified reactions, biomass metabolites, nutrients, and secretions. If a positive biomass flux is obtained, then this number is the optimal value found for the objective function in the linear programming problem defined for this model. When the model is defined to optimize the production of cellular biomass, then the biomass flux is the steady-state cellular growth rate under the defined conditions of growth.

A table in the Results tab lists the flux values computed for reactions in the model that carry a non-zero flux. Those reactions can be visualized on a zoomable metabolic map diagram by clicking “Show Fluxes on Cellular Overview.” More details about the model run can be obtained by clicking the buttons “Show Solution File” and “Show Log File.”

9.4  Inspecting and Modifying a Metabolic Model

A set of four tabs on the model summary page, called Reactions, Biomass, Nutrients, and Secretions, allow you to inspect models owned by others, and to inspect and modify models that you own. Here we discuss these tabs in more detail.

9.4.1  Reactions Tab

Under the Reactions tab, you can specify the set of reactions from the PGDB (the organism database) to include in your model, which can be done in the following way.

  • A checkbox allows you to specify that all metabolic and transport reactions from the PGDB will be included in the model. The full list of reactions used in the model is output to the log file when the model is executed. Note that some reactions specified for inclusion in the model may be filtered out during model execution, such as reactions that are not mass balanced.

  • If you include all metabolic and transport reactions, you can also provide a list of reactions to exclude from the model, such as reactions believed to be down-regulated.

  • You can specify each reaction to include in the model instead of including all metabolic and transport reactions. You can also constrain the fluxes of specific reactions, such as to apply regulation to the model. Note that all reactions have a default lower bound flux of 0 and a default upper bound flux of 30,000.

    To add a reaction to the list of reactions to include or exclude, you must specify its frame id, which can be found in the URL line of reaction pages on the website. The reaction equation will be shown once a frame id is entered. Autocompletion is provided to help select a frame id, but if you do not know at all which frame ids to select, you can search the PGDB for a reaction based on its substrates or a pathway that contains it. Reactions can be removed from each list by clicking the red “x” button on the left side of each row.

9.4.2  Nutrients Tab

A metabolic model takes up nutrients from the cell’s environment to activate biochemical reactions and produce biomass. The set of nutrients provided must be sufficient to activate the reactions needed to produce all of the specified biomass metabolites. Otherwise, the model cannot show growth.

Nutrients can be added and removed from a simulation using the Nutrients tab. The first row of the nutrients table can be used to add a nutrient based on its name (e.g., palmitoleate) or its frame id (e.g., CPD-9245). Autocompletion is provided for these two types of entries. Once a nutrient is added, optional parameters can be provided, such as a compartment, upper and lower bounds on the flux of the nutrient, and a comment. The compartment specifies the cellular location of the nutrient. Although a nutrient can be provided directly in the cytosol, a more realistic model should provide the nutrient into the extracellular space and provide transport reactions to import nutrients. Bounds are optional but typically at least one nutrient has an upper bound to limit the use of all the nutrients. It is common to limit the carbon source, although other nutrients can be used to control growth (e.g., oxygen). For example, if glucose is a nutrient and an upper bound of 10 is specified, then the flux of glucose in the model will not exceed 10. On the other hand, a lower bound on oxygen would force the uptake and use of oxygen by the model.

You can remove a nutrient by clicking the red “x” button on the far left of a row.

9.4.3  Biomass Tab

The computational objective of an FBA model is to produce all biomass metabolites. At least one metabolite must be specified as biomass, otherwise there is no objective to satisfy. The biomass metabolites must be produced given the specified nutrients, reactions and secretions, otherwise there is no growth. When the model is executed, the fluxes of biomass metabolites are maximized. Furthermore, the fluxes of the biomass metabolites must satisfy the coefficients specified in the Biomass table. Those coefficients are major determinants of the computed reaction fluxes, and they typically reflect the relative masses of the biomass component in dried-down cells. The maximization is constrained by the bound(s) on fluxes specified for nutrients and secretions, if any.

You can add a biomass metabolite using the first row of the table shown under the Biomass tab. You can remove a metabolite from that table by clicking the red “x” button on the far left of a row.

9.4.4  Secretions Tab

The Secretions tab operates very similarly to the Nutrients tab. Production of secreted metabolites is often required for model growth. It is important to note the difference between the secretions and the biomass metabolites. A biomass metabolite must be produced by the model whereas a secretion may be produced by the model. If a secretion is not produced, the model may still grow, but if any biomass metabolite is not produced, the model cannot grow.

In most cases, it is better to specify more secretions than are necessary, because secretions that are not active when a model is executed cannot stop growth. On the other hand, omitting even one secretion that is needed for growth can prevent growth. For example, if CO2 is produced by an organism under a given growth condition, but there is no way for the CO2 to escape the model, the steady-state constraint that fluxes are balanced at all metabolites will be violated, and no solution will be found for the model. It is therefore recommended to work with a set of secretions needed for many different growth environments (e.g., different sets of nutrients). Care should be taken to select the appropriate compartment for each secretion; in a more realistic model, each secretion will be transported to the extracellular space and then secreted from the model. If a secretion is not produced, it will be reported in the solution file when the model is executed.
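
The CO2 example can be illustrated with a small steady-state check in Python. This is a toy sketch, not MetaFlux: the network, reaction names, and flux values are hypothetical, and the code only tests whether a given set of fluxes balances every metabolite rather than solving the optimization problem.

```python
def net_production(stoichiometry, fluxes):
    """Net production rate of each metabolite: sum of coefficient * flux."""
    net = {}
    for reaction, coefficients in stoichiometry.items():
        for metabolite, coefficient in coefficients.items():
            rate = coefficient * fluxes.get(reaction, 0.0)
            net[metabolite] = net.get(metabolite, 0.0) + rate
    return net

def is_steady_state(stoichiometry, fluxes, tolerance=1e-9):
    """True if no metabolite accumulates or is depleted."""
    rates = net_production(stoichiometry, fluxes).values()
    return all(abs(rate) <= tolerance for rate in rates)

# Hypothetical toy network: glucose is taken up, converted to a biomass
# precursor plus CO2, and the precursor is drained as biomass.
toy = {
    "glucose_uptake": {"glucose": 1},
    "conversion": {"glucose": -1, "biomass_precursor": 1, "CO2": 1},
    "biomass": {"biomass_precursor": -1},
}
fluxes = {"glucose_uptake": 10, "conversion": 10, "biomass": 10}
print(is_steady_state(toy, fluxes))            # CO2 has no way out

# Adding a CO2 secretion lets the same fluxes balance.
with_secretion = dict(toy, CO2_secretion={"CO2": -1})
fluxes_with = dict(fluxes, CO2_secretion=10)
print(is_steady_state(with_secretion, fluxes_with))
```

Without the secretion, CO2 accumulates at the rate of the conversion reaction, so these growing fluxes are not a valid steady state; with the secretion added, every metabolite is balanced.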

The lower-bound flux and the upper-bound flux specified for a secretion can be used to limit the growth of an organism, and multiple such bounds can be specified at the same time on several secretions. When a model is executed, the computed solution fluxes will be constrained by these bounds.

10  Metabolic Route Search

The Metabolic Route Search (new in version 17.0, March 2013) is a software tool to search and analyze routes in the metabolic reaction network of an organism. Given a starting compound, a target compound, and other parameters, the tool finds the best (least cost) routes between these compounds taking into account atom conservation, path length and adding a minimum number of foreign reactions from MetaCyc.

The tool is activated by first selecting the organism to search using the “change organism database” link in the top right corner of the Web page, and then selecting the command Metabolism → Metabolic Route Search from the menu bar. This command is available for single organism databases only. For example, it is available for E. coli but not for MetaCyc. MetaCyc can be used, not as a native organism but as a library of additional reactions, if the Web server was started with option -metaroute-metacyc (in which case the Web server is not publicly accessible). That is, in that case, MetaCyc can be used only as a set of foreign reactions to add to a selected single organism database.

The parameters to specify before clicking the “Search Routes” button are (some defaults are provided for most of them):

  • Start Compound The starting compound for the search. That compound can be entered by name or by using a unique id (i.e., frame id). A suggested list of compounds is given underneath the input text box when you start typing a compound name. You may also select the compound from that list.

  • Goal Compound The ending compound for the search. That compound can be entered by name or by using a unique id (i.e., frame id). A suggested list of compounds is given underneath the input text box when you start typing a compound name. You may also select the compound from that list.

  • Number of Routes An integer that specifies the maximum number of the best routes to find and display. The larger that number, the longer it takes to receive an answer.

  • Maximum Time The maximum number of seconds to use for the search. You may limit the search by entering a small number. If the tool times out, the best routes found so far are displayed and a text message states that a suboptimal solution is displayed.

  • Maximum Route Length The maximum number of reactions that the routes found can contain. The larger this number, the longer it takes to receive an answer.

  • MetaCyc Reaction Cost This input box is shown only if MetaCyc is available as a foreign library of reactions to search. This box is not provided on publicly available Web servers such as BioCyc.org. If available, the value entered, which must be nonnegative, is the cost to assign to a reaction from MetaCyc that is included in a route. This option may be obtained by installing Pathway Tools locally at your site and running it in Web server mode on your intranet. See command-line option -metroute-metacyc.

  • Native Reaction Cost The value entered, which must be nonnegative, is the cost to assign to a reaction from the native organism that is included in a route.

  • Atom Loss Cost The value entered, which must be nonnegative, is the cost to assign to an atom that is lost from the source compound to the target compound. This cost applies to all tracked atom species (C, O, P, N, and S). The list of atom species can be changed by clicking the selector on the left of that box and selecting “Selected atom species”; a new input box will open in which the desired atom species to track can be typed, separated by spaces.

A summary of what each parameter means is provided online by clicking the green question mark located on the left of each labelled input box.

The cost of a route is the sum of all costs: reaction costs from the native database and, if available, the MetaCyc database; and the cost of atom losses.
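The cost computation and the ranking of routes can be sketched as follows. This is a minimal illustration, not the tool's implementation: the `Step` record, the reaction identifiers, and the assumption that atom losses are counted per reaction step are all hypothetical, and the cost values are arbitrary.

```python
from dataclasses import dataclass


@dataclass
class Step:
    reaction_id: str    # hypothetical identifier
    from_metacyc: bool  # True if the reaction is a foreign MetaCyc reaction
    atoms_lost: int     # tracked atoms lost in this step (assumed per-step count)


def route_cost(route, native_cost, metacyc_cost, atom_loss_cost):
    """Sum reaction costs and atom-loss costs over every step of a route."""
    total = 0
    for step in route:
        total += metacyc_cost if step.from_metacyc else native_cost
        total += atom_loss_cost * step.atoms_lost
    return total


def best_routes(routes, number_of_routes, **costs):
    """Return up to number_of_routes routes in ascending order of cost."""
    return sorted(routes, key=lambda r: route_cost(r, **costs))[:number_of_routes]


costs = dict(native_cost=1, metacyc_cost=10, atom_loss_cost=5)
route_a = [Step("RXN-A1", False, 0), Step("RXN-A2", False, 2)]  # two native steps, 2 atoms lost
route_b = [Step("RXN-B1", True, 0)]                             # one MetaCyc step, no atoms lost
for route in best_routes([route_a, route_b], 2, **costs):
    print([s.reaction_id for s in route], route_cost(route, **costs))
```

With these values the single MetaCyc step is cheaper than the two native steps, because the atom-loss penalty outweighs the foreign-reaction cost. Changing the three cost parameters changes which route is ranked first.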

Once the parameters are entered, clicking the “Search Routes” button will initiate the search on the Web server. The solution, that is, the routes found, will be displayed under the parameters. The routes are sorted in ascending order of their cost (best routes are presented first). Depending on the number of paths found, and if MetaCyc is involved, it may take some time before the solution is displayed. Also, displaying a large list of reactions might take some time in the browser itself due to the complexity of formatting all compound structures and atom mappings.

Each route found is displayed horizontally across the Web page, with the start compound on the left and the target compound on the right, and intermediary compounds in between. You may need to scroll the window to see some of the compounds since the whole route may not fit the width of your browser window. On the left of each route is displayed a text summary of the characteristics of the route. The summary includes the cost of the route, the number of atoms kept from the source compound to the target compound, and its number of reactions.

The chemical structure of each compound involved in the route is displayed and its name appears underneath the structure. If the compound is from the native database, its name is in grey; if the compound is from MetaCyc, its name is in red. Clicking the compound opens a new browser tab to display a complete description of the compound.

Each reaction is shown with a right arrow. If the reaction is from MetaCyc, the arrow is red, if it is from the native organism, the arrow is grey. Underneath the arrow, the protein name is displayed. Clicking the arrow stem opens a new browser tab to display a complete description of the reaction.

For each route, the atom mapping (i.e., atom tracing) is displayed using colors on atoms and bonds from compound to compound. A moiety that is conserved across several compounds is colored with a specific color. Mousing over an atom highlights that atom across all compounds that conserve it. For example, an atom that is conserved from the source compound to the target compound can be seen by mousing over it in the source compound; the corresponding atoms in all intermediate compounds up to the target compound will be highlighted. Note that this highlighting feature enables you to find out quickly which atoms of the source compound are lost, and by which reaction, by mousing over each atom of the source compound.
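The idea behind atom tracing can be illustrated with a small sketch. It assumes a hypothetical representation in which each reaction of a route carries a dictionary from atom indices in its substrate to atom indices in its product; this representation is an assumption made for the example, not the data format used by the tool.

```python
def trace_atoms(source_atoms, mappings):
    """Follow each source atom through successive atom mappings.

    mappings holds one dict per reaction of the route; each dict maps an atom
    index in the substrate to the index of the same atom in the product.
    An atom missing from a dict is lost in that reaction.
    Returns {source_atom: (target_atom or None, index of losing reaction or None)}.
    """
    result = {}
    for atom in source_atoms:
        current, lost_at = atom, None
        for i, mapping in enumerate(mappings):
            if current not in mapping:
                current, lost_at = None, i
                break
            current = mapping[current]
        result[atom] = (current, lost_at)
    return result


# Two-reaction route: atom 2 of the source is lost in the second reaction (index 1).
route_mappings = [{0: 1, 1: 0, 2: 2}, {1: 0, 0: 3}]
print(trace_atoms([0, 1, 2], route_mappings))
```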

A new search can be initiated by changing any parameters and clicking the “Search Routes” button. The current solution will be erased and a new solution will be displayed.

11  Pathway Collages

A Pathway Collage is a diagram containing a user-specified set of pathways for an organism. The initial collage is generated from a SmartTable or omics dataset, and can be manipulated and customized in various ways. Pathways are initially laid out automatically so that pathways in the same general class are placed near each other, but both pathways and individual nodes can be manually relocated. The collage is zoomable, with pathway, metabolite, and enzyme labels becoming visible when the collage is at a sufficiently high magnification level to make them readable. The user can selectively highlight objects of interest, delete unwanted portions, import new pathways, edit labels, and use the diagram to display omics data.

The collage can be saved and later reloaded, or it can be exported to a PNG image file for use in a presentation or publication. See an example of a Pathway Collage which has been manipulated in various ways to illustrate some of the possibilities, and then saved.

Pathway Collages are designed to handle fairly small numbers of pathways. As the size of the collage increases, you may find that performance degrades, and there is a significant time lag when zooming, panning, applying customizations, or interacting with the collage in any other way. Larger Pathway Collages also take longer to generate. To avoid an overly large drag on server resources, Pathway Collages are limited to no more than 100 pathways. Pathway Collages work best with a recent version of Chrome or Firefox. While the general functionality should work on all modern JavaScript-enabled browsers, some functions, such as collage-saving and WYSIWYG color selection, were not yet available on Safari or Internet Explorer at release time. The application has not been tested with any other browsers.

The Pathway Collage application should be intuitive and easy to use. A comprehensive help document is available via the Help→Display Help command.

Generating a Pathway Collage from a SmartTable

The simplest way to generate a Pathway Collage is from a SmartTable containing a set of pathways, using the command Export→Export pathways to Pathway Collage. If the SmartTable contains multiple columns, make sure that the currently selected column is one that contains pathways (if it does not, the software will attempt to find a column that does, but results could be unpredictable). If the SmartTable column happens to contain a pathway class, then all instances of that class will be included. If the SmartTable, in addition to one or more pathways, contains one or more individual reactions, then those reactions will also be included in the Pathway Collage.

A Pathway Collage generated in this way automatically includes data from the most recently loaded omics dataset (i.e. loaded onto the Cellular or Regulatory Overview Diagram, or onto a pathway diagram), if any, but it is not visible until the user requests to see it, and a new omics dataset can be loaded onto an existing Pathway Collage at any time using the File→Add or Replace Omics Data command.

Generating a Pathway Collage from a List of All Pathways

Metabolism → Pathway Collages will take you to a page where you can select pathways from a list of all pathways in the current organism, and generate a Pathway Collage containing the selected pathways.

Generating a Pathway Collage from a Pathway Page

From any pathway page, simply invoke the command Generate Pathway Collage. This will generate a Pathway Collage containing just one pathway. You can add to the collage by right-clicking on any metabolite node in the collage and selecting Add Pathways Containing This Compound. A dialog will pop up listing all the pathways that contain that metabolite, and you can choose which ones to include. Note that when building a Pathway Collage in this fashion, you must position the added pathways yourself, and if you import a super-pathway of a pathway that is already present in your collage, you will end up with duplication (but you can always delete any duplicated pathways or parts of pathways manually).

Generating a Pathway Collage from an Omics Dataset

From the Cellular Overview page, invoke the command Upload Data from File, and fill in most of the fields in the pop-up dialog as if you were displaying your data on the Cellular Overview diagram. However, for the “Show data” field, select “As a Pathway Collage” and indicate how many of the highest-scoring pathways should be included (maximum 100). Using this option, a Pathway Collage will be generated containing those pathways with the highest Pathway Perturbation Score (PPS) or Differential PPS.
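The selection of the highest-scoring pathways can be pictured with the following sketch. It assumes the scores have already been computed and are supplied as a plain dictionary; how the PPS is calculated is not shown, and the pathway identifiers are hypothetical.

```python
MAX_COLLAGE_PATHWAYS = 100  # limit stated in this guide


def top_pathways(scores, n):
    """Return the ids of the n highest-scoring pathways, capped at 100."""
    n = min(n, MAX_COLLAGE_PATHWAYS)
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [pathway_id for pathway_id, _ in ranked[:n]]


# Hypothetical pathway ids with precomputed scores.
print(top_pathways({"PWY-X": 0.2, "PWY-Y": 3.1, "PWY-Z": 1.5}, 2))
```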

12  Regulatory Overview (Regulatory Network Diagram)

The Regulatory Overview enables you to visually analyze the regulatory relationships between genes for a specific organism. These relationships are based on the regulatory data available in the database (i.e., PGDB) of the organism. Currently, the relationships are based on transcriptional regulatory data (future versions may cover other types of regulation). Note: The Regulatory Overview has been tested on Internet Explorer 7.0, Firefox 3.3, Safari 4.0 and Chrome 2.0. It is recommended not to use Internet Explorer for the Regulatory Overview since its performance can be very slow when manipulating a large number (more than 100) of highlighted genes. The performance of the three other browsers is much better than that of Internet Explorer.

The Regulatory Overview is represented as a network with nodes and arrows (i.e., arcs). Each node represents a gene of a specific organism. There is an arrow from gene A to gene B if and only if A regulates B.

When first displayed, the overview does not show any regulatory arrow relationships since, typically, their great number would clutter the overview. These arrows can be selectively added by using the highlighting commands. See the sections below for more information on highlighting commands.

Not all organisms have regulatory data in their PGDB. If the command Genome → Regulatory Overview is grayed out, no Regulatory Overview can be displayed for the selected organism. Otherwise, by selecting the command Genome → Regulatory Overview a Regulatory Overview Web page will open and the complete Regulatory Overview of the selected organism will be displayed. The operations box on the right has several commands specifically for the Regulatory Overview.

It is possible to display a regulatory subnetwork of a specific organism by performing a series of highlighting operations and then using the command Redisplay Highlighted Genes Only. This command will create a new, smaller layout of the regulatory network that contains only the genes that are highlighted. Genes that do not regulate, or are not regulated by, any highlighted genes are not included in the subnetwork. Further operations can be done on this subnetwork as for the complete overview. See the Section Redisplay Highlighted Genes Only below for more details.

The most common operation is to move the Regulatory Overview left, right, up or down, since sometimes the entire network cannot fit entirely in the Web page. This can be done by holding down your left mouse button in a blank area then moving the mouse in the desired direction. This is called a panning operation. Panning can also be done by a small increment by clicking the arrows on the graphic at the top left of the screen called the panning widget.

To zoom in or zoom out, you can use the icon in the form of a ladder on the left of the overview Web page. Each step of the ladder is a zoom level. You can select any one of them at any time. You can also click a plus or minus sign (displayed on the top and bottom of this ladder) to zoom in (increase size) or zoom out (decrease size) the regulatory network. At lower zoom levels, the gene names might overlap the network nodes; increasing the zoom level (i.e., going up in the ladder) should remove such overlaps. The last zoom level (i.e., the last step of the ladder) will always force the display of all gene names in the network.

Note that depending on the speed of the server, generating large regulatory network overviews (i.e., a zoom-in near the top of the ladder) may require some time. They might have been already generated or they might need to be generated by the server. Accordingly, the response time might vary.

Mousing over a gene node displays a tooltip with data about the gene, its product, the possible ligand, and the direct regulatees and regulators. Left-clicking the gene node will open a new Web page containing even more data specific to the gene.

Other more complex visual commands can be reached by right-clicking on genes or in a blank area. This is discussed in detail in the following sections.

Note for Mac users with a one-button mouse: left-click is the usual click, and right-click is the Mac control-click (i.e., you hold down the control key and click). But the exact keys to use may be customized on your Mac via the preferences panel.

Organism Selection: Selecting a new organism through the organism selector does not immediately change the Regulatory Overview to this organism. The next operation, such as zoom-in or zoom-out, will apply to the newly selected organism. At any moment you can display the complete regulatory overview of the selected organism by selecting the command Display Complete Regulatory Overview from the right-click menu in a blank area, or the command Redisplay Complete Regulatory Overview from the operations box on the right.

Summary of Commands

12.0.1  Mouse Commands

  • Left-Click on a gene node opens a new browser window with information about the gene.

  • Left-Click (and holding) in a blank area allows you to pan (i.e., move) the entire regulatory network left, right, up and down. You need to hold down the mouse button to do the panning.

  • Right-Click on a gene node opens a menu to select a command to apply for this gene. The commands highlight the direct and/or indirect regulatees and/or regulators for this gene and show highlighted arcs between regulatees and regulators.

  • Right-Click in a blank area opens a menu to select a general command applicable to the entire regulatory network. These commands are also available in the top menu bar under the menu ‘Regulatory Overview’.

  • Double-Left-Click in a blank area does a zoom-in operation.

The following sections describe these operations, and some others, in more detail.

12.0.2  Layout Selection

For any organism, there are two layouts available: nested ellipses or top to bottom.

The layout nested ellipses uses up to three ellipses to display the gene nodes. The innermost ellipse contains, in alphabetical order of the gene names, the genes that have the largest number of regulatees. The middle ellipse contains genes that regulate at least one gene. The outer ellipse contains the genes that have no regulatees. They might be displayed as groups of genes regulated by the same set of genes (a multi-regulon). This is typically done using triangles or a short straight line if the group is small.

The layout top to bottom uses several straight rows to display the gene nodes. Each row contains genes that do not directly regulate each other. The top row contains the genes that regulate the largest number of genes. The bottom row contains genes that do not regulate any genes. The rows in between contain genes that regulate some other genes. As for the nested ellipses layout, a row might have genes grouped in straight lines or triangles.
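The assignment of genes to the three ellipses of the nested ellipses layout can be sketched as follows. This is an illustration under stated assumptions, not the layout algorithm itself: the guide does not say how many regulatees qualify a gene for the innermost ellipse, so the `inner_threshold` parameter is hypothetical, and the gene names are placeholders.

```python
def nested_ellipses(regulates, all_genes, inner_threshold):
    """Split genes into (inner, middle, outer) lists by number of regulatees.

    regulates maps a gene to the set of genes it regulates.
    inner_threshold (hypothetical, must be >= 1) is the minimum number of
    regulatees for a gene to be placed on the innermost ellipse.
    Each list is sorted alphabetically by gene name.
    """
    out_degree = {g: len(regulates.get(g, ())) for g in all_genes}
    inner = sorted(g for g in all_genes if out_degree[g] >= inner_threshold)
    middle = sorted(g for g in all_genes if 0 < out_degree[g] < inner_threshold)
    outer = sorted(g for g in all_genes if out_degree[g] == 0)
    return inner, middle, outer


# Placeholder network: geneA regulates three genes, geneB regulates one.
network = {"geneA": {"geneB", "geneC", "geneD"}, "geneB": {"geneE"}}
genes = {"geneA", "geneB", "geneC", "geneD", "geneE"}
print(nested_ellipses(network, genes, inner_threshold=3))
```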

12.0.3  Highlighting Genes and Regulatory Relationship Arrows

There are several commands to highlight genes and show the regulatory relationship arrows between them.

Two commands use the gene name, or a substring of gene names, or a gene frame-id. Both of these commands are available by right-clicking in a blank area, or from the top menu bar under Regulatory Overview. The command Highlight Gene By Name or Frame ID highlights at most one gene. It is essentially a search command since you might not know the location of that gene in the regulatory network. Once found, the regulatory network will be centered on the location of the gene. The command Highlight Genes By Substring may highlight several genes. Selecting the command opens a panel in which you can enter a string of characters. After you click the button labeled Highlight in the panel, the genes whose names contain the given string are highlighted (this is a case-insensitive search). For this command it is also possible to include the regulatory relationships between the genes found.
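The behavior of a case-insensitive substring search with optional relationship arrows can be sketched as follows. The function name, the data structures, and the small example network are hypothetical and serve only to illustrate the matching rule.

```python
def highlight_by_substring(regulates, gene_names, text, include_relationships=False):
    """Return (matching genes, arrows between matching genes).

    Matching is a case-insensitive substring test on the gene name.
    regulates maps a gene to the set of genes it regulates.
    """
    needle = text.casefold()
    hits = [g for g in gene_names if needle in g.casefold()]
    arrows = []
    if include_relationships:
        hit_set = set(hits)
        arrows = sorted(
            (a, b) for a in hits for b in regulates.get(a, ()) if b in hit_set
        )
    return hits, arrows


# Placeholder data for illustration.
names = ["trpA", "trpB", "lacZ", "TrpR", "aroH"]
network = {"TrpR": {"trpA", "trpB", "aroH"}}
print(highlight_by_substring(network, names, "TRP", include_relationships=True))
```

Only arrows whose two ends are both highlighted are returned, so the arrow from TrpR to aroH in the placeholder network is omitted.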

The command Highlight Genes By Gene Ontology Terms, accessible from the right-click menu, enables you to select one or more Gene Ontology (GO) terms. The genes that produce proteins annotated with the selected GO terms will be highlighted. The option Include Relationships Arrows enables you to add relationship arrows between the highlighted genes. Note that if you are displaying a subnetwork, there might be genes with such products in the organism that are not in the subnetwork. In such a case, a warning is given that no genes have been highlighted.

Right-clicking on a gene will open a menu of highlighting commands specific to that gene. The menu may contain from one to seven commands. Since some genes do not have any regulators and/or any regulatees, this list of commands may vary from gene to gene. Here is the list of all possible commands available from this menu, where name is the name of the gene (e.g., trpA) on which the right-click was done. The highlighting is done with a specific color, but that color changes from one executed highlighting command to the next.

  • Highlight Gene name Highlights only the gene selected.

  • Highlight Gene name and its Direct Regulatees The gene selected and all its direct regulatees are highlighted and relationship arrows are displayed from the selected gene to its regulatees.

  • Highlight Gene name and its Direct Regulators The gene selected and all its direct regulators are highlighted and relationship arrows are displayed from the regulator genes to the selected gene.

  • Highlight Gene name and its Direct Regulatees and Regulators This command combines the two previous commands.

  • Highlight Gene name and its Direct and Indirect Regulatees The selected gene and all its direct regulatees and indirect regulatees are highlighted and relationship arrows are displayed from regulators to regulatees.

  • Highlight Gene name and its Direct and Indirect Regulators The selected gene and all its direct regulators and indirect regulators are highlighted and relationship arrows are displayed from regulators to regulatees.

  • Highlight Gene name and its Direct and Indirect Regulatees and Regulators This command combines the two previous commands.
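The gene sets selected by these commands correspond to simple traversals of the regulatory network viewed as a directed graph. The sketch below is an illustration with hypothetical function names and a placeholder network; it does not reflect how the Web site computes the sets internally.

```python
from collections import deque


def reachable(start, edges, direct_only=False):
    """Return genes reachable from start by following edges.

    With direct_only=True only immediate neighbors are returned; otherwise a
    breadth-first search also collects indirect neighbors. start itself is
    included only if it is reachable from itself (e.g., autoregulation).
    """
    seen, queue = set(), deque([start])
    while queue:
        gene = queue.popleft()
        for nxt in edges.get(gene, ()):
            if nxt not in seen:
                seen.add(nxt)
                if not direct_only:
                    queue.append(nxt)
    return seen


def invert(edges):
    """Reverse every arrow, so that regulatee -> set of its regulators."""
    inverse = {}
    for regulator, targets in edges.items():
        for target in targets:
            inverse.setdefault(target, set()).add(regulator)
    return inverse


# Placeholder network: D regulates A, A regulates B, B regulates C.
regulates = {"A": {"B"}, "B": {"C"}, "D": {"A"}}
print(reachable("A", regulates, direct_only=True))  # direct regulatees of A
print(reachable("A", regulates))                    # direct and indirect regulatees of A
print(reachable("C", invert(regulates)))            # direct and indirect regulators of C
```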

When a highlighting operation is done, a new overlay is created. The list of overlays is shown in the Layer Switcher panel on the right of the overview Web page. This panel may be minimized, in which case a small icon with a plus-sign is shown. Click on the plus-sign icon to open the panel. From this panel you can activate or deactivate specific overlays. This is particularly useful if you use the command Redisplay Highlighted Genes Only.

All highlighting can be removed by using the command Clear All Highlighting.

For more information about highlighting, see Section Redisplay Highlighted Genes Only.

12.0.4  Redisplay Highlighted Genes Only

The command Redisplay Highlighted Genes Only will display a regulatory network by considering only the genes that are highlighted. The layout is changed to “top to bottom” since it is usually a better layout when using a small set of genes. This command would be used after a series of highlighting operations to select a set of genes to analyze closely. The current displayed regulatory network will be removed and a new regulatory network will be displayed. The active highlighting will remain active. All overlays (active or not) will also remain. It is useful to keep the deactivated overlays since you may come back to the complete regulatory network and reactivate them to recreate a new regulatory subnetwork. Note that genes that do not regulate or are not regulated by any highlighted genes are not included in the subnetwork.
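The rule for which genes appear in the subnetwork can be sketched as follows. The sketch reads the rule as "a highlighted gene is kept only if it has an arrow to or from a highlighted gene"; that reading, the function name, and the placeholder network are assumptions made for illustration.

```python
def subnetwork(regulates, highlighted):
    """Return (genes, arrows) of the subnetwork induced by highlighted genes.

    A highlighted gene is kept only if it regulates, or is regulated by,
    a highlighted gene. regulates maps a gene to the set of genes it regulates.
    """
    highlighted = set(highlighted)
    arrows = {
        (a, b) for a in highlighted for b in regulates.get(a, ()) if b in highlighted
    }
    genes = {g for arrow in arrows for g in arrow}
    return genes, arrows


# Placeholder network: A regulates B, C and X; D has no regulatory links.
regulates = {"A": {"B", "C", "X"}, "D": set()}
genes, arrows = subnetwork(regulates, {"A", "B", "C", "D"})
print(sorted(genes))   # D is dropped; X was never highlighted
print(sorted(arrows))
```

Within such a subnetwork, the regulatees of A are only B and C, even though A also regulates X in the full network.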

To redisplay the complete regulatory network, use the command Display Complete Regulatory Overview accessible when right-clicking in a blank area. The current active overlays remain active and the deactivated overlays are not removed.

The information in tooltips within a subnetwork display (produced when mousing over gene nodes) is restricted to that subnetwork. That is, the tooltip’s lists of regulatees and regulators are for the subnetwork, not for the entire regulatory network of the organism. However, when you transition from a subnetwork display back to the display of the entire network, any highlighting done on a subnetwork will be expanded for the entire regulatory network to show relationships within the full network. For example, if gene A has four direct regulatees in a subnetwork, but twenty regulatees in the entire network, when the operation Highlight Gene A and its Direct Regulatees is applied in the subnetwork, only the four regulatees are highlighted, but once you redisplay the entire network, the twenty regulatees will be highlighted.

Regulatory Omics Viewer

The Pathway Tools Regulatory Omics Viewer illustrates the results of high-throughput experiments in the context of gene regulation. Genes that are involved in regulation are mapped to gene nodes in the Regulatory Overview, and the expression levels in a given experimental dataset are mapped to a spectrum of colors. This facility enables the user to see instantly which genes are active or inactive under some set of experimental conditions.

The Omics Viewer for the Regulatory Overview is very similar to the Omics Viewer for the Cellular Overview. Data files submitted to the Regulatory Omics Viewer must contain in their first column gene names or frame ids. To start the Regulatory Omics Viewer, use the command Overlay Experimental Data (Omics Viewer) under the Regulatory Overview menu. See Section 8.3 for details of how to use the Regulatory Omics Viewer.

13  Comparative Analysis

Several types of comparative operations are available within Pathway Tools Web sites. Note that all of the PGDBs to be compared must be resident within a single Pathway Tools Web site.

Start a comparative analysis by specifying the organism(s) you want to compare. In many cases this can be done from the menu command Select organisms/databases for comparison operations, which is accessible through the Gene, Pathway, Reaction, and Compound menus. It is also accessible through the Choose Organisms button in the Analysis → Comparative Analysis page. This tool supports multi-organism selection using the following three modes. In each mode, a list of organisms for comparison is built up on the right side; you can add to, remove from, or clear that entire list using the buttons in the middle.

  • By Name: Select individual organisms by name on the left

  • By Taxonomy: Select a taxonomic group by clicking through the tree or entering a search term. All genomes under that taxonomic group can be added to the selection by clicking “Add”

  • My Lists: Choose organism lists that were previously saved in your online account, or create a new organism list from the current selection

13.1  Compare Objects Across Databases

Most object pages in Pathway Tools Web sites contain commands for navigating to that same object in one or more other PGDBs. For example, the command Show this gene in another database on a gene page will find the same gene in a specified PGDB. The command Show this compound in another database from a compound page will show the same metabolite in a specified PGDB. Similarly, Search for this gene in multiple databases on a gene page will generate a table showing information about that gene in multiple specified PGDBs.

Pathway Tools finds “the same object” using different mechanisms for different types of objects:

  • For genes and proteins, the software uses orthology information when available. If no equivalent object is found using orthology information, then the software searches for a gene or protein of the same name (note that name-based searches sometimes yield incorrect results).

  • For compounds, reactions, and pathways, the software relies on the fact that when the PathoLogic component of Pathway Tools constructs new PGDBs, it does so by selectively copying information about compounds, reactions, and pathways from the MetaCyc PGDB to the new PGDB. When performing this copy operation, the software maintains the same unique identifier for each compound, reaction, and pathway in the new PGDB as it had in MetaCyc. Thus, when comparing compounds, reactions, and pathways, the software looks for objects with the same unique identifiers in other PGDBs. Note that compounds, reactions, or pathways created by a user in an individual PGDB will have new unique identifiers that will not match identifiers in other PGDBs.
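The two lookup strategies can be summarized in a short sketch. The data structures (an ortholog table, a name index, and a set of frame ids for the target PGDB) and all identifiers are hypothetical; they stand in for whatever the software uses internally.

```python
def find_same_gene(gene_id, gene_name, orthologs, target_ids_by_name):
    """Look up the equivalent gene in a target PGDB.

    orthologs maps a source gene id to its ortholog id in the target PGDB.
    target_ids_by_name maps gene names in the target PGDB to gene ids.
    Returns (target id, method), where method is "orthology" or "name",
    or (None, None) if nothing is found. Name matches may be incorrect.
    """
    if gene_id in orthologs:
        return orthologs[gene_id], "orthology"
    if gene_name in target_ids_by_name:
        return target_ids_by_name[gene_name], "name"
    return None, None


def find_same_metabolic_object(frame_id, target_frame_ids):
    """Compounds, reactions and pathways match only on an identical unique id."""
    return frame_id if frame_id in target_frame_ids else None


# Hypothetical identifiers for illustration.
orthologs = {"SRC-G1": "TGT-G7"}
names = {"trpA": "TGT-G9"}
print(find_same_gene("SRC-G1", "abcD", orthologs, names))  # found by orthology
print(find_same_gene("SRC-G2", "trpA", orthologs, names))  # falls back to the name
print(find_same_metabolic_object("USER-CPD-1", {"CPD-A", "CPD-B"}))  # user-created id: no match
```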

The following comparison commands are all available under the Gene, Compound, Reaction, and Pathway menus:

  • Show this object in another database

  • Show this object in multiple databases

  • Show this object in MetaCyc (not available for genes)

In addition, the following command will generate a table comparing the operon context of a gene across multiple organisms: Show orthologs (with operon diagrams) in multiple databases.

The comparative genome browser described in Section 5.2 supports more powerful viewing of genome regions around orthologous genes.

13.2  Compare Individual Pathways and Reactions

The “Species Comparison” operation in the operations box for pathway and reaction pages generates tables comparing a pathway or reaction across multiple PGDBs. If you wish to change the organisms being compared, use the command Change organisms/databases for comparison operations.

The reaction comparison table lists the enzyme(s) that catalyze the reaction; activators, inhibitors, and cofactors for those enzymes; and the one or more pathway(s) containing the reaction in that organism.

The pathway comparison table includes a graphic of the pathway showing which reactions in the pathway have enzymes present in each organism; a list of the enzymes catalyzing each reaction; and operon diagrams for each gene in the pathway.

13.3  Comparative Analysis Tables

Analysis → Comparative Analysis allows users to generate summaries of individual PGDBs, and to compare statistics between PGDBs. Currently we support comparative analysis of reactions, pathways, compounds, proteins, orthologs, transporters, and transcription units — select the type(s) of reports you wish to generate.

Next select one or more PGDBs for which to perform the analysis.

Please experiment with these commands to see the detailed reports generated by each comparison.

14  Sequence Search and Alignment

14.1  BLAST Search

Pathway/Genome Databases (PGDBs) that have sequence data can be searched using NCBI BLAST.

To access the Web interface for BLAST searches, go to: Search Menu → BLAST search.

Documentation on the use of the Web interface for NCBI BLAST can be found here.

14.2  PatMatch Sequence Search

PatMatch [21] allows you to search for a short nucleotide or amino-acid sequence within a specific genome, using an exact search or using degenerate nucleotide or amino-acid symbols. The minimum length of the input string is 3 residues.
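To illustrate what a search with degenerate nucleotide symbols means, the sketch below expands the standard IUPAC nucleotide codes into a regular expression and reports match positions. It is not PatMatch and makes no claim about how PatMatch works; it searches a single strand only, and the function name is hypothetical.

```python
import re

# Standard IUPAC nucleotide codes expanded to regular-expression character classes.
IUPAC_DNA = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def find_pattern(pattern, sequence):
    """Return 0-based start positions of all matches, including overlapping ones."""
    pattern = pattern.upper()
    if len(pattern) < 3:
        raise ValueError("pattern must contain at least 3 residues")
    # A lookahead group lets finditer report overlapping matches.
    regex = re.compile("(?=(" + "".join(IUPAC_DNA[c] for c in pattern) + "))")
    return [m.start() for m in regex.finditer(sequence.upper())]


print(find_pattern("GAY", "GATGACGAA"))  # Y matches C or T
```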

The results are displayed initially as a simple web-page table, with the option of displaying the result as a SmartTable, if there are fewer than 5000 results. If there are more than 5000 results, then a file download link is provided.

To access the PatMatch search, go to: Search → Sequence Pattern Search.

For each genome, the user can search several alternative sequence databases:

  • Complete peptide database

  • Nucleotide database: whole genome

  • Nucleotide database: coding regions — contains the nucleotide sequence of the coding regions for each protein and RNA-coding gene

  • Nucleotide database: intergenic regions — contains the nucleotide sequence of the regions between adjacent genes

  • Nucleotide database: intergenic regions, extended — contains the nucleotide sequence of the regions between adjacent genes, plus an additional 400 bases upstream and 250 bases downstream, such as to include possible regulatory regions

14.3  Alignment Viewer

A multiple sequence alignment viewer can be invoked to view alignments of amino-acid and nucleotide sequences. The tool can be invoked on a set of orthologs or on a set of genes or proteins via SmartTables.

To invoke the alignment viewer on a set of orthologs:

  • Navigate to the gene page for a gene/protein you want to include in the alignment

  • Execute one of the following commands from the right-sidebar menu depending on whether you wish to align nucleotide or amino-acid sequences:

    • Align gene nucleotide sequence with orthologs

    • Align gene product amino acid sequence with orthologs

  • Next you will be prompted to select the set of organisms from which orthologs to the starting gene will be obtained

  • A multiple sequence alignment will be constructed using MUSCLE and displayed using the alignment viewer

To invoke the alignment viewer on a set of genes in a SmartTable:

  • Create a SmartTable where each row in the SmartTable contains one gene or protein that you wish to align (see Section 6.3)

  • Run one of the following commands in the right-sidebar menu to create the alignment:

    • Column → View Alignment of Gene Nucleotide Sequences

    • Column → View Alignment of Gene Product Peptide Sequences

  • A multiple sequence alignment will be constructed using MUSCLE and displayed using the alignment viewer

The sequence alignment viewer enables the user to zoom in to a region of the alignment by clicking on a point within the alignment graphic, to move left or right in the sequence by clicking the green arrows to the left/right of the coordinate line, and to re-render the alignment between specified coordinates.

15  How to Learn More

References

[1]   PatMatch home page. ftp://ftp.arabidopsis.org/home/tair/Software/Patmatch/.

[2]   T. Yan, D. Yoo, T. Z. Berardini, L. A. Mueller, D. C. Weems, S. Weng, J. M. Cherry, and S. Y. Rhee. PatMatch: a program for finding patterns in peptide and nucleotide sequences. Nucleic Acids Res, 33(Web Server issue):W262–6, 2005.