CMap Administration and Data Curation
$Revision: 1.46 $
This document is intended to help you understand how to use the data curation tools provided with CMap. There are three tools you will use: the configuration files, the ``cmap_admin.pl'' command-line interface and the web-based administration interface.
The configuration files are used to define map, feature and evidence types and customize their look, as well as to customize the CMap experience.
The ``cmap_admin.pl'' program is used for all long-running processes that are not practical to address over the HTTP protocol. It employs a ``wizard''-like approach to accomplishing various tasks by asking the curator a series of questions and performing the desired action using the answers provided. cmap_admin.pl can also be driven via command line arguments.
The web admin tool is meant to provide a point-and-click interface for performing the more mundane administrative tasks or for viewing the data.
All of this is discussed in further detail in this document.
If you have just installed CMap, then your database is probably completely empty. (Of course, a small but complete dataset is provided in the ``data'' directory, but the database for your own data will be empty.) So let's start by discussing what you need to do to get a usable installation with your data.
This document assumes that you've already been through all the steps described in the INSTALL.pod document, so you should be able to pull up the web-based admin tool by pointing your browser to ``http://your.host.name/cgi-bin/cmap/admin'' (or wherever you have installed it). Depending on whether or not you decided to password-protect that URL, you may have to enter a username and password when prompted by your browser.
The first step to getting your data into CMap is to correctly set up your configuration files. The config files tell CMap how to access your database. Also, each map type, feature type and evidence type of the data that you will be installing needs to be defined in the configuration files.
CMap has multiple configuration files. These files are read out of the ``cmap.conf/'' directory, likely located in your Apache ``conf/'' directory. There are two types of config files. There is one (and only one) ``global.conf'' file that holds information applicable to all of the databases. The same directory also holds at least one individual configuration file for each database used. Everything that can be customized about CMap is controlled by these files, which are described in detail below.
The configuration files are written in a standard Apache-style configuration file syntax. Comments are defined as anything following a hash sign and are ignored. Options which are grouped together are in angle brackets (``<>''). Standard options are written in a ``Name Value'' syntax where the two are separated by whitespace or an equal sign. For more information on the syntax, read the POD for Config::General (by executing ``perldoc Config::General'' on your system).
Each configuration setting is documented in comments. Legal and default values are listed along with a brief description of the option. Except for the ``database'' and map/feature/evidence_type settings, there is a reasonable, hard-coded default value for every configuration setting (so you could actually comment out all of the ``optional settings'' in the file).
Whenever you make a change to cmap.conf, you will most likely need to purge the CMap cache to see the change if it affects something in the viewer. This is particularly true if map|feature|evidence type information has been changed, because that information is cached along with database search results. Purging the query cache can be done by using the cmap_admin.pl program (explained later in this document).
The global.conf contains information which is used by all of the other configurations. You will want to set the default_db value.
Excluding or setting to 0 will ignore directory size.
Excluding or setting to 0 will ignore directory size.
Excluding or setting to 0 will prevent CMap from attempting to purge the image directory.
The default is set to five minutes so that any http requests accessing current images will have a chance to get them. Setting to 0 may cause problems with concurrent requests.
CMap supports multiple data sources and multiple customizations for each. This allows a curator to maintain distinct databases and view them all in CMap.
If you have only one database, then you need only one configuration file of this type. Adding another is as simple as creating a new configuration file. Any file that is in the ``cmap.conf/'' directory that ends in ``.conf'' will be read as a configuration file.
All of the parameters that can be specified in an individual config file are described here.
The required elements are the Required General Options and the Map, Feature and Evidence Type Information. The other elements all have reasonable defaults and can be ignored until you want to tweak them.
The file ``example.conf'' (which is installed in the cmap.conf/ directory) provides a good starting point for your configuration. When you are done customizing your config file, make sure you set ``is_enabled'' to 1.
Starting with v0.13, map, feature and evidence types (*_types) are defined in the configuration files and not in the database. This means that to add an object of *_type, its type must be defined here in the config file.
The accession id of the *_type is listed in two places for technical reasons. It is important that both places have the same accession id. One is in the initial tag, for example <map_type X>, where 'X' is the accession. The other is in the matching '*_type_acc' field (for example, 'evidence_type_acc' for an evidence type).
It is important to note that some of the information is stored in the database when features and map sets are created. Fields like map units and default_rank will not change in the database if you change them only in the config file. This may be something to be addressed in the future.
Attributes are essentially name, value pairs which describe something about the type. They must be defined in <attribute> tags. Here is an example entry:
    <attribute>
        name Description
        value This is the description of the *_type.
        is_public 1
    </attribute>
    <attribute>
        name History
        value This type was first discovered in the fourth century BC.
    </attribute>
Notice that is_public is optional. If not given it will default to 1.
Cross-references (xrefs) are links from this type to another website. They consist of a name and a URL. If the URL is inside the ``cmap/'' namespace, you can define just the end portion. However, if you want to link outside this website, you must include the ``http://''. Here is an example entry:
    <xref>
        name Map Search
        url map_search
    </xref>
    <xref>
        name google
        url http://www.google.com
    </xref>

    $code=sprintf("onMouseOver=\"window.status='%s';return true\"",$map->{'map_name'});
    $alt =sprintf("%s",$map->{'map_name'});
    $url=sprintf("www.google.com/search?q=%s",$map->{'map_name'});
This is Perl code that lets you change the values of three variables: $code, $alt and $url. $code holds the JavaScript commands that the area will use. $alt holds the alternate message that will pop up when the mouse hovers over the area box. $url is the URL that the browser will go to when the area box is clicked. These are evaluated for each map of this type, which enables access to the internal variable $map (or in the case of features, $feature).
This allows the administrator the freedom to make the page more versatile. Each variable has usable defaults.
Some useful keys in $map (which are used as $map->{'key'}):
Some useful keys in $feature (which are used as $feature->{'key'}):
Since area_code allows you to inject Perl to be executed, the Bio::GMOD::CMap::Data::Generic module can be used to gather data from the db. To access the module, use the ``$self->sql()'' method. You can use the object returned to call data retrieval methods. For a list and explanation of the possible methods, execute ``perldoc Bio::GMOD::CMap::Data::Generic''.
The following is an example of how to get an xref from the database.
    area_code <<EOF
        my $dbxrefs = $self->sql()->get_xrefs(
            cmap_object => $self,
            object_id   => $feature->{'feature_id'},
            object_type => 'feature',
            xref_name   => 'Chado',
        );
        my $new_url = '';
        if ( @{ $dbxrefs || [] } ) {
            my $t = $self->template;
            $t->process( \$dbxrefs->[0]{'xref_url'}, { object => $feature }, \$new_url );
        }
        $url = $new_url;
        $code=sprintf("onMouseOver=\"window.status='%s';return true\"",$feature->{'feature_type_acc'});
    EOF
There are banding-specific options that can be defined:
Also, a feature type with this glyph must have its own drawing lane.
Also, a feature type with this glyph must have its own drawing lane.
For example, two items are collapsed below.
         ----
    ------------
         |
         V
    -----OOOO---
This would be represented by three features and written in the import file like this:
    feature_name  feature_start  feature_stop  feature_type_acc
    1             1              5             something_density
    2             6              9             something_density
    1             10             12            something_density
There are heatmap-specific options that should be defined in order for this to be most useful.
Also, a feature type with this glyph must have its own drawing lane.
The following will change the color of the feature if the name matches a specified pattern.
    feature_modification_code <<EOF
    if ($feature->{'feature_name'} =~ /^sb(\S+)_/){
        $color = 'black';
    }
    elsif ($feature->{'feature_name'} =~ /^os(\S+)_/){
        $color = 'orange';
    }
    EOF
Lanes are not absolute; they are only used to sort the features on a map. Therefore, if you have only ``Centromeres'' and ``Markers'' in lane #1, on any map with either of those types of features, they will be drawn *on* the map; on any map without those types, the feature type with the next highest lane number will be drawn *on* the map. The same is true as you move into higher lanes; if the lanes move like ``1,'' ``4,'' ``7,'' you will have only 3 lanes with no gaps.
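The lane behavior described above can be pictured with a short sketch. This is only an illustration in Python, not CMap's drawing code; the function name ``compact_lanes'' and the feature types and lane numbers are made up for the example.

```python
def compact_lanes(lane_by_type, types_on_map):
    """Turn configured lane numbers into gap-free drawing positions.

    Position 1 stands for "drawn on the map"; higher positions move outward.
    Only the relative order of the configured lane numbers matters.
    """
    used = sorted({lane_by_type[t] for t in types_on_map})
    position_of_lane = {lane: pos for pos, lane in enumerate(used, start=1)}
    return {t: position_of_lane[lane_by_type[t]] for t in types_on_map}


# Hypothetical feature types and lane numbers, for illustration only.
lanes = {"centromere": 1, "marker": 1, "gene": 4, "qtl": 7}

# Lanes 1, 4 and 7 are drawn as three adjacent lanes.
print(compact_lanes(lanes, ["marker", "gene", "qtl"]))

# With no lane-1 features on the map, the lane-4 type moves onto the map.
print(compact_lanes(lanes, ["gene", "qtl"]))
```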
Values:
    display: Display all features of this type
    corr_only: Display only features of this type with displayed correspondences
    ignore: Completely ignore features of this type.
A single correspondence can be supported by any number of evidence types, so it is necessary that these be ranked from ``1'' to N (where lower numbers have precedence over higher). When multiple records support a correspondence, only the highest ranking evidence will be used when determining how to draw the correspondence line. If you intend to use the cmap_admin.pl tool to create correspondences for you based on simple name/feature type comparisons, you should create a special evidence type to use just for that purpose (e.g., with the name ``Automated name-based'' or something similar to flag it as a machine-created/non-curated correspondence). Evidence type color will only be used when NOT aggregating.
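To make the ranking rule concrete, here is a minimal Python sketch of picking the evidence used for drawing. It is not taken from the CMap source; the function name and the evidence accessions and ranks are invented for illustration.

```python
def drawing_evidence(evidence_accs, rank_by_type):
    """Return the supporting evidence type with the best rank.

    Rank 1 is the highest rank, so the numerically lowest rank wins.
    """
    return min(evidence_accs, key=lambda acc: rank_by_type[acc])


# Hypothetical evidence type accessions and ranks.
ranks = {"curated": 1, "blast": 2, "ANB": 3}

# A correspondence supported by both an automated name-based record and a
# BLAST record is drawn using the BLAST evidence.
print(drawing_evidence(["ANB", "blast"], ranks))
```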
Values:
    direct: Line directly from feature to feature.
    indirect: Line that starts perpendicular to the map for a short distance before traveling to the other feature.
    ribbon: A ribbon is drawn that spans the top and bottom of each feature.
You cannot default to the greater than or less than score options.
Values:
    display: Display all correspondences with evidence of this type
    ignore: Completely ignore the evidence of this type when considering correspondences.
These colors will only be used if the aggregate correspondences are split based on evidence type, meaning each evidence type will have its own aggregated correspondence. It is suggested to use primary colors for each different type, so they can be distinguished.
    <aggregated_correspondence_colors>
        1 lightblue
        20 blue
        0 darkblue
    </aggregated_correspondence_colors>
The unit_granularity must be defined for each map_type for a configuration file to be considered valid by the validator. However, CMap will work without it.
The values are defined as name, value pairs, where the name is the feature_type accession and the value is the default value for that feature_type. They must be defined in <feature_default_display> tags. Here is an example entry:
    <feature_default_display>
        marker display
        read corr_only
    </feature_default_display>
Values:
    display: Display all features of this type
    corr_only: Display only features of this type with displayed correspondences
    ignore: Completely ignore features of this type.
Values: color
Values: color
Values: color
Values: color
Values: none, landmarks, all
Values:
    0: don't collapse
    1: collapse

Values:
    display: Display all features of this type
    corr_only: Display only features of this type with displayed correspondences
    ignore: Completely ignore features of this type.

Values:
    0: don't aggregate
    1: one line
    2: two lines
This default is overridden by any individually defined defaults.
Values:
    display: Display all correspondences with evidence of this type
    ignore: Completely ignore the evidence of this type when considering correspondences.

Values:
    0: Lines connect features
    1: Lines connect maps
Values: small, medium, large
Values: small, medium, large
Values: png, jpeg, svg and maybe gif
Values:
    0: don't scale
    1: scale
For an example (a completely nonsensical one), let's pretend that a centimorgan (cM) is 100 base pairs (bp). The config would look something like the following, which is read 'One centimorgan is the size of 100 base pairs'.
    <scale_conversion>
        <cM>
            bp 100
        </cM>
    </scale_conversion>
Since the directionality doesn't matter, this is the same as:
    <scale_conversion>
        <bp>
            cM .01
        </bp>
    </scale_conversion>
There is no reason to define the conversion in both directions since only one factor will be used and it could create inconsistencies if one direction was different from the other.
Also, multiple conversions can be defined as in this (nonsensical) example.
    <scale_conversion>
        <bp>
            cM .01
            bands .1
        </bp>
        <cM>
            bands 10
        </cM>
    </scale_conversion>
It is appropriate to describe all relationships, as CMap won't necessarily figure out the relationship between a band and a centimorgan transitively (though it might, depending on the order of the map sets). Therefore, to avoid inconsistencies, please make sure to define all relationships (and define them correctly).
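The point about direction can be sketched in a few lines of Python. This is an illustration of the idea only, not how CMap reads the <scale_conversion> block; the function name and the nested-dictionary layout are assumptions made for the example, and the sketch deliberately does no transitive lookup.

```python
def conversion_factor(conversions, from_unit, to_unit):
    """Return the factor that turns a from_unit value into a to_unit value.

    A conversion defined in only one direction is usable in both: the reverse
    direction is simply the inverse. Returns None when no direct relationship
    between the two units has been defined.
    """
    if from_unit == to_unit:
        return 1.0
    if to_unit in conversions.get(from_unit, {}):
        return conversions[from_unit][to_unit]
    if from_unit in conversions.get(to_unit, {}):
        return 1.0 / conversions[to_unit][from_unit]
    return None


# Mirrors the first (nonsensical) example: one cM is the size of 100 bp.
conversions = {"cM": {"bp": 100.0}}

print(conversion_factor(conversions, "cM", "bp"))     # defined direction
print(conversion_factor(conversions, "bp", "cM"))     # inverse of the above
print(conversion_factor(conversions, "cM", "bands"))  # undefined relationship
```

The last call returns None, which is why every relationship you care about should be written out in the config.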
Values:
    0: render all of the area boxes
    1: omit the feature area boxes
    2: omit all area boxes
Values: any positive integer (within reason)
Values: see background_color
Values: see background_color
Values: see background_color
Values: see background_color
Values: see background_color
The format is ``upper_bound color'' where upper_bound is the highest number of correspondences for which this color is used. In the following example, with 0-1 correspondences, the line will be light grey. With 2 correspondences, the line will be blue. From 3-5 (inclusive), the line will be purple, and from 6-20 it will be red. Anything ABOVE 20 will be black, which is denoted by an upper_bound of 0 (in this case it means infinity).
    <aggregated_correspondence_colors>
        1 lightgrey
        2 blue
        5 purple
        20 red
        0 black
    </aggregated_correspondence_colors>
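If the upper_bound rule is easier to follow as code, here is a small Python sketch of the lookup. It only mirrors the rule as described in this section; the function name and the list-of-pairs representation are assumptions, not CMap internals.

```python
def aggregated_color(bounds, corr_count):
    """Pick the color for an aggregated correspondence line.

    bounds is a list of (upper_bound, color) pairs. An upper_bound of 0 means
    "no upper limit" and is used when every finite bound is exceeded.
    """
    finite = sorted((upper, color) for upper, color in bounds if upper != 0)
    for upper, color in finite:
        if corr_count <= upper:
            return color
    for upper, color in bounds:
        if upper == 0:
            return color
    return None  # no catch-all color was configured


# The example block from this section.
bounds = [(1, "lightgrey"), (2, "blue"), (5, "purple"), (20, "red"), (0, "black")]

for count in (1, 2, 4, 20, 21):
    print(count, aggregated_color(bounds, count))
```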
Values: any color found in the COLORS array of Bio::GMOD::CMap::Constants. To view this file, either look in the ``lib'' directory of the original CMap source directory, or, if you have the very handy ``pmtools'' installed on your system, type ``pmcat Bio::GMOD::CMap::Constants'' on your command-line.
Values: see background_color
Values: see background_color
These options are only used to allow complex JavaScript to be specified in the config document. These can be used in conjunction with the area_code of feature_types and map_types.
This can be used to specify functions that the individual maps or features need to use. When a feature or map type is used that requests a page_code object, the code is inserted at the top of the page.
CMap allows the creation of buttons to modify options in the CMap viewer menu. Conditions can be set that would need to be met before the button will appear.
A simple example would be to add a ``For Publication'' button that would set the ``clean_view'' option and would only appear when the ``clean_view'' option was off.
The impetus for adding this feature was to create a button that would toggle between an overview-style view and a more detailed view. Clicking the ``Overview'' button would set the feature display options to ``corr_only'' for all but a select set of feature types, which would have the effect of de-cluttering the view. A ``Detailed'' button could set all the feature display options to always ``display''.
All buttons are enclosed in an <additional_buttons> tag.
The following is the configuration for the ``clean_view'' buttons mentioned above.
    <additional_buttons>
        <button>
            text For Publication
            <if>
                clean_view 0
            </if>
            <set>
                clean_view 1
            </set>
        </button>
        <button>
            text Return Navigation Buttons
            <if>
                clean_view 1
            </if>
            <set>
                clean_view 0
            </set>
        </button>
    </additional_buttons>
Note that there are two buttons, one for when ``clean_view'' is set and one for when it is not. They work together.
Each <button> is its own element with the following fields.
The individual options are described later in the Testable Parameters section.
The following is an alternate configuration for the ``For Publication'' button.
    <button>
        text For Publication
        <if_not>
            clean_view 1
        </if_not>
        <set>
            clean_view 1
        </set>
    </button>
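The difference between <if> and <if_not> can be sketched as follows. This Python snippet is only a reading of the examples in this section, not CMap's implementation: the function name, the dictionary layout, the idea that several conditions must all hold, and the treatment of an unset option are all assumptions.

```python
def button_visible(button, options):
    """Decide whether a button should appear for the current viewer options.

    Assumed semantics: every <if> entry must match the current value, and no
    <if_not> entry may match it. Values are compared as strings because
    config values are read as text.
    """
    for name, wanted in button.get("if", {}).items():
        if str(options.get(name)) != str(wanted):
            return False
    for name, unwanted in button.get("if_not", {}).items():
        if str(options.get(name)) == str(unwanted):
            return False
    return True


for_publication = {"text": "For Publication", "if": {"clean_view": 0}}
alternate = {"text": "For Publication", "if_not": {"clean_view": 1}}

print(button_visible(for_publication, {"clean_view": 0}))  # shown
print(button_visible(for_publication, {"clean_view": 1}))  # hidden
print(button_visible(alternate, {}))                       # shown: option unset
```

Under these assumptions the two configurations differ only when ``clean_view'' has no value at all: the <if_not> form still shows the button, the <if> form does not.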
The individual options are described later in the Testable Parameters section.
    highlight
    collapse_features
    scale_maps
    stack_maps
    omit_area_boxes
    show_intraslot_corr
    split_agg_ev
    clean_view
    corrs_to_map
    ignore_image_map_sanity
    dotplot

    prev_ref_species_acc
    prev_ref_map_set_acc
    ref_species_acc
    ref_map_set_acc
    image_type
    label_features
    aggregate
    comp_menu_order
    data_source
    ref_map_start
    ref_map_stop
    font_size
    pixel_height
    dotplot_ps

    display_feature_type feature_type_acc
    corr_only_feature_type feature_type_acc
    ignored_feature_type feature_type_acc

    included_evidence_type evidence_type_acc
    ignored_evidence_type evidence_type_acc
    less_evidence_type evidence_type_acc
    greater_evidence_type evidence_type_acc
An example button:
    <button>
        text No Marker
        <if>
            display_feature_type marker
        </if>
        <set>
            ft_marker 0
        </set>
    </button>
Values: feature_name, feature_aid
Values: any positive integer (within reason)
Values: any positive integer (within reason)
If you wish to have something like this, I'm sure you can glean the basic ideas from Lincoln's ``GramenePage.pm'' module. Then define the ``page_object'' in your cmap.conf. Otherwise, just don't worry about it, and this method will never return anything.
Titles and Introductory texts for various pages.
These are the default settings for when map sets are created.
Values: see background_color
Values: any shape found in the VALID hash under map_shape in Bio::GMOD::CMap::Constants
Value: any number from 1 to 10
If there is a section of the individual configuration files that is repeated in multiple files, that section can be placed in a separate file. That file can then be included in each configuration file.
<<include foo_type_info.cfg>>
It is recommended that any included files be named with a ``.cfg'' extension. Do not name them with a ``.conf'' extension; otherwise they will be read as separate config files.
You can use cmap_validate_config.pl to test your individual config files (not global.conf yet) and make sure that they are valid. This can save you a lot of headaches dealing with configuration problems. See the section on VALIDATE CONFIG FILES for more information.
There are a few configuration options that can be used to increase the speed and clarity of CMap. This is a collection of those options. The details are described above.
All of these can be overridden by the user.
feature_default_display can be overridden by the feature_type configs and by the user. If there are some feature types that should be displayed initially, you can set an individual feature_default_display value in the feature_type configuration.
Setting omit_area_boxes to 2 will render the whole image unclickable. This could save time but makes CMap harder to use since it relies on clicking the image for a lot of navigation. If you do choose to go this route, CMap will set clean_view to 1 automatically to keep from drawing the buttons that were rendered useless.
The home page for the web-based CMap administration tool has links for all the different actions you can take:
Each link is self-explanatory. In addition to the above links, you will also see the current data source to which you are connected. You may configure CMap to connect to different distinct data sources by having multiple configuration files; see the section EDITING CONFIGURATION FILES above for more information on this. If you have configured more than one data source, you will also see a drop-down control allowing you to connect to a different data source.
On each ``View'' page, you have access to edit and delete the information there (except for map, feature and evidence type information which are defined in the config file). Each page will be discussed in detail later. For the moment, let's just focus on what you need to import your first data set. To do that, you'll need to first set up the species.
Starting with CMap version 1.01, CMap is able to import data in the GFF3 file format.
Using special CMap-specific extensions, all the CMap data can be imported in one file. This allows a whole database's worth of CMap data to be imported in one step (rather than in the multiple steps previously required, which are described below).
Alternatively, regular GFF3 (as used by GBrowse) can be imported if a map set is specified.
The format is described in detail in the Bio::DB::SeqFeature::Store::cmap module. It can be accessed by running the command:
$ perldoc Bio::DB::SeqFeature::Store::cmap
The GFF files can be imported using the cmap_admin.pl script, either using the menu system or by using the command line interface (examples follow).
The first example uses a GFF file, ``cmap_style.gff'', that has the CMap extensions to define which species and map sets the data belongs to.
$ cmap_admin.pl -d DATASOURCE --action import_gff cmap_style.gff
This second example defines the map set accession to import the data into. However, if a map set is defined in the file, the map set in the file will be used.
$ cmap_admin.pl -d DATASOURCE --action import_gff --map_set_acc MAP_SET_ACCESSION gbrowse_style.gff
Note: For GFF importing to work correctly, you need a copy of bioperl-live downloaded after June 7th, 2008.
A sample CMap GFF3 file, ``test_data.gff'', is located in the data directory of the distribution. This can be imported into the demo database (which can be created using the ``./Build demo'' command).
$ cmap_admin.pl -d CMAP_DEMO --action import_gff data/test_data.gff
Click on ``View Species'' from the home page of the web-admin tool. As you currently have no species to view, you should see the message ``No species to show.'' To add a new species, click on the ``Create New Species'' link at the top of the page. You should now see a page entitled ``Species Create'' with a form for the following fields:
Note: All the accession id columns in the CMap tables act the same. They are all character fields, so they will accept any combination of numbers and letters you care to use. Please don't use spaces or characters outside the ranges ``a-z,'' ``A-Z,'' ``0-9'' or dashes (``-'') as this will likely only cause you headaches. It is also not necessary to explicitly assign any accession IDs. While they *are* required by the database, there is code in place to ensure that the accession ID is set to the primary ID of the record if the accession ID is empty. Once your accession IDs have been established and publicized, they should never change.
Also, it is best to avoid strictly numeric accession ids since the automatic accessions are numeric and this can cause conflicts.
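If you generate accession IDs in bulk, a quick check like the following Python sketch can catch the problems described above before you load anything. It is only an illustration of the advice in this section; the function name and messages are made up, and CMap itself does not ship this check.

```python
import re

# Characters recommended in this document: letters, digits and dashes.
_ALLOWED = re.compile(r"[A-Za-z0-9-]+")
_ALL_DIGITS = re.compile(r"[0-9]+")


def check_accession(acc):
    """Return a list of problems with a proposed accession ID (empty if fine)."""
    if acc == "":
        # An empty accession is acceptable: it falls back to the primary ID.
        return []
    problems = []
    if not _ALLOWED.fullmatch(acc):
        problems.append("contains characters outside a-z, A-Z, 0-9 and '-'")
    if _ALL_DIGITS.fullmatch(acc):
        problems.append("strictly numeric, may conflict with automatic accessions")
    return problems


# Hypothetical accession IDs.
for acc in ("RM-2001", "12345", "my map"):
    print(repr(acc), check_accession(acc))
```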
The fields marked ``(opt.)'' do not require you to enter a value. If one is really required for the database (e.g., the ``accession_id''), then a reasonable default will be provided (e.g., the primary key value for accession IDs). When you are done, hit the ``Submit'' button. If there are errors, they will be reported to you and you will have to correct them before submitting again. If there are no errors, your entry will be accepted and you will be returned to the ``Species View'' page.
Note: On all the pages with clickable column headers, clicking on a column name will re-sort the data by that column.
If you are unhappy with any of the data you see, you can click on the ``Edit'' link of the record that displeases you and correct the faults. If you have created an unnecessary species, you can delete it by clicking on the ``Delete'' link. After confirming that you really wish to delete the species, it will be *permanently* removed from the database.
Note: There is no ``undo'' function for deletes, so be sure whenever you decide to remove an object from the database that you really mean it. When an object has other objects that rely on it (e.g., species records are linked to map sets), you will not be allowed to remove the object until all dependencies to it are removed (e.g., no more map sets use the species you want to remove).
Once you've set up all the species you wish to have in your database, click on the ``Home'' link in the upper-left corner to return to the web-admin home page.
Note: Throughout the admin interface, the ``create'' and ``edit'' pages for any object (e.g., species, map types, map sets, etc.) have the same fields in the same order with the same restrictions. If this document only mentions the ``create'' or ``edit'' page for an object, rest assured that the complementary page works the same as the one described.
Once you have species set up and map types defined, you are ready to create a new map set. Everything in CMap is designed to be generic, so a ``map set'' is simply a collection of maps. What you group together is entirely up to you. On Gramene, map sets tend to correspond to published studies of organisms and contain maps that represent chromosomes, linkage groups, FPC contigs, and such. As the database design allows only one species and one map type to be linked to a map set, it is best to keep these narrowly defined.
To create a new map set, click on the ``Create New Map Set'' link from the admin home page or from the ``Map Sets View'' page. If you have failed to set up species or map types, you will be prompted to do so before continuing. You will see the following fields:
You can also create a new map set by using the ``cmap_admin.pl'' tool. There are only a few functions that exist in both tools, and this one was only added to cmap_admin.pl in order to make importing new data sets more convenient. You can only specify the species, map type, long and short names, and accession ID for the map set. Everything else about the map set must be edited using the web admin tool.
To create a new map set with cmap_admin.pl, start the script and choose the ``Create new map set'' option. Answer the questions appropriately and confirm your decision. If you see no errors, the map set was successfully created.
Note: When using cmap_admin.pl, questions which have only one choice are automatically answered by the tool. In the above example, if you had only one species in the database, cmap_admin.pl would not ask you which species to associate with the new map set. There can only be one answer, so it answers the question automatically.
Once you have successfully created a map set with the web admin tool, you will be taken to the view of that map set. You should see the data that you entered and that the set currently has no maps associated with it. You can either create each map for the map set individually or you can import the data for the map set. Most likely, you will want to do the latter, so that is the next section.
Using cmap_admin.pl, you can import a tab-delimited file containing the data for a map set with the following fields and data types:
For a more thorough treatment of these fields, read the ``import_tab'' section of ``perldoc Bio::GMOD::CMap::Admin::Import.''
The first line of the file should be the tab-separated names of the fields in whatever order you're supplying them (the order of fields is not important). You should use the above names for the fields, but you can use spaces and capitalization for the column names, if you like, as spaces will be converted to underscores and the names lowercased (e.g. ``Feature Alt Name'' will become ``feature_alt_name'').
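The header handling described above amounts to something like this Python sketch. It illustrates the naming rule only and is not the importer's code; the function names and the sample row are invented for the example.

```python
import csv
import io


def normalize_header(name):
    """Apply the rule from the text: spaces become underscores, then lowercase."""
    return name.replace(" ", "_").lower()


def read_import_rows(text):
    """Read tab-delimited text into one dict per row, keyed by normalized header."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = [normalize_header(h) for h in next(reader)]
    return [dict(zip(header, row)) for row in reader]


# Hypothetical two-line import file; the column order is arbitrary.
sample = "Feature Name\tMap Name\tFeature Start\nRM1\tChr1\t12.5\n"

print(normalize_header("Feature Alt Name"))
print(read_import_rows(sample))
```

Because each row is keyed by column name rather than position, the order of the columns in the file does not matter.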
If the fields ``map_start'' and ``map_stop'' are not supplied, then the start and stop positions of the map will be determined after importing all the features by selecting the MIN and MAX start and stop positions from the ``cmap_feature'' table.
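In other words, the map ends up spanning its features. A minimal Python sketch of that calculation follows, assuming every feature has both a start and a stop and that the map runs from the smallest start to the largest stop; the function name is made up, and the real work is done with MIN and MAX queries against the ``cmap_feature'' table.

```python
def derive_map_bounds(features):
    """Return (map_start, map_stop) covering every feature on the map."""
    map_start = min(f["feature_start"] for f in features)
    map_stop = max(f["feature_stop"] for f in features)
    return map_start, map_stop


# Hypothetical features on one map.
features = [
    {"feature_start": 1, "feature_stop": 5},
    {"feature_start": 6, "feature_stop": 9},
    {"feature_start": 10, "feature_stop": 12},
]

print(derive_map_bounds(features))
```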
Use the ``cmap_admin.pl'' script to bring in your correctly formatted data. Run the script, optionally passing your data file as an argument, like so:
$ cmap_admin.pl my_groovy_maps.dat
If you pass a file as an argument, you will be asked to confirm that that is the file you want to use. If you answer ``no'' or did not pass a file, then you will be asked to locate the file containing your data. Type in the path to the file (noticing that you can use tab completion), or type ``q'' to exit file selection and return to the ``Main Menu.'' Once you've found your file, you'll need to tell the tool which map set the data corresponds to. First choose the map type and then the species of the map set, then the map set itself. Lastly, you will need to confirm your choices. If all goes well, you should see a lot of lines fly by giving you the step-by-step progress, the message ``Done,'' and then you will be returned to the ``Main Menu.'' A complete log of the actions taken by the script will be stored in a file called ``cmap_admin_log.X'' (where ``X'' is an incrementing number). See the docs on cmap_admin.pl (by typing ``perldoc cmap_admin.pl'') for more info.
When you import data for a map set that already has data, all existing maps and features with the same name as maps and features in your data will be updated. You also have the option of deleting any data that isn't updated. If you choose this ``overwrite'' option, any of the pre-existing maps or features that aren't updated will be deleted as it will be assumed that they are no longer present in the dataset.
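Before re-importing into a populated map set, it can help to work out what will happen to each name. The Python sketch below models the rules just described; it is not the importer itself, the function name is hypothetical, and the assumption that names not yet in the database are created as new records is mine.

```python
def plan_import(existing_names, incoming_names, overwrite=False):
    """Classify names by what a re-import would do to them."""
    existing, incoming = set(existing_names), set(incoming_names)
    return {
        # Same name in the database and in the file: updated in place.
        "update": sorted(existing & incoming),
        # Only in the file: assumed to be created as new records.
        "create": sorted(incoming - existing),
        # Only in the database: deleted, but only with the overwrite option.
        "delete": sorted(existing - incoming) if overwrite else [],
    }


# Hypothetical feature names.
in_db = ["RM1", "RM2", "RM3"]
in_file = ["RM2", "RM3", "RM4"]

print(plan_import(in_db, in_file))
print(plan_import(in_db, in_file, overwrite=True))
```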
While the import process is running, it may encounter feature types in your data which do not exist in the database. If this happens, the program will die right there and you will be left to the task of figuring out what was the last data inserted, defining the feature type in the config file and re-running the remaining data.
When cmap_admin.pl has successfully finished importing your map data, you will be returned to the ``Main Menu.'' From here, choose to ``Quit'' as the rest of the functionality will be covered later.
As of 0.10, CMap exports and imports data in XML format. The standard tab-delimited format is very convenient and easy to generate, but it's also difficult to indicate hierarchical relationships among data. As such, some experimental code has been added to export the concept of ``objects'' from the database. These objects will contain all the information within them necessary to duplicate themselves entirely in another CMap database. This code is functional but still marked ``experimental'' as it appears to be very slow when exporting very large map sets.
To try this feature, choose ``Export data'' and then ``Database objects.'' Follow the directions from there. Then try importing the data (into another database, of course) using the ``Import data'' option and then ``Import CMap objects.'' Database objects are only created on import at this time; there is no updating of existing objects, so be careful!
Once you have more than one set of maps in CMap, you can use it for what it was designed: showing comparisons. To do this, you must first create the correspondences between the features on the maps. Before you can do this, you'll need to establish the evidence types that can be used to support the correspondences. These are defined in the configuration file. For how to create evidence types, see EDITING CONFIGURATION FILES above.
Once you have evidence types, you can create the correspondence records. It might be helpful to you to inspect ``cmap-schema.png'' and ``cmap-schema-graph.png'' images in the ``docs'' directory that visualize the CMap tables and their relationships. (Also included is ``cmap-schema-desc.html,'' a breakdown of the tables into HTML tables for easy viewing.) The tables involved are:
For descriptions of these tables, look at the ``CODE_OVERVIEW.pod'' document in the ``docs'' directory.
There are three ways to create correspondences:
A) ``View'' a map set, then ``view'' a map, then ``view'' the feature
B) Click on ``Search for a Feature'' and find the feature by name
Once you have located a feature and have navigated to the ``View Feature'' page, click on the link to ``Add Correspondence.'' This will take you to the ``Feature Correspondence Create'' page where you can search for the other feature in the relationship. When you have located both features, you will be presented a form with the following fields:
When you have finished, click the button entitled ``Create Correspondence.'' If the correspondence is successfully created, you will be taken to the ``View Feature Correspondence'' page showing you the two features, their respective maps and types, and the evidences supporting this relationship. You can add more evidence types by clicking the link ``Add Evidence.'' You can also edit and delete existing evidences by clicking the appropriate links to the right of them. If you choose to ``Edit'' a correspondence, you will be presented a form with the following fields:
By default, CMap will only compare features of the same type when making name-based correspondences. You can expand the feature types considered by adding appropriate lines to ``cmap.conf.'' This is documented in that file; look for the string ``add_name_correspondence'' and follow the directions there.
perldoc Bio::GMOD::CMap::Admin::ImportCorrespondences
This method is the surest as you are always directly controlling what gets created.
The data underlying the correspondence matrix is all precomputed. As it is an intensive operation on seldom-changing data, it was determined to cache the pair-wise comparisons of all the maps in the database into a (very denormalized) table that would subsequently optimize the many calls for this data. Because of this, it is necessary that you remember to reload the matrix whenever you alter the number of correspondences in the database. To do this, execute cmap_admin.pl and choose the ``Reload correspondence matrix'' option. The only option is to completely truncate the table and reload it from scratch.
See the ``attributes-and-xrefs.pod'' document.
Now that we've stepped through the basics of setting up the CMap data, let's go over some of the more basic operations of curating the data.
From the home page of the web admin tool, you'll see that you can ``View Map Sets.'' Selecting this link takes you to the ``Map Sets View'' page where you'll notice three drop-down boxes that you can use to narrow the selection criteria for the map sets being displayed. You can restrict them by species, map type, and whether or not they are currently enabled. There is no automatic submission of the form when you make a choice (as you might want to use more than one criterion), so be sure to hit ``Submit'' when you've made your choices. As noted before, you can re-sort the data by clicking on the hyperlinked column headers. Every object in the web admin tool can be ``viewed,'' ``edited,'' and ``deleted'' by the respective links, which are usually displayed to the right.
Note: On pages that may return a large record set, the data is ``paged'' and can be accessed by moving through the pages of the data. On all these pages, the total number of records is displayed along with which are currently being shown. This is the section that looks like ``52 records found. Showing 1 to 25.'' (The page size is determined by the ``max_child_elements'' option in the CMap configuration file which will be discussed shortly.)
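The arithmetic behind a line such as ``52 records found. Showing 1 to 25.'' is sketched below in Perl. The page_summary function is hypothetical and is not taken from the CMap source; its $page_size argument simply plays the role of the configured page size.

```perl
use strict;
use warnings;

# Hypothetical helper: compute the record range shown on a given page.
sub page_summary {
    my ( $total, $page_size, $page_no ) = @_;
    my $first = ( $page_no - 1 ) * $page_size + 1;
    my $last  = $first + $page_size - 1;
    $last = $total if $last > $total;    # the final page may be short
    return "$total records found. Showing $first to $last.";
}

print page_summary( 52, 25, 1 ), "\n";    # 52 records found. Showing 1 to 25.
print page_summary( 52, 25, 3 ), "\n";    # 52 records found. Showing 51 to 52.
```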
Choose to ``view'' one of your map sets. You'll be taken to a page which lists all the data in the map set table as well as a summary of all the maps associated with the map set. Notice that you can change any of the data for the map set by clicking the ``edit'' link at the top, or you can delete it from here by choosing the ``delete'' link.
Note: Deleting an object that has dependencies will cause the dependencies to be deleted as well. So deleting a map will delete all the features (which will delete all the correspondences which will delete all the evidences [but not evidence types] supporting the correspondence). Deleting a map set deletes all the maps (which deletes all the features which ... you get the picture). Basically, just be very sure that you want to delete something as it can have cascading effects, and, as noted earlier, THERE IS NO UNDO. You do, however, have the option to dump your data before modifying it, giving you the ability to recover. This is discussed in the POD documentation of cmap_admin.pl.
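To get a feel for how far a delete can reach, here is a rough Perl sketch that walks a made-up dependency table and lists everything a single delete would take with it. The object names and the %children structure are invented for the example; CMap itself works against the database tables, not a hash like this.

```perl
use strict;
use warnings;

# Hypothetical in-memory model: each object lists its dependent children.
my %children = (
    'map_set:1'        => [ 'map:1', 'map:2' ],
    'map:1'            => [ 'feature:1', 'feature:2' ],
    'map:2'            => ['feature:3'],
    'feature:1'        => ['correspondence:1'],
    'feature:3'        => ['correspondence:1'],
    'correspondence:1' => ['evidence:1'],
);

# Return every object that deleting $object would remove, itself included.
# %$seen prevents an object reachable by two paths from being listed twice.
sub cascade {
    my ( $object, $seen ) = @_;
    $seen ||= {};
    return () if $seen->{$object}++;
    return ( $object, map { cascade( $_, $seen ) } @{ $children{$object} || [] } );
}

print join( ', ', cascade('map:2') ), "\n";
# map:2, feature:3, correspondence:1, evidence:1
```

Note that the correspondence disappears even though its other feature lives on a different map, which is exactly the kind of side effect worth checking before you delete.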
Choose to ``view'' a map to see a summary of all the data in the map table as well as all the features on the map. Notice that you can restrict the features displayed by their feature types. You can also search for a feature; the search option on the map view page automatically restricts the search to the current map being displayed.
If you click on ``Edit'' while viewing a map, you will be taken to a page with the following fields:
Choose to ``view'' a feature from the map view page. You'll be presented with all the data stored in the feature table as well as the feature's aliases, correspondences to other features and the evidence types supporting the correspondences. You can also add a new correspondence by clicking ``Add Correspondence'' and following the directions discussed earlier in section 8.
If you click on the ``Edit'' link from the feature view page, you'll be presented a form with the following fields:
Often you will be interested in finding an individual feature in the database without having to navigate to the map set, then the map, then page through until you find the feature. From the home page of the web admin tool, you can choose the ``Search for a Feature'' link. The ``Feature Search'' form gives you four fields:
string(s)
either in feature's ``feature_name'' (and
aliases) or feature_acc.
All the HTML displayed by the application is contained in the templates. These templates are processed by Template Toolkit to produce the user interface. For the most part, the files contain straight-up HTML and can be altered to your heart's content. You could probably even pass off the care and feeding of these templates to a non-technical person (as this was the idea behind having no HTML in the code). The only functional parts of the templates lie in between the many ``[% %]'' tags, and these are often quite self-explanatory. If you can't figure out what to change on your own, then check out http://www.template-toolkit.com/ for the documentation (or type ``perldoc Template'' on your command line).
The cmap_admin.pl script contains full documentation in POD format. To read, please execute ``perldoc <script_name>''.
That will describe how to use cmap_admin.pl using command line flags for scripting.
However, here is a description of the menu options that cmap_admin.pl provides.
There are four layers of the cache. When one layer is purged, all of the layers after it are purged as well.
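The purge rule can be stated in a few lines of Perl. This is only a sketch of the rule: the layers are simply numbered 1 to 4 here as an assumption, and layers_to_purge is a hypothetical name, not a CMap function.

```perl
use strict;
use warnings;

# Assumption: the four cache layers are numbered 1 to 4.
my @LAYERS = ( 1 .. 4 );

# Purging a layer also purges every layer after it.
sub layers_to_purge {
    my ($start) = @_;
    die "layer must be between 1 and 4\n"
      unless grep { $_ == $start } @LAYERS;
    return grep { $_ >= $start } @LAYERS;
}

print join( ' ', layers_to_purge(2) ), "\n";    # 2 3 4
```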
An example file is supplied, data/sample_saved_links.xml. You will need to change the accessions (*_acc) to reflect your database even if you used the test data (since the test data doesn't specify accessions).
Here are the main elements of the XML file.
The cmap_view element contains many options. The only required option is a slot with a ``number'' of ``0''. All other options are useful, but the view will still load without them.
Each slot has either one or more map elements or a map_set element.
Fo
<menu_options>
    <ft_DEFAULT>1</ft_DEFAULT>
    <image_size>large</image_size>
</menu_options>
Note: If there are multiple CMap installations on the same machine, you may have to specify the --config_dir option to be sure that cmap_admin.pl is using the correct config directory.
cmap_reduce_cache_size.pl
cmap_reduce_cache_size.pl will help limit the growth of the query cache files.
This script cycles through each CMap data_source and (using the Cache::SizeAwareFileCache functionality) reduces the size of the query cache to the value given as 'max_query_cache_size' in the config file. It first removes any expired entries and then, if the cache is still over the limit, moves on to removing entries according to when they were last accessed. If 'max_query_cache_size' is not set, it will use the default value in Bio::GMOD::CMap::Constants. It is suggested that this script be run periodically as a cron job.
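The order of removal can be illustrated with a self-contained Perl sketch. It does not call Cache::SizeAwareFileCache at all; it only models the two steps on a made-up list of entries, and it assumes that ``last accessed'' means the entries accessed longest ago go first.

```perl
use strict;
use warnings;

# Hypothetical cache entries carry a size plus expiry and last-access times.
sub reduce_cache {
    my ( $entries, $max_size, $now ) = @_;

    # Step 1: drop anything that has already expired.
    my @live = grep { $_->{expires_at} > $now } @$entries;

    # Step 2: while still over the limit, drop the entry accessed longest ago.
    @live = sort { $a->{accessed_at} <=> $b->{accessed_at} } @live;
    my $total = 0;
    $total += $_->{size} for @live;
    while ( @live && $total > $max_size ) {
        my $victim = shift @live;
        $total -= $victim->{size};
    }
    return @live;
}

my @entries = (
    { key => 'q1', size => 40, expires_at => 50,  accessed_at => 10 },
    { key => 'q2', size => 40, expires_at => 500, accessed_at => 20 },
    { key => 'q3', size => 40, expires_at => 500, accessed_at => 30 },
);

my @kept = reduce_cache( \@entries, 50, 100 );
print join( ',', map { $_->{key} } @kept ), "\n";    # q3
```

Here q1 goes because it has expired, and q2 goes because the remaining 80 units still exceed the limit of 50.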
cmap_validate_config.pl
This script will test a config file to determine if it is valid or not. It currently only tests the individual data_source files and not the global.conf.
Run it on individual config files.
$ cmap_validate_config.pl config_file.conf
The last line of the output is most important.
The config file, config_file.conf is valid.
or
The config file, config_file.conf is INVALID.
You can read through the output to find out why it is invalid, if that is the case.
The script will also output options that are missing but not required to let you know what other options you can add. It will also tell you if you are using any deprecated options.
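If you want to check many config files from a script, one approach is to look only at that last line. The Perl sketch below assumes the final line has exactly the wording shown above; config_is_valid is a hypothetical helper, and every sample output line other than the final one is invented for the example.

```perl
use strict;
use warnings;

# Decide validity from the final non-empty line of the validator's output.
sub config_is_valid {
    my ($output) = @_;
    my @lines = grep { /\S/ } split /\n/, $output;
    return 0 unless @lines;
    # Case-sensitive on purpose: "is INVALID." must not match.
    return $lines[-1] =~ /\bis valid\.\s*$/ ? 1 : 0;
}

# Invented sample output; only the last line follows the documented wording.
my $ok  = "Checking options...\nThe config file, config_file.conf is valid.\n";
my $bad = "Something is missing\nThe config file, config_file.conf is INVALID.\n";

print config_is_valid($ok),  "\n";    # 1
print config_is_valid($bad), "\n";    # 0
```

In practice you would capture the validator's real output (for instance with backticks) and pass it to the helper.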
The cmap_matrix_compare.pl script contains full documentation in POD format. To read, please execute ``perldoc <script_name>''.
Most likely, you'll want to link directly into the CMap viewer from some other part of your site.
To link to just one map, make it the ``reference'' map by using the accession IDs for the map itself and, optionally, the map's parent ``set''. Here's an example showing just one map on Gramene, showing all the feature labels and highlighting the feature ``RM9'':
http://www.gramene.org/db/cmap/map_details?ref_map_set_aid=cu-dh-2001;ref_map_aids=cu-dh-2001-1;highlight="RM9";data_source=Gramene;label_features=all
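A link like the one above can be assembled programmatically. The following Perl sketch uses a hypothetical cmap_url helper that joins the parameters with semicolons in the order given; it applies no URI escaping, which is a simplification you may need to revisit for values containing special characters.

```perl
use strict;
use warnings;

# Join name/value pairs with ";" in the order given, as in the example URL.
# No escaping is applied here; that is a deliberate simplification.
sub cmap_url {
    my ( $base, @pairs ) = @_;
    my @parts;
    while ( my ( $name, $value ) = splice( @pairs, 0, 2 ) ) {
        push @parts, "$name=$value";
    }
    return $base . '?' . join( ';', @parts );
}

print cmap_url(
    'http://www.gramene.org/db/cmap/map_details',
    ref_map_set_aid => 'cu-dh-2001',
    ref_map_aids    => 'cu-dh-2001-1',
    highlight       => '"RM9"',
    data_source     => 'Gramene',
    label_features  => 'all',
), "\n";
```

Run as is, this prints the same Gramene URL shown in the example.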
The following should help you create your own CMap URL:
/cgi-bin/cmap/viewer?
Information about the map such as start and stop can be stored in this argument. See Map Accession Information Format for more details.
Information about the map such as start and stop can be stored in this argument. See Map Accession Information Format for more details.
Values:
    0: don't aggregate
    1: one line
    2: two lines
Values:
    0 : Completely ignore features of this type.
    1 : Display only features of this type with displayed correspondences
    2 : Display all features of this type
Note: If both the ft_* format and the feature_type_* format are used to define the same feature type, CMap will appear to decide at random which one to use. So please be consistent and use ft_*.
Values:
    0 : render all of the area boxes
    1 : omit the feature area boxes
    2 : omits all area boxes
Values:
    0 : Completely ignore features of this type.
    1 : Display only features of this type with displayed correspondences
    2 : Display all features of this type
Note: If both the ft_* format and the feature_type_* format are used to define the same feature type, CMap will appear to decide at random which one to use. So please be consistent and use ft_*.
You can use this to ignore or display certain evidence types. You can also filter evidence based on the score. If you do this, there MUST be an ets_* value for this evidence type accession.
If a correspondence has multiple evidences of different types and ANY are displayed, that correspondence will be displayed.
Values:
    0 : Completely ignore this evidence type
    1 : Display all evidences of this type
    2 : Display if score is less than or equal to the ets_* score
    3 : Display if score is greater than or equal to the ets_* score
Note: If both the et_* format and the evidence_type_* format are used to define the same evidence type, CMap will appear to decide at random which one to use. So please be consistent and use et_*.
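The four evidence values and the ``ANY evidence'' rule can be expressed as a short Perl sketch. This is an illustration of the documented rules, not CMap's own code: the function names, the type accessions ``blast'' and ``name,'' and the choice to ignore evidence types with no et_* value are all assumptions.

```perl
use strict;
use warnings;

# $mode is the et_* value (0 to 3); $threshold is the matching ets_* score.
sub evidence_is_displayed {
    my ( $mode, $score, $threshold ) = @_;
    return 0 if $mode == 0;
    return 1 if $mode == 1;
    die "an ets_* score is required for value $mode\n"
      unless defined $threshold;
    return $score <= $threshold ? 1 : 0 if $mode == 2;
    return $score >= $threshold ? 1 : 0 if $mode == 3;
    die "unknown value $mode\n";
}

# A correspondence is shown if ANY of its evidences is shown.
sub correspondence_is_displayed {
    my ( $evidences, $modes, $thresholds ) = @_;
    for my $ev (@$evidences) {
        my $acc  = $ev->{type_acc};
        my $mode = $modes->{$acc} // 0;    # assumption: unlisted types are ignored
        return 1
          if evidence_is_displayed( $mode, $ev->{score}, $thresholds->{$acc} );
    }
    return 0;
}

my @evidences = (
    { type_acc => 'blast', score => 1e-20 },
    { type_acc => 'name',  score => 0 },
);
my %modes      = ( blast => 2, name => 0 );    # stand-ins for et_* values
my %thresholds = ( blast => 1e-10 );           # stand-ins for ets_* values

print correspondence_is_displayed( \@evidences, \%modes, \%thresholds ), "\n";    # 1
```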
Any individual map magnification is then added on top of this.
Values:
    0: don't scale
    1: scale
Values:
    0 : Lines connect features
    1 : Lines connect maps
These arguments allow you to hide or show individual menus. Set to 1 to display and 0 to hide. They are hidden by default.
These will probably not be used in a constructed URL, but they are described here in case you run across them and wonder what they do. Also, it's good to document them anyway.
Example: ``1=map_A:-1=map_D:0=map_C:1=map_B''
If the accession is defined as ``-1'', then all of the eligible maps in the map set are used.
See Map Accession Information Format for details on the format.
To fix this, modified_ref_map and modified_comp_map were added. When a JavaScript, area-box button is clicked, it modifies modified_ref_map or modified_comp_map instead of ref_map_aids or comparative_maps. That way, those fields always keep their original values.
To determine if the ``modified'' values should be used, CMap looks at the value of sub. If there is NO value, then it assumes that it is a JavaScript link and uses the ``modified'' values preferentially over the regular values.
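The selection rule for the reference map can be sketched in Perl as follows. This is a guess at the logic based only on the description above; effective_ref_maps is a hypothetical name, %param stands in for the parsed URL parameters, and the ``Draw'' value for sub is made up.

```perl
use strict;
use warnings;

# Return the reference-map value the viewer would act on.
sub effective_ref_maps {
    my (%param) = @_;
    my $sub      = $param{'sub'};
    my $modified = $param{'modified_ref_map'};

    # No value for "sub" is taken to mean the request came from a JavaScript link.
    my $from_javascript = !( defined $sub && length $sub );

    return $modified
      if $from_javascript && defined $modified && length $modified;
    return $param{'ref_map_aids'};
}

print effective_ref_maps(
    ref_map_aids     => 'map1',
    modified_ref_map => 'map1[10*50]',
), "\n";    # map1[10*50]

print effective_ref_maps(
    'sub'            => 'Draw',          # hypothetical button value
    ref_map_aids     => 'map1',
    modified_ref_map => 'map1[10*50]',
), "\n";    # map1
```

The same pattern would apply to modified_comp_map and comparative_maps.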
Information about the map such as start and stop can be stored in this argument. See Map Accession Information Format for more details.
Information about the map such as start and stop can be stored in this argument. See Map Accession Information Format for more details.
Recently, the Map Details page has been folded into the regular viewer.
The URI of the Map Details page is ``/cgi-bin/cmap/map_details?''
Values: feature_name, feature_type_aid, start_position
Format: map_set_aid=set1 map_aid=map1
If you are using these in a URL, please change the URL. They are not guaranteed to work.
Drawing information about maps (cropping and magnification) can be inserted with the map accession in the following fields.
The format is ``accession[start*stopxmagnification]''. Any position can be left out but the ``*'' is required. The ``x'' can be left out if there is no magnification.
Examples:
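As a rough illustration of the format, here is a Perl sketch that builds and parses such strings. The function names and the accession ``map1'' are hypothetical, and the sketch assumes that a bare accession with no brackets is what you use when there is no drawing information at all.

```perl
use strict;
use warnings;

# Build "accession[start*stopxmagnification]"; undefined parts are left out.
sub format_map_acc {
    my ( $acc, $start, $stop, $mag ) = @_;
    return $acc unless defined $start || defined $stop || defined $mag;
    my $info = ( $start // '' ) . '*' . ( $stop // '' );    # "*" is required
    $info .= "x$mag" if defined $mag;
    return $acc . '[' . $info . ']';
}

# Split such a string back into its parts; missing parts come back undefined.
sub parse_map_acc {
    my ($text) = @_;
    if ( $text =~ /^(.+)\[([^*\]]*)\*([^x\]]*)(?:x([^\]]+))?\]$/ ) {
        return {
            acc   => $1,
            start => ( length($2) ? $2 : undef ),
            stop  => ( length($3) ? $3 : undef ),
            mag   => $4,
        };
    }
    return { acc => $text, start => undef, stop => undef, mag => undef };
}

print format_map_acc( 'map1', 10, 50, 2 ), "\n";    # map1[10*50x2]
print format_map_acc( 'map1', undef, 50 ), "\n";    # map1[*50]
```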
To include some number of comparative maps, you provide them in the ``comparative_maps'' argument, which is a single structured string that lists all the comparative maps and their placement relative to the reference map. Here I'd like to introduce the concept of ``slots,'' where the maps (or map sets) fall into a slot moving in positive and negative direction away from the reference map, which is in slot ``0,'' like so:
[Diagram: four vertical maps drawn side by side, one in each slot]
     -1        0        1        2
   <----negative --+---------positive----->
The above drawing is representative of Gramene's genetic maps in slots -1, 0, and 2, and a physical map in slot 1. Slots are separated in the string by URI-escaped colons, and the integral parts of each slot by URI-escaped equal signs, like so:
"comparative_maps=" +
     .- <slot_number> +
     |  "%3D" +
slot |  <"map_acc" or "map_set_acc"> +
     |  "%3D" +
     `- <map_acc or map_set_acc> +
"%3A"
+
<next slot>
Here is a sample that puts ``Rice-Cornell RFLP 2001-1'' on the left (slot ``-1'') and ``Rice-CTIR 2000-1'' on the right (slot ``1''):
comparative_maps=1%3Dmap_acc%3D423%3A-1%3Dmap_acc%3D1
The middle part of the slot is one of the literal strings ``map_set_acc'' or ``map_acc.'' If you wanted to display just a single map in a slot, use the string ``map_acc'' and the map's accession ID. If you want a whole map set in a slot (e.g., a physical map set like ``I-Map''), then use the string ``map_set_acc'' and the map set's accession ID. You can use the admin interface (``/cgi-bin/cmap/admin'') to easily find the accession IDs.
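Putting the pieces together, the Perl sketch below assembles a ``comparative_maps'' argument from a list of slots. The comparative_maps_arg helper is hypothetical; it writes the escaped separators literally and does not escape the accessions themselves, so it assumes accession IDs that are already URL-safe.

```perl
use strict;
use warnings;

# Each slot is [ slot_number, field, accession ], where field is the literal
# string "map_acc" or "map_set_acc".
sub comparative_maps_arg {
    my @slots = @_;
    my @parts;
    for my $slot (@slots) {
        my ( $number, $field, $acc ) = @$slot;
        die "field must be map_acc or map_set_acc\n"
          unless $field eq 'map_acc' || $field eq 'map_set_acc';
        # %3D is the escaped "=" joining the three parts of a slot.
        push @parts, join( '%3D', $number, $field, $acc );
    }
    # %3A is the escaped ":" separating one slot from the next.
    return 'comparative_maps=' . join( '%3A', @parts );
}

print comparative_maps_arg( [ 1, 'map_acc', 423 ], [ -1, 'map_acc', 1 ] ), "\n";
# comparative_maps=1%3Dmap_acc%3D423%3A-1%3Dmap_acc%3D1
```

The printed string is the same as the sample given earlier for slots ``1'' and ``-1.''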
There are several CMap modules that can be used by scripts outside of the cgi. To view more documentation on module use, execute ``perldoc Module_Name.pm''.
Modules that are particularly useful:
To use Generic.pm, you must first create an instance of another CMap object. You can use Bio::GMOD::CMap or Bio::GMOD::CMap::Admin or just about any other module. Most of the Generic.pm methods require you to pass them this CMap object (as cmap_object) to give them access to the configuration options.
To get access to the Generic.pm object, call ``sql()'' on the CMap object you created. Here is an example:
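use Bio::GMOD::CMap::Admin;

# Any CMap object will do; here an Admin object supplies the configuration.
my $admin = Bio::GMOD::CMap::Admin->new( data_source => $data_source, );

# sql() returns the Generic.pm data-access object.
my $sql_object = $admin->sql();

my $features = $sql_object->get_features(
    cmap_object       => $admin,
    feature_type_accs => \@feature_type_accs,
    map_id            => $map->{'map_id'},
);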
The data/ directory has some example data. The sample-dump.sql.gz file contains a full set of CMap data. The tabtest* files also give some simple example files for importing.
If you would like to see other sample import data for CMap, you can find many examples from the Gramene site:
ftp://ftp.gramene.org/pub/gramene/CURRENT_RELEASE
There may be occasion to run multiple CMap installations on the same server. To do so, use the customizable install options during the ``perl Build.PL''. The following is an example that creates a secondary install in the /usr/local/apache2/htdocs/cmap2/ directory.
perl Build.PL \
    PREFIX=/usr/local/apache2/ \
    CONF=/usr/local/apache2/conf/cmap.conf2 \
    WEB_DOCUMENT_ROOT=/usr/local/apache2/htdocs/ \
    CGIBIN=/usr/local/apache2/htdocs/cmap2/cgi-bin \
    HTDOCS=/usr/local/apache2/htdocs/cmap2/
Here is the breakdown of the options:
NOTE: It is important that the cgi script be executable by the web server. To do this in Apache 2, the following must be added to the httpd.conf.
<Directory /usr/local/apache2/htdocs/cmap2/cgi-bin>
    Options +ExecCGI
    SetHandler cgi-script
</Directory>
The options not used in the example are ``TEMPLATE'', ``CACHE'' and ``SESSIONS''. These can also be set if so desired.
To access the CMap from the above example, the address will be ``http://127.0.0.1/cmap2/''. The cmap script will be at ``http://127.0.0.1/cmap2/cgi-bin/cmap''.
IMPORTANT NOTE: When creating the individual config files, the ``name'' field in ``<database>'' should be unique across the whole server. Using the same ``name'' for multiple config files, even if they are in different installs, can result in clashes over shared cache space.
OTHER NOTES:
In the ``docs'' directory, you will find two images that might help you understand the relationships in the CMap tables, ``cmap-schema-graph.png'' and ``cmap-schema.png,'' both of which attempt to describe the tables and fields and their relationships. See also ``cmap-schema-desc.html'' for an HTML document describing the schema. These documents were automatically generated from schema definitions via scripts included with SQL::Translator, a set of modules which grew out of the author's constant need to make schema changes and quickly replicate them amongst the different test databases. The Oracle, PostgreSQL, SQLite and Sybase schemas were also generated by SQL::Translator from the MySQL schema, so if you see room for improvement, please relay your suggestions to the author, preferably via the SQL::Translator mailing list. SQL::Translator is available on CPAN. For more information, see here:
http://sqlfairy.sourceforge.net/
Here are the solutions to a few common problems:
To purge the cache, use cmap/bin/cmap_admin.pl. You can use either the menu system or the command line:
$ cmap_admin.pl [-d data_source] --action purge_query_cache
See ``perldoc cmap_admin.pl'' for more options.
You can also leave a bug report for CMap at the SourceForge site for GMOD, http://sourceforge.net/projects/gmod/.
Ken Y. Clark, kclark@cshl.edu Ben Faga, faga@cshl.edu
Copyright (c) 2002-6 Cold Spring Harbor Laboratory