CMap Code Overview
$Revision: 1.10 $
CMap is a CGI application for viewing comparative and genetic maps. Written entirely in Perl, this application will run on many different operating systems and relational database management systems (RDBMS), including Oracle, MySQL, Sybase and PostgreSQL. CMap can create images using ``libgd'' for standard image formats like PNG and JPEG as well as creating SVG (Scalable Vector Graphics). The code was originally written for the Gramene project (http://www.gramene.org/), a comparative mapping resource for crop grasses, but much has been done to make the application generic enough to be used with many different types of data.
The code is separated into functionally distinct modules that roughly correspond to a traditional ``three-tiered'' structure: the data layer, the logic layer, and the presentation layer. You'll find all the database interaction encapsulated in the Bio::GMOD::CMap::Data* modules, all the ``logic'' (the code that lays out the map components) in the Bio::GMOD::CMap::Drawer* modules, and all the HTML generation in the Bio::GMOD::CMap::Apache* modules.
As stated above, all the database interaction happens in the Bio::GMOD::CMap::Data* modules. One goal of this project has always been compatibility with multiple RDBMSs (perhaps if only from necessity, as the system was developed using MySQL but is deployed on Oracle). As a consequence, all the SQL will be placed (eventually) into object-oriented modules where the statements can be sub-classed and modified to run with a particular database without affecting any other SQL.
The Bio::GMOD::CMap::Data module has as a component an ``SQL'' object, with the choices right now confined to Bio::GMOD::CMap::Data::[Generic|MySQL|Oracle]. The ``Generic'' module is the superclass of the other two (and conceivably any others, such as classes for PostgreSQL, Sybase, etc.). All SQL statement methods are defined in the Generic module, and any that don't work for a particular RDBMS can be overridden in a subclass. This also allows users of other systems to create their own modules and drop them into place with very little effort. All that need happen is to subclass Bio::GMOD::CMap::Data::Generic (as noted in the perldocs), and then add a line to the Bio::GMOD::CMap::Constants to point to the new module.
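The override pattern described above can be sketched as follows. This is a minimal, self-contained illustration, not the actual CMap code: the package names (My::SQL::Generic, My::SQL::Oracle) and the method names are hypothetical stand-ins, and a real subclass would inherit from Bio::GMOD::CMap::Data::Generic as noted in the perldocs.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for the generic SQL superclass.
package My::SQL::Generic;

sub new { my $class = shift; return bless {}, $class }

# A statement written in portable SQL; subclasses inherit it as-is.
sub feature_by_name_sql {
    return 'select feature_start from cmap_feature where upper(feature_name)=?';
}

# A statement whose syntax differs between databases.
sub current_date_sql {
    return 'select current_date';
}

# Hypothetical subclass for one RDBMS: override only what differs.
package My::SQL::Oracle;
our @ISA = ('My::SQL::Generic');

sub current_date_sql {
    return 'select sysdate from dual';
}

package main;

my $sql = My::SQL::Oracle->new;
print $sql->feature_by_name_sql, "\n";    # inherited unchanged
print $sql->current_date_sql,    "\n";    # overridden for Oracle
```

Because callers only ever ask the SQL object for a statement by method name, swapping in a different subclass changes the SQL without touching the calling code.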
All the modules that actually do something toward laying out the comparative maps live in the Bio::GMOD::CMap::Drawer* namespace. The top level, ``Drawer.pm,'' is basically the coordinator of the objects it manipulates. The Drawer creates a ``Map'' object for each map (or map set) that the user has requested. It asks each Map to lay itself out, then it adjusts the frame, and writes the image to a file. It then is able to tell the calling object the filename of the image and its height and width.
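The coordinating role of the Drawer can be illustrated with a toy sketch. Everything here is hypothetical (the Sketch::Map and Sketch::Drawer packages, their methods, and the idea that a map's layout is just a width and a length in pixels); it only mirrors the flow described above, in which each map lays itself out, the drawer sizes the frame, and the caller reads back the file name and dimensions. No image is actually written.

```perl
use strict;
use warnings;

# Hypothetical map object that knows how much room it needs.
package Sketch::Map;

sub new {
    my ( $class, %args ) = @_;
    return bless { %args }, $class;
}

# "Lay out" the map: here this just reports the space required.
sub layout {
    my $self = shift;
    return ( $self->{width}, $self->{length} );
}

# Hypothetical coordinator that owns the maps and the frame.
package Sketch::Drawer;

sub new {
    my ( $class, %args ) = @_;
    my @maps;
    for my $spec ( @{ $args{maps} } ) {
        push @maps, Sketch::Map->new( %$spec );
    }
    return bless { maps => \@maps, file => $args{file}, width => 0, height => 0 }, $class;
}

sub draw {
    my $self = shift;
    my ( $width, $height ) = ( 0, 0 );
    for my $map ( @{ $self->{maps} } ) {
        my ( $w, $h ) = $map->layout;
        $width += $w;                     # maps sit side by side
        $height = $h if $h > $height;     # frame is as tall as the tallest map
    }
    @{$self}{qw( width height )} = ( $width, $height );
    # A real drawer would write the image to $self->{file} at this point.
    return $self;
}

sub image_name { return $_[0]{file} }
sub width      { return $_[0]{width} }
sub height     { return $_[0]{height} }

package main;

my $drawer = Sketch::Drawer->new(
    file => 'maps.png',
    maps => [ { width => 40, length => 300 }, { width => 60, length => 200 } ],
)->draw;

printf "%s is %d x %d\n", $drawer->image_name, $drawer->width, $drawer->height;
```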
Eventually other modules should fall within this classification, especially the module for administrative functions such as creating and editing map sets, maps, features, correspondences, etc. All of those functions are currently spread around in the Bio::GMOD::CMap::Admin and Bio::GMOD::CMap::Apache::AdminViewer modules and the cmap_admin.pl script. Eventually I hope to move all the logic into Bio::GMOD::CMap::Admin and have the web and command-line interfaces simply invoke methods on this Admin object.
The modules in the Bio::GMOD::CMap::Apache namespace are responsible for actually displaying the maps through a web interface. All of the modules are basic Perl classes and are objects inheriting from the Bio::GMOD::CMap::Apache superclass. This superclass creates the Template Toolkit object, the ``page'' object (see perldocs), and handles any errors thrown by the derived classes, reducing the amount of code to create a new handler.
You'll notice that there is no HTML mixed with Perl code, as all the web pages are generated with the Template Toolkit Perl module (http://www.template-toolkit.com/) written by Andy Wardley. Template Toolkit is a powerful and freely available Perl templating system, and the hope is that by using it, non-technical people who want to tweak the HTML can do so without interfering with the code.
The Bio::GMOD::CMap::Config module handles reading and parsing the config files.
All local configuration of CMap should be done through the ``cmap.conf'' directory. Of course, the directory doesn't have to be called ``cmap.conf.'' It can be called whatever you like, so long as the absolute path to the directory is in the Bio::GMOD::CMap::Constants file. This path is automatically written during installation if you do the standard ``perl Build.PL; ./Build; ./Build install'' process.
There are now two types of config files. The ``global.conf'' file holds information that all the data sources need, such as the default data source. There is also one config file for each data source, which handles most of the configurable options.
Defaults are provided for almost every option in the local config file, with the exception of the database connection info and the template and image cache directories. The latter two should be set during installation, and the first should be set by the installer after installation (they are prompted to do this after ``./Build install''). If you comment out any of the options in ``cmap.conf'' (except ``database,'' ``template_dir'' and ``cache_dir''), the defaults in the Bio::GMOD::CMap::Constants file are used.
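The fallback behavior can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in Bio::GMOD::CMap::Config: the option names ``image_type'' and ``max_features'' and their values are made up for the example, and resolve_config is a hypothetical helper. Only ``database,'' ``template_dir'' and ``cache_dir'' come from the text above.

```perl
use strict;
use warnings;

# Hypothetical defaults; the real ones live in Bio::GMOD::CMap::Constants.
my %DEFAULT = (
    image_type   => 'png',
    max_features => 200,
);

# Options that have no default and must come from the config file.
my @REQUIRED = qw( database template_dir cache_dir );

# Merge the values read from a config file over the defaults.
sub resolve_config {
    my %conf = @_;
    my @missing = grep { !defined $conf{$_} } @REQUIRED;
    die 'Missing required options: ' . join( ', ', @missing ) . "\n" if @missing;
    return ( %DEFAULT, %conf );    # later pairs win, so file values override defaults
}

my %config = resolve_config(
    database     => 'dbi:mysql:CMAP',
    template_dir => '/tmp/templates',
    cache_dir    => '/tmp/cache',
    max_features => 500,           # overrides the default
);

print "$config{image_type} $config{max_features}\n";
```

An option that is commented out in the file is simply absent from the hash, so the default value shows through; a missing required option stops the program instead.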
Feature, map and evidence types are now defined and controlled in the data source config files.
The web presentation modules are all located under the Bio::GMOD::CMap::Apache namespace and are instantiated as objects. In order to understand how they are invoked, I will describe how the main map viewer (Bio::GMOD::CMap::Apache::MapViewer) works.
map(s)
(features, relationships, map titles, etc.), lay out all the map
elements, and finally write the image out to a temporary file.
If the image is created properly, then the drawer passes back to the
map viewer handler the name of the map image and the coordinates on
the map of elements (to make the image clickable) that the browser
needs to display the map.
If all goes well, then the handler uses Template Toolkit to format the
HTML so that the user sees a map that they can click on to see other
things.
The above scenario is probably the most involved process in the comparative maps, but it shows the way that distinct pieces of the problem are split into specialized modules and objects.
The tables used by the comparative maps follow a fairly rigid naming convention so that they should be able to integrate easily with existing databases.
select map.map_name, f.feature_start from cmap_map map, cmap_feature f where upper(f.feature_name)=? and f.map_id=map.map_id
In the ``docs'' directory, you will find a schema diagram illustrating the structure and relationships of the tables. In the ``sql'' directory, you will find create statements for the tables for MySQL, Oracle and PostgreSQL. Following is a general description of each table, what kind of data it is supposed to hold, and how it fits in with the others. Most of the fields in the tables are described more fully in the ADMINISTRATION document when discussing the forms presented in the web admin tool.
Maps are of some type determined by the data curator. It is useful to researchers to know that a map is a ``genetic'' map and that the distances given are in ``centiMorgans'' as opposed to a ``physical'' map where the distances are ``bands.'' More importantly, though, the curator can decide how different maps are drawn so as to make them visually distinctive. By selecting from shapes, colors and widths, the curator has great discretion over the presentation of maps. Additionally, the curator can determine if the maps can stand alone (like most genetic maps) or if they are a special kind of map set that CMap calls ``relational,'' meaning that the maps can only be shown *in relation to* some other map (of the first, ``non-relational'' kind, no less). Examples of this include physical fingerprint contig maps (FPC), QTLs, and haplotypes. These maps are usually composed of many smaller fragments that have some correspondences to other maps. As such, they are just drawn to a size relative to the distance their correspondences cover on the reference map.
This is a ``permanent temp'' table used only to speed up the web queries. It is used to remember which maps occur in each of the ``slots'' while collating the data used to draw the maps for each request. The requests are identified by the PID (process ID) of the Apache child serving the user, and the requests *should* be deleted after the request has finished. Records have a timestamp that indicates the age of the request. If you see records in this table that are older than even a few minutes, it should be safe to delete them.
Each feature is of some type. What particular types of features exist are completely determined by the data curator. The data curator decides how to draw the feature on the map, selecting from pre-determined shapes and colors.
These are the types of evidence that we can use in the previous table. For instance, we can say two features are related because they have the same name or the same sequence. Some evidences are stronger than others, so they can be ranked accordingly.
If you'd like to write a script to access the database directly, you can get a handle to the database quite easily using the Bio::GMOD::CMap modules. Here's an example:
#!/usr/bin/perl
use strict;
use Bio::GMOD::CMap;

my $cmap = Bio::GMOD::CMap->new or die Bio::GMOD::CMap->error;
# optional, only if you have multiple data sources defined
# $cmap->data_source('Foo');
my $db = $cmap->db or die $cmap->error;
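Once you have a handle, a query such as the feature lookup shown earlier can be run against it. The sketch below assumes the handle behaves like an ordinary DBI database handle (so that selectall_arrayref is available); the feature_positions helper and the feature name in the usage comment are hypothetical.

```perl
use strict;
use warnings;

# Look up the maps and start positions at which a named feature occurs.
# $dbh is assumed to be an ordinary DBI database handle.
sub feature_positions {
    my ( $dbh, $feature_name ) = @_;

    my $sql = q[
        select map.map_name, f.feature_start
        from   cmap_map map, cmap_feature f
        where  upper(f.feature_name)=?
        and    f.map_id=map.map_id
    ];

    # Upper-case the search term so it matches upper(f.feature_name).
    # Returns a reference to an array of [ map_name, feature_start ] rows.
    return $dbh->selectall_arrayref( $sql, {}, uc $feature_name );
}

# Usage, given the $db handle from the example above:
#   my $rows = feature_positions( $db, 'some_feature' );
#   print "$_->[0]: $_->[1]\n" for @$rows;
```

Passing the name as a bind value rather than interpolating it into the statement keeps the query safe from quoting problems.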
Ken Y. Clark, kclark@cshl.edu Ben Faga, faga@cshl.edu
Copyright (c) 2002-5 Cold Spring Harbor Laboratory