TITLE
VERSION
ARCHITECTURE OVERVIEW
DATA MODULES
LOGIC MODULES
PRESENTATION MODULES
CONFIGURATION MODULES
GENERAL FLOW FOR HANDLERS
SQL CONVENTIONS
TABLE DESCRIPTIONS
TIPS
AUTHOR

TITLE

CMap Code Overview

VERSION

$Revision: 1.10 $

CMap is a CGI application for viewing comparative and genetic maps. Written entirely in Perl, this application will run on many different operating systems and relational database management systems (RDBMS), including Oracle, MySQL, Sybase and PostgreSQL. CMap can create images using ``libgd'' for standard image formats like PNG and JPEG as well as creating SVG (Scalable Vector Graphics). The code was originally written for the Gramene project (http://www.gramene.org/), a comparative mapping resource for crop grasses, but much has been done to make the application generic enough to be used with many different types of data.

ARCHITECTURE OVERVIEW

Functionally different parts of the code are separated into different modules, roughly corresponding to a traditional ``three-tiered'' structure of data, logic, and presentation layers. You'll find all the database interaction encapsulated in the Bio::GMOD::CMap::Data* modules, all the ``logic'' (the code that lays out the map components) in the Bio::GMOD::CMap::Drawer* modules, and all the HTML generation in the Bio::GMOD::CMap::Apache* modules.

DATA MODULES

As stated above, all the database interaction happens in the Bio::GMOD::CMap::Data* modules. One goal of this project has always been compatibility with multiple RDBMSs (if only from necessity, as the system was developed using MySQL but is deployed on Oracle). As a consequence, all the SQL will eventually be placed into object-oriented modules where the statements can be sub-classed and modified to run with a particular database without affecting any other SQL.

The Bio::GMOD::CMap::Data module has as a component an ``SQL'' object, with the choices right now confined to Bio::GMOD::CMap::Data::[Generic|MySQL|Oracle]. The ``Generic'' module is the superclass of the other two (and conceivably of any others, such as classes for PostgreSQL, Sybase, etc.). All SQL statement methods are defined in the Generic module, and any that don't work for a particular RDBMS can be overridden in a subclass. This also allows users of other systems to create their own modules and drop them into place with very little effort. All that is needed is to subclass Bio::GMOD::CMap::Data::Generic (as noted in the perldocs), and then add a line to Bio::GMOD::CMap::Constants that points to the new module.
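
The override pattern described above can be sketched roughly as follows. This is only an illustration, not CMap's actual code: the package names ``My::SQL::Generic'' and ``My::SQL::Oracle'' and the method names ``feature_search_sql'' and ``species_sql'' are hypothetical stand-ins, and the statements are assumptions chosen to show one RDBMS-specific difference.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for the generic SQL superclass.
package My::SQL::Generic;

sub new { my $class = shift; return bless {}, $class }

# Default statement. "limit" works on MySQL and PostgreSQL,
# but not on older Oracle releases, so a subclass may replace it.
sub feature_search_sql {
    return q[
        select f.feature_id, f.feature_name
        from   cmap_feature f
        where  upper(f.feature_name) like ?
        limit  100
    ];
}

# A statement that needs no RDBMS-specific changes.
sub species_sql {
    return q[select species_id from cmap_species];
}

# Hypothetical Oracle subclass: overrides only what differs.
package My::SQL::Oracle;
use parent -norequire, 'My::SQL::Generic';

sub feature_search_sql {
    return q[
        select f.feature_id, f.feature_name
        from   cmap_feature f
        where  upper(f.feature_name) like ?
        and    rownum <= 100
    ];
}

package main;

# The caller never needs to know which class it holds.
for my $class (qw(My::SQL::Generic My::SQL::Oracle)) {
    my $sql = $class->new;
    print "$class:", $sql->feature_search_sql, "\n";
}
```

Because the subclass inherits every method it does not redefine, only the statements that actually differ between databases need to be written twice.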

LOGIC MODULES

All the modules that actually do something toward laying out the comparative maps live in the Bio::GMOD::CMap::Drawer* namespace. The top level, ``Drawer.pm,'' is basically the coordinator of the objects it manipulates. The Drawer creates a ``Map'' object for each map (or map set) that the user has requested. It asks each Map to lay itself out, then it adjusts the frame, and writes the image to a file. It then is able to tell the calling object the filename of the image and its height and width.
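
The coordination described above might look roughly like the following sketch. It is not the real Drawer: the classes ``Sketch::Drawer'' and ``Sketch::Map'' and all of their methods and fields are hypothetical, the layout is a simple left-to-right placement, and the step that writes the image to a file is left out.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical map object that knows how to lay itself out.
package Sketch::Map;

sub new {
    my ( $class, %args ) = @_;
    return bless {%args}, $class;
}

# Place the map at the given x offset and report the box it occupies.
sub layout {
    my ( $self, $x ) = @_;
    return {
        name   => $self->{name},
        x      => $x,
        width  => $self->{width},
        height => $self->{height},
    };
}

# Hypothetical coordinator: creates one map object per requested map.
package Sketch::Drawer;

sub new {
    my ( $class, %args ) = @_;
    my $self = bless { padding => 10, maps => [] }, $class;
    push @{ $self->{maps} }, Sketch::Map->new(%$_) for @{ $args{maps} || [] };
    return $self;
}

# Ask each map to lay itself out, then size the frame around all of them.
sub draw {
    my $self = shift;
    my $pad  = $self->{padding};
    my ( $x, $max_height ) = ( $pad, 0 );
    my @boxes;
    for my $map ( @{ $self->{maps} } ) {
        my $box = $map->layout($x);
        push @boxes, $box;
        $x += $box->{width} + $pad;
        $max_height = $box->{height} if $box->{height} > $max_height;
    }
    $self->{boxes}  = \@boxes;
    $self->{width}  = $x;
    $self->{height} = $max_height + 2 * $pad;
    return $self;
}

# What the calling object can ask for afterwards.
sub image_width  { return $_[0]{width} }
sub image_height { return $_[0]{height} }

package main;

my $drawer = Sketch::Drawer->new(
    maps => [
        { name => 'Map A', width => 20, height => 200 },
        { name => 'Map B', width => 30, height => 150 },
    ],
)->draw;

printf "frame: %d x %d\n", $drawer->image_width, $drawer->image_height;
```

The point of the split is that the coordinator only handles spacing and the frame, while each map object stays responsible for its own layout.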

Eventually other modules should fall within this classification, especially the module for administrative functions such as creating and editing map sets, maps, features, correspondences, etc. All of those functions are currently spread around in the Bio::GMOD::CMap::Admin and Bio::GMOD::CMap::Apache::AdminViewer modules and the cmap_admin.pl script. Eventually I hope to move all the logic into Bio::GMOD::CMap::Admin and have the web and command-line interfaces simply invoke methods on this Admin object.

PRESENTATION MODULES

The modules in the Bio::GMOD::CMap::Apache namespace are responsible for actually displaying the maps through a web interface. All of the modules are basic Perl classes that inherit from the Bio::GMOD::CMap::Apache superclass. This superclass creates the Template Toolkit object and the ``page'' object (see perldocs), and handles any errors thrown by the derived classes, reducing the amount of code needed to create a new handler.

You'll notice that there is no HTML mixed with Perl code, as all the web pages are generated with the Template Toolkit Perl module (http://www.template-toolkit.com/) written by Andy Wardley. Template Toolkit is a powerful and freely available Perl templating system, and the hope is that by using it, non-technical people who want to tweak the HTML can do so without interfering with the code.

CONFIGURATION MODULES

The Bio::GMOD::CMap::Config module handles reading and parsing the config files.

All local configuration of CMap should be done through the ``cmap.conf'' directory. Of course, the directory doesn't have to be called ``cmap.conf.'' It can be called whatever you like, so long as the absolute path to the directory is in the Bio::GMOD::CMap::Constants file. This path is automatically written during installation if you do the standard ``perl Build.PL; ./Build; ./Build install'' process.

There are now two types of config files. The ``global.conf'' file handles information that all the data sources need, like the default data source. There is also one config file for each data source. This file handles most of the configurable options.

There are defaults provided for almost every option in the local config file, with the exception of the database connection info and the template and image cache directories. The latter two should be set during installation, and the first should be set by the installer after installation (they are prompted to do this after ``./Build install''). If you comment out any of the options in ``cmap.conf'' (except ``database,'' ``template_dir'' and ``cache_dir''), the defaults in the Bio::GMOD::CMap::Constants file are used.
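
The fallback behaviour described above can be sketched as a simple merge. This is an assumption about the general idea, not the code in Bio::GMOD::CMap::Config: the function ``resolve_config,'' the default option names and values, and the example paths are all hypothetical; only ``database,'' ``template_dir'' and ``cache_dir'' are taken from the text.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical built-in defaults (illustrative names and values only).
my %DEFAULTS = (
    image_type => 'png',
    image_size => 'small',
);

# Options the text says have no default and must be set locally.
my @REQUIRED = qw(database template_dir cache_dir);

# Merge local settings over the defaults, refusing to continue
# when a required option is absent.
sub resolve_config {
    my ($local) = @_;
    my @missing = grep { !defined $local->{$_} } @REQUIRED;
    die "Missing required option(s): @missing\n" if @missing;
    return { %DEFAULTS, %$local };    # local values win
}

my $config = resolve_config(
    {
        database     => 'cmap',                   # hypothetical value
        template_dir => '/tmp/cmap/templates',    # hypothetical path
        cache_dir    => '/tmp/cmap/cache',        # hypothetical path
        image_type   => 'svg',                    # overrides the default
    }
);

print "$_ = $config->{$_}\n" for sort keys %$config;
```

An option that is commented out locally simply never appears in the local hash, so the default value survives the merge.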

Feature, map and evidence types are now defined and controlled in the data source config files.

GENERAL FLOW FOR HANDLERS

The web presentation modules are all located under the Bio::GMOD::CMap::Apache namespace and are instantiated as objects. In order to understand how they are invoked, I will describe how the main map viewer (Bio::GMOD::CMap::Apache::MapViewer) works.

map(s)

The above scenario is probably the most involved process in the comparative maps, but it shows the way that distinct pieces of the problem are split into specialized modules and objects.

SQL CONVENTIONS

The tables used by the comparative maps follow a fairly rigid naming convention so that they should be able to integrate easily with existing databases.

- Primary key:: The primary key of the table is always defined as the first field in the table (though the first field in a table is not always guaranteed to be the primary key). A table's primary key will be the name of the table minus the ``cmap_'' prefix plus the suffix ``_id.'' So, for example, the table ``cmap_feature'' has as its primary key ``feature_id.'' Additionally, the primary key of a table will always be an ascending integer value (like MySQL's ``auto_increment'' field, only handled in Perl code so as to be portable to databases without such types of fields). This naming convention is never used for any other type of field except the ``accession id,'' which is exemplified by ``feature_acc.'' The type and purpose of any field ending in ``_id'' is always obvious: if it has the same name as the table, then it is a primary key, else it is a foreign key.
- Boolean fields:: Since not all databases have a boolean datatype (``Yes/No,'' ``On/Off'' kind of data), the fields that hold this kind of data are declared using a small integer datatype (e.g., ``tinyint'' in MySQL). Names of boolean fields include a verb indicating that the record ``is'' or ``has'' (or ``can'' or ``wants'' or whatever) some value, always in the affirmative, e.g., ``is_relational_map.''
- Date-time fields:: Date-time fields will be named like ``*_on,'' e.g., ``published_on.''
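
The naming rules above are regular enough to check mechanically. The following sketch is not part of CMap; the functions ``primary_key_for'' and ``classify_field'' are hypothetical, and the list of boolean verbs is an assumption based on the examples given in the list.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Derive the expected primary key name: drop "cmap_", append "_id".
sub primary_key_for {
    my ($table) = @_;
    ( my $base = $table ) =~ s/^cmap_//;
    return $base . '_id';
}

# Guess a field's role from its name alone, following the conventions.
sub classify_field {
    my ( $table, $field ) = @_;
    return 'primary key' if $field eq primary_key_for($table);
    return 'foreign key' if $field =~ /_id$/;
    return 'boolean'     if $field =~ /^(?:is|has|can|wants)_/;    # assumed verbs
    return 'date-time'   if $field =~ /_on$/;
    return 'other';
}

for my $field (qw(feature_id map_id is_relational_map published_on feature_name)) {
    printf "%-18s %s\n", $field, classify_field( 'cmap_feature', $field );
}
```

The field names passed in the loop are only examples for classification; they are not meant to describe the real columns of ``cmap_feature.''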

Table Aliases:

  select map.map_name,
         f.feature_start
  from   cmap_map map,
         cmap_feature f
  where  upper(f.feature_name)=?
  and    f.map_id=map.map_id

Placeholders:

TABLE DESCRIPTIONS

In the ``docs'' directory, you will find a schema diagram illustrating the structure and relationships of the tables. In the ``sql'' directory, you will find create statements of the tables for MySQL, Oracle and PostgreSQL. Following is a general description of each table, what kind of data it is supposed to hold, and how it fits in with the others. Most of the fields in the tables are described more fully in the ADMINISTRATION document when discussing the forms presented in the web admin tool.

cmap_map_set:

cmap_map_type (deprecated):

Maps are of some type determined by the data curator. It is useful to researchers to know that a map is a ``genetic'' map and that the distances given are in ``centiMorgans'' as opposed to a ``physical'' map where the distances are ``bands.'' More importantly, though, the curator can decide how different maps are drawn so as to make them visually distinctive. By selecting from shapes, colors and widths, the curator has great discretion over the presentation of maps. Additionally, the curator can determine if the maps can stand alone (like most genetic maps) or if they are a special kind of map set that CMap calls ``relational,'' meaning that the maps can only be shown *in relation to* some other map (of the first, ``non-relational'' kind, no less). Examples of this include physical fingerprint contig maps (FPC), QTLs, and haplotypes. These maps are usually composed of many smaller fragments that have some correspondences to other maps. As such, they are just drawn to a size relative to the distance their correspondences cover on the reference map.

cmap_species:

cmap_map:

cmap_map_cache (deprecated):

This is a ``permanent temp'' table used only to speed up the web queries. It is used to remember which maps occur in each of the ``slots'' while collating the data used to draw the maps for each request. The requests are identified by the PID (process ID) of the Apache child serving the user, and the records *should* be deleted after the request has finished. Records have a timestamp that indicates the age of the request. If you see records in this table that are more than a few minutes old, it should be safe to delete them.

cmap_feature:

cmap_feature_alias:

cmap_attribute:

cmap_feature_type (deprecated):

Each feature is of some type. Which feature types exist is determined entirely by the data curator. The data curator decides how to draw the feature on the map, selecting from pre-determined shapes and colors.

cmap_feature_correspondence:

cmap_correspondence_evidence:

cmap_evidence_type:

These are the types of evidence that we can use in the previous table. For instance, we can say two features are related because they have the same name or the same sequence. Some evidences are stronger than others, so they can be ranked accordingly.

cmap_correspondence_lookup:

cmap_correspondence_matrix:

cmap_xref:

cmap_next_number:

cmap_saved_link:

TIPS

If you'd like to write a script to access the database directly, you can get a handle to the database quite easily using the Bio::GMOD::CMap modules. Here's an example:

  #!/usr/bin/perl

  use strict;
  use Bio::GMOD::CMap;

  my $cmap = Bio::GMOD::CMap->new or die Bio::GMOD::CMap->error;

  # optional, only if you have multiple data sources defined
  # $cmap->data_source('Foo');

  my $db = $cmap->db or die $cmap->error;

AUTHOR

Ken Y. Clark, kclark@cshl.edu

Ben Faga, faga@cshl.edu