Zoology in the Classroom: Introduction to Bioinformatics

Biological science was traditionally concerned primarily with the observation and descriptive study of organisms. Over time, this approach gave rise to several subject areas that amassed large amounts of factual information on morphology, inheritance, anatomy, taxonomy, life cycles, physiology, ecological and environmental relationships, and infectivity. Gradually, the scientific community became curious about the “basis” for these characteristic features of living organisms and the variations among them. This shift in scientific paradigm prompted an in-depth study of the “molecular” basis of life forms. From the identification of the genetic material (nucleic acids) to the sequencing of entire genomes of several organisms, biology has been substantially redefined. In this endeavor, biologists benefited immensely from inputs from the physical, chemical and mathematical sciences. Studying biological systems with a “Why is it so?” approach gave birth to several new areas of research, viz. molecular genetics, genomics, proteomics, recombinant DNA technology, transgenic technology, etc. Extensive work in these areas on different biological systems generated large volumes of data on linkage maps, genomes, transcriptomes, proteomes and molecular structures, which became impossible to analyze manually. The use of computational power to analyze biological data came to be seen as unavoidable, leading to the birth of a new science called “Bioinformatics”.

Why Bioinformatics

Imagine trying to solve a complex mathematical calculation, or to find a pattern in a jumbled string of letters or numbers, all by yourself without the aid of any computational device such as a calculator or a computer. Such a task can not only become extremely time consuming but may even turn out to be “unsolvable”. With a calculator or a computer to help, however, the same task may be performed in a much shorter time. Of course, you need to know how to operate the calculator or computer and the sequence of commands to give the machine! Analogously, understanding the meaning of just the four letters of life, namely Adenine, Guanine, Cytosine and Thymine (Uracil in RNA), as the building blocks of life and storehouse of information can prove daunting unless we are able to decipher their hidden meaning for the maintenance and functionality of the genome. A, G, C and T/U represent just one level of information content. As we know from the central dogma of life, this sequence serves as the blueprint, with the message being conveyed from the genetic material (DNA/RNA) to messenger RNA and eventually to proteins. Therefore, at the minimum level, four nucleotides and twenty amino acids hold the entire key to life (we are not even discussing the enormous variety of metabolites, biomolecules and other compounds that play a major role in the functioning of life).
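The flow of information described above, from DNA to messenger RNA to protein, can be illustrated with a few lines of Python. This is only a minimal sketch: the function names and the example sequence are made up for illustration, the DNA is assumed to be the coding strand read from its first base, and the codon table is the standard genetic code written in the compact T, C, A, G ordering.

```python
from itertools import product

# Standard genetic code, listed in T, C, A, G order for each codon position.
# "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_dna):
    """Return the mRNA for a coding-strand DNA sequence (T becomes U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna):
    """Translate mRNA codon by codon from its first base until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3].replace("U", "T")  # table is keyed on DNA letters
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical example sequence, chosen only for illustration.
dna = "ATGGCCATTGTAATGGGCCGCTGA"
mrna = transcribe(dna)
print(mrna)             # AUGGCCAUUGUAAUGGGCCGCUGA
print(translate(mrna))  # MAIVMGR
```

Real genes add many complications (introns, reading frames, alternative genetic codes), which is one reason dedicated bioinformatics tools are used instead of a short script like this.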

Bioinformatics, therefore, attempts to unravel the information in the genome and can be understood as comprising two components:

Biology (bio) + Information Technology (informatics) = Computational Biology

It can be summarized as the use of information technology to generate, acquire, manage and analyze data related to the biological sciences.

Computers and the internet have played a major role and may be regarded as the backbone on which the entire field of bioinformatics is flourishing.

Algorithms are well-defined sets of steps for the generation, storage and analysis of data; specialists implement them as computer programs (software).
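As a simple illustration of what a "well-defined set of steps" can look like, the sketch below counts bases, computes GC content, and locates a short motif in a DNA string. The function names and the example sequence are hypothetical and chosen only for illustration; production tools use far more sophisticated algorithms.

```python
from collections import Counter


def base_counts(seq):
    """Step 1: count how often each letter occurs in the sequence."""
    return Counter(seq.upper())


def gc_content(seq):
    """Step 2: fraction of A/C/G/T bases that are G or C."""
    counts = base_counts(seq)
    total = sum(counts[base] for base in "ACGT")
    if total == 0:
        return 0.0
    return (counts["G"] + counts["C"]) / total


def find_motif(seq, motif):
    """Step 3: 0-based start positions of every occurrence, overlaps included."""
    seq, motif = seq.upper(), motif.upper()
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start)
        start = seq.find(motif, start + 1)
    return positions


# Hypothetical example sequence.
seq = "ATGCGCGATATATGC"
print(base_counts(seq))
print(round(gc_content(seq), 3))
print(find_motif(seq, "ATG"))  # [0, 11]
print(find_motif(seq, "ATA"))  # [7, 9]
```

Doing the same by eye is feasible for 15 bases but not for a genome of millions or billions of bases, which is the point of handing the steps to a computer.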

The need for high-speed processing of biological data was felt primarily on account of the huge volume of sequencing data being generated. In a little over 10 years, the cost of sequencing dropped from nearly US$5,200 per megabase in September 2001 to about US$0.09 (9 cents) per megabase in January 2012 (http://www.dnasequencing.org/history-of-dna).
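To get a feel for the size of this drop, the arithmetic can be worked through as follows. This is a rough sketch under stated assumptions: the January 2012 figure is taken to be US$0.09 per megabase, the elapsed time is taken as 10 years and 4 months, and the cost is assumed to have fallen at a steady exponential rate (which is a simplification). The function names are illustrative.

```python
import math

# Figures quoted in the text; the 2012 value is assumed to mean US$0.09.
cost_2001 = 5200.00           # US$ per megabase, September 2001 (approximate)
cost_2012 = 0.09              # US$ per megabase, January 2012 (assumed unit)
elapsed_years = 10 + 4 / 12   # September 2001 to January 2012


def fold_reduction(old_cost, new_cost):
    """How many times cheaper the new cost is than the old one."""
    return old_cost / new_cost


def halving_time_years(old_cost, new_cost, years):
    """Average time for the cost to halve, assuming steady exponential decline."""
    return years / math.log2(old_cost / new_cost)


fold = fold_reduction(cost_2001, cost_2012)
halving = halving_time_years(cost_2001, cost_2012, elapsed_years)
print(f"Cost fell roughly {fold:,.0f}-fold")
print(f"Average halving time: about {halving * 12:.1f} months")
```

Under these assumptions the cost fell by a factor of more than fifty thousand, halving on average in well under a year.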

From a few hundred megabases per year with Sanger’s dideoxy chain-termination method of sequencing, we can today generate close to 6 billion bp in two weeks using one of the Next Generation Sequencing machines (http://www.dnasequencing.org/history-of-dna; http://www.illumina.com/systems/hiseq_comparison.ilmn). The need for even higher-performing computational tools is therefore greater than ever!

Although bioinformatics is largely concerned with the analysis of biological data using computational tools, it has rapidly emerged as a multidisciplinary science that touches upon all branches of science, including the physical sciences, chemical sciences, mathematics, artificial intelligence and so on.

Today, bioinformatics can be applied to the analysis of a variety of data, some of which are listed below:

DNA sequence:
- Annotation
- Analysis such as
  - Similarity search
  - Functional information
  - Evolution
  - Polymorphism

RNA level:
- Expression analysis using
  - Microarray
  - RNA sequencing
- Structure prediction

Protein level:
- Domain and motif analysis
- Structure determination
- Evolution
- Functional role

Whole genome/cell/tissue/organism level:
- Genome structure and comparative genomics
- Interactome analysis
- Metabolic pathways

Drug design

The key to successful implementation of bioinformatics tools and their application is to organize the massive volumes of data in a user-friendly and easily accessible manner. The following section will introduce you to a few representative databases from the areas that have been listed above:

Wednesday, October 2, 2013
