This is the last of five programming assignments in a semester-long, CS-1-like course named DNA, which introduces students to programming within the context of genomics: the analysis of DNA within a single cell of an organism. Originally, the course targeted students in the life sciences, but it now attracts students from across the academy. The goal of these assignments is to give students enough confidence with scripting and the associated scientific write-ups to conduct a small computational experiment in a final project.
This programming assignment asks for a Python program that generates a report serving as a preliminary study to compare and contrast certain features of sequences from multiple organisms. Comparative genomics is the analysis and comparison of genomes from different species. This assignment focuses on the “genomic signature” to help infer whether a region of DNA is “like” other regions of DNA. The genomic signature refers to the “characteristic frequency of oligonucleotides (e.g., motifs of length 4 bp are referred to as tetramers) in a genome or sequence.” It has been observed that the genomic signatures of phylogenetically related genomes are similar.
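To make the idea of a genomic signature concrete, here is a minimal sketch of how tetramer frequencies might be computed and compared in Python. The function names `kmer_frequencies` and `signature_distance` are hypothetical, and the distance measure (a sum of absolute frequency differences) is only one simple assumption about how two signatures could be compared; it is not necessarily the method the assignment prescribes.

```python
from collections import Counter


def kmer_frequencies(sequence, k=4):
    """Return the relative frequency of each overlapping k-mer in a DNA sequence.

    Windows containing anything other than A, C, G, or T (for example N)
    are skipped. With k=4 the k-mers are tetramers.
    """
    sequence = sequence.upper()
    valid_bases = set("ACGT")
    counts = Counter()
    # Slide a window of width k across the sequence one base at a time.
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        if set(kmer) <= valid_bases:
            counts[kmer] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: n / total for kmer, n in counts.items()}


def signature_distance(freqs_a, freqs_b):
    """Sum of absolute frequency differences (an assumed, illustrative measure).

    The result is 0.0 for identical signatures and 2.0 for signatures
    that share no k-mers at all.
    """
    kmers = set(freqs_a) | set(freqs_b)
    return sum(abs(freqs_a.get(m, 0.0) - freqs_b.get(m, 0.0)) for m in kmers)


if __name__ == "__main__":
    # Toy sequences for illustration only; real input would come from genome files.
    sig_a = kmer_frequencies("ACGTACGTACGT")
    sig_b = kmer_frequencies("ACGTACGAACGT")
    print(signature_distance(sig_a, sig_b))
```

Under this sketch, a smaller distance suggests two regions have more similar tetramer usage, which is the kind of evidence the report would weigh when asking whether one region of DNA is "like" another.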
For recommendations about this specific assignment as well as general comments for the entire set of DNA-focused programming assignments, please see the attached recommendations document.
We live in a post-genomic world where strings of sequenced DNA are the starting point for discovery from basic research to personalized medicine. In addition to the human genome, exciting interdisciplinary areas such as the computational explorations of the thousands of genomes in the microbial communities within us are leading to new definitions of personalized medical diagnosis and treatment. "If Charles Darwin had taken a couple of undergraduate interns with him on 'The Beagle', those students would have discovered, described and catalogued their share of new species ... therefore it is perhaps ironic that we are experiencing once again an age of exploration and discovery via the old fashion activities of collecting and cataloguing. This time it is not only organisms but DNA sequences ... That enticing, exhilarating idea of being on an expedition is (or could be) an aspect of DNA sequence analysis. The balance is tipped heavily toward vast, unknown territories of undeciphered data waiting to be explored" (LeBlanc and Dyer, 2007).
Note: I originally taught this course in Perl, using the text that my colleague in biology, Betsey Dyer, and I wrote: Perl for Exploring DNA (Oxford University Press, 2007). I now use Python in this course, so I no longer use my own book, but the text includes some wonderful interdisciplinary insights (most of which were written by Betsey Dyer).