Sequence analysis is a broad field that covers any analysis of textual sequences, such as those representing genomes (DNA) and proteins (amino acids). Biological sequence analysis includes determining genome structure, identifying protein-coding regions (genes), predicting gene function, inferring phylogenetic relationships, and performing ancestral reconstruction (Coghlan, 2011; Hall, 2017). Recent studies have shown that genomics and phylogenetics can track the spread and evolution of the novel coronavirus ([https://nextstrain.org/]). Sequence analysis methods have been used not only in biology, but also in the genealogy of manuscripts (Barbrook et al., 1998) and the quantitative evaluation of melodic similarity (Savage et al., 2018). The text-processing skills needed to analyze sequence data can therefore be applied to data in other fields.
This course provides an introduction to the main tools and databases used in the analysis of sequence data and explains how they can be used together to answer biological questions. Examples of analysis include retrieving DNA and protein sequences from public databases, DNA sequence statistics (length, GC content, DNA words, and local variation in base composition), pairwise sequence alignment (dotplot, global sequence alignment, and local sequence alignment), multiple sequence alignment, and phylogenetic inference.
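To give a feel for the DNA sequence statistics listed above, here is a minimal sketch in Python. The function names (`gc_content`, `dna_words`, `sliding_gc`) and the example sequence are illustrative assumptions, not part of the course materials, and the course itself may use different tools.

```python
from collections import Counter


def gc_content(seq: str) -> float:
    """Fraction of G and C among the unambiguous bases A, C, G, T."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        return 0.0  # empty or fully ambiguous sequence
    return (seq.count("G") + seq.count("C")) / counted


def dna_words(seq: str, k: int) -> Counter:
    """Count overlapping DNA words (k-mers) of length k."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def sliding_gc(seq: str, window: int, step: int) -> list:
    """GC content in successive windows, showing local variation in base composition."""
    return [
        gc_content(seq[i:i + window])
        for i in range(0, len(seq) - window + 1, step)
    ]


if __name__ == "__main__":
    example = "ATGCGCGATATTAGCGC"  # made-up sequence for illustration only
    print("length:", len(example))
    print("GC content:", gc_content(example))
    print("most common 2-letter words:", dna_words(example, 2).most_common(3))
    print("GC by window:", sliding_gc(example, window=6, step=3))
```

The same counting approach works on any text: replacing the DNA string with a sentence or a string of note names yields word and window statistics for that sequence instead.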
Students from all disciplines will use sequence analysis methods to tackle problems in their own fields (biology, language, manuscripts, music, and so on).
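As an illustration of how one such method carries over to other fields, the sketch below computes a global alignment score with the standard Needleman-Wunsch dynamic-programming recurrence. The function name `global_alignment_score` and the scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for the example; real analyses typically use substitution matrices and more refined gap penalties.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Best global alignment score of sequences a and b (Needleman-Wunsch).

    a and b may be strings or lists of any comparable symbols
    (bases, amino acids, words, notes). Only the score is returned,
    so a single row of the dynamic-programming table is kept at a time.
    """
    # Row 0: aligning the empty prefix of a against prefixes of b costs only gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i, x in enumerate(a, start=1):
        curr = [i * gap]  # column 0: prefix of a against the empty prefix of b
        for j, y in enumerate(b, start=1):
            diagonal = prev[j - 1] + (match if x == y else mismatch)
            up = prev[j] + gap        # gap inserted in b
            left = curr[j - 1] + gap  # gap inserted in a
            curr.append(max(diagonal, up, left))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    # DNA sequences
    print(global_alignment_score("ACGT", "AGT"))
    # A melody written as a list of note names (made-up example)
    print(global_alignment_score(["C", "D", "E"], ["C", "E"]))
```

Because the function only compares symbols for equality, the same code scores DNA strings, lists of words from two manuscript copies, or lists of notes from two melodies.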