Coupling classical molecular biology techniques to high-throughput sequencing for nucleic acid detection has revolutionized how scientists study biology. MEDS5420 introduces students to the command line and gradually builds concepts and skills so that students can build their own workflows to process and analyze high-throughput sequencing data. Once basic concepts and command-line competency are established, the course focuses on the analysis of ChIP-seq, RNA-seq, and ATAC-seq data. Students learn to use many common genomics software packages for tasks such as genome alignment, peak calling, motif analysis, and differential expression analysis. Throughout the course, students are introduced to the statistical computing language R and use an R interface to perform analyses and create visualizations.
The study of biology has been transformed by the data revolution. Analysis, not data acquisition, is now often the rate-limiting step in scientific progress. Classically trained molecular biologists, biochemists, and cell biologists are often best equipped to ask the most relevant questions of a dataset, but they may lack the expertise to carry out the appropriate analyses. The mere thought of analyzing such massive datasets can be intimidating. The main goal of this class is to prepare students to navigate and analyze massive datasets comfortably, removing that intimidation barrier.
The 2023 materials and lectures are all available at the following link: MEDS5420 2023