Campus Access Thesis
Master of Science (MS)
Jill A. Macoska
High-throughput sequencing is one of the most promising tools available to researchers united by the common goals of personalized medicine. The first step in developing targeted interventions for genetic diseases is identifying the genomic features common to a particular phenotype. Identifying a unique transcriptional signature through comparative analysis of RNA-Seq data provides a glimpse of the regulatory machinery responsible for the presentation of a diseased phenotype. However, developing a standardized downstream analysis procedure for identifying clinically informative features from the massive sequence libraries produced during a single RNA-Seq experiment remains an open area of research, and the analysis requires computational resources that necessitate the use of a high-performance computing cluster. These circumstances create a skills-gap bottleneck that requires molecular biologists to develop a new skill set with extensive knowledge of computer programming and software engineering. This bottleneck is further exacerbated by the difficulty of identifying high-confidence population-level biomarkers from small-sample-size experiments with low statistical power. The Tailor pipeline was developed to address these issues and facilitate biomarker discovery from comparative RNA-Seq analyses between two or more conditions. Tailor is operated via two-word commands that simplify the use of high-performance computing clusters. It produces a visualization of the salient features of an RNA-Seq data set, along with sorted, human-readable files listing potential biomarker calls identified from hypothesis tests of the pooled expression levels between two or more conditions.
In a recent comparative analysis, Tailor analyzed RNA extracted from urine samples provided by patients with fibrosis-associated lower urinary tract syndrome (LUTS) and by a non-symptomatic control group, and identified 370 genes and 30 biochemical pathways that were significantly differentially expressed between the groups. Repeated analysis with other commonly used software packages refined this list to 44 biomarkers that may serve as noninvasive diagnostics for fibrosis-associated LUTS and as potential targets for drug development. Tailor's sensitivity to differential gene expression profiles allows biologists to identify the causal genetic mechanisms that contribute to diseased phenotypes from noninvasive tests.
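The abstract does not specify which hypothesis tests Tailor applies, so the following is only a minimal sketch of the general idea of testing pooled expression levels between two conditions and writing a sorted list of candidate genes. It swaps in an exact two-sample permutation test on the difference in means, chosen here because it needs only the Python standard library. The function names, the pseudocount, and the gene names and expression values are all hypothetical and do not come from the thesis.

```python
from itertools import combinations
from math import log2
from statistics import mean


def log2_fold_change(control, treated, pseudocount=1.0):
    """log2 ratio of mean expression (treated over control).

    The pseudocount is an assumption that avoids division by zero.
    """
    return log2((mean(treated) + pseudocount) / (mean(control) + pseudocount))


def permutation_p_value(control, treated):
    """Exact two-sided permutation test on the absolute difference in means."""
    pooled = list(control) + list(treated)
    n_control = len(control)
    n_treated = len(pooled) - n_control
    total = sum(pooled)
    observed = abs(mean(treated) - mean(control))
    extreme = 0
    count = 0
    # Enumerate every way of relabelling n_control samples as "control".
    for idx in combinations(range(len(pooled)), n_control):
        control_sum = sum(pooled[i] for i in idx)
        diff = abs((total - control_sum) / n_treated - control_sum / n_control)
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            extreme += 1
        count += 1
    return extreme / count


def rank_genes(expression):
    """Map {gene: (control_values, treated_values)} to rows sorted by p-value."""
    rows = [
        (gene, log2_fold_change(c, t), permutation_p_value(c, t))
        for gene, (c, t) in expression.items()
    ]
    return sorted(rows, key=lambda row: row[2])


# Made-up expression values, for illustration only.
toy = {
    "GENE_A": ([10, 12, 11, 9], [40, 38, 45, 42]),
    "GENE_B": ([20, 22, 19, 21], [21, 20, 22, 19]),
}

# Tab-separated, human-readable listing: gene, log2 fold change, p-value.
for gene, lfc, p in rank_genes(toy):
    print(f"{gene}\t{lfc:.2f}\t{p:.3f}")
```

With four samples per group there are only 70 possible relabellings, so the smallest two-sided p-value this test can produce is 2/70. That floor is one concrete illustration of the low statistical power of small-sample experiments noted in the abstract. A real analysis would also correct for testing many genes at once.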
Judell, Andrew S., "The Tailor RNA-Seq Comparative Analysis Pipeline: A De Novo Disease Biomarker Discovery Workflow that Facilitates High Performance Computing Cluster Use" (2019). Graduate Masters Theses. 588.