Artificial Intelligence

AI Seminar - Kai Wang

Monday, June 20, 2011, 10:30am - 12:00pm PDT
ISI, 11th Floor Large Conference Room
Dr. Kai Wang

High-throughput genomics platforms, such as whole-genome SNP arrays and next-generation sequencing machines, are producing massive amounts of genetic variation data. However, compared to SNP calling, the approaches for identifying copy number variations (CNVs) from these platforms are less well developed. Here I describe PennCNV, a hidden Markov model (HMM)-based algorithm for identifying CNVs from diverse SNP genotyping arrays and for dealing with the inherent technical artifacts in some arrays. I will further discuss the extension of the original HMM to jointly model SNP data from a father-mother-child trio, and the enhancement of the HMM with hundreds of hidden states to handle cancer genomes using multi-core CPU and GPU computing platforms. I will also describe recent developments in extending PennCNV to high-throughput sequencing data and addressing the unique challenges that sequencing data pose.

Dr. Wang obtained a Bachelor’s degree in Biochemistry and Molecular Biology from Peking University in Beijing, China. He subsequently went to the US and earned a master’s degree in tumor biology at the Mayo Clinic, Minnesota. He became interested in the computational aspects of the biological sciences and pursued a Ph.D. in Computational Biology at the University of Washington, working with Dr. Ram Samudrala on protein structure and function prediction. After graduation, he had postdoctoral training at the University of Pennsylvania and Children’s Hospital of Philadelphia, working on genomic analysis of human diseases. He developed the PennCNV software, which was one of the most widely used methods to detect copy number variations from high-density SNP arrays. He also developed the ANNOVAR functional annotation system for annotating genetic variants from high-throughput sequencing data. Dr. Kai Wang is currently an assistant professor of Psychiatry and Preventive Medicine at the Zilkha Neurogenetic Institute, University of Southern California. His major research focus is on next-generation sequencing data analysis, including the development of CNV calling methods and pathway-based association approaches.
