This is a Public Health course. Public Health classes are offered on the Health Services Campus at 168th Street. For more detailed course information, please go to the Mailman School of Public Health Courses website at http://www.mailman.hs.columbia.edu/academics/courses
This course will provide an introduction to the US Food and Drug Administration (FDA) and the drug development and approval process, often referred to as the “Critical Path”. The class will begin with a review of the history and organization of the FDA and an analysis of the principal steps along the critical path, including preclinical testing, clinical testing (drug development Phases 0 through IV), Good Laboratory Practices, Good Manufacturing Practices, Good Clinical Practices, and adverse event reporting. Different types of FDA submissions (IND, NDA, ANDA, SPA, eCTD) and FDA meetings will be examined, along with accelerated drug approval strategies, orphan drug development strategies, generic drug development, and post-marketing Sponsor commitments. Throughout the class we will study the legislation and regulations that empower the FDA, as well as the related FDA guidance documents that define FDA expectations.
This course will provide an introduction to the basics of regression analysis. The class will proceed systematically from examining the distributional qualities of the measures of interest, to assessing the appropriateness of the linearity assumption, to issues of variable inclusion, model fit, interpretation, and regression diagnostics. We will primarily use scalar notation (i.e., we will use limited matrix notation and will only briefly present the use of matrix algebra).
This course will introduce students to core data science skills and concepts through the exploration of applied biostatistics. The course will begin with an introduction to the R programming language and the RStudio IDE, focusing on contemporary tidyverse functions and reproducible programming methods. Then, the course will instruct students in contemporary data manipulation and visualization tools while systematically covering core applied biostatistics topics, including confidence intervals, hypothesis testing, permutation tests, and logistic and linear regression. Finally, the semester will end with an introduction to machine learning concepts, including terminology, best practices in test/training sets, cross-validation, and a survey of contemporary classification and regression algorithms.
The main objective of this course is to provide Columbia University's Clinical and Translational Science Award (CTSA) trainees, students, and scholars with the skills and knowledge that will optimize their chances of entering a satisfying academic career. The course will emphasize several methodological and practical issues related to developing a science career. The course will also offer support and incentives by facilitating timely use of CTSA resources, arranging expert reviews of writing and curricula vitae, and providing knowledge and resources for the successful achievement of career goals.
With the explosion of “Big Data” problems, statistical learning has become an increasingly important field in many scientific areas. The goal of this course is to provide training in practical statistical learning. It is targeted at MS students with some data analysis experience.
This course covers a review of mathematical statistics and probability theory at the master's level. Students will be exposed to the theory of estimation and hypothesis testing, confidence intervals, and Bayesian inference. Topics include population parameters, sufficient statistics, basic distribution theory, point and interval estimation, an introduction to the theory of hypothesis testing, and nonparametric procedures.
This course will introduce the statistical methods for analyzing censored data, non-normally distributed response data, and repeated measurements data that are commonly encountered in medical and public health research. Topics include estimation and comparison of survival curves, regression models for survival data, logit models, log-linear models, and generalized estimating equations. Examples are drawn from the health sciences.
This course introduces students to advanced computational and statistical methods used in the design and analysis of high-dimensional genetic data, an area of critical importance in the current era of big data. The course starts with a brief background in genetics, followed by an in-depth discussion of genome-wide linkage and association studies and next-generation sequencing studies. Additional topics, such as network genetics, will also be covered. Examples from recent and ongoing applications to complex traits will be used to illustrate methods and concepts. Students are required to read relevant papers as assigned by the instructor, and each student is required to present a paper during class. Students are also required to complete a project related to the course material, with a midterm evaluation of their progress.
We will use one main textbook: The Fundamentals of Modern Statistical Genetics by Laird and Lange (Springer, 2012). For further reading, Handbook of Statistical Genetics, Volume 1 (Wiley, 2007) is also an excellent book. Another good book is Mathematical and Statistical Methods for Genetic Analysis by Ken Lange (Springer, 2002).
A comprehensive overview of methods of analysis for binary and other discrete response data, with applications to epidemiological and clinical studies. It is a second-level course that presumes some knowledge of applied statistics and epidemiology. Topics discussed include 2 × 2 tables, m × 2 tables, tests of independence, measures of association, power and sample size determination, stratification and matching in design and analysis, interrater agreement, and logistic regression analysis.
Regression analysis is widely used in biomedical research. Non-continuous (e.g., binary or count-valued) responses, correlated observations, and censored data are frequently encountered in regression analysis. This course will introduce advanced statistical methods to address these practical problems. Topics include generalized linear models (GLM) for non-Gaussian response, mixed-effects models and generalized estimating equations (GEE) for correlated observations, and Cox proportional hazards models for survival data analysis. Examples are drawn from biomedical sciences.
This course explores the theoretical foundations underlying the models and techniques used in mathematical genetics and genetic epidemiology. Topics include the use and interpretation of likelihood methods, formulation of mathematical models, segregation analysis, ascertainment bias, linkage analysis, genetic heterogeneity, and complex genetic models. The course includes lectures, discussions, homework problems, and a final exam. My single most important objective for this course is for students to be able to break down any mathematical modeling problem logically into all its component parts, to express each part accurately, to know how to "add" all the pieces back up, and to check the accuracy of their result.
Students in this course will learn and practice the fundamental methods and concepts of the randomized clinical trial: protocol development, randomization, blinding, patient recruitment, informed consent, compliance, sample size determination, crossovers, and collaborative trials. Each student prepares and submits a protocol for a real or hypothetical clinical trial.
Drug development, from compound discovery through registration and commercialization, is a lengthy and complex process in which statisticians play an important role from beginning to end. The main objective of this course is to provide students with a working knowledge of the methodological and operational issues that arise at the stages of drug development that involve statistical contributions.
Topics include an introduction to drug development; design and analysis of non-clinical studies (toxicology, pharmacokinetics, and pharmacodynamics) and Phase I/II/III studies; issues in clinical studies, including non-inferiority, meta-analysis, and endpoint selection; an overview of safety reporting systems such as MedDRA (Medical Dictionary for Regulatory Activities) and CTCAE version 3 (Common Terminology Criteria for Adverse Events); and preparation for the FDA advisory committee drug approval process. In addition, the views and positions of different regulatory bodies, such as the FDA and the EMEA, on design and analysis issues will be discussed.
This course is designed for students (or other researchers) who want to gain substantial familiarity with a collection of statistical techniques that target the measurement of latent variables (i.e., variables that cannot be measured directly), as well as methods for estimating relationships among variables within causal systems. The course covers both continuous and categorical latent variable measurement models (i.e., exploratory and confirmatory factor analysis, item response theory models, and latent class and finite mixture models) and the estimation of relationships in hypothesized causal systems using structural equation modeling. Data analysis examples will come from health science applications, and practical implementation of all methods will be demonstrated predominantly in the Mplus software, with some use of R.
As statistical models become increasingly complex, the exact or even asymptotic distributions of estimators and test statistics are often intractable. With continuing improvements in processor speed, computationally intensive methods have become invaluable tools for statisticians in practice. This course will cover basic modern statistical computing techniques and how they are applied in a variety of practical situations. Topics include numerical optimization, random number generation, simulation, Monte Carlo integration, permutation tests, jackknife and bootstrap procedures, Markov chain Monte Carlo methods in Bayesian settings, and the EM algorithm.
In this course, students will synthesize knowledge from the core with knowledge from both department-required and certificate-required courses. The course deliverable is a written paper presenting analyses of a student-selected data set using two of the following methods: linear regression, logistic regression, nonlinear modeling, mixed-effects modeling, machine learning, and survival analysis. Students will demonstrate an understanding of how to summarize data (numerically and graphically) for the purposes of specific analyses, present results, and interpret them in a public health context. Finally, students will demonstrate the ability to present the various stages of their analyses, to ask questions in large collaborative settings, and to troubleshoot their work.
In this course, you will learn to design and build relational databases in MySQL and to write and optimize queries using the SQL programming language. Application of the skills learned in this course will be geared toward research and data science settings in the healthcare field; however, these skills are transferable to many industries and application areas. You will begin the course by examining the pitfalls of using Excel spreadsheets as a data storage tool, and then learn how to build properly designed relational databases that eliminate the issues associated with spreadsheets and maintain data integrity when storing and modifying data. You will then learn two aspects of the SQL programming language: 1) the data manipulation language (DML), which allows you to retrieve data from and populate data into database tables (e.g., SELECT, INSERT INTO, DELETE, UPDATE), and 2) the data definition language (DDL), which allows you to create and modify tables in a database (e.g., CREATE, ALTER, DROP). You will additionally learn how to optimize SQL queries for best performance, use advanced SQL functions, and use SQL from within two common statistical software programs, R and SAS.
The Capstone Consulting Seminar is a required course for the M.S. Theory and Methods track and M.P.H. students in Biostatistics. It provides experience in the art of consulting and in the proper application of statistical techniques to public health and medical research problems. Students will bring together the skills they have acquired in previous coursework and apply them to the consulting experience. Learning takes place by doing. Over the course of the semester, students will attend consultation sessions of the department's Biostatistics Consultation Service. Students will participate in the consultation interaction and will either present their own report in class for discussion or comment on another student's presentation.
This course provides a general introduction to mathematical statistics and statistical decision theory for doctoral students in biostatistics. It covers elementary decision theory, Bayes rules, Neyman-Pearson theory, uniformly most powerful tests, similar tests, uniformly most powerful unbiased tests, and confidence sets; basic asymptotic criteria, estimation methods and their asymptotic properties, M-estimators, U-statistics, and statistical functionals; and likelihood ratio tests, Wald tests, Rao's score tests and their asymptotic properties, and Pitman efficiency. This course will prepare students for their theory qualifying exam.
This course serves as the culminating experience for students in the Clinical Research Methods (CRM) Track in the Department of Biostatistics. By the end of the semester, students are expected to produce a submission-ready manuscript for a journal appropriate to their field of study.
The Statistical Practices and Research for Interdisciplinary Sciences course (SPRIS, P9185) is required for PhD and DrPH students in the Department of Biostatistics. Its goal is to prepare doctoral students to be effective statisticians who can collaborate in interdisciplinary teams and identify novel statistical research problems with important public health and medical applications. The course aims to provide guidelines on, and insights into, the art and science of consulting, collaboration, and the translation of statistical methods to medical studies. Practical technical skills acquired in previous coursework will be strengthened and illustrated through applications to real-world problems in class projects. Important statistical issues currently under active debate will be introduced. Examples of original statistical research that develops new methods to address real-world challenges will be discussed. Career-development topics will be covered to prepare students to become effective independent and interdisciplinary researchers. Class projects will showcase examples of how to analyze real-world data.
For appropriately qualified students wishing to enrich their programs by undertaking literature reviews, special studies, or small group instruction in topics not covered in formal courses.