Like many fields of learning, biostatistics has its own vocabulary, often seen in medical and public health literature. Phrases like "statistical significance", "p-value less than 0.05", "95% confident", and "margin of error" can have enormous impact in a world that relies on statistics to make decisions: Should Drug A be recommended over Drug B? Should a national policy on X be implemented? Does Vitamin C truly prevent colds? However, do we really know what these terms and phrases mean? Understanding the theory and methodology behind study design, estimation and hypothesis testing is crucial to ensuring that findings and practices in public health and biomedicine are supported by reliable evidence.
This is a Public Health Course. Public Health classes are offered on the Health Services Campus at 168th Street. For more detailed course information, please go to the Mailman School of Public Health Courses website at http://www.mailman.hs.columbia.edu/academics/courses
This course will provide an introduction to the basics of regression analysis. The class will proceed systematically from the examination of the distributional qualities of the measures of interest, to assessing the appropriateness of the assumption of linearity, to issues related to variable inclusion, model fit, interpretation, and regression diagnostics. We will primarily use scalar notation (i.e. we will use limited matrix notation, and will only briefly present the use of matrix algebra).
The main objective of this course is to provide Columbia University's Clinical & Translational Science award trainees, students, and scholars with skills and knowledge that will optimize their chances of entering into a satisfying academic career. The course will emphasize several methodological and practical issues related to the development of a science career. The course will also offer support and incentives by facilitating timely use of CTSA resources, obtaining expert reviews on writing and curriculum vitae, and providing knowledge and resources for the successful achievement of career goals.
The course aims to present the fundamental principles behind probability theory and lay the foundations for various kinds of statistical/biostatistical courses such as statistical inference, multivariate analysis, regression analysis, clinical trials, asymptotics, and so on. Students will learn how to implement probability methods in various types of applications.
Contemporary biostatistics and data analysis depend on the mastery of tools for computation, visualization, dissemination, and reproducibility in addition to proficiency in traditional statistical techniques. The goal of this course is to provide training in the elements of a complete pipeline for data analysis. It is targeted to MS, MPH, and PhD students with some data analysis experience.
The first portion of this course provides an introductory-level mathematical treatment of the fundamental principles of probability theory, providing the foundations for statistical inference. Students will learn how to apply these principles to solve a range of applications. The second portion of this course provides a mathematical treatment of (a) point estimation, including evaluation of estimators and methods of estimation; (b) interval estimation; and (c) hypothesis testing, including power calculations and likelihood ratio testing.
This course focuses on methods for the analysis of survival data, or time-to-event data. Such data, also called failure (death) time data, arise in a number of applied fields, such as medicine, biology, public health, epidemiology, engineering, economics, and demography. A special source of difficulty in the analysis of survival data is the possibility that some individuals may not be observed for the full time to failure. Instead of knowing the failure time t, all we know about these individuals is that their time-to-failure exceeds some value y, where y is the follow-up time of these individuals in the study. Students in this class will learn how to make inferences about event times in the presence of censoring. Topics to be covered include survivor functions and hazard rates, parametric inference, life-table analysis, the Kaplan-Meier estimator, k-sample nonparametric tests for the equality of survivor distributions, the proportional hazards regression model, and the analysis of competing risks and bivariate failure-time data.
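As an illustration of the censoring idea described above, the following is a minimal sketch of the Kaplan-Meier estimator in plain Python. It is not taken from the course materials: the function name `kaplan_meier` and the five follow-up times are hypothetical, and an event indicator of 0 marks an individual whose failure time is only known to exceed the follow-up time.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct observed failure time.

    times  : follow-up time for each individual
    events : 1 if the failure was observed, 0 if right-censored
    """
    # Count observed failures at each distinct time; censored
    # observations contribute only to the risk set.
    deaths = Counter(t for t, e in zip(times, events) if e == 1)

    survival = 1.0
    curve = []
    for t in sorted(deaths):
        # Individuals still under observation just before time t.
        at_risk = sum(1 for u in times if u >= t)
        survival *= 1.0 - deaths[t] / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical data: five individuals, two of them censored.
times = [2, 3, 3, 5, 8]
events = [1, 1, 0, 1, 0]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"t = {t}: estimated S(t) = {s:.2f}")
```

Censored individuals never trigger a drop in the curve, but they remain in the risk set up to their follow-up time, which is how the estimator uses the partial information they provide.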
This course will introduce the statistical methods for analyzing censored data, non-normally distributed response data, and repeated measurements data that are commonly encountered in medical and public health research. Topics include estimation and comparison of survival curves, regression models for survival data, logit models, log-linear models, and generalized estimating equations. Examples are drawn from the health sciences.
This course introduces students to advanced computational and statistical methods used in the design and analysis of high-dimensional genetic data, an area of critical importance in the current era of BIG DATA. The course starts with a brief background in genetics, followed by in-depth discussion of topics in genome-wide linkage and association studies, and next-generation sequencing studies. Additional topics such as network genetics will also be covered. Examples from recent and ongoing applications to complex traits will be used to illustrate methods and concepts. Students are required to read relevant papers as assigned by the instructor, and each student is required to present a paper during class. Students are also required to work on a project related to the course material, with a midterm evaluation of progress.
We will use one main textbook: The Fundamentals of Modern Statistical Genetics by Laird and Lange (Springer, 2012). For further reading, an excellent book is also Handbook of Statistical Genetics, Volume 1 (Wiley, 2007). Another good book is Mathematical and Statistical Methods for Genetic Analysis by Ken Lange (Springer, 2002).
A comprehensive overview of methods of analysis for binary and other discrete response data, with applications to epidemiological and clinical studies. It is a second-level course that presumes some knowledge of applied statistics and epidemiology. Topics discussed include 2 × 2 tables, m × 2 tables, tests of independence, measures of association, power and sample size determination, stratification and matching in design and analysis, interrater agreement, and logistic regression analysis.
Substantive questions in empirical scientific and policy research are often causal. This class will introduce students to both the statistical theory and the practice of causal inference. As theoretical frameworks, we will discuss potential outcomes, causal graphs, randomization and model-based inference, causal mediation, and sufficient component causes. We will cover various methodological tools including randomized experiments, matching, inverse probability weighting, instrumental variable approaches, dynamic causal models, sensitivity analysis, and statistical methods for mediation and interaction. We will analyze the strengths and weaknesses of these methods. The course will draw upon examples from social sciences, public health, and other disciplines. The instructor will illustrate application of the approaches using R/SAS/STATA software. Students will deepen their understanding of the statistical principles underlying the approaches, as well as their application, and will be evaluated through homework assignments, a take-home midterm, and a final take-home practicum.
This is an applied statistical methods course. The course will introduce the main techniques used in sampling practice, including simple random sampling, stratification, systematic sampling, cluster sampling, probability proportional to size sampling, and multistage sampling. Using national health surveys as examples, the course will introduce and demonstrate the application of statistical methods in analyzing cross-sectional surveys and repeated and longitudinal surveys, and conducting multiple imputation for missing data in large surveys. Other topics will include methods for variance estimation, weighting, post-stratification, and non-sampling errors. If time allows, new developments in small area estimation and in the era of data science will also be discussed.
This is a course at the intersection of statistics and machine learning, focusing on graphical models. In complex systems with many (perhaps hundreds or thousands of) variables, the formalism of graphical models can make representation more compact, inference more tractable, and intelligent data-driven decision-making more feasible. We will focus on representational schemes based on directed and undirected graphical models and discuss statistical inference, prediction, and structure learning. We will emphasize applications of graph-based methods in areas relevant to health: genetics, neuroscience, epidemiology, image analysis, clinical support systems, and more. We will draw connections in lecture between theory and these application areas. The final project will be entirely "hands-on": students will apply techniques discussed in class to real data and write up the results.
This one-semester course introduces basic applied descriptive and inferential statistics. The first part of the course includes elementary probability theory, an introduction to statistical distributions, principles of estimation and hypothesis testing, and methods for comparison of discrete and continuous data, including the chi-squared test of independence, t-test, analysis of variance (ANOVA), and their non-parametric equivalents. The second part of the course focuses on the theory of linear models (regression) and their practical implementation.
Students in this course will learn and practice the fundamental methods and concepts of the randomized clinical trial: protocol development, randomization, blindedness, patient recruitment, informed consent, compliance, sample size determination, crossovers, and collaborative trials. Each student prepares and submits the protocol for a real or hypothetical clinical trial.
Clinical trials are the pillars of clinical research. The main objective of this course is to prepare researchers to design and conduct complex clinical trials that yield valid and reliable results. The course emphasizes several methodological and practical issues related to the design and analysis of clinical experiments, and builds on the knowledge and skills gained in the course Randomized Clinical Trial (P8140). Students will gain a working knowledge of methodological issues that arise in designing a clinical trial. Topics include design of small studies (Phase I and II studies), interim analyses and group sequential methods, design of survival studies, multiple outcome measures, equivalency trials, and multi-center studies.
A good grasp of the fundamentals of Population Genetics is crucial for an understanding of any field of human genetics. This is precisely the aim of this course: to provide students with the key elements of Population Genetics, with a view to equipping them with the right tools to understand the field of genetics in general and to pursue further studies in human genetics. The course uses various evolutionary principles to explain key population genetics concepts.
The course will introduce students to statistical models and methods for longitudinal data, i.e., data measured repeatedly over time or under different conditions. The topics will include design and sample size calculation, Hotelling's T^2, multivariate analysis of variance, multivariate linear regression (generalized linear models), models for correlation, unbalanced repeated measurements, mixed effects models, the EM algorithm, methods for non-normally distributed data, generalized estimating equations, generalized linear mixed models, and missing data.
In this course, you will learn to design and build relational databases in MySQL and to write and optimize queries using the SQL programming language. Application of skills learned in this course will be geared toward research and data science settings in the healthcare field; however, these skills are transferable to many industries and application areas. You will begin the course examining the pitfalls of using Excel spreadsheets as a data storage tool and then learn how to build properly-designed relational databases to eliminate the issues related to spreadsheets and maintain data integrity when storing and modifying data. You will then learn two aspects of the SQL programming language: 1) the data manipulation language (DML), which allows you to retrieve data from and populate data into database tables (e.g., SELECT, INSERT INTO, DELETE, UPDATE, etc.), and 2) the data definition language (DDL), which allows you to create and modify tables in a database (e.g., CREATE, ALTER, DROP, etc.). You will additionally learn how to optimize SQL queries for best performance, use advanced SQL functions, and utilize SQL within common statistical software programs: R and SAS.
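The distinction between DDL and DML described above can be sketched in a few statements. This sketch is an assumption-laden illustration rather than course material: it uses Python's built-in `sqlite3` module with an in-memory database instead of MySQL so that it runs without a server, and the `patients` table, its columns, and its rows are hypothetical.

```python
import sqlite3

# In-memory SQLite database (a stand-in for MySQL, chosen only so the
# example is self-contained).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# DDL: create and modify the structure of a table.
cur.execute(
    "CREATE TABLE patients ("
    "patient_id INTEGER PRIMARY KEY, "
    "name TEXT NOT NULL, "
    "age INTEGER)"
)
cur.execute("ALTER TABLE patients ADD COLUMN site TEXT")

# DML: populate, change, remove, and retrieve rows.
cur.executemany(
    "INSERT INTO patients (name, age, site) VALUES (?, ?, ?)",
    [("A", 34, "north"), ("B", 40, "south"), ("C", 58, "north")],
)
cur.execute("UPDATE patients SET age = 41 WHERE name = 'B'")
cur.execute("DELETE FROM patients WHERE name = 'C'")
rows = cur.execute(
    "SELECT name, age, site FROM patients ORDER BY patient_id"
).fetchall()

conn.commit()
conn.close()
print(rows)
```

The `?` placeholders pass values separately from the SQL text, which avoids building queries by string concatenation. MySQL syntax differs in some details (for example, column types and auto-increment keys), so the statements would need adapting before use there.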
The biostatistical field is changing with new directions emerging constantly. Doing research in these new directions, which often involve large data and complex designs, requires advanced probability and statistics tools. The purpose of this new course is to collect these important probability methods and present them in a way that is friendly to a biostatistics audience. This course is designed for PhD students in Biostatistics. Its primary objective is to help the students achieve a solid understanding of these probability methods and develop strong analytical skills that are necessary for conducting methodological research in modern biostatistics. At the completion of this course, the students will a) have a working knowledge of the Law of Large Numbers, Central Limit Theorems, martingale theory, Brownian motion, weak convergence, empirical processes, and Markov chain theory; b) be able to understand the biostatistical literature that involves such methods; c) be able to do proofs that call for such knowledge.
This course offers a general introduction to essential materials in advanced statistical theory for doctoral students in biostatistics. The course is designed to prepare doctoral students in biostatistics for their written theory qualifying exam. Students in this course will learn theory of estimation, confidence sets and hypothesis testing. Specific topics include a quick review of measure-theoretic probability theory, concepts of sufficiency and completeness, unbiased estimation (UMVUE), least squares principle, likelihood estimation, a variety of estimators and their asymptotic properties, confidence sets, the Neyman-Pearson lemma and uniformly most powerful tests. If time permits, the likelihood ratio test, score test and Wald test, and sequential analysis will be covered.
This course will provide a comprehensive introduction to the field of asymptotic statistics. The treatment will be both practical and mathematically rigorous. The course will consist of two parts. The first will be a review of most of the standard topics of limit theory, such as the delta method and central limit theorems, while avoiding many technicalities. The second will present advanced topics such as semiparametric models, counting processes, empirical likelihood, the bootstrap, and empirical processes. These powerful research techniques are becoming increasingly important for the development of biostatistical methods to handle complex data sets. The overall goal of the course is to train students in the use of advanced asymptotic techniques for medical and public health applications. This course is intended for second-year Biostatistics Ph.D. students to provide a review of asymptotic statistics for the Ph.D. qualifying exam, and give them exposure to a variety of advanced topics.
The aim of this course is to provide students with systematic training in key topics in modern supervised statistical learning and data mining. For the most part, the focus will remain on a theoretically sound understanding of the methods (learning algorithms) and their applications in complex data analysis, rather than proving technical theorems. Applications of the statistical learning and data mining tools in biomedical and health sciences will be highlighted.
This is an advanced course for first-year Ph.D. students in Biostatistics. The aim is to provide a solid foundation in the theory behind linear models and generalized linear models. More emphasis will be placed on concepts and theory with mathematical rigor. Topics covered include linear regression models, logistic regression models, generalized linear regression models, and methods for the analysis of contingency tables.
This seminar-style course will lead students through the process of writing a Master's Essay in the form of an NIH-style grant application (required for the MS/POR degree track). The essay is undertaken during the fall semester of the second year of study. At the end of the fall term, each student submits a written research proposal following NIH guidelines for either an R01 or K (career development) award. The emphasis in this course is on the quality of the proposed research. The following February, students make an oral presentation to the POR Advisory Board, summarizing the research proposal. Final grades are awarded after the presentations in February.
In this course, students will apply the concepts and methods introduced in Statistical Practices and Research for Interdisciplinary Science (SPRIS) I to a real research setting. Each student will be paired with a Biostatistics faculty member. The student will participate in one of the mentor’s collaborative projects to learn how to be an effective member of an interdisciplinary team. The relationship will mimic that between a medical resident and an attending physician.
The SPRIS II experience will vary depending on the assigned faculty member, but all students will gain exposure to preparing collaborative grant applications, designing research studies, analyzing real data, interpreting and presenting results, and writing manuscripts. Mentors will help to develop the student’s data intuition skills, ability to ask good research questions, and leadership qualities. Where necessary, students may replicate projects already completed by the faculty mentor to gain experience.
For appropriately qualified students wishing to enrich their programs by undertaking literature reviews, special studies, or small group instruction in topics not covered in formal courses.