Like many fields of learning, biostatistics has its own vocabulary, often seen in medical and public health literature. Phrases like "statistical significance", "p-value less than 0.05", "95% confident", and "margin of error" can have enormous impact in a world that relies on statistics to make decisions: Should Drug A be recommended over Drug B? Should a national policy on X be implemented? Does Vitamin C truly prevent colds? However, do we really know what these terms and phrases mean? Understanding the theory and methodology behind study design, estimation, and hypothesis testing is crucial to ensuring that findings and practices in public health and biomedicine are supported by reliable evidence.
This is a Public Health course. Public Health classes are offered on the Health Services Campus at 168th Street. For more detailed course information, please go to the Mailman School of Public Health Courses website at http://www.mailman.hs.columbia.edu/academics/courses
This course will provide an introduction to the basics of regression analysis. The class will proceed systematically from the examination of the distributional qualities of the measures of interest, to assessing the appropriateness of the assumption of linearity, to issues related to variable inclusion, model fit, interpretation, and regression diagnostics. We will primarily use scalar notation (i.e. we will use limited matrix notation, and will only briefly present the use of matrix algebra).
The main objective of this course is to provide Columbia University's Clinical and Translational Science Award (CTSA) trainees, students, and scholars with skills and knowledge that will optimize their chances of entering into a satisfying academic career. The course will emphasize several methodological and practical issues related to the development of a science career. The course will also offer support and incentives by facilitating timely use of CTSA resources, obtaining expert reviews of writing and curricula vitae, and providing knowledge and resources for the successful achievement of career goals.
The course aims to present the fundamental principles behind probability theory and lay the foundations for various kinds of statistical/biostatistical courses such as statistical inference, multivariate analysis, regression analysis, clinical trials, asymptotics, and so on. Students will learn how to implement probability methods in various types of applications.
Contemporary biostatistics and data analysis depend on the mastery of tools for computation, visualization, dissemination, and reproducibility in addition to proficiency in traditional statistical techniques. The goal of this course is to provide training in the elements of a complete pipeline for data analysis. It is targeted to MS, MPH, and PhD students with some data analysis experience.
The first portion of this course provides an introductory-level mathematical treatment of the fundamental principles of probability theory, providing the foundations for statistical inference. Students will learn how to apply these principles to solve a range of applied problems. The second portion of this course provides a mathematical treatment of (a) point estimation, including evaluation of estimators and methods of estimation; (b) interval estimation; and (c) hypothesis testing, including power calculations and likelihood ratio testing.
This course focuses on methods for the analysis of survival data, also called failure-time (death-time) or time-to-event data, which arise in a number of applied fields, such as medicine, biology, public health, epidemiology, engineering, economics, and demography. A special source of difficulty in the analysis of survival data is the possibility that some individuals may not be observed for the full time to failure. Instead of knowing the failure time t, all we know about these individuals is that their time to failure exceeds some value y, where y is the follow-up time of these individuals in the study. Students in this class will learn how to make inferences about event times in the presence of such censoring. Topics to be covered include survivor functions and hazard rates, parametric inference, life-table analysis, the Kaplan-Meier estimator, k-sample nonparametric tests for the equality of survivor distributions, the proportional hazards regression model, analysis of competing risks, and bivariate failure-time data.
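As a rough illustration of how censored follow-up times enter an analysis, the following is a minimal Python sketch of the Kaplan-Meier estimator named among the topics. It is not course material: the function name `kaplan_meier`, the 0/1 event coding, and the five example observations are all hypothetical, and it assumes the usual convention that a subject censored at time y is still counted as at risk at y.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with an observed failure.

    times  : follow-up time for each subject (failure time t or censoring time y)
    events : 1 if failure was observed at that time, 0 if the subject was censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    leaving = Counter(times)   # every subject leaves the risk set at their own time
    at_risk = len(times)       # number still under observation just before time t
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d > 0:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        # Failures and censored subjects both drop out of the risk set after t.
        at_risk -= leaving[t]
    return curve


# Hypothetical data: subjects 3 and 5 are censored (event = 0).
times = [2, 3, 3, 5, 8]
events = [1, 1, 0, 1, 0]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"t = {t}: S(t) = {s:.2f}")
```

With these made-up observations the estimated survivor function steps down to roughly 0.8, 0.6, and 0.3 at times 2, 3, and 5. The censored subjects never trigger a step of their own, but they shrink the risk set used at later failure times.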
This course will introduce the statistical methods for analyzing censored data, non-normally distributed response data, and repeated measurements data that are commonly encountered in medical and public health research. Topics include estimation and comparison of survival curves, regression models for survival data, logit models, log-linear models, and generalized estimating equations. Examples are drawn from the health sciences.
This course covers the fundamental principles and techniques of experimental designs in clinical studies. This is a required course for the MS, DrPH, and PhD in Biostatistics. Topics include reliability of measurement, linear regression analysis, parallel groups design, analysis of variance (ANOVA), multiple comparison, blocking, stratification, analysis of covariance (ANCOVA), repeated measures studies, Latin squares design, crossover study, randomized incomplete block design, and factorial design.
A comprehensive overview of methods of analysis for binary and other discrete response data, with applications to epidemiological and clinical studies. It is a second-level course that presumes some knowledge of applied statistics and epidemiology. Topics discussed include 2 × 2 tables, m × 2 tables, tests of independence, measures of association, power and sample size determination, stratification and matching in design and analysis, interrater agreement, and logistic regression analysis.
This is an applied statistical methods course. The course will introduce the main techniques used in sampling practice, including simple random sampling, stratification, systematic sampling, cluster sampling, probability proportional to size sampling, and multistage sampling. Using national health surveys as examples, the course will introduce and demonstrate the application of statistical methods in analyzing cross-sectional surveys and repeated and longitudinal surveys, and in conducting multiple imputation for missing data in large surveys. Other topics will include methods for variance estimation, weighting, post-stratification, and non-sampling errors. If time allows, new developments in small area estimation and in survey methods in the era of data science will also be discussed.
This is a course at the intersection of statistics and machine learning, focusing on graphical models. In complex systems with many (perhaps hundreds or thousands of) variables, the formalism of graphical models can make representation more compact, inference more tractable, and intelligent data-driven decision-making more feasible. We will focus on representational schemes based on directed and undirected graphical models and discuss statistical inference, prediction, and structure learning. We will emphasize applications of graph-based methods in areas relevant to health: genetics, neuroscience, epidemiology, image analysis, clinical support systems, and more. We will draw connections in lecture between theory and these application areas. The final project will be entirely “hands-on”: students will apply techniques discussed in class to real data and write up the results.
This one-semester course introduces basic applied descriptive and inferential statistics. The first part of the course includes elementary probability theory, an introduction to statistical distributions, principles of estimation and hypothesis testing, and methods for comparison of discrete and continuous data, including the chi-squared test of independence, the t-test, analysis of variance (ANOVA), and their non-parametric equivalents. The second part of the course focuses on the theory of linear models (regression) and their practical implementation.
Clinical trials are the pillars of clinical research. The main objective of this course is to prepare researchers to design and conduct complex clinical trials that yield valid and reliable results. The course emphasizes several methodological and practical issues related to the design and analysis of clinical experiments, and it builds on the knowledge and skills gained in the course Randomized Clinical Trial (P8140). Students will gain a working knowledge of certain methodological issues that arise in designing a clinical trial. Topics include design of small studies (Phase I and II studies), interim analyses and group sequential methods, design of survival studies, equivalency trials, multi-center studies, and trials with multiple outcome measures.
A good grasp of the fundamentals of population genetics is crucial for an understanding of any field of human genetics. This is precisely the aim of this course: to provide students with the key elements of population genetics, with a view to equipping them with the right tools to understand the field of genetics in general and to pursue further studies in human genetics. The course uses various evolutionary principles to explain key population genetics concepts.
The course will introduce students to statistical models and methods for longitudinal data, i.e., data measured repeatedly over time or under different conditions. The topics will include design and sample size calculation, Hotelling's T^2, multivariate analysis of variance, multivariate linear regression (generalized linear models), models for correlation, unbalanced repeated measurements, mixed effects models, the EM algorithm, methods for non-normally distributed data, generalized estimating equations, generalized linear mixed models, and missing data.
In this course, you will learn to design and build relational databases in MySQL and to write and optimize queries using the SQL programming language. Application of skills learned in this course will be geared toward research and data science settings in the healthcare field; however, these skills are transferable to many industries and application areas. You will begin the course examining the pitfalls of using Excel spreadsheets as a data storage tool and then learn how to build properly-designed relational databases to eliminate the issues related to spreadsheets and maintain data integrity when storing and modifying data. You will then learn two aspects of the SQL programming language: 1) the data manipulation language (DML), which allows you to retrieve data from and populate data into database tables (e.g., SELECT, INSERT INTO, DELETE, UPDATE, etc.), and 2) the data definition language (DDL), which allows you to create and modify tables in a database (e.g., CREATE, ALTER, DROP, etc.). You will additionally learn how to optimize SQL queries for best performance, use advanced SQL functions, and utilize SQL within common statistical software programs: R and SAS.
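To make the DDL/DML distinction above concrete, here is a small sketch rather than course material. It assumes Python's built-in `sqlite3` module as a stand-in for MySQL, so that it runs without a database server, and the `patient` table, its columns, and its rows are hypothetical. The SQL statements shown are standard enough to carry over to MySQL with little change, though a MySQL driver would typically use `%s` placeholders instead of `?`.

```python
import sqlite3

# In-memory SQLite database, used here only as a stand-in for a MySQL server.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# DDL: create and modify the table structure.
cur.execute(
    """
    CREATE TABLE patient (
        patient_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        age        INTEGER
    )
    """
)
cur.execute("ALTER TABLE patient ADD COLUMN site TEXT")

# DML: populate, change, remove, and retrieve rows.
cur.executemany(
    "INSERT INTO patient (patient_id, name, age, site) VALUES (?, ?, ?, ?)",
    [(1, "A", 54, "north"), (2, "B", 61, "south"), (3, "C", 47, "north")],
)
cur.execute("UPDATE patient SET age = ? WHERE patient_id = ?", (62, 2))
cur.execute("DELETE FROM patient WHERE patient_id = ?", (3,))
rows = cur.execute(
    "SELECT patient_id, name, age, site FROM patient ORDER BY patient_id"
).fetchall()
conn.commit()

for row in rows:
    print(row)

# DDL again: remove the table entirely.
cur.execute("DROP TABLE patient")
conn.close()
```

The primary key and `NOT NULL` constraint are the kind of integrity rules a spreadsheet cannot enforce. For example, inserting a second row with `patient_id` 1 would raise an error instead of silently creating a duplicate.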
The biostatistical field is changing, with new directions emerging constantly. Doing research in these new directions, which often involve large data and complex designs, requires advanced probability and statistics tools. The purpose of this new course is to collect these important probability methods and present them in a way that is friendly to a biostatistics audience. This course is designed for PhD students in Biostatistics. Its primary objective is to help the students achieve a solid understanding of these probability methods and develop the strong analytical skills that are necessary for conducting methodological research in modern biostatistics. At the completion of this course, the students will a) have a working knowledge of the Law of Large Numbers, Central Limit Theorems, martingale theory, Brownian motion, weak convergence, empirical processes, and Markov chain theory; b) be able to understand the biostatistical literature that involves such methods; and c) be able to do proofs that call for such knowledge.
This course offers a general introduction to essential materials in advanced statistical theory for doctoral students in biostatistics. The course is designed to prepare doctoral students in biostatistics for their written theory qualifying exam. Students in this course will learn the theory of estimation, confidence sets, and hypothesis testing. Specific topics include a quick review of measure-theoretic probability theory, concepts of sufficiency and completeness, unbiased estimation (UMVUE), the least squares principle, likelihood estimation, a variety of estimators and their asymptotic properties, confidence sets, the Neyman-Pearson lemma, and uniformly most powerful tests. If time permits, the likelihood ratio test, score test, and Wald test, as well as sequential analysis, will be covered.
The aim of this course is to provide students with systematic training in key topics in modern supervised statistical learning and data mining. For the most part, the focus will remain on a theoretically sound understanding of the methods (learning algorithms) and their applications in complex data analysis, rather than on proving technical theorems. Applications of the statistical learning and data mining tools in biomedical and health sciences will be highlighted.
This is an advanced course for first-year Ph.D. students in Biostatistics. The aim is to provide a solid foundation in the theory behind linear models and generalized linear models. More emphasis will be placed on concepts and theory with mathematical rigor. Topics covered include linear regression models, logistic regression models, generalized linear regression models, and methods for the analysis of contingency tables.
This seminar-style course will lead students through the process of writing a Master's Essay in the form of an NIH-style grant application (required for the MS/POR degree track). The essay is undertaken during the fall semester of the second year of study. At the end of the fall term, each student submits a written research proposal following NIH guidelines for either an R01 or K (career development) award. The emphasis in this course is on the quality of the proposed research. The following February, students make an oral presentation to the POR Advisory Board, summarizing the research proposal. Final grades are awarded after the presentations in February.
In this course, students will apply the concepts and methods introduced in Statistical Practices and Research for Interdisciplinary Science (SPRIS) I to a real research setting. Each student will be paired with a Biostatistics faculty member. The student will participate in one of the mentor’s collaborative projects to learn how to be an effective member of an interdisciplinary team. The relationship will mimic that between a medical resident and an attending physician.
The SPRIS II experience will vary depending on the assigned faculty member, but all students will gain exposure to preparing collaborative grant applications, designing research studies, analyzing real data, interpreting and presenting results, and writing manuscripts. Mentors will help to develop the student’s data intuition skills, ability to ask good research questions, and leadership qualities. Where necessary, students may replicate projects already completed by the faculty mentor to gain experience.
For appropriately qualified students wishing to enrich their programs by undertaking literature reviews, special studies, or small group instruction in topics not covered in formal courses.