A friendly introduction to statistical concepts and reasoning with emphasis on developing statistical intuition rather than on mathematical rigor. Topics include design of experiments, descriptive statistics, correlation and regression, probability, chance variability, sampling, chance models, and tests of significance.
This course introduces core ideas in probability and statistics with a focus on building a foundation for data science. Topics include probability theory basics, common probability distributions, sampling and estimation, confidence intervals and hypothesis testing, simple linear regression, resampling methods, smoothing techniques, and an introduction to Bayesian inference. The course also offers a brief introduction to programming in R and Python.
Prerequisites: intermediate high school algebra. Designed for students in fields that emphasize quantitative methods. Graphical and numerical summaries, probability, theory of sampling distributions, linear regression, analysis of variance, confidence intervals and hypothesis testing. Quantitative reasoning and data analysis. Practical experience with statistical software. Illustrations are taken from a variety of fields. Data-collection/analysis project with emphasis on study designs is part of the coursework requirement.
This is only recitation for STAT UG1101. We are requesting 8 sections of recitation to align with the two sections of 1101 offered for Fall 2024.
Prerequisites: one semester of calculus. Designed for students who desire a strong grounding in statistical concepts with a greater degree of mathematical rigor than in STAT W1111. Random variables, probability distributions, pdf, cdf, mean, variance, correlation, conditional distribution, conditional mean and conditional variance, law of iterated expectations, normal, chi-square, F and t distributions, law of large numbers, central limit theorem, parameter estimation, unbiasedness, consistency, efficiency, hypothesis testing, p-value, confidence intervals, maximum likelihood estimation. Serves as the pre-requisite for ECON W3412.
Prerequisites: Previous or concurrent enrollment in a course in statistics would make the talks more accessible. Prepared with undergraduates majoring in quantitative disciplines in mind, the presentations in this colloquium focus on the interface between data analysis, computation, and theory in interdisciplinary research. Meetings are open to all undergraduates, whether registered or not. Presenters are drawn from the faculty of departments in Arts and Sciences, Engineering, Public Health and Medicine.
Corequisites: An introductory course in statistics (STAT UN1101 is recommended). This course is an introduction to R programming. After learning basic programming components, such as defining variables and vectors, and learning different data structures in R, students will, via project-based assignments, study more advanced topics, such as conditionals, modular programming, and data visualization. Students will also learn the fundamental concepts in computational complexity, and will practice writing reports based on their data analyses.
Prerequisites: An introductory course in statistics (STAT UN1101 is recommended). Students without programming experience in R might find STAT UN2102 very helpful. Develops critical thinking and data analysis skills for regression analysis in science and policy settings. Simple and multiple linear regression, non-linear and logistic models, random-effects models. Implementation in a statistical package. Emphasis on real-world examples and on planning, proposing, implementing, and reporting.
Prerequisites: STAT UN2103 is strongly recommended. Students without programming experience in R might find STAT UN2102 very helpful. This course covers statistical models and methods for analyzing and drawing inferences for problems involving categorical data. The goals are familiarity with and understanding of a substantial and integrated body of statistical methods that are used for such problems, experience in analyzing data using these methods, proficiency in communicating the results of such methods, and the ability to critically evaluate the use of such methods. Topics include binomial proportions, two-way and three-way contingency tables, logistic regression, log-linear models for large multi-way contingency tables, and graphical methods. The statistical package R will be used.
Prerequisites: STAT UN2103. Students without programming experience in R might find STAT UN2102 very helpful. This course is a machine learning class from an application perspective. We will cover topics including data-based prediction, classification, specific classification methods (such as logistic regression and random forests), and basics of neural networks. Programming in homeworks will require R.
This course provides a non-mathematical introduction to the principles and architectures of deep learning and generative AI models. Designed for undergraduates in the Applied Data Science minor, the curriculum covers the mathematical foundations of neural networks and their application to spatial, temporal, and multimodal data. Students will examine the mechanics of convolutional and recurrent architectures, the self-attention mechanism in Transformers, and the training objectives of Large Language Models (LLMs). The course also addresses optimization strategies, reinforcement learning for model alignment, and generative paradigms, including diffusion and autoregressive models. Emphasis is placed on understanding model internal representations, architectural tradeoffs, and the evaluation of complex AI systems.
Prerequisites: Calculus through multiple integration and infinite sums. A calculus-based tour of the fundamentals of probability theory and statistical inference. Probability models, random variables, useful distributions, conditioning, expectations, law of large numbers, central limit theorem, point and confidence interval estimation, hypothesis tests, linear regression. This course replaces SIEO 4150.
Prerequisites: At least one semester, and preferably two, of calculus. An introductory course (STAT UN1201, preferably) is strongly recommended. A calculus-based introduction to probability theory. A quick review of multivariate calculus is provided. Topics covered include random variables, conditional probability, expectation, independence, Bayes’ rule, important distributions, joint distributions, moment generating functions, central limit theorem, laws of large numbers and Markov’s inequality.
Prerequisites: STAT GU4203. At least one semester of calculus is required; two or three semesters are strongly recommended. Calculus-based introduction to the theory of statistics. Useful distributions, law of large numbers and central limit theorem, point estimation, hypothesis testing, confidence intervals, maximum likelihood, likelihood ratio tests, nonparametric procedures, theory of least squares and analysis of variance.
Prerequisites: STAT GU4204 or the equivalent, and a course in linear algebra. Theory and practice of regression analysis. Simple and multiple regression, testing, estimation, prediction, and confidence procedures, modeling, regression diagnostics and plots, polynomial regression, collinearity and confounding, model selection, geometry of least squares. Extensive use of the computer to analyse data.
Course Overview:
This course introduces Python programming, covering data structures, control flow, objects, and functions, along with libraries like re, requests, numpy, pandas, scikit-learn, scipy, and more. These skills are applied to real-world data science tasks, including A/B testing, data manipulation, modeling, optimization, simulations, and data visualization.
Students will develop computational thinking abilities, including problem decomposition, pattern recognition, data representation, abstraction, and algorithm design, through practical exercises.
Prerequisites: STAT GU4203 and two, preferably three, semesters of calculus. Review of elements of probability theory. Poisson processes. Renewal theory. Wald's equation. Introduction to discrete and continuous time Markov chains. Applications to queueing theory, inventory models, branching processes.
Prerequisites: STAT GU4205 or the equivalent. Least squares smoothing and prediction, linear systems, Fourier analysis, and spectral estimation. Impulse response and transfer function. Fourier series, the fast Fourier transform, autocorrelation function, and spectral density. Univariate Box-Jenkins modeling and forecasting. Emphasis on applications. Examples from the physical sciences, social sciences, and business. Computing is an integral part of the course.
This course is an introduction to Causal Inference at the master's and advanced undergraduate level. Students will be introduced to a broad range of causal inference methods including randomized experiments, observational studies, instrumental variables, difference-in-differences, regression discontinuity design, and synthetic controls. In addition, the course will cover modern, controversial debates regarding the foundations and limitations of causal inference.
The primary learning goal of this course will be to familiarize students with a variety of the most popular causal inference methods: which causal effects they seek to estimate, basic assumptions required for identification and estimation, and their practical implementation. To this end, the course will focus both on developing the prerequisite statistical / methodological theory and on gaining hands-on experience through implementation exercises with real datasets. By the end of the course, students should have deep familiarity with various causal inference methods and, more importantly, be able to determine which method is most appropriate for a given applied problem and to judge whether the prerequisite identifying conditions are appropriate.
Prerequisites: STAT GU4206. The course will provide an introduction to Machine Learning and its core models and algorithms. The aim of the course is to provide students of statistics with detailed knowledge of how Machine Learning methods work and how statistical models can be brought to bear in computer systems - not only to analyze large data sets, but to let computers perform tasks that traditional methods of computer science are unable to address. Examples range from speech recognition and text analysis through bioinformatics and medical diagnosis. This course provides a first introduction to the statistical methods and mathematical concepts which make such technologies possible.
This course covers various topics in advanced machine learning. Topics may include optimization algorithms, Python libraries for ML, principles for applied supervised and unsupervised learning, hyperparameter selection, computational trade-offs, modern neural network architectures such as ConvNets, LSTMs, and transformers for computer vision and natural language processing, and deep learning for LLMs.
Project-based topics course in data science and artificial intelligence. Students build a portfolio by implementing and applying modern data science methods.
Description.
Unsupervised Learning is a master's-level course on foundations, methods, practice, and applications in machine learning from data without associated labels or outcomes. This course will focus on dimension reduction and clustering techniques while also covering graphical models, missing data imputation, anomaly detection, generative models, and others. The course will also emphasize conceptual understanding and practical applications of unsupervised learning in data visualization, exploratory data analysis, data pre-processing, and data-driven discovery.
Prerequisites: STAT GU4205 or the equivalent. A fast-paced introduction to statistical methods used in quantitative finance. Financial applications and statistical methodologies are intertwined in all lectures. Topics include regression analysis and applications to the Capital Asset Pricing Model and multifactor pricing models, principal components and multivariate analysis, smoothing techniques and estimation of yield curves, statistical methods for financial time series, value at risk, term structure models and fixed income research, and estimation and modeling of volatilities. Hands-on experience with financial data.
Prerequisites: STAT GU4204 or the equivalent. STAT GU4205 is recommended. Modeling and inference for random processes, from natural sciences to finance and economics. ARMA, ARCH, GARCH and nonlinear models, parameter estimation, prediction and filtering. This is a core course in the MS program in mathematical finance.
Prerequisites: STAT GU4203. STAT GU4207 is recommended. Basics of continuous-time stochastic processes. Wiener processes. Stochastic integrals. Ito's formula, stochastic calculus. Stochastic exponentials and Girsanov's theorem. Gaussian processes. Stochastic differential equations. Additional topics as time permits.
Prerequisites: STAT GU4264. Mathematical theory and probabilistic tools for modeling and analyzing security markets are developed. Pricing options in complete and incomplete markets, equivalent martingale measures, utility maximization, term structure of interest rates. This is a core course in the MS program in mathematical finance.
Prerequisites: STAT GU4205 and at least one statistics course numbered between GU4221 and GU4261. This is a course on getting the most out of data. The emphasis will be on hands-on experience, involving case studies with real data and using common statistical packages. The course covers, at a very high level, exploratory data analysis, model formulation, goodness of fit testing, and other standard and non-standard statistical procedures, including linear regression, analysis of variance, nonlinear regression, generalized linear models, survival analysis, time series analysis, and modern regression methods. Students will be expected to propose a data set of their choice for use as case study material.
Topics in Modern Statistics provide students with an opportunity to study a specialized area of statistics in more depth to meet the educational needs of a rapidly changing field.
Prerequisites: At least one semester of calculus. A calculus-based introduction to probability theory. Topics covered include random variables, conditional probability, expectation, independence, Bayes' rule, important distributions, joint distributions, moment generating functions, central limit theorem, laws of large numbers and Markov's inequality.
Prerequisites: STAT GR5203 or the equivalent, and two semesters of calculus. Calculus-based introduction to the theory of statistics. Useful distributions, law of large numbers and central limit theorem, point estimation, hypothesis testing, confidence intervals, maximum likelihood, likelihood ratio tests, nonparametric procedures, theory of least squares and analysis of variance.
Prerequisites: STAT GR5203 and GR5204 or the equivalent. Theory and practice of regression analysis. Simple and multiple regression, including testing, estimation, and confidence procedures, modeling, regression diagnostics and plots, polynomial regression, collinearity and confounding, model selection, geometry of least squares. Extensive use of the computer to analyse data.
Course Overview:
This course introduces Python programming, covering data structures, control flow, objects, and functions, along with libraries like re, requests, numpy, pandas, scikit-learn, scipy, and more. These skills are applied to real-world data science tasks, including A/B testing, data manipulation, modeling, optimization, simulations, and data visualization.
Students will develop computational thinking abilities, including problem decomposition, pattern recognition, data representation, abstraction, and algorithm design, through practical exercises.
Prerequisites: STAT GR5205. Least squares smoothing and prediction, linear systems, Fourier analysis, and spectral estimation. Impulse response and transfer function. Fourier series, the fast Fourier transform, autocorrelation function, and spectral density. Univariate Box-Jenkins modeling and forecasting. Emphasis on applications. Examples from the physical sciences, social sciences, and business. Computing is an integral part of the course.
This course is an introduction to Causal Inference at the master's level. Students will be introduced to a broad range of causal inference methods including randomized experiments, observational studies, instrumental variables, difference-in-differences, regression discontinuity design, and synthetic controls. In addition, the course will cover modern, controversial debates regarding the foundations and limitations of causal inference.
The primary learning goal of this course will be to familiarize students with a variety of the most popular causal inference methods: which causal effects they seek to estimate, basic assumptions required for identification and estimation, and their practical implementation. To this end, the course will focus both on developing the prerequisite statistical / methodological theory and on gaining hands-on experience through implementation exercises with real datasets. By the end of the course, students should have deep familiarity with various causal inference methods and, more importantly, be able to determine which method is most appropriate for a given applied problem and to judge whether the prerequisite identifying conditions are appropriate.
Prerequisites: STAT GR5241. This course covers some advanced topics in machine learning and has an emphasis on applications to real-world data. A major part of this course is a course project, which consists of an in-class presentation and a written project report.
Description.
Unsupervised Learning is a master's-level course on foundations, methods, practice, and applications in machine learning from data without associated labels or outcomes. This course will focus on dimension reduction and clustering techniques while also covering graphical models, missing data imputation, anomaly detection, generative models, and others. The course will also emphasize conceptual understanding and practical applications of unsupervised learning in data visualization, exploratory data analysis, data pre-processing, and data-driven discovery.
Prerequisites.
STAT GR 5206 Statistical Computing and Intro to Data Science
STAT GR 5241 Statistical Machine Learning (strongly recommended)
STAT GR 5205 Linear Regression (recommended)
STAT GR 5203 Probability (recommended)
Students should also be familiar with linear algebra.
Prerequisites: STAT GR5204 or the equivalent. STAT GR5205 is recommended. A fast-paced introduction to statistical methods used in quantitative finance. Financial applications and statistical methodologies are intertwined in all lectures. Topics include regression analysis and applications to the Capital Asset Pricing Model and multifactor pricing models, principal components and multivariate analysis, smoothing techniques and estimation of yield curves, statistical methods for financial time series, value at risk, term structure models and fixed income research, and estimation and modeling of volatilities. Hands-on experience with financial data.
Available to SSP, SMP. Modeling and inference for random processes, from natural sciences to finance and economics. ARMA, ARCH, GARCH and nonlinear models, parameter estimation, prediction and filtering.
Prerequisites: STAT GR5203 or the equivalent. Basics of continuous-time stochastic processes. Wiener processes. Stochastic integrals. Ito's formula, stochastic calculus. Stochastic exponentials and Girsanov's theorem. Gaussian processes. Stochastic differential equations. Additional topics as time permits.
Prerequisites: STAT GR5264. Available to SSP, SMP. Mathematical theory and probabilistic tools for modeling and analyzing security markets are developed. Pricing options in complete and incomplete markets, equivalent martingale measures, utility maximization, term structure of interest rates.
Course Description
STAT GR5291 Advanced Data Analysis serves as one of the required capstone experiences for MA students in statistics. This course is project-based and covers advanced topics in traditional data analysis. Students are presented with a mix of theory and application in homework assignments. The final project is a major contribution to the final grade and is arguably considered the capstone project for the MA in Statistics Program.
Students will learn a myriad of topics related to data analysis and hypothesis testing, and are responsible for application through statistical packages or manual programming. Topics include exploratory data analysis & descriptive statistics, review of sampling distributions, point estimation, review of hypothesis testing & confidence interval procedures, non-parametric tests, computational methods (Monte Carlo, bootstrap, permutation tests), categorical data analysis, linear regression, diagnostics & residual analysis, robust regression, model selection, non-linear regression & smoothers, aspects of experimental design (ANOVA, two-way ANOVA, blocking, multiple comparisons, ANCOVA, semi-parametric procedures, random effects models, mixed effects models, nested models, repeated measures), and generalized linear models (logistic regression, penalized logistic, multinomial regression, link functions).
Time permitting, the class also covers survival analysis (hazard function, survival curve) and time series analysis (stationarity, ACF/PACF, MA, AR, ARMA, ARIMA, order selection, forecasting).
Topics in Modern Statistics will provide MA Statistics students with an opportunity to study a specialized area of statistics in more depth and to meet the educational needs of a rapidly growing field.
This upcoming fall, we are going to kick off the “Practitioners Seminar” course, where successful practitioners from various industry fields (tech, finance, insurance, pharmaceutical, etc.) will have a chance to meet our students and present the projects they work on, the technologies they use to achieve their goals, and the solutions they came up with. In addition, guest speakers will share their career development paths (what kind of obstacles they faced and what pitfalls to avoid) and give general advice on career development in their fields. We will finish each meeting with a Q&A session with students.
The course aims to teach MA in Statistics students how to manage their careers and develop professionally. Topics include resume and cover-letter writing, negotiation, mentoring, interviewing skills and communication across global teams. Top professionals from across the globe speak to students and help improve leadership skills.
This course is intended to provide a mechanism for MA students in Statistics who undertake on-campus project work or research. Students may sign up for the course with a faculty member from the Department of Statistics for academic credit. Students seeking to enroll in the course should identify an on-campus project and a congenial faculty member whose research is appealing to them and who is able to serve as their mentor. Students should then submit an application to enroll in this course, which will be reviewed and approved by the Faculty Director of the MA in Statistics program.
Prerequisites: GR5203, GR5204, and GR5205, and at least 4 approved electives. This course is an elective course for students in the M.A. in Statistics program that counts towards the degree requirements. To receive a grade and academic credits for this course, students are expected to engage in approved off-campus internships that can be counted as an elective. Statistical Fieldwork should provide students an opportunity to apply their statistical skills and gain practical knowledge on how statistics can be applied to solve real-world challenges.
A rigorous introduction to probability theory. Topics covered include probability spaces and measures, Borel-Cantelli lemma, zero-one laws, conditional probability, Bayes' rule, independence, random variables and distribution functions, random vectors and multivariate distributions, expectation, important distributions, characteristic functions, conditional distributions, transformations of random variables, probability and expectation inequalities, laws of large numbers, central limit theorem.
A rigorous introduction to the theory of statistics. Topics covered include elementary decision theory, distribution of the sample mean, point estimation methods and asymptotic properties, the bias-variance tradeoff, exponential families, sufficiency and minimal sufficiency, completeness, the Lehmann-Scheffé theorem, UMVUE and BLUE, Bayes inference, Neyman-Pearson theory, hypothesis testing, most powerful unbiased tests, likelihood ratio tests, confidence sets.
This high-level course in linear regression delves deeply into the theoretical and geometric aspects of regression analysis, offering a comprehensive exploration of its foundational principles and advanced topics. Students will study regression within vector space contexts, emphasizing the role of inner products and orthogonal projections. The analysis of projection matrices will include their properties, such as idempotence and symmetry, and their implications for regression diagnostics and metrics. Students will explore why various test statistics follow t- and F-distributions, with careful attention to degrees of freedom and their derivations. As the course progresses, it will address the complexities of high dimensional regression scenarios.
This course is an introduction to probability and statistics for data science. Topics include probability theory, probability distributions, simulations, parameter estimation, hypothesis testing, and simple regression. Python examples will be used throughout the course for illustration.
This course covers the following topics: fundamentals of probability theory and statistical inference used in data science; probabilistic models, random variables, useful distributions, expectations, law of large numbers, central limit theorem; statistical inference: point and confidence interval estimation, hypothesis tests, linear regression.
This course covers the following topics: fundamentals of data visualization, layered grammar of graphics, perception of discrete and continuous variables, introduction to Mondrian, mosaic plots, parallel coordinate plots, introduction to GGobi, linked plots, brushing, dynamic graphics, model visualization, clustering and classification.
Prerequisites: working knowledge of calculus and linear algebra (vectors and matrices), STAT GR5701 or equivalent, and familiarity with a programming language (e.g. R, Python) for statistical data analysis. In this course, we will systematically cover fundamentals of statistical inference and modeling, with special attention to models and methods that address practical data issues. The course will be focused on inference and modeling approaches such as the EM algorithm, MCMC methods and Bayesian modeling, linear regression models, generalized linear regression models, nonparametric regressions, and statistical computing. In addition, the course will provide an introduction to statistical methods and modeling that address various practical issues such as design of experiments, analysis of time-dependent data, missing values, etc. Throughout the course, real-data examples will be used in lecture discussion and homework problems. This course lays the statistical foundation for inference and modeling using data, preparing MS in Data Science students for other courses in machine learning, data mining and visualization.
Each year, approximately 10–15% of MA in Statistics students participate in on-campus academic research, contributing to advances in statistical methodology and applied areas. Some projects demonstrate exceptional promise and benefit from additional time and support for further development. The MA Research Specialization in Statistics allows qualified students to extend their MA program to a fourth semester to continue their research under the supervision of a faculty mentor. This competitive, merit-based program requires demonstrated research progress, a nomination from a faculty mentor, and an outstanding academic record. STAT GR5999 serves as the course through which students admitted to the MA Research Specialization fulfill their research requirements.
First semester of the doctoral program sequence in applied statistics.
Prerequisites: STAT GR6102. Modern Bayesian methods offer an amazing toolbox for solving science and engineering problems. We will go through the book Bayesian Data Analysis and do applied statistical modeling using Stan, using R (or Python or Julia if you prefer) to preprocess the data and postprocess the analysis. We will also discuss the relevant theory and get to open questions in model building, computing, evaluation, and expansion. The course is intended for students who want to do applied statistics and also those who are interested in working on statistics research problems.
Prerequisites: STAT GR6102 or instructor permission. The Department's doctoral student consulting practicum. Students undertake pro bono consulting activities for Columbia community researchers under the tutelage of a faculty mentor.
Prerequisites: students in a master's program must seek the permission of the director of the M.A. program in statistics; students in an undergraduate program must seek the permission of the director of undergraduate studies in statistics. A general introduction to mathematical statistics and statistical decision theory. Elementary decision theory, Bayes inference, Neyman-Pearson theory, hypothesis testing, most powerful unbiased tests, confidence sets. Estimation: methods, theory, and asymptotic properties. Likelihood ratio tests, multivariate distribution. Elements of general linear hypothesis, invariance, nonparametric methods, sequential analysis.
Prerequisites: STAT G6201 and STAT G6201. This course will mainly focus on nonparametric methods in statistics. A tentative list of topics to be covered includes nonparametric density and regression function estimation (upper bounds on the risk of kernel estimators and matching lower bounds on the minimax risk), reproducing kernel Hilbert spaces, bootstrap and resampling methods, multiple hypothesis testing, and high-dimensional statistical analysis.
Probabilistic Models and Machine Learning is a PhD-level course about how to design and use probability models. We study their mathematical properties, algorithms for computing with them, and applications to real problems. We study both the foundations and modern methods in this field. Our goals are to understand probabilistic modeling, to begin research that makes contributions to this field, and to develop good practices for building and applying probabilistic models.
Independent Study with a Faculty Advisor must be registered for every semester after the first academic year.