A friendly introduction to statistical concepts and reasoning with emphasis on developing statistical intuition rather than on mathematical rigor. Topics include design of experiments, descriptive statistics, correlation and regression, probability, chance variability, sampling, chance models, and tests of significance.
Prerequisites: intermediate high school algebra. Designed for students in fields that emphasize quantitative methods. Graphical and numerical summaries, probability, theory of sampling distributions, linear regression, analysis of variance, confidence intervals and hypothesis testing. Quantitative reasoning and data analysis. Practical experience with statistical software. Illustrations are taken from a variety of fields. Data-collection/analysis project with emphasis on study designs is part of the coursework requirement.
Prerequisites: one semester of calculus. Designed for students who desire a strong grounding in statistical concepts with a greater degree of mathematical rigor than in STAT W1111. Random variables, probability distributions, pdf, cdf, mean, variance, correlation, conditional distribution, conditional mean and conditional variance, law of iterated expectations, normal, chi-square, F and t distributions, law of large numbers, central limit theorem, parameter estimation, unbiasedness, consistency, efficiency, hypothesis testing, p-value, confidence intervals, maximum likelihood estimation. Serves as the prerequisite for ECON W3412.
Prerequisites: Previous or concurrent enrollment in a course in statistics would make the talks more accessible. Prepared with undergraduates majoring in quantitative disciplines in mind, the presentations in this colloquium focus on the interface between data analysis, computation, and theory in interdisciplinary research. Meetings are open to all undergraduates, whether registered or not. Presenters are drawn from the faculty of departments in Arts and Sciences, Engineering, Public Health and Medicine.
Corequisites: An introductory course in statistics (STAT UN1101 is recommended). This course is an introduction to R programming. After learning basic programming components, such as defining variables and vectors, and learning different data structures in R, students will, via project-based assignments, study more advanced topics, such as conditionals, modular programming, and data visualization. Students will also learn the fundamental concepts in computational complexity, and will practice writing reports based on their data analyses.
Prerequisites: An introductory course in statistics (STAT UN1101 is recommended). Students without programming experience in R might find STAT UN2102 very helpful. Develops critical thinking and data analysis skills for regression analysis in science and policy settings. Simple and multiple linear regression, non-linear and logistic models, random-effects models. Implementation in a statistical package. Emphasis on real-world examples and on planning, proposing, implementing, and reporting.
This is a course in intermediate statistical inference techniques in the context of applied research questions in data science. Assuming some prior exposure to probability and statistics, this course will first introduce the student to the principles of Bayesian inference, then apply them to estimation and prediction in the context of linear and generalized linear models, counting and classification, and mixture and multilevel models, including scientific computation (such as MCMC methods). Students will also learn about the main benefits of Bayesian over frequentist methods, such as naturally combining prior information with the data, posterior probabilities as easier-to-interpret alternatives to p-values, and parameter estimation “pooling” in hierarchical models.
Prerequisites: At least one, and preferably both, of STAT UN2103 and UN2104 are strongly recommended. Students without programming experience in R might find STAT UN2102 very helpful. This course is intended to give students practical experience with statistical methods beyond linear regression and categorical data analysis. The focus will be on understanding the uses and limitations of models, not the mathematical foundations for the methods. Topics that may be covered include random and mixed-effects models, classical non-parametric techniques, the statistical theory of causality, sample survey design, multi-level models, generalized linear regression, generalized estimating equations and over-dispersion, survival analysis including the Kaplan-Meier estimator, log-rank statistics, and the Cox proportional hazards regression model. Power calculations and proposal and report writing will be discussed.
Prerequisites: the project mentor's permission. This course provides a mechanism for students who undertake research with a faculty member from the Department of Statistics to receive academic credit. Students seeking research opportunities should be proactive and entrepreneurial: identify congenial faculty whose research is appealing, and let them know of your interest and your background and skills.
Prerequisites: the project mentor's permission. This course provides a mechanism for students who undertake research with a faculty member from the Department of Statistics to receive academic credit. Students seeking research opportunities should be proactive and entrepreneurial: identify congenial faculty whose research is appealing, and let them know of your interest and your background and skills.
Prerequisites: Calculus through multiple integration and infinite sums. A calculus-based tour of the fundamentals of probability theory and statistical inference. Probability models, random variables, useful distributions, conditioning, expectations, law of large numbers, central limit theorem, point and confidence interval estimation, hypothesis tests, linear regression. This course replaces SIEO 4150.
Prerequisites: At least one semester, and preferably two, of calculus. An introductory course (STAT UN1201, preferably) is strongly recommended. A calculus-based introduction to probability theory. A quick review of multivariate calculus is provided. Topics covered include random variables, conditional probability, expectation, independence, Bayes’ rule, important distributions, joint distributions, moment generating functions, central limit theorem, laws of large numbers and Markov’s inequality.
Prerequisites: At least one semester, and preferably two, of calculus. An introductory course (STAT UN1201, preferably) is strongly recommended. A calculus-based introduction to probability theory. A quick review of multivariate calculus is provided. Topics covered include random variables, conditional probability, expectation, independence, Bayes’ rule, important distributions, joint distributions, moment generating functions, central limit theorem, laws of large numbers and Markov’s inequality.
Prerequisites: STAT GU4203. At least one semester of calculus is required; two or three semesters are strongly recommended. Calculus-based introduction to the theory of statistics. Useful distributions, law of large numbers and central limit theorem, point estimation, hypothesis testing, confidence intervals, maximum likelihood, likelihood ratio tests, nonparametric procedures, theory of least squares and analysis of variance.
Prerequisites: STAT GU4203. At least one semester of calculus is required; two or three semesters are strongly recommended. Calculus-based introduction to the theory of statistics. Useful distributions, law of large numbers and central limit theorem, point estimation, hypothesis testing, confidence intervals, maximum likelihood, likelihood ratio tests, nonparametric procedures, theory of least squares and analysis of variance.
Prerequisites: STAT GU4204 or the equivalent, and a course in linear algebra. Theory and practice of regression analysis. Simple and multiple regression, testing, estimation, prediction, and confidence procedures, modeling, regression diagnostics and plots, polynomial regression, collinearity and confounding, model selection, geometry of least squares. Extensive use of the computer to analyse data.
Prerequisites: STAT GU4204 and GU4205 or the equivalent. Introduction to programming in the R statistical package: functions, objects, data structures, flow control, input and output, debugging, logical design, and abstraction. Writing code for numerical and graphical statistical analyses. Writing maintainable code and testing, stochastic simulations, parallelizing data analyses, and working with large data sets. Examples from data science will be used for demonstration.
Prerequisites: STAT GU4203 and two, preferably three, semesters of calculus. Review of elements of probability theory. Poisson processes. Renewal theory. Wald's equation. Introduction to discrete and continuous time Markov chains. Applications to queueing theory, inventory models, branching processes.
Prerequisites: STAT GU4205 or the equivalent. Least squares smoothing and prediction, linear systems, Fourier analysis, and spectral estimation. Impulse response and transfer function. Fourier series, the fast Fourier transform, autocorrelation function, and spectral density. Univariate Box-Jenkins modeling and forecasting. Emphasis on applications. Examples from the physical sciences, social sciences, and business. Computing is an integral part of the course.
This course introduces the Bayesian paradigm for statistical inference. Topics covered include prior and posterior distributions: conjugate priors, informative and non-informative priors; one- and two-sample problems; models for normal data, models for binary data, Bayesian linear models; Bayesian computation: MCMC algorithms, the Gibbs sampler; hierarchical models; hypothesis testing, Bayes factors, model selection; use of statistical software.
Prerequisites: A course in the theory of statistical inference, such as STAT GU4204, and a course in statistical modeling and data analysis, such as STAT GU4205.
Prerequisites: Working knowledge of statistics and probability, data mining, statistical modeling, and machine learning. Prior programming experience in R or Python is required. This course will incorporate knowledge and skills covered in a statistical curriculum with topics and projects in data science. Programming will be covered using existing tools in R. Computing best practices will be taught using test-driven development, version control, and collaboration. Students finish the class with a portfolio of projects and a deeper understanding of several core statistical/machine-learning algorithms. Short project cycles throughout the semester provide students extensive hands-on experience with various data-driven applications.
Prerequisites: STAT GU4205 or the equivalent. A fast-paced introduction to statistical methods used in quantitative finance. Financial applications and statistical methodologies are intertwined in all lectures. Topics include regression analysis and applications to the Capital Asset Pricing Model and multifactor pricing models, principal components and multivariate analysis, smoothing techniques and estimation of yield curves, statistical methods for financial time series, value at risk, term structure models and fixed income research, and estimation and modeling of volatilities. Hands-on experience with financial data.
Prerequisites: STAT GU4204 or the equivalent. STAT GU4205 is recommended. Modeling and inference for random processes, from natural sciences to finance and economics. ARMA, ARCH, GARCH and nonlinear models, parameter estimation, prediction and filtering. This is a core course in the MS program in mathematical finance.
Prerequisites: STAT GU4203. STAT GU4207 is recommended. Basics of continuous-time stochastic processes. Wiener processes. Stochastic integrals. Ito's formula, stochastic calculus. Stochastic exponentials and Girsanov's theorem. Gaussian processes. Stochastic differential equations. Additional topics as time permits.
Prerequisites: STAT GU4205 and at least one statistics course numbered between GU4221 and GU4261. This is a course on getting the most out of data. The emphasis will be on hands-on experience, involving case studies with real data and using common statistical packages. The course covers, at a very high level, exploratory data analysis, model formulation, goodness of fit testing, and other standard and non-standard statistical procedures, including linear regression, analysis of variance, nonlinear regression, generalized linear models, survival analysis, time series analysis, and modern regression methods. Students will be expected to propose a data set of their choice for use as case study material.
Prerequisites: At least one semester of calculus. A calculus-based introduction to probability theory. Topics covered include random variables, conditional probability, expectation, independence, Bayes' rule, important distributions, joint distributions, moment generating functions, central limit theorem, laws of large numbers and Markov's inequality.
Prerequisites: STAT GR5203 or the equivalent, and two semesters of calculus. Calculus-based introduction to the theory of statistics. Useful distributions, law of large numbers and central limit theorem, point estimation, hypothesis testing, confidence intervals, maximum likelihood, likelihood ratio tests, nonparametric procedures, theory of least squares and analysis of variance.
Prerequisites: STAT GR5203 and GR5204 or the equivalent. Theory and practice of regression analysis. Simple and multiple regression, including testing, estimation, and confidence procedures, modeling, regression diagnostics and plots, polynomial regression, collinearity and confounding, model selection, geometry of least squares. Extensive use of the computer to analyse data.
Corequisites: STAT GR5204 and GR5205 or the equivalent. Introduction to programming in the R statistical package: functions, objects, data structures, flow control, input and output, debugging, logical design, and abstraction. Writing code for numerical and graphical statistical analyses. Writing maintainable code and testing, stochastic simulations, parallelizing data analyses, and working with large data sets. Examples from data science will be used for demonstration.
Corequisites: STAT GR5204 and GR5205 or the equivalent. Introduction to programming in the R statistical package: functions, objects, data structures, flow control, input and output, debugging, logical design, and abstraction. Writing code for numerical and graphical statistical analyses. Writing maintainable code and testing, stochastic simulations, parallelizing data analyses, and working with large data sets. Examples from data science will be used for demonstration.
Corequisites: GR5203 or the equivalent. Review of elements of probability theory. Poisson processes. Renewal theory. Wald's equation. Introduction to discrete and continuous time Markov chains. Applications to queueing theory, inventory models, branching processes.
Prerequisites: STAT GR5205. Least squares smoothing and prediction, linear systems, Fourier analysis, and spectral estimation. Impulse response and transfer function. Fourier series, the fast Fourier transform, autocorrelation function, and spectral density. Univariate Box-Jenkins modeling and forecasting. Emphasis on applications. Examples from the physical sciences, social sciences, and business. Computing is an integral part of the course.
This course introduces the Bayesian paradigm for statistical inference. Topics covered include prior and posterior distributions: conjugate priors, informative and non-informative priors; one- and two-sample problems; models for normal data, models for binary data, Bayesian linear models; Bayesian computation: MCMC algorithms, the Gibbs sampler; hierarchical models; hypothesis testing, Bayes factors, model selection; use of statistical software.
Prerequisites: A course in the theory of statistical inference, such as STAT GU4204/GR5204, and a course in statistical modeling and data analysis, such as STAT GU4205/GR5205.
Prerequisites: STAT GR5241. This course covers some advanced topics in machine learning and has an emphasis on applications to real-world data. A major part of this course is a course project, which consists of an in-class presentation and a written project report.
Prerequisites: Working knowledge of statistics and probability, data mining, statistical modeling, and machine learning. Prior programming experience in R or Python is required. This course will incorporate knowledge and skills covered in a statistical curriculum with topics and projects in data science. Programming will be covered using existing tools in R. Computing best practices will be taught using test-driven development, version control, and collaboration. Students finish the class with a portfolio of projects and a deeper understanding of several core statistical/machine-learning algorithms. Short project cycles throughout the semester provide students extensive hands-on experience with various data-driven applications.
This course is an optional companion lab course for GR5242 Advanced Machine Learning. The aim of this course is to help students acquire the basic computational skills in a Python-based deep learning library (such as PyTorch or TensorFlow) to implement deep learning models. Lab class materials will be aligned closely with the topics covered in GR5242. Google Colab will be used as the main tool for the hands-on lab exercises.
Prerequisites: STAT GR5204 or the equivalent. STAT GR5205 is recommended. A fast-paced introduction to statistical methods used in quantitative finance. Financial applications and statistical methodologies are intertwined in all lectures. Topics include regression analysis and applications to the Capital Asset Pricing Model and multifactor pricing models, principal components and multivariate analysis, smoothing techniques and estimation of yield curves, statistical methods for financial time series, value at risk, term structure models and fixed income research, and estimation and modeling of volatilities. Hands-on experience with financial data.
Available to SSP, SMP. Modeling and inference for random processes, from natural sciences to finance and economics. ARMA, ARCH, GARCH and nonlinear models, parameter estimation, prediction and filtering.
Prerequisites: STAT GR5203 or the equivalent. Basics of continuous-time stochastic processes. Wiener processes. Stochastic integrals. Ito's formula, stochastic calculus. Stochastic exponentials and Girsanov's theorem. Gaussian processes. Stochastic differential equations. Additional topics as time permits.
Prerequisites: W4315 and either another statistics course numbered above 4200 or permission of instructor. Required for the major in statistics. Data analysis using a computer statistical package and selected exploratory data analysis subroutines. Topics include editing of data for errors, exploratory and standard techniques for one-way analysis of variance, linear regression, and two-way analysis of variance. Material is presented in case-study format.
Topics in Modern Statistics will provide MA Statistics students with an opportunity to study a specialized area of statistics in more depth and to meet the educational needs of a rapidly growing field.
Topics in Modern Statistics will provide MA Statistics students with an opportunity to study a specialized area of statistics in more depth and to meet the educational needs of a rapidly growing field.
Topics in Modern Statistics will provide MA Statistics students with an opportunity to study a specialized area of statistics in more depth and to meet the educational needs of a rapidly growing field.
The course aims to teach MA in Statistics students how to manage their careers and develop professionally. Topics include resume and cover-letter writing, negotiation, mentoring, interviewing skills and communication across global teams. Top professionals from across the globe speak to students and help improve leadership skills.
This course is intended to provide a mechanism for MA students in Statistics who undertake on-campus project work or research. Students may register for the course under a faculty member from the Department of Statistics to receive academic credit. Students seeking to enroll in the course should identify an on-campus project and a congenial faculty member whose research is appealing to them and who is able to serve as their mentor. Students should then submit an application to enroll in this course, which will be reviewed and approved by the Faculty Director of the MA in Statistics program.
Prerequisites: GR5203, GR5204, and GR5205, and at least 4 approved electives. This course is an elective course for students in the M.A. in Statistics program that counts towards the degree requirements. To receive a grade and academic credits for this course, students are expected to engage in approved off-campus internships that can be counted as an elective. Statistical Fieldwork should provide students an opportunity to apply their statistical skills and gain practical knowledge on how statistics can be applied to solve real-world challenges.
This course covers the following topics: fundamentals of probability theory and statistical inference used in data science; probabilistic models, random variables, useful distributions, expectations, law of large numbers, central limit theorem; statistical inference: point and confidence interval estimation, hypothesis tests, linear regression.
This course covers the following topics: fundamentals of data visualization, layered grammar of graphics, perception of discrete and continuous variables, introduction to Mondrian, mosaic plots, parallel coordinate plots, introduction to ggobi, linked plots, brushing, dynamic graphics, model visualization, clustering and classification.
Prerequisites: Working knowledge of calculus and linear algebra (vectors and matrices), STAT GR5701 or equivalent, and familiarity with a programming language (e.g. R, Python) for statistical data analysis. In this course, we will systematically cover fundamentals of statistical inference and modeling, with special attention to models and methods that address practical data issues. The course will be focused on inference and modeling approaches such as the EM algorithm, MCMC methods and Bayesian modeling, linear regression models, generalized linear regression models, nonparametric regressions, and statistical computing. In addition, the course will provide an introduction to statistical methods and modeling that address various practical issues such as design of experiments, analysis of time-dependent data, missing values, etc. Throughout the course, real-data examples will be used in lecture discussion and homework problems. This course lays the statistical foundation for inference and modeling using data, preparing the MS in Data Science students for other courses in machine learning, data mining and visualization.
First semester of the doctoral program sequence in applied statistics.
Prerequisites: STAT GR6102. Modern Bayesian methods offer an amazing toolbox for solving science and engineering problems. We will go through the book Bayesian Data Analysis and do applied statistical modeling using Stan, using R (or Python or Julia if you prefer) to preprocess the data and postprocess the analysis. We will also discuss the relevant theory and get to open questions in model building, computing, evaluation, and expansion. The course is intended for students who want to do applied statistics and also those who are interested in working on statistics research problems.
Prerequisites: STAT GR6102 or instructor permission. The Department's doctoral student consulting practicum. Students undertake pro bono consulting activities for Columbia community researchers under the tutelage of a faculty mentor.
We will learn about and get practice in several aspects of statistical communication, including teaching, writing, collaboration, programming, data display, and visualization of statistical models. After taking this class, you should be able to effectively communicate quantitative information and ideas.
Prerequisites: students in a master's program must seek the permission of the director of the M.A. program in statistics; students in an undergraduate program must seek the permission of the director of undergraduate studies in statistics. A general introduction to mathematical statistics and statistical decision theory. Elementary decision theory, Bayes inference, Neyman-Pearson theory, hypothesis testing, most powerful unbiased tests, confidence sets. Estimation: methods, theory, and asymptotic properties. Likelihood ratio tests, multivariate distribution. Elements of general linear hypothesis, invariance, nonparametric methods, sequential analysis.
Prerequisites: STAT G6201 and STAT G6201. This course will mainly focus on nonparametric methods in statistics. A tentative list of topics to be covered includes nonparametric density and regression function estimation (upper bounds on the risk of kernel estimators and matching lower bounds on the minimax risk), reproducing kernel Hilbert spaces, bootstrap and resampling methods, multiple hypothesis testing, and high-dimensional statistical analysis.
Prerequisites: A thorough knowledge of elementary real analysis and some previous knowledge of probability. Overview of measure and integration theory. Probability spaces and measures, random variables and distribution functions. Independence, Borel-Cantelli lemma, zero-one laws. Expectation, uniform integrability, sums of independent random variables, stopping times, Wald's equations, elementary renewal theorems. Laws of large numbers. Characteristic functions. Central limit problem; Lindeberg-Feller theorem, infinitely divisible and stable distributions. Cramer's theorem, introduction to large deviations. Law of the iterated logarithm, Brownian motion, heat equation.
Probabilistic Models and Machine Learning is a PhD-level course about how to design and use probability models. We study their mathematical properties, algorithms for computing with them, and applications to real problems. We study both the foundations and modern methods in this field. Our goals are to understand probabilistic modeling, to begin research that makes contributions to this field, and to develop good practices for building and applying probabilistic models.
Independent Study with Faculty Advisor must be registered for every semester after the first academic year.