A friendly introduction to statistical concepts and reasoning with emphasis on developing statistical intuition rather than on mathematical rigor. Topics include design of experiments, descriptive statistics, correlation and regression, probability, chance variability, sampling, chance models, and tests of significance.
The advent of large-scale data collection and the computer power to analyze the data has led to the emergence of a new discipline known as Data Science. Data Scientists in all sectors analyze data to derive business insights, find solutions to societal challenges, and predict outcomes with potentially high impact. The goal of this course is to provide the student with a rigorous understanding of the statistical thinking behind the fundamental techniques of statistical analysis used by data scientists. The student will learn how to apply these techniques to data, understand why they work, and use the analysis results to make informed decisions. The student will gain this understanding in the classroom and through the analysis of real-world data in the lab using the programming language Python. The student will learn the fundamentals of Python and how to write and run code to apply the statistical concepts taught in the classroom.
Prerequisites: intermediate high school algebra. Designed for students in fields that emphasize quantitative methods. Graphical and numerical summaries, probability, theory of sampling distributions, linear regression, analysis of variance, confidence intervals and hypothesis testing. Quantitative reasoning and data analysis. Practical experience with statistical software. Illustrations are taken from a variety of fields. Data-collection/analysis project with emphasis on study designs is part of the coursework requirement.
Prerequisites: one semester of calculus. Designed for students who desire a strong grounding in statistical concepts with a greater degree of mathematical rigor than in STAT W1111. Random variables, probability distributions, pdf, cdf, mean, variance, correlation, conditional distribution, conditional mean and conditional variance, law of iterated expectations, normal, chi-square, F and t distributions, law of large numbers, central limit theorem, parameter estimation, unbiasedness, consistency, efficiency, hypothesis testing, p-value, confidence intervals, maximum likelihood estimation. Serves as the prerequisite for ECON W3412.
Corequisites: An introductory course in statistics (STAT UN1101 is recommended). This course is an introduction to R programming. After learning basic programming components, such as defining variables and vectors, and learning different data structures in R, students will, via project-based assignments, study more advanced topics, such as conditionals, modular programming, and data visualization. Students will also learn the fundamental concepts in computational complexity, and will practice writing reports based on their data analyses.
Prerequisites: An introductory course in statistics (STAT UN1101 is recommended). Students without programming experience in R might find STAT UN2102 very helpful. Develops critical thinking and data analysis skills for regression analysis in science and policy settings. Simple and multiple linear regression, non-linear and logistic models, random-effects models. Implementation in a statistical package. Emphasis on real-world examples and on planning, proposing, implementing, and reporting.
Prerequisites: STAT UN2103 is strongly recommended. Students without programming experience in R might find STAT UN2102 very helpful. This course covers statistical models and methods for analyzing and drawing inferences for problems involving categorical data. The goals are familiarity with and understanding of a substantial and integrated body of statistical methods that are used for such problems, experience in analyzing data using these methods, proficiency in communicating the results of such methods, and the ability to critically evaluate the use of such methods. Topics include binomial proportions, two-way and three-way contingency tables, logistic regression, log-linear models for large multi-way contingency tables, graphical methods. The statistical package R will be used.
Prerequisites: STAT UN2103. Students without programming experience in R might find STAT UN2102 very helpful. This course is a machine learning class from an application perspective. We will cover topics including data-based prediction, classification, specific classification methods (such as logistic regression and random forests), and basics of neural networks. Programming in homework assignments will require R.
Prerequisites: the project mentor's permission. This course provides a mechanism for students who undertake research with a faculty member from the Department of Statistics to receive academic credit. Students seeking research opportunities should be proactive and entrepreneurial: identify congenial faculty whose research is appealing, and let them know of your interest and your background and skills.
Topics in Modern Statistics that provide undergraduate students with an opportunity to study a specialized area of statistics in more depth and to meet the educational needs of a rapidly growing field. Courses listed are reviewed and approved by the Undergraduate Advisory Committee of the Department of Statistics. A good working knowledge of basic statistical concepts (likelihood, Bayes' rule, Poisson processes, Markov chains, Gaussian random vectors), including especially linear-algebraic concepts related to regression and principal components analysis, is necessary. No previous experience with neural data is required.
Prerequisites: Calculus through multiple integration and infinite sums. A calculus-based tour of the fundamentals of probability theory and statistical inference. Probability models, random variables, useful distributions, conditioning, expectations, law of large numbers, central limit theorem, point and confidence interval estimation, hypothesis tests, linear regression. This course replaces SIEO 4150.
Prerequisites: At least one semester, and preferably two, of calculus. An introductory course (STAT UN1201, preferably) is strongly recommended. A calculus-based introduction to probability theory. A quick review of multivariate calculus is provided. Topics covered include random variables, conditional probability, expectation, independence, Bayes’ rule, important distributions, joint distributions, moment generating functions, central limit theorem, laws of large numbers and Markov’s inequality.
Prerequisites: STAT GU4203. At least one semester of calculus is required; two or three semesters are strongly recommended. Calculus-based introduction to the theory of statistics. Useful distributions, law of large numbers and central limit theorem, point estimation, hypothesis testing, confidence intervals, maximum likelihood, likelihood ratio tests, nonparametric procedures, theory of least squares and analysis of variance.
Prerequisites: STAT GU4204 or the equivalent, and a course in linear algebra. Theory and practice of regression analysis. Simple and multiple regression, testing, estimation, prediction, and confidence procedures, modeling, regression diagnostics and plots, polynomial regression, collinearity and confounding, model selection, geometry of least squares. Extensive use of the computer to analyze data.
Prerequisites: STAT GU4204 and GU4205 or the equivalent. Introduction to programming in the R statistical package: functions, objects, data structures, flow control, input and output, debugging, logical design, and abstraction. Writing code for numerical and graphical statistical analyses. Writing maintainable code and testing, stochastic simulations, parallelizing data analyses, and working with large data sets. Examples from data science will be used for demonstration.
Prerequisites: STAT GU4203 and two, preferably three, semesters of calculus. Review of elements of probability theory. Poisson processes. Renewal theory. Wald's equation. Introduction to discrete and continuous time Markov chains. Applications to queueing theory, inventory models, branching processes.
Prerequisites: STAT GU4205 or the equivalent. Least squares smoothing and prediction, linear systems, Fourier analysis, and spectral estimation. Impulse response and transfer function. Fourier series, the fast Fourier transform, autocorrelation function, and spectral density. Univariate Box-Jenkins modeling and forecasting. Emphasis on applications. Examples from the physical sciences, social sciences, and business. Computing is an integral part of the course.
Prerequisites: STAT GU4204 or the equivalent. Statistical inference without parametric model assumption. Hypothesis testing using ranks, permutations, and order statistics. Nonparametric analogs of analysis of variance. Non-parametric regression, smoothing and model selection.
This course introduces the Bayesian paradigm for statistical inference. Topics covered include prior and posterior distributions: conjugate priors, informative and non-informative priors; one- and two-sample problems; models for normal data, models for binary data, Bayesian linear models; Bayesian computation: MCMC algorithms, the Gibbs sampler; hierarchical models; hypothesis testing, Bayes factors, model selection; use of statistical software.
Prerequisites: A course in the theory of statistical inference, such as STAT GU4204, and a course in statistical modeling and data analysis, such as STAT GU4205.
Prerequisites: STAT GU4204 or the equivalent. Introductory course on the design and analysis of sample surveys. How sample surveys are conducted, why the designs are used, how to analyze survey results, and how to derive from first principles the standard results and their generalizations. Examples from public health, social work, opinion polling, and other topics of interest.
Prerequisites: STAT GU4206. The course will provide an introduction to Machine Learning and its core models and algorithms. The aim of the course is to provide students of statistics with detailed knowledge of how Machine Learning methods work and how statistical models can be brought to bear in computer systems - not only to analyze large data sets, but to let computers perform tasks that traditional methods of computer science are unable to address. Examples range from speech recognition and text analysis through bioinformatics and medical diagnosis. This course provides a first introduction to the statistical methods and mathematical concepts which make such technologies possible.
Prerequisites: Working knowledge of statistics and probability, data mining, statistical modeling, and machine learning. Prior programming experience in R or Python is required. This course will incorporate knowledge and skills covered in a statistical curriculum with topics and projects in data science. Programming will be covered using existing tools in R. Computing best practices will be taught using test-driven development, version control, and collaboration. Students finish the class with a portfolio of projects and a deeper understanding of several core statistical/machine-learning algorithms. Short project cycles throughout the semester provide students extensive hands-on experience with various data-driven applications.
Prerequisites: STAT GU4205 or the equivalent. A fast-paced introduction to statistical methods used in quantitative finance. Financial applications and statistical methodologies are intertwined in all lectures. Topics include regression analysis and applications to the Capital Asset Pricing Model and multifactor pricing models, principal components and multivariate analysis, smoothing techniques and estimation of yield curves, statistical methods for financial time series, value at risk, term structure models and fixed income research, and estimation and modeling of volatilities. Hands-on experience with financial data.
Prerequisites: STAT GU4203. STAT GU4207 is recommended. Basics of continuous-time stochastic processes. Wiener processes. Stochastic integrals. Ito's formula, stochastic calculus. Stochastic exponentials and Girsanov's theorem. Gaussian processes. Stochastic differential equations. Additional topics as time permits.
Prerequisites: STAT GU4264. Mathematical theory and probabilistic tools for modeling and analyzing security markets are developed. Pricing options in complete and incomplete markets, equivalent martingale measures, utility maximization, term structure of interest rates. This is a core course in the MS program in mathematical finance.
Prerequisites: STAT GU4205 and at least one statistics course numbered between GU4221 and GU4261. This is a course on getting the most out of data. The emphasis will be on hands-on experience, involving case studies with real data and using common statistical packages. The course covers, at a very high level, exploratory data analysis, model formulation, goodness of fit testing, and other standard and non-standard statistical procedures, including linear regression, analysis of variance, nonlinear regression, generalized linear models, survival analysis, time series analysis, and modern regression methods. Students will be expected to propose a data set of their choice for use as case study material.
Prerequisites: At least one semester of calculus. A calculus-based introduction to probability theory. Topics covered include random variables, conditional probability, expectation, independence, Bayes' rule, important distributions, joint distributions, moment generating functions, central limit theorem, laws of large numbers and Markov's inequality.
Prerequisites: STAT GR5203 or the equivalent, and two semesters of calculus. Calculus-based introduction to the theory of statistics. Useful distributions, law of large numbers and central limit theorem, point estimation, hypothesis testing, confidence intervals, maximum likelihood, likelihood ratio tests, nonparametric procedures, theory of least squares and analysis of variance.
Prerequisites: STAT GR5203 and GR5204 or the equivalent. Theory and practice of regression analysis. Simple and multiple regression, including testing, estimation, and confidence procedures, modeling, regression diagnostics and plots, polynomial regression, collinearity and confounding, model selection, geometry of least squares. Extensive use of the computer to analyze data.
Corequisites: STAT GR5204 and GR5205 or the equivalent. Introduction to programming in the R statistical package: functions, objects, data structures, flow control, input and output, debugging, logical design, and abstraction. Writing code for numerical and graphical statistical analyses. Writing maintainable code and testing, stochastic simulations, parallelizing data analyses, and working with large data sets. Examples from data science will be used for demonstration.
Corequisites: GR5203 or the equivalent. Review of elements of probability theory. Poisson processes. Renewal theory. Wald's equation. Introduction to discrete and continuous time Markov chains. Applications to queueing theory, inventory models, branching processes.
Prerequisites: STAT GR5205. Least squares smoothing and prediction, linear systems, Fourier analysis, and spectral estimation. Impulse response and transfer function. Fourier series, the fast Fourier transform, autocorrelation function, and spectral density. Univariate Box-Jenkins modeling and forecasting. Emphasis on applications. Examples from the physical sciences, social sciences, and business. Computing is an integral part of the course.
Prerequisites: STAT GR5205. Statistical inference without parametric model assumption. Hypothesis testing using ranks, permutations, and order statistics. Nonparametric analogs of analysis of variance. Non-parametric regression, smoothing and model selection.
This course introduces the Bayesian paradigm for statistical inference. Topics covered include prior and posterior distributions: conjugate priors, informative and non-informative priors; one- and two-sample problems; models for normal data, models for binary data, Bayesian linear models, Bayesian computation: MCMC algorithms, the Gibbs sampler; hierarchical models; hypothesis testing, Bayes factors, model selection; use of statistical software.
Prerequisites: A course in the theory of statistical inference, such as STAT GU4204/GR5204, and a course in statistical modeling and data analysis, such as STAT GU4205/GR5205.
Prerequisites: STAT GR5204. Introductory course on the design and analysis of sample surveys. How sample surveys are conducted, why the designs are used, how to analyze survey results, and how to derive from first principles the standard results and their generalizations. Examples from public health, social work, opinion polling, and other topics of interest.
Prerequisites: STAT GR5206 or the equivalent. The course will provide an introduction to Machine Learning and its core models and algorithms. The aim of the course is to provide students of statistics with detailed knowledge of how Machine Learning methods work and how statistical models can be brought to bear in computer systems - not only to analyze large data sets, but to let computers perform tasks that traditional methods of computer science are unable to address. Examples range from speech recognition and text analysis through bioinformatics and medical diagnosis. This course provides a first introduction to the statistical methods and mathematical concepts which make such technologies possible.
Prerequisites: Working knowledge of statistics and probability, data mining, statistical modeling, and machine learning. Prior programming experience in R or Python is required. This course will incorporate knowledge and skills covered in a statistical curriculum with topics and projects in data science. Programming will be covered using existing tools in R. Computing best practices will be taught using test-driven development, version control, and collaboration. Students finish the class with a portfolio of projects and a deeper understanding of several core statistical/machine-learning algorithms. Short project cycles throughout the semester provide students extensive hands-on experience with various data-driven applications.
Prerequisites: STAT GR5204 or the equivalent. STAT GR5205 is recommended. A fast-paced introduction to statistical methods used in quantitative finance. Financial applications and statistical methodologies are intertwined in all lectures. Topics include regression analysis and applications to the Capital Asset Pricing Model and multifactor pricing models, principal components and multivariate analysis, smoothing techniques and estimation of yield curves, statistical methods for financial time series, value at risk, term structure models and fixed income research, and estimation and modeling of volatilities. Hands-on experience with financial data.
Prerequisites: STAT GR5203 or the equivalent. Basics of continuous-time stochastic processes. Wiener processes. Stochastic integrals. Ito's formula, stochastic calculus. Stochastic exponentials and Girsanov's theorem. Gaussian processes. Stochastic differential equations. Additional topics as time permits.
Prerequisites: STAT GR5264. Available to SSP, SMP. Mathematical theory and probabilistic tools for modeling and analyzing security markets are developed. Pricing options in complete and incomplete markets, equivalent martingale measures, utility maximization, term structure of interest rates.
Prerequisites: W4315 and either another statistics course numbered above 4200 or permission of the instructor. Required for the major in statistics. Data analysis using a computer statistical package and selected exploratory data analysis subroutines. Topics include editing of data for errors, exploratory and standard techniques for one-way analysis of variance, linear regression, and two-way analysis of variance. Material is presented in case-study format.
Topics in Modern Statistics will provide MA Statistics students with an opportunity to study a specialized area of statistics in more depth and to meet the educational needs of a rapidly growing field.
This course is intended to provide a mechanism for MA students in Statistics who undertake on-campus project work or research. Students may register for the course with a faculty member from the Department of Statistics for academic credit. Students seeking to enroll in the course should identify an on-campus project and a congenial faculty member whose research is appealing to them and who is able to serve as their mentor. Students should then submit an application to enroll in this course, which will be reviewed and approved by the Faculty Director of the MA in Statistics program.
Prerequisites: GR5203, GR5204, and GR5205, and at least 4 approved electives. This course is an elective course for students in the M.A. in Statistics program that counts towards the degree requirements. To receive a grade and academic credits for this course, students are expected to engage in approved off-campus internships that can be counted as an elective. Statistical Fieldwork should provide students an opportunity to apply their statistical skills and gain practical knowledge of how statistics can be applied to solve real-world challenges.
Prerequisites: Working knowledge of calculus and linear algebra (vectors and matrices), STAT GR5701 or equivalent, and familiarity with a programming language (e.g. R, Python) for statistical data analysis. In this course, we will systematically cover fundamentals of statistical inference and modeling, with special attention to models and methods that address practical data issues. The course will be focused on inference and modeling approaches such as the EM algorithm, MCMC methods and Bayesian modeling, linear regression models, generalized linear regression models, nonparametric regressions, and statistical computing. In addition, the course will provide an introduction to statistical methods and modeling that address various practical issues such as design of experiments, analysis of time-dependent data, missing values, etc. Throughout the course, real-data examples will be used in lecture discussion and homework problems. This course lays the statistical foundation for inference and modeling using data, preparing the MS in Data Science students for other courses in machine learning, data mining and visualization.
This is only recitation for STAT GR5703. We are requesting 3 sections of recitation to align with the one section of 5703 being offered.
Prerequisites: STAT GR6101. Continuation of STAT GR6101.
Prerequisites: STAT GR6102 or instructor permission. The Department's doctoral student consulting practicum. Students undertake pro bono consulting activities for Columbia community researchers under the tutelage of a faculty mentor.
Prerequisites: STAT GR6201. Continuation of STAT GR6201.
Prerequisites: STAT GR6301. Conditional distributions and expectations. Martingales: inequalities, convergence and closure properties, optimal stopping theorems, Burkholder-Gundy inequalities, Doob-Meyer decomposition, stochastic integration, Ito's rule. Brownian motion: construction, invariance principles and random walks, study of sample paths, martingale representation results, Girsanov's theorem. The heat equation, Feynman-Kac formula. Dirichlet problem, connections with potential theory. Introduction to Markov processes: semigroups and infinitesimal generators, diffusions, stochastic differential equations.
Independent Study with Faculty Advisor must be registered for every semester after the first academic year.