This course teaches students the fundamentals of learning the relationship between an output (the response variable) and a set of inputs (the predictors) in specific problems. The two major sub-fields covered are regression, where the response is quantitative, and classification, where the response is qualitative with labels such as diseased or non-diseased.
Another objective is to equip students with the skills to analyze real-life datasets and design appropriate learning functions to minimize prediction errors. This will be accomplished through computer exercises and a course project.
The course begins with a review of probability theory, including advanced topics such as multivariate analysis with an emphasis on multinormal distributions. The basics of linear algebra will also be introduced, covering important topics such as the four fundamental subspaces of a matrix, the Singular Value Decomposition (SVD), and its connection to Principal Component Analysis (PCA).
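To illustrate the SVD–PCA connection mentioned above, the following sketch (illustrative only; the data and dimensions are arbitrary assumptions, not course material) shows that for a centered data matrix the right singular vectors are the principal component directions, and the squared singular values, scaled by the sample size, equal the variances obtained from the eigendecomposition of the sample covariance matrix:

```python
import numpy as np

# Illustrative sketch: PCA via the SVD of a centered data matrix X
# (n samples x p features). The synthetic data below is an assumption
# chosen only to make the two computations comparable.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3)) @ np.diag([3.0, 1.0, 0.2])
Xc = X - X.mean(axis=0)                  # center each feature

U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
pca_variances = s**2 / (len(Xc) - 1)     # variance explained per component

# Same quantities from the eigendecomposition of the sample covariance.
cov = np.cov(Xc, rowvar=False)
eigvals = np.sort(np.linalg.eigvalsh(cov))[::-1]

print(np.allclose(pca_variances, eigvals))  # the two routes agree
```

The rows of `Vt` are the principal component directions; projecting `Xc @ Vt.T` gives the component scores.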
The course follows a breadth-first approach, starting with Statistical Decision Theory and transitioning into Linear Models and Regression. Bayes' classifier will be derived and explained, along with its application to multinormal distributions. Various statistical concepts will be defined, including estimation, loss function, and risk minimization.
This course is Part 1 of a two-part sequence and prepares students for Part 2, which will cover basic methods for regression and classification at different levels of detail, such as Neural Networks (NN), K-Nearest Neighbors (KNN), logistic regression, and Classification and Regression Trees (CART). Part 2 will also explore assessment tools such as the Receiver Operating Characteristic (ROC) curve, the Area Under the Curve (AUC), and Cross-Validation (CV).
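Among the assessment metrics previewed above, the AUC has a simple probabilistic reading: it is the probability that a randomly chosen positive example scores higher than a randomly chosen negative one. The sketch below (illustrative only; the toy scores and labels are assumptions) computes it directly from pairwise comparisons, which is equivalent to the Mann-Whitney U statistic:

```python
import numpy as np

def auc(scores, labels):
    """AUC as the fraction of positive/negative pairs ranked correctly
    (ties count as half a win)."""
    scores, labels = np.asarray(scores, float), np.asarray(labels)
    pos, neg = scores[labels == 1], scores[labels == 0]
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))

labels = np.array([0, 0, 1, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8])
print(auc(scores, labels))  # 3 of the 4 pairs are ordered correctly: 0.75
```

A perfect ranking gives AUC = 1.0 and a random one about 0.5, which is what makes the metric threshold-free, unlike accuracy.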