Mathematics of Big Data

Readings should be done before class. All resources (including lecture slides, homework, starter files, homework solutions, and articles) can be found under the Resources tab.

Topics and assigned readings are subject to change.

Date Topics Homework
Supervised Learning
Jan 27
Introduction to Big Data
Linear Regression
Normal Equations and Optimization Techniques
Linear Algebra Review
Covariance Matrix
Read:
Murphy 1.{all}
Murphy, 7.{1,...,5}
Feb 3 Gaussian Distribution
Linear Regression (Probabilistic Approach)
Gradient Descent
Newton's Method
Logistic Regression
Exponential Family
Generalized Linear Models
Read:
Murphy 8.{1,2,3,5} \ 8.{3.4,3.5}, 9.{1,2.2,2.4,3}

Due:
Homework 1
Brainstorm for midterm project
Feb 10 Probability Review
Generalized Linear Models continued
Poisson Regression
Softmax Regression
Covariance Matrix
Multivariate Gaussian Distribution
Marginalized Gaussian and the Schur Complement
Read:
Murphy 9.7, 4.{1,2,3,4,5,6} (important background)

Due:
Homework 2
Project Proposal (<1 page)
Feb 17 Dimensionality Reduction
Spectral Decomposition
Singular Value Decomposition
Principal Component Analysis
Generative Learning Algorithms
Gaussian Discriminant Analysis
Cholesky Decomposition
Due:
Final Project Proposal
Homework 3
Feb 24 Naive Bayes
L1 Regularization and Sparsity
Lasso
Support Vector Machines
Kernels
Read:
Murphy 14.{1,2,3,4} \ 14.{4.4}
MapReduce: Simplified Data Processing on Large Clusters

Due:
Homework 4
Unsupervised Learning
Mar 2
Introduction to Unsupervised Learning
Clustering
K-Means
Mixture of Gaussians
Jensen's Inequality
Expectation-Maximization (EM) Algorithm
Read:
Murphy 11.{1,2,3,4} \ 11.{4.6,4.9}
Pegasos: Primal Estimated sub-GrAdient SOlver for SVM
Random Features for Large-Scale Kernel Machines

Due:
Homework 5
Mar 9 Summary of EM Algorithm
EM for MAP estimation
Kernel PCA
One Class Support Vector Machines
Learning Theory
Read:
Murphy 12.2.{0,1,2,3}, 14.4.4
Support Vector Method for Novelty Detection

Due:
Homework 6
Midterm Project Work
Mar 23
Work on your midterm projects.
Read:
None
Due:
None
Midterm Project Presentation
Mar 30
Be ready to present your midterm projects in class.
Read:
None
Due:
Midterm presentation and slides
Midterm Project Due (11:59 pm)
Mar 31
Your midterm projects must be sent to Prof. Gu via email by 11:59 pm. Your submission should include all relevant code and the .tex files for your essay.
Read:
None
Due:
Midterm project write-up.
Learning Theory
Apr 6
Bayesian Learning
Bayesian Logistic and Linear Regressions (review)
Bayesian Inference
Intractable Integrals and Motivation for Approximate Methods
Learning Theory
Read:
Large-Scale Sparse Principal Component Analysis with Application to Text Data
On the Convergence Properties of the EM Algorithm

Due:
Homework 7
Recommender Systems
Apr 13
Introduction to Recommender Systems
Collaborative Filtering
Non-Negative Matrix Factorization
Using Non-Negative Matrix Factorization for Topic Modelling
Read:
Murphy 27.6.2
Netflix Update: Try This at Home

Due:
Homework 8
Graph Methods
Apr 20
Additional topics will be covered in a workshop from 7:00 to 9:45 pm.
Read:
Murphy 10.{1,2,3,4,5,6}

Due:
Work on final project
Apr 27
Read:


Due:
May 4 or 11 (TBD) Final Project Presentation (Mon. 7-9:50 pm)
Due:
Final Project Presentation Slides
May 11 Final Project Due for non (Tue. 11:59 pm)
Due:
Finish writing up final project