2018 Fall CS5785 Cornell Tech

Learn and apply key concepts of modeling, analysis and validation from Machine Learning, Data Mining and Signal Processing to analyze and extract meaning from data. Implement algorithms and perform experiments on images, text, audio and mobile sensor measurements. Gain working knowledge of supervised and unsupervised techniques including classification, regression, clustering, feature selection, association rule mining and dimensionality reduction.

CS 2800 or equivalent, plus programming experience in Python or MATLAB, or permission of the instructor.

Prof. Nathan Kallus

Office hours: Tuesdays 1:30 PM - 2:30 PM, in Bloomberg Center 316

Andrew Bennett

Office hours: Wednesdays 3:30 PM - 4:30 PM and Thursdays 1:30 PM - 2:30 PM, in Bloomberg Center 375

Tuesdays and Thursdays, 10:55 AM - 12:10 PM, in Bloomberg Center 161

**Links:** CMS for homework submission, Slack for discussions.

**Required:**

T. Hastie, R. Tibshirani and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction (2nd edition), Springer-Verlag, 2008.

**Recommended:**

L. Wasserman, All of Statistics, Springer, 2004.

G. James, D. Witten, T. Hastie, R. Tibshirani, An Introduction to Statistical Learning, Springer, 2013.

Yaser S. Abu-Mostafa, Malik Magdon-Ismail, Hsuan-Tien Lin, Learning from Data, AMLBook, 2012.

P. Harrington, Machine Learning in Action, Manning, 2012.

H. Daumé III, A Course in Machine Learning, v0.8.

**Grade Breakdown:** Your grade will be determined by the assignments (30%), one prelim (30%), a final exam (30%), and participation, including scribing and in-class quizzes (10%).

**Homework:** There will be four assignments and an “assignment 0” for environment setup. Each assignment will have a due date for completion. Half of the points of the lowest-scoring assignment will count as extra credit, meaning the points received for homeworks 1, 2, 3, and 4 are calculated as (sum of scores) / 3.5.

**Late Policy:** Each student has a total of **one** slip day that may be used without penalty.

**External Code:** Unless otherwise specified, you are allowed to use well-known libraries such as *scikit-learn, scikit-image, numpy, scipy*, etc. in the assignments. Any reference to or copy of public code repositories should be properly cited in your submission (examples include *GitHub, Wikipedia, blogs*). In some assignments, you are NOT allowed to use any of the libraries above; please refer to the individual HW instructions for more details.

**Collaboration:** You are encouraged (but not required) to work in groups of no more than 2 students on each assignment. Please indicate the name of your collaborator at the top of each assignment and cite any references you used (including articles, books, code, websites, and personal communications). If you’re not sure whether to cite a source, err on the side of caution and cite it. You may submit just one writeup for the group. Remember not to plagiarize: all solutions must be written by members of the group.

**Quizzes:** There will be surprise in-class quizzes to make sure you attend and pay attention in class.

**Prelim: October 30** in class. The exam is closed book, but you are allowed to bring one sheet of written notes (Letter size, two-sided). You are allowed to use a calculator. The practice prelim is available here, and the slides used during the prelim review session are available here.

**Final Exam: November 29 through December 10.** The final exam is take-home, open-internet, but must be done by your own group with thorough citations of all references used. The Kaggle competition for the final exam is available here.
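
As a rough illustration of the homework and grade-breakdown arithmetic described above, here is a minimal Python sketch. It assumes every score (each homework, the prelim, the final, and participation) is expressed as a fraction between 0 and 1, which the syllabus does not state. The function names `homework_score` and `course_grade` are hypothetical, and since the syllabus does not say whether the combined homework score is capped, no cap is applied.

```python
def homework_score(scores):
    """Combine the scores of homeworks 1-4 as (sum of scores) / 3.5.

    Assumption: each score is a fraction in [0, 1]. Dividing by 3.5
    instead of 4 is what turns half of the lowest-scoring assignment
    into extra credit, so the result can exceed 1.
    """
    if len(scores) != 4:
        raise ValueError("expected scores for homeworks 1, 2, 3, and 4")
    return sum(scores) / 3.5


def course_grade(homework, prelim, final, participation):
    """Weighted total using the stated breakdown: 30/30/30/10.

    Assumption: all four inputs are fractions on the same 0-1 scale.
    """
    return 0.3 * homework + 0.3 * prelim + 0.3 * final + 0.1 * participation


if __name__ == "__main__":
    hw = homework_score([1.0, 1.0, 1.0, 0.5])
    print(f"homework component: {hw:.3f}")
    print(f"course grade: {course_grade(hw, 0.8, 0.9, 1.0):.3f}")
```

With homework scores of 1.0, 1.0, 1.0, and 0.5, the sum is 3.5, so the homework component comes out to exactly 1.0. A perfect record on all four gives 4 / 3.5, slightly above 1.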