CSE 5243: Introduction to Data Mining (SP20, Wed/Fri 9:35-10:55am, Caldwell Lab 171)

Instructor: Yu Su

Teaching Assistants: Jiaqi Xu (xu.1629)

Level and credits: U/G, 3

Prerequisites: Introduction to Databases, Introduction to Algorithms, or grad standing or permission of instructor

Office hours and locations (Instructor): Fri 11:00AM-12:15PM, Dreese Labs 783

Office hours and locations (TA): Wed 03:00pm-04:00pm, Baker 418

Description

Introduction to the knowledge discovery process, key data mining techniques, and efficient high-performance mining algorithms, with exposure to applications of data mining.

Grading Plan (Note: All deadlines are 11:59PM (midnight) on the due date. No late submissions!)

Textbooks

No required textbook. Recommended books for reading:

Academic Integrity Policy

Academic integrity is essential to maintaining an environment that fosters excellence in teaching, research, and other educational and scholarly activities. Thus, The Ohio State University and the Committee on Academic Misconduct (COAM) expect that all students have read and understand the University’s Code of Student Conduct, and that all students will complete all academic and scholarly assignments with fairness and honesty. Students must recognize that failure to follow the rules and guidelines established in the University’s Code of Student Conduct and this syllabus may constitute “Academic Misconduct.” For more info, click here.

Course Syllabus and Schedule (To be updated later)

Week | Date | Topic | Assignment Out | Assignment Due | Lecture Notes
1 | 01/08 | Class Outline | | | Chapter 1 (Han et al.)
1 | 01/10 | Introduction | | | Chapter 1 (Han et al.)
2 | 01/15 | Review of Basic Probability and Statistics Concepts | | | Review of Probability Theory
2 | 01/17 | Data & Data Preprocessing | Assignment 1 | | Chapter 3 (Han et al.), Jupyter Notebook Tutorial
3 | 01/22 | Data Preprocessing and Classification: Basic Concepts/Methods | | | Chapter 8 (Han et al.)
3 | 01/24 | Classification: Basic Concepts/Methods | | |
4 | 01/29 | Classification: Basic Concepts/Methods | | Assignment 1 |
4 | 01/31 | Classification: Basic Concepts/Methods | Assignment 2 | | 20 Newsgroup Dataset - Training
5 | 02/05 | Classification: Advanced Methods | | | Chapter 9 (Han et al.)
5 | 02/07 | Clustering: Basic Concepts/Methods | | | Chapter 10 (Han et al.)
6 | 02/12 | Clustering: Basic Concepts/Methods | | |
6 | 02/14 | Clustering: Basic Concepts/Methods | | |
7 | 02/19 | Clustering: Basic Concepts/Methods | | Assignment 2 |
7 | 02/21 | Homework Discussion + Midterm Review | | |
8 | 02/24 | | Assignment 3 | |
8 | 02/26 | Midterm Exam | | |
8 | 02/28 | Mining Frequent Patterns and Associations: Basic Concepts | | | Chapter 6 (Han et al.)
9 | 03/04 | Mining Frequent Patterns and Associations: Basic Concepts | | |
9 | 03/06 | Mining Frequent Patterns and Associations: Basic Concepts | | |
10 | 03/11 | Spring Break | | |
10 | 03/13 | Spring Break | | |
11 | 03/18 | Spring Break | | |
11 | 03/20 | Spring Break | | |
12 | 03/25 | Mining Frequent Patterns and Associations: Basic Methods | | Assignment 3 |
12 | 03/27 | Mining Frequent Patterns and Associations: Advanced Methods | | | Chapter 7 (Han et al.)
13 | 04/01 | Mining Frequent Patterns and Associations: Advanced Methods | | |
13 | 04/03 | Introduction to Graphs | | |
14 | 04/08 | Word Embedding | | |
14 | 04/10 | Word Embedding | | |
15 | 04/15 | Advanced Word Embedding | | |
15 | 04/17 | Locality-Sensitive Hashing | | | Chapter 4 (Leskovec et al.)
16 | 04/22 | Locality-Sensitive Hashing | | |
16 | 04/24 | Review Session | | |
17 | 04/27 | Final Exam (10:00am-11:45am) | | |

Course slides are partly adapted from similar courses offered by Prof. Jiawei Han at UIUC, Prof. Srinivasan Parthasarathy and Prof. Huan Sun at OSU, Prof. Yizhou Sun at UCLA, and Prof. Yijun Zhao at Northeastern University, and from the books listed above.