CAP5778: Advanced Data Mining (Spring 2022)
Instructor: Peixiang Zhao
Assignment Information
- There will be four assignments, each designed to test your understanding of the material taught. An assignment may be either a programming task or a written analysis.
- All students are expected to follow the FSU Academic Honor Code.
- All assignments follow the "no-late" policy; that is, assignments received after the due time will receive zero credit.
Project Information
- The semester-long project involves a systematic study of a data mining research topic: reading and understanding scientific publications, then writing a survey-like summary of that topic.
- The project must be done individually.
- The deliverables are (1) a project proposal (1 to 2 pages): 1.5 points; (2) a project presentation (15-minute video): 4.5 points; (3) a project report (5 to 7 pages, single column, LaTeX preferred): 9 points.
- Some recommended topics (and readings) are as follows:
- Tree-based ensemble learning
  - XGBoost: A Scalable Tree Boosting System. KDD'16
  - LightGBM: A Highly Efficient Gradient Boosting Decision Tree. NeurIPS'17
  - CatBoost: Unbiased Boosting with Categorical Features. NeurIPS'18
- Frequent graph pattern mining
  - gSpan: Graph-Based Substructure Pattern Mining. ICDM'02
  - A Quickstart in Frequent Structure Mining Can Make a Difference. KDD'04
  - Flexible and Feasible Support Measures for Mining Frequent Patterns in Large Labeled Graphs. SIGMOD'17
- Generative Adversarial Nets (GAN)
  - Generative Adversarial Nets. NeurIPS'14
  - Wasserstein GAN. arXiv'17
  - Are GANs Created Equal? NeurIPS'18
- Graph embedding
  - DeepWalk: Online Learning of Social Representations. KDD'14
  - LINE: Large-scale Information Network Embedding. WWW'15
  - node2vec: Scalable Feature Learning for Networks. KDD'16
- Data sketching for data streams
  - Mergeable Summaries. TODS'13
  - Efficient Frequent Directions Algorithm for Sparse Matrices. KDD'16
  - A High-Performance Algorithm for Identifying Frequent Items in Data Streams. IMC'17