CAP5778: Advanced Data Mining

Instructor: Peixiang Zhao



Assignment Information


Project Information

  1. Similarity Search
    • A unified framework for string similarity search with edit-distance constraint. (VLDB Journal'16)
    • Adaptive Top-k Overlap Set Similarity Joins. (ICDE'20)
    • Indexing Metric Spaces for Exact Similarity Search. (ACM Computing Surveys'22)
    • Similarity query processing for high-dimensional data. (VLDB'20)
    • MinSearch: An Efficient Algorithm for Similarity Search under Edit Distance. (KDD'20)
    • A Two-Level Signature Scheme for Stable Set Similarity Join. (VLDB'23)
    • A scalable index for top-k subtree similarity queries. (SIGMOD'19)
    • Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. (CACM'08)
  2. Data Streams
    • What is Data Sketching, and Why Should I Care? (CACM'17)
    • Network Applications of Bloom Filters: A Survey (Internet Mathematics'03)
    • Mergeable Summaries (TODS'13)
    • Efficient Frequent Directions Algorithm for Sparse Matrices (KDD'16)
    • Cuckoo filter: Practically better than Bloom. (CoNEXT'14)
  3. PageRank
    • Estimating Single-Node PageRank in Õ(min{d_t, √m}) Time (VLDB'23)
    • Efficient Algorithms for Personalized PageRank Computation: A Survey (TKDE'24)
    • Massively parallel algorithms for personalized pagerank (VLDB'21)
  4. Graph Embedding
    • DeepWalk: Online Learning of Social Representations (KDD'14)
    • LINE: Large-scale Information Network Embedding (WWW'15)
    • node2vec: Scalable Feature Learning for Networks (KDD'16)
  5. Generative Adversarial Nets (GAN)
    • Generative Adversarial Nets. (NeurIPS'14)
    • Wasserstein GAN. (arXiv'17)
    • Are GANs Created Equal? (NeurIPS'18)