Danica J. Sutherland (she)

Assistant Professor, UBC Computer Science; Canada CIFAR AI Chair, Amii

UBC Machine Learning, AML-TN, MILD (ML theory), CAIDA (AI), PIHOT/Kantorovich Initiative (optimal transport)
Queer in AI, Name Change Policy Working Group

dsuth[a t]cs.ubc.ca or djs[a t]djsutherland.ml
CV orcid github crossvalidated bsky mastodon

Prospective students: Like most North American schools, we only accept applications through the departmental process, with a deadline of December 15th. There is no need to email me about admissions; due to the volume of emails, I will probably not reply.

Anyone with specific research connections / questions / etc should feel free to get in touch at any time, via email / Bluesky DM / whatever.

Trans and gender-expansive or other queer people, also please reach out whenever, about specific things or just saying hi. Consider using my personal email (the djsutherland.ml one) for privacy reasons; Bluesky DMs or Queer in AI's Slack are also good.

I was previously at TTIC (non-tenure-track faculty, affiliated with Nati Srebro), Gatsby (postdoc with Arthur Gretton), and CMU (PhD with Jeff Schneider).

Publications and selected talks are listed below.

You may come across various old items referring to me with a different first name. Please only use the name Danica to cite or refer to me, and check that your old .bib entries are correct, e.g. by replacing them with the entries here.

Group

Wonho Bae (PhD, 2020–)
Mohamad Bazzi (undergrad intern, 2025)
Zheng He (PhD, 2023–)
Bingshan Hu (joint postdoc, 2023–)
Yi (Joshua) Ren (PhD, 2020–)
Hamed Shirzad (PhD, 2021–)
Aaron Wei (BSc, 2024–)
Nathaniel Xu (PhD, 2023–)

Alumni:

Achinth Bharadwaj (BSc 2023)
Namrata Deka (MSc 2023)
Anubhav Garg (course MSc, 2024)
Milad Jalali Asadabadi (MSc 2023)
Arsh Jhaj (course MSc, 2023)
Mohamad Amin Mohamadi (MSc 2023)

Courses

CPSC 532D Modern Statistical Learning Theory: Fall 25, Fall 24, Fall 23, Fall 22, Spring 22 (as 532S)
CPSC 440/550 Advanced Machine Learning: Spring 25, Spring 24, Spring 23 (as 440/540)
CPSC 340 Machine Learning and Data Mining: Fall 21 (with Mike Gelbart)

Publications

Below, ** denotes equal contribution, and this colour one of my students. Also available as a .bib file, and most of these are on Semantic Scholar. If you must (but I'd rather you didn't), here's Google Scholar.

Preprints

Learning Representations for Independence Testing. Nathaniel Xu, Feng Liu, and Danica J. Sutherland. Preprint 2025.

[arXiv]

Efficient kernelized bandit algorithms via exploration distributions. Bingshan Hu, Zheng He, and Danica J. Sutherland. Preprint 2025.

[arXiv]

On the Effect of Negative Gradient in Group Relative Deep Reinforcement Optimization. Wenlong Deng, Yi Ren, Muchen Li, Danica J. Sutherland, Xiaoxiao Li, and Christos Thrampoulidis. Preprint 2025.

[arXiv]

Practical Kernel Tests of Conditional Independence. Roman Pogodin, Antonin Schrab, Yazhe Li, Danica J. Sutherland, and Arthur Gretton. Preprint 2024.

[arXiv]

Journal and Low-Acceptance-Rate Conference Papers

Uncertainty Herding: One Active Learning Method for All Label Budgets. Wonho Bae, Danica J. Sutherland**, and Gabriel Oliveira**. International Conference on Learning Representations (ICLR) 2025.

[official] [arXiv]

Learning Dynamics of LLM Finetuning. Yi Ren and Danica J. Sutherland. International Conference on Learning Representations (ICLR) 2025. Outstanding Paper Award.

[official] [arXiv]

Even Sparser Graph Transformers. Hamed Shirzad, Honghao Lin, Balaji Venkatachalam, Ameya Velingker, David P. Woodruff, and Danica J. Sutherland. Neural Information Processing Systems (NeurIPS) 2024.

[arXiv]

Bias Amplification in Language Model Evolution: An Iterated Learning Perspective. Yi Ren, Shangmin Guo, Linlu Qiu, Bailin Wang, and Danica J. Sutherland. Neural Information Processing Systems (NeurIPS) 2024.

[arXiv]

Generalized Coverage for More Robust Low-Budget Active Learning. Wonho Bae, Junhyug Noh, and Danica J. Sutherland. European Conference on Computer Vision (ECCV) 2024.

[arXiv]

Exploring Active Learning in Meta-Learning: Enhancing Context Set Labeling. Wonho Bae, Jing Wang, and Danica J. Sutherland. European Conference on Computer Vision (ECCV) 2024.

[arXiv]

Differentially Private Neural Tangent Kernels (DP-NTK) for Privacy-Preserving Data Generation. Yilin Yang, Kamil Adamczewski, Xiaoxiao Li, Danica J. Sutherland, and Mijung Park. Journal of Artificial Intelligence Research (JAIR) 2024.

[arXiv]

AdaFlood: Adaptive Flood Regularization. Wonho Bae, Yi Ren, Mohamad Osama Ahmed, Frederick Tung, Danica J. Sutherland, and Gabriel Oliveira. Transactions on Machine Learning Research (TMLR) 2024.

[official] [arXiv]

Why Do You Grok? A Theoretical Analysis on Grokking Modular Addition. Mohamad Amin Mohamadi, Zhiyuan Li, Lei Wu, and Danica J. Sutherland. International Conference on Machine Learning (ICML) 2024.

[arXiv]

Optimistic Rates: A Unifying Theory for Interpolation Learning and Regularization in Linear Regression. Lijia Zhou**, Frederic Koehler**, Danica J. Sutherland, and Nathan Srebro. ACM/IMS Journal of Data Science (JDS) 1, 2. 2024.

[arXiv]

Improving Compositional Generalization using Iterated Learning and Simplicial Embeddings. Yi Ren, Samuel Lavoie, Mikhail Galkin, Danica J. Sutherland, and Aaron Courville. Neural Information Processing Systems (NeurIPS) 2023.

[arXiv]

Exphormer: Scaling Graph Transformers with Expander Graphs. Hamed Shirzad**, Ameya Velingker**, Balaji Venkatachalam**, Danica J. Sutherland, and Ali Kemal Sinop. International Conference on Machine Learning (ICML) 2023.

[arXiv] [blog post]

A Fast, Well-Founded Approximation to the Empirical Neural Tangent Kernel. Mohamad Amin Mohamadi, Wonho Bae, and Danica J. Sutherland. International Conference on Machine Learning (ICML) 2023.

[arXiv]

Queer in AI: A Case Study in Community-Led Participatory AI. Organizers of QueerInAI, Anaelia Ovalle, Arjun Subramonian, Ashwin Singh, Claas Voelcker, Danica J. Sutherland, Davide Locatelli, Eva Breznik, Filip Klubička, Hang Yuan, Hetvi J, Huan Zhang, Jaidev Shriram, Kruno Lehman, Luca Soldaini, Maarten Sap, Marc Peter Deisenroth, Maria Leonor Pacheco, Maria Ryskina, Martin Mundt, Melvin Selim Atay, Milind Agarwal, Nyx McLean, Pan Xu, A Pranav, Raj Korpan, Ruchira Ray, Sarah Mathew, Sarthak Arora, ST John, Tanvi Anand, Vishakha Agrawal, William Agnew, Yanan Long, Zijie J. Wang, Zeerak Talat, Avijit Ghosh, Nathaniel Dennler, Michael Noseworthy, Sharvani Jha, Emi Baylor, Aditya Joshi, Natalia Y. Bilenko, Andrew McNamara, Raphael Gontijo-Lopes, Alex Markham, Evyn Dǒng, Jackie Kay, Manu Saraswat, Nikhil Vytla, and Luke Stark. ACM Conference on Fairness, Accountability, and Transparency (FAccT) 2023. Best Paper Award.

[arXiv]

Efficient Conditionally Invariant Representation Learning. Roman Pogodin**, Namrata Deka**, Yazhe Li**, Danica J. Sutherland, Victor Veitch, and Arthur Gretton. International Conference on Learning Representations (ICLR) 2023. Selected as notable (top 5%), i.e. as an oral.

[official] [arXiv]

How to prepare your task head for finetuning. Yi Ren, Shangmin Guo, Wonho Bae, and Danica J. Sutherland. International Conference on Learning Representations (ICLR) 2023.

[official] [arXiv]

MMD-B-Fair: Learning Fair Representations with Statistical Testing. Namrata Deka and Danica J. Sutherland. Artificial Intelligence and Statistics (AISTATS) 2023.

[arXiv]

Pre-trained Perceptual Features Improve Differentially Private Image Generation. Frederik Harder, Milad Jalali Asadabadi, Danica J. Sutherland, and Mijung Park. Transactions on Machine Learning Research (TMLR) 2023.

[official] [arXiv]

A Non-Asymptotic Moreau Envelope Theory for High-Dimensional Generalized Linear Models. Lijia Zhou**, Frederic Koehler**, Pragya Sur, Danica J. Sutherland, and Nathan Srebro. Neural Information Processing Systems (NeurIPS) 2022.

[arXiv]

Making Look-Ahead Active Learning Strategies Feasible with Neural Tangent Kernels. Mohamad Amin Mohamadi**, Wonho Bae**, and Danica J. Sutherland. Neural Information Processing Systems (NeurIPS) 2022.

[arXiv]

Evaluating Graph Generative Models with Contrastively Learned Features. Hamed Shirzad, Kaveh Hassani, and Danica J. Sutherland. Neural Information Processing Systems (NeurIPS) 2022.

[arXiv]

Object Discovery via Contrastive Learning for Weakly Supervised Object Detection. Jinhwan Seo, Wonho Bae, Danica J. Sutherland, Junhyug Noh, and Daijin Kim. European Conference on Computer Vision (ECCV) 2022.

[arXiv] [code]

One Weird Trick to Improve Your Semi-Weakly Supervised Semantic Segmentation Model. Wonho Bae, Junhyug Noh, Milad Jalali Asadabadi, and Danica J. Sutherland. International Joint Conference on Artificial Intelligence (IJCAI) 2022.

[arXiv]

Better Supervisory Signals by Observing Learning Paths. Yi Ren, Shangmin Guo, and Danica J. Sutherland. International Conference on Learning Representations (ICLR) 2022.

[official] [arXiv]

Uniform Convergence of Interpolators: Gaussian Width, Norm Bounds and Benign Overfitting. Frederic Koehler**, Lijia Zhou**, Danica J. Sutherland, and Nathan Srebro. Neural Information Processing Systems (NeurIPS) 2021. Selected for oral presentation.

[official] [arXiv] [NYU talk]

Self-Supervised Learning with Kernel Dependence Maximization. Yazhe Li**, Roman Pogodin**, Danica J. Sutherland, and Arthur Gretton. Neural Information Processing Systems (NeurIPS) 2021.

[official] [arXiv]

Meta Two-Sample Testing: Learning Kernels for Testing with Limited Data. Feng Liu**, Wenkai Xu**, Jie Lu, and Danica J. Sutherland. Neural Information Processing Systems (NeurIPS) 2021.

[official] [arXiv]

POT: Python Optimal Transport. Rémi Flamary, Nicolas Courty, Alexandre Gramfort, Mokhtar Z. Alaya, Aurélie Boisbunon, Stanislas Chambon, Laetitia Chapel, Adrien Corenflos, Kilian Fatras, Nemo Fournier, Léo Gautheron, Nathalie T.H. Gayraud, Hicham Janati, Alain Rakotomamonjy, Ievgen Redko, Antoine Rolet, Antony Schutz, Vivien Seguy, Danica J. Sutherland, Romain Tavenard, Alexander Tong, and Titouan Vayer. Journal of Machine Learning Research (JMLR) 2021.

[official]

Does Invariant Risk Minimization Capture Invariance? Pritish Kamath, Akilesh Tangella, Danica J. Sutherland, and Nathan Srebro. Artificial Intelligence and Statistics (AISTATS) 2021. Selected for oral presentation.

[official] [arXiv]

On Uniform Convergence and Low-Norm Interpolation Learning. Lijia Zhou, Danica J. Sutherland, and Nathan Srebro. Neural Information Processing Systems (NeurIPS) 2020. Selected for spotlight presentation.

[official] [arXiv] [Penn State talk] [NYU talk]

Learning Deep Kernels for Non-Parametric Two-Sample Tests. Feng Liu**, Wenkai Xu**, Jie Lu, Guangquan Zhang, Arthur Gretton, and Danica J. Sutherland. International Conference on Machine Learning (ICML) 2020.

[official] [arXiv] [video and slides]

Learning deep kernels for exponential family densities. Li Wenliang**, Danica J. Sutherland**, Heiko Strathmann, and Arthur Gretton. International Conference on Machine Learning (ICML) 2019.

[official] [arXiv] [poster] [slides]

On gradient regularizers for MMD GANs. Michael Arbel**, Danica J. Sutherland**, Mikołaj Bińkowski, and Arthur Gretton. Neural Information Processing Systems (NeurIPS) 2018.

[official] [arXiv] [poster] [code] [UMass talk]

Demystifying MMD GANs. Mikołaj Bińkowski**, Danica J. Sutherland**, Michael Arbel, and Arthur Gretton. International Conference on Learning Representations (ICLR) 2018.

[official] [arXiv] [poster] [code] [UMass talk]

Efficient and principled score estimation with Nyström kernel exponential families. Danica J. Sutherland**, Heiko Strathmann**, Michael Arbel, and Arthur Gretton. Artificial Intelligence and Statistics (AISTATS) 2018. Selected for oral presentation.

[official] [arXiv] [slides] [poster] [code] [Turing talk] [UCL talk] [Gatsby Tri-Center talk]

Bayesian Approaches to Distribution Regression. Ho Chung Leon Law**, Danica J. Sutherland**, Dino Sejdinovic, and Seth Flaxman. Artificial Intelligence and Statistics (AISTATS) 2018.

[official] [arXiv] [poster] [code] [NeurIPS workshop version]

Generative Models and Model Criticism via Optimized Maximum Mean Discrepancy. Danica J. Sutherland, Hsiao-Yu Tung, Heiko Strathmann, Soumyajit De, Aaditya Ramdas, Alex Smola, and Arthur Gretton. International Conference on Learning Representations (ICLR) 2017.

[official] [arXiv] [code] [poster] [ICML workshop talk] [DALI talk] [Oxford talk]

Dynamical Mass Measurements of Contaminated Galaxy Clusters Using Machine Learning. Michelle Ntampaka, Hy Trac, Danica J. Sutherland, Sebastian Fromenteau, Barnabás Póczos, and Jeff Schneider. The Astrophysical Journal (ApJ) 831, 2, 135. 2016.

[doi] [arXiv]

Linear-time Learning on Distributions with Approximate Kernel Embeddings. Danica J. Sutherland**, Junier B. Oliva**, Barnabás Póczos, and Jeff Schneider. AAAI Conference on Artificial Intelligence (AAAI) 2016.

[official] [arXiv] [poster] [NeurIPS workshop version]

On the Error of Random Fourier Features. Danica J. Sutherland and Jeff Schneider. Uncertainty in Artificial Intelligence (UAI) 2015. Chapter 3 / Section 4.1 of my thesis supersedes this paper, fixing a few errors in constants and providing more results.

[official] [arXiv] [poster]

Active Pointillistic Pattern Search. Yifei Ma**, Danica J. Sutherland**, Roman Garnett, and Jeff Schneider. Artificial Intelligence and Statistics (AISTATS) 2015.

[official] [pdf] [appendix] [poster] [NeurIPS workshop version]

A Machine Learning Approach for Dynamical Mass Measurements of Galaxy Clusters. Michelle Ntampaka, Hy Trac, Danica J. Sutherland, Nicholas Battaglia, Barnabás Póczos, and Jeff Schneider. The Astrophysical Journal (ApJ) 803, 2, 50. 2015.

[doi] [arXiv]

Active learning and search on low-rank matrices. Danica J. Sutherland, Barnabás Póczos, and Jeff Schneider. Knowledge Discovery and Data Mining (KDD) 2013. Selected for oral presentation.

[doi] [pdf] [poster] [slides] [code]

Nonparametric kernel estimators for image classification. Barnabás Póczos, Liang Xiong, Danica J. Sutherland, and Jeff Schneider. Computer Vision and Pattern Recognition (CVPR) 2012.

[doi] [pdf]

Managing User Requests with the Grand Unified Task System (GUTS). Andrew Stromme, Danica J. Sutherland, Alexander Burka, Benjamin Lipton, Nicholas Felt, Rebecca Roelofs, Daniel-Elia Feist-Alexandrov, Steve Dini, and Allen Welkie. Large Installation System Administration (LISA) 2012. Work done as part of the Swarthmore College Computer Society.

[official] [pdf] [slides] [talk]

Dissertations

Scalable, Flexible, and Active Learning on Distributions. Danica J. Sutherland. Committee: Jeff Schneider, Barnabás Póczos, Maria-Florina Balcan, and Arthur Gretton. Computer Science Department, Carnegie Mellon University. Ph.D. thesis, 2016.

[pdf]

Integrating Human Knowledge into a Relational Learning System. Danica J. Sutherland. Computer Science Department, Swarthmore College. B.A. thesis, 2011.

[pdf]

Technical Reports, Posters, etc.

Token Hidden Reward: Steering Exploration-Exploitation in GRPO Training. Wenlong Deng, Yi Ren, Danica J. Sutherland, Christos Thrampoulidis, and Xiaoxiao Li. AI for Math (AI4MATH), ICML 2025. Best Paper Award.

On the Effect of Negative Gradient in Group Relative Deep Reinforcement Optimization. Wenlong Deng, Yi Ren, Muchen Li, Danica J. Sutherland, Xiaoxiao Li, and Christos Thrampoulidis. AI for Math (AI4MATH), ICML 2025.

[full version]

Normalization Matters for Optimization Performance on Graph Neural Networks. Alan Milligan, Frederik Kunstner, Hamed Shirzad, Mark Schmidt, and Danica J. Sutherland. Optimization for Machine Learning (OPT), NeurIPS 2024.

A Theory for Compressibility of Graph Transformers for Transductive Learning. Hamed Shirzad, Honghao Lin, Ameya Velingker, Balaji Venkatachalam, David P. Woodruff, and Danica J. Sutherland. Machine Learning and Compression, NeurIPS 2024.

[arXiv]

Understanding Simplicity Bias towards Compositional Mappings via Learning Dynamics. Yi Ren and Danica J. Sutherland. Compositional Learning: Perspectives, Methods, and Paths Forward, NeurIPS 2024.

[arXiv]

Differentially Private Neural Tangent Kernels for Privacy-Preserving Data Generation. Yilin Yang, Kamil Adamczewski, Danica J. Sutherland, Xiaoxiao Li, and Mijung Park. Privacy-Preserving Artificial Intelligence (PPAI-24), AAAI 2024.

[full version]

Low-Width Approximations and Sparsification for Scaling Graph Transformers. Hamed Shirzad, Balaji Venkatachalam, Ameya Velingker, Danica J. Sutherland, and David P. Woodruff. New Frontiers in Graph Learning, NeurIPS 2023.

[full version]

Grokking modular arithmetic can be explained by margin maximization. Mohamad Amin Mohamadi, Zhiyuan Li, Lei Wu, and Danica J. Sutherland. Mathematics of Modern Machine Learning, NeurIPS 2023.

[full version]

Learning Privacy-Preserving Deep Kernels with Known Demographics. Namrata Deka and Danica J. Sutherland. Privacy-Preserving Artificial Intelligence (PPAI-22), AAAI 2022.

[full version]

How to Make Virtual Conferences Queer-Friendly: A Guide. Organizers of QueerInAI, A Pranav, MaryLena Bleile, Arjun Subramonian, Luca Soldaini, Danica J. Sutherland, Sabine Weber, and Pan Xu. Workshop on Widening NLP (WiNLP), EMNLP 2021.

[web]

Unbiased estimators for the variance of MMD estimators. Danica J. Sutherland and Namrata Deka. Technical report 2019.

[arXiv]

The Role of Machine Learning in the Next Decade of Cosmology. Michelle Ntampaka, Camille Avestruz, Steven Boada, João Caldeira, Jessi Cisewski-Kehe, Rosanne Di Stefano, Cora Dvorkin, August E. Evrard, Arya Farahi, Doug Finkbeiner, Shy Genel, Alyssa Goodman, Andy Goulding, Shirley Ho, Arthur Kosowsky, Paul La Plante, François Lanusse, Michelle Lochner, Rachel Mandelbaum, Daisuke Nagai, Jeffrey A. Newman, Brian Nord, J. E. G. Peek, Austin Peel, Barnabás Póczos, Markus Michael Rau, Aneta Siemiginowska, Danica J. Sutherland, Hy Trac, and Benjamin Wandelt. White paper 2019.

[arXiv]

Bayesian Approaches to Distribution Regression. Ho Chung Leon Law**, Danica J. Sutherland**, Dino Sejdinovic, and Seth Flaxman. Learning on Distributions, Functions, Graphs and Groups, NeurIPS 2017. Selected for oral presentation.

[official] [full version] [slides] [poster]

Fixing an error in Caponnetto and de Vito (2007). Danica J. Sutherland. Technical report 2017.

[arXiv]

Understanding the 2016 US Presidential Election using ecological inference and distribution regression with census microdata. Seth Flaxman, Danica J. Sutherland, Yu-Xiang Wang, and Yee Whye Teh. Technical report 2016.

[arXiv] [code] [microdata analysis package]

List Mode Regression for Low Count Detection. Jay Jin, Kyle Miller, Danica J. Sutherland, Simon Labov, Karl Nelson, and Artur Dubrawski. IEEE Nuclear Science Symposium (IEEE NSS/MIC) 2016.

Linear-time Learning on Distributions with Approximate Kernel Embeddings. Danica J. Sutherland**, Junier B. Oliva**, Barnabás Póczos, and Jeff Schneider. Feature Extraction: Modern Questions and Challenges, NeurIPS 2015.

[full version] [pdf] [poster]

Deep Mean Maps. Junier B. Oliva**, Danica J. Sutherland**, Barnabás Póczos, and Jeff Schneider. Technical report 2015.

[arXiv]

Active Pointillistic Pattern Search. Yifei Ma**, Danica J. Sutherland**, Roman Garnett, and Jeff Schneider. Bayesian Optimization (BayesOpt), NeurIPS 2014.

[full version] [pdf] [poster]

Kernels on Sample Sets via Nonparametric Divergence Estimates. Danica J. Sutherland, Liang Xiong, Barnabás Póczos, and Jeff Schneider. Technical report 2012.

[arXiv]

Grounding Conceptual Knowledge with Spatio-Temporal Multi-Dimensional Relational Framework Trees. Matthew Bodenhamer, Thomas Palmer, Danica J. Sutherland, and Andrew H. Fagg. Technical report 2012.

[pdf]

Invited talks

Slides for conference and workshop talks that present a specific paper are linked next to that paper above.

Local Learning Dynamics Help Explain (Post-)Training Behaviour. June 2025. University of Pennsylvania. Related papers: ICLR-25, Preprint 2025, ICLR-22, ICLR-23, NeurIPS-22, ICML-23.

[slides] [slides (pdf)]

Local Learning Dynamics Help Explain (Post-)Training Behaviour. June 2025. Machine Learning and Optimization seminar, Mila. Related papers: ICLR-25, Preprint 2025, ICLR-22, ICLR-23, NeurIPS-22, ICML-23.

[slides] [slides (pdf)]

Data-efficient learning, in general and in LLM preference tuning. February 2025. Snowflake. Related papers: ECCV-24, ICLR-25, ICLR-25.

[slides]

Expander Graphs and Low-Distortion Embeddings for Learning on Graphs. December 2024. Mathematics of Machine Learning, CMS. Related papers: ICML-23, NeurIPS-24, Machine Learning and Compression 2024.

[slides]

Scaling Graph Transformers with Sparse and Sparsified Attention. November 2024. Learning on Graphs Vancouver Meetup (LoG-Vancouver). Related papers: ICML-23, NeurIPS-24, Machine Learning and Compression 2024.

[slides]

Scaling Graph Transformers with Expander Graphs. June 2024. Simon Fraser University, AI seminar. Related papers: ICML-23, NeurIPS-24.

[slides]

Conditional independence measures for fairer, more reliable models. February 2024. Statistical Aspects of Trustworthy Machine Learning, Banff International Research Station. Related papers: ICLR-23, Preprint 2024.

[video] [slides]

Learning conditionally independent representations with kernel regularizers. June 2023. Lifting Inference with Kernel Embeddings (LIKE23), University of Bern. Related papers: ICLR-23.

[slides]

[Lecture] (Deep) Kernel Mean Embeddings for Representing and Learning on Distributions. June 2023. Lifting Inference with Kernel Embeddings (LIKE23), University of Bern.

[slides] [slides (pdf)]

Learning conditionally independent representations with kernel regularizers. June 2023. Gatsby25. Related papers: ICLR-23.

[slides]

Are these datasets different? Two-sample testing for data scientists. April 2023. Pacific Conference on Artificial Intelligence (PCA). Related papers: AISTATS-23, NeurIPS-21, ICML-20, ICLR-17.

[slides] [slides (pdf)]

A Defense of (Empirical) Neural Tangent Kernels. March 2023. AI Seminar, University of Michigan. Related papers: ICLR-22, ICLR-23, NeurIPS-22, ICML-23.

[slides] [slides (pdf)]

Post-Publication Name Change Policies, Why they Matter, and Whether they Work. March 2023. Robotics DEI Seminar, University of Michigan.

[slides] [slides (pdf)]

In Defence of (Empirical) Neural Tangent Kernels. March 2023. Microsoft Research Montréal. Related papers: ICLR-22, ICLR-23, NeurIPS-22, ICML-23.

[slides] [slides (pdf)]

[Lecture] Neural Tangent Kernels, Finite and Infinite. February 2023. Winter School on Deep Learning, Indian Statistical Institute.

[slides]

Name Change Policies: A Brief (Personal) Tour. November 2022. Queer in AI Workshop, NeurIPS.

[slides] [slides (pdf)]

[Lecture] Modern Kernel Methods in Machine Learning. October 2022. Research School on Uncertainty in Scientific Computing, Corsica (ETICS).

[slides 1] [slides 1 (pdf)] [slides 2] [slides 2 (pdf)] [practical materials]

Are These Datasets The Same? Learning Kernels for Efficient and Fair Two-sample Tests. April 2022. Toronto Womxn in Data Science Conference. Related papers: NeurIPS-21, ICML-20, ICLR-17.

[slides] [slides (pdf)]

Are These Datasets The Same? Learning Kernels for Efficient and Fair Two-sample Tests. February 2022. TrustML Young Scientist Seminars. Related papers: NeurIPS-21, ICML-20, ICLR-17.

[video] [slides] [slides (pdf)]

Better deep learning (sometimes) by learning kernel mean embeddings. January 2022. Lifting Inference with Kernel Embeddings (LIKE22), University of Bern. Related papers: NeurIPS-21, NeurIPS-21.

[slides] [slides (pdf)]

Can Uniform Convergence Explain Interpolation Learning? November 2021. NYU Center for Data Science Lunch Seminar Series. Related papers: NeurIPS-20, NeurIPS-21.

[slides] [slides (pdf)]

Deep kernel-based distances between distributions. January 2021. Kickoff Workshop, Pacific Interdisciplinary Hub on Optimal Transport. Related papers: ICML-20, ICLR-17, NeurIPS-18, ICLR-18.

[slides] [slides (pdf)]

[Lecture] Kernel Methods: From Basics to Modern Applications. January 2021. Data Science Summer School (DS3), École Polytechnique, Paris.

[slides] [slides (pdf)] [practical materials]

Can Uniform Convergence Explain Interpolation Learning? October 2020. Penn State, Statistics colloquium. Related papers: NeurIPS-20.

[slides] [slides (pdf)]

[Tutorial] Interpretable Comparison of Distributions and Models. December 2019. Neural Information Processing Systems (NeurIPS). Related papers: ICML-20, ICLR-17. With Arthur Gretton and Wittawat Jitkrittum.

[video] [slides 1] [slides 2] [slides 3]

Better GANs by Using Kernels. October 2019. University of Massachusetts Amherst, College of Information and Computer Sciences. Related papers: ICLR-18, NeurIPS-18.

[slides]

Kernel distances between distributions for generative models. July 2019. Distance Metrics and Mass Transfer Between High Dimensional Point Clouds, ICIAM. Related papers: ICLR-18, NeurIPS-18.

[slides]

[Lecture] Learning with Positive Definite Kernels: Theory, Algorithms and Applications. June 2019. Data Science Summer School (DS3), École Polytechnique, Paris. With Bharath Sriperumbudur.

[materials]

[Lecture] Introduction to Generative Adversarial Networks. June 2019. Machine Learning Crash Course (MLCC), University of Genoa.

[slides]

Kernel Distances for Better Deep Generative Models. September 2018. Advances in Kernel Methods, GPSS. Related papers: ICLR-18, NeurIPS-18.

[slides]

Better GANs by using the MMD. June 2018. Facebook AI Research New York. Related papers: ICLR-18, NeurIPS-18.

[slides]

Efficiently Estimating Densities and Scores with Kernel Exponential Families. June 2018. Gatsby Tri-Center Meeting. Related papers: AISTATS-18.

[slides]

Better GANs by using the MMD. June 2018. Machine Learning reading group, Google New York. Related papers: ICLR-18, NeurIPS-18.

[slides]

Better GANs by using the MMD. June 2018. Machine Learning reading group, Columbia University. Related papers: ICLR-18, NeurIPS-18. No slides actually used at the talk because of a projector mishap, but they would have been the same as the Google talk.

Advances in GANs based on the MMD. May 2018. Machine Learning Seminar, University of Sheffield. Related papers: ICLR-18, NeurIPS-18.

[slides] [abstract]

Efficient and principled score estimation with kernel exponential families. December 2017. Approximating high dimensional functions, Alan Turing Institute. Related papers: AISTATS-18.

[video] [slides]

Efficient and principled score estimation with kernel exponential families. December 2017. Computational Statistics and Machine Learning seminar, University College London. Related papers: AISTATS-18.

[slides]

Evaluating and Training Implicit Generative Models with Two-Sample Tests. August 2017. Implicit Models, ICML. Related papers: ICLR-17.

[slides]

Two-Sample Tests, Integral Probability Metrics, and GAN Objectives. April 2017. Theory of Generative Adversarial Networks, DALI. Related papers: ICLR-17.

[video] [slides]

Generative Models and Model Criticism via Optimized Maximum Mean Discrepancy. February 2017. Computational Statistics and Machine Learning seminar, Oxford University. Related papers: ICLR-17.

[slides]