Draft Schedule
Recordings of the lectures are accessible here.
Week 1
9/2: Intro Recording
syllabus
Jupyter Book
review survey
about me, my research, and a preview of the course
Week 2
9/9: Basic probability theory Recording
Random Variables
Probability space
Probability Mass and Density functions
Conditional Probability
Bayes' Theorem (see the sketch after this list)
Quantifying prior odds via betting
Incoherent beliefs
Axioms of probability
Examples
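For reference, a minimal numerical sketch of Bayes' theorem; the numbers are illustrative, not from lecture:

```python
# Minimal Bayes' theorem example: P(H|D) = P(D|H) P(H) / P(D).
# All numbers are illustrative.
p_h = 0.01          # prior P(H): a rare hypothesis
p_d_given_h = 0.95  # likelihood P(D|H)
p_d_given_not_h = 0.05

# law of total probability for P(D)
p_d = p_d_given_h * p_h + p_d_given_not_h * (1 - p_h)
posterior = p_d_given_h * p_h / p_d
print(f"P(H|D) = {posterior:.3f}")  # ~0.161: still unlikely despite the evidence
```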
Week 3
9/14: Class Recording
Conditional probability for continuous variables
Chain rule of probability
Sneak peek at graphical models
The Drake equation
Phosphine on Venus and Bayes' Theorem
Marginal Distributions
Independence
Empirical Distribution (see the numpy sketch after this list)
Expectation
Variance, Covariance, Correlation
Mutual Information
Simple Data Exploration
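A quick numpy sketch of empirical expectation, covariance, and correlation from samples (toy data, illustrative parameters):

```python
import numpy as np

# Empirical expectation, covariance, and correlation from samples.
# Toy data: a correlated bivariate Gaussian with illustrative parameters.
rng = np.random.default_rng(0)
cov = [[1.0, 0.8], [0.8, 2.0]]
x = rng.multivariate_normal(mean=[0.0, 1.0], cov=cov, size=10_000)

print(x.mean(axis=0))                # empirical expectation, ~[0, 1]
print(np.cov(x, rowvar=False))       # empirical covariance, ~cov
print(np.corrcoef(x, rowvar=False))  # correlation, off-diagonal ~0.8/sqrt(2)
```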
9/16: Class Recording
Likelihood
Change of variables
Demo: change of variables with autodiff (a jax sketch follows this list)
Independence and correlation
Conditioning
Autoregressive Expansion
Graphical Models
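A sketch of the change-of-variables demo using jax for the Jacobian; the log-normal transform here is an illustrative choice, not necessarily the one used in class:

```python
import jax
import jax.numpy as jnp
from jax.scipy.stats import norm

# Change of variables via autodiff: if x = g(z) with z ~ N(0, 1),
# then p_x(x) = p_z(g_inv(x)) * |d g_inv / dx|.
# Illustrative choice: g(z) = exp(z), so x is log-normal.
g_inv = lambda x: jnp.log(x)

def p_x(x):
    z = g_inv(x)
    jac = jax.grad(g_inv)(x)        # d g_inv / dx from autodiff
    return norm.pdf(z) * jnp.abs(jac)

print(p_x(1.5))   # ~0.245, matching scipy.stats.lognorm(s=1).pdf(1.5)
```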
Week 4
9/21: Recording
Change of variables formula
Probability Integral Transform (see the sketch after this list)
Intro to automatic differentiation
Demo with automatic differentiation
Transformation properties of the likelihood
Transformation properties of the MLE
Transformation properties of the prior and posterior
Transformation properties of the MAP
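A sketch of the probability integral transform; the exponential distribution is an arbitrary illustrative choice:

```python
import numpy as np
from scipy import stats

# Probability integral transform: if X ~ F, then U = F(X) is Uniform(0, 1).
rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=100_000)
u = stats.expon(scale=2.0).cdf(x)

counts, _ = np.histogram(u, bins=10, range=(0.0, 1.0))
print(counts / len(u))   # ~0.1 in every bin: U is uniform
```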
9/23: Estimators
Skipped material from last lecture
Lorentz-invariant phase space
Normalizing Flows
Copula
Bias, Variance, and Mean Squared Error
Simple Examples: Poisson and Gaussian
Cramér-Rao bound & Information Matrix
Bias-Variance tradeoff
James-Stein Demo (a minimal version appears after this list)
Shrinkage
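A minimal version of the James-Stein demo (toy setup; dimension, trial count, and true mean are illustrative):

```python
import numpy as np

# James-Stein shrinkage sketch: estimate the mean of a d-dim Gaussian
# (unit covariance) from one observation x per trial. For d >= 3 the
# JS estimator dominates the MLE (x itself) in mean squared error.
rng = np.random.default_rng(0)
d, n_trials = 10, 20_000
theta = rng.normal(size=d)                   # true mean, fixed across trials
x = theta + rng.normal(size=(n_trials, d))   # one observation per trial

norm2 = (x ** 2).sum(axis=1, keepdims=True)
js = (1 - (d - 2) / norm2) * x               # shrink toward the origin

mse_mle = ((x - theta) ** 2).sum(axis=1).mean()
mse_js = ((js - theta) ** 2).sum(axis=1).mean()
print(mse_mle, mse_js)   # the JS risk is smaller
```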
HW:
James Stein
Week 5
9/28 (Yom Kippur): Random Numbers Recording
Decision Theory
generalized decision rules (“for some prior”)
Consistency
Sufficiency
Exponential Family
Score Statistic
Information Matrix
Information Geometry
Transformation properties of Information Matrix
Jeffreys' prior (a worked Poisson example follows this list)
Transformation properties
Reference Prior
Sensitivity analysis
likelihood principle
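For reference, a standard worked example (mine, not from the lecture notes): for a Poisson count \(n \sim \mathrm{Pois}(\mu)\) the log-likelihood is \(\log L(\mu) = n\log\mu - \mu + \mathrm{const}\), so

\[
I(\mu) = -E\left[\frac{\partial^2 \log L}{\partial \mu^2}\right] = E\left[\frac{n}{\mu^2}\right] = \frac{1}{\mu},
\qquad
\pi_J(\mu) \propto \sqrt{I(\mu)} = \mu^{-1/2}.
\]

Under a reparametrization \(\lambda = \lambda(\mu)\) the information transforms as \(I(\lambda) = I(\mu)\,(d\mu/d\lambda)^2\), so \(\sqrt{I}\) picks up exactly the Jacobian a density needs: this is the sense in which the Jeffreys prior is invariant.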
9/30: Lecture 8: Consistency and homework
Neyman-Scott phenomenon (an example of an inconsistent MLE; see the sketch after this list)
Note: Elizabeth Scott was an astronomer by background. In 1957 Scott noted a bias in the observation of galaxy clusters. She noticed that for an observer to find a very distant cluster, it must contain brighter-than-normal galaxies and must also contain a large number of galaxies. She proposed a correction formula to adjust for (what came to be known as) the Scott effect.
Note: Revisiting the Neyman-Scott model: an Inconsistent MLE or an Ill-defined Model?
walkthrough of nbgrader and the homework assignment
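A minimal Neyman-Scott simulation (toy numbers) showing the inconsistency:

```python
import numpy as np

# Neyman-Scott sketch: pairs x_{i1}, x_{i2} ~ N(mu_i, sigma^2) with a separate
# nuisance mean mu_i for every pair. The MLE of sigma^2 converges to
# sigma^2 / 2, not sigma^2, no matter how many pairs we observe.
rng = np.random.default_rng(0)
sigma2, n_pairs = 4.0, 100_000
mu = rng.normal(size=(n_pairs, 1)) * 10.0
x = mu + rng.normal(size=(n_pairs, 2)) * np.sqrt(sigma2)

xbar = x.mean(axis=1, keepdims=True)    # MLE of each mu_i
sigma2_mle = ((x - xbar) ** 2).mean()   # MLE of sigma^2
print(sigma2_mle)   # ~2.0 = sigma^2 / 2: inconsistent
```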
Week 6
10/5: Lecture 9: Propagation of Errors
a simple example from physics 1: estimating \(g\)
Change of variables vs. Error propagation
Demo: error propagation fails (see the sketch after this list)
Error propagation and Marginalization
Convolution
Central Limit Theorem
Error propagation with correlation
track example
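A sketch contrasting linear error propagation with Monte Carlo for \(g = 2h/t^2\); the measured values and uncertainties are illustrative:

```python
import numpy as np

# Error propagation vs. Monte Carlo for g = 2h / t^2 (drop from height h,
# fall time t). Values and uncertainties are illustrative.
h, sigma_h = 1.80, 0.01      # meters
t, sigma_t = 0.60, 0.02      # seconds

g = 2 * h / t**2
# linear (first-order) propagation, assuming independent h and t
dg_dh = 2 / t**2
dg_dt = -4 * h / t**3
sigma_g_lin = np.sqrt((dg_dh * sigma_h)**2 + (dg_dt * sigma_t)**2)

# Monte Carlo propagation: push sampled measurements through the formula
rng = np.random.default_rng(0)
g_mc = 2 * rng.normal(h, sigma_h, 1_000_000) / rng.normal(t, sigma_t, 1_000_000)**2
print(g, sigma_g_lin, g_mc.mean(), g_mc.std())
# the MC mean sits slightly above g: the nonlinearity that linear propagation misses
```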
10/7: Lecture 10: Likelihood-based modeling
Building a probabilistic model for simple physics 1 example
Connection of MLE to traditional algebraic estimator
Connection to least squares regression
Week 7
10/12 Lecture 11: Sampling
Motivating examples:
Estimating high dimensional integrals and expectations
Bayesian credible intervals
Marginals are trivial with samples
Generating Random numbers
Scipy distributions
Probability Integral Transform
Accept-Reject MC
Acceptance and efficiency
native python loops vs. numpy broadcasting
Importance Sampling & Unweighting
Connection to Bayesian Credible Intervals
Metropolis-Hastings MCMC (a minimal sketch follows this list)
Proposal functions
Hamiltonian Monte Carlo
Excerpts from A Conceptual Introduction to Hamiltonian Monte Carlo by Michael Betancourt
Stan and PyMC3
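A minimal Metropolis-Hastings sketch (toy bimodal target, illustrative step size):

```python
import numpy as np

# Minimal Metropolis-Hastings with a Gaussian random-walk proposal.
# Target: an unnormalized bimodal density, chosen for illustration.
def log_target(x):
    return np.logaddexp(-0.5 * (x - 2)**2, -0.5 * (x + 2)**2)

rng = np.random.default_rng(0)
x, chain, step = 0.0, [], 1.0
for _ in range(50_000):
    proposal = x + step * rng.normal()       # symmetric proposal
    # accept with probability min(1, target(proposal) / target(x))
    if np.log(rng.uniform()) < log_target(proposal) - log_target(x):
        x = proposal
    chain.append(x)

chain = np.array(chain)
print(chain.mean(), chain.std())   # ~0 and ~sqrt(5) for this target
```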
10/14: Lecture 12: Hypothesis Testing and Confidence Intervals
Simple vs. Compound hypotheses
Type I and Type II errors
critical / acceptance region
Neyman-Pearson Lemma
Test statistics
Confidence Intervals
Interpretation
Coverage (a toy coverage check follows this list)
Power
No UMPU Tests
Neyman construction
Likelihood-Ratio tests
Connection to binary classification
prior and domain shift
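A toy coverage check for a Gaussian-mean confidence interval (illustrative numbers):

```python
import numpy as np
from scipy import stats

# Coverage sketch: a 68% confidence interval for a Gaussian mean should
# contain the true mean in ~68% of repeated experiments.
rng = np.random.default_rng(0)
mu_true, sigma, n = 3.0, 1.0, 10
z = stats.norm.ppf(0.84)   # central two-sided 68% interval

covered, n_toys = 0, 10_000
for _ in range(n_toys):
    x = rng.normal(mu_true, sigma, n)
    half = z * sigma / np.sqrt(n)
    covered += abs(x.mean() - mu_true) < half
print(covered / n_toys)    # ~0.68
```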
Week 8
10/19: Lecture 13
Simple vs. Compound hypotheses
Nuisance Parameters
Profile likelihood
Profile construction
Pivotal quantity
Asymptotic Properties of Likelihood Ratio
Wilks' theorem (a toy check follows this list)
Wald
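A toy check of Wilks' theorem for a Gaussian mean (illustrative sample sizes):

```python
import numpy as np
from scipy import stats

# Wilks' theorem sketch: under the null, -2 log LR for a Gaussian mean
# is n * (xbar - mu0)^2, which should follow a chi2 with 1 dof.
rng = np.random.default_rng(0)
mu0, n, n_toys = 0.0, 25, 100_000
xbar = rng.normal(mu0, 1.0, size=(n_toys, n)).mean(axis=1)
t = n * (xbar - mu0) ** 2       # -2 log likelihood ratio

# compare empirical quantiles to the chi2(1) prediction
for q in (0.68, 0.95):
    print(np.quantile(t, q), stats.chi2(df=1).ppf(q))
```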
10/21 Canceled
Week 9
10/26: Lecture 14
Upper Limits, Lower Limits, Central Intervals, Discovery
Power, Expected Limits, Bands
Sensitivity problem for upper limits
CLs
power-constrained limits
10/28: Lecture 15 flip-flopping, multiple testing
flip flopping
multiple testing
look elsewhere effect
Familywise error rate
False Discovery Rate
Hypothesis testing when nuisance parameter is present only under the alternative
Week 10
11/2 Lecture 16 Combinations, probabilistic modelling languages, probabilistic programming
Combinations
Combining p-values (a Fisher's-method sketch follows this list)
combining posteriors
likelihood-based combinations
likelihood publishing
probabilistic modelling languages
computational graphs
Probabilistic Programming
First order PPLs
Stan
Universal Probabilistic Programming
pyro
pyprob and ppx
Inference compilation
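A sketch of Fisher's method for combining independent p-values (illustrative inputs):

```python
import numpy as np
from scipy import stats

# Fisher's method for combining independent p-values:
# -2 * sum(log p_i) ~ chi2 with 2k dof under the null.
p_values = np.array([0.04, 0.20, 0.11])   # illustrative inputs

t = -2 * np.log(p_values).sum()
p_combined = stats.chi2(df=2 * len(p_values)).sf(t)
print(p_combined)   # ~0.03 for these inputs
```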
11/4 Lecture 17: Goodness of fit
conceptual framing
difference from hypothesis testing
chi-square test
Kolmogorov-Smirnov (a scipy sketch follows this list)
Anderson-Darling
Zhang’s tests
Bayesian Information Criteria
software
anomaly detection
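A scipy sketch of a KS goodness-of-fit test on toy data; note the caveat in the comments about parameters fitted on the same data:

```python
import numpy as np
from scipy import stats

# Goodness-of-fit sketch: KS test of data against a fitted Gaussian.
# Caveat: fitting the parameters on the same data biases the plain KS
# p-value; in practice one calibrates the distribution of the test
# statistic with toys.
rng = np.random.default_rng(0)
data = rng.normal(1.0, 2.0, size=500)

mu, sigma = stats.norm.fit(data)
print(stats.kstest(data, stats.norm(mu, sigma).cdf))
```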
Week 11
11/9: Lecture 18 Intro to machine learning
Supervised Learning
Statistical Learning Theory
Loss, Risk, Empirical Risk
Generalization
VC dimension and empirical risk minimization
No Free Lunch
Cross-validation and test/train splits
Preview: the mystery of deep learning
Least Squares
Regularized least squares (a ridge sketch follows this list)
Bayesian Curve fitting
Bias-Variance tradeoff
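A ridge-regression sketch with the closed-form solution, in the spirit of the curve-fitting example; the data, degree, and \(\lambda\) are illustrative:

```python
import numpy as np

# Regularized least squares (ridge): closed-form solution
# w = (X^T X + lam * I)^(-1) X^T y.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=50)
y = np.sin(np.pi * x) + 0.2 * rng.normal(size=50)

degree, lam = 9, 1e-3
X = np.vander(x, degree + 1)     # polynomial features
w = np.linalg.solve(X.T @ X + lam * np.eye(degree + 1), X.T @ y)
print(w)   # lam -> 0 recovers ordinary least squares; larger lam shrinks w
```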
11/11 Lecture 19
Generalization
Loss functions for regression
loss functions for classification
Information theory background
Entropy
Mutual information
cross entropy
Relative Entropy (see the sketch after this list)
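A small sketch relating entropy, cross entropy, and relative entropy for discrete distributions (illustrative probabilities):

```python
import numpy as np

# Entropy, cross entropy, and relative entropy (KL) for two discrete
# distributions, illustrating H(p, q) = H(p) + KL(p || q).
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])

H_p = -np.sum(p * np.log(p))           # entropy
H_pq = -np.sum(p * np.log(q))          # cross entropy
kl = np.sum(p * np.log(p / q))         # relative entropy KL(p || q)
print(H_p, H_pq, kl, H_p + kl - H_pq)  # last value ~0
```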
Week 12
11/16: Lecture 20 Density Estimation, Deep Generative Models
Unsupervised learning
Loss functions for density estimation
Divergences
KL Divergence
Fisher distance
Optimal Transport
Hellinger distance
f-divergences
Stein divergence
Maximum likelihood (Forward KL)
can be approximated with samples from the target; the target density is not needed
Variational Inference (Reverse KL)
Connection to statistical physics
LDA (Topic Modelling)
BBVI (black-box variational inference)
Deep Generative models
Normalizing Flows intro
background on auto-encoders
Variational Auto-encoder intro
11/18: Lecture 21 Deep Generative Models
Deep Generative models comparison
Normalizing Flows
Autoregressive models
Variational Auto-encoder
GANs
Week 13
11/23: Lecture 22 The data manifold
what it is and why it arises
in real data
in GANs etc.
How it complicates distances based on likelihood ratios
Optimal transport
11/25 Lecture 23 Optimization
Gradient descent
Momentum, Adam (a momentum sketch follows this list)
Differences between likelihood fits in classical statistics and the loss landscapes of deep learning models
intro to stochastic gradient descent and mini-batching
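A sketch of the gradient-descent-with-momentum update rules on a toy quadratic (illustrative hyperparameters):

```python
import numpy as np

# Gradient descent with momentum on a toy quadratic loss; a sketch of the
# update rules, not of any particular deep-learning fit.
def grad(w):
    return 2 * w * np.array([1.0, 10.0])   # ill-conditioned bowl

w, v = np.array([1.0, 1.0]), np.zeros(2)
lr, beta = 0.01, 0.9
for _ in range(200):
    v = beta * v + grad(w)   # accumulate a velocity
    w = w - lr * v
print(w)   # -> [0, 0], faster along the flat axis than plain gradient descent
```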
Week 14
11/30: Lecture 24: Stochastic gradient descent
Robbins-Monro
connection to Langevin dynamics and approximate Bayesian inference
12/2: Lecture 25: Implicit bias and regularization in learning algorithms
dynamics of gradient descent
Double descent
Week 15
12/7: Lecture 26: Deep Learning
Loss landscape
random matrix theory
connection to statistical mechanics
Deep Model Zoo
MLP
Convolutions
Sequence Models: RNN and Tree RNN
vanishing and exploding gradients
Graph Networks
Transformers
images, sets, sequences, graphs, hyper-graphs
DL and functional programming
Differentiable programming
12/9: Review
Review
Other topics that we touched on or planned to touch on.
I need to move some of these topics that we discussed into the schedule. This is a placeholder for now.
examples
unbinned likelihood exponential example
HW ideas
Conditional Distributions
Bernoulli to Binomial
Binomial to Poisson
Poisson to Gaussian
Product of Poissons vs. Multinomial
CLT to Extreme Value Theory
some other shrinkage?
Jeffreys' prior for examples
prior odds via betting example
Group Project: interactive Neyman construction demo
Simulation-based inference
ABC
Diggle
likelihood ratio
likelihood
posterior
Mining Gold
Topics to Reschedule
Parametric vs. non-parametric
Non-parametric
Histograms
Binomial / Poisson statistical uncertainty
weighted entries
Kernel Density Estimation
bandwidth and boundaries
K-D Trees
Parameterized
Unsupervised learning
Maximum likelihood
loss function
Neural Density Estimation
Adversarial Training
GANs
WGAN
Latent Variable Models
Simulators
Connections
graphical models
probability spaces
Change of variables
GANs
Classification
Binary vs. Multi-class classification
Loss functions
logistic regression
Softmax
Neural Networks
Domain Adaptation and Algorithmic Fairness
Kernel Machines and Gaussian Processes
Warm up with N-Dim Gaussian
Theory
Examples
Causal Inference
ladder of causality
simple examples
Domain shift, inductive bias
Statistical Invariance, pivotal quantities, Causal invariance
Elements of Causal Inference by Jonas Peters, Dominik Janzing, and Bernhard Schölkopf (free PDF)