# Draft Schedule

Recordings of the lectures are accessible here.

Week 1

9/2: Intro Recording

syllabus

Jupyter Book

review survey

about me, my research, and a preview of the course

Week 2

9/9: Basic prob theory Recording

Random Variables

Probability space

Probability Mass and Density functions

Conditional Probability

Bayes Theorem

Quantifying prior odds via betting

Incoherent beliefs

Axioms of probability

Examples
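To make Bayes' theorem concrete, here is a minimal Python sketch for the classic diagnostic-test example (the prevalence and error rates below are illustrative, not taken from the lecture):

```python
# Bayes' theorem for a diagnostic test: P(D|+) = P(+|D) P(D) / P(+)
p_d = 0.01          # prior probability of disease (illustrative)
p_pos_d = 0.95      # sensitivity, P(+|D)
p_pos_nd = 0.05     # false-positive rate, P(+|not D)

# law of total probability for the evidence P(+)
p_pos = p_pos_d * p_d + p_pos_nd * (1 - p_d)
p_d_pos = p_pos_d * p_d / p_pos                # posterior
print(f"P(disease | positive test) = {p_d_pos:.3f}")
```

With a 1% prior, even a 95%-sensitive test yields only about a 16% posterior probability of disease, which is the standard illustration of how strongly the prior odds matter.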

Week 3

9/14: Class Recording

Conditional probability for continuous variables

Chain rule of probability

Sneak peek at graphical models

The Drake equation

Phosphine on Venus and Bayes Theorem

Marginal Distributions

Independence

Empirical Distribution

Expectation

Variance, Covariance, Correlation

Mutual Information

Simple Data Exploration
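The covariance and correlation material above can be explored numerically; a small NumPy sketch on synthetic data (the coefficients 0.8 and 0.6 are chosen so that the true correlation is 0.8):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=10_000)
y = 0.8 * x + 0.6 * rng.normal(size=10_000)   # Var(y) = 1, corr(x, y) = 0.8

cov = np.cov(x, y)          # 2x2 sample covariance matrix
corr = np.corrcoef(x, y)    # 2x2 sample correlation matrix
print(cov[0, 1], corr[0, 1])
```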

9/16: Class Recording

Likelihood

Change of variables

Demo change of variables with autodiff

Independence and correlation

Conditioning

Autoregressive Expansion

Graphical Models

Week 4

9/21: Recording

Change of variables formula

Probability Integral Transform

Intro to automatic differentiation

Demo with automatic differentiation

Transformation properties of the likelihood

Transformation properties of the MLE

Transformation properties of the prior and posterior

Transformation properties of the MAP
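The equivariance of the MLE under reparameterization can be checked numerically. A sketch, assuming an exponential model parameterized either by its mean mu or by its rate lambda = 1/mu:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.exponential(scale=2.0, size=1000)   # true mean mu = 2

# the same model's log-likelihood in two parameterizations
def logL_mu(mu):
    return -len(x) * np.log(mu) - x.sum() / mu

def logL_lam(lam):
    return len(x) * np.log(lam) - lam * x.sum()

grid = np.linspace(0.1, 5.0, 20_000)
mu_hat = grid[np.argmax(logL_mu(grid))]
lam_hat = grid[np.argmax(logL_lam(grid))]
print(mu_hat, 1 / lam_hat)
```

Maximizing either log-likelihood on a grid gives estimates related by lam_hat ≈ 1/mu_hat: the MLE transforms along with the parameter.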

9/23: Estimators

Skipped material from last lecture

Lorentz-invariant phase space

Normalizing Flows

Copula

Bias, Variance, and Mean Squared Error

Simple Examples: Poisson and Gaussian

Cramér-Rao bound & Information Matrix

Bias-Variance tradeoff

James-Stein Demo

Shrinkage

HW:

James Stein
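For the James-Stein homework, a Monte Carlo comparison of the MLE with the positive-part James-Stein estimator is easy to set up (the dimension, trial count, and choice of true means below are arbitrary illustrative values):

```python
import numpy as np

rng = np.random.default_rng(2)
d, n_trials = 10, 2000
theta = rng.normal(size=d)            # fixed true means
mse_mle, mse_js = 0.0, 0.0
for _ in range(n_trials):
    x = theta + rng.normal(size=d)    # one unit-variance observation per component
    # positive-part James-Stein shrinkage factor
    shrink = max(0.0, 1 - (d - 2) / np.sum(x**2))
    js = shrink * x
    mse_mle += np.sum((x - theta) ** 2)
    mse_js += np.sum((js - theta) ** 2)
print(mse_mle / n_trials, mse_js / n_trials)
```

The shrunken estimator comes out with lower total squared error than the MLE, which is the James-Stein paradox in miniature.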

Week 5

9/28 (Yom Kippur): Random Numbers Recording

Decision Theory

generalized decision rules (“for some prior”)

Consistency

Sufficiency

Exponential Family

Score Statistic

Information Matrix

Information Geometry

Transformation properties of Information Matrix

Jeffreys’ prior

Transformation properties

Reference Prior

Sensitivity analysis

likelihood principle

9/30: Lecture 8: Consistency and homework

Neyman-Scott phenomenon (an example of an inconsistent MLE)

Note: Elizabeth Scott was an astronomer by background. In 1957 Scott noted a bias in the observation of galaxy clusters. She noticed that for an observer to find a very distant cluster, it must contain brighter-than-normal galaxies and must also contain a large number of galaxies. She proposed a correction formula to adjust for (what came to be known as) the Scott effect.

Note: Revisiting the Neyman-Scott model: an Inconsistent MLE or an Ill-defined Model?

walkthrough of nbgrader and the homework assignment

Week 6

10/5: Lecture 9: Propagation of Errors

a simple example from physics 1: estimating \(g\)

Change of variables vs. Error propagation

Demo: where error propagation fails

Error propagation and Marginalization

Convolution

Central Limit Theorem

Error propagation with correlation

track example
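The pendulum version of the physics-1 example can be sketched as follows: first-order error propagation for g = 4 pi^2 L / T^2, checked against brute-force Monte Carlo (the measured values and uncertainties are invented for illustration):

```python
import numpy as np

# assumed measurements: length L and period T with Gaussian uncertainties
L, sig_L = 1.00, 0.01      # meters
T, sig_T = 2.00, 0.02      # seconds

g = 4 * np.pi**2 * L / T**2

# first-order error propagation: sig_g^2 = (dg/dL)^2 sig_L^2 + (dg/dT)^2 sig_T^2
dg_dL = 4 * np.pi**2 / T**2
dg_dT = -8 * np.pi**2 * L / T**3
sig_g_linear = np.hypot(dg_dL * sig_L, dg_dT * sig_T)

# Monte Carlo propagation for comparison
rng = np.random.default_rng(3)
g_mc = 4 * np.pi**2 * rng.normal(L, sig_L, 100_000) / rng.normal(T, sig_T, 100_000) ** 2
print(sig_g_linear, g_mc.std())
```

For small relative uncertainties the two agree well; the Monte Carlo version keeps working when the linearization breaks down.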

10/7: Lecture 10: Likelihood-based modeling

Building a probabilistic model for simple physics 1 example

Connection of MLE to traditional algebraic estimator

Connection to least squares regression

Week 7

10/12 Lecture 11: Sampling

Motivating examples:

Estimating high dimensional integrals and expectations

Bayesian credible intervals

Marginals are trivial with samples

Generating Random numbers

Scipy distributions

Probability Integral Transform

Accept-Reject MC

Acceptance and efficiency

native Python loops vs. NumPy broadcasting
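A vectorized accept-reject sketch (target density p(x) = 2x on [0, 1] with a uniform proposal and envelope M = 2, chosen for simplicity):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100_000
# target p(x) = 2x on [0, 1]; uniform proposal q with envelope M = 2
x = rng.uniform(0, 1, n)
u = rng.uniform(0, 1, n)
accepted = x[u < (2 * x) / 2.0]      # accept with probability p(x) / (M q(x))
efficiency = accepted.size / n       # expect 1/M = 0.5
print(efficiency, accepted.mean())   # mean of p(x) = 2x is 2/3
```

The acceptance efficiency comes out near 1/M = 0.5, matching the envelope argument, and the whole thing runs as a single broadcasted operation rather than a Python loop.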

Importance Sampling & Unweighting

Connection to Bayesian Credible Intervals

Metropolis Hastings MCMC

Proposal functions
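A minimal random-walk Metropolis-Hastings sampler in plain NumPy (standard normal target, unit-scale Gaussian proposal; both choices are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)

def log_target(x):
    return -0.5 * x**2          # standard normal, up to a constant

samples = np.empty(50_000)
x, logp = 0.0, log_target(0.0)
for i in range(samples.size):
    prop = x + rng.normal(scale=1.0)               # symmetric random-walk proposal
    logp_prop = log_target(prop)
    if np.log(rng.uniform()) < logp_prop - logp:   # Metropolis accept/reject step
        x, logp = prop, logp_prop
    samples[i] = x                                 # repeat current state on rejection

print(samples.mean(), samples.std())
```

Because the proposal is symmetric, the Hastings correction drops out and only the target ratio appears in the acceptance probability.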

Hamiltonian Monte Carlo

Excerpts from A Conceptual Introduction to Hamiltonian Monte Carlo by Michael Betancourt

Stan and PyMC3

10/14: Lecture 12: Hypothesis Testing and Confidence Intervals

Simple vs. Compound hypotheses

Type I and Type II errors

critical / acceptance region

Neyman-Pearson Lemma

Test statistics

Confidence Intervals

Interpretation

Coverage
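Coverage can be demonstrated directly by simulation; a sketch for the textbook 95% interval on a Gaussian mean with known sigma (sample size and true mean are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(6)
n, trials, mu = 10, 5000, 3.0
covered = 0
for _ in range(trials):
    x = rng.normal(mu, 1.0, n)
    half = 1.96 / np.sqrt(n)          # 95% interval half-width, known sigma = 1
    covered += (x.mean() - half <= mu <= x.mean() + half)
print(covered / trials)               # fraction of intervals containing the true mu
```

The fraction of intervals that contain the true value lands near the nominal 0.95, which is what "coverage" means operationally.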

Power

No UMPU Tests

Neyman-Construction

Likelihood-Ratio tests

Connection to binary classification

prior and domain shift

Week 8

10/19: Lecture 13:

Simple vs. Compound hypotheses

Nuisance Parameters

Profile likelihood

Profile construction

Pivotal quantity

Asymptotic Properties of Likelihood Ratio

Wilks

Wald

10/21 Canceled

Week 9

10/26: Lecture 14

Upper Limits, Lower Limits, Central Limits, Discovery

Power, Expected Limits, Bands

Sensitivity problem for upper limits

CLs

power-constrained limits

10/28: Lecture 15 flip-flopping, multiple testing

flip flopping

multiple testing

look elsewhere effect

Familywise error rate

False Discovery Rate

Hypothesis testing when nuisance parameter is present only under the alternative

Week 10

11/2 Lecture 16 Combinations, probabilistic modelling languages, probabilistic programming

Combinations

Combining p-values

combining posteriors

likelihood-based combinations

likelihood publishing

probabilistic modelling languages

computational graphs

Probabilistic Programming

First order PPLs

Stan

Universal Probabilistic Programming

pyro

pyprob and ppx

Inference compilation

11/4 Lecture 17: Goodness of fit

conceptual framing

difference to hypothesis testing

chi-square test

Kolmogorov-Smirnov
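The KS statistic is simple enough to compute by hand; a NumPy-only sketch (using the exact normal CDF via erf rather than any library test function):

```python
import numpy as np
from math import erf, sqrt

def normal_cdf(x):
    return 0.5 * (1 + erf(x / sqrt(2)))

rng = np.random.default_rng(7)
x = np.sort(rng.normal(size=1000))

# KS statistic: maximum distance between the ECDF and the hypothesized CDF
cdf = np.array([normal_cdf(v) for v in x])
n = x.size
d_plus = np.max(np.arange(1, n + 1) / n - cdf)   # ECDF above the CDF
d_minus = np.max(cdf - np.arange(0, n) / n)      # ECDF below the CDF
D = max(d_plus, d_minus)
print(D)
```

For data actually drawn from the hypothesized distribution, D is O(1/sqrt(n)); large values signal a poor fit.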

Anderson-Darling

Zhang’s tests

Bayesian Information Criteria

software

anomaly detection

Week 11

11/9: Lecture 18 Intro to machine learning

Supervised Learning

Statistical Learning Theory

Loss, Risk, Empirical Risk

Generalization

VC dimension and Empirical risk minimization

No Free Lunch

Cross-validation test/train

Preview: the mystery of deep learning

Least Squares

Regularized least squares
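Regularized least squares has a closed form, which a few lines of NumPy make concrete (the design matrix, true weights, and lambda = 1 are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(8)
n, p = 50, 10
X = rng.normal(size=(n, p))
w_true = np.zeros(p)
w_true[:3] = [1.0, -2.0, 0.5]            # sparse true weights
y = X @ w_true + 0.1 * rng.normal(size=n)

lam = 1.0
# ridge solution: w = (X^T X + lam I)^{-1} X^T y
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
w_ols = np.linalg.lstsq(X, y, rcond=None)[0]
print(np.linalg.norm(w_ridge), np.linalg.norm(w_ols))
```

The penalty shrinks every singular-value direction by s^2 / (s^2 + lambda), so the ridge solution always has a smaller norm than the unregularized one.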

Bayesian Curve fitting

Bias-Variance tradeoff

11/11 Lecture 19

Generalization

Loss functions for regression

loss function for classification

Information theory background

Entropy

Mutual information

cross entropy

Relative Entropy
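The quantities above satisfy the identity H(p, q) = H(p) + D_KL(p || q); a tiny discrete example in NumPy:

```python
import numpy as np

p = np.array([0.5, 0.25, 0.25])   # "true" distribution
q = np.array([1, 1, 1]) / 3       # model distribution

entropy_p = -np.sum(p * np.log2(p))        # H(p) in bits
cross_entropy = -np.sum(p * np.log2(q))    # H(p, q)
kl = np.sum(p * np.log2(p / q))            # D_KL(p || q), relative entropy

print(entropy_p, cross_entropy, kl)
```

Here H(p) = 1.5 bits exactly, and the cross entropy decomposes into entropy plus relative entropy, which is why minimizing cross-entropy loss is equivalent to minimizing the KL divergence to the data distribution.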

Week 12

11/16: Lecture 20 Density Estimation, Deep Generative Models

Unsupervised learning

Loss functions for density estimation

Divergences

KL Divergence

Fisher distance

Optimal Transport

Hellinger distance

f-divergences

Stein divergence

Maximum likelihood (Forward KL)

can approximate with samples, don’t need target distribution

Variational Inference (Reverse KL)

Connection to statistical physics

LDA (Topic Modelling)

BBVI

Deep Generative models

Normalizing Flows intro

background on auto-encoders

Variational Auto-encoder intro

11/18: Lecture 21 Deep Generative Models

Deep Generative models comparison

Normalizing Flows

Autoregressive models

Variational Auto-encoder

GANs

Week 13

11/23: Lecture 22 The data manifold

what is it, why is it there

in real data

in GANs etc.

How it complicates distances based on likelihood ratios

Optimal transport

11/25 Lecture 23 Optimization

Gradient descent

Momentum, Adam

Differences of likelihood fits in classical statistics and loss landscape of deep learning models

stochastic gradient descent and mini-batching intro

what is it
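A minimal mini-batch SGD sketch for a one-dimensional linear model (learning rate, batch size, and epoch count are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(9)
n = 1000
x = rng.uniform(-1, 1, n)
y = 2.0 * x + 1.0 + 0.1 * rng.normal(size=n)   # data with slope 2, intercept 1

w, b, lr, batch = 0.0, 0.0, 0.1, 32
for epoch in range(200):
    idx = rng.permutation(n)                   # reshuffle each epoch
    for start in range(0, n, batch):
        j = idx[start:start + batch]
        err = w * x[j] + b - y[j]
        # gradients of the mean squared error on this mini-batch
        w -= lr * 2 * np.mean(err * x[j])
        b -= lr * 2 * np.mean(err)
print(w, b)
```

Each update uses a noisy gradient estimate from a small batch, which is what makes the iterates a stochastic approximation in the Robbins-Monro sense discussed next.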

Week 14

11/30: Lecture 24 Stochastic gradient descent

Robbins-Monro

connection to Langevin dynamics and approximate Bayesian inference

12/2: Lecture 25 Implicit bias and regularization in learning algorithms

dynamics of gradient descent

Double descent

Week 15

12/7 Lecture 26 Deep Learning

Loss landscape

random matrix theory

connection to statistical mechanics

Deep Model Zoo

MLP

Convolutions

Sequence Models: RNN and Tree RNN

vanishing and exploding gradients

Graph Networks

Transformers

images, sets, sequences, graphs, hyper-graphs

DL and functional programming

Differentiable programming

12/9: Review

Review

## Other topics that we touched on or planned to touch on

I need to move some of the topics that we discussed into the schedule. This is a placeholder for now.

examples

unbinned likelihood exponential example

HW ideas

Conditional Distributions

Bernoulli to Binomial

Binomial to Poisson

Poisson to Gaussian

Product of Poissons vs. Multinomial

CLT to Extreme Value Theory

some other shrinkage?

Jeffreys for examples

prior odds via betting example

Group Project: interactive Neyman-Construction Demo

Simulation-based inference

ABC

Diggle

likelihood ratio

likelihood

posterior

Mining Gold

Topics to Reschedule

Parametric vs. non-parametric

Non-parametric

Histograms

Binomial / Poisson statistical uncertainty

weighted entries

Kernel Density Estimation

bandwidth and boundaries

K-D Trees

Parameterized

Unsupervised learning

Maximum likelihood

loss function

Neural Density Estimation

Adversarial Training

GANs

WGAN

Latent Variable Models

Simulators

Connections

graphical models

probability spaces

Change of variables

GANs

Classification

Binary vs. Multi-class classification

Loss functions

logistic regression

Softmax

Neural Networks

Domain Adaptation and Algorithmic Fairness

Kernel Machines and Gaussian Processes

Warm up with N-Dim Gaussian

Theory

Examples

Causal Inference

ladder of causality

simple examples

Domain shift, inductive bias

Statistical Invariance, pivotal quantities, Causal invariance

Elements of Causal Inference by Jonas Peters, Dominik Janzing and Bernhard Schölkopf free PDF