Statistics and Data Science

This is the start of a book for a graduate-level course at NYU Physics titled Statistics and Data Science.

Here are some of the objectives of this course:

  • Learn essential concepts of probability

    • Become familiar with how intuitive notions of probability are connected to formal foundations.

    • Overcome barriers presented by unfamiliar notation and terminology.

    • Internalize the transformation properties of distributions, the likelihood function, and other probabilistic objects.

    • Understand the differences between Bayesian and frequentist approaches, particularly in the context of physical theories.

    • Connect these concepts to modern data science tools and techniques such as the scientific Python ecosystem and automatic differentiation.

  • Learn essential concepts of statistics

    • Learn classical statistical procedures: point estimates, goodness-of-fit tests, hypothesis tests, confidence intervals, and credible intervals.

    • Become familiar with statistical decision theory.

    • Recognize probabilistic programs as statistical models.

    • Become familiar with the computational challenges found in statistical inference and techniques developed to overcome them.

    • Understand the difference between statistical associations and causal inference.

  • Learn essential concepts of software and computing

    • Become familiar with the scientific Python ecosystem

    • Become familiar with software testing through the use of nbgrader

    • Become familiar with automatic differentiation and differentiable programming

    • Become familiar with probabilistic programming

  • Learn essential concepts of machine learning

    • Become familiar with core tasks such as classification and regression

    • Understand the notion of generalization

    • Understand the role of regularization and inductive bias

    • Become familiar with the taxonomy of different types of models found in machine learning: linear models, kernel methods, neural networks, deep learning

    • Become familiar with the interplay of model, data, and learning (optimization) algorithms

    • Touch on different learning settings: supervised learning, unsupervised learning, reinforcement learning

  • Learn essential concepts of data science

    • Understand how data science connects to the topics above

    • Gain confidence in using scientific Python and modern data science tools to analyze real data

Warning

Please note that the class website is under active development, and content will be added throughout the duration of the course.

Tip

If you would like to audit this class, email Prof. Cranmer (kyle.cranmer at nyu) with your NYU NetID.

Note

In approaching this book I am torn between different styles. I very much like the atomic nature of Quantum Field Theory by Mark Srednicki: it is readable and serves as a useful reference without too much narrative. On the other hand, I want to blend hands-on coding with fundamental concepts, and I am inspired by the book Functional Differential Geometry by Gerald Jay Sussman and Jack Wisdom.