# Statistics and Data Science

This is the start of a book for a graduate-level course at NYU Physics titled *Statistics and Data Science*.

Here are some of the objectives of this course:

 * **Learn essential concepts of probability**

    * Become familiar with how intuitive notions of probability are connected to formal foundations. 
    * Overcome barriers presented by unfamiliar notation and terminology.
    * Internalize the transformation properties of distributions, the likelihood function, and other probabilistic objects. 
    * Understand the differences between Bayesian and frequentist approaches, particularly in the context of physical theories.
    * Connect these concepts to modern data science tools and techniques, such as the scientific Python ecosystem and automatic differentiation.

 * **Learn essential concepts of statistics**

    * Learn classical statistical procedures: point estimates, goodness-of-fit tests, hypothesis tests, confidence intervals, and credible intervals.
    * Become familiar with statistical decision theory.
    * Recognize probabilistic programs as statistical models.
    * Become familiar with the computational challenges found in statistical inference and the techniques developed to overcome them.
    * Understand the difference between statistical association and causal inference.

 * **Learn essential concepts of software and computing**

    * Become familiar with the scientific Python ecosystem.
    * Become familiar with software testing through the use of nbgrader.
    * Become familiar with automatic differentiation and differentiable programming.
    * Become familiar with probabilistic programming.

 * **Learn essential concepts of machine learning**

    * Become familiar with core tasks such as classification and regression.
    * Understand the notion of generalization.
    * Understand the role of regularization and inductive bias.
    * Become familiar with the taxonomy of models found in machine learning: linear models, kernel methods, neural networks, and deep learning.
    * Become familiar with the interplay of model, data, and learning (optimization) algorithms.
    * Touch on different learning settings: supervised learning, unsupervised learning, and reinforcement learning.

 * **Learn essential concepts of data science**

    * Understand how data science connects to the topics above.
    * Gain confidence in using scientific Python and modern data science tools to analyze real data.

```{warning} The class website is under active development, and content will be added throughout the course.
```


```{tip} If you would like to audit this class, email Prof. Cranmer (kyle.cranmer at nyu) with your NYU NetID.
```

```{note}
In approaching this book, I am torn between different styles. I very much like the atomic nature of [Quantum Field Theory by Mark Srednicki](https://www.amazon.com/Quantum-Field-Theory-Mark-Srednicki/dp/0521864496), as it is readable and serves as a useful reference without too much narrative. On the other hand, I want to blend hands-on coding elements with fundamental concepts, and I am inspired by the book [Functional Differential Geometry by Gerald Jay Sussman and Jack Wisdom](https://mitpress.mit.edu/books/functional-differential-geometry).
```