MACHINE LEARNING FOR PHYSICS AND THE NATURAL SCIENCES - Parma 2021

Code of Conduct: Diversity is considered a resource that enriches us culturally and intellectually in this class. No instances of harassment or attempts to marginalize students will be tolerated in my class. Be respectful and collaborate instead of competing. If you have concerns, please come talk to me.
Course Description
This course will teach you the basics of data-driven inference in the physical sciences. You will learn examples of machine learning methods applied to current problems in Physics and the Natural Sciences. You will acquire basic computational skills, knowledge of statistical analysis and error analysis, good practices for handling, processing, and analyzing data (including big data) programmatically, and communication and visualization skills. Some of the simpler algorithms will be explored in detail and implemented from scratch; others will be implemented through dedicated Python libraries.

Don't worry about how much you already know, and especially do not compare it to what other students know. You may have the wrong perception of your skills and of the skills of your classmates. Your strengths and the strengths of your background may be less obvious than, say, coding or math, but they are just as important for a Physical Scientist. Some of you may have a good handle on some or all of the physics problems we will study, others a good handle on coding, others yet an easy time understanding the details of the analysis. All these components make for proficiency in this class, and all these components make a good scientist. We, the class assistants and I, are here to help you develop the skills you do not yet have and strengthen the skills you already have.

Learning Outcomes
By the end of this class you should be able to formulate an appropriate analysis plan for a research question, select, gather, and prepare data for analysis, and choose and apply machine learning methods to the data.


The instructor is Dr. Federica Bianco (fbianco@udel.edu)
office hours: TBD

Resources
The primary textbooks are:
Elements of Statistical Learning, Hastie, Tibshirani, Friedman, Springer 2001
Statistics, Data Mining, and Machine Learning in Astronomy, Ivezić, Connolly, VanderPlas, Gray, Princeton Press 2nd edition
ML in Python: Hands-on Machine Learning with Scikit-Learn, Keras, and TensorFlow. This is probably the book closest to the syllabus in terms of techniques, but don't buy it: the second edition is due to come out imminently, and the deep learning chapters of the previous edition are now out of date.


In addition, depending on your familiarity with coding, statistics, and visualization:
Python Data Science Handbook, Jake VanderPlas, O'Reilly Media [https://www.oreilly.com/library/view/python-data-science/9781491912126/]
computing and coding: Beginning Python Visualization, 2009
Interactive Data Visualization, S. Murray, O'Reilly Media
Visualizations: Visualization Analysis and Design, T. Munzner, 2014


Each week you will attend classes, which will include lectures and hands-on lab work. Attendance in lecture and lab is mandatory.


Technology
Google Colaboratory will be used for the class. Homework can be developed on any platform as long as the computational setup is consistent across the entire class: the class assistants and I need to be able to reproduce your work and obtain the same results. Modules and libraries used in your work need to be accessible to me, the graders, and your classmates. We may also provide a Docker image and a virtual environment, with instructions on how to set up your environment, to allow you to work offline.
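
One way to make a notebook easier for others to reproduce is to record the versions of the libraries it uses and to fix the random seed in its first cell. The following is only a minimal sketch using the Python standard library: the function name environment_report, the package list, and the seed value are illustrative assumptions, not course requirements.

```python
import importlib.metadata
import platform
import random


def environment_report(packages):
    """Return a dict mapping 'python' and each package name to its installed version."""
    report = {"python": platform.python_version()}
    for name in packages:
        try:
            report[name] = importlib.metadata.version(name)
        except importlib.metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report


# Hypothetical package list: replace it with whatever your notebook imports.
PACKAGES = ["numpy", "scipy", "pandas", "matplotlib", "scikit-learn"]

# Arbitrary seed for the standard-library generator, so random draws repeat on every run.
# Libraries with their own generators (NumPy, for instance) must be seeded separately.
random.seed(302)

for name, version in environment_report(PACKAGES).items():
    print(f"{name}: {version}")
```

Printing this report at the top of a homework notebook lets a grader see at a glance whether their environment matches yours.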

The course will be organized in a modular fashion, with some guest lectures. Each machine learning method will be studied as it is applied to a physical problem, based on open data and literature examples.

Homework will be accepted exclusively through GitHub.
Homework projects must be turned in as IPython notebooks by checking them into your GitHub account in the DSPS_/HW_ repo (unless otherwise stated).

Assessment
Grades are based on

10% pre-class questions
15% lab performance and participation
20% homework
20% midterm
35% final
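
As a worked example of how these weights combine, here is a minimal sketch. It assumes every component is scored on a 0-100 scale; the function name course_grade and the example scores are hypothetical.

```python
# Weights copied from the grade breakdown above (fractions of the final grade).
WEIGHTS = {
    "pre_class_questions": 0.10,
    "labs": 0.15,
    "homework": 0.20,
    "midterm": 0.20,
    "final": 0.35,
}


def course_grade(scores):
    """Combine per-component scores (each on a 0-100 scale) into one weighted grade."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores, for illustration only.
example = {
    "pre_class_questions": 80,
    "labs": 90,
    "homework": 85,
    "midterm": 70,
    "final": 75,
}
print(round(course_grade(example), 2))  # 8 + 13.5 + 17 + 14 + 26.25 = 78.75
```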

Weekly assignments will be handed out at the end of the class and are due strictly before the first class of the following week (no submissions can be accepted after that, as the homework may be reviewed in class). Please come to class on time: at the beginning of each class you will be handed a sheet of “Pre-class Questions” to be answered before each lecture and before each lab. You will have up to 7 minutes after the official start time of the class to answer them. The later you arrive, the less time you will have to answer the questions. This will affect your grade as described above. The questions will cover
the material in the previous classes, and
the reading assignments.

Late homework will not be accepted. A single 72-hour exception is allowed throughout the semester: explicitly declare that you are going to use it before the deadline, and use it wisely. The lowest grade in the first half of the course (before the midterm) and the lowest grade in the second half will be disregarded in assigning your final grade. If you fail to turn in an assignment, that will be a 0 and (likely) your lowest grade, so you will lose the chance to disregard your worst performance.
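
The drop-the-lowest rule can be illustrated with a short sketch. It assumes the rule applies to homework scores and that the remaining scores are combined with a plain mean; the syllabus does not specify the averaging, so treat homework_average and the numbers below as hypothetical.

```python
def homework_average(first_half, second_half):
    """Average homework scores after dropping the lowest score in each half.

    A missed assignment is entered as 0, so it is the score that gets dropped.
    Assumption: the remaining scores are combined with a plain mean.
    """
    kept = []
    for half in (first_half, second_half):
        if len(half) < 2:
            raise ValueError("each half needs at least two scores")
        ordered = sorted(half)
        kept.extend(ordered[1:])  # discard only the single lowest score
    return sum(kept) / len(kept)


# Hypothetical scores: the 0 in the first half is a missed assignment.
before_midterm = [90, 0, 70, 80]
after_midterm = [85, 60, 95]
print(homework_average(before_midterm, after_midterm))  # (70 + 80 + 90 + 85 + 95) / 5 = 84.0
```

In this example the missed assignment uses up the first-half drop, so the 70 still counts toward the average.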

We encourage you to work in groups of up to 5 people, treating each homework as a collaborative project. Individual notebooks must be returned for each homework. Different group members should lead different aspects of the work. A statement must be included in the README explaining each team member’s contribution (similar to the acknowledgement of contributions you would find in a Nature letter; see, for example, these contributions). The Midterm and Final will include aspects of the work developed in the homework sessions. Failing to actively participate in the homework will result in not being able to complete the Midterm and Final.

For the Midterm and the Final you are responsible for material in the labs, the reading, and the homework. In preparing for the exams, use the homework as a guide to which material is essential. In the Midterm and Final you will be expected to work individually.

There will be opportunities for extra credit projects to improve your grade after the first half of the semester (grade counting toward participation).
Course Calendar
Lecture and reading schedule (details subject to change):

Lecture 1:
Lecture: philosophy and good practices of data science: the flow chart of a data-driven project from idea to dissemination, the concepts of falsifiability, reproducibility, open science, the importance of version control, IPython notebooks
LAB: github repositories, setting up your environment, Python vs iPython, and iPython notebooks

Lecture 2:
Lecture: Introduction to statistics: why everything is Gaussian (...or not), bias, basic distributions, moments, hypothesis testing (chi-square, z-test, p-value).
Lab: finding the correct distribution; data: TBD (statistical mechanics)

Lecture 3:
Lecture: Uncertainties
Lab: discovering Dark Matter from the rotation curve of the Milky Way (astrophysics)

Lecture 4:
Guest Lecture: Data Ethics
Lecture: Preprocessing: acquiring and preparing data (CSV, TSV, downloadable ASCII files, basic SQL), merging data from different files, plotting histograms and scatter plots, data types including ordinal, continuous, and categorical data, missing data, small data
Lab: read and clean data, introduction to data structures (dictionaries, lists, arrays), style guides; data: TBD (condensed matter?)

Lecture 6:
Lecture: The nuances of line fitting and MCMC
Lab: discovering the accelerated expansion of the Universe by fitting a line to data with uncertainties; data: SN cosmology (astrophysics)


Midterm
Lecture 7:
Lecture: Likelihood, OLS, WLS, basic Bayesian concepts
Lab: goodness of fit, choosing a model; data: LIGO Gravitational Waves (gravitation, astrophysics)

Lecture 9:
Lecture: Visualizations. Communication through visualizations, history, significance, good and bad visualization examples, what have we learnt since the 1800s?
Lab: a visualization based on any data of choice

Lecture 8:
Lecture: classification: Tree methods
Lab: discovering the Higgs boson; data: LHC data (high energy physics)

Lecture 10:
Lecture: (time)-series techniques: smoothing, detrending, stationary, non-stationary, homeo- & heteroscedastic noise, vectorization
Lab: data TBD

Lecture 11:
Lecture: Dimensionality Reduction: Clustering
Lab: (probably) discovering Phase Transitions; data: (quantum physics)

Lecture 12:
Lecture: Neural Nets and Deep Learning
Lab: image analysis and pattern recognition with DL; data: (probably) Spectral Galaxy classification with DL

Lecture 13:
Lecture: Neural Nets and Deep Learning
Lab: image analysis and pattern recognition with DL; data: (probably) Spectral Inference Networks (quantum mechanics)

Maybe:
Lecture: Fitting and noise: Gaussian Processes
Lab: stellar variation; data: Kepler (astrophysics)


The Final exam is cumulative: you are responsible for all of the material.


Course Expectations and Policies

Attendance

Absences on religious holidays listed in university calendars are recognized as excused absences. Nevertheless, students are urged to remind the instructor of their intention to be absent on a particular upcoming holiday. Absences on religious holidays not listed in university calendars, as well as absences due to athletic participation or other extracurricular activities in which students are official representatives of the university, shall be recognized as excused absences when the student informs the instructor in writing during the first two weeks of the semester of these planned absences for the semester. All unexcused absences will result in loss of participation credit for the session in question.

Late assignments

Late homework will not be accepted. A single 72-hour exception is allowed throughout the semester.

Extra Credit
There may be opportunities for an extra credit project to raise your grade after the first half of the semester (grade counting toward participation).




Professional Conduct
● Adhere to the University Code of Conduct.
● Be punctual.
● Complete all reading and homework assignments.
● Be respectful of your peers and instructor.
● Hold yourself accountable for your own academic performance.