The goal of this course is to introduce various methods to infer mathematical models from empirical observations. We will explore four subjects:
The least squares method. This is a cornerstone of applied mathematics, developed by Gauss at the beginning of the 19th century; every engineer should be familiar with it and be ready to apply it as a general-purpose approximation method. The goal of this method is to find, in a family of functions indexed by a "parameter", the function that provides the best description of a set of data (x_i, y_i), where y_i is supposed to be a function of x_i plus some noise, and where "best" means that the function chosen by the method minimizes the sum of the squares of the approximation errors. Such a function can be useful, in particular, to predict the behavior of future data, or as an indirect way to measure the "parameter", in case the latter has a physical interpretation but is not directly accessible.
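As a minimal illustration of the idea, the following Python sketch fits an affine function f(x) = a*x + b to synthetic noisy data by minimizing the sum of squared errors (the data, the noise level, and the variable names are invented for the example):

import numpy as np

# Synthetic data: y_i = 2*x_i + 1 + noise, so the "true" parameter is (2, 1)
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=x.size)

# Design matrix for the affine family f(x) = a*x + b
Phi = np.column_stack([x, np.ones_like(x)])

# Least squares estimate: minimizes sum_i (y_i - a*x_i - b)^2 over (a, b)
theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)
print(f"estimated parameter: a = {theta[0]:.3f}, b = {theta[1]:.3f}")

The estimated pair (a, b) can then be used to predict y for a new x, or read off directly if it has a physical meaning.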
The LSCR (Leave-out Sign-dominant Correlation Regions) method. The least squares method yields a point estimate of a certain "true" parameter, that is, a "point" in R^p. Although in the limit the estimate converges to the "true" parameter, no probabilistic guarantee can be given for the least squares solution with finite data unless strong assumptions are made. The LSCR method is designed to overcome this difficulty: its goal is to find a confidence region of R^p in which the "true" parameter lies with a certified probability, irrespective of the distribution of the noise that corrupts the data.
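To convey the flavor of the method, here is a schematic Python sketch for the scalar model y_i = theta*u_i + w_i: a value of theta is kept only if the correlation sums computed over different subsets of the data do not all share the same sign. (This only illustrates the sign-dominance idea; the certified probability of the actual method rests on a specific construction of the subsets, for which we refer to the paper by Campi and Weyer cited below.)

import numpy as np

# Scalar model y_i = theta * u_i + w_i; the "true" theta is 0.7.
# Noise: symmetric around zero, but otherwise of unknown distribution.
rng = np.random.default_rng(1)
n = 100
u = rng.normal(size=n)
y = 0.7 * u + rng.uniform(-1.0, 1.0, size=n)

# Stand-in for the group of subsets prescribed by the actual method
subsets = [rng.random(n) < 0.5 for _ in range(20)]

def keep(theta):
    eps = y - theta * u                              # prediction errors
    sums = [np.sum(u[s] * eps[s]) for s in subsets]  # correlation sums
    pos = sum(v > 0 for v in sums)
    # discard theta if (almost) all the sums share the same sign
    return min(pos, len(sums) - pos) >= 2

region = [t for t in np.linspace(0.0, 1.4, 281) if keep(t)]
if region:
    print(f"confidence region ~ [{min(region):.3f}, {max(region):.3f}]")

Note that nothing in this construction requires knowing the distribution of the noise.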
Interval predictor models. The ultimate goal of this method, like that of the least squares method, is to predict the behavior of future data. In this case, though, the estimate will not come in the form of a function, fitted to the past data, that maps a future independent variable x to a prediction of the future dependent variable y; instead, the method yields a function that maps a future x to an entire interval which, with certified probability, will contain the future y. The computation of such an "interval predictor" relies on the solution of a convex optimization problem, and the probabilistic guarantee comes from a clever application of a deep result in geometry, Helly's theorem.
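As a sketch of what such a computation can look like, the following Python fragment builds a fixed-width interval predictor I(x) = [a*x + b - g, a*x + b + g] by minimizing the half-width g subject to all observed pairs falling inside the interval, a linear program (data and names are invented; the models treated in the course, and the link to Helly's theorem, are more general):

import numpy as np
from scipy.optimize import linprog

# Synthetic data scattered around the line y = 1.5*x - 2
rng = np.random.default_rng(2)
x = rng.uniform(0.0, 10.0, 40)
y = 1.5 * x - 2.0 + rng.normal(scale=0.8, size=x.size)

# Decision variables z = (a, b, g); objective: minimize the half-width g
c = [0.0, 0.0, 1.0]
ones = np.ones_like(x)
# Constraints |y_i - (a*x_i + b)| <= g, split into two linear inequalities
A_ub = np.vstack([np.column_stack([-x, -ones, -ones]),
                  np.column_stack([ x,  ones, -ones])])
b_ub = np.concatenate([-y, y])
res = linprog(c, A_ub=A_ub, b_ub=b_ub,
              bounds=[(None, None), (None, None), (0.0, None)])
a, b, g = res.x
print(f"I(x) = [{a:.2f}*x + {b:.2f} - {g:.2f}, {a:.2f}*x + {b:.2f} + {g:.2f}]")

With enough data points, one can certify a bound on the probability that a new pair (x, y) falls outside I(x); this is where the probabilistic machinery of the course enters.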
Machine learning. In this part of the course we will introduce classification problems, focus on binary classification applied to simple models, and relate the empirical classification error (the proportion of errors over the training set) to the true error (the probability of error on yet unseen data). In doing so, we will explore some classical and fundamental results in non-parametric statistics, notably the Glivenko-Cantelli theorem. The goal of this part of the course is not so much to provide off-the-shelf methods that can be readily applied (as may be the case for the other three parts), but to understand, through very simple examples, the intrinsic limits in the art of learning from empirical data.
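To make the distinction between the two errors concrete, here is a toy Python experiment (distributions and sample sizes invented for the example): a threshold classifier is tuned on a small training set, and its empirical error is compared with an estimate of the true error on a large fresh sample. The empirical error is typically optimistic.

import numpy as np

rng = np.random.default_rng(3)

def sample(n):
    labels = rng.integers(0, 2, size=n)      # class 0 or 1, fair coin
    # class 0 ~ N(0,1), class 1 ~ N(1,1): they overlap, so errors are unavoidable
    points = rng.normal(loc=labels.astype(float), scale=1.0)
    return points, labels

x_tr, y_tr = sample(50)                      # small training set
candidates = np.sort(x_tr)
emp = [np.mean((x_tr > t) != y_tr) for t in candidates]
t_best = candidates[int(np.argmin(emp))]     # empirical risk minimizer

x_te, y_te = sample(100_000)                 # large sample ~ true error
true_err = np.mean((x_te > t_best) != y_te)
print(f"empirical error: {min(emp):.3f}, true error ~ {true_err:.3f}")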
All these subjects will be illustrated with practical examples. The course may also be complemented by seminars on real applications, given by members of our research group.
The prerequisites of the course are a working knowledge of calculus, linear algebra, and probability theory; some results that are not usually taught in basic courses will be reviewed when needed.
Updated information and further material will be available at the course's website:
http://www.ing.unibs.it/~federico.ramponi/sida.html
A set of lecture notes will be made available during the first lectures. These will be sufficient for the course's purposes; nevertheless, the motivated student is of course welcome to refer to other sources as well.
A standard textbook that covers the subject of system identification in full generality is
L. Ljung. System Identification: Theory for the User, 2nd ed. Prentice-Hall, 1999.
For the sections regarding LSCR and interval predictor models, the interested student can also refer to the original research papers (which cover the subjects in far more depth than we will in class):
M. C. Campi and E. Weyer. Identification with finitely many data points: the LSCR approach (please refer to Prof. Campi to obtain a copy).
M. C. Campi, G. Calafiore, and S. Garatti. Interval Predictor Models: Identification and Reliability (available on Prof. Campi's webpage).