# Marco Cuturi

Regularization for Optimal Transport and Dynamic Time Warping Distances

Machine learning deals with mathematical objects that have structure. Two common structures arising in applications are point clouds or histograms, and time series. Early progress in optimization (linear and dynamic programming) has provided powerful families of distances between these structures, namely Wasserstein distances and dynamic time warping scores. Because they rely on the minimization of a linear functional over a continuous set of couplings and a (discrete) space of alignments respectively, both result in non-differentiable quantities. We show how two distinct smoothing strategies result in quantities that are better behaved and more suitable for machine learning, with applications to the computation of Fréchet means.
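As an illustration of what such smoothing can look like for time series, the sketch below replaces the hard minimum in the dynamic time warping recursion with a soft minimum controlled by a parameter `gamma`. The abstract does not name the smoothing strategies, so this choice, the squared-difference cost, and the names `softmin`, `soft_dtw`, and `gamma` are assumptions made here for illustration.

```python
import math


def softmin(values, gamma):
    """Soft minimum: -gamma * log(sum(exp(-v / gamma))), computed stably.

    gamma = 0 falls back to the ordinary (non-differentiable) minimum.
    """
    m = min(values)
    if gamma == 0.0:
        return m
    # Subtract the minimum before exponentiating to avoid overflow.
    total = sum(math.exp(-(v - m) / gamma) for v in values)
    return m - gamma * math.log(total)


def soft_dtw(x, y, gamma=1.0):
    """Smoothed DTW score between two scalar sequences (illustrative sketch)."""
    n, m = len(x), len(y)
    inf = float("inf")
    # R[i][j] holds the smoothed cost of aligning x[:i] with y[:j].
    R = [[inf] * (m + 1) for _ in range(n + 1)]
    R[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2  # assumed local cost
            R[i][j] = cost + softmin(
                [R[i - 1][j], R[i][j - 1], R[i - 1][j - 1]], gamma
            )
    return R[n][m]


if __name__ == "__main__":
    print(soft_dtw([0.0, 1.0], [0.0, 2.0], gamma=0.0))  # classical score
    print(soft_dtw([0.0, 1.0], [0.0, 2.0], gamma=1.0))  # smoothed score
```

With `gamma=0.0` the function returns the classical score. For `gamma > 0` the value is a differentiable function of the inputs and never exceeds the classical score, since a soft minimum is bounded above by the hard minimum.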

_________________________________________________________________________

**Marco Cuturi** is a professor of statistics at CREST/ENSAE, Université Paris Saclay. His research is currently focused on the application of optimal transport theory to machine learning and, more generally, data science. He received his Ph.D. in 2005 from the Ecole des Mines de Paris, worked as a post-doctoral researcher at the Institute of Statistical Mathematics, Tokyo, between 2005 and 2007, in the financial industry until 2008, and in the ORFE department of Princeton University until 2010 as a lecturer. He was an associate professor at the Graduate School of Informatics of Kyoto University between 2010 and 2016. His research is supported by a « Chaire d’Excellence de l’IDEX Paris Saclay » (2017-2020).

_________________________________________________________________________

# Julien Mairal

Invariance and Stability to Deformations of Deep Convolutional Representations

The success of deep convolutional architectures is often attributed in part to their ability to learn multiscale and invariant representations of natural signals. However, a precise study of these properties and how they affect learning guarantees is still missing. In this work, we consider deep convolutional representations of signals; we study their invariance to translations and to more general groups of transformations, their stability to the action of diffeomorphisms, and their ability to preserve signal information. This analysis is carried out by introducing a multilayer kernel based on convolutional kernel networks and by studying the geometry induced by the kernel mapping. We then characterize the corresponding reproducing kernel Hilbert space (RKHS), showing that it contains a large class of convolutional neural networks with homogeneous activation functions. This analysis allows us to separate data representation from learning, and to provide a canonical measure of model complexity, the RKHS norm, which controls both stability and generalization of any learned model. In addition to models in the constructed RKHS, our stability analysis also applies to convolutional networks with generic activations such as rectified linear units, and we discuss its relationship with recent generalization bounds based on spectral norms. This is joint work with Alberto Bietti.

________________________________________________________________________

**Julien Mairal** is a research scientist at Inria. He received a graduate degree from Ecole Polytechnique, France, in 2005, and a PhD degree from the Ecole Normale Supérieure, Cachan, France, in 2010. Then, he went to the statistics department of UC Berkeley as a post-doctoral researcher, before joining Inria in 2012. His research interests include machine learning, computer vision, and statistical image and signal processing. In 2013, he received the Cor Baayen prize, awarded every year by ERCIM to a promising young researcher in computer science and applied mathematics. In 2016, he received an ERC starting grant and in 2017, he received the IEEE PAMI young researcher award.

________________________________________________________________________

# Grégoire Montavon

Machine Learning Models: Explaining their Decisions, and Validating the Explanations

Machine learning models such as deep neural networks have been successful at solving complex tasks in image recognition, text understanding, or physics. There is also a high demand to use these models for assisting humans in making decisions, e.g. in medical diagnosis or autonomous driving. For this, one needs to be able to trust the learned model, and it is therefore necessary to thoroughly validate it. In particular, we should ensure that its decisions are based on the correct input features.

In this talk, the deep Taylor decomposition framework for explaining decisions in terms of input features will be presented. The framework is applicable to a wide range of neural network architectures, including highly complex ones such as GoogLeNet. It works by propagating the model’s decision backwards in the network until the input variables are reached. The propagation mechanism at each layer is based on a Taylor expansion principle.

Explanation techniques can be used to validate a trained model. But we also need to validate the explanation technique itself. Ground-truth explanations are usually not available. However, one can still test the explanation technique for a number of properties considered desirable. We will show how free parameters of the Taylor expansion can be chosen to induce these desirable properties.
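To make the layer-wise propagation concrete, here is a minimal sketch of one backward step through a single ReLU layer, using the z+ rule that appears in the deep Taylor decomposition literature for layers with non-negative inputs. The function name `zplus_backward`, the list-based weight layout, and the stabilizer `eps` are assumptions for illustration, not the speaker's implementation.

```python
def zplus_backward(a, W, R_out, eps=1e-9):
    """Redistribute output relevance R_out onto the inputs of one layer.

    a     : non-negative input activations, length n_in
    W     : weights as a list of rows, W[j][k] connects input j to output k
    R_out : relevance assigned to each output neuron, length n_out
    eps   : small stabilizer to avoid division by zero (assumption)
    """
    n_in, n_out = len(a), len(R_out)
    # Keep only the positive part of the weights.
    Wp = [[max(w, 0.0) for w in row] for row in W]
    # Positive pre-activation of each output neuron.
    z = [sum(a[j] * Wp[j][k] for j in range(n_in)) + eps for k in range(n_out)]
    # Each input receives relevance in proportion to its positive contribution.
    return [
        sum(a[j] * Wp[j][k] * R_out[k] / z[k] for k in range(n_out))
        for j in range(n_in)
    ]


if __name__ == "__main__":
    a = [1.0, 2.0]
    W = [[1.0, -1.0], [0.5, 2.0]]
    R_out = [3.0, 1.0]
    R_in = zplus_backward(a, W, R_out)
    print(R_in, sum(R_in), sum(R_out))
```

Two properties of this kind can be checked without ground-truth explanations: the input relevances are non-negative when the output relevances are, and their sum matches the sum of the output relevances up to the stabilizer.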

_________________________________________________________________________

**Grégoire Montavon** received a Master's degree in Communication Systems from École Polytechnique Fédérale de Lausanne in 2009 and a Ph.D. degree in Machine Learning from the Technische Universität Berlin in 2013. He is currently a Research Associate in the Machine Learning Group at TU Berlin. His current research focuses on methods for interpreting machine learning models, in particular deep neural networks.

_________________________________________________________________________

# Patrice Simard

Machine Learning: What’s next?

For many Machine Learning (ML) problems, labeled data is readily available. When this is the case, algorithms and training time are the performance bottleneck. This is the ML researcher’s paradise! Vision and Speech are good examples of such problems because they have a stable distribution and additional human labels can be collected each year. Problems that extract their labels from history, such as click prediction, data analytics, and forecasting are also blessed with large numbers of labels. Unfortunately, there are only a few problems for which we can rely on such an endless supply of free labels. They receive a disproportionally large amount of attention from the media.

We are interested in tackling the much larger class of ML problems where labeled data is sparse. For example, consider a dialog system for a specific app to recognize specific commands such as: “lights on first floor off”, “increase spacing between 2nd and 3rd paragraph”, “make doctor appointment after Hawaii vacation”. Anyone who has attempted building such a system has soon discovered that generalizing to new instances from a small custom set of labeled instances is far more difficult than they originally thought. Each domain has its own generalization challenges, data exploration and discovery, custom features, and decomposition structure. Creating labeled data to communicate custom knowledge is inefficient. It also leads to embarrassing errors resulting from over-training on small sets. ML algorithms and processing power are not a bottleneck when labeled data is scarce. The bottleneck is the teacher and the teaching language.

To address this problem, we change our focus from the learning algorithm to teachers. We define “Machine Teaching” as improving human productivity *given* a learning algorithm. If ML is the science and engineering of extracting knowledge from data, Machine Teaching is the science and engineering of extracting knowledge from teachers. A similar shift of focus has happened in computer science. While computing is revolutionizing our lives, systems sciences (e.g., programming languages, operating systems, networking) have shifted their foci to human productivity. We expect a similar trend will shift science from Machine Learning to Machine Teaching.

The aim of this talk is to convince the audience that we are asking the right questions. We provide some answers and some spectacular results. The most exciting part, however, is the research opportunities that come with the emergence of a new field.

_________________________________________________________________________

**Patrice Simard** is a Distinguished Engineer in the Microsoft Research AI Lab in Redmond. He is passionate about finding new ways to combine engineering and science in the field of machine learning. Simard’s research is currently focused on human teachers. His goal is to extend the teaching language, science, and engineering, beyond the traditional (input, label) pairs.

Simard completed his PhD thesis in Computer Science at the University of Rochester in 1991. He then spent 8 years at AT&T Bell Laboratories working on neural networks. He joined Microsoft Research in 1998. In 2002, he started MSR’s Document Processing and Understanding research group. In 2006, he left MSR to become the Chief Scientist and General Manager of Microsoft’s Live Labs Research. In 2009, he became the Chief Scientist of Microsoft’s AdCenter (the organization that monetizes Bing search). In 2012, he returned to Microsoft Research to work on his passion, Machine Learning research. Specifically, he founded the Computer-Human Interactive Learning (CHIL) group to study Machine Teaching and to make machine learning accessible to everyone.

_________________________________________________________________________

# Chloé Clavel

Natural Language Processing for social computing: from opinion mining to human-agent interaction

The Social Computing topic aims to gather research around computational models for the analysis of social interactions, whether for web analysis or social robotics. The peculiarity of this theme is its multidisciplinary approach: computational models are established in close collaboration with research fields such as psychology, sociology, and linguistics. They are based on methods from various fields: signal processing (e.g. speech signal processing for the recognition of emotions), machine learning (e.g. structured output learning for the detection of opinions in texts), and computer science (e.g. natural language processing for the detection of opinions, and the integration of the socio-emotional component in human-machine interactions). This presentation will describe examples of studies conducted around the Social Computing topic.

In particular, we will examine the role of natural language processing in human-agent interaction by presenting our progress on the different research topics we are currently working on, such as the analysis of the likes and dislikes of the user during her interactions with a virtual agent using symbolic methods (Langlet & Clavel, 2016) and machine learning methods (Barriere et al., 2018). Opinion mining methods and their challenges in terms of machine learning will also be tackled (Garcia et al., 2018).

_________________________________________________________________________

**Chloé Clavel** is an Associate Professor at Telecom ParisTech. She holds a PhD on acoustic analysis of emotional speech. Before joining Telecom ParisTech, she worked as a researcher at Thales Research and Technology, where she focused on emotion analysis; she then became a researcher at EDF Lab, working on sentiment analysis and opinion mining. At Telecom ParisTech, she is currently working on interactions between humans and virtual agents, from the analysis of users' socio-emotional behavior to socio-affective interaction strategies. She has participated in several collaborative European and national projects on social computing (e.g. H2020 ITN ANIMATAS, EU-ICT aria-valuspa, Labex smart).

_________________________________________________________________________

# Maxime Sangnier

What can a statistician expect from GANs?

Generative Adversarial Networks (GANs) are a class of generative algorithms that have been shown to produce state-of-the-art samples, especially in the domain of image creation. The fundamental principle of GANs is to approximate the unknown distribution of a given data set by optimizing an objective function through an adversarial game between a family of generators and a family of discriminators. In this talk, we illustrate some statistical properties of GANs, focusing on the deep connection between the adversarial principle underlying GANs and the Jensen-Shannon divergence, together with some optimality characteristics of the problem. We also analyze the role of the discriminator family and study the large sample properties of the estimated distribution.
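For readers who want the connection to the Jensen-Shannon divergence spelled out, the standard identity can be sketched as follows. The notation is assumed here and is not taken from the talk: $p^\star$ is the data density, $p_\theta$ the density produced by a generator, and the supremum runs over all measurable discriminators $D$ with values in $[0, 1]$.

$$
\sup_{D} \Big\{ \mathbb{E}_{X \sim p^\star}\big[\log D(X)\big] + \mathbb{E}_{X \sim p_\theta}\big[\log\big(1 - D(X)\big)\big] \Big\} = 2\,\mathrm{JS}(p^\star, p_\theta) - \log 4,
$$

$$
\mathrm{JS}(p, q) = \tfrac{1}{2}\,\mathrm{KL}\Big(p \,\Big\|\, \tfrac{p + q}{2}\Big) + \tfrac{1}{2}\,\mathrm{KL}\Big(q \,\Big\|\, \tfrac{p + q}{2}\Big).
$$

The supremum is attained at $D^\star = p^\star / (p^\star + p_\theta)$. When the discriminator is restricted to a smaller family, as in the setting the abstract describes, the left-hand side is only a lower bound on the right-hand side, which is one reason the choice of discriminator family matters.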

_________________________________________________________________________

**Maxime Sangnier** is an assistant professor at Sorbonne University and is affiliated with the statistics and computer science labs. His research is focused on numerical optimization aspects involved in machine learning. He received his Ph.D. from Normandie University and worked as a post-doctoral researcher at Télécom ParisTech.