Regularization for Optimal Transport and Dynamic Time Warping Distances
Machine learning deals with mathematical objects that have structure. Two common structures arising in applications are point clouds / histograms and time series. Early progress in optimization (linear and dynamic programming) has provided powerful families of distances between these structures, namely Wasserstein distances and dynamic time warping scores. Because they rely on the minimization of a linear functional over a continuous set of couplings and a (discrete) space of alignments respectively, both result in non-differentiable quantities. We show how two distinct smoothing strategies result in quantities that are better behaved and more suitable for machine learning applications, with applications to the computation of Fréchet means.
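As an illustration of the kind of smoothing the abstract refers to, the sketch below replaces the hard minimum in the dynamic time warping recursion with a log-sum-exp "soft minimum". The abstract does not name its smoothing strategies, so this particular choice, the squared-difference cost, and the names `soft_min`, `soft_dtw` and `gamma` are assumptions made for the example. Setting `gamma` to 0 recovers the classical, non-differentiable score.

```python
import numpy as np


def soft_min(values, gamma):
    """Smoothed minimum: -gamma * log(sum(exp(-v / gamma))), computed stably.

    With gamma == 0 this falls back to the ordinary (hard) minimum.
    """
    values = np.asarray(values, dtype=float)
    if gamma == 0.0:
        return values.min()
    z = -values / gamma
    z_max = z.max()
    return -gamma * (z_max + np.log(np.exp(z - z_max).sum()))


def soft_dtw(x, y, gamma=1.0):
    """Smoothed DTW score between two 1-D series (illustrative sketch).

    Assumes a squared-difference cost between individual samples.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n, m = len(x), len(y)
    cost = (x[:, None] - y[None, :]) ** 2

    # r[i, j] holds the (smoothed) cost of aligning x[:i] with y[:j].
    r = np.full((n + 1, m + 1), np.inf)
    r[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            previous = [r[i - 1, j], r[i, j - 1], r[i - 1, j - 1]]
            r[i, j] = cost[i - 1, j - 1] + soft_min(previous, gamma)
    return r[n, m]


if __name__ == "__main__":
    x = [0.0, 1.0, 2.0]
    y = [0.0, 1.0, 1.0, 2.0]
    print("hard DTW :", soft_dtw(x, y, gamma=0.0))
    print("soft DTW :", soft_dtw(x, y, gamma=1.0))
```

Because the soft minimum never exceeds the hard minimum, the smoothed score is a lower bound on the classical one, and it varies smoothly with the input series for any `gamma > 0`.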
Marco Cuturi is professor of statistics at CREST/ENSAE, Université Paris Saclay. His research is currently focused on the application of optimal transport theory to machine learning and more generally data sciences. He received his Ph.D. in 2005 from the Ecole des Mines de Paris, worked as a post-doctoral researcher at the Institute of Statistical Mathematics, Tokyo, between 2005 and 2007, in the financial industry until 2008, and in the ORFE department of Princeton University until 2010 as a lecturer. He was an associate professor at the Graduate School of Informatics of Kyoto University between 2010 and 2016. His research is supported by a “Chaire d’Excellence de l’IDEX Paris Saclay” (2017-2020).
Causal challenges in Artificial Intelligence
The route from machine learning to artificial intelligence remains uncharted. The goal of this talk is to investigate how much progress is possible by framing machine learning beyond learning correlations: that is, by uncovering and leveraging causal relations. To this end, we will first identify multiple failure cases in modern machine learning pipelines and try to understand such failures as instances of mistaking correlation for causation. If convinced, we will explore three different ways to reveal causation from data, with some preliminary results. I hope to motivate further research by relating how advances in understanding causation from data would allow machines to ignore confounding effects and spurious correlations, generalize across distributions, leverage structure to reason, design efficient interventions, benefit from compositionality, and build causal models of the world in an unsupervised way.
David Lopez-Paz is a research scientist at Facebook AI Research, where he studies how to leverage principles from causality to transition from machine learning to artificial intelligence. Prior to that, David completed his PhD at the Max Planck Institute for Intelligent Systems and the University of Cambridge, advised by Bernhard Schölkopf and Zoubin Ghahramani. His list of publications is available at https://lopezpaz.org.
Invariance and Stability to Deformations of Deep Convolutional Representations
The success of deep convolutional architectures is often attributed in part to their ability to learn multiscale and invariant representations of natural signals. However, a precise study of these properties and how they affect learning guarantees is still missing. In this work, we consider deep convolutional representations of signals; we study their invariance to translations and to more general groups of transformations, their stability to the action of diffeomorphisms, and their ability to preserve signal information. This analysis is carried out by introducing a multilayer kernel based on convolutional kernel networks and by studying the geometry induced by the kernel mapping. We then characterize the corresponding reproducing kernel Hilbert space (RKHS), showing that it contains a large class of convolutional neural networks with homogeneous activation functions. This analysis allows us to separate data representation from learning, and to provide a canonical measure of model complexity, the RKHS norm, which controls both stability and generalization of any learned model. In addition to models in the constructed RKHS, our stability analysis also applies to convolutional networks with generic activations such as rectified linear units, and we discuss its relationship with recent generalization bounds based on spectral norms. This is joint work with Alberto Bietti.
Julien Mairal is a research scientist at Inria. He received a graduate degree from Ecole Polytechnique, France, in 2005, and a PhD degree from the Ecole Normale Supérieure, Cachan, France, in 2010. Then, he went to the statistics department of UC Berkeley as a post-doctoral researcher, before joining Inria in 2012. His research interests include machine learning, computer vision, and statistical image and signal processing. In 2013, he received the Cor Baayen prize, awarded every year by ERCIM to a promising young researcher in computer science and applied mathematics. In 2016, he received an ERC starting grant and in 2017, he received the IEEE PAMI young researcher award.
Machine Learning Models: Explaining their Decisions, and Validating the Explanations
Machine learning models such as deep neural networks have been successful at solving complex tasks in image recognition, text understanding, or physics. There is also a high demand to use these models for assisting humans in making decisions, e.g. in medical diagnosis or autonomous driving. For this, one needs to be able to trust the learned model, and it is therefore necessary to thoroughly validate it. In particular, we should ensure that its decisions are based on the correct input features.
In this talk, the deep Taylor decomposition framework for explaining decisions in terms of input features will be presented. The framework is applicable to a wide range of neural network architectures, including highly complex ones such as GoogleNet. It works by propagating the model’s decision backwards in the network until the input variables are reached. The propagation mechanism at each layer is based on a Taylor expansion principle.
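To make the backward propagation step concrete, here is a minimal sketch of one layer-wise redistribution rule, the so-called z+ rule, which is commonly associated with this framework for ReLU layers with non-negative inputs. The abstract does not say which rule is used, so the rule itself, the function name `zplus_backward`, the array shapes and the stabilizer `eps` are assumptions made for the example.

```python
import numpy as np


def zplus_backward(a, W, R_out, eps=1e-12):
    """Redistribute relevance from a layer's outputs to its inputs (z+ rule).

    a     : non-negative input activations, shape (d_in,)
    W     : weight matrix, shape (d_in, d_out)
    R_out : relevance assigned to the layer's outputs, shape (d_out,)
    eps   : small stabilizer to avoid division by zero (an assumption)
    """
    W_pos = np.maximum(W, 0.0)   # keep only the positive weights
    z = a @ W_pos + eps          # positive contributions summed per output neuron
    s = R_out / z                # relevance per unit of contribution
    return a * (W_pos @ s)       # each input gets a share proportional to a_i * w_ij^+


if __name__ == "__main__":
    a = np.array([1.0, 2.0, 0.5])
    W = np.array([[1.0, -1.0],
                  [0.5, 2.0],
                  [-1.0, 1.0]])
    R_out = np.array([1.0, 3.0])
    R_in = zplus_backward(a, W, R_out)
    print("input relevance:", R_in)
    print("sum in / sum out:", R_in.sum(), R_out.sum())
```

Applying such a rule layer by layer, from the output back to the input, yields non-negative relevance scores whose total is (up to `eps`) conserved from one layer to the next.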
Explanation techniques can be used to validate a trained model. But we also need to validate the explanation technique itself. Ground-truth explanations are usually not available. However, one can still test the explanation technique for a number of properties considered desirable. We will show how free parameters of the Taylor expansion allow us to induce these desirable properties.
Grégoire Montavon received a Master's degree in Communication Systems from École Polytechnique Fédérale de Lausanne in 2009 and a Ph.D. degree in Machine Learning from the Technische Universität Berlin in 2013. He is currently a Research Associate in the Machine Learning Group at TU Berlin. His current research focuses on methods for interpreting machine learning models, in particular, deep neural networks.
Do we have to revisit electric power consumption data analytics in the era of deep learning?
Electric power consumption data are increasingly available at different aggregation levels (from individual customer to national level) and at different sampling rates (from one measurement every second to one every year). In this talk, we will present the main available datasets at EDF and the different applications of machine learning methods, both supervised and unsupervised. This includes for instance decomposition of consumption by usage and customer segmentation by clustering their consumption data. The improvement of deep learning methods for these applications will be discussed.
George Hebrail is a senior researcher in data science at EDF R&D and IRT SystemX. His domain of expertise covers information systems, business intelligence and data analytics. As a researcher at EDF R&D, he has been working on Big Data solutions for the different activities of the EDF Group (generation, electrical distribution network, smart metering, customer relationship management). From 2002 to 2010, he was a professor of computer science at Telecom ParisTech engineering school. In 2017, he joined IRT SystemX part time, as head of the data science scientific team.
Lisa Anne Hendricks
Generating Natural Language Explanations for Visual Decisions
Clearly explaining a rationale for a classification decision to an end-user can be as important as the decision itself. For example, providing a textual explanation like “This is a cardinal because it is red with a black cheek patch” can help human users better trust and interact with an AI agent. In this talk, I will first present a model to generate explanations for a fine-grained classification task. In particular, I will consider how grounding visual evidence can lead to better explanations, including explanations which accurately discuss visual evidence which is not present in an image (e.g., “This is not a cardinal because it is not a red bird”). Finally, I will outline a new dataset collected specifically for the task of generating explanations and discuss how deictic explanations (pointing to important regions of an image) are complementary to textual explanations.
Lisa Anne Hendricks is a Ph.D. researcher in the Electrical Engineering and Computer Science department at University of California at Berkeley. She is a member of the Berkeley Artificial Intelligence Research (BAIR) lab and is advised by Trevor Darrell. In 2013 she received her B.S.E.E. in Electrical and Computer Engineering from Rice University (summa cum laude). Her research interests span deep learning, computer vision, and natural language processing. Her Ph.D. work has focused on building deep learning models which can both express information about visual content using natural language and retrieve visual information given natural language queries. She has been awarded an NDSEG fellowship, UC Berkeley Chancellor’s Fellowship, Huawei Fellowship, and Adobe Fellowship. Lisa Anne was co-president of Women in Computer Science and Engineering (WICSE) at UC Berkeley during the 2015-2016 school year and co-organized the second Women in Computer Vision (WiCV) workshop at CVPR 2016.
Machine Learning: What’s next?
For many Machine Learning (ML) problems, labeled data is readily available. When this is the case, algorithms and training time are the performance bottleneck. This is the ML researcher’s paradise! Vision and Speech are good examples of such problems because they have a stable distribution and additional human labels can be collected each year. Problems that extract their labels from history, such as click prediction, data analytics, and forecasting are also blessed with large numbers of labels. Unfortunately, there are only a few problems for which we can rely on such an endless supply of free labels. They receive a disproportionately large amount of attention from the media.
We are interested in tackling the much larger class of ML problems where labeled data is sparse. For example, consider a dialog system for a specific app to recognize specific commands such as: “lights on first floor off”, “increase spacing between 2nd and 3rd paragraph”, “make doctor appointment after Hawaii vacation”. Anyone who has attempted building such a system has soon discovered that generalizing to new instances from a small custom set of labeled instances is far more difficult than they originally thought. Each domain has its own generalization challenges, data exploration and discovery, custom features, and decomposition structure. Creating labeled data to communicate custom knowledge is inefficient. It also leads to embarrassing errors resulting from over-training on small sets. ML algorithms and processing power are not a bottleneck when labeled data is scarce. The bottleneck is the teacher and the teaching language.
To address this problem, we change our focus from the learning algorithm to teachers. We define “Machine Teaching” as improving human productivity given a learning algorithm. If ML is the science and engineering of extracting knowledge from data, Machine Teaching is the science and engineering of extracting knowledge from teachers. A similar shift of focus has happened in computer science. While computing is revolutionizing our lives, systems sciences (e.g., programming languages, operating systems, networking) have shifted their foci to human productivity. We expect a similar trend will shift science from Machine Learning to Machine Teaching.
The aim of this talk is to convince the audience that we are asking the right questions. We provide some answers and some spectacular results. The most exciting part, however, is the research opportunities that come with the emergence of a new field.
Patrice Simard is a Distinguished Engineer in the Microsoft Research AI Lab in Redmond. He is passionate about finding new ways to combine engineering and science in the field of machine learning. Simard’s research is currently focused on human teachers. His goal is to extend the teaching language, science, and engineering, beyond the traditional (input, label) pairs.
Simard completed his PhD thesis in Computer Science at the University of Rochester in 1991. He then spent 8 years at AT&T Bell Laboratories working on neural networks. He joined Microsoft Research in 1998. In 2002, he started MSR’s Document Processing and Understanding research group. In 2006, he left MSR to become the Chief Scientist and General Manager of Microsoft’s Live Labs Research. In 2009, he became the Chief Scientist of Microsoft’s AdCenter (the organization that monetizes Bing search). In 2012, he returned to Microsoft Research to work on his passion, Machine Learning research. Specifically, he founded the Computer-Human Interactive Learning (CHIL) group to study Machine Teaching and to make machine learning accessible to everyone.
The Splendors and Miseries of AI: Overview and Challenges
We give an overview of AI; we will in particular discuss
- the breakthrough of recent combinations of deep learning & Monte Carlo Tree Search (AlphaZero);
- evolutionary algorithms for AI;
- transfer learning and training with moderate data (after all humans can learn to recognize a platypus from just one single image and play decently at Pong after a few seconds of training…);
- common sense (we can play Angry Birds correctly with almost no training because we understand the semantics);
- reinforcement learning without a simulator (i.e. cases in which Monte Carlo Tree Search approaches cannot be applied);
- reality gap (if only learning on a simulator were enough for learning to drive!);
- structured, complex & huge action spaces;
- privacy, fairness & verification, which are critical issues for widely applying artificial intelligence;
- adversarial examples.
Olivier Teytaud has been working in logic, optimization (including evolutionary and derivative-free methods), power systems, games and machine learning in many companies and at Inria. He currently works at Facebook AI Research in Paris.
Natural Language Processing for social computing: from opinion mining to human-agent interaction
The Social Computing topic aims to gather research around computational models for the analysis of social interactions, whether for web analysis or social robotics. The peculiarity of this theme is its multidisciplinary approach: computational models are established in close collaboration with research fields such as psychology, sociology, and linguistics. They are based on methods from various fields: signal processing (e.g. speech signal processing for the recognition of emotions), machine learning (e.g. structured output learning for the detection of opinions in texts), and computer science (e.g. natural language processing for the detection of opinions, and the integration of the socio-emotional component in human-machine interactions). This presentation will describe examples of studies conducted around the Social Computing topic.
In particular, we will examine the role of natural language processing in human-agent interaction by presenting our progress on the different research topics we are currently working on, such as the analysis of the likes and dislikes of the user during her interactions with a virtual agent using symbolic methods (Langlet & Clavel, 2016) and machine learning methods (Barriere et al., 2018). Opinion mining methods and their challenges in terms of machine learning will also be tackled (Garcia et al., 2018).
Chloé Clavel is Associate Professor at Telecom ParisTech. She holds a PhD on acoustic analysis of emotional speech. Before joining Telecom ParisTech she worked as a researcher at Thales Research and Technology, where she focused on emotion analysis; she then became a researcher at EDF Lab working on sentiment analysis and opinion mining. At Telecom ParisTech, she is currently working on interactions between humans and virtual agents, from user’s socio-emotional behavior analysis to socio-affective interaction strategies. She has participated in several collaborative European and national projects on social computing (e.g. H2020 ITN ANIMATAS, EU-ICT aria-valuspa, Labex smart).
Criteo AI Lab: from applied to fundamental AI
This talk will present two recent pieces of work done at Criteo AI Lab. The first is about FiLM layers used to modulate deep neural networks trained for dialog systems. The second is about how to leverage auction theory and machine learning from a buyer's point of view in order to reduce the impact of personalized reserve prices. I will then conclude the talk with a quick overview of the topics of interest of our lab.
Jérémie Mary completed his PhD at Paris XI under the supervision of Michele Sebag and Antoine Cornuéjols. He was assistant professor at the University of Lille and a member of the Inria team SequeL, which focuses on sequential machine learning. He won (2011 and 2014) and organised (2012) three challenges about online recommendation within 3 of the major conferences in machine learning (ICML’11, ICML’12, RecSys’14) on problems and data provided by Yahoo!, Adobe and Twitter, and obtained his HDR in 2016. Since June 17 he has been Senior Staff Researcher at Criteo in Paris and co-leads a fundamental research group on interpretability for (deep) machine learning.
What can a statistician expect from GANs?
Generative Adversarial Networks (GANs) are a class of generative algorithms that have been shown to produce state-of-the-art samples, especially in the domain of image creation. The fundamental principle of GANs is to approximate the unknown distribution of a given data set by optimizing an objective function through an adversarial game between a family of generators and a family of discriminators. In this talk, we illustrate some statistical properties of GANs, focusing on the deep connection between the adversarial principle underlying GANs and the Jensen-Shannon divergence, together with some optimality characteristics of the problem. We also analyze the role of the discriminator family and study the large sample properties of the estimated distribution.
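For readers unfamiliar with the connection mentioned above, it can be written down in the standard GAN formulation. The notation below ($p_{\mathrm{data}}$ for the data distribution, $p_g$ for the distribution of generated samples, $p_z$ for the noise distribution) is assumed for this sketch and is not taken from the talk; the identity on the last line holds when the discriminator family is unrestricted.

```latex
% Adversarial objective: the generator G minimizes, the discriminator D maximizes.
\min_{G} \max_{D} \; V(D, G)
  = \mathbb{E}_{x \sim p_{\mathrm{data}}}\bigl[\log D(x)\bigr]
  + \mathbb{E}_{z \sim p_z}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]

% For a fixed generator, the optimal unrestricted discriminator is
D^{*}(x) = \frac{p_{\mathrm{data}}(x)}{p_{\mathrm{data}}(x) + p_g(x)}

% and plugging it back in gives the Jensen-Shannon divergence up to constants:
V(D^{*}, G) = 2\,\mathrm{JS}\bigl(p_{\mathrm{data}} \,\|\, p_g\bigr) - \log 4
```

When the discriminators are restricted to a parametric family, this identity only holds approximately, which is one reason the role of the discriminator family matters.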
Maxime Sangnier is assistant professor at Sorbonne University and is affiliated with the statistics and computer science labs. His research is focused on numerical optimization aspects involved in machine learning. He received his Ph.D. from Normandie University and worked as a post-doctoral researcher at Télécom ParisTech.