The MOD 2017 Industrial Session aims to bring together participants from academia and industry in a venue that highlights practical and real-world studies of machine learning, optimization and data science.
The ultimate goal of this event is to encourage mutually beneficial exchange between scientific researchers and practitioners working to improve data science analytics.
The session will consist of a series of invited presentations from leading experts in industry on selected topics in machine learning, optimization and data science, from an industry perspective and with a special focus on real-world applications.
Moderator: Aris Anagnostopoulos, Sapienza University of Rome, Italy
Panellists:
In social media, attention concentrates on a relatively small number of popular items, while the vast majority of content produced by the crowd is almost neglected. Although popularity can be an indication of the perceived value of an item within its community, previous research has hinted at the fact that popularity is distinct from intrinsic quality. We embarked on a journey to quantitatively measure intangible properties such as quality and beauty using a multidisciplinary approach that ranges from social sciences to deep learning. Our research shows how algorithms that can reliably capture quality can democratise social media, improve our experience of online web services and even help us live a better life in our cities.
Question answering (QA) systems are able to answer questions posed by humans in a natural language. QA is a discipline within the fields of Information Retrieval (IR) and Natural Language Processing (NLP) that involves several machine learning approaches that need to scale on a huge amount of textual data.
QA is a complex task that requires understanding the question in depth, searching for relevant documents and extracting the correct answer from the retrieved documents. This talk provides a brief introduction to QA focusing on the NLP and machine learning techniques, followed by details about a QA framework called QuestionCube.
Finally, I will present two different scenarios where QA is successfully applied: a famous language game called “Who Wants to Be a Millionaire?” and a semantic search engine for FAQs developed for supporting customers in a real application.
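The three steps named in the abstract (question analysis, document search, answer extraction) can be illustrated with a deliberately naive sketch. It is not QuestionCube and does not reflect how that framework works: the mini-corpus, the stopword list and every function name below are assumptions made for illustration, and keyword overlap stands in for the NLP and machine learning techniques a real system would use.

```python
import re

# Hypothetical mini-corpus; the documents are illustrative only.
DOCUMENTS = [
    "Rome is the capital of Italy. It lies on the Tiber river.",
    "Paris is the capital of France. It lies on the Seine river.",
    "Volterra is a town in Tuscany. It is known for alabaster.",
]

# Assumed stopword list, just large enough for the examples below.
STOPWORDS = {"what", "which", "who", "is", "the", "of", "a", "an", "in", "on", "it"}


def keywords(text):
    """Question analysis (very crude): the set of lowercase content words."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def retrieve(question, documents):
    """Document search: return the document sharing the most keywords."""
    q = keywords(question)
    return max(documents, key=lambda d: len(q & keywords(d)))


def extract(question, document):
    """Answer extraction: return the best-matching sentence of the document."""
    q = keywords(question)
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]
    return max(sentences, key=lambda s: len(q & keywords(s)))


def answer(question, documents=DOCUMENTS):
    """Run the three steps in sequence."""
    return extract(question, retrieve(question, documents))


print(answer("What is the capital of Italy?"))
```

Keeping the steps as separate functions means each one could be swapped for a stronger component (for instance a ranked retrieval model) without touching the others.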
Algorithms and decision making based on Big Data have become pervasive in all aspects of our daily (offline and online) lives. Social media, e-commerce, professional, political, educational, and dating sites, to mention just a few, shape our possibilities as individuals, consumers, employees, voters, students, and lovers. In this process, vast amounts of personal data are collected and used to train machine-learning based systems. These systems are used to classify and rank people, and can discriminate against us on grounds such as gender, age, or ethnicity, even without intention, and even if legally protected attributes, such as race, are not explicit in the data. Algorithmic bias exists even when the developer of the algorithm has no intention to discriminate. Sometimes it may be inherent to the data sources used (software making decisions based on data can reflect, or even amplify, the results of historical discrimination), but even when the sensitive attributes have been suppressed from the input, a well-trained machine learning algorithm may still discriminate on the basis of such sensitive attributes because of correlations existing in the data.
From a technical point of view, efforts at fighting algorithmic bias have led to two groups of solutions: (1) techniques for discrimination discovery from data and (2) discrimination prevention by means of fairness-aware data mining, that is, the development of data mining systems that are discrimination-conscious by design. In this talk we mainly focus on the first group of solutions. This talk is joint work with Sara Hajian and Francesco Bonchi, and an extended version of it was presented as a KDD 2016 tutorial: http://francescobonchi.com/algorithmic_bias_tutorial.html
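To make the point about suppressed attributes concrete, the following is a minimal sketch on synthetic data. It is not taken from the talk or the tutorial. The records, the `neighbourhood` proxy and all function names are assumptions. The decision rule never sees the protected group, yet a simple discovery measure (the ratio of approval rates between groups, often called disparate impact) still reveals a gap, because neighbourhood is correlated with group.

```python
# Toy records: (protected_group, neighbourhood, historical_decision).
# All values are synthetic and purely illustrative.
RECORDS = [
    ("A", "north", 1), ("A", "north", 1), ("A", "north", 1), ("A", "south", 0),
    ("B", "south", 0), ("B", "south", 0), ("B", "south", 1), ("B", "north", 1),
]


def fit_rule(records):
    """'Train' a rule that sees only the neighbourhood: approve a
    neighbourhood if most of its historical decisions were positive."""
    totals, positives = {}, {}
    for _, hood, decision in records:
        totals[hood] = totals.get(hood, 0) + 1
        positives[hood] = positives.get(hood, 0) + decision
    return {hood: int(2 * positives[hood] > totals[hood]) for hood in totals}


def positive_rate(records, rule, group):
    """Share of a protected group that the rule approves."""
    members = [hood for g, hood, _ in records if g == group]
    return sum(rule[hood] for hood in members) / len(members)


def disparate_impact(records, rule, disadvantaged, advantaged):
    """Ratio of approval rates; values far below 1 suggest indirect discrimination."""
    return positive_rate(records, rule, disadvantaged) / positive_rate(records, rule, advantaged)


rule = fit_rule(RECORDS)
print(rule, disparate_impact(RECORDS, rule, "B", "A"))
```

Here the protected attribute is used only to audit the rule after the fact, never to make the decision, which is the setting discrimination discovery addresses.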
Deep Learning: Theory, Architectures, Algorithms and Applications
Learning multi-level representations of data and learning from very large amounts of data.
Data-driven Algorithmics – Data for informed decisions
Data-driven algorithmics is an emerging topic that requires synthesis of prediction tools from machine learning with algorithms from theoretical computer science.
Topics: information theory in algorithm design, deep learning paradigms for data-driven optimization, convex optimization, generative models, barriers to implementation of algorithms in practice, new paradigms for balancing online and batch learning, space and precision tradeoffs, large-scale machine learning tasks.
“Multi-Objective Optimization Algorithms”
“Data Science for Smart Cities, Communities, Energy & Building”
“Algorithmics for Biology and Medicine”
Clarisse Dhaenens, University of Lille, France – Clarisse.Dhaenens@univ-lille1.fr
Laetitia Jourdan, University of Lille, France – laetitia.jourdan@univ-lille1.fr
Even if the term Big Data is not always used with the same meaning, many agree that it brings many challenges. Considering the whole process in the big-data context, from the generation of data, through its storage and management, to the analyses that can be carried out to support decision making, important challenges arise at each phase. During the generation and capture of data, some challenges are technological, linked for example to the acquisition of real-time data; others concern the identification of meaningful data. The storage and management phase raises two critical challenges: first, the infrastructure for storing and transporting data, and second, the conceptual models needed to provide well-formed, available data that can be used for analysis. The analysis phase then has its own challenges, with the manipulation of heterogeneous, massive data. In particular, in knowledge extraction, where unknown patterns have to be discovered, analysis may be very complex because of the nature of the data manipulated. This is the heart of data mining. One way to address data mining problems is to model them as optimization problems, which may be multi-objective in nature. In the context of Big Data, most of these problems are large-scale ones. Hence meta-heuristics seem to be good candidates to tackle them. It should be noted, however, that meta-heuristics are suitable not only for addressing the large size of the problem but also for dealing with other aspects of Big Data, such as variety and velocity. The aim of this special session is to gather contributions in which meta-heuristics and multi-objective optimization can provide answers to some of the challenges raised by the Big Data context, in particular within the data analytics phase. The scope of the special session MMO-BD includes, but is not limited to, the following topics:
– Meta-heuristics for supervised data mining tasks (classification, association rules…)
– Meta-heuristics for unsupervised data mining tasks (clustering, bi-clustering…)
– Meta-heuristics for mining heterogeneous data
– Meta-heuristics for text mining
– Multi-objective models for data mining tasks
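As an illustration of what a multi-objective model for a data mining task involves, the sketch below filters candidate classification rules down to their Pareto front. It is only a toy: the rules, their scores and the choice of objectives (error rate and rule length, both minimised) are assumptions, and a meta-heuristic would generate and evolve such candidates rather than receive a fixed list.

```python
def dominates(a, b):
    """True if a is at least as good as b on every objective and strictly
    better on at least one (all objectives are minimised)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(candidates):
    """Keep the candidates whose objective vectors no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(other["objectives"], c["objectives"]) for other in candidates)
    ]


# Hypothetical classification rules scored by (error rate, number of conditions).
RULES = [
    {"name": "r1", "objectives": (0.10, 5)},
    {"name": "r2", "objectives": (0.15, 2)},
    {"name": "r3", "objectives": (0.20, 4)},  # dominated by r2 on both objectives
    {"name": "r4", "objectives": (0.08, 9)},
]

front = [r["name"] for r in pareto_front(RULES)]
print(front)
```

The front keeps r1, r2 and r4 because each trades accuracy against simplicity in a way no other rule beats on both counts. Choosing among them is left to the analyst.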
Short bio of the organizers
o Clarisse Dhaenens (Professor, CRIStAL, Univ Lille / CNRS, France) Clarisse Dhaenens is a full professor at the University of Lille. She is currently the vice-head of the CRIStAL research laboratory. She obtained her PhD in 1998 from the Polytechnic University of Grenoble (INPG). She became an associate professor at the University of Lille in 1999 and a full professor in 2006. Her work deals with operations research and combinatorial optimization, with applications in knowledge discovery for bioinformatics and healthcare. She is, for example, interested in multi-objective optimization and in the links between the structure of problems and the way they are solved. She has just written a book with Laetitia Jourdan, “Meta-heuristics for Big Data”.
o Laetitia Jourdan (Professor, CRIStAL, Univ Lille / CNRS, France) Prof. Laetitia Jourdan is currently a full professor in computer science at the University of Lille / CRIStAL. She received a master's degree in computer science and mathematics from University Paris Dauphine in 1999 and holds a PhD in combinatorial optimization from the University of Lille 1 (France). From 2004 to 2005, she was a research associate at the University of Exeter (UK). She was then a tenured researcher at INRIA. She obtained her habilitation to supervise research (“HDR: Habilitation à Diriger des Recherches”) from the University of Lille in 2010. Her areas of research are modeling data mining tasks as combinatorial optimization problems, solving methods based on meta-heuristics, incorporating learning into meta-heuristics, and multi-objective optimization, with applications to health and bioinformatics. She has directed or co-supervised nine PhD students and twelve Master's students. She is (co)author of more than 100 papers published in international journals, book chapters, and conference proceedings. She has organized several international conferences (LION 2015, MIC 2015, etc.) and is a review editor for Frontiers in Big Data.
Contact information of the organizers:
Clarisse Dhaenens, CRIStAL, Univ. Lille / CNRS, France, CRIStAL Bat M3, Cité Scientifique, 59655 Villeneuve d’Ascq Cedex, FRANCE, Clarisse.Dhaenens@univ-lille1.fr, http://www.cristal.univ-lille.fr/~dhaenens/
Laetitia Jourdan, CRIStAL, Univ. Lille / CNRS, France, CRIStAL Bat M3, Cité Scientifique, 59655 Villeneuve d’Ascq Cedex, FRANCE, Laetitia.Jourdan@univ-lille1.fr, http://www.cristal.univ-lille.fr/~jourdan/
“Scalable Data Mining on Cloud Computing Systems”
Domenico Talia
Dipartimento di Ingegneria Informatica, Modellistica, Elettronica e Sistemistica
Università della Calabria, Italy
Syllabus. Parallel data mining techniques, distributed data mining, Cloud-based data analytics workflows, exascale programming.
Day: TBA
Short CV of the lecturer: Domenico Talia is a full professor of computer engineering at the University of Calabria. He is a partner of two startups, Exeura and DtoK Lab. His research interests include parallel and distributed data mining, cloud computing, social data analysis, mobile computing, peer-to-peer systems, and parallel programming. Talia has published ten books and more than 300 papers in international conference proceedings and in archival journals such as CACM, Computer, IEEE TKDE, IEEE TSE, IEEE TSMC-B, IEEE Micro, ACM Computing Surveys, FGCS, Parallel Computing and IEEE Internet Computing. He is a member of the editorial boards of IEEE Transactions on Cloud Computing, the Future Generation Computer Systems journal, the International Journal of Web and Grid Services, the Scalable Computing: Practice and Experience journal, MultiAgent and Grid Systems: An International Journal, and the Web Intelligence and Agent Systems International journal. Talia has been a project evaluator for several international institutions such as the European Commission, Aeres in France, the Austrian Science Fund, the Croucher Foundation, and the Russian Federation Government. He has served as a chair, organizer, or program committee member of several international conferences and has given many invited talks and seminars at conferences and schools. Talia is a member of the ACM and the IEEE Computer Society.
“Mathematical Analysis of Nature-Inspired Algorithms”
Xin-She Yang
School of Science and Technology
Middlesex University London
United Kingdom
Many problems in optimization and computational intelligence are very challenging to solve, and some of these problems can be NP-hard, which means that there are often no efficient algorithms to tackle them. In many cases, nature-inspired metaheuristic algorithms can be a good alternative; such algorithms include genetic algorithms (GA), particle swarm optimization (PSO), ant colony optimization (ACO) and many others. Over the last two decades, nature-inspired optimization algorithms have become increasingly popular for solving large-scale, nonlinear, global optimization problems with many real-world applications. They have also become an important part of optimization and computational intelligence. This tutorial will provide a critical analysis of recent algorithms using mathematical theories such as Markov chains, dynamic systems, random walks and self-organization systems. This will provide some insight into these algorithms concerning their convergence rates and stability.
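The link between stochastic search and Markov chains can be seen in a minimal sketch. The code below is not one of the algorithms covered by the tutorial. It is a plain greedy Gaussian random walk on an assumed test function (the sphere function), with hypothetical parameter names and defaults. Because each new state is drawn using only the current state, the sequence of states is a Markov chain, which is the property that makes the kind of convergence analysis mentioned above possible.

```python
import random


def sphere(x):
    """Simple test objective with its global minimum 0 at the origin."""
    return sum(v * v for v in x)


def random_walk_search(objective, start, step=0.5, iterations=2000, seed=0):
    """Greedy Gaussian random walk: propose a perturbed point and move only
    if it is no worse than the current one. The next state depends only on
    the current state, so the visited states form a Markov chain."""
    rng = random.Random(seed)
    current, best = list(start), objective(start)
    history = [best]  # best objective value after each iteration
    for _ in range(iterations):
        candidate = [v + rng.gauss(0.0, step) for v in current]
        value = objective(candidate)
        if value <= best:
            current, best = candidate, value
        history.append(best)
    return current, best, history


point, best, history = random_walk_search(sphere, [3.0, -4.0])
print(best)
```

The recorded `history` is non-increasing by construction, so plotting it gives a simple empirical view of how quickly the search converges for a given step size.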
Audience and expected participants:
All participants, especially young researchers, who wish to gain an in-depth understanding of evolutionary algorithms or to carry out more research on the theoretical aspects of algorithms.
Timeliness of the Tutorial:
The number of new nature-inspired algorithms, especially those based on swarm intelligence or inspiration from natural systems, has increased significantly in recent years. It is estimated that there are more than 100 different variants of nature-inspired algorithms. Therefore, this tutorial is a timely attempt to provide a critical analysis of recent algorithms from a mathematical perspective and analyse these algorithms using a unified mathematical framework.
Topics and Format:
This tutorial intends to introduce the fundamentals and latest advances of state-of-the-art nature-inspired algorithms, with a focus on the mathematical analysis of new algorithms using a unified theoretical framework of Markov chain theory, random walks, dynamic systems and self-organization theory. Topics include
Day: TBA
Tutor:
Prof. Xin-She Yang
School of Science and Technology, Middlesex University, London NW4 4BT, United Kingdom
http://scholar.google.co.uk/citations?user=fA6aTlAAAAAJ
Email: x.yang@mdx.ac.uk
Brief Biography:
Xin-She Yang is a Reader at the School of Science and Technology, Middlesex University (UK) and an Adjunct Professor at Reykjavik University (Iceland). He is also an elected Bye-Fellow at Downing College, Cambridge University. He worked at Cambridge University and then at the National Physical Laboratory as a Senior Research Scientist after obtaining his DPhil in Applied Mathematics at Oxford University. With more than 250 publications and more than 20 books, his research has been cited more than 20,000 times (according to Google Scholar) with an h-index of 57. He is also on the list of Highly Cited Researchers 2016 according to Thomson Reuters’ Web of Science. He is the Chair of the IEEE CIS Task Force on Business Intelligence and Knowledge Management. He is the Editor-in-Chief of Int. J. Mathematical Modelling and Numerical Optimisation (IJMMNO), and the Director of the International Consortium for Modelling and Optimization in Science and Technology (iCOMSI).
As the developer of three nature-inspired algorithms (namely, cuckoo search, the firefly algorithm and the bat algorithm), Yang has given many invited keynote talks at international conferences such as BIOMA2012 (Slovenia), Mendel’2012 (Czech Republic), ICCS2015 (Iceland), SIBGRAPI2015 (Brazil) and IEEE OIPE2016 (Italy). He has also given tutorials at international conferences such as FedCSIS2011 (Poland), IAMG2011 (Austria) and ECTA2015 (Portugal) on algorithms and nature-inspired computation.
For the list of publications, please see
https://scholar.google.co.uk/citations?user=fA6aTlAAAAAJ
Some recent books:
http://www.sciencedirect.com/science/book/9780124167438
http://www.sciencedirect.com/science/book/9780128045367