Special Sessions & Tutorials

 

MOD 2017 Special Sessions and Tutorials

Special Session 1, MOD 2017 Industrial Session – 2nd Edition, September 15th, 8:45-12:40

Industrial Session on Machine Learning, Optimization and Data Science for Real-World Applications

The MOD 2017 Industrial Session aims to bring together participants from academia and industry in a venue that highlights practical and real-world studies of machine learning, optimization and data science.

The ultimate goal of this event is to encourage a mutually beneficial exchange between scientific researchers and practitioners working to improve data science analytics.

The session will consist of a series of invited presentations by leading industry experts on selected topics in machine learning, optimization and data science, presented from an industry perspective and with a special focus on real-world applications.

Moderator: Aris Anagnostopoulos, Sapienza University of Rome, Italy

 

Panellists:

  • “A tale of Beauty and Happiness”, Luca Maria Aiello, Nokia Bell Labs, UK

In social media, attention concentrates on a relatively small number of popular items, while the vast majority of content produced by the crowd is almost neglected. Although popularity can be an indication of the perceived value of an item within its community, previous research has hinted that popularity is distinct from intrinsic quality. We embarked on a journey to quantitatively measure intangible properties such as quality and beauty using a multidisciplinary approach that ranges from social sciences to deep learning. Our research shows how algorithms that can reliably capture quality can democratise social media, improve our experience of online web services and even help us live a better life in our cities.

 

  • “Question Answering for a Language Game and Customer Support”, Pierpaolo Basile, University of Bari, Italy

Question answering (QA) systems are able to answer questions posed by humans in natural language. QA is a discipline within the fields of Information Retrieval (IR) and Natural Language Processing (NLP) that involves several machine learning approaches, which need to scale to huge amounts of textual data.
QA is a complex task that requires deeply understanding the question, searching for relevant documents and extracting the correct answer from the retrieved documents. This talk provides a brief introduction to QA focusing on NLP and machine learning techniques, followed by details about a QA framework called QuestionCube.

Finally, I will present two different scenarios where QA is successfully applied: a famous language game called “Who Wants to Be a Millionaire?” and a semantic search engine for FAQs developed for supporting customers in a real application.

 

  • “Detecting Algorithmic Bias”, Carlos Castillo, Universitat Pompeu Fabra in Barcelona, Spain

Algorithms and decision making based on Big Data have become pervasive in all aspects of our daily (offline and online) lives. Social media, e-commerce, professional, political, educational, and dating sites, to mention just a few, shape our possibilities as individuals, consumers, employees, voters, students, and lovers. In this process, vast amounts of personal data are collected and used to train machine-learning-based systems. These systems are used to classify and rank people, and can discriminate against us on grounds such as gender, age, or ethnicity, even without intention, and even if legally protected attributes, such as race, are not explicit in the data. Algorithmic bias exists even when the developer of the algorithm has no intention to discriminate. Sometimes it may be inherent to the data sources used (software making decisions based on data can reflect, or even amplify, the results of historical discrimination), but even when the sensitive attributes have been suppressed from the input, a well-trained machine learning algorithm may still discriminate on the basis of such sensitive attributes because of correlations existing in the data.

From a technical point of view, efforts at fighting algorithmic bias have led to two groups of solutions: (1) techniques for discrimination discovery from data and (2) discrimination prevention by means of fairness-aware data mining, that is, developing data mining systems that are discrimination-conscious by design. In this talk we mainly focus on the first group of solutions. This talk is joint work with Sara Hajian and Francesco Bonchi, and an extended version of it was presented as a KDD 2016 tutorial: http://francescobonchi.com/algorithmic_bias_tutorial.html
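As a rough illustration of what discrimination discovery from data can look like, the sketch below compares approval rates between two groups in a set of recorded decisions. It is not taken from the talk or the KDD tutorial: the ratio of selection rates is one commonly used measure, and the field names (`group`, `zip_code`, `approved`) and the toy records are hypothetical. The decision rule in this toy data never sees `group`, only `zip_code`, yet the outcomes still differ by group because the two are correlated.

```python
from collections import defaultdict


def selection_rates(records, group_key, outcome_key):
    """Return the fraction of positive outcomes for each group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for row in records:
        totals[row[group_key]] += 1
        if row[outcome_key]:
            positives[row[group_key]] += 1
    return {g: positives[g] / totals[g] for g in totals}


def selection_rate_ratio(records, group_key, outcome_key, protected, reference):
    """Ratio of the protected group's positive rate to the reference group's."""
    rates = selection_rates(records, group_key, outcome_key)
    return rates[protected] / rates[reference]


# Hypothetical toy data: decisions were made from zip_code alone,
# but zip_code is correlated with group membership.
decisions = [
    {"group": "A", "zip_code": "001", "approved": True},
    {"group": "A", "zip_code": "001", "approved": True},
    {"group": "A", "zip_code": "001", "approved": True},
    {"group": "A", "zip_code": "002", "approved": False},
    {"group": "B", "zip_code": "002", "approved": False},
    {"group": "B", "zip_code": "002", "approved": False},
    {"group": "B", "zip_code": "002", "approved": False},
    {"group": "B", "zip_code": "001", "approved": True},
]

rates = selection_rates(decisions, "group", "approved")
ratio = selection_rate_ratio(decisions, "group", "approved", "B", "A")
print(rates, ratio)
```

A ratio well below 1 flags a disparity worth investigating; on its own it does not establish the cause.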

 

Special Session 2

Deep Learning: Theory, Architectures, Algorithms and Applications
Learning multi-level representations of data and learning from very large amounts of data.

 

Special Session 3

Data-driven Algorithmics – Data for informed decisions
Data-driven algorithmics is an emerging topic that requires synthesis of prediction tools from machine learning with algorithms from theoretical computer science.
Topics: information theory in algorithm design, deep learning paradigms for data-driven optimization, convex optimization, generative models, barriers to implementation of algorithms in practice, new paradigms for balancing online and batch learning, space and precision tradeoffs, large-scale machine learning tasks.

Special Session 4

“Multi-Objective Optimization Algorithms”

Special Session 5

“Data Science for Smart Cities, Communities, Energy & Building”

Special Session 6

“Algorithmics for Biology and Medicine”

 

Special Session 7

Metaheuristics and Multi-Objective Optimization for Big Data

Clarisse Dhaenens, University of Lille, France  – Clarisse.Dhaenens@univ-lille1.fr
Laetitia Jourdan, University of Lille, France – laetitia.jourdan@univ-lille1.fr

https://sites.google.com/view/mmo-bd2017/

Aim and scope

Although the term Big Data is not always used with the same meaning, there is broad agreement that it brings many challenges. Across the whole big-data process, from the generation of data through its storage and management to the analyses carried out to support decision making, important challenges arise at each phase. During the generation and capture of data, some challenges are technological, linked for example to the acquisition of real-time data, while others concern the identification of meaningful data. The storage and management phase raises two critical challenges: the infrastructure for storing and transporting data, and the conceptual models needed to provide well-formed data that can be used for analysis. The analysis phase has its own challenges, with the manipulation of massive heterogeneous data. In particular, knowledge extraction, in which unknown patterns have to be discovered, may be very complex because of the nature of the data manipulated. This is the heart of data mining.

One way to address data mining problems is to model them as optimization problems, which can be multi-objective in nature. In the context of Big Data, most of these problems are large-scale ones, so meta-heuristics seem to be good candidates to tackle them. It should be noted that meta-heuristics are suitable not only for addressing the large size of the problem but also for dealing with other aspects of Big Data, such as variety and velocity.

The aim of this special session is to gather contributions in which meta-heuristics and multi-objective optimization can provide answers to some of the challenges induced by the Big Data context, in particular within the data analytics phase. The scope of the special session MMO-BD includes, but is not limited to, the following topics:

– Meta-heuristics for supervised data mining tasks (classification, association rules…)

– Meta-heuristics for unsupervised data mining tasks (clustering, bi-clustering…)

– Meta-heuristics for mining heterogeneous data

– Meta-heuristics for text mining

– Multi-objective models for data mining tasks
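To make the multi-objective view of a data mining task concrete, here is a minimal sketch of Pareto dominance, the comparison that multi-objective methods rely on. It is an illustration under stated assumptions, not a method proposed by the session organizers: the two objectives (classification error and number of selected features, both minimized) and the candidate values are hypothetical.

```python
def dominates(a, b):
    """True if a is no worse than b on every objective and strictly
    better on at least one (all objectives are minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical candidate models: (classification error, number of features).
candidates = [(0.10, 12), (0.12, 5), (0.10, 9), (0.20, 3), (0.25, 4), (0.12, 7)]
front = pareto_front(candidates)
print(sorted(front))
```

The surviving points are the trade-offs between accuracy and model size; a meta-heuristic would search for such a front instead of enumerating candidates as done here.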

Short bio of the organizers

• Clarisse Dhaenens (Professor, CRIStAL, Univ Lille / CNRS, France) Clarisse Dhaenens is a full professor at the University of Lille. She is currently the vice-head of the CRIStAL research laboratory. She obtained her PhD in 1998 from the Polytechnic University of Grenoble (INPG). She became an associate professor at the University of Lille in 1999 and a full professor in 2006. Her work deals with operations research and combinatorial optimization, with applications in knowledge discovery for bioinformatics and healthcare. She is, for example, interested in multi-objective optimization and in the links between the structure of problems and their solution. She has recently written a book with Laetitia Jourdan, “Metaheuristics for Big Data”.

• Laetitia Jourdan (Professor, CRIStAL, Univ Lille / CNRS, France) Prof. Laetitia Jourdan is currently a full professor of computer science at the University of Lille / CRIStAL. She received a master's degree in computer science and mathematics from University Paris Dauphine in 1999 and holds a PhD in combinatorial optimization from the University of Lille 1 (France). From 2004 to 2005, she was a research associate at the University of Exeter (UK). She was then a tenured researcher at INRIA. She obtained her habilitation to supervise research (“HDR: Habilitation à Diriger des Recherches”) from the University of Lille in 2010. Her areas of research are modeling data mining tasks as combinatorial optimization problems, solving methods based on meta-heuristics, incorporating learning in meta-heuristics, and multi-objective optimization with applications to health and bioinformatics. She has directed and co-supervised nine PhD and twelve Master students. She is (co)author of more than 100 papers published in international journals, book chapters, and conference proceedings. She has organized several international conferences (LION 2015, MIC 2015, etc.) and is a review editor for Frontiers in Big Data.

Contact information of the organizers:

Clarisse Dhaenens
CRIStAL, Univ. Lille / CNRS, France
CRIStAL, Bat M3, Cité Scientifique, 59655 Villeneuve d’Ascq Cedex, FRANCE
Clarisse.Dhaenens@univ-lille1.fr
http://www.cristal.univ-lille.fr/~dhaenens/

Laetitia Jourdan
CRIStAL, Univ. Lille / CNRS, France
CRIStAL, Bat M3, Cité Scientifique, 59655 Villeneuve d’Ascq Cedex, FRANCE
Laetitia.Jourdan@univ-lille1.fr
http://www.cristal.univ-lille.fr/~jourdan/

 

 

 

Tutorial 1

Scalable Data Mining on Cloud Computing Systems
Domenico Talia

Dipartimento di Ingegneria Informatica, Modellistica, Elettronica e Sistemistica

Università della Calabria, Italy

Summary. The analysis of massive and distributed data repositories is a challenging task that requires the combined use of intelligent data analysis techniques, machine learning algorithms, and scalable architectures to find and extract useful information. Parallel computers, distributed systems and Cloud computing platforms offer effective support for addressing both the computational and data storage needs of Big Data mining and parallel analytics applications. In fact, complex data mining tasks involve data- and compute-intensive algorithms that require large storage facilities together with high-performance processors to get results in a reasonable time. In this tutorial we introduce the most relevant topics and the main research issues in high-performance data mining, including parallel data mining strategies, distributed analysis techniques, and Cloud-based data mining. We also present some data mining frameworks designed for developing distributed data analytics applications as workflows of services on Clouds. In these environments, data sets, analysis tools, data mining algorithms and knowledge models are implemented as single services that are combined through a visual programming interface into distributed workflows. Application design and execution of data analysis use cases are discussed. Programming issues on exascale systems and applications will also be introduced.

Syllabus. Parallel data mining techniques, distributed data mining, Cloud-based data analytics workflows, exascale programming.

Day: TBA

Short CV of the lecturer: Domenico Talia is a full professor of computer engineering at the University of Calabria. He is a partner of two startups, Exeura and DtoK Lab. His research interests include parallel and distributed data mining, cloud computing, social data analysis, mobile computing, peer-to-peer systems, and parallel programming. Talia has published ten books and more than 300 papers in archival journals such as CACM, Computer, IEEE TKDE, IEEE TSE, IEEE TSMC-B, IEEE Micro, ACM Computing Surveys, FGCS, Parallel Computing, and IEEE Internet Computing, and in international conference proceedings. He is a member of the editorial boards of IEEE Transactions on Cloud Computing, the Future Generation Computer Systems journal, the International Journal of Web and Grid Services, the Scalable Computing: Practice and Experience journal, MultiAgent and Grid Systems: An International Journal, and the Web Intelligence and Agent Systems International journal. Talia has been a project evaluator for several international institutions such as the European Commission, Aeres in France, the Austrian Science Fund, the Croucher Foundation, and the Russian Federation Government. He has served as a chair, organizer, or program committee member of several international conferences and has given many invited talks and seminars at conferences and schools. Talia is a member of the ACM and the IEEE Computer Society.

Tutorial 2

“Mathematical Analysis of Nature-Inspired Algorithms”

Xin-She Yang

School of Science and Technology
Middlesex University London
United Kingdom

x.yang@mdx.ac.uk

Many problems in optimization and computational intelligence are very challenging to solve, and some of these problems can be NP-hard, which means that there are often no efficient algorithms to tackle them. In many cases, nature-inspired metaheuristic algorithms can be a good alternative; such algorithms include genetic algorithms (GA), particle swarm optimization (PSO), ant colony optimization (ACO) and many others. Over the last two decades, nature-inspired optimization algorithms have become increasingly popular for solving large-scale, nonlinear, global optimization problems with many real-world applications. They have also become an important part of optimization and computational intelligence. This tutorial will provide a critical analysis of recent algorithms using mathematical theories such as Markov chains, dynamic systems, random walks and self-organization systems. This will provide some insight into these algorithms concerning their convergence rates and stability.
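To give a flavour of why Markov chains and random walks are natural tools for this kind of analysis, the sketch below implements a very simple random-walk search. It is only an illustration, not one of the algorithms covered in the tutorial, and it is written in Python rather than the Matlab used for the tutorial demos. The objective (`sphere`), the step size, the starting point and the seed are all assumptions chosen for the example. Each new state depends only on the current state and a fresh random step, so the sequence of states forms a Markov chain, and the greedy acceptance rule makes the recorded objective value non-increasing.

```python
import random


def sphere(x):
    """Simple test objective: sum of squares, minimized at the origin."""
    return sum(v * v for v in x)


def random_walk_search(objective, start, step=0.5, iterations=2000, seed=0):
    """Gaussian random walk that only accepts moves that do not worsen
    the objective. Returns the final point and the value history."""
    rng = random.Random(seed)
    current = list(start)
    current_value = objective(current)
    history = [current_value]
    for _ in range(iterations):
        # Propose a neighbour by perturbing every coordinate.
        candidate = [v + rng.gauss(0.0, step) for v in current]
        candidate_value = objective(candidate)
        # Greedy acceptance: keep the candidate only if it is no worse.
        if candidate_value <= current_value:
            current, current_value = candidate, candidate_value
        history.append(current_value)
    return current, history


best, history = random_walk_search(sphere, start=[5.0, -3.0, 4.0])
print(history[0], history[-1])
```

Plotting `history` against the iteration count gives an empirical view of the convergence behaviour that the mathematical analysis aims to explain.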

 

Audience and expected participants:

All participants, especially young researchers, who wish to gain an in-depth understanding of evolutionary algorithms or to carry out more research on the theoretical aspects of algorithms.

 

Timeliness of the Tutorial:

The number of new nature-inspired algorithms, especially those based on swarm intelligence or inspiration from natural systems, has increased significantly in recent years. It is estimated that there are more than 100 different variants of nature-inspired algorithms. Therefore, this tutorial is a timely attempt to provide a critical analysis of recent algorithms from a mathematical perspective and analyse these algorithms using a unified mathematical framework.

 

Topics and Format:

This tutorial intends to introduce the fundamentals and latest advances of state-of-the-art nature-inspired algorithms, with a focus on the mathematical analysis of new algorithms using a unified theoretical framework of Markov chain theory, random walks, dynamic systems and self-organization theory. Topics include:

  • Essence of an evolutionary algorithm
  • Mathematical analysis of algorithms using Markov chains and self-organization
  • Convergence and stability using dynamic systems and Markov chain theory
  • Review of some recent theoretical results concerning evolutionary algorithms
  • Introduction of selected case studies in applications with demo codes in Matlab

Day: TBA

Tutor:
Prof. Xin-She Yang
School of Science and Technology, Middlesex University, London NW4 4BT, United Kingdom

http://scholar.google.co.uk/citations?user=fA6aTlAAAAAJ
Email: x.yang@mdx.ac.uk

Brief Biography:

Xin-She Yang is a Reader at the School of Science and Technology, Middlesex University (UK) and an Adjunct Professor at Reykjavik University (Iceland). He is also an elected Bye-Fellow at Downing College, Cambridge University. He worked at Cambridge University and then at the National Physical Laboratory as a Senior Research Scientist after obtaining his DPhil in Applied Mathematics at Oxford University. With more than 250 publications and more than 20 books, his research has been cited more than 20,000 times (according to Google Scholar) with an h-index of 57. He is also on the list of Highly Cited Researchers 2016 according to Thomson Reuters’ Web of Science. He is the Chair of the IEEE CIS Task Force on Business Intelligence and Knowledge Management. He is the Editor-in-Chief of Int. J. Mathematical Modelling and Numerical Optimisation (IJMMNO), and the Director of the International Consortium for Modelling and Optimization in Science and Technology (iCOMSI).

As the developer of three nature-inspired algorithms (namely, cuckoo search, the firefly algorithm and the bat algorithm), Yang has given many invited keynote talks at international conferences such as BIOMA2012 (Slovenia), Mendel’2012 (Czech Republic), ICCS2015 (Iceland), SIBGRAPI2015 (Brazil) and IEEE OIPE2016 (Italy). He has also given tutorials at international conferences such as FedCSIS2011 (Poland), IAMG2011 (Austria) and ECTA2015 (Portugal) on algorithms and nature-inspired computation.

 

For the list of publications, please see

https://scholar.google.co.uk/citations?user=fA6aTlAAAAAJ

Some recent books:

http://www.sciencedirect.com/science/book/9780124167438

http://www.sciencedirect.com/science/book/9780128045367