Data Mining

[Wiki] Data mining is an interdisciplinary subfield of computer science. It is the computational process of discovering patterns in large data sets involving methods at the intersection of artificial intelligence, machine learning, statistics, and database systems. The overall goal of the data mining process is to extract information from a data set and transform it into an understandable structure for further use.

Goals

After completing this course, students will be able to:

  • Describe data mining concepts and considerations.
  • Select an appropriate method for a data extraction process.
  • Perform the CRISP-DM process on a data mining project.
  • Design and customize data mining algorithms using R libraries.
  • Discover interesting relations between variables in large databases using association rules.
  • Create a decision support tool based on machine learning and decision tree techniques.
  • Create scoring models to optimize direct marketing.
  • Perform hierarchical clustering (also called hierarchical cluster analysis or HCA) to describe large databases.

Outline

Course track
  • Chapter 1: Introduction to data mining
  • Chapter 2: Association rules
  • Chapter 3: Decision Tree
  • Chapter 4: Hierarchical Cluster Analysis
  • Chapter 5: Data visualisation
  • Chapter 6: Boosting and Bagging

Labs track (Pr. Nabila ZRIRA)
  • Lab 1: Introduction to R language (1/2)
  • Lab 2: Introduction to R language (2/2)
  • Lab 3: Association rules
  • Lab 4: Decision Tree
  • Lab 5: Data visualisation
  • Lab 6: Hierarchical Cluster Analysis
  • Lab 7: Boosting and Bagging

 

Prerequisites and related courses

This course requires a basic knowledge of probability theory and an interest in statistics, data analysis techniques, and databases.

Language and material

Classes are given in French by default. Slides are in French/English and are available as PDF files.

Bibliography

  • Coming soon

Tentative Schedule

DATE (DD/MM/YY) | TIME | CONTENT | MATERIAL
21/04/17 | 08:00 AM to 12:00 PM
  • Introduction to data mining
  • Lab 1: Introduction to R language (1/2)
28/04/17 | 08:00 AM to 12:00 PM
  • Association rules + Exercise series no. 1
  • Lab 2: Introduction to R language (2/2)
05/05/17 | 08:00 AM to 12:00 PM
  • Decision Tree
  • Lab 3: Association rules
12/05/17 | 08:00 AM to 12:00 PM
  • Exercise series no. 2
  • Lab 4: Decision Tree
19/05/17 | 08:00 AM to 12:00 PM
  • Hierarchical Cluster Analysis
  • Lab 5: Hierarchical Cluster Analysis
  • Lab 6: Data visualisation

R labs (Download here)

Last exam (Download here)

Case study R (Download here)

Project (Download here)

Project requirements:

• Teams of at most 2 students;
• The work must be submitted as a report and a CD containing the data and R scripts before Thursday, December 15, 2016;
• The date of the individual oral examination will be communicated later.