Data Warehouse

[Wiki] In computing, a data warehouse (DW or DWH), also known as an enterprise data warehouse (EDW), is a system used for reporting and data analysis, and is considered as a core component of business intelligence. DWs are central repositories of integrated data from one or more disparate sources. They store current and historical data and are used for creating analytical reports for knowledge workers throughout the enterprise. Examples of reports could range from annual and quarterly comparisons and trends to detailed daily sales analysis.

The data stored in the warehouse is uploaded from the operational systems (such as marketing or sales). The data may pass through an operational data store for additional operations before it is used in the DW for reporting.

Best Project First-place2017:

RACHIDI-ALAOUI Ilham & DIOURI Mahmoud (Download here)

Goals

After completing this course, students will be able to:

  • Describe data warehouse concepts and architecture considerations.
  • Design and implement a data warehouse.
  • Select an appropriate hardware/software platform for a data warehousing project.
  • Implement Data Flow using Talend, Pentaho BA, Microsoft BI Suite …
  • Implement Control Flow using Talend, Pentaho BA.
  • Implement an ETL solution that supports incremental data extraction using Talend/Pentaho Data Integration.
  • Implement an ETL solution that supports incremental data loading using Talend/Pentaho Data Integration.
  • Implement data cleansing and enforce data integrity.
  • Describe how IBM Cognos solutions can consume data from the data warehouse.

Outline

  • Chapitre 1: Introduction to decision Information systems
  • Chapitre 2: Data warehousing
  • Chapitre 3: Multidimensional analysis
  • Chapitre 4: Reporting (Optional training academy for IBM certification)
  • Chapitre 5: Cas study

Prerequisites and related courses

Basics about Relational Data Bases and a certain taste for SQL programming are required for this course.

Having attended the courses on DB Administration and on Web development can be useful and reveal interesting.

Language and material

The classes will be given in French by default. Slides will be in French/ English and available in PDF.

Bibliography

  • « Entrepôts de données, Guide paratique de modélisation dimentionnelle » Ralph Kimballe et Margy Ross, ed Vuibert, 2eme édition 2005.
  • « Le Datawarehouse, Guide de conduite de projets » Ralph Kimballe, Laura Reeves, Margy Ross, ed Eurolles 2005.
  • « Business intelligence avec SQL Server 2008 » Mise en oeuvre d’un projet décisionnel, Auteur : Bertrand Burquier, Editeur : Dunod, Septembre 2009
  • « Pentaho Solutions Business Intelligence and Data Warehousing with Pentaho and MySQL » Roland Bouman Jos van Dongen, Wiley Publishing, Inc, Sep 2009

Tentative Schedule

DATE (DD/MM/YY) CONTENT material
Week 1 Introduction to Decision Support Systems + Introduction to ETL Labs Chapter 1 + Lab1 + Data for  Lab
Week 2 Data warehousing + Exercices series n° 1 + Advanced ETL Labs Chapter 2 + Exercices n°1 + Lab2
Week 3 Design and implement a data warehouse + Exercices series n° 1 + GIMSI  Chapter 2 + Exercices n°1 + Lab2
Week 4  Multidimensional Analysis + Exercices series n° 3 Chapter 3 +  Exercices n°2 + Lab3 + Dump

Lab 4

Week 5 IBM Cognos Business Analytics (training academy for IBM certification) Introduction +  Cognos BA Fundamentals  1.1
Week 6 IBM Cognos Business Analytics (training academy for IBM certification)  Lab4Cognos BA Fundamentals  1.2
Week 7 Cas study Case study Pentaho BA + Script Dump 

New Project Download here:

Project requirements:

• Team of 2 students at most;
• The work must be provided as a report and CD containing the data and Oracle and talend scripts before: Sunday, May 22, 2017;
• The date of the individual oral evaluation will be communicated later.

Old Project Download here