Introduction to High Performance Data Analytics

Date

Tuesday 23 - Wednesday 24 May 2023

Location - The Cyprus Institute

This event is part of the EuroCC2 project and the National Competence Center activities, in collaboration with the Greek National Competence Center.

Pre-requisites

Attendees should be familiar with at least one programming language, such as C/C++, Fortran, Python, or R.

Requirements

All attendees will need their own desktop or laptop with the following software installed:

Web browser - e.g. Firefox or Chrome
PDF viewer - e.g. Firefox, Adobe Acrobat
SSH client - Terminal on Mac or Linux is fine. On Windows, PuTTY should be fine.

Participation and Registration

This will be an in-person event; participants can attend on-site only.

https://forms.gle/7SNDXDe6Ft1fZahn9

Git Repository

The Git repository with all material from the training event, including presentations and code, will be made available soon.

Agenda

Tuesday 23 May 2023

09:45 - 10:00: Introduction
10:00 - 11:15: Large-scale generative models for language and vision (including LLMs): How they work – and what we still do not know about them. Speakers: Professor Constantine Dovrolis and Dr. Mihalis Nicolaou.
11:15 - 11:30: Break
11:30 - 13:00: PyTorch Neural Networks: Running on CPUs and GPUs. Speaker: Dr. Pantelis Georgiades.
13:00 - 13:30: Lunch Break
13:30 - 14:30: Research Seminar: “Tensorization and uncertainty quantification in machine learning”. Speaker: Dr. Yinchong Yang, Siemens AG.
14:30 - 15:00: Break
15:00 - 16:30: Streamlined Data Analysis with NBML: Harnessing AI Algorithms for Predictive Modelling. Speaker: Dr. Nikos Bakas.

Wednesday 24 May 2023

10:30 - 12:00: Efficient Data Cleaning and Pre-processing Techniques for Robust Machine Learning. Speaker: Dr. Charalambos Chrysostomou.
12:30 - 13:15: Lunch Break
13:15 - 14:45: GPU CUDA Programming - Session 1. Speaker: Dr. Giannis Koutsou.
14:45 - 15:00: Break
15:00 - 16:30: GPU CUDA Programming - Session 2. Speaker: Dr. Giannis Koutsou.

Large-scale generative models for language and vision (including LLMs): How they work – and what we still do not know about them

Speakers: Professor Constantine Dovrolis and Dr. Mihalis Nicolaou

Description: This research talk provides a comprehensive overview of large-scale generative models in machine learning, including generative adversarial networks (GANs), transformers, and large language models (LLMs), with a focus on key technologies such as ChatGPT, Bard, and Stable Diffusion. We will discuss the mathematical underpinnings of these models, including attention mechanisms, self-attention, and positional encoding. An examination of the deep neural network architectures used, such as the multi-layered transformer architecture, will offer insight into their impact on natural language processing and other fields.

The presentation will also cover the training and fine-tuning processes of these advanced models, highlighting how they enable a wide range of applications across diverse domains. Furthermore, we will address the limitations and open questions surrounding these technologies, including their interpretability, potential biases, energy consumption, and the development of more efficient and robust models. By offering a holistic understanding of the current state of transformers and large language models, this talk aims to encourage further research and innovation in the field.
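As a rough illustration of two of the ideas named in the description, self-attention and positional encoding, the following is a minimal sketch in plain Python. It is not taken from the talk material. It assumes the common sinusoidal positional encoding and uses identity projections (queries, keys, and values are all the input itself), whereas real transformer layers learn separate projection matrices. The function names and the toy input are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def positional_encoding(num_positions, d_model):
    """Sinusoidal positional encoding: sine on even dimensions, cosine on odd ones."""
    pe = []
    for pos in range(num_positions):
        row = []
        for i in range(d_model):
            angle = pos / (10000 ** (2 * (i // 2) / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        pe.append(row)
    return pe


def self_attention(x):
    """Scaled dot-product self-attention with Q = K = V = x (an assumption for brevity)."""
    d = len(x[0])
    out = []
    for q in x:
        # Similarity of this query with every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in x]
        weights = softmax(scores)
        # Each output row is a weighted average of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, x)) for j in range(d)])
    return out


# Toy "embeddings" for a three-token sequence (hypothetical data).
tokens = [[1.0, 0.0, 0.0, 0.0],
          [0.0, 1.0, 0.0, 0.0],
          [0.0, 0.0, 1.0, 0.0]]
pe = positional_encoding(len(tokens), len(tokens[0]))
inputs = [[t + p for t, p in zip(tok, pos)] for tok, pos in zip(tokens, pe)]
outputs = self_attention(inputs)
for row in outputs:
    print([round(v, 3) for v in row])
```

Because the attention weights are non-negative and sum to one, every output vector is a weighted average of the input vectors. The positional encoding is what lets otherwise identical tokens at different positions produce different results.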

PyTorch Neural Networks: Running on CPUs and GPUs

Speaker: Dr. Pantelis Georgiades

Prerequisites: Trainees should be comfortable with the Python programming language.

Description: In this session we will present a simple introduction to neural networks and work through a classification problem with the PyTorch framework in Python, running on both CPUs and GPUs. PyTorch is a deep learning framework developed by Meta that offers a fast and flexible set of tools for developing and deploying deep neural network models on both CPUs and GPUs. The example will be presented in an interactive Jupyter Notebook, and the trainees will have the opportunity to become familiar with the workflow and implementation of a Data Science project using state-of-the-art deep learning libraries.
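To give a flavour of what running the same model on a CPU or a GPU can look like in PyTorch, here is a minimal sketch. It is not the session's notebook. The synthetic two-cluster dataset, the network size, the hyperparameters, and the helper names `pick_device` and `train_toy_classifier` are all assumptions made for illustration.

```python
import torch
from torch import nn


def pick_device():
    """Use a CUDA GPU when one is visible to PyTorch, otherwise fall back to the CPU."""
    return torch.device("cuda" if torch.cuda.is_available() else "cpu")


def train_toy_classifier(device, epochs=200, seed=0):
    """Train a small network on two synthetic 2-D clusters and return training accuracy."""
    torch.manual_seed(seed)
    n = 200
    # Hypothetical data: two Gaussian blobs centred at (-2, -2) and (2, 2).
    x0 = torch.randn(n, 2) + torch.tensor([-2.0, -2.0])
    x1 = torch.randn(n, 2) + torch.tensor([2.0, 2.0])
    x = torch.cat([x0, x1]).to(device)
    y = torch.cat([torch.zeros(n, dtype=torch.long),
                   torch.ones(n, dtype=torch.long)]).to(device)

    # The model and the data must live on the same device.
    model = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 2)).to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()

    model.train()
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()

    model.eval()
    with torch.no_grad():
        accuracy = (model(x).argmax(dim=1) == y).float().mean().item()
    return accuracy


if __name__ == "__main__":
    device = pick_device()
    print(f"device: {device}, training accuracy: {train_toy_classifier(device):.3f}")
```

The only device-specific lines are the `.to(device)` calls, so the same script runs unchanged on a laptop CPU or on a GPU node.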

Streamlined Data Analysis with NBML: Harnessing AI Algorithms for Predictive Modelling

Speaker: Dr. Nikos Bakas

Description: NBML is an ML package for analysing tabular datasets and creating predictive models. It is structured in an AutoML setting and aims to support both experts in the field and people from other disciplines who want to analyse their data. Given only the dataset as input (in a specified format), the software:

Creates descriptive statistics
Trains, tunes, and compares predictive models built with various ML algorithms
Conducts a comprehensive analysis of the residual errors
Prepares sensitivity analysis plots
Checks the adequacy of the dataset’s size for the prediction

You may run the _nbml_.py notebook for fast analyses, investigate or change the *.py scripts if you want to explore further, and run the same code locally, on online platforms, or on a supercomputing cluster.

Efficient Data Cleaning and Pre-processing Techniques for Robust Machine Learning

Speaker: Dr. Charalambos Chrysostomou

Description: In this session, we will explore various data cleaning and pre-processing techniques that can enhance data quality and improve the performance of machine learning models. The session will cover handling missing values, outlier detection, data transformation, feature scaling, and encoding categorical variables. By applying these techniques, participants will learn how to create robust and high-performing machine-learning models. The examples will be presented using Python and popular data processing libraries such as Pandas and Scikit-learn. Attendees will have the opportunity to become familiar with the workflow and implementation of data cleaning and pre-processing techniques.
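The steps listed in the description can be sketched without any third-party library. The session itself uses Pandas and Scikit-learn. The version below deliberately sticks to the Python standard library so that it runs anywhere, and it swaps in simple choices that are assumptions rather than the session's methods: median imputation, the 1.5 × IQR rule for outliers, and z-score scaling. All function names and data are hypothetical.

```python
import statistics


def impute_median(values):
    """Replace missing entries (None) with the median of the observed values."""
    observed = [v for v in values if v is not None]
    med = statistics.median(observed)
    return [med if v is None else v for v in values]


def iqr_outlier_mask(values, k=1.5):
    """Flag values lying more than k * IQR outside the first or third quartile."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [v < low or v > high for v in values]


def standardize(values):
    """Z-score scaling: subtract the mean and divide by the population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]  # A constant column carries no scale information.
    return [(v - mean) / std for v in values]


def one_hot(labels):
    """Encode categorical labels as 0/1 indicator rows, one column per sorted category."""
    categories = sorted(set(labels))
    return categories, [[1 if lab == c else 0 for c in categories] for lab in labels]


# Hypothetical toy column with a missing value and one extreme reading.
ages = [10.0, 11.0, None, 13.0, 14.0, 100.0, 12.0]
filled = impute_median(ages)
mask = iqr_outlier_mask(filled)
kept = [v for v, is_outlier in zip(filled, mask) if not is_outlier]
scaled = standardize(kept)
categories, encoded = one_hot(["red", "blue", "red"])
print(filled, mask, [round(s, 2) for s in scaled], categories, encoded, sep="\n")
```

The order matters in this sketch: missing values are filled before outliers are flagged, and scaling is applied last, so that an extreme value does not distort the mean and standard deviation used for scaling.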

GPU CUDA Programming

Speaker: Dr. Giannis Koutsou

Prerequisites: Trainees should be comfortable programming using C.

Description: This session provides an introduction to the GPU programming model, and to CUDA in particular. The hands-on component will begin with a step-by-step tutorial on how to write your first GPU program using CUDA, and continue with examples that demonstrate how data layout, use of shared memory, and GPU thread distribution affect GPU performance.

> Computation-based Science and Technology Research Center CaSToRC EuroCC National HPC Competence Center
