Date

Thursday 18 May 2023, 15:00 - 17:00

Location - Hybrid

The event will take place at the premises of The Cyprus Institute, in the Fresnel Auditorium.

Participants will also be able to join remotely via Zoom using the following link:

https://zoom.us/j/9947402955?pwd=Um8wdTdHeStFMTM3LzNRL3l3Umc5QT09

Participation and Registration

Register for the event by the end of Thursday 11 May 2023 at the following link:

https://forms.gle/naz7NvxX8X8wcrkEA

Description

The “EMBED-AI” project is funded by RIF, has a duration of one year, and is a collaboration between Hyperion Systems Engineering Ltd. (www.hyperionsystems.net) and The Cyprus Institute.

The goal of this project is to develop machine learning algorithms that predict the quality of material during the polymer production process in real time, in effect creating virtual analysis instruments (soft sensors) to monitor material quality parameters. The algorithms use historical time series data taken from the plant’s instrumentation as input for training and prediction. These algorithms are integrated with HYPPOS (Hyperion Predictive Production Online Software), which provides the raw data for the algorithms and manages the predictions while tracking the quality and location of the produced polymer.

This project builds on the results of an earlier collaboration and proof of concept implemented via the EuroCC project, in which The Cyprus Institute participates.

Related Publication

A publication related to the project can be found here.

Agenda

Thursday 18 May 2023

 


Introduction to EuroCC and its Services

We will introduce the EuroCC project, and the National Competence Center hosted by The Cyprus Institute. We will outline the goals of the project, and how the NCC reaches out to the community to provide its services for more competitive research and innovation and better public services.

 

Introduction to Hyperion

Hyperion Systems Engineering Ltd. (Hyperion) is a privately held, globally operating technology and engineering group formed in 1993 that has gained a world-class market reputation and a global clientele. Hyperion provides engineering advisory and consulting services and implements advanced industrial IT solutions in the hydrocarbons, chemicals, and power generation industries. Hyperion helps its customers effectively deploy capital resources, reduce operating and supply chain costs, improve safety, and increase their overall profitability, always cognizant of environmental impact. Hyperion delivers systems engineering solutions in areas such as Process Modelling and Optimisation, Basic and Advanced Process Control, Manufacturing Execution Systems, Data Validation and Reconciliation, Laboratory Information Systems, Plant Performance Management, and Advanced Supply Chain Planning and Scheduling.

Hyperion collaborated, as part of the EuroCC initiative, with computational scientists of the Computation-based Science and Technology Research Centre (CaSToRC) of The Cyprus Institute (CyI) to develop machine learning based algorithms. The specific project objective was to establish correlations between a polymer plant’s operating conditions and laboratory quality test results, which would allow the prediction of critical quality measurements.

 

Use of Machine Learning Predictive Technology in Polymer Production

Plastics in the form of polymers are widely used in the modern world, and their usage is increasing. With such high volumes of polymer production, it is critical to minimize the production of waste and off-spec polymer, both for profitability and to limit the potential polluting impact.

Typical polymer production is a combination of continuous and batch processes, which makes maintaining the quality of the end product a significant challenge. Many plants rely on off-line laboratory measurements for the critical quality parameters; this causes a delay of 30 minutes or more in obtaining the test results, making it difficult to address production problems quickly.

To address this limitation, machine learning (ML) algorithms were built to implement soft sensors for quality prediction using the existing plant instrumentation. This provides production staff with rapid and frequent predictions of the product quality, making it easier to detect quality issues and intervene quickly. A variety of ML models were trained, validated, and tested on historical measurements from a petrochemicals plant, comprising Linear and Polynomial Regression (LR, PR), XGBoost (XGB) and Random Forests (RF), as well as Artificial Neural Networks (ANNs). Statistical metrics for each ML model are presented, as well as predicted results from an operating polymer manufacturing facility.
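As a rough illustration of this soft-sensor approach, the following Python sketch trains several of the model types mentioned above on synthetic data standing in for historical plant measurements; the feature names, model settings, and data are illustrative assumptions rather than the project's actual configuration.

```python
# Hedged sketch of the soft-sensor idea: several regression models are trained on
# (synthetic) historical plant measurements to predict a laboratory quality value.
# All feature names and settings are assumptions made for illustration.
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from xgboost import XGBRegressor

rng = np.random.default_rng(0)
n = 2000
X = pd.DataFrame({
    "reactor_temp": rng.normal(180, 5, n),   # hypothetical instrumentation tags
    "pressure": rng.normal(30, 2, n),
    "feed_rate": rng.normal(12, 1, n),
})
y = 0.4 * X["reactor_temp"] - 1.5 * X["pressure"] + rng.normal(0, 1, n)

# Chronological split: train on earlier data, evaluate on later data
split = int(0.8 * n)
X_train, X_test, y_train, y_test = X[:split], X[split:], y[:split], y[split:]

models = {
    "LR": LinearRegression(),
    "RF": RandomForestRegressor(n_estimators=200, random_state=0),
    "XGB": XGBRegressor(n_estimators=200, learning_rate=0.1),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    rmse = np.sqrt(mean_squared_error(y_test, model.predict(X_test)))
    print(f"{name}: test RMSE = {rmse:.3f}")
```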

Date

Tuesday 23 - Wednesday 24 May 2023

Location - The Cyprus Institute and via Zoom

This event is part of the EuroCC2 project and the National Competence Center activities, in collaboration with the Greek National Competence Center.

Pre-requisites

Attendees should be familiar with at least one programming language, such as C/C++, Fortran, Python, R.

Requirements

All attendees will need their own desktop or laptop with the following software installed:

  • Web browser - e.g. Firefox or Chrome
  • PDF viewer - e.g. Firefox, Adobe Acrobat
  • ssh client - Terminal for Mac or Linux is fine. For Windows, PuTTY should be fine.

Participation and Registration

This will be a hybrid event, and participants can attend on-site or remotely via Zoom.

Feedback form

Please complete the feedback form for the 1st day at the following link:
https://forms.gle/5WvJZryAqhSEt7CE9

Please complete the feedback form for the 2nd day at the following link:
https://forms.gle/9kJmbSDvMGirLTuv9

Git Repository

The Git repository with all material of the training event, including presentations and code, will soon be made available.

Agenda

Videos of all sessions and the presented material can be found on the following website: https://eurocc.cyi.ac.cy/data-analytics-in-the-era-of-large-scale-machine-learning/

Tuesday 23 May 2023

Wednesday 24 May 2023

 



Large-scale generative models for language and vision (including LLMs): How they work – and what we still do not know about them

Speakers: Professor Constantine Dovrolis and Dr. Mihalis Nicolaou

Description: This research talk provides a comprehensive overview of large-scale generative models in machine learning, such as generative adversarial nets, transformers, and large language models (LLMs), focusing on key technologies such as ChatGPT, BERT, Generative Adversarial Networks, and Stable Diffusion. We will discuss the mathematical underpinnings of these models, including attention mechanisms, self-attention, and positional encoding. An examination of the deep neural network architectures used, such as the multi-layered transformer architecture, will offer insight into their impact on natural language processing and other fields.

The presentation will also cover the training and fine-tuning processes of these advanced models, highlighting how they enable a wide range of applications across diverse domains. Furthermore, we will address the limitations and open questions surrounding these technologies, including their interpretability, potential biases, energy consumption, and the development of more efficient and robust models. By offering a holistic understanding of the current state of machine learning transformers and large-language models, this talk aims to encourage further research and innovation in the field.
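For readers unfamiliar with the attention mechanism mentioned above, the following minimal PyTorch sketch implements single-head scaled dot-product self-attention on a toy sequence; the shapes and random values are illustrative only.

```python
# Minimal sketch of scaled dot-product self-attention with plain tensor operations.
import math
import torch

def self_attention(x, w_q, w_k, w_v):
    """Single-head self-attention over a sequence x of shape (seq_len, d_model)."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v          # project to queries, keys, values
    scores = q @ k.T / math.sqrt(k.shape[-1])    # scaled dot-product similarities
    weights = torch.softmax(scores, dim=-1)      # attention weights per position
    return weights @ v                           # weighted sum of values

torch.manual_seed(0)
d_model = 16
x = torch.randn(5, d_model)                      # a toy "sequence" of 5 tokens
w_q, w_k, w_v = (torch.randn(d_model, d_model) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)
print(out.shape)  # torch.Size([5, 16])
```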

 

PyTorch Neural Networks: Running on CPUs and GPUs

Speaker: Dr. Pantelis Georgiades

Prerequisites: Trainees should be comfortable with the Python programming language.

Description: In this session we will present a simple introduction to neural networks and work through a classification problem with the PyTorch framework in Python, using both CPUs and GPUs. PyTorch is a deep learning framework developed by Meta that offers a fast and flexible set of tools to develop and deploy deep learning models on both CPUs and GPUs. The example will be presented in an interactive Jupyter Notebook, and trainees will have the opportunity to become familiar with the workflow and implementation of a Data Science project using state-of-the-art deep learning libraries.
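A minimal sketch of the kind of workflow the session covers, assuming synthetic data and a toy architecture: a small PyTorch classifier that runs on a GPU when one is available and falls back to the CPU otherwise.

```python
# Small PyTorch classifier on synthetic two-class data; runs on GPU if available.
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

torch.manual_seed(0)
X = torch.randn(512, 4)                      # synthetic features
y = (X[:, 0] + X[:, 1] > 0).long()           # synthetic binary labels

model = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 2)).to(device)
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

X, y = X.to(device), y.to(device)
for epoch in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    optimizer.step()

accuracy = (model(X).argmax(dim=1) == y).float().mean().item()
print(f"device={device}, training accuracy={accuracy:.2f}")
```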

 

Research Seminar: Tensorization and uncertainty quantification in machine learning

Speaker: Dr. Yinchong Yang, Siemens AG.

Biography: Yinchong Yang holds a master's degree in statistics and a PhD in computer science from the Ludwig Maximilian University of Munich. As a senior key expert in robust AI at Siemens, he conducts research on the quantification and certification of robustness and uncertainty for industrial-grade AI. He is also interested in tensor decomposition methods in machine learning, such as tensorized neural networks and relational learning from tensor data.

Abstract

Modern deep neural networks, which consist of large weight matrices, are often prone to over-parameterization and can be computationally expensive to train and store. Tensorizing and decomposing these weight matrices has emerged as an effective solution to this problem, since it allows neural networks to represent large matrices with significantly fewer parameters. This technology has been applied in various neural network architectures and use cases, making it an interesting topic of research. This talk will include a brief introduction to the basic idea, some hands-on tutorials on how to implement such models with very little code, and an overview of related publications.
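As a hedged, much-simplified illustration of the parameter savings behind this idea (a plain low-rank factorization rather than the tensorized layers covered in the talk), the sketch below replaces one large linear layer with two narrow ones and compares parameter counts.

```python
# Replacing a dense linear layer by a rank-32 factorization: same input/output
# sizes, far fewer parameters. This is a simplified stand-in, not the talk's method.
import torch.nn as nn

d_in, d_out, rank = 1024, 1024, 32

dense = nn.Linear(d_in, d_out)
factored = nn.Sequential(nn.Linear(d_in, rank, bias=False), nn.Linear(rank, d_out))

def n_params(module):
    return sum(p.numel() for p in module.parameters())

print("dense parameters:   ", n_params(dense))     # 1,049,600
print("factored parameters:", n_params(factored))  # 66,560
```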

Deep neural networks have achieved impressive results in a wide range of machine learning tasks, but accurately quantifying the uncertainty of these models remains a significant challenge. Gaussian processes, on the other hand, provide a principled approach to modeling uncertainty. However, they often struggle to scale to large amounts of training data. This talk will first introduce the fundamental concepts of the latest research on scalable Gaussian process models. Second, we will discuss two recent publications that demonstrate how to incorporate scalable Gaussian processes with representation/deep learning. References to programming frameworks will also be included.
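For readers new to Gaussian processes, the short scikit-learn sketch below (a standard, non-scalable GP, not the methods discussed in the talk) shows how the model returns both a predictive mean and a standard deviation that quantifies its uncertainty away from the training data.

```python
# A basic GP regressor: predictions come with an uncertainty estimate (std).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
X_train = rng.uniform(0, 5, size=(30, 1))
y_train = np.sin(X_train).ravel() + rng.normal(0, 0.1, 30)

gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=0.1**2)
gp.fit(X_train, y_train)

X_test = np.linspace(0, 10, 5).reshape(-1, 1)   # points above 5 lie outside the training range
mean, std = gp.predict(X_test, return_std=True)
for x, m, s in zip(X_test.ravel(), mean, std):
    print(f"x={x:4.1f}  prediction={m:6.2f}  uncertainty (std)={s:5.2f}")
```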

Parallel computing techniques for scaling hyperparameter tuning of Gradient Boosted Trees and Deep Learning

Speaker: Dr. Nikos Bakas

Description: The presentation discusses hyperparameter tuning in machine learning model development when models are trained on supercomputers. We will present parallelization techniques using XGBoost and PyTorch on large-scale supercomputers, aiming to scale up performance in terms of computing time and accuracy. Computational bottlenecks during hyperparameter tuning, and the impact of multiprocessing on CPU utilization, will be presented, along with a cross-validation algorithm for efficient exploration of the hyperparameter optimization search space. The usage of XGBoost and PyTorch in a multiprocessing setting on powerful CPUs will be demonstrated, as well as insights into handling multiple OpenMP runtimes. Scaling-up results from applying the parallelization techniques on supercomputers will be presented, analyzing the impact of increasing the number of threads on hyperparameter optimization and the resulting reduction in tuning time.
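As an illustrative sketch (not the presenter's actual code), the snippet below tunes XGBoost hyperparameters with cross-validation while evaluating configurations in parallel across CPU cores; restricting each fitted model to a single thread avoids oversubscribing the cores when many configurations run at once.

```python
# Parallel cross-validated hyperparameter search over XGBoost settings.
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV
from xgboost import XGBRegressor

X, y = make_regression(n_samples=2000, n_features=20, random_state=0)

param_grid = {
    "n_estimators": [100, 300],
    "max_depth": [3, 6],
    "learning_rate": [0.05, 0.1],
}
search = GridSearchCV(
    XGBRegressor(n_jobs=1),        # one OpenMP thread per fitted model
    param_grid,
    cv=5,
    scoring="neg_root_mean_squared_error",
    n_jobs=-1,                     # evaluate configurations/folds in parallel
)
search.fit(X, y)
print("best parameters:", search.best_params_)
print("best CV RMSE:   ", -search.best_score_)
```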

 

Efficient Data Cleaning and Pre-processing Techniques for Robust Machine Learning

Speaker: Dr. Charalambos Chrysostomou

Description: In this session, we will explore various data cleaning and pre-processing techniques that can enhance data quality and improve the performance of machine learning models. The session will cover handling missing values, outlier detection, data transformation, feature scaling, and encoding categorical variables. By applying these techniques, participants will learn how to create robust and high-performing machine-learning models. The examples will be presented using Python and popular data processing libraries such as Pandas and Scikit-learn. Attendees will have the opportunity to become familiar with the workflow and implementation of data cleaning and pre-processing techniques.
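As a brief preview of these techniques, the following sketch combines imputation of missing values, feature scaling, and one-hot encoding in a scikit-learn pipeline; the toy DataFrame and column names are assumptions for illustration.

```python
# Impute, scale, and encode a small mixed-type DataFrame with a single pipeline.
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

df = pd.DataFrame({
    "temperature": [21.0, np.nan, 19.5, 30.2],
    "pressure": [1.0, 1.1, np.nan, 0.9],
    "category": ["A", "B", "A", None],
})

numeric = ["temperature", "pressure"]
categorical = ["category"]

preprocess = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), numeric),
    ("cat", Pipeline([("impute", SimpleImputer(strategy="most_frequent")),
                      ("encode", OneHotEncoder(handle_unknown="ignore"))]), categorical),
])

X = preprocess.fit_transform(df)
print(X)
```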

 

GPU CUDA Programming

Speaker: Dr. Giannis Koutsou

Prerequisites: Trainees should be comfortable programming using C.

Description: An introduction to the GPU programming model and CUDA in particular will be provided. The hands-on component will begin with a step-by-step tutorial on how to write your first GPU program using CUDA, and continue with examples that demonstrate how data-layout, use of shared memory, and GPU thread distribution affect GPU performance.
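The session itself is taught in CUDA C; purely as a hedged Python analogue of a first GPU kernel (and assuming an NVIDIA GPU with the numba package installed), the sketch below writes a vector-addition kernel with Numba's CUDA interface, which follows the same thread and block indexing model.

```python
# Vector addition on the GPU via Numba CUDA: one thread handles one element.
# This is a Python stand-in for the session's CUDA C examples, not its material.
import numpy as np
from numba import cuda

@cuda.jit
def vector_add(a, b, out):
    i = cuda.grid(1)              # global thread index
    if i < out.shape[0]:          # guard against out-of-range threads
        out[i] = a[i] + b[i]

n = 1_000_000
a = np.random.rand(n).astype(np.float32)
b = np.random.rand(n).astype(np.float32)
out = np.zeros_like(a)

threads_per_block = 256
blocks = (n + threads_per_block - 1) // threads_per_block
vector_add[blocks, threads_per_block](a, b, out)   # arrays are copied to/from the GPU

print(np.allclose(out, a + b))
```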


The NCC will be organizing a tutorial series of half-day events, with content selected by the community through a recently circulated questionnaire.

Fast mathematics with Numpy in Python

Date and Time: Wednesday 10 May 2023, 14:00 - 17:00

Location - The Cyprus Institute

This event is part of the EuroCC2 project and the National Competence Center activities.

Description

NumPy is a package that offers high-performance tools for data manipulation and mathematics in Python. In this tutorial we will give an overview of some of its features and focus on the importance of using such tools in Python. Python is an interpreted language, so explicit loops over data are very slow and not recommended. NumPy, on the other hand, offers performance that in most cases is comparable to compiled code.
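A small sketch of this point: the same sum of squares computed with an explicit Python loop and with a vectorized NumPy expression.

```python
# Compare an interpreted Python loop with the equivalent vectorized NumPy call.
import time
import numpy as np

x = np.random.rand(1_000_000)

start = time.perf_counter()
total_loop = 0.0
for value in x:                 # interpreted Python loop: slow
    total_loop += value * value
loop_time = time.perf_counter() - start

start = time.perf_counter()
total_numpy = np.sum(x * x)     # vectorized NumPy: runs in compiled code
numpy_time = time.perf_counter() - start

print(f"loop:  {total_loop:.2f} in {loop_time:.3f}s")
print(f"numpy: {total_numpy:.2f} in {numpy_time:.5f}s")
```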

Slides of Tutorial, Notebook for Tutorial

Watch on YouTube

Date

Tuesday 23 - Wednesday 24 May 2023

Location - The Cyprus Institute

This event is part of the EuroCC2 project and the National Competence Center activities, in collaboration with the Greek National Competence Center.

Pre-requisites

Attendees should be familiar with at least one programming language, such as C/C++, Fortran, Python, R.

Requirements

All attendees will need their own desktop or laptop with the following software installed:

  • Web browser - e.g. Firefox or Chrome
  • PDF viewer - e.g. Firefox, Adobe Acrobat
  • ssh client - Terminal for Mac or Linux is fine. For Windows, PuTTY should be fine.

Participation and Registration

This will be a physical event, and thus participants can only attend on-site.

Register for the event by the end of Monday 8 May 2023 at the following link:

https://forms.gle/7SNDXDe6Ft1fZahn9

Git Repository

The Git repository with all material of the training event, including presentations and code, will soon be made available.

Agenda

Tuesday 23 May 2023

Wednesday 24 May 2023


Large-scale generative models for language and vision (including LLMs): How they work – and what we still do not know about them

Speakers: Professor Constantine Dovrolis and Dr. Mihalis Nicolaou

Description: This research talk provides a comprehensive overview of large-scale generative models in machine learning, such as generative adversarial nets, transformers, and large language models (LLMs), focusing on key technologies such as ChatGPT, BERT, Generative Adversarial Networks, and Stable Diffusion. We will discuss the mathematical underpinnings of these models, including attention mechanisms, self-attention, and positional encoding. An examination of the deep neural network architectures used, such as the multi-layered transformer architecture, will offer insight into their impact on natural language processing and other fields.

The presentation will also cover the training and fine-tuning processes of these advanced models, highlighting how they enable a wide range of applications across diverse domains. Furthermore, we will address the limitations and open questions surrounding these technologies, including their interpretability, potential biases, energy consumption, and the development of more efficient and robust models. By offering a holistic understanding of the current state of machine learning transformers and large-language models, this talk aims to encourage further research and innovation in the field.

 

PyTorch Neural Networks: Running on CPUs and GPUs

Speaker: Dr. Pantelis Georgiades

Prerequisites: Trainees should be comfortable with the Python programming language.

Description: In this session we will present a simple introduction to neural networks and work through a classification problem with the PyTorch framework in Python, using both CPUs and GPUs. PyTorch is a deep learning framework developed by Meta that offers a fast and flexible set of tools to develop and deploy deep learning models on both CPUs and GPUs. The example will be presented in an interactive Jupyter Notebook, and trainees will have the opportunity to become familiar with the workflow and implementation of a Data Science project using state-of-the-art deep learning libraries.

 

Streamlined Data Analysis with NBML: Harnessing AI Algorithms for Predictive Modelling

Speaker: Dr. Nikos Bakas

Description: NBML is an ML package for analyzing tabular datasets and creating predictive models. It is structured in an AutoML setting and aims to support both experts in the field and people from other disciplines who want to analyse their data. Given only the dataset as input (in a specified format), the software:

  • Creates descriptive statistics results
  • Trains predictive models with various ML algorithms, including tuning and comparison
  • Conducts a comprehensive analysis of the residual errors
  • Prepares sensitivity analysis plots
  • Checks the adequacy of the dataset’s size for the prediction

You may run the _nbml_.py notebook for fast analyses, investigate or change the *.py scripts if you want to explore further, and run the same code locally, on online platforms, or on a supercomputing cluster.
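NBML's own interface is not reproduced here; as a rough, hedged illustration of the kind of automated model comparison described above, the scikit-learn sketch below trains several model types on a tabular dataset and compares their cross-validated errors.

```python
# Generic model-comparison sketch (not NBML itself): fit several model types and
# report cross-validated RMSE for each.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=500, n_features=10, noise=5.0, random_state=0)

models = {
    "Linear regression": LinearRegression(),
    "Decision tree": DecisionTreeRegressor(random_state=0),
    "Random forest": RandomForestRegressor(n_estimators=200, random_state=0),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="neg_root_mean_squared_error")
    print(f"{name:18s} mean CV RMSE = {-scores.mean():.2f}")
```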

 

Efficient Data Cleaning and Pre-processing Techniques for Robust Machine Learning

Speaker: Dr. Charalambos Chrysostomou

Description: In this session, we will explore various data cleaning and pre-processing techniques that can enhance data quality and improve the performance of machine learning models. The session will cover handling missing values, outlier detection, data transformation, feature scaling, and encoding categorical variables. By applying these techniques, participants will learn how to create robust and high-performing machine-learning models. The examples will be presented using Python and popular data processing libraries such as Pandas and Scikit-learn. Attendees will have the opportunity to become familiar with the workflow and implementation of data cleaning and pre-processing techniques.

 

GPU CUDA Programming

Speaker: Dr. Giannis Koutsou

Prerequisites: Trainees should be comfortable programming using C.

Description: An introduction to the GPU programming model and CUDA in particular will be provided. The hands-on component will begin with a step-by-step tutorial on how to write your first GPU program using CUDA, and continue with examples that demonstrate how data-layout, use of shared memory, and GPU thread distribution affect GPU performance.
