Soha Kanso and William Jussiau

Diapro young researchers' seminar – 21 November 2024 – Saint-Jérôme

William Jussiau, recent PhD graduate, ONERA

Title: Control laws for the control of oscillator flows

Abstract: This thesis addresses the design of control laws for oscillator flows at low Reynolds number. We study two canonical 2D configurations: the flow around a cylinder, and the flow over an open cavity. Both cases exhibit an unstable steady equilibrium and a regime of self-sustained oscillations – respectively, a limit cycle and a toroidal attractor. The main objective is to design control laws that completely suppress the oscillatory regime, in order to reduce mean drag, structural vibrations, or tonal acoustic radiation. In practice, control design for these systems is made difficult by the diversity of dynamical phenomena emerging from the Navier-Stokes equations, which are nonlinear and infinite-dimensional. We propose three distinct methods for this task, based respectively on a parametrization of stabilizing controllers, numerical continuation, and the mean resolvent formalism.
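As a toy analogy for the oscillator flows above (not the thesis configurations or methods), the Van der Pol oscillator also has an unstable equilibrium surrounded by a stable limit cycle, and simple damping feedback suppresses the self-sustained oscillation. The parameters and gain below are illustrative assumptions:

```python
import numpy as np

# Van der Pol oscillator: x'' - mu*(1 - x^2)*x' + x = u.
# For mu > 0 the origin is unstable and trajectories settle on a limit cycle.
MU, K, DT, STEPS = 1.0, 2.0, 0.01, 5000  # assumed values for illustration

def simulate(feedback):
    """Euler-integrate the oscillator; return final distance to the equilibrium."""
    x, v = 1.0, 0.0
    for _ in range(STEPS):
        # u = -K*v adds damping; K > mu makes the net damping K - mu*(1 - x^2) > 0
        # everywhere, so the energy (x^2 + v^2)/2 decays and the cycle disappears.
        u = -K * v if feedback else 0.0
        x, v = x + DT * v, v + DT * (MU * (1 - x**2) * v - x + u)
    return np.hypot(x, v)

print(f"open loop amplitude:   {simulate(False):.2f}")  # stays on the limit cycle
print(f"closed loop amplitude: {simulate(True):.2e}")   # oscillation suppressed
```

The feedback here acts on the full state; the interest of the flow-control problem in the talk is precisely that such simple, full-information feedback is unavailable for the infinite-dimensional Navier-Stokes dynamics.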

Soha KANSO, PhD student, CRAN, Université de Lorraine, Nancy. Teaching and research assistant (ATER), Polytech Nancy

Title: Safe Reinforcement Learning and Degradation-Tolerant Control Design

Abstract: Safety-critical dynamical systems are essential in many industries, such as aerospace, autonomous systems, and healthcare robotics, where violating safety constraints or suffering a structural or functional failure may lead to catastrophic consequences. A significant challenge in these systems is the degradation of components and actuators, which can compromise the safety and stability of the system. Incorporating the system's health state into the control design framework is therefore essential to ensure tolerance to functional degradation. Moreover, models of such systems often involve uncertainties and incomplete knowledge, especially as components degrade and alter the system dynamics in a nonlinear manner. This underscores the need for learning approaches that incorporate the available data within the control design paradigm.
In this context, Reinforcement Learning (RL) emerges as a powerful approach, capable of learning optimal control laws for partially or fully unknown dynamical systems from input-output data alone, without exact knowledge of the system model. However, a major challenge in applying RL methods to safety-critical systems lies in ensuring safety during both the exploration and exploitation phases. Exploration involves injecting probing noise into the policy to collect informative data across the state space, while exploitation refers to applying the learned policy to optimize performance in real operation.

To this end, this presentation will first explore an off-policy safe RL approach for regulation and tracking problems in continuous-time nonlinear systems affine in the control input. A novel approach will be presented that ensures system stability and safety during all phases: initialization, exploration, and exploitation. By combining quadratic programming with control Lyapunov functions (CLFs) and control barrier functions (CBFs), the proposed approach guarantees stability and safety during the initialization and exploration phases. During exploitation, the safety of the learned policy is ensured by augmenting the cost function with reciprocal CBFs, thus balancing performance optimization and safety. The second part of the talk will address actuator degradation, which poses a critical threat to system performance and stability. A degradation-tolerant controller based on RL is introduced for continuous-time nonlinear systems affine in the control input. The objectives are twofold: ensuring system stability despite degradation, and slowing the degradation rate so that missions can be completed and actuator life extended. This is achieved by imposing constraints on degradation rates through CBFs. Furthermore, a cyclic off-policy algorithm is presented, enabling iterative exploration and exploitation across multiple learning cycles. This allows continuous updates of the neural network weights with recent information on degradation levels, ensuring that the learned policy effectively stabilizes the system while accounting for degradation effects.
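The CLF-CBF quadratic program mentioned above can be sketched on a toy single-integrator system. Everything below (the dynamics x' = u, the choices V(x) = x², h(x) = x - x_min, the gains, and the SciPy-based solver) is an illustrative assumption, not the systems or implementation of the talk:

```python
from scipy.optimize import minimize

# Toy single integrator x' = u with safe set {x : h(x) >= 0}.
# CLF V(x) = x^2 pushes x toward 0; CBF h(x) = x - X_MIN keeps x >= X_MIN.
GAMMA, ALPHA, X_MIN = 1.0, 2.0, 0.2  # assumed gains and barrier location

def clf_cbf_qp(x, u_nom=0.0):
    """Minimally modify u_nom so the CBF condition holds (hard constraint)
    and the CLF decrease condition holds up to a penalized slack."""
    cons = [
        # CBF: h_dot + ALPHA*h = u + ALPHA*(x - X_MIN) >= 0   (hard safety)
        {"type": "ineq", "fun": lambda z: z[0] + ALPHA * (x - X_MIN)},
        # CLF with slack d: V_dot + GAMMA*V = 2*x*u + GAMMA*x^2 <= d
        {"type": "ineq", "fun": lambda z: z[1] - (2 * x * z[0] + GAMMA * x**2)},
        {"type": "ineq", "fun": lambda z: z[1]},  # slack d >= 0
    ]
    # cost: deviation from nominal input plus a penalty on the CLF slack
    res = minimize(lambda z: (z[0] - u_nom) ** 2 + 10.0 * z[1] ** 2,
                   x0=[u_nom, 0.0], constraints=cons, method="SLSQP")
    return res.x[0]

# Simulate: the CLF pulls x toward 0, but the CBF stops it at the barrier.
x, dt = 0.5, 0.01
for _ in range(500):
    x += dt * clf_cbf_qp(x)
print(f"final x = {x:.3f}  (held at/above x_min = {X_MIN})")
```

In the talk's setting the same mechanism filters an RL policy rather than a fixed nominal input, and the CBF encodes safety or degradation-rate constraints instead of a simple position barrier.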

In the developed approaches, neural networks are used to approximate both the value function and the control policy, thereby enabling efficient learning. Simulation results will be presented to demonstrate the effectiveness of the proposed approaches.