
Postdoctoral Researcher in Agentic Control for Cluster Management - 12-month fixed-term contract (CDD)

**Who are we?**

Télécom Paris, part of the IMT (Institut Mines-Télécom) and a founding member of the Institut Polytechnique de Paris, is one of France's top 5 general engineering schools.

The **mainspring** of Télécom Paris is to train, imagine and undertake in order to design digital models, technologies and solutions for a society and an economy that respect people and their environment.

We are looking for a postdoctoral researcher specialising in agent-based control for cluster management to join the INFRES department at Télécom Paris.

Kubernetes has become a core platform for deploying and managing cloud-native systems and is increasingly used to host production AI workloads. Despite its maturity as an orchestration platform with built-in automation, day-to-day Kubernetes operations still often require significant human involvement. Cluster operators must inspect cluster state, interpret metrics, logs, traces, and events, diagnose failures, select corrective actions, execute commands or API operations, and verify that the system has returned to a healthy state. Recent LLM-based Kubernetes tools and research prototypes demonstrate the potential of language models to support these tasks through natural language interaction, command-line and API interaction, and cluster-aware reasoning, pointing towards more autonomous Kubernetes and Site Reliability Engineering (SRE) operations. The degree of autonomy varies across existing solutions from interactive human-in-the-loop assistance to more autonomous execution.

At the same time, the growing use of Kubernetes in edge computing environments makes autonomous cluster management an increasingly important research problem. While most existing studies focus on cloud environments or general Kubernetes management, edge deployments may involve multiple independently managed Kubernetes clusters operating under very different conditions. These clusters may be deployed at heterogeneous, resource-constrained, or physically hard-to-reach sites, including remote deployments for applications such as environmental monitoring. They may also face changing resource availability, unstable network conditions, and limited connectivity. In such environments, failures are harder and costlier to address through manual intervention, which increases the importance of zero-touch management and autonomous recovery at the level of each individual cluster. These constraints also make locally deployable open-weight models a practical option for supporting on-site reasoning, control, and recovery. Their utility can be further strengthened by retrieval-augmented generation, which allows decisions to be grounded in relevant local documents and operational data without continuous reliance on remote third-party services.

This postdoctoral project will investigate closed-loop agentic control for autonomous Kubernetes management in resource-constrained edge environments. The project will study how AI agents can observe the state of a Kubernetes cluster, interpret heterogeneous operational signals, reason over possible causes and corrective actions under safety constraints, execute selected recovery steps, and verify whether the cluster has returned to a healthy state. The research will particularly examine how locally deployable open-weight models, supported by retrieval-augmented generation over local documentation and operational data, can provide practical autonomy under limited connectivity and infrastructure constraints. The designed solution will be evaluated using either an existing evaluation framework, such as AIOpsLab, or a dedicated Kubernetes operational benchmark developed within the project. The evaluation is planned to combine realistic Kubernetes failure diagnosis and recovery scenarios, administration tasks inspired by the Certified Kubernetes Administrator (CKA) exam, and repeated experiments to assess reliability under resource-constrained edge conditions.
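The observe, reason, execute and verify cycle described above can be illustrated with a minimal Python sketch. It is only an illustration under strong assumptions, not the project's design: the cluster is simulated as a plain dictionary, the `plan` function is a hypothetical placeholder for the reasoning step (where a locally deployed model with retrieval could sit), and names such as `ALLOWED_ACTIONS`, `Action` and `control_loop` are invented for this example. No real Kubernetes API is called.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical stand-in for a cluster: pod name -> status string.
Cluster = Dict[str, str]

# Safety constraint (assumption): only allow-listed actions may be executed.
ALLOWED_ACTIONS = {"restart_pod"}


@dataclass(frozen=True)
class Action:
    name: str    # what to do, e.g. "restart_pod"
    target: str  # which pod the action applies to


def observe(cluster: Cluster) -> Cluster:
    """Take a snapshot of the current (simulated) cluster state."""
    return dict(cluster)


def is_healthy(snapshot: Cluster) -> bool:
    """The cluster counts as healthy when every pod reports 'Running'."""
    return all(status == "Running" for status in snapshot.values())


def plan(snapshot: Cluster) -> List[Action]:
    """Placeholder for the reasoning step: propose one restart per unhealthy pod."""
    return [
        Action("restart_pod", pod)
        for pod, status in sorted(snapshot.items())
        if status != "Running"
    ]


def execute(cluster: Cluster, action: Action) -> None:
    """Apply an action, refusing anything outside the allow-list."""
    if action.name not in ALLOWED_ACTIONS:
        raise PermissionError(f"action not allowed: {action.name}")
    cluster[action.target] = "Running"  # simulated effect of a restart


def control_loop(cluster: Cluster, max_iterations: int = 3) -> bool:
    """Observe, plan, execute, then verify on the next pass; bounded retries."""
    for _ in range(max_iterations):
        snapshot = observe(cluster)
        if is_healthy(snapshot):
            return True
        for action in plan(snapshot):
            execute(cluster, action)
    # Final verification after the last round of actions.
    return is_healthy(observe(cluster))


if __name__ == "__main__":
    demo = {"web": "CrashLoopBackOff", "db": "Running"}
    print(control_loop(demo), demo)
```

The bounded iteration count and the allow-list are design choices made for this sketch: they keep an autonomous loop from acting indefinitely or outside a pre-approved set of operations, which is one simple way to read "under safety constraints".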

This postdoctoral position is based in the Computer Science and Networks Department (INFRES), within the Networks, Mobility and Services (RMS) team, which is affiliated with the LTCI research laboratory. The INFRES department addresses scientific challenges arising from widespread digitization, drawing on its expertise in the architecture, design and verification of software systems and communication networks, data science, human-machine interaction, security, mobility, and the control of energy consumption. The research activities of the RMS team focus on very large networks and operated systems. In particular, we design the mobile networks and communications of tomorrow, the future Internet, the Internet of Things, and the evolution of cloud and virtualization. Our methodologies range from experimentation to theory: we experiment on testbeds, develop metrology tools, design architectures and protocols, and develop algorithms and analytical methods for evaluating and optimizing networks.

**Your main responsibilities:**

* To carry out research in the field of autonomous cluster management for resource-constrained edge environments.

* To contribute to the reputation of the School, the Institut Mines-Télécom and the Institut Polytechnique de Paris.


Company:
Institut Mines-Télécom
City:
Palaiseau
Contract type:
Full-time, CDI
Categories:
Automation, Automation / Robotics, Systems, Production, Security Engineer, Quality Engineer, Laboratory, DevOps Engineer, Community Manager
Published:
18.06.2026