03/24/2026 | Press release
There's a pattern in how complex technology matures. Early on, teams make their own choices: different tools, different abstractions, different ways of reasoning about failure. It looks like flexibility, but at scale it reveals itself as fragmentation.
The fix is never just more capability; it's shared operational philosophy. Kubernetes proved this. It didn't just answer "how do we run containers?" It answered "how do we change running systems safely?" The community built those patterns, hardened them, and made them the baseline.
AI infrastructure is still in the chaotic phase. The shift from "working versus broken" to "good answers versus bad answers" is a fundamentally different operational problem, and it won't get solved with more tooling. It gets solved the way cloud-native did: open source creating the shared interfaces and community pressure that replace individual judgment with documented, reproducible practice.
That's what we're building toward. Since my last update at KubeCon + CloudNativeCon North America 2025, our teams have continued investing across open-source AI infrastructure, multi-cluster operations, networking, observability, storage, and cluster lifecycle. At KubeCon + CloudNativeCon Europe 2026 in Amsterdam, we're sharing several announcements that reflect that same goal: bring the operational maturity of Kubernetes to the workloads and demands of today.
Building the open source foundation for AI on Kubernetes
The convergence of AI and Kubernetes infrastructure means that gaps in AI infrastructure and gaps in Kubernetes infrastructure are increasingly the same gaps. A significant part of our upstream work this cycle has been building the primitives that make GPU-backed workloads first-class citizens in the cloud-native ecosystem.
On the scheduling side, Microsoft has been collaborating with industry partners to advance open standards for hardware resource management. Key milestones include:
Beyond scheduling, we've continued investing in the tooling needed to deploy, operate, and secure AI workloads on Kubernetes:
What's new in Azure Kubernetes Service
In addition to our upstream contributions, I'm happy to share new capabilities in Azure Kubernetes Service (AKS) across networking and security, observability, multi-cluster operations, storage, and cluster lifecycle management.
From IP-based controls to identity-aware networking
As Kubernetes deployments grow more distributed, IP-based networking becomes harder to reason about: visibility degrades, security policies grow difficult to audit, and encrypting workload communication has historically required either a full service mesh or a significant amount of custom work. Our networking updates this cycle close that gap by moving security and traffic intelligence to the application layer, where it's both more meaningful and easier to operate.
Azure Kubernetes Application Network gives teams mutual TLS, application-aware authorization, and detailed traffic telemetry across ingress and in-cluster communication, with built-in multi-region connectivity. The result is identity-aware security and real traffic insight without the overhead of running a full service mesh. For teams managing the deprecation of ingress-nginx, Application Routing with Meshless Istio provides a standards-based path forward: Kubernetes Gateway API support without sidecars, continued support for existing ingress-nginx configurations, and contributions to ingress2gateway for teams moving incrementally.
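To make the standards-based path concrete, here is a minimal Gateway API equivalent of a basic ingress-nginx route. This is an illustrative sketch using the upstream `gateway.networking.k8s.io/v1` API; the `gatewayClassName` is a placeholder, since the class provided by Application Routing may be named differently:

```yaml
# Hypothetical sketch: a minimal Gateway API replacement for a simple
# ingress-nginx route. The gatewayClassName below is a placeholder.
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: web-gateway
spec:
  gatewayClassName: example-gateway-class   # placeholder, not the AKS class name
  listeners:
    - name: http
      protocol: HTTP
      port: 80
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: web-route
spec:
  parentRefs:
    - name: web-gateway
  hostnames:
    - "shop.example.com"
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /
      backendRefs:
        - name: web-service   # assumed backing Service
          port: 8080
```

Tools like ingress2gateway can generate a starting point for this translation from existing Ingress resources, which teams can then refine by hand.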
At the data plane level, WireGuard encryption with the Cilium data plane secures node-to-node traffic efficiently and without application changes. Cilium mTLS in Advanced Container Networking Services extends that to pod-to-pod communication using X.509 certificates and SPIRE for identity management: authenticated, encrypted workload traffic without sidecars. Rounding this out, Pod CIDR expansion removes a long-standing operational constraint by allowing clusters to grow their pod IP ranges in place rather than requiring a rebuild, and administrators can now disable HTTP proxy variables for nodes and pods without touching control plane configuration.
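The operational weight of Pod CIDR sizing is easy to quantify. Assuming the common pattern of one /24 pod range per node (the per-node allocation is configurable, so treat this as an illustrative assumption), the cluster's pod CIDR directly caps node count, which is why growing it in place rather than rebuilding matters:

```python
import ipaddress

def node_capacity(pod_cidr, node_prefix=24):
    """Number of per-node pod ranges (default /24) a cluster CIDR can hold."""
    return 2 ** (node_prefix - pod_cidr.prefixlen)

# Illustrative CIDRs only; real clusters will use their own ranges.
old = ipaddress.ip_network("10.244.0.0/16")
new = ipaddress.ip_network("10.240.0.0/14")
print(node_capacity(old))  # 256 nodes' worth of /24 pod ranges
print(node_capacity(new))  # 1024 after expanding the range in place
```

Hitting the 256-node ceiling on a /16 previously meant rebuilding the cluster with a larger range; in-place expansion removes that cliff.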
Visibility that matches the complexity of modern clusters
Operating Kubernetes at scale is only manageable with clear, consistent visibility into infrastructure, networking, and workloads. Two persistent gaps we've been closing are GPU telemetry and network traffic observability, both of which become more critical as AI workloads move into production.
Teams running GPU workloads have often had a significant monitoring blind spot: GPU utilization simply wasn't visible alongside standard Kubernetes metrics without manual exporter configuration. AKS now surfaces GPU performance and utilization directly into managed Prometheus and Grafana, putting GPU telemetry into the same stack teams are already using for capacity planning and alerting. On the network side, per-flow L3/L4 and supported L7 visibility across HTTP, gRPC, and Kafka traffic is now available, including IPs, ports, workloads, flow direction, and policy decisions, with a new Azure Monitor experience that brings built-in dashboards and one-click onboarding. For teams dealing with the inverse problem (metric volume rather than metric gaps), operators can now dynamically control which container-level metrics are collected using Kubernetes custom resources, keeping dashboards focused on actionable signals. Agentic container networking adds a web-based interface that translates natural-language queries into read-only diagnostics using live telemetry, shortening the path from "something's wrong" to "here's what to do about it."
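As one way to put GPU telemetry to work, teams already running managed Prometheus could alert on sustained GPU idleness. This sketch assumes the metrics follow the standard NVIDIA DCGM exporter naming (e.g. `DCGM_FI_DEV_GPU_UTIL`); the managed pipeline may expose them under different names or labels:

```yaml
# Hypothetical alerting rule: flag scrape targets whose GPUs sit mostly
# idle, assuming standard DCGM exporter metric names are being scraped.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: gpu-utilization-alerts
spec:
  groups:
    - name: gpu.rules
      rules:
        - alert: GPUUnderutilized
          expr: avg by (instance) (DCGM_FI_DEV_GPU_UTIL) < 10
          for: 30m
          labels:
            severity: info
          annotations:
            summary: "GPUs on {{ $labels.instance }} under 10% utilization for 30m"
```

For expensive accelerator capacity, underutilization alerts are often as valuable as saturation alerts: they surface scheduling and batching problems before the bill does.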
Simpler operations across clusters and workloads
For organizations running workloads across multiple clusters, cross-cluster networking has historically meant custom plumbing, inconsistent service discovery, and limited visibility across cluster boundaries. Azure Kubernetes Fleet Manager now addresses this with cross-cluster networking through a managed Cilium cluster mesh, providing unified connectivity across AKS clusters, a global service registry for cross-cluster service discovery, and intelligent routing with configuration managed centrally rather than repeated per cluster.
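In upstream Cilium cluster mesh, cross-cluster service discovery is typically expressed by marking a Service as global, so identically named Services in meshed clusters share one load-balanced endpoint set. The annotation below is the upstream Cilium convention; the managed Fleet Manager experience may configure this on your behalf:

```yaml
# Upstream Cilium cluster-mesh convention (illustrative): a Service
# annotated as global is discoverable and load-balanced across all
# clusters in the mesh that define a Service with the same name.
apiVersion: v1
kind: Service
metadata:
  name: checkout
  annotations:
    service.cilium.io/global: "true"
spec:
  selector:
    app: checkout
  ports:
    - port: 80
      targetPort: 8080
```

The appeal of the managed approach is that this kind of configuration, plus certificate distribution and routing policy, is handled centrally instead of hand-maintained in every cluster.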
On the storage side, clusters can now consume storage from a shared Elastic SAN pool rather than provisioning and managing individual disks per workload. This simplifies capacity planning for stateful workloads with variable demands and reduces provisioning overhead at scale.
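From the workload's point of view, pooled storage still surfaces through the ordinary StorageClass and PersistentVolumeClaim objects. The sketch below is illustrative only: the provisioner name and parameters are placeholders, not the actual AKS driver identifiers:

```yaml
# Illustrative only: a StorageClass backed by a shared Elastic SAN pool.
# Provisioner and parameter names are placeholders, not real identifiers.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: elastic-san-shared
provisioner: example.csi.placeholder/elastic-san   # placeholder
parameters:
  pool: shared-pool-01                             # placeholder
allowVolumeExpansion: true
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: orders-db-data
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: elastic-san-shared
  resources:
    requests:
      storage: 100Gi
```

Because claims draw from a shared pool, capacity planning shifts from per-workload disk sizing to managing one pool's headroom.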
For teams that need a more accessible entry point to Kubernetes itself, AKS desktop is now generally available, bringing a full AKS experience to developers' machines and making it straightforward to run, test, and iterate on Kubernetes workloads locally with the same configuration they'll use in production.
Safer upgrades and faster recovery
The cost of a bad upgrade compounds quickly in production, and recovery from one has historically been time-consuming and stressful. Several updates this cycle focus specifically on making cluster changes safer, more observable, and more reversible.
Blue-green agent pool upgrades create a parallel pool with the new configuration rather than applying changes in place, so teams can validate behavior before shifting traffic and keep a clear rollback path if something looks wrong. Agent pool rollback complements this by letting teams revert a node pool to its previous Kubernetes version and node image when problems surface after an upgrade, without a full rebuild. Together, these give operators meaningful control over the upgrade lifecycle rather than a choice between "upgrade and hope" or "stay behind." For faster provisioning during scale-out events, prepared image specification lets teams define custom node images with preloaded containers, operating system settings, and initialization scripts, reducing startup time and improving consistency for environments that need rapid, repeatable provisioning.
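One generic way to exercise a new pool before shifting real traffic is to pin a canary replica of the workload onto it with a node selector. This is a sketch of the general blue-green validation pattern, not the AKS feature's own mechanism; the pool label key and value, and the image, are placeholders:

```yaml
# Generic blue-green validation sketch: schedule one canary replica
# onto the newly created ("green") agent pool before cutting traffic
# over. Label key/value and image reference are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-canary
spec:
  replicas: 1
  selector:
    matchLabels: {app: web, track: canary}
  template:
    metadata:
      labels: {app: web, track: canary}
    spec:
      nodeSelector:
        agentpool: green-pool-v2   # placeholder: the new pool's label
      containers:
        - name: web
          image: registry.example.com/web:1.2.3   # placeholder
          ports:
            - containerPort: 8080
```

If the canary misbehaves on the new pool, nothing user-facing has moved yet and the rollback path is simply deleting the canary.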
Connect with the Microsoft Azure team in Amsterdam
The Azure team is excited to be at KubeCon + CloudNativeCon Europe 2026. A few highlights of where to connect with us on the ground:
Happy KubeCon + CloudNativeCon!
Brendan Burns
Corporate Vice President and Technical Fellow, Azure OSS and Cloud Native, Microsoft
Brendan Burns is a co-founder of the Kubernetes open source project and corporate vice president for Azure cloud-native open source and the Azure management system, including Azure Arc. He is also the author and co-author of several books on Kubernetes and distributed systems. Prior to Microsoft, he worked on Google's web search infrastructure and Google Cloud Platform. He has a PhD in Robotics from the University of Massachusetts Amherst and a BA in Computer Science and Studio Art from Williams College.