The Future Is Modular: What 10+ Years of Running Kubernetes Taught Platform Teams (and Why Bundles Are a Trap)


Platform engineering has a funny habit: it starts as a noble effort to “make developers faster,” and within a year it’s also responsible for compliance, cost controls, uptime, multi-cloud portability, and answering the eternal question: “Why is this cluster bill shaped like a hockey stick?”

That’s why the latest argument from Giant Swarm — that the future of Kubernetes-based platforms is modular — lands at exactly the right moment. In a February 13, 2026 post titled “The future is modular: what a decade of running Kubernetes taught us about platforms”, Giant Swarm’s Oliver Thylmann makes the case that bundled “all-in-one” platforms increasingly act as constraints rather than accelerators, especially for organizations that have already built opinions, skills, and investments across observability, CI/CD, and security.

This article expands on that thesis, adds industry context from CNCF and FinOps research, and (politely) interrogates the trade-offs. Because modularity can be liberation — or it can be the start of a new kind of chaos if you don’t design it with discipline.

Original source note: This piece is an independent analysis and expansion based on Giant Swarm’s RSS item and the original blog post by Oliver Thylmann. You should read the original for the cleanest statement of their position: Giant Swarm blog.

Why “platform” means something different in 2026 than it did in 2016

Ten years ago, “we run Kubernetes” was often shorthand for “we have bravely adopted a complicated open-source control plane and now require a small priesthood to operate it.” Today, Kubernetes is still complicated, but its role has shifted from novelty to default substrate.

Even CNCF’s public reporting over the last couple of years reflects this maturity. Kubernetes adoption is widespread, and yet fewer developers interact with it directly — a sign that abstraction layers (internal platforms, portals, managed services) are becoming the primary interface to infrastructure. Giant Swarm cites a CNCF/SlashData data point that only about 30% of backend developers say they use Kubernetes directly, down from an earlier peak. The implications aren’t subtle: the platform layer — not the cluster — is where developer experience is won or lost.

Meanwhile, the definition of “platform requirements” keeps expanding:

  • Security: policy enforcement, image provenance, runtime detection, auditability.
  • Observability: metrics/logs/traces, SLOs, multi-tenant access, incident workflows.
  • Cost governance: rightsizing, autoscaling, chargeback/showback, waste reduction.
  • Multi-cloud/hybrid: because regulatory and latency realities don’t care about your cloud strategy slide deck.
  • AI workloads: GPUs, model serving, telemetry across inference, and new cost units like tokens.

In that environment, the “one platform to rule them all” pitch starts to wobble — especially for mature orgs that already have functioning components and don’t want to rip-and-replace everything to adopt a vendor’s preferred bundle.

The “bundle trap”: paying for a platform, using a slice, and bolting on the rest

Thylmann names a dynamic many platform teams recognize instantly: the bundle trap. You buy (or build) a comprehensive platform, but in practice:

  • You only adopt the components that match your existing constraints and skills.
  • You run parallel tools for the parts that don’t fit.
  • You still carry integration and operational complexity — sometimes more, because now you have “the platform way” and “the reality way.”

The result: a platform that was meant to reduce cognitive load becomes an additional abstraction layer with its own friction.

This is also where cost becomes political. Platform licensing isn’t the same as cloud waste, but it rhymes. The FinOps community has been loudly focused on waste reduction for years, and more recent FinOps reporting shows that waste and optimization remain top concerns. Giant Swarm points to FinOps research and the broader waste-reduction priority trend as context for why paying for unused capabilities (shelf-ware) is increasingly intolerable.

Why bundles worked (and sometimes still do)

To be fair to bundled platforms: they were a rational response to the CNCF “landscape problem.” When the ecosystem was younger and less stable, buying a curated package could be a shortcut through evaluation paralysis. In regulated industries, a bundle can also simplify procurement and compliance by reducing the number of vendors and integration points.

And for smaller platform teams, the bundle can be a survival mechanism: fewer moving parts, fewer decisions, fewer integration chores.

But the trade-off is lockstep adoption. Bundles work best when your organization is willing to adopt the vendor’s worldview end-to-end. As soon as you have strong existing investments, bundles can become the wrong kind of “standardization.”

What modularity actually means (and what it should not mean)

“Modular platform” can mean several different things, ranging from “carefully designed capabilities that compose cleanly” to “an app store full of YAML, good luck.”

In the Giant Swarm post, the core argument is that modularity should let teams start with what they need, keep what already works, and add capabilities when they have a real reason. The modular model is also presented as a way to make costs more visible — you should know what you’re paying for and why.

But Thylmann also acknowledges the hard part: modularity has trade-offs. You must make more decisions up front. You must think about integration. And you can absolutely create a patchwork platform nobody fully understands.

The difference between “modular” and “fragmented”

Modularity is only a win if the modules are designed around a few principles:

  • Clear interfaces: each capability has well-defined APIs and lifecycle hooks.
  • Compatibility policy: versioning rules and tested combinations are documented.
  • Opinionated defaults: enough standardization that teams don’t reinvent basics.
  • Composable governance: policies can apply fleet-wide without breaking team autonomy.

Without those, “modular” becomes “an expensive scavenger hunt through the CNCF landscape, with incident response as your integration test suite.”

Why Kubernetes pushed platforms toward modularity in the first place

Kubernetes is simultaneously a unifier and an amplifier. It unifies because the Kubernetes API (and related ecosystems like Cluster API) offers a common control model. But it amplifies because once you standardize on Kubernetes, you now have an explosion of optional add-ons: networking, policy engines, identity, secrets, ingress, service mesh, telemetry, autoscaling, node provisioning, GitOps controllers, and so on.

At some point, every platform team ends up building a platform out of platforms.

Giant Swarm’s own documentation describes a layered architecture and a Kubernetes-centric “platform API” approach, using a management cluster as the central place to orchestrate workload clusters and platform capabilities. The platform is built on top of Cluster API for lifecycle management and uses Flux for GitOps-style reconciliation.

This is important because it hints at a practical path to modularity: treat each capability (observability, security tooling, app management, developer portal) as something that can be installed, configured, and evolved declaratively — but under a consistent control plane and workflow.

Modular platforms and the rise of the internal developer portal (Backstage is the usual suspect)

One reason modularity is becoming more viable is that we finally have a credible “front door” for platforms: the developer portal. Rather than forcing developers to learn every underlying subsystem, the portal can present curated workflows, templates, documentation, and service ownership metadata.

Backstage is the most prominent open-source example. CNCF describes Backstage as an open framework for building developer portals, and notes its CNCF trajectory (accepted in 2020, incubating since 2022).

From a modularity perspective, portals matter because they decouple experience composition from infrastructure composition. Your underlying modules can change (new policy engine, different telemetry backend, new cluster provisioning mechanism) without constantly breaking the developer-facing workflow — as long as the portal’s contracts remain stable.

Practical example: a “create service” flow that doesn’t care what you run underneath

Consider a typical golden path:

  • Create a new backend service (repository, CI pipeline, base manifests, ownership metadata).
  • Deploy to dev and staging via GitOps.
  • Enforce security baselines (Pod Security Standards, network policies, image scanning).
  • Expose dashboards (latency, errors, resource usage, cost allocation tags).
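
This decoupling can be sketched as a small data model: the workflow steps stay fixed while the backend behind each step is swappable. All the tool names below are illustrative placeholders, not a statement of what Giant Swarm or any portal actually wires up:

```python
from dataclasses import dataclass, field

@dataclass
class GoldenPathStep:
    """One step in the 'create service' workflow; the backend is pluggable."""
    name: str
    backend: str  # which module currently fulfils this step

@dataclass
class GoldenPath:
    steps: list[GoldenPathStep] = field(default_factory=list)

    def swap_backend(self, step_name: str, new_backend: str) -> None:
        """Replace the tool behind a step without changing the workflow itself."""
        for step in self.steps:
            if step.name == step_name:
                step.backend = new_backend
                return
        raise KeyError(step_name)

create_service = GoldenPath(steps=[
    GoldenPathStep("scaffold-repo", backend="github"),
    GoldenPathStep("ci-pipeline", backend="tekton"),
    GoldenPathStep("deploy-dev", backend="flux"),
    GoldenPathStep("security-baseline", backend="kyverno"),
    GoldenPathStep("dashboards", backend="grafana"),
])

# The developer-facing workflow is unchanged if the platform team
# swaps the policy engine underneath.
create_service.swap_backend("security-baseline", "opa-gatekeeper")
```

The point of the sketch is the contract: developers see step names, not backends.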

In a bundled platform, those steps often assume specific tool choices. In a modular platform, the portal can provide the workflow while allowing the platform team to swap components — within a compatibility framework — as long as the workflow remains consistent.

GitOps as the glue for modularity (and why Flux keeps showing up)

If you want modules that can be independently installed and upgraded without turning your platform into a pet project, you need repeatable change management. GitOps is the obvious candidate because it gives you:

  • A single source of truth (Git).
  • Auditability (pull requests, code review, history).
  • Reconciliation (drift detection and correction).
  • Repeatability across fleets and environments.
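
The reconciliation idea above can be sketched in a few lines, treating the desired state (read from Git) and the actual state (read from the cluster) as plain mappings. This is a toy model of the loop, not Flux’s implementation:

```python
def reconcile(desired: dict, actual: dict) -> list[str]:
    """Return the corrective actions needed to converge actual onto desired."""
    actions = []
    for name, spec in desired.items():
        if name not in actual:
            actions.append(f"create {name}")
        elif actual[name] != spec:
            actions.append(f"update {name}")  # drift: overwrite with Git's version
    for name in actual:
        if name not in desired:
            actions.append(f"delete {name}")  # pruning: Git is the source of truth
    return sorted(actions)

# Drift in "b", an untracked "c", and a missing "a" all become explicit actions.
plan = reconcile({"a": 1, "b": 2}, {"b": 3, "c": 4})
```

Everything a GitOps controller does beyond this (retries, health checks, dependency ordering) is refinement of the same converge-on-declared-state loop.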

Giant Swarm’s documentation and tutorials emphasize Flux-based GitOps for managing clusters and applications, including recommended repository structures that map to management clusters, organizations, and workload clusters.

Flux itself has been evolving toward better operational tooling, including efforts to connect AI assistants to GitOps context through a dedicated MCP server. Whether you want AI in the loop or not, the underlying point is that GitOps ecosystems are increasingly building richer “operator experience” tooling.

A warning: GitOps doesn’t magically eliminate integration work

GitOps makes changes reproducible. It does not make them automatically safe. A modular platform still needs:

  • Staging environments that mirror production enough to matter.
  • Compatibility matrices and upgrade playbooks.
  • Policy testing (e.g., “will this Kyverno change block deploys?”).
  • Rollback strategy and “blast radius” thinking.
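
Policy testing in particular benefits from a dry-run stage: evaluate the proposed rule against current workloads before enforcing it, so you learn what would break from a report rather than from blocked deploys. A minimal sketch, with a hypothetical “require resource limits” rule standing in for a real policy engine:

```python
def dry_run_policy(policy, workloads) -> list[str]:
    """Report which existing workloads a proposed policy would reject,
    before the policy is switched from audit to enforce."""
    return [w["name"] for w in workloads if not policy(w)]

# Hypothetical baseline rule: every workload must declare resource limits.
def require_limits(workload: dict) -> bool:
    return workload.get("limits") is not None

workloads = [
    {"name": "checkout", "limits": {"cpu": "500m"}},
    {"name": "legacy-batch", "limits": None},  # would be blocked on enforce
]

blocked = dry_run_policy(require_limits, workloads)
```

Real engines such as Kyverno expose audit modes for exactly this reason; the sketch just makes the blast-radius question explicit.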

Modularity shifts the challenge from “how do we install all of this?” to “how do we continuously evolve this without breaking teams every Tuesday?” Which is progress — but it’s still work.

Cluster fleets make bundles painful: one size doesn’t fit dozens (or hundreds) of clusters

As fleets grow, the cost of rigid bundling rises. Giant Swarm’s platform messaging references operating at scale across many production clusters, and their documentation talks about fleet management practices aligned with organizational structure.

Fleet reality introduces a few modularity drivers:

  • Different risk profiles: prod vs. dev vs. edge clusters won’t share the same requirements.
  • Different regulatory constraints: data residency and access control can vary per business unit or region.
  • Different performance economics: AI/GPU clusters vs. general compute clusters behave like different planets.
  • Different lifecycle needs: some clusters are long-lived, others ephemeral for testing.

Bundles generally assume uniformity. Fleets punish uniformity.

Security as a modular capability: installable stacks, consistent baselines

Security is where modular platforms get tested hardest, because the cost of inconsistency is not “a developer got annoyed,” it’s “you shipped a compliance violation.”

Giant Swarm’s security documentation describes a secure-by-default posture and a set of integrated open-source tools and approaches: policy enforcement, RBAC, network policies, Pod Security Standards, and optional components such as Trivy/Trivy Operator, Kyverno, Falco, and Harbor. They also explicitly frame this as a stack with independently installable components to match different security requirements.

This is a good example of modularity done responsibly: you can tailor the stack, but you don’t reinvent the fundamentals every time. There’s a baseline, and then there are selectable layers.

Modular security still needs centralized policy ownership

One lesson platform teams learn the hard way: if nobody owns global policy, “team autonomy” becomes “team lottery.” Modular platforms should still support:

  • Fleet-wide minimum standards (PSS levels, network isolation, image scanning requirements).
  • Exception workflows (time-boxed policy bypass with approval and auditing).
  • Evidence generation for audits (what was enforced, where, when).
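
A time-boxed exception can be modeled simply: the bypass is a record with an approver and an expiry, and the check is a date comparison. All field names here are illustrative, not any particular tool’s schema:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

@dataclass
class PolicyException:
    """An approved, time-boxed bypass of one fleet-wide policy (illustrative model)."""
    policy: str
    team: str
    approved_by: str
    expires: datetime

def is_active(exc: PolicyException, now: datetime) -> bool:
    # The bypass only counts while unexpired; after that, re-approval is required.
    return now < exc.expires

now = datetime(2026, 2, 13, tzinfo=timezone.utc)
exc = PolicyException(
    policy="require-image-scan",
    team="payments",
    approved_by="security-lead",
    expires=now + timedelta(days=14),  # time-boxed, never open-ended
)
```

The record itself doubles as audit evidence: who approved what, for whom, until when.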

Modularity can increase flexibility, but it should not dilute accountability.

Observability and the modular platform: standard signals, flexible backends

Observability is another domain where organizations have strong opinions. Some teams are standardized on the Grafana stack; others are deep into OpenTelemetry pipelines; others run vendor platforms for logs, metrics, and traces. A bundle that insists you use “observability the vendor way” is likely to be rejected by the teams who already have a mature setup.

The modular approach suggests a compromise: define standard signals (metrics naming conventions, tracing propagation, log schemas, SLO definitions) and allow backends to vary — with an integration contract that keeps the developer experience stable.

In other words: don’t force everyone onto the same dashboards; force everyone to emit the same signals.
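
One way to enforce that contract is to lint emitted signals in CI, independently of whichever backend stores them. A minimal sketch, assuming a hypothetical metric naming convention and required label set:

```python
import re

# Hypothetical contract: these labels must exist on every signal, and metric
# names must be snake_case with a recognized unit/counter suffix.
REQUIRED_LABELS = {"service", "team", "env"}
METRIC_NAME = re.compile(r"^[a-z][a-z0-9_]*_(total|seconds|bytes)$")

def check_signal(name: str, labels: dict) -> list[str]:
    """Validate a signal against the contract, not against a backend."""
    problems = []
    if not METRIC_NAME.match(name):
        problems.append(f"bad metric name: {name}")
    missing = REQUIRED_LABELS - labels.keys()
    if missing:
        problems.append(f"missing labels: {sorted(missing)}")
    return problems
```

Teams can then route conforming signals to Grafana, an OpenTelemetry pipeline, or a vendor backend without renegotiating the contract.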

Cost visibility is driving modularity more than most people admit

There is a gentle fiction in platform engineering that platforms are purchased because of “developer productivity.” That’s true, but it’s also incomplete. Platforms are purchased because of risk reduction and cost governance.

FinOps research and commentary across the industry consistently highlight waste reduction and optimization as top priorities, and the scope of FinOps is expanding beyond pure public cloud into SaaS, licensing, data centers, and AI. This means platform decisions increasingly get evaluated through a “total technology spend” lens rather than a narrow infrastructure lens.

That context strengthens the modular argument: if cost discipline is a first-class requirement, paying for unused platform capabilities becomes harder to justify, and swapping components to match economic reality becomes more attractive.

Modular platform ROI is easier to prove incrementally

One practical win of modularity is sequencing:

  • Start with Kubernetes fleet management and secure defaults.
  • Add autoscaling/capacity optimization when spend becomes painful.
  • Add portal workflows when onboarding and discoverability are slowing teams.
  • Add AI infrastructure modules when experimentation moves to production.

Each step can be justified with a measurable bottleneck. That’s a far easier internal conversation than “we need a giant bundle because the future is complex.” The future is always complex. Finance prefers receipts.

AI changed the platform conversation: why “add it later” is suddenly the sane option

AI infrastructure is a perfect stress test for modular platforms, because AI adoption is uneven. Some organizations are all-in on internal model serving and GPU fleets; others are using managed APIs and have minimal infrastructure requirements.

The CNCF launched a Certified Kubernetes AI Conformance Program to standardize expectations for AI workloads on Kubernetes, explicitly aiming to reduce fragmentation and improve interoperability. The announcement frames this as a community-led effort to define minimum capabilities required for reliable AI on Kubernetes.

Giant Swarm’s post mentions that its platform is among the first certified through this program (and ties that to the broader modular capability story).

Even if you ignore certifications, the operational reality is this: GPUs are expensive, AI workloads are spiky, and the “unit economics” are still evolving (tokens, inference, training runs, storage, data egress). In that environment, bundling AI infrastructure into every platform contract is like bundling a snowplow into every car purchase. Some people need it. Others live in Florida.

Industry perspective: “choice and modularity” is becoming an expectation

Thylmann references a comment attributed to Benjamin Brial (Cycloid) about businesses expecting internal developer platforms to offer choice and modularity. Whether or not you take that as a universal truth, the broader industry discourse supports the trend: IDPs and platform tooling increasingly position themselves as curation layers rather than monoliths.

Brial has also discussed IDPs publicly in other venues, emphasizing that they address the challenges of scaling DevOps, managing hybrid complexity, and reducing cloud waste, and describing common IDP components such as service catalogs and orchestration layers.

The bigger point: modularity is not only a technical preference. It’s becoming a procurement and organizational preference, because it aligns with how enterprises change: slowly, unevenly, and with lots of legacy constraints.

Case studies and comparisons: DIY vs bundled vs modular-curated

Most organizations end up choosing one of three models, even if they don’t label them this way.

1) DIY platform engineering (assemble your own stack)

Pros: maximum control, best fit for unique constraints, no vendor coupling.

Cons: integration tax, upgrade burden, staffing burden, operational risk. You become the vendor. Congratulations, you now support your own product — and your customers are very loud engineers.

DIY can work brilliantly when you have strong platform talent and stable requirements. It becomes painful when your platform team is small, the fleet is large, and compliance requirements evolve faster than your backlog.

2) Bundled platform (one vendor stack)

Pros: faster initial time-to-value, fewer integration decisions, simpler procurement.

Cons: bundle trap risk, paying for unused capabilities, parallel tool sprawl, reduced flexibility when requirements change.

Bundles often shine for greenfield teams or organizations that are willing to standardize aggressively. They struggle in enterprises with existing tooling and strong opinions.

3) Modular-curated platform (capabilities you can add/remove, with tested integration)

Pros: incremental adoption, reduced shelf-ware, flexibility to keep existing investments, easier to prove ROI per capability.

Cons: more decisions, governance complexity, risk of fragmentation if contracts and compatibility aren’t managed.

This is the space Giant Swarm is arguing for: modularity with curation and integration, not modularity as “here is a catalog of CNCF projects.”

How to adopt modularity without building a patchwork platform

If you’re a platform lead reading this and thinking, “Modular sounds good, but I’ve seen ‘plug and play’ become ‘plug and pray’,” you’re not wrong. Here are practical guardrails that make modularity survivable.

Define your platform contracts first

Before you pick modules, define what must stay stable:

  • How teams request environments and resources.
  • How deployments happen (e.g., GitOps workflow, CI triggers).
  • How identity and authorization are mapped to org structures.
  • How telemetry is accessed and what minimum signals exist.
  • How policy exceptions are handled.

Contracts are the difference between “modular” and “constantly re-platforming.”

Run a compatibility program (yes, like a mini CNCF conformance)

Even if you never publish a logo, you need internal conformance:

  • Supported versions of Kubernetes and core addons.
  • Supported combinations of modules.
  • Upgrade sequencing and rollback procedures.
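
A toy version of such an internal conformance gate, with entirely hypothetical module versions, is just a set-membership check over tested combinations:

```python
# Hypothetical compatibility matrix: which module versions the platform team
# has actually tested together, per Kubernetes minor version. In practice this
# set would be generated by CI runs, not maintained by hand.
TESTED = {
    ("1.31", "kyverno-1.12", "flux-2.3"),
    ("1.31", "kyverno-1.13", "flux-2.3"),
    ("1.32", "kyverno-1.13", "flux-2.4"),
}

def is_supported(k8s: str, policy: str, gitops: str) -> bool:
    """Only allow upgrades into combinations the platform team has tested."""
    return (k8s, policy, gitops) in TESTED
```

The gate stays this simple even when the matrix behind it grows; what matters is that untested combinations are refused by default.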

CNCF’s conformance programs exist for a reason: interoperability is expensive without shared tests and standards.

Invest in “module lifecycle” automation, not just day-one installation

Most pain happens after go-live: upgrades, CVEs, new org requirements, expansions, migrations. Modular platforms should be judged on their lifecycle ergonomics, not their demo day.

GitOps helps, but you also need:

  • Automated policy testing.
  • Progressive delivery patterns for platform changes.
  • Fleet-wide drift reporting.
  • Clear deprecation pathways.

So is the future modular?

Yes — but not because modularity is fashionable. Because the enterprise reality is modular already:

  • Your org is modular (business units, regions, compliance regimes).
  • Your infrastructure is modular (hybrid, multi-cloud, edge).
  • Your workload types are modular (web, data, AI, batch, streaming).
  • Your developer experience needs are modular (golden paths differ by team maturity).

A monolithic platform bundle can still work in certain contexts, but the default trend line is toward pick-and-choose capabilities, integrated through consistent workflows and APIs — especially as AI infrastructure becomes a new, unevenly adopted requirement.

Giant Swarm’s argument is ultimately a pragmatic one: platforms should adapt to how organizations actually evolve, not force organizations to adapt to a vendor’s reference architecture. That’s the kind of position that sounds obvious — right up until you try to operationalize it across a fleet, a compliance regime, and a budget meeting.

In other words, modularity is the future… provided you do the unglamorous work that makes modularity coherent.


Bas Dorland, Technology Journalist & Founder of dorland.org