AWS Weekly Roundup (Jan 26, 2026): EC2 G7e with NVIDIA Blackwell, Corretto security updates, and the quiet platform tweaks that matter

AWS has a particular talent for launching something that sounds like a minor alphabet soup update (“G7e is now generally available”) and then watching half the internet quietly reorder its infrastructure roadmap.

In the AWS Weekly Roundup: Amazon EC2 G7e instances with NVIDIA Blackwell GPUs (January 26, 2026), AWS recaps a cluster of changes that—taken together—tell a pretty clear story: 2026 is going to be about shipping generative AI inference at scale while also tightening the bolts on the software supply chain and the day‑to‑day operations that keep production systems from melting down.

The roundup post is signed by Micah. I’ll use it as the foundation, but we’ll go deeper: what the new EC2 G7e instances actually offer, why Corretto’s quarterly security updates should be on every Java team’s calendar, and why a feature like cross-repository layer sharing in ECR can save you more money (and CI time) than your latest “let’s optimize Dockerfiles” workshop.

What’s in this AWS Weekly Roundup (January 26, 2026)

The roundup highlights:

  • Amazon EC2 G7e instances are generally available, powered by NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs, positioned for generative AI inference, spatial computing, and scientific workloads. Available initially in US East (N. Virginia) and US East (Ohio). (Roundup source)
  • Amazon Corretto January 2026 quarterly updates for LTS OpenJDK builds: Corretto 25.0.2, 21.0.10, 17.0.18, 11.0.30, and 8u482. (What’s New source)
  • Amazon ECR cross-repository layer sharing via blob mounting to speed pushes and reduce duplicated storage. (What’s New source)
  • CloudWatch Database Insights on-demand analysis expands to Asia Pacific (New Zealand), Asia Pacific (Taipei), Asia Pacific (Thailand), and Mexico (Central). (What’s New source)
  • Amazon Connect Step-by-Step Guides gains conditional logic and real-time updates (auto refresh). (What’s New source)
  • Upcoming event: Best of AWS re:Invent virtual event on January 28–29, 2026. (Event page)

If you only skimmed the roundup, the headline is “new GPU instances.” If you read it like a platform engineer, it’s “AWS is sanding down friction everywhere: inference hardware, Java patching, container pushes, database triage, and contact center workflows.” That’s a strategy, not a coincidence.

EC2 G7e: Blackwell arrives for inference (and it’s not just about AI)

Let’s start with the one that will get the most Slack reactions: Amazon EC2 G7e instances are now generally available. AWS positions G7e for generative AI inference, spatial computing, and scientific computing. The roundup notes an “up to 2.3x better inference performance compared to G6e,” along with “two times the GPU memory” and support for up to 8 GPUs totaling 768 GB of GPU memory. (Roundup)

On the product page, AWS spells out the hardware profile that matters to buyers who have already been burned by “GPU instance” marketing:

  • Up to 8 NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs
  • Up to 768 GB total GPU memory (96 GB per GPU) using GDDR7
  • Up to 192 vCPUs and up to 2 TiB system memory
  • Up to 1,600 Gbps networking with EFA and cluster placement groups
  • Up to 15.2 TB local NVMe SSD
  • 5th Gen Intel Xeon Scalable (Emerald Rapids) processors

(G7e instance page)

There are two big themes here: memory and networking. If you’ve spent any time running inference at scale, you know why. Modern model serving isn’t limited only by raw FLOPS; it’s limited by what you can keep in memory, how efficiently you can batch and stream, and how fast you can fetch weights, embeddings, and context. And if you’re doing multi-GPU serving (or GPU‑accelerated rendering / spatial workloads), your “GPU instance” is only as good as its ability to move data around without turning your GPUs into very expensive idle heaters.

What “up to 2.3x better inference performance” usually means in practice

AWS claims G7e offers up to 2.3x the inference performance of G6e. (G7e page) Marketing numbers are always “up to,” so here’s the practical translation (a small measurement sketch follows the list):

  • If your inference pipeline is GPU‑bound (compute and memory bandwidth), you may see dramatic gains—especially when you can use newer precisions and optimize kernels.
  • If your inference pipeline is CPU‑bound (tokenization, pre/post processing, Python overhead) or I/O bound (slow weight loading, remote storage bottlenecks), you might see less improvement until you fix the rest of the stack.
  • For teams currently squeezing models into smaller GPU memory footprints, the 96 GB per GPU is often as important as compute. Bigger memory lets you reduce sharding, reduce offloading, and simplify deployment.
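
The honest way to find out which bucket you’re in is to measure the pipeline you have today, before migrating anything. Here’s a minimal sketch that records p50/p95 latency and rough token throughput against a running endpoint; the endpoint URL, payload shape, and the "generated_tokens" response field are hypothetical placeholders for whatever your serving stack actually exposes. If these numbers barely move when you change hardware, your bottleneck is elsewhere.

```python
# Minimal sketch: end-to-end latency and rough token throughput for an
# existing inference endpoint. URL, payload, and the "generated_tokens"
# response field are hypothetical -- adapt them to your serving stack.
import statistics
import time

import requests

ENDPOINT = "http://your-inference-host:8000/generate"  # hypothetical
PAYLOAD = {"prompt": "Summarize our refund policy in two sentences.", "max_tokens": 128}

latencies, token_counts = [], []
for _ in range(50):
    start = time.perf_counter()
    resp = requests.post(ENDPOINT, json=PAYLOAD, timeout=60)
    latencies.append(time.perf_counter() - start)
    token_counts.append(resp.json().get("generated_tokens", 0))

p50 = statistics.median(latencies)
p95 = statistics.quantiles(latencies, n=20)[18]  # 95th percentile
print(f"p50={p50:.3f}s  p95={p95:.3f}s  "
      f"~{sum(token_counts) / sum(latencies):.1f} tokens/s")
```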

In other words: for many organizations, the win is not only that G7e is faster—it’s that it can be simpler. And simplicity is performance’s underappreciated cousin. (It also gets invited to fewer incident postmortems.)

GPU memory and why AWS keeps highlighting “70B parameters”

The roundup says G7e can run “medium-sized models of up to 70B parameters with FP8 precision on a single GPU.” (Roundup) This is a very specific kind of reassurance aimed at a very specific buyer:

  • Teams that want to serve a reasonably capable LLM without building a multi-node serving architecture.
  • Teams that want lower operational complexity (and lower tail latency) by avoiding excessive cross-GPU and cross-node communication.

It’s also a hint: AWS expects many customers to run inference in mixed precision (FP8 and beyond), and to rely on newer GPU features that support those precisions. The G7e product page explicitly notes 5th-generation NVIDIA Tensor Cores with FP4 support (where applicable to your frameworks and accuracy targets). (G7e page)
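
The arithmetic behind the 70B claim is worth doing once, because it shows why 96 GB per GPU is the number that matters. A back-of-the-envelope sketch (weights only; the KV cache, activations, and runtime overhead come on top, so treat these as lower bounds):

```python
# Back-of-the-envelope sketch: approximate weight memory for a 70B-parameter
# model at different precisions. Weights only -- KV cache, activations, and
# framework overhead are extra, so these are lower bounds.
PARAMS = 70e9
BYTES_PER_PARAM = {"FP16/BF16": 2.0, "FP8": 1.0, "FP4": 0.5}

for precision, nbytes in BYTES_PER_PARAM.items():
    gb = PARAMS * nbytes / 1e9
    verdict = "fits on one 96 GB GPU" if gb <= 96 else "needs more than one 96 GB GPU"
    print(f"{precision:>10}: ~{gb:.0f} GB of weights ({verdict})")
```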

The practical point: if you’re still serving in FP16 because “that’s what we did last year,” you’re leaving capacity (and money) on the table. But the other practical point: precision choices are application-specific, and your QA team should be involved before you roll out an “FP4 for everyone” policy like it’s a new dress code.

The spatial computing angle: ray tracing cores and video engines

AWS also positions G7e for spatial computing. That’s not just buzz. The G7e GPU specs include ray tracing cores and multiple NVENC/NVDEC engines with advanced encoding/decoding support. (G7e page) Those details matter when you’re doing:

  • 3D rendering pipelines (think design reviews, digital twins, simulation visualization)
  • Real-time streaming of high-resolution frames (remote visualization, virtual production workflows)
  • Robotics simulation and perception stacks where you want GPU acceleration beyond matrix math

AWS even includes customer commentary from Agility Robotics on the G7e page, framing the instance as useful for training higher-performance controls for humanoid robots. (G7e page) Whether you’re building robots or just trying to keep your inference latency consistent, the point is the same: this is a “graphics + AI” profile, not an AI-only play.

Networking: 1,600 Gbps with EFA is a signal about cluster-scale inference

The G7e family tops out at 1,600 Gbps of networking bandwidth with EFA, which AWS calls out as four times the networking bandwidth of G6e. (G7e page)

You don’t ship that kind of networking for casual single-instance deployments. That’s AWS saying: “Yes, we expect you to run these in clusters.” For big inference fleets, that can mean:

  • Model parallel inference and high-throughput batching across multiple GPUs
  • Fast distribution of updated model weights and caching layers
  • High-performance shared file systems and accelerated data loading

AWS also highlights GPUDirect Storage support with FSx for Lustre in its announcement blog post, specifically calling out higher storage throughput to the instances compared to G6e. (G7e announcement) If your inference workflow involves frequent model reloads (multi-tenant serving, A/B experiments, rolling updates), faster loading becomes a competitive advantage—because “time to deploy an improved model” turns into a feature.
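
A quick way to feel why load speed matters: divide your weight footprint by the read throughput you can actually sustain. The throughput tiers below are illustrative placeholders, not measured FSx for Lustre or GPUDirect Storage numbers.

```python
# Rough sketch: full model reload time at a given sustained read throughput.
# WEIGHTS_GB and the throughput tiers are illustrative placeholders.
WEIGHTS_GB = 70  # e.g. a 70B-parameter model served at FP8

for label, gb_per_s in [("modest throughput", 2),
                        ("tuned parallel reads", 10),
                        ("GPUDirect-class path", 40)]:
    print(f"{label:>22}: {gb_per_s} GB/s -> ~{WEIGHTS_GB / gb_per_s:.0f}s per reload")
```

If a reload happens per tenant swap or per A/B experiment, the gap between minutes and seconds compounds quickly.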

Which G7e sizes exist, and how to think about them

AWS lists multiple G7e sizes, from g7e.2xlarge (1 GPU) up to g7e.48xlarge (8 GPUs). (G7e page)

In practice:

  • 1-GPU sizes are often best for low-to-medium traffic inference endpoints, prototyping, or dedicated workloads where you want predictable costs.
  • 2–4 GPU sizes can be the sweet spot for production inference when you need scale but don’t want to manage a larger cluster.
  • The 8-GPU size is where you start thinking in terms of “mini-inference factory”: heavy throughput, multiple concurrent models, or very large context windows.

And yes: you will still want autoscaling and load testing. Bigger instances don’t automatically fix queueing, spiky traffic, or the “someone deployed a debug build” problem. But they do shift your capacity curve in a very welcome direction.

Corretto January 2026 quarterly updates: the Java patch train is still real

Every few months, a quiet truth of enterprise software reasserts itself: Java is still everywhere, and patching still matters. AWS announced Amazon Corretto January 2026 Quarterly Updates on January 20, 2026, delivering security and critical updates for its Long-Term Supported OpenJDK distributions: Corretto 25.0.2, 21.0.10, 17.0.18, 11.0.30, and 8u482. (Corretto What’s New)

Corretto is AWS’s “no-cost, multi-platform, production-ready distribution of OpenJDK.” (Corretto What’s New)

Why this matters even if you’re not “a Java shop”

Because you might be a Java shop accidentally.

Java shows up in places teams forget about until it’s 2 AM and someone asks why the build agent is crying:

  • CI tooling and plugins
  • Legacy services that “just work” and never get touched
  • Data processing pipelines and connectors
  • Enterprise middleware (still a thing, despite everyone’s best efforts)

Quarterly updates are where security and critical fixes typically land, and they’re a key part of keeping your dependency chain healthy. In 2026, with supply-chain attacks firmly in the “this happens to normal companies too” category, treating runtime patching as optional is a bold strategy. Bold like deploying on Friday at 4:55 PM.

Operational advice: treat Corretto updates as a routine, not a fire drill

Corretto updates are available for download and also via package repositories (Apt, Yum, Apk) according to AWS. (Corretto What’s New)

If you run Java workloads on AWS (or anywhere), a sane approach looks like this:

  • Maintain a tested “runtime bump” pipeline (a staging environment that runs representative traffic and checks for regressions).
  • Pin major versions (8 vs 11 vs 17 vs 21, etc.) but allow automated minor/patch upgrades after testing.
  • Track what’s running where (asset inventory for runtimes matters as much as code dependencies).

The more your runtime upgrades are boring, the less you’ll spend on “emergency meetings” and the more you’ll spend on shipping features. Your CFO will not complain about this trade.
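
One concrete way to keep those upgrades boring is a guardrail in CI that fails fast when a build image or host is still on an old patch level. A minimal sketch, assuming a recent LTS line; the expected version is just an example taken from this quarter’s update, and Java 8’s “1.8.0_xxx” version strings would need slightly different parsing.

```python
# Minimal sketch: fail a CI job if the JVM on the build or runtime image is
# older than the patch level expected after a quarterly Corretto update.
# EXPECTED is an example (Corretto 21.0.10 from the January 2026 update).
import re
import subprocess

EXPECTED = (21, 0, 10)

# `java -version` writes its banner to stderr, e.g. openjdk version "21.0.10" ...
banner = subprocess.run(
    ["java", "-version"], capture_output=True, text=True, check=True
).stderr

match = re.search(r'version "(\d+)\.(\d+)\.(\d+)', banner)
if not match:
    raise SystemExit(f"Could not parse a JVM version from: {banner!r}")

installed = tuple(int(part) for part in match.groups())
if installed < EXPECTED:
    raise SystemExit(f"JVM {installed} is older than expected {EXPECTED}; patch the runtime.")
print(f"JVM {installed} meets or exceeds {EXPECTED}.")
```

Run it in the same pipeline that bumps the runtime, and “are we patched?” stops being a spreadsheet exercise.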

Amazon ECR blob mounting: small feature, big CI/CD payoff

AWS also announced that Amazon ECR now supports cross-repository layer sharing through blob mounting (posted January 20, 2026). (ECR What’s New)

In plain language: if you have multiple repositories that share common image layers (base images, shared dependencies, identical build stages), ECR can reuse those layers across repositories within the same registry. That can mean:

  • Faster image pushes (less re-uploading identical blobs)
  • Reduced storage duplication (store common layers once)

AWS says enabling it is a registry-level setting (console or CLI), and it’s available in all AWS commercial and GovCloud (US) Regions. (ECR What’s New)

Why developers should care: this is a microservices tax cut

In a microservices environment, it’s common to have:

  • 20+ services
  • a few standardized base images
  • repeated dependency layers (language runtimes, OS packages, security agents)

Without layer sharing, each service’s repository can end up storing the same layers repeatedly. That’s not just a cost issue—it’s time. CI jobs push images constantly, and every minute spent uploading redundant layers is a minute not spent testing, scanning, or deploying.

Important constraints (read these before you celebrate)

The documentation clarifies several constraints. For example, blob mounting works within the same registry (same account and region), and repositories must have identical encryption configurations. (ECR docs)

That means if you were hoping for a magical cross-account global dedup layer cache, this isn’t that. But within an account/region—where a large percentage of teams run their core CI/CD—it can still be a meaningful improvement.
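
If you want to sanity-check the encryption constraint across your own repositories before relying on layer sharing, the existing DescribeRepositories API already exposes each repository’s encryption configuration. A minimal sketch (the repository names are placeholders):

```python
# Minimal sketch: verify that a set of ECR repositories share the same
# encryption configuration, one of the documented constraints for
# cross-repository layer sharing. Repository names are placeholders.
import boto3

ecr = boto3.client("ecr")
repos = ["service-a", "service-b", "shared-base-images"]  # placeholders

configs = {}
for name in repos:
    desc = ecr.describe_repositories(repositoryNames=[name])["repositories"][0]
    # encryptionConfiguration looks like {"encryptionType": "AES256"} or
    # {"encryptionType": "KMS", "kmsKey": "..."}.
    configs[name] = desc.get("encryptionConfiguration", {})

unique = {tuple(sorted(cfg.items())) for cfg in configs.values()}
if len(unique) > 1:
    print("Encryption configs differ -- layer sharing won't apply across these repos:")
    for name, cfg in configs.items():
        print(f"  {name}: {cfg}")
else:
    print("All repositories share the same encryption configuration.")
```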

A quick example scenario

Consider a platform team running 40 services, all based on the same hardened base image and the same language runtime layer. When base images update weekly (security patches, CA bundle updates, etc.), the push storm can be loud.

Blob mounting doesn’t eliminate the need to rebuild images (you still need fresh layers when things change), but it can significantly reduce redundant storage and improve push performance when layers already exist elsewhere in the registry.
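
To put rough numbers on the storage side (every figure below is illustrative, not a measurement):

```python
# Back-of-the-envelope sketch of the dedup effect from sharing common layers
# across repositories. Layer sizes, service count, and retention are
# illustrative placeholders.
SERVICES = 40
SHARED_LAYERS_GB = 1.2     # hardened base image + language runtime layers
UNIQUE_LAYERS_GB = 0.3     # app code and dependencies per service
IMAGE_VERSIONS_KEPT = 10   # tags retained per repository

without_sharing = SERVICES * IMAGE_VERSIONS_KEPT * (SHARED_LAYERS_GB + UNIQUE_LAYERS_GB)
with_sharing = (IMAGE_VERSIONS_KEPT * SHARED_LAYERS_GB
                + SERVICES * IMAGE_VERSIONS_KEPT * UNIQUE_LAYERS_GB)

print(f"Without layer sharing: ~{without_sharing:.0f} GB stored")
print(f"With layer sharing:    ~{with_sharing:.0f} GB stored")
```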

It’s the kind of feature you don’t notice in a demo but you notice in your monthly build minutes bill—and in the number of times your developers say “why is this push taking so long?”

CloudWatch Database Insights expands regions: observability for the database reality we actually have

Database performance problems are one of the few universal constants in computing. The other is that the problematic query is always “temporary” and always got deployed “just for a test.”

AWS announced that CloudWatch Database Insights on-demand analysis is now available in four additional regions: Asia Pacific (New Zealand), Asia Pacific (Taipei), Asia Pacific (Thailand), and Mexico (Central) (posted January 20, 2026). (Database Insights regions)

AWS describes Database Insights as a monitoring and diagnostics solution that provides visibility into metrics, query analysis, and resource utilization, and it uses machine learning models to help identify bottlenecks during a selected time period and suggest remediation. (Database Insights regions)

On-demand analysis: why “look at this time window” is the killer feature

Operations teams don’t suffer from a lack of dashboards; they suffer from a lack of time. “On-demand analysis” is basically CloudWatch saying: pick the incident window and we’ll help you reason about what changed.

That matters because database incidents often have these traits:

  • They’re time-bounded (a deploy, a batch job, a marketing spike).
  • They’re multi-dimensional (CPU, locks, IO, buffer cache, query plan changes).
  • They’re emotionally charged (nobody wants to be the person who broke the database).

Features that speed up diagnosis reduce mean time to recovery (MTTR), and MTTR is what your customers actually experience.
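
The on-demand analysis experience itself lives in the CloudWatch console, but the same “pick the incident window” habit works programmatically against the Performance Insights API that Database Insights builds on. A minimal sketch, assuming an RDS instance with Performance Insights enabled; the resource identifier and time window are placeholders.

```python
# Minimal sketch: pull database load for a specific incident window, grouped
# by wait event, from the Performance Insights API. The resource identifier
# and time window are placeholders.
from datetime import datetime, timezone

import boto3

pi = boto3.client("pi")
response = pi.get_resource_metrics(
    ServiceType="RDS",
    Identifier="db-EXAMPLEIDENTIFIER",  # the DbiResourceId of your instance
    MetricQueries=[{"Metric": "db.load.avg", "GroupBy": {"Group": "db.wait_event"}}],
    StartTime=datetime(2026, 1, 20, 14, 0, tzinfo=timezone.utc),  # incident start
    EndTime=datetime(2026, 1, 20, 15, 0, tzinfo=timezone.utc),    # incident end
    PeriodInSeconds=60,
)

for series in response["MetricList"]:
    label = series["Key"].get("Dimensions", {}).get("db.wait_event.name", "total")
    peak = max((p.get("Value") or 0.0 for p in series["DataPoints"]), default=0.0)
    print(f"{label}: peak load {peak:.2f} average active sessions")
```

Even a crude per-wait-event breakdown like this narrows “the database was slow” down to something you can act on.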

Context: Database Insights is becoming a fleet-scale tool

AWS has also been expanding Database Insights capabilities beyond a single database view. For example, in late 2025 AWS announced cross-account and cross-region database fleet monitoring for CloudWatch Database Insights. (Cross-account/region monitoring)

The region expansion in January 2026 fits a broader pattern: AWS is building observability experiences that match how modern organizations operate—multi-region, multi-account, and constantly in motion.

Amazon Connect Step-by-Step Guides: conditional logic and real-time updates for agent workflows

Contact centers are often where IT meets real humans at scale, with latency measured in customer patience rather than milliseconds. AWS announced that Amazon Connect Step-by-Step Guides now supports conditional logic and real-time updates (posted January 23, 2026). (Connect update)

According to AWS, managers can build conditional user interfaces—show/hide fields, change defaults, and adjust required fields based on earlier inputs. Guides can also automatically refresh data from Connect resources at specified intervals, so agents work with current information. (Connect update)

Why this matters: “agent assist” is only as good as the workflow

We talk a lot about AI in customer support—summaries, suggested replies, sentiment analysis. But a lot of real-world efficiency comes from boring workflow improvements:

  • Fewer irrelevant fields to fill out
  • Fewer opportunities for agents to pick the wrong option
  • Fewer “wait, the policy changed yesterday” moments

Conditional logic makes guides less like static scripts and more like interactive decision trees, while real-time refresh helps reduce the gap between “what the system says” and “what’s actually true right now.” In regulated industries, that gap can become a compliance issue, not just a customer experience issue.

Regional availability (and why you should verify your region)

AWS lists Step-by-Step Guides availability across multiple regions including US East (N. Virginia), US West (Oregon), Canada (Central), Africa (Cape Town), several Asia Pacific regions, Europe (Frankfurt/London), and AWS GovCloud (US-West). (Connect update)

If you’re operating Connect in multiple regions, treat feature availability as a deployment constraint: build workflows that degrade gracefully and don’t assume every region has the same feature set on day one.

Upcoming: Best of AWS re:Invent (Jan 28–29, 2026)

The roundup also points readers to a virtual event: Best of AWS re:Invent. AWS describes it as a curated, free virtual experience that brings “the most impactful announcements and top sessions from AWS re:Invent,” with an opening session featuring Jeff Barr, AWS VP and Chief Evangelist. (Event page)

According to the event page, the broadcasts are scheduled as:

  • AMER: January 28, 2026 – 9:00 AM PT
  • APJ: January 29, 2026 – 9:00 AM SGT
  • EMEA: January 29, 2026 – 9:00 AM CET (and 1:30 PM IST)

(Event page)

If you missed re:Invent but still want the highlights without the “my step count is now a health hazard” experience, this is the lightweight alternative. Also: it’s a great way to catch what AWS wants you to remember, which is a subtly different thing from everything AWS announced.

The bigger pattern: AWS is optimizing the AI era’s “boring” infrastructure

It’s tempting to treat this roundup as “new GPU boxes plus some miscellaneous stuff.” But the miscellaneous stuff is the platform story.

Inference is moving from experimentation to operations

G7e is inference-focused hardware with serious networking, memory, and storage characteristics. That implies AWS expects customers to run inference like a production workload: monitored, scaled, cost-managed, and integrated into larger systems.

Security updates and supply chain hygiene are not optional

Corretto quarterly updates and ECR’s improvements both tie into a world where your runtime, your images, and your dependencies are part of your security posture. Faster pushes and less duplication also encourage more frequent rebuilds—exactly what you want when responding to CVEs and critical patches.

Observability and workflow tooling are where productivity really lives

Database Insights and Connect’s workflow improvements both target mean time to clarity. The faster teams can understand what’s happening—and guide humans through the right steps—the more resilient systems become.

AI will keep getting headlines. But the organizations that win are the ones that can operate AI and cloud systems reliably without turning every incident into a documentary series.

Practical takeaways for builders (a short checklist)

  • Evaluating G7e for inference? Benchmark your real model and traffic pattern. Pay attention to memory footprint, batch sizes, and end-to-end latency—not just tokens/sec.
  • Running Java in production? Put Corretto quarterly updates on your patch calendar and automate validation in staging.
  • Using ECR heavily? Enable blob mounting and verify repository encryption alignment to benefit from cross-repo layer sharing.
  • Operating databases across regions? Explore CloudWatch Database Insights on-demand analysis, especially if you’re expanding into the newly supported regions.
  • Running a contact center on Connect? Revisit Step-by-Step Guides and consider conditional UI logic to reduce agent cognitive load and error rate.

Sources

Bas Dorland, Technology Journalist & Founder of dorland.org