I joined Amazon Web Services (AWS) in June 2022 as an Application Security Manager. My team comprised 11 high-judgment security engineers tasked with holding or raising the security bar of services across Amazon. We covered areas like Amazon-produced open source, Infrastructure-as-Code (IaC), the AWS Console, and, most relevant to my current role, the Next-Gen Developer Experience (NGDE) organization. As NGDE ramped up experimentation with Generative AI, standard security reviews couldn't keep pace: development velocity exceeded what traditional review timelines could accommodate.

📄 A New Approach Begins with a Document

At Amazon, meaningful changes start with a document. When I recognized that our existing security review model couldn’t scale with NGDE’s speed and scope, I proposed a new approach: an automation-first security team focused on innovation, not just inspection. I wrote a document, Amazon’s standard mechanism for cultural change, outlining a pilot program aimed at moving the security review work common across services to a single, focused team.

We started in February 2024 with a pilot team of five early-career engineers focused on removing common security review work from all of AppSec across AWS. By June 2024, we had data showing that this small team was able to triage a significant portion of the security review work for roughly 120 security engineers. While there were plausible alternative explanations for this efficiency, such as reduced context switching and a narrower daily focus, the more compelling reason was the team’s deep specialization. By focusing on a single scope, our engineers developed highly detailed and accurate runbooks that captured even the most obscure edge cases. Over time, their decision-making process became so structured that it resembled a binary decision tree, which we gradually automated: first as a CLI tool, then as a Tampermonkey script, and eventually as a fully automated system running hourly via API calls.
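
To give a flavor of what "a runbook that resembles a binary decision tree" means in practice, here is a minimal, hypothetical sketch. The questions, field names, and dispositions are invented for illustration; the real runbooks and internal tooling looked different.

```python
from dataclasses import dataclass
from typing import Callable, Union

Outcome = str  # e.g. "auto-approve" or "escalate-to-engineer"

@dataclass
class Node:
    """One yes/no question from the runbook, with a branch for each answer."""
    question: Callable[[dict], bool]
    if_yes: Union["Node", Outcome]
    if_no: Union["Node", Outcome]

def triage(node: Union[Node, Outcome], review: dict) -> Outcome:
    """Follow the runbook's decisions until a leaf disposition is reached."""
    while isinstance(node, Node):
        node = node.if_yes if node.question(review) else node.if_no
    return node

# Hypothetical fragment of a triage runbook encoded as a tree.
RUNBOOK = Node(
    question=lambda r: r["handles_customer_data"],
    if_yes=Node(
        question=lambda r: r["uses_approved_encryption"],
        if_yes="auto-approve",
        if_no="escalate-to-engineer",
    ),
    if_no="auto-approve",
)

print(triage(RUNBOOK, {"handles_customer_data": True, "uses_approved_encryption": False}))
# -> escalate-to-engineer
```

Once a runbook is this explicit, the jump from "a person walks the tree" to "a script walks the tree" is small, which is exactly the progression the pilot followed.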

YAD (Yet Another Document)

I don't know if I officially coined the term 'YAD', but it became part of my vocabulary. Amazon's writing culture genuinely helps with decision making, but I found myself writing a lot during my tenure.

I saw this as the moment to go all in. I wrote another document and secured long-term investment to establish a permanent team. This time, the proposal included adding Software Development Engineers (SDEs) to work alongside security engineers. Around that time, I had been mentoring an individual contributor to move into management, so I transitioned my existing Application Security team to her and focused on building this new group from the ground up. I looked for security engineers with strong development experience and a bias for ownership, while also retaining several of the original pilot engineers. Bringing together the right people to solve the right problems was only the beginning. That is how the ✨ SHINE team was officially born. The document became the foundation of this team, the Security Hub of Innovation and Efficiency, designed to make security reviews faster, simpler, and more scalable across AWS.

📊 Measuring Success

[Drake meme: rejecting perfect instrumentation vs. accepting lightweight metrics]

From the beginning, we knew that measuring success would be as important as achieving it. Without understanding where we started, we would never be able to demonstrate the time saved or the value created. Establishing baselines was critical, even if the data wasn't perfect.

During the pilot, our goal was to identify measurable outcomes without slowing the work itself. We couldn’t realistically track every minute of a security review, so we partnered with the team that owned our security engagement software to add a self-reported completion time at the end of each task. While this data relied on human input and wasn’t perfectly precise, it gave us a consistent way to compare our pilot engineers’ performance against traditional security reviews.
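
The comparison itself stayed deliberately simple. As a sketch, with entirely made-up numbers, the kind of aggregation we relied on looked roughly like this:

```python
from statistics import median

# Hypothetical self-reported completion times (in minutes) pulled from the
# engagement tool, split by who handled the task.
pilot_times = [12, 9, 15, 10, 11]        # pilot team reviews
baseline_times = [35, 50, 28, 44, 60]    # traditional security reviews

print(f"pilot median:    {median(pilot_times)} min")
print(f"baseline median: {median(baseline_times)} min")
```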

This approach allowed us to move quickly without waiting for fully instrumented data pipelines. It gave us just enough signal to validate that the model worked, and later, to quantify its impact. In hindsight, the way we structured our metrics—lightweight, flexible, and designed for iteration—became a cornerstone of SHINE’s long-term success. It ensured we could measure, learn, and continuously prove the value of automation at scale.

I often bring this experience up when discussing new initiatives. Perfect metrics are rare, but that should never stop you from starting. Find a way to measure what matters, even if it is simple at first. Establishing baselines early is essential to show progress and earn trust. Today, it’s how we operate. We used to think in terms of test-driven development. Now, we think in terms of metrics-driven development, because measurement is what turns effort into evidence, and evidence into impact.

Automation at AWS Scale

Coming out of the pilot, my team immediately began expanding automation across security review tasks. We started by analyzing our existing guidance to identify the work most suitable for automation: tasks that followed clear patterns and binary decisions.

One simple example involved verifying that all S3 buckets had Block Public Access (BPA) enabled before release. If BPA was already on, the system could automatically validate the task and attach compliance evidence. If it was not, the task was returned to the builder for review. (In practice, BPA is now enabled by default, so this serves mainly as an illustration of our approach.)
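
As a rough sketch of what such a check can look like, here is a boto3 version. The bucket name is a placeholder, and this is illustrative only; the real pipeline also handled account-level BPA, evidence capture, and the review-platform integration described next.

```python
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")

def bucket_blocks_public_access(bucket_name: str) -> bool:
    """True only if all four Block Public Access settings are enabled on the bucket."""
    try:
        config = s3.get_public_access_block(Bucket=bucket_name)[
            "PublicAccessBlockConfiguration"
        ]
    except ClientError as err:
        # No configuration at all means BPA is not enabled for this bucket.
        if err.response["Error"]["Code"] == "NoSuchPublicAccessBlockConfiguration":
            return False
        raise
    return all(config.values())

if bucket_blocks_public_access("example-feature-bucket"):
    print("PASS: attach evidence and auto-validate the task")
else:
    print("FAIL: return the task to the builder")
```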

Our earlier partnership with the security review platform team proved invaluable. Together, we enabled automated validation and failure of these tasks through direct API calls. We also collaborated on schema changes, allowing the system to record and attach evidence of both successful and failed validations automatically.
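
The endpoint and schema below are hypothetical stand-ins, since the actual review platform is internal to Amazon, but they illustrate the shape of that integration: the automation calls the platform's API, flips the task's status, and attaches its machine-generated evidence in the same request.

```python
from datetime import datetime, timezone

import requests  # stand-in for the internal service client actually used

# Hypothetical endpoint; the real security review platform is internal.
REVIEW_API = "https://security-review.example.internal/api/v1"

def record_validation(task_id: str, passed: bool, evidence: dict) -> None:
    """Mark a review task validated or returned, attaching the automation's evidence."""
    payload = {
        "status": "VALIDATED" if passed else "RETURNED_TO_BUILDER",
        "evidence": evidence,
        "validatedAt": datetime.now(timezone.utc).isoformat(),
        "validatedBy": "shine-automation",
    }
    resp = requests.post(
        f"{REVIEW_API}/tasks/{task_id}/validation",
        json=payload,
        timeout=10,
    )
    resp.raise_for_status()
```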

To put the scale in perspective, AWS launches more than 13,000 new features each year, and each one requires a security review. As our automation matured, it eliminated work for both builders and security engineers: builders no longer needed to provide manual evidence, and engineers no longer needed to perform repetitive checks. That saved time was captured in the same self-reporting system used during the pilot, except now, if our automation handled the task, the reported time was effectively zero.

The impact added up quickly. Even if a single automated task took only five minutes to complete and appeared in just half of all security reviews, that alone would save more than 540 hours of engineering time per year. At AWS scale, those efficiencies compound, freeing our engineers to focus on higher judgment work and accelerating how quickly we can deliver secure innovation.
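
For anyone who wants to check that math, the back-of-the-envelope version:

```python
features_per_year = 13_000   # AWS feature launches requiring a security review
task_frequency = 0.5         # the task appears in roughly half of all reviews
minutes_per_task = 5         # time the manual check used to take

hours_saved = features_per_year * task_frequency * minutes_per_task / 60
print(f"~{hours_saved:.0f} hours of engineering time per year")  # ~542
```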

Wrapping Up

The SHINE journey started as a small pilot to test whether automation could scale security reviews without compromising quality. Along the way, we learned that measurement is what earns trust, specialization enables precision, and automation compounds both over time.

What began as a few early-career engineers building runbooks evolved into a team driving measurable impact across AWS. Today, automation not only accelerates reviews but also frees our engineers to focus on higher judgment security challenges, exactly where human expertise creates the most value.

In future posts, I’ll dive deeper into how we distinguish between deterministic automations (like the validation tasks described above) and Generative AI automations, which require context, interpretation, and decision-making beyond predefined rules. Both play critical roles in scaling security for the AI era, and both depend on the principles that guided SHINE from the start: measure, iterate, and automate.


TL;DR - Measure first, automate fast, and let data prove the value. SHINE turned those principles into a scalable security model for the AI-enabled era.