Developer Productivity in the Age of AI
Back in August, we kicked off a series on the future of builders, exploring the impact that generative AI could have on the craft of building software. My journey as a technologist has deeply informed my perspective on the topic, and this year, I’ve invested significant time and energy into updating my own workflow to reap the benefits of generative AI and researching what “productivity” means in that context. As a result, I’ve begun challenging my own definition of “developer productivity.” Let’s dive in and explore what I’ve learned, first by considering my growth as a technologist.
A Career of Transitions
When I tell people that my first job as a programmer came when I was only 15 years old, they are usually surprised. Because I started my career as a technologist so early, I’ve been privileged to experience many transitions and disruptions driven by new technology, all of which have informed my perspective.
I started developing in languages like C, C++, and Pascal when Java hit the scene, promising a “write once, run anywhere” platform. I leapt in with gusto, excited by the potential provided by the JVM, JDK, garbage collection, and freedom from manual memory management, all in a language that looked familiar to me. My high school computer lab allowed me and a ragtag group of teenage technologists to build a Slackware Linux server connected to the school’s high-speed ISDN internet connection. I learned languages like PHP and Python, falling immediately in love with the latter, which made Java’s type system look draconian and stifling.
My first job involved developing an Electronic Medical Records platform in Python and Java, including web-based applications using HTML and CSS, with JavaScript just an early toy before AJAX made it an incredibly powerful tool for creating rich, interactive web apps. Today, JavaScript is ubiquitous both on the server and in browsers.
I’ve lived through the rise of Linux as a server platform, displacing old-school UNIX platforms like SCO UNIX and IBM’s AIX, advances in how software is delivered with SaaS, and the emergence of the public cloud. There have been dozens of other technological shifts in my time, the largest of which is likely the introduction of the public cloud—until now.
Evolving Workflows
Throughout my entire journey, programming workflows have evolved, but generally speaking, the act of writing code remains fundamentally unchanged. Developers leverage IDEs, compilers, and documentation to hand-craft software. Documentation has certainly evolved, now encompassing knowledge sharing on websites like Stack Overflow, but at a high level, software development workflows have stayed relatively stable.
Several years ago, it became clear that AI would be able to write code, very much like a programmer. It started small, feeling more like a glorified auto-complete than a code assistant, but has since blossomed into very powerful tools that can generate code much more quickly than a human. Quality and accuracy have varied, but both are improving at an incredible clip. I’ve made every effort to embrace generative AI to improve my productivity, helping me spend more time solving problems and less time digging through documentation and search results.
Jarvis, Help Me Build a Game
My experiences with AI tooling have caused me to make deep changes to how I write code, with the most recent example being a fun project that I collaborated on with Mission’s marketing team — building a competitive game that attendees of AWS re:Invent can play at Mission’s booth on the trade show floor. In my 30 years of software development, gaming has not been a major part of my experience, so it was a great opportunity to try and maximize my use of generative AI to make me more productive.
I elected to try several different GenAI code assistants, including local models running on Ollama as well as Amazon Q Developer. In the past, my approach would have started with research: searching the web, reading tutorials, and doing light testing. Instead, I asked my code assistant for advice, starting with a simple prompt:
I want to write a simple game that can run on a Mac. Players of the game will control a skydiver falling through the air, avoiding obstacles. I am most comfortable with Python. Please advise what the best approach would be.
Shortly after sending my request, my code assistant suggested that I use pygame, which is a library for game development in Python. From there, our conversation continued, with me focused on coaching the assistant to write a skeleton for my game. Before long, I had a 100-line script entirely generated by my assistant that presented a black square on a blue background that could be steered left and right to avoid red squares representing obstacles.
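For context, that skeleton behaved roughly like the sketch below. To be clear, this is my own minimal reconstruction of that kind of pygame starter, not the assistant’s actual output; the window size, colors, speeds, and spawn rate are all illustrative assumptions.

```python
import random
import sys

import pygame

# Minimal reconstruction of the assistant-generated skeleton: a black
# square "skydiver" on a blue background, steered left and right with
# the arrow keys. Red squares scroll up past the falling player to
# simulate descent. All sizes and speeds are illustrative choices.

WIDTH, HEIGHT = 480, 640
PLAYER_SPEED = 6
OBSTACLE_SPEED = 4

pygame.init()
screen = pygame.display.set_mode((WIDTH, HEIGHT))
pygame.display.set_caption("Skydiver")
clock = pygame.time.Clock()

player = pygame.Rect(WIDTH // 2 - 20, 60, 40, 40)
obstacles = []

running = True
while running:
    for event in pygame.event.get():
        if event.type == pygame.QUIT:
            running = False

    # Steer the player with the left/right arrow keys.
    keys = pygame.key.get_pressed()
    if keys[pygame.K_LEFT]:
        player.x = max(0, player.x - PLAYER_SPEED)
    if keys[pygame.K_RIGHT]:
        player.x = min(WIDTH - player.width, player.x + PLAYER_SPEED)

    # Occasionally spawn an obstacle just below the visible area.
    if random.random() < 0.03:
        obstacles.append(pygame.Rect(random.randint(0, WIDTH - 30), HEIGHT, 30, 30))

    # Scroll obstacles upward and discard the ones that leave the screen.
    for obstacle in obstacles:
        obstacle.y -= OBSTACLE_SPEED
    obstacles = [o for o in obstacles if o.y > -30]

    # A collision ends the game.
    if any(player.colliderect(o) for o in obstacles):
        running = False

    screen.fill((30, 120, 200))                  # blue sky
    pygame.draw.rect(screen, (0, 0, 0), player)  # black player square
    for obstacle in obstacles:
        pygame.draw.rect(screen, (200, 30, 30), obstacle)  # red obstacles

    pygame.display.flip()
    clock.tick(60)

pygame.quit()
sys.exit()
```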
As I progressed, my interactions with the assistant slowly transitioned from high-level conversation to detailed iteration on specific parts of the codebase. Instead of turning to search engines and documentation for specific features of pygame, I would provide snippets of code to modify. For example, I asked the assistant to replace the code that draws the black rectangle representing the player with code that renders an image, and it quickly provided an in-context example that was easy for me to integrate. The assistant also never took offense when I told it I was unhappy with its approach, taking my feedback and revising the code to align with my expectations. At no point did I feel like I was losing control of the codebase, as I was developing alongside the assistant the entire time. I was still writing lots of code, but I had an expert at my command to pair program with.
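That suggestion looked something like the following, again my own reconstruction rather than the assistant’s verbatim output; the file name skydiver.png is a hypothetical stand-in, and player refers to the Rect from the skeleton above.

```python
import pygame

# Load the sprite once at startup instead of drawing a rect every frame.
# "skydiver.png" is a stand-in file name; convert_alpha() preserves
# transparency around the sprite.
player_image = pygame.image.load("skydiver.png").convert_alpha()
player_image = pygame.transform.scale(player_image, (player.width, player.height))

# ...then, in the draw portion of the game loop, replace
# pygame.draw.rect(screen, (0, 0, 0), player) with:
screen.blit(player_image, player)
```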
This was the first project I have completed end-to-end with generative AI, starting with no code and ending with a full-fledged application. My workflow was drastically different from anything in my past, and I could remain focused on solving problems rather than poring over documentation. I found that Amazon Q Developer, in particular, stayed highly effective as the codebase grew, where local models and other assistants tended to suggest changes in isolation that would break other functions or objects in the same file. Q Developer can also operate against an entire workspace, spanning distinct modules and libraries, suggesting changes from end-to-end.
It wasn’t all roses and butterflies, of course: every assistant would sometimes struggle with changes to larger amounts of code, breaking other features in the process. A simple redirection was often enough to resolve the issue. Overall, even with the occasional bump, the entire process was fun and productive, and I am very pleased with the finished product. It’s difficult not to feel like Tony Stark collaborating with his Jarvis AI, only my experience isn’t a work of fiction. Test out the game, Skyfall, here and let me know what you think!
My experience left me pondering a few philosophical questions. I’ve spent decades writing code, leading engineering teams, and delivering software without the aid of AI. In the post-AI world, what does software development look like at scale? What will be the hallmarks of a highly productive engineering team? Moreover, what do we even mean by productivity in this new world? Let’s explore that a bit.
Measuring Post-AI Productivity
Measuring the productivity of an engineering team has always been a bit challenging. Many metrics correlate loosely with productivity but are deeply flawed proxies for it. Lines of code can certainly measure the quantity of code an individual or team produces, but it says nothing about whether the code is well-written, and it incentivizes padding the numbers. Agile methodologies provide additional analogs, with velocity a common key metric for engineering teams. Still, velocity doesn’t take into account the diverse set of responsibilities and activities that a developer performs, nor modern DevOps / SRE practices.
In recent years, researchers have invested significant time to understand the properties of high-performing engineering teams in more depth. One such research program is DORA, which officially stands for “DevOps Research and Assessment,” and is part of Google.
DORA attempts to strike a balance between velocity and stability, a notable shift from the “move fast and break things” mantra that prioritizes velocity above all. Over time, DORA has identified five key metrics that provide a more complete benchmark of high-performing technology teams:
- Deployment frequency
- Lead time for changes
- Time to restore service
- Change failure rate
- Reliability
Google helpfully provides a series of guides on DORA along with research papers and blog posts that dive deep into their findings. Most recently, DORA has published a guide for leveraging generative AI on development teams, both from the perspective of engineers and engineering leaders. In my view, DORA’s research is required reading for anyone aiming to create high-performing engineering teams.
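To make those metrics concrete, here is a minimal sketch of how a team might compute four of them from a deployment log. The record format, field names, and trailing window are my own assumptions for illustration, not DORA’s official methodology; reliability is omitted because it is typically measured against service-level objectives rather than deployment records.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class Deployment:
    deployed_at: datetime       # when the change reached production
    committed_at: datetime      # when the change was first committed
    failed: bool                # did this deployment degrade service?
    restored_at: datetime | None = None  # when service recovered, if it failed

def dora_snapshot(deploys: list[Deployment], days: int = 30) -> dict:
    """Compute deployment frequency, lead time for changes, change
    failure rate, and time to restore over a trailing window.
    Illustrative assumptions only, not DORA's official method."""
    failures = [d for d in deploys if d.failed]
    lead_times = sorted(d.deployed_at - d.committed_at for d in deploys)
    restore_times = [d.restored_at - d.deployed_at for d in failures if d.restored_at]
    return {
        "deployment_frequency_per_day": len(deploys) / days,
        # Upper median is fine for a rough snapshot.
        "median_lead_time": lead_times[len(lead_times) // 2] if lead_times else None,
        "change_failure_rate": len(failures) / len(deploys) if deploys else 0.0,
        "mean_time_to_restore": (
            sum(restore_times, timedelta()) / len(restore_times)
            if restore_times else None
        ),
    }
```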
SPACE, introduced in 2021, is the result of research by a diverse set of technologists from GitHub, the University of Victoria, and Microsoft. Their article, published by the ACM, states that the most important lesson they learned is that “productivity cannot be reduced to a single dimension (or metric!).” This point clearly agrees with DORA’s findings. SPACE outlines five dimensions of its own, introducing Developer Experience (DevEx) as a predictor of performance:
- Satisfaction and well-being
- Performance
- Activity
- Communication and collaboration
- Efficiency and flow
SPACE represents another “must-read,” in my opinion, as it busts very common myths about productivity, including:
- Productivity is all about developer activity
- Productivity is only about individual performance
- One productivity metric can tell us everything
- Productivity measures are useful only for managers
- Productivity is only about engineering systems and developer tools
That last myth is particularly important: developers commonly object to productivity KPIs because they don’t find the metrics useful and fear the numbers will be used against them. SPACE aims to create a multi-dimensional framework that self-actualized engineering teams can use to improve their own performance.
Armed with the lessons of SPACE and DORA, engineering leaders will be much more capable of understanding the impact of AI-powered code assistants on their teams, but they will need to be empowered by vendors. Amazon Q Developer, for example, provides a dashboard that quantifies its value across a number of key metrics, including not just user activity and lines of code generated, but also the acceptance rate of code suggestions.
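As a simple illustration of that last metric (my own arithmetic, not the dashboard’s implementation), acceptance rate is just the share of suggestions a developer keeps:

```python
def acceptance_rate(accepted: int, suggested: int) -> float:
    """Share of assistant suggestions the developer kept.
    Illustrative arithmetic, not Q Developer's dashboard logic."""
    return accepted / suggested if suggested else 0.0

# Hypothetical numbers: 340 accepted out of 1,250 suggestions.
print(f"{acceptance_rate(340, 1250):.1%}")  # -> 27.2%
```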
Bringing it All Together
So, what can we conclude from our exploration?
Generative AI is clearly a significant disruption to developer workflows. Adapting to that disruption will require engineering teams to reconsider their definitions of productivity, establishing new multi-dimensional measures that enable them to continuously improve their performance in the midst of transformative technological change.
Thanks for following along with our Amazon Q series! Stay tuned for the third blog, coming soon.
Author Spotlight:
Jonathan LaCour