Newsletter | Sep 25, 2025
The edge is now in the prompt, not in the execution
We have written about version control in the past, but the widespread integration of artificial intelligence (AI) into software development has created a new set of processes that require version control. As developers integrate more natural language prompts into their code, they need a way to manage versions of those prompts and to evaluate how effective different versions are at achieving their goals. This week, we want to explore what prompt engineering is and why new version control systems are needed to manage this workflow.
Prompt engineering is the process of guiding an AI model toward a specific output. For example, if you tell an AI model to:
“Give me a picture of a building.”
You could get almost anything, and likely not what you had in mind. However, if you get more specific and say:
“Create a realistic, life-like image of the Empire State Building at night. The top of the building should glow with golden lighting. The background sky should be filled with stars, and there should be a large red moon prominently visible. The perspective should highlight the building against the night sky, with sharp detail and cinematic contrast.”
The result is likely to be far more precise. The same principle applies to every interaction with an AI model. Whether you are trying to get it to engage with a codebase, summarize an article, or chain together a series of functions, the more specific the user input, the better the output.
Integrating prompts into a software development workflow means they won’t just be one-off inputs; they’ll be reused repeatedly. For example, you might set up a program to run the same instruction every week: “Summarize every newsletter from Konvoy and post it on Twitter.” Over time, you’ll likely refine that instruction to improve clarity, accuracy, or style. Keeping track of how prompts evolve becomes essential, since even small changes can lead to very different outputs.
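To make the idea concrete, here is a minimal sketch of what tracking prompt revisions might look like, using only the Python standard library. The `PromptStore` class, its method names, and the example prompts are all illustrative, not any particular product's API:

```python
# Hypothetical sketch: a content-addressed store that keeps every revision
# of a named prompt, so older versions can be inspected or restored.
import hashlib
import time


class PromptStore:
    """Keeps every revision of a named prompt, addressable by content hash."""

    def __init__(self):
        self._history = {}  # name -> list of revision dicts, oldest first

    def commit(self, name, text, note=""):
        """Record a new revision and return its short content hash."""
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]
        self._history.setdefault(name, []).append(
            {"hash": digest, "text": text, "note": note, "ts": time.time()}
        )
        return digest

    def latest(self, name):
        """Return the text of the most recent revision."""
        return self._history[name][-1]["text"]

    def revert(self, name, digest):
        """Re-commit an older revision as the newest one."""
        for rev in self._history[name]:
            if rev["hash"] == digest:
                return self.commit(name, rev["text"], note=f"revert to {digest}")
        raise KeyError(digest)


store = PromptStore()
v1 = store.commit("weekly-summary", "Summarize every newsletter from Konvoy.")
v2 = store.commit(
    "weekly-summary",
    "Summarize every newsletter from Konvoy and post it on Twitter.",
    note="add distribution step",
)
store.revert("weekly-summary", v1)  # roll back to the earlier wording
```

Even this toy version captures the core workflow: every refinement is recorded with a note, and any earlier version can be recovered when a change turns out to hurt the output.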
In traditional software development, version control systems help teams track all changes made to code over time, allowing users to recall older versions, restore files, and test new features without affecting the main project. Systems like Git, Perforce, and Diversion (a Konvoy portfolio company) help developers keep track of their changes and often have built-in tools to ensure everything is working properly when things change.
Similarly, version control for prompts can help track changes, collaborate with other members of your team, and revert to old versions of a prompt. However, the similarities stop there: version control for code and for prompts involves distinctly different workflows that need different tools and processes to manage properly.
New entrants into this space can differentiate themselves from traditional version control systems by offering 1) a clean user interface, 2) a unique aggregation of tools and data analytics, and 3) the unique data captured during the evaluation process.
Evaluating the output of an LLM is different from evaluating the output of code. For example, understanding whether an LLM chatbot is providing correct information about your product while also striking the right tone is not a yes-or-no question. This requires different types of testing. PromptLayer, a company attempting to solve version control for prompts, details several methodologies that can be used here.
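One common way to express this kind of non-binary evaluation is a weighted rubric, where each criterion contributes a fractional score. The sketch below is a generic illustration, not PromptLayer's actual methodology; the product name, criteria, and checker functions are all invented placeholders:

```python
# Hedged sketch: scoring a chatbot reply on several weighted criteria
# instead of a single pass/fail check.

def rubric_score(reply, criteria):
    """Return the weighted average of per-criterion scores in [0, 1]."""
    total_weight = sum(weight for _, weight, _ in criteria)
    weighted = sum(weight * check(reply) for _, weight, check in criteria)
    return round(weighted / total_weight, 3)


# Each criterion: (name, weight, checker returning a score in [0, 1]).
# "Acme Widget" is a made-up product name for illustration.
criteria = [
    ("mentions_product", 2.0, lambda r: 1.0 if "Acme Widget" in r else 0.0),
    ("polite_tone", 1.0, lambda r: 1.0 if "thanks" in r.lower() else 0.5),
    ("concise", 1.0, lambda r: 1.0 if len(r.split()) <= 60 else 0.4),
]

reply = "Thanks for asking! The Acme Widget ships in two sizes."
print(rubric_score(reply, criteria))  # prints 1.0 for this reply
```

In practice, the simple string checks above would be replaced by richer judges (including another LLM grading tone or accuracy), but the shape is the same: a prompt's quality is a score on a rubric, not a boolean.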
To support these new styles of evaluation, new tools are required that can automatically run tests on new prompts, display results side by side, and build custom scorecards tailored to each evaluation need.
Scoring these prompts generates valuable data that can be used to suggest improvements to other users looking to optimize their prompts. Depending on the complexity of the task, these new version control platforms could even be well-positioned to create a marketplace for prompts, helping users streamline their creation process and creating a data flywheel for the business.
Takeaway: As AI becomes a core part of software development, managing prompts is increasingly cumbersome and needs to be supported with purpose-built tools. Just as Git transformed how developers track and collaborate on code, new systems are needed to version, evaluate, and optimize prompts. The real moat will not come from storing prompt history alone, but from building rich evaluation datasets that guide better outputs. Teams that invest early in prompt version control will not only gain increased productivity and performance, but will also have a system for testing that aligns with best practices in software development.