Abstract
Coding models are helping software developers move faster than ever, but weirdly, the models themselves are not keeping up. They are trained on months-old snapshots of open source code. They have never seen your internal codebase, let alone the code you wrote yesterday. And as AI-assisted contributions have surged, the data shows acceptance rates for pull requests have actually declined. More code is being written, but more of it is being rejected.
Why? Because every repository has its own unwritten rules, architectural constraints that live in the heads of your senior engineers and in the patterns of your commit history, but that no generic model has ever learned. We set out to measure this gap with data, and what we found is that it is real, growing, and structural.
This talk presents original research on what makes codebases architecturally unique, and gives you a practical framework for closing the gap yourself. You will learn how to surface your own repo's unwritten rules (your senior engineers already know them, they just haven't been written down), how to put those rules where your AI tools can actually use them, and how to evaluate whether your current tooling respects the constraints that matter most. No special tooling required. You will leave with a practice you can start next week with a whiteboard and your most experienced engineers.
Interview:
What is your session about, and why is it important for senior software developers?
This session is about the growing gap between what AI coding models know and what your codebase actually requires. Every software organization has unwritten architectural rules. This knowledge lives in the heads of senior engineers and in the patterns of your commit history, but no generic coding model has been trained on it. I present original research showing that these rules exist, they are measurably distinct from generic software engineering best practices, and current approaches like RAG and standard linting miss a significant fraction of violations.
For senior developers and architects, this matters because you are the ones who currently hold this knowledge, and you are the ones feeling the pain when AI-generated code violates it. The good news is that you already have what you need to close the gap. Your senior engineers know these rules; they just have not been written down. This talk gives you the data to understand the problem clearly and a practical, low-tech framework for surfacing those rules and putting them to work in whatever AI tooling you already use.
Why is it critical for software leaders to focus on this topic right now, as we head into 2026?
Two trends are colliding. First, AI-assisted code generation has exploded. Our data shows AI tool mentions in PRs grew 50x between 2022 and 2025 across a number of major open source projects. Second, acceptance rates for PRs are declining, not rising. They dropped from 74.7% to 66.2% over the same period. More code is being generated, but more of it is being rejected. This is not a temporary growing pain. As more of your team's output is AI-assisted, the cost of that AI not understanding your specific architectural constraints compounds. The teams that close this gap fastest will do it by getting explicit about what their codebase uniquely requires: documenting those rules, feeding them into their existing AI tools' system prompts and review checklists, and evaluating their tools against the constraints that actually matter rather than generic benchmarks. This does not require new tooling or infrastructure. It requires a whiteboard session with your most experienced engineers and the discipline to write down what they already know.
What are the common challenges developers and architects face in this area?
The biggest challenge is the gap between generic and specific. Today's coding models know how to write Python, but they do not know that your monorepo requires version synchronization across package boundaries, or that your framework mandates lazy imports for optional dependencies, or that new model implementations must register in a specific registry. RAG helps somewhat: you can put your style guide into the context window. But our research shows that retrieval-based approaches still miss a significant fraction of the violations that repo-specific constraints catch. The deeper challenge is that these rules are usually undocumented. They live as tribal knowledge in the heads of your most experienced engineers. When we extracted constraints computationally from various open source repos, we found real, important rules that nobody had written down. But here is the thing: senior engineers on those projects would have recognized every one of those rules immediately. The knowledge exists. It is just locked in people's heads instead of written where tools and new team members can use it. That is a problem any team can fix without special tooling.
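Rules like the version-synchronization example above can be made explicit and mechanically checkable once someone writes them down. Here is a minimal sketch of what that looks like, assuming a hypothetical monorepo layout with a frontend `package.json` and a Python `pyproject.toml` whose versions must stay identical (the file names and layout are illustrative, not from any specific project in the research):

```python
"""A repo-specific constraint check that no generic linter ships:
the frontend package version and the Python package version must match.
Layout and file names are assumptions for illustration."""
import json
import re


def frontend_version(package_json_text: str) -> str:
    """Read the version field from the contents of a package.json file."""
    return json.loads(package_json_text)["version"]


def python_version(pyproject_text: str) -> str:
    """Extract the version = "..." line from the contents of pyproject.toml."""
    match = re.search(r'^version\s*=\s*"([^"]+)"', pyproject_text, re.MULTILINE)
    if match is None:
        raise ValueError("no version field found in pyproject.toml")
    return match.group(1)


def versions_in_sync(package_json_text: str, pyproject_text: str) -> bool:
    """The unwritten rule, made explicit and runnable in CI."""
    return frontend_version(package_json_text) == python_version(pyproject_text)


# These two files agree, so the constraint passes.
pkg = '{"name": "frontend", "version": "1.4.2"}'
ppt = 'name = "backend"\nversion = "1.4.2"\n'
print(versions_in_sync(pkg, ppt))  # True
```

The point is not this particular check but the pattern: once a constraint leaves someone's head and becomes text, it can gate CI, appear in review checklists, and be handed to AI tooling.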
What's one thing you hope attendees will implement immediately after your talk?
Sit down with your two or three most senior engineers and explicitly catalog your repository's architectural constraints. Ask three questions. First: what are the rules in our codebase that are not written down anywhere? Second: look at recent PRs that got sent back in review, not for CI failures but for convention violations. What was the convention? Write it down. Third: what do you spend time explaining to every new team member? Those are constraints too.
In our research, the results consistently included rules that were real and important but that nobody had documented. Things like "configuration validation must enforce that 4-bit loading and flash attention are disabled during adapter merging" or "the frontend package version and the Python package version must be kept identical." Once you have even a partial list, put the top constraints in your AI tools' system prompts and start reviewing AI-generated code against them. That single practice will close more of the gap than switching to a better generic model.
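The "put the top constraints in your AI tools' system prompts" step is deliberately low-tech. A minimal sketch, assuming your tool accepts a custom system prompt (the function name, prompt wording, and base prompt here are illustrative, not part of the talk):

```python
"""Minimal sketch of the practice described above: keep the cataloged
constraints as plain text and prepend them to whatever system prompt
your AI coding tool accepts. Wording and structure are assumptions."""

BASE_PROMPT = "You are a coding assistant for this repository."


def build_system_prompt(constraints: list[str], base: str = BASE_PROMPT) -> str:
    """Turn a cataloged list of unwritten rules into an explicit
    preamble the model sees on every request."""
    rules = "\n".join(f"- {rule}" for rule in constraints)
    return f"{base}\n\nArchitectural constraints (never violate these):\n{rules}"


# The constraints below are the documented examples from the interview.
constraints = [
    "The frontend package version and the Python package version "
    "must be kept identical.",
    "Optional dependencies must be imported lazily.",
    "New model implementations must register in the model registry.",
]
print(build_system_prompt(constraints))
```

The same text file can double as a review checklist for AI-generated code, which is how the practice closes the gap from both directions.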
What makes QCon stand out as a conference for senior software professionals?
QCon is one of the rare conferences where the audience is primarily architects and senior engineers who build real systems. That matters for this topic specifically because the people who understand architectural constraints are the people in the room. The value of knowing your repo's unwritten rules only resonates with people who have been burned by violations of those rules. QCon's "no vendor pitches" policy also means I can present research findings honestly, including where our experiments failed in interesting ways, without having to pretend everything is perfect.
Speaker
Jeff Smith
CEO & Co-Founder @ Neoteny AI, AI Engineer, Researcher, Author, Ex-Meta/FAIR
Jeff Smith is an AI engineer, researcher, and entrepreneur. As the Co-Founder and CEO of Neoteny AI, he’s working on the future of coding intelligence. Previously, he was at Facebook/Meta, leading some of their most important AI initiatives, including PyTorch, fundamental research breakthroughs at FAIR, and the productization of EMG-based neural wristbands. Before that, he had a long career across a range of AI and biotech startups in the US, Asia, and Europe. He’s the author of technical books such as Machine Learning Systems (Manning), and he speaks regularly about his work at conferences. His research focuses on efficient model learning methods and has led to techniques such as SHARe-KANs.