Tempered AI Assessment
Choose Your Assessment
Take the full assessment for the most complete picture, or filter by role for a shorter experience.
12 dimensions, 85 questions
Foundational Adoption
Basic activation and consistent usage of Copilot tools across interfaces
You can't improve what you don't use. But adoption alone doesn't indicate maturity - it's table stakes. With 82-89% industry adoption in 2025, this dimension has decreased in weight. Gartner predicts 90% of enterprise software engineers will use AI assistants by 2028.
How many days per week do you typically use GitHub Copilot?
Sources 2
State of AI-Assisted Software Development 2025
The 2025 DORA report introduces the 'AI Capabilities Model' identifying seven practices that amplify AI benefits. The core insight is that AI is an 'amplifier' - it magnifies existing organizational strengths AND weaknesses. Key stats: 89% of orgs prioritizing AI, 76% of devs using daily, but 39% have low trust. The trust research is critical: developers who trust AI more are more productive, but trust must be earned through organizational support (policies, training time, addressing concerns). The 451% adoption increase from acceptable-use policies is remarkable - clarity enables adoption.
- 89% of organizations prioritizing AI integration into applications
- 76% of technologists rely on AI for parts of their daily work
- 75% of developers report positive productivity impact from AI
State of AI Code Quality 2025
This is the most comprehensive 2025 survey on AI code quality (609 developers). The key insight is the 'Confidence Flywheel' - context-rich suggestions reduce hallucinations, which improves quality, which builds trust. The finding that 80% of PRs don't receive human review when AI tools are enabled is critical for our agentic_supervision dimension. NOTE: The previously cited 1.7x issue rate and 41% commit stats were not found in the current report.
- 82% of developers use AI coding tools daily or weekly
- 65% of developers say at least a quarter of each commit is AI-generated
- 59% say AI has improved code quality
Which Copilot interfaces do you actively use?
Sources 2
GitHub Copilot Agent Mode Documentation
GitHub Copilot's agent mode enables autonomous multi-file editing, allowing AI to plan and execute complex changes across a codebase without step-by-step human approval, pausing only at diff-review checkpoints before changes are applied. This capability requires careful supervision practices since agents can introduce cascading errors across multiple files. Critical for agentic_supervision dimension - assessing how organizations manage autonomous AI coding.
- Agent mode can edit multiple files autonomously across a codebase
- Requires explicit approval before applying changes (diff review checkpoint)
- Supports iterative refinement: review, request changes, re-generate
GitHub Copilot CLI Documentation
Copilot CLI is GitHub's terminal-native coding agent. Public preview launched September 2025. Uses Claude Sonnet 4.5 by default. Supports interactive (conversational) and programmatic (one-shot) modes. MCP-powered extensibility with GitHub's server built-in. Requires explicit approval for all actions.
- Terminal-native agentic development - no context switching required
- GitHub integration: repos, issues, PRs via natural language
- Two modes: interactive (copilot command) and programmatic (-p/--prompt)
When you use Copilot, how much of your coding session involves it?
Sources 1
Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity
This is the most rigorous 2025 study on AI coding productivity. The RCT methodology (16 experienced developers, 246 tasks, $150/hr compensation) makes this highly credible. The 39-44 percentage point gap between perceived and actual productivity is the key insight for our trust_calibration dimension. This directly supports recommendations about not over-trusting AI suggestions and maintaining verification practices.
- Experienced developers were 19% slower with AI
- Developers perceived 20% speedup (39-44 percentage point gap)
- Self-reported productivity may not reflect reality
How many different AI coding tools does your team use?
Sources 1
2025 Stack Overflow Developer Survey
2025 Stack Overflow survey shows continued adoption (84%) but declining trust (60% positive, down from 70%+). Key insight: developers are using AI more but trusting it less. 35% use Stack Overflow as fallback when AI fails.
- 84% of respondents using or planning to use AI tools (up from 76% in 2024)
- 51% of professional developers use AI tools daily
- Positive sentiment dropped from 70%+ (2023-2024) to 60% (2025)
What is your experience with MCP (Model Context Protocol) for connecting AI tools to external data?
Sources 2
Model Context Protocol Specification
Model Context Protocol (MCP) is an open-source standard providing a 'USB-C port for AI applications' - a standardized protocol for connecting AI systems to external data sources, tools, and workflows. MCP became an industry standard in late 2025, enabling composable, interoperable AI systems. Foundational for advanced AI development maturity - assessing MCP adoption indicates sophistication in AI tool integration. A minimal server sketch follows the list below.
- Three core primitives: Resources (context data), Prompts (templates), Tools (executable functions)
- Adopted by Claude Desktop, Cursor, Windsurf, Zed, and 20+ other tools by late 2025
- Enables 'composable AI': mix and match context providers across tools
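To make the three primitives concrete, here is a minimal server sketch using the official Python SDK's FastMCP helper (pip install mcp); the resource URI, tool, prompt, and file paths are illustrative choices, not part of the specification itself.

```python
# Minimal MCP server sketch covering all three primitives.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("repo-context")

@mcp.resource("repo://readme")
def readme() -> str:
    """A Resource: read-only context data the client can pull in."""
    with open("README.md") as f:
        return f.read()

@mcp.prompt()
def review_prompt(file: str) -> str:
    """A Prompt: a reusable template the client can surface."""
    return f"Review {file} for security issues and summarize findings."

@mcp.tool()
def count_todos(path: str) -> int:
    """A Tool: an executable function the model can call."""
    with open(path) as f:
        return f.read().count("TODO")

if __name__ == "__main__":
    mcp.run()  # serves over stdio by default
```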
Claude Code: Keep the Context Clean
Explains the hidden context cost of MCP integrations. Each MCP server's tools are injected into every prompt, consuming context-window tokens. Power users should disable unused servers and trim exposed tools to maximize available context for actual work. A budget sketch follows the list below.
- MCP tools are injected into prompt on every request
- A couple of MCP servers can eat 50% of the context window before you type anything
- Disable MCP servers you don't use frequently
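A back-of-the-envelope budget makes the 50% claim concrete; every number below is an illustrative assumption, since per-tool schema cost varies widely by server.

```python
# Back-of-the-envelope context budget; all numbers are assumptions.
context_window = 200_000         # tokens
system_overhead = 3_000          # system prompt + instruction files
tokens_per_tool = 1_000          # rough cost of one injected tool schema
servers = {"github": 40, "browser": 35, "jira": 25}  # tools per server

tool_overhead = tokens_per_tool * sum(servers.values())
remaining = context_window - system_overhead - tool_overhead
print(f"Tool schemas: {tool_overhead:,} tokens "
      f"({tool_overhead / context_window:.0%} of the window); "
      f"{remaining:,} left for actual work.")
# -> Tool schemas: 100,000 tokens (50% of the window); 97,000 left for actual work.
```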
If you use MCP: Are you aware of MCP security risks (prompt injection, tool poisoning)?
Sources 2
Model Context Protocol (MCP): Landscape, Security Threats, and Future Research Directions
First systematic academic study of MCP security risks. Identifies attack vectors including tool poisoning, puppet attacks, rug pull attacks. 43% of tested implementations had command injection flaws. Critical for adoption dimension's MCP security awareness questions.
- Systematic security analysis of MCP lifecycle (creation, deployment, operation, maintenance)
- 16 key activities identified with security implications
- Four major attacker types: malicious developers, external attackers, malicious users, others
Beyond the Protocol: Unveiling Attack Vectors in the Model Context Protocol (MCP) Ecosystem
End-to-end empirical evaluation of MCP attack vectors. User study demonstrates that even experienced developers struggle to identify malicious MCP servers. Critical CVE with CVSS 9.6 shows real-world risk.
- Four categories of attacks: Tool Poisoning, Puppet Attacks, Rug Pull Attacks, Exploitation via Malicious External Resources
- User study with 20 participants showed users struggle to identify malicious MCP servers
- Users often unknowingly install malicious servers from aggregator platforms
Advanced Multi-Session Workflows
Proficiency with spec-driven development, orchestrated agents, and persistent session management
Simple chat-based AI assistance is table stakes. Advanced workflows deliver measurable results: GitHub Spec Kit achieves 95%+ first-attempt accuracy with detailed specs, Rakuten reduced development timelines from 24 to 5 days (79% reduction) using structured agent workflows, and 7 hours of sustained autonomous coding is now achievable on complex projects. These patterns enable longer-horizon, more complex tasks with dramatically better outcomes.
Are you familiar with GitHub Spec Kit for spec-driven development?
Sources 1
Spec-Driven Development with AI: Get Started with a New Open Source Toolkit
GitHub Spec Kit formalizes the spec-driven development approach where detailed specifications precede AI code generation. The four-phase workflow (Specify → Plan → Tasks → Implement) ensures human oversight at each checkpoint. This is the antidote to 'vibe coding' - structured, auditable AI development. Key for assessing advanced workflow maturity.
- Four-phase workflow: Specify, Plan, Tasks, Implement
- Specifications become executable artifacts
- Supports GitHub Copilot, Claude Code, Gemini CLI
When using spec-driven development, which phases do you follow?
Sources 1
Spec-Driven Development with AI: Get Started with a New Open Source Toolkit
GitHub Spec Kit formalizes the spec-driven development approach where detailed specifications precede AI code generation. The four-phase workflow (Specify → Plan → Tasks → Implement) ensures human oversight at each checkpoint. This is the antidote to 'vibe coding' - structured, auditable AI development. Key for assessing advanced workflow maturity.
- Four-phase workflow: Specify, Plan, Tasks, Implement
- Specifications become executable artifacts
- Supports GitHub Copilot, Claude Code, Gemini CLI
Do you use multi-agent orchestration patterns?
Sources 1
BMAD-METHOD: Breakthrough Method for Agile AI Driven Development
BMAD represents the multi-agent orchestration approach to AI development. Unlike simple chat-based AI assistance, BMAD uses specialized agents (Analyst, Architect, Developer, QA) coordinated by an orchestrator. Key innovation: zero context loss between tasks. Represents advanced maturity in agentic workflows.
- 19+ specialized AI agents with distinct roles (Analyst, Architect, Developer, QA)
- 50+ workflows covering full SDLC: requirements → architecture → implementation → testing
- Model routing by task complexity: Haiku for routine, Sonnet for analysis, Opus for novel problems
How do you handle long-horizon tasks that span multiple days or sessions?
Sources 1
Introducing Beads: A Coding Agent Memory System
Beads solves the 'context loss' problem in multi-session AI development. Rather than storing tasks in unstructured markdown, Beads uses Git-backed JSONL files that agents can query for 'ready' work. Key for long-horizon tasks spanning multiple days or sessions. Represents the frontier of AI workflow tooling for persistent memory.
- Git-backed issue tracker designed for AI coding agents
- Persistent session memory across restarts
- Dependency-aware task graph (DAG)
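As a rough sketch of the pattern, a dependency-aware 'ready work' query over a Git-tracked JSONL file might look like the following; the field names and path are hypothetical, not Beads' actual schema.

```python
import json
from pathlib import Path

# Hypothetical schema: each line is an issue with id, status, deps.
def ready_issues(path: str) -> list[dict]:
    lines = Path(path).read_text().splitlines()
    issues = [json.loads(line) for line in lines if line.strip()]
    done = {i["id"] for i in issues if i["status"] == "done"}
    # An issue is ready when it is open and all of its dependencies are done.
    return [i for i in issues
            if i["status"] == "open" and set(i.get("deps", [])) <= done]

for issue in ready_issues(".beads/issues.jsonl"):  # path is illustrative
    print(issue["id"], issue.get("title", ""))
```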
Do you use AI-native task tracking tools designed for agent workflows?
Sources 1
Introducing Beads: A Coding Agent Memory System
Beads solves the 'context loss' problem in multi-session AI development. Rather than storing tasks in unstructured markdown, Beads uses Git-backed JSONL files that agents can query for 'ready' work. Key for long-horizon tasks spanning multiple days or sessions. Represents the frontier of AI workflow tooling for persistent memory.
- Git-backed issue tracker designed for AI coding agents
- Persistent session memory across restarts
- Dependency-aware task graph (DAG)
How structured is your AI development workflow?
Sources 1
Spec-Driven Development with AI: Get Started with a New Open Source Toolkit
GitHub Spec Kit formalizes the spec-driven development approach where detailed specifications precede AI code generation. The four-phase workflow (Specify → Plan → Tasks → Implement) ensures human oversight at each checkpoint. This is the antidote to 'vibe coding' - structured, auditable AI development. Key for assessing advanced workflow maturity.
- Four-phase workflow: Specify, Plan, Tasks, Implement
- Specifications become executable artifacts
- Supports GitHub Copilot, Claude Code, Gemini CLI
Agentic Workflow Supervision
Skill in supervising autonomous AI agents that make multi-file changes
Agent-first IDEs (Cursor 2.0, Google Antigravity, Devin 2.0) can now orchestrate multiple autonomous agents in parallel. Harness 2025: 92% of developers report AI increases the 'blast radius' from bad deployments. The new risks include review fatigue (impossible to review 500+ lines of diffs), agentic security threats (OWASP Top 10), and multi-agent coordination failures. Mature users know how to supervise agents, apply 'Least Agency' principles, and use layered verification (AI review + human review + tests).
How often do you use agent mode or autonomous coding features?
Sources 2
GitHub Copilot Agent Mode Documentation
GitHub Copilot's agent mode enables autonomous multi-file editing, allowing AI to plan and execute complex changes across a codebase without step-by-step human approval, pausing only at diff-review checkpoints before changes are applied. This capability requires careful supervision practices since agents can introduce cascading errors across multiple files. Critical for agentic_supervision dimension - assessing how organizations manage autonomous AI coding.
- Agent mode can edit multiple files autonomously across a codebase
- Requires explicit approval before applying changes (diff review checkpoint)
- Supports iterative refinement: review, request changes, re-generate
State of AI-Assisted Software Development 2025
The 2025 DORA report introduces the 'AI Capabilities Model' identifying seven practices that amplify AI benefits. The core insight is that AI is an 'amplifier' - it magnifies existing organizational strengths AND weaknesses. Key stats: 89% of orgs prioritizing AI, 76% of devs using daily, but 39% have low trust. The trust research is critical: developers who trust AI more are more productive, but trust must be earned through organizational support (policies, training time, addressing concerns). The 451% adoption increase from acceptable-use policies is remarkable - clarity enables adoption.
- 89% of organizations prioritizing AI integration into applications
- 76% of technologists rely on AI for parts of their daily work
- 75% of developers report positive productivity impact from AI
What scope of tasks do you delegate to agent mode?
Sources 2
BMAD-METHOD: Breakthrough Method for Agile AI Driven Development
BMAD represents the multi-agent orchestration approach to AI development. Unlike simple chat-based AI assistance, BMAD uses specialized agents (Analyst, Architect, Developer, QA) coordinated by an orchestrator. Key innovation: zero context loss between tasks. Represents advanced maturity in agentic workflows.
- 19+ specialized AI agents with distinct roles (Analyst, Architect, Developer, QA)
- 50+ workflows covering full SDLC: requirements → architecture → implementation → testing
- Model routing by task complexity: Haiku for routine, Sonnet for analysis, Opus for novel problems
GitHub Copilot Agent Mode Documentation
GitHub Copilot's agent mode enables autonomous multi-file editing, allowing AI to plan and execute complex changes across a codebase without step-by-step human approval, pausing only at diff-review checkpoints before changes are applied. This capability requires careful supervision practices since agents can introduce cascading errors across multiple files. Critical for agentic_supervision dimension - assessing how organizations manage autonomous AI coding.
- Agent mode can edit multiple files autonomously across a codebase
- Requires explicit approval before applying changes (diff review checkpoint)
- Supports iterative refinement: review, request changes, re-generate
How do you review changes made by agent mode?
Sources 2
AI Code Review and the Best AI Code Review Tools in 2025
Comprehensive overview of AI code review tools and AI-reviewing-AI patterns. Key for agentic_supervision dimension - validates that AI reviewing AI is an emerging best practice.
- 84% of developers now using AI tools, 41% of code is AI-generated
- Leading AI review tools: CodeRabbit, Codacy Guardrails, Snyk DeepCode
- AI-to-AI review is an emerging pattern (AI reviews AI-generated code)
Multi-Agent Collaboration Best Practices
Official guidance on multi-agent patterns including debate (multiple models reviewing each other), TDD splits (one writes tests, another implements), and Architect/Implementer separation. Research shows diverse model debate (Claude + Gemini + GPT) achieves 91% on GSM-8K vs 82% for identical models. An orchestration sketch follows the list below.
- Separate Claude instances can communicate via shared scratchpads
- Multi-agent debate with diverse models outperforms single-model approaches
- Writer/Reviewer and TDD splits improve output quality
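A minimal orchestration sketch of the debate pattern; `call_model` is a stub to be wired to real provider clients (Claude, Gemini, GPT, ...), and the prompts and round count are illustrative.

```python
def call_model(model: str, prompt: str) -> str:
    # Stub by design: replace with a real API client per provider.
    raise NotImplementedError(f"wire {model!r} to a real API client")

def debate(question: str, models: list[str], rounds: int = 2) -> dict[str, str]:
    # Round 0: each model answers independently.
    answers = {m: call_model(m, question) for m in models}
    # Later rounds: each model critiques the others and revises.
    for _ in range(rounds):
        for m in models:
            others = "\n\n".join(f"{k}: {v}" for k, v in answers.items() if k != m)
            answers[m] = call_model(
                m,
                f"Question: {question}\n\nOther agents answered:\n{others}\n\n"
                "Critique their reasoning, then give your revised answer.",
            )
    return answers
```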
How often do you intervene during an agent mode session?
Sources 2
The 'Trust, But Verify' Pattern For AI-Assisted Engineering
This article provides the conceptual framework for our trust_calibration dimension. The three principles (Blind Trust is Vulnerability, Copilot Not Autopilot, Human Accountability Remains) directly inform our survey questions. The emphasis on verification over speed aligns with METR findings. Practical guidance includes starting conservatively with AI on low-stakes tasks.
- Blind trust in AI-generated code is a vulnerability
- AI tools function as 'Copilot, Not Autopilot'
- Human verification is the new development bottleneck
State of AI-Assisted Software Development 2025
The 2025 DORA report introduces the 'AI Capabilities Model' identifying seven practices that amplify AI benefits. The core insight is that AI is an 'amplifier' - it magnifies existing organizational strengths AND weaknesses. Key stats: 89% of orgs prioritizing AI, 76% of devs using daily, but 39% have low trust. The trust research is critical: developers who trust AI more are more productive, but trust must be earned through organizational support (policies, training time, addressing concerns). The 451% adoption increase from acceptable-use policies is remarkable - clarity enables adoption.
- 89% of organizations prioritizing AI integration into applications
- 76% of technologists rely on AI for parts of their daily work
- 75% of developers report positive productivity impact from AI
How often do you reject or rollback agent mode changes?
Sources 1
Research: Quantifying GitHub Copilot's impact in the enterprise with Accenture
This is the primary source for the 30% acceptance rate benchmark and the 88% code retention statistic. The 95% enjoyment and 90% fulfillment stats are powerful for adoption justification. The 84% increase in successful builds directly supports the claim that AI doesn't sacrifice quality for speed. Published May 2024, so represents mature Copilot usage patterns.
- 95% of developers said they enjoyed coding more with GitHub Copilot
- 90% of developers felt more fulfilled with their jobs when using GitHub Copilot
- Developers accepted around 30% of GitHub Copilot's suggestions
For complex tasks spanning multiple sessions, how do you maintain context?
Sources 2
Introducing Beads: A Coding Agent Memory System
Beads solves the 'context loss' problem in multi-session AI development. Rather than storing tasks in unstructured markdown, Beads uses Git-backed JSONL files that agents can query for 'ready' work. Key for long-horizon tasks spanning multiple days or sessions. Represents the frontier of AI workflow tooling for persistent memory.
- Git-backed issue tracker designed for AI coding agents
- Persistent session memory across restarts
- Dependency-aware task graph (DAG)
BMAD-METHOD: Breakthrough Method for Agile AI Driven Development
BMAD represents the multi-agent orchestration approach to AI development. Unlike simple chat-based AI assistance, BMAD uses specialized agents (Analyst, Architect, Developer, QA) coordinated by an orchestrator. Key innovation: zero context loss between tasks. Represents advanced maturity in agentic workflows.
- 19+ specialized AI agents with distinct roles (Analyst, Architect, Developer, QA)
- 50+ workflows covering full SDLC: requirements → architecture → implementation → testing
- Model routing by task complexity: Haiku for routine, Sonnet for analysis, Opus for novel problems
Do you review agent execution plans BEFORE letting agents make changes?
Sources 3
Devin 2.0: Performance Review and Enterprise Metrics
Devin 2.0 shows maturation of autonomous coding agents: 4x faster, 67% merge rate, enterprise adoption (Goldman Sachs, Nubank). The $20/month pricing democratizes access. The 'interactive planning' feature addresses human oversight concerns. Critical for understanding the enterprise autonomous coding landscape.
- Devin 2.0 released April 2025 with $20/month Core plan (down from $500)
- 4x faster problem solving, 2x more efficient resource consumption
- 67% PR merge rate (up from 34% in 2024)
Google Antigravity: Agent-First Development Platform
Google Antigravity represents the 'agent-first IDE' paradigm: agents work autonomously while humans supervise via Manager view. The Artifacts system addresses trust by making agent reasoning visible. Multi-model support (Gemini, Claude, GPT) shows the future is model-agnostic. Critical for agentic_supervision dimension.
- Announced November 18, 2025 alongside Gemini 3
- Agent-first IDE paradigm (vs AI-assisted coding)
- Two views: Editor view (traditional) and Manager view (agent orchestration)
Claude Code: Best practices for agentic coding
Official best practices for Claude Code agentic development. Key for agentic_supervision dimension - demonstrates multi-file autonomous editing capabilities and supervision approaches.
- Claude Code is a command line tool for agentic coding
- CLAUDE.md provides project-specific context and instructions
- Plan mode shows intentions before making changes
Do you apply 'Least Agency' security principles when using AI agents?
Sources 1
OWASP Top 10 for Agentic Applications 2026
The first OWASP security framework specifically for agentic AI systems. The 'Least Agency' principle is critical for our agentic_supervision dimension. Key risks (goal hijacking, tool misuse, rogue agents) directly inform supervision recommendations. Released December 2025, this is the authoritative security guide for AI coding agents.
- ASI01 - Agent Goal Hijack: Prompt injection manipulates agent goals
- ASI02 - Tool Misuse: Agents misuse legitimate tools for data exfiltration
- ASI03 - Identity & Privilege Abuse: Confused deputy and privilege escalation
How do you approach multi-agent workflows (multiple AI agents working in parallel)?
Sources 2
Cursor 2.0: Composer Model and Multi-Agent Architecture
Cursor 2.0 represents the shift to agent-first IDEs: purpose-built Composer model for low-latency coding, up to 8 parallel agents, native browser integration, sandboxed execution. The MoE architecture and RL training show vendor investment in specialized coding models. Critical for understanding the 'agentic IDE' paradigm.
- Cursor 2.0 released October 29, 2025
- Composer: first in-house large coding model, 4x faster than comparable models
- Multi-agent: up to 8 independent AI agents in parallel via git worktrees
Google Antigravity: Agent-First Development Platform
Google Antigravity represents the 'agent-first IDE' paradigm: agents work autonomously while humans supervise via Manager view. The Artifacts system addresses trust by making agent reasoning visible. Multi-model support (Gemini, Claude, GPT) shows the future is model-agnostic. Critical for agentic_supervision dimension.
- Announced November 18, 2025 alongside Gemini 3
- Agent-first IDE paradigm (vs AI-assisted coding)
- Two views: Editor view (traditional) and Manager view (agent orchestration)
Under what conditions do you use autonomous/YOLO modes (skip permission prompts)?
Sources 1
Taming Claude YOLO Mode with Safety Hooks
Practical guide to using YOLO mode safely via Claude Code's hook system. Rather than choosing between speed and safety, hooks intercept dangerous operations automatically. Key patterns: credential protection, destructive command blocking, Python-based filtering with regex. A hook sketch follows the list below.
- Pre-tool-use hooks intercept operations before execution (exit 0 allows, exit 2 blocks)
- Protect credential files: block .env access while permitting .env.sample templates
- Block destructive patterns: rm -rf /, git push --force, chmod 777
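A minimal hook sketch following the exit-code convention above (the hook reads the pending tool call as JSON on stdin; exit 0 allows, exit 2 blocks); the registration details and deny patterns are examples, not a complete safety policy.

```python
#!/usr/bin/env python3
import json
import re
import sys

DENY_PATTERNS = [
    r"\brm\s+-rf\s+/",          # destructive deletes at the root
    r"git\s+push\s+--force",    # force-pushes that rewrite history
    r"chmod\s+777",             # world-writable permissions
    r"\.env\b(?!\.sample)",     # block .env access, allow .env.sample
]

event = json.load(sys.stdin)                   # pending tool call
payload = json.dumps(event.get("tool_input", {}))

for pattern in DENY_PATTERNS:
    if re.search(pattern, payload):
        print(f"Blocked by safety hook: {pattern}", file=sys.stderr)
        sys.exit(2)  # exit 2 = block the operation
sys.exit(0)          # exit 0 = allow it
```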
Appropriate Non-Use & Managed Use
Recognition of when NOT to use AI assistance, or when to use specialized/security-tuned tools
The constraint is human review capacity, not AI capability. Opsera data shows regulated industries appropriately calibrate: insurance (50% acceptance), banking (55%), healthcare (60%) vs. startups (75%). High-stakes code requires calibrated processes - extra review time, expertise matching - not blanket prohibitions.
Are there specific types of code where you deliberately avoid or limit Copilot use?
Sources 2
October 2025 Update: GenAI Code Security Report
Primary source for AI code security statistics: 45% overall failure rate, 72% for Java specifically. The 'bigger models ≠ more secure code' finding is critical for model_routing - security scanning is needed regardless of model. Java's 72% rate makes it the riskiest language for AI-generated code.
- AI-generated code introduced risky security flaws in 45% of tests
- Java was the riskiest language with 72% security failure rate
- XSS (CWE-80) defense failed in 86% of relevant code samples
State of AI-Assisted Software Development 2025
The 2025 DORA report introduces the 'AI Capabilities Model' identifying seven practices that amplify AI benefits. The core insight is that AI is an 'amplifier' - it magnifies existing organizational strengths AND weaknesses. Key stats: 89% of orgs prioritizing AI, 76% of devs using daily, but 39% have low trust. The trust research is critical: developers who trust AI more are more productive, but trust must be earned through organizational support (policies, training time, addressing concerns). The 451% adoption increase from acceptable-use policies is remarkable - clarity enables adoption.
- 89% of organizations prioritizing AI integration into applications
- 76% of technologists rely on AI for parts of their daily work
- 75% of developers report positive productivity impact from AI
How does your team handle code review for high-stakes changes?
Sources 2
The 'Trust, But Verify' Pattern For AI-Assisted Engineering
This article provides the conceptual framework for our trust_calibration dimension. The three principles (Blind Trust is Vulnerability, Copilot Not Autopilot, Human Accountability Remains) directly inform our survey questions. The emphasis on verification over speed aligns with METR findings. Practical guidance includes starting conservatively with AI on low-stakes tasks.
- Blind trust in AI-generated code is a vulnerability
- AI tools function as 'Copilot, Not Autopilot'
- Human verification is the new development bottleneck
Is AI Creating a New Code Review Bottleneck for Senior Engineers?
Documents the emerging 'AI Productivity Paradox': AI increases output but creates review bottlenecks. The 91% increase in PR review times despite 21% more tasks completed shows the shifting bottleneck problem. Critical for organizational integration dimension.
- Teams with heavy AI use completed 21% more tasks but PR review times increased 91%
- 67% of developers spend more time debugging AI-generated code
- 68% note increased time spent on code reviews
If you work in healthcare or with PHI: Are you aware of the HIPAA implications?
Sources 1
Microsoft HIPAA/HITECH Compliance Documentation
GitHub doesn't sign BAAs for Copilot, meaning prompts containing actual PHI would create compliance exposure. However, the concern is data in prompts, not code logic - writing code that handles PHI (without embedding patient data in prompts) is generally safe. Note: Microsoft 365 Copilot (for Word/Excel/Teams) is a different product from GitHub Copilot (for code) and is not a substitute for code completion needs.
- GitHub doesn't sign BAAs for Copilot products
- Microsoft 365 Copilot (Word/Excel/Teams AI, NOT code completion) can be BAA-covered
- Compliance concern is PHI in prompts/context, not code that handles PHI
Do you work in a regulated industry, and if so, how mature is your AI compliance?
Sources 2
Regulation (EU) 2024/1689 - Artificial Intelligence Act
The EU AI Act establishes a comprehensive risk-based regulatory framework for AI systems, classifying them into prohibited, high-risk, and general-risk categories with varying compliance requirements. Enforcement begins in 2025, with organizations using AI coding tools needing to assess whether their implementations fall under high-risk categories. This regulation sets global precedent for AI governance and directly impacts how development teams can deploy AI-assisted development tools.
- Enforcement begins 2025
- Regulates AI systems by risk level
- Applies to AI coding tools used in EU
AI Act Implementation Timeline
EU AI Act enforcement began in 2025 with prohibited practices and GPAI rules. Full application by 2026. Critical for appropriate_nonuse and legal_compliance dimensions.
- EU AI Act entered into force August 1, 2024
- Prohibited AI practices effective February 2, 2025
- GPAI model obligations effective August 2, 2025
Do you work on safety-critical systems where AI code errors could cause physical harm?
Sources 2
October 2025 Update: GenAI Code Security Report
Primary source for AI code security statistics: 45% overall failure rate, 72% for Java specifically. The 'bigger models ≠ more secure code' finding is critical for model_routing - security scanning is needed regardless of model. Java's 72% rate makes it the riskiest language for AI-generated code.
- AI-generated code introduced risky security flaws in 45% of tests
- Java was the riskiest language with 72% security failure rate
- XSS (CWE-80) defense failed in 86% of relevant code samples
Regulation (EU) 2024/1689 - Artificial Intelligence Act
The EU AI Act establishes a comprehensive risk-based regulatory framework for AI systems, classifying them into prohibited, high-risk, and general-risk categories with varying compliance requirements. Enforcement begins in 2025, with organizations using AI coding tools needing to assess whether their implementations fall under high-risk categories. This regulation sets global precedent for AI governance and directly impacts how development teams can deploy AI-assisted development tools.
- Enforcement begins 2025
- Regulates AI systems by risk level
- Applies to AI coding tools used in EU
How well do you understand Copilot's limitations for your specific tech stack and work?
Sources 2
Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity
This is the most rigorous 2025 study on AI coding productivity. The RCT methodology (16 experienced developers, 246 tasks, $150/hr compensation) makes this highly credible. The 39-44 percentage point gap between perceived and actual productivity is the key insight for our trust_calibration dimension. This directly supports recommendations about not over-trusting AI suggestions and maintaining verification practices.
- Experienced developers were 19% slower with AI
- Developers perceived 20% speedup (39-44 percentage point gap)
- Self-reported productivity may not reflect reality
The 'Trust, But Verify' Pattern For AI-Assisted Engineering
This article provides the conceptual framework for our trust_calibration dimension. The three principles (Blind Trust is Vulnerability, Copilot Not Autopilot, Human Accountability Remains) directly inform our survey questions. The emphasis on verification over speed aligns with METR findings. Practical guidance includes starting conservatively with AI on low-stakes tasks.
- Blind trust in AI-generated code is a vulnerability
- AI tools function as 'Copilot, Not Autopilot'
- Human verification is the new development bottleneck
How do you ensure your coding skills don't atrophy from AI reliance?
Sources 2
AI Won't Kill Junior Devs - But Your Hiring Strategy Might
Addy Osmani reframes the junior developer AI debate from risk to opportunity. Key insight: AI accelerates careers for juniors who adapt by shifting from 'write code' to 'supervise AI code'. Teams with updated mentorship create accelerated apprenticeships. The real threat is hiring strategies, not AI itself.
- AI is a career accelerant for juniors who adapt
- Skill surface shifts from 'write code' to 'verify and supervise AI code'
- Updated mentorship: coach on integrating AI without over-dependence
The 'Trust, But Verify' Pattern For AI-Assisted Engineering
This article provides the conceptual framework for our trust_calibration dimension. The three principles (Blind Trust is Vulnerability, Copilot Not Autopilot, Human Accountability Remains) directly inform our survey questions. The emphasis on verification over speed aligns with METR findings. Practical guidance includes starting conservatively with AI on low-stakes tasks.
- Blind trust in AI-generated code is a vulnerability
- AI tools function as 'Copilot, Not Autopilot'
- Human verification is the new development bottleneck
Context Curation & Specification
Skill in providing effective context, specifications, and configuration for AI tools
With 1M+ token context windows (Gemini) and automatic chain-of-thought (Opus 4.5, o3), 'prompt engineering' has evolved into 'context curation'. The skill is no longer tricking the model - it's providing the right codebase context, creating rule files, and writing clear specifications. Context engineering is replacing prompt engineering as the control mechanism for production AI.
What scope of codebase context do you typically provide to AI tools?
Sources 2
GitHub Copilot Documentation - Supported AI Models
GitHub Copilot now supports multiple AI models from Anthropic, OpenAI, and Google, allowing developers to switch models per conversation based on task requirements. This multi-model capability is foundational for model routing maturity - understanding that different models excel at different tasks enables cost-effective and capability-optimized AI development workflows.
- Supports Claude 3.5/4 Sonnet, GPT-4o, o1, Gemini 2.0 Flash with per-conversation switching
- Different models have different strengths: Claude for nuanced reasoning, GPT for breadth, Gemini for speed
- Model selection persists per chat session, enabling task-appropriate routing
Cursor Documentation - Rules for AI
Cursor's Rules for AI feature enables project-level AI configuration via .cursorrules files, allowing teams to define coding standards, conventions, and context that the AI follows automatically. Similar to GitHub's copilot-instructions.md but for the Cursor IDE ecosystem. Key for context_curation dimension - demonstrates cross-tool pattern of configuration-driven AI behavior.
- .cursorrules defines project-level AI behavior: coding standards, language conventions, forbidden patterns
- Supports @file references to include additional context (e.g., @docs/architecture.md)
- Rules apply automatically to all AI interactions in the project directory
Do you use visual context (screenshots, diagrams, images) when working with AI tools?
Sources 1
Power User AI Coding Workflow Tips
Practical power user tips for AI coding workflows. Key insights: (1) paste screenshots instead of describing bugs in text, (2) create reusable slash commands for repeated workflows. Both patterns dramatically reduce friction in AI-assisted development.
- Paste screenshots liberally - models handle images extremely well
- Create custom slash commands for repeated workflows
- Screenshot error messages, UI mockups, architecture diagrams
How do you manage project-level AI configuration files?
Sources 3
Cursor Documentation - Rules for AI
Cursor's Rules for AI feature enables project-level AI configuration via .cursorrules files, allowing teams to define coding standards, conventions, and context that the AI follows automatically. Similar to GitHub's copilot-instructions.md but for the Cursor IDE ecosystem. Key for context_curation dimension - demonstrates cross-tool pattern of configuration-driven AI behavior.
- .cursorrules defines project-level AI behavior: coding standards, language conventions, forbidden patterns
- Supports @file references to include additional context (e.g., @docs/architecture.md)
- Rules apply automatically to all AI interactions in the project directory
How Many Instructions Can LLMs Follow?
Empirical research on LLM instruction-following limits. Key finding: even frontier models reliably follow only 150-200 instructions. Implications: keep instruction files focused, use hierarchical structure, include only genuinely useful guidance.
- Frontier LLMs can follow ~150-200 instructions with reasonable consistency
- Smaller models degrade much more quickly with instruction count
- Keep CLAUDE.md/instructions files under 500 lines for reliable adherence
Power User AI Coding Workflow Tips
Practical power user tips for AI coding workflows. Key insights: (1) paste screenshots instead of describing bugs in text, (2) create reusable slash commands for repeated workflows. Both patterns dramatically reduce friction in AI-assisted development.
- Paste screenshots liberally - models handle images extremely well
- Create custom slash commands for repeated workflows
- Screenshot error messages, UI mockups, architecture diagrams
What level of specification do you provide when asking AI to implement features?
Sources 2
BMAD-METHOD: Breakthrough Method for Agile AI Driven Development
BMAD represents the multi-agent orchestration approach to AI development. Unlike simple chat-based AI assistance, BMAD uses specialized agents (Analyst, Architect, Developer, QA) coordinated by an orchestrator. Key innovation: zero context loss between tasks. Represents advanced maturity in agentic workflows.
- 19+ specialized AI agents with distinct roles (Analyst, Architect, Developer, QA)
- 50+ workflows covering full SDLC: requirements → architecture → implementation → testing
- Model routing by task complexity: Haiku for routine, Sonnet for analysis, Opus for novel problems
Spec-Driven Development with AI: Get Started with a New Open Source Toolkit
GitHub Spec Kit formalizes the spec-driven development approach where detailed specifications precede AI code generation. The four-phase workflow (Specify → Plan → Tasks → Implement) ensures human oversight at each checkpoint. This is the antidote to 'vibe coding' - structured, auditable AI development. Key for assessing advanced workflow maturity.
- Four-phase workflow: Specify, Plan, Tasks, Implement
- Specifications become executable artifacts
- Supports GitHub Copilot, Claude Code, Gemini CLI
When AI output doesn't meet your needs, what do you typically do?
Sources 2
The 'Trust, But Verify' Pattern For AI-Assisted Engineering
This article provides the conceptual framework for our trust_calibration dimension. The three principles (Blind Trust is Vulnerability, Copilot Not Autopilot, Human Accountability Remains) directly inform our survey questions. The emphasis on verification over speed aligns with METR findings. Practical guidance includes starting conservatively with AI on low-stakes tasks.
- Blind trust in AI-generated code is a vulnerability
- AI tools function as 'Copilot, Not Autopilot'
- Human verification is the new development bottleneck
State of AI Code Quality 2025
This is the most comprehensive 2025 survey on AI code quality (609 developers). The key insight is the 'Confidence Flywheel' - context-rich suggestions reduce hallucinations, which improves quality, which builds trust. The finding that 80% of PRs don't receive human review when AI tools are enabled is critical for our agentic_supervision dimension. NOTE: The previously cited 1.7x issue rate and 41% commit stats were not found in the current report.
- 82% of developers use AI coding tools daily or weekly
- 65% of developers say at least a quarter of each commit is AI-generated
- 59% say AI has improved code quality
For complex tasks, how do you structure your AI interactions?
Sources 2
BMAD-METHOD: Breakthrough Method for Agile AI Driven Development
BMAD represents the multi-agent orchestration approach to AI development. Unlike simple chat-based AI assistance, BMAD uses specialized agents (Analyst, Architect, Developer, QA) coordinated by an orchestrator. Key innovation: zero context loss between tasks. Represents advanced maturity in agentic workflows.
- 19+ specialized AI agents with distinct roles (Analyst, Architect, Developer, QA)
- 50+ workflows covering full SDLC: requirements → architecture → implementation → testing
- Model routing by task complexity: Haiku for routine, Sonnet for analysis, Opus for novel problems
Introducing Beads: A Coding Agent Memory System
Beads solves the 'context loss' problem in multi-session AI development. Rather than storing tasks in unstructured markdown, Beads uses Git-backed JSONL files that agents can query for 'ready' work. Key for long-horizon tasks spanning multiple days or sessions. Represents the frontier of AI workflow tooling for persistent memory.
- Git-backed issue tracker designed for AI coding agents
- Persistent session memory across restarts
- Dependency-aware task graph (DAG)
How do you manage AI context during long or complex coding sessions?
Sources 2
How I Use Every Claude Code Feature
Introduces the Document & Clear pattern for managing AI context. Key insight: rather than trusting automatic context compaction, explicitly clear context and persist important state to external files. This produces more reliable outputs and gives you control over what the AI 'remembers'.
- Document & Clear method: dump plan to .md file, /clear, restart with file reference
- Auto-compaction is opaque, error-prone, and not well-optimized - avoid /compact
- Use /clear + /catchup for simple reboots (custom command reads uncommitted changes)
Advanced Context Engineering for Coding Agents
Introduces the Research → Plan → Implement workflow for AI-assisted development. Key insight: reviewing plans is higher leverage than reviewing code. The workflow explicitly clears context between phases to maintain focus and quality.
- Research → Plan → Implement workflow produces better results than direct coding
- Keep context utilization in the 40-60% range depending on complexity
- A bad line of a plan could lead to hundreds of bad lines of code - review plans first
For complex, unfamiliar tasks, do you run exploratory AI attempts first?
Sources 1
The 7 Prompting Habits of Highly Effective Engineers
Comprehensive framework for AI-assisted development with 7 named habits. Key insight: 'failure is cheap' with agents - use scouts, iterate fast, go concurrent. The scout pattern (habit #3) is foundational: run throwaway attempts to discover complexity before committing.
- 1. Draw the Owl: Make an example change, ask agent to finish the job
- 2. Use a checklist: Break work into checklist, have agent work through it
- 3. Send out a scout: Run throwaway attempt to find where complexity lies
What workflow do you use for complex AI-assisted development tasks?
Sources 1
Advanced Context Engineering for Coding Agents
Introduces the Research → Plan → Implement workflow for AI-assisted development. Key insight: reviewing plans is higher leverage than reviewing code. The workflow explicitly clears context between phases to maintain focus and quality.
- Research → Plan → Implement workflow produces better results than direct coding
- Keep context utilization in the 40-60% range depending on complexity
- A bad line of a plan could lead to hundreds of bad lines of code - review plans first
How do you use AI research capabilities (web search, citations) during planning phases?
Sources 1
Advanced Context Engineering for Coding Agents
Introduces the Research → Plan → Implement workflow for AI-assisted development. Key insight: reviewing plans is higher leverage than reviewing code. The workflow explicitly clears context between phases to maintain focus and quality.
- Research → Plan → Implement workflow produces better results than direct coding
- Keep context utilization in the 40-60% range depending on complexity
- A bad line of a plan could lead to hundreds of bad lines of code - review plans first
Legacy System & Specialized Use Cases
Effective use of AI for legacy code modernization, COBOL migration, and complex codebase understanding
Legacy systems remain critical enterprise infrastructure. AI tools like GitHub Copilot and Claude Code are increasingly used to comprehend, document, and modernize legacy code - significantly shortening project timelines.
Do you work with legacy codebases (COBOL, older Java, mainframe systems)?
Sources 1
State of AI-Assisted Software Development 2025
The 2025 DORA report introduces the 'AI Capabilities Model' identifying seven practices that amplify AI benefits. The core insight is that AI is an 'amplifier' - it magnifies existing organizational strengths AND weaknesses. Key stats: 89% of orgs prioritizing AI, 76% of devs using daily, but 39% have low trust. The trust research is critical: developers who trust AI more are more productive, but trust must be earned through organizational support (policies, training time, addressing concerns). The 451% adoption increase from acceptable-use policies is remarkable - clarity enables adoption.
- 89% of organizations prioritizing AI integration into applications
- 76% of technologists rely on AI for parts of their daily work
- 75% of developers report positive productivity impact from AI
How do you use AI tools for legacy code?
Sources 2
State of AI-Assisted Software Development 2025
The 2025 DORA report introduces the 'AI Capabilities Model' identifying seven practices that amplify AI benefits. The core insight is that AI is an 'amplifier' - it magnifies existing organizational strengths AND weaknesses. Key stats: 89% of orgs prioritizing AI, 76% of devs using daily, but 39% have low trust. The trust research is critical: developers who trust AI more are more productive, but trust must be earned through organizational support (policies, training time, addressing concerns). The 451% adoption increase from acceptable-use policies is remarkable - clarity enables adoption.
- 89% of organizations prioritizing AI integration into applications
- 76% of technologists rely on AI for parts of their daily work
- 75% of developers report positive productivity impact from AI
GitHub Copilot Documentation - Supported AI Models
GitHub Copilot now supports multiple AI models from Anthropic, OpenAI, and Google, allowing developers to switch models per conversation based on task requirements. This multi-model capability is foundational for model routing maturity - understanding that different models excel at different tasks enables cost-effective and capability-optimized AI development workflows.
- Supports Claude 3.5/4 Sonnet, GPT-4o, o1, Gemini 2.0 Flash with per-conversation switching
- Different models have different strengths: Claude for nuanced reasoning, GPT for breadth, Gemini for speed
- Model selection persists per chat session, enabling task-appropriate routing
For AI-assisted legacy migration, what verification do you apply?
Sources 1
Research: Quantifying GitHub Copilot's impact in the enterprise with Accenture
This is the primary source for the 30% acceptance rate benchmark and the 88% code retention statistic. The 95% enjoyment and 90% fulfillment stats are powerful for adoption justification. The 84% increase in successful builds directly supports the claim that AI doesn't sacrifice quality for speed. Published May 2024, so represents mature Copilot usage patterns.
- 95% of developers said they enjoyed coding more with GitHub Copilot
- 90% of developers felt more fulfilled with their jobs when using GitHub Copilot
- Developers accepted around 30% of GitHub Copilot's suggestions
How do you use AI to document legacy code?
Sources 2
Long context | Gemini API Documentation
Gemini's 1M token context window is among the largest available, enabling whole-codebase understanding. Key for context_curation and model_routing dimensions.
- Gemini models support up to 1M token context window (1,048,576 tokens)
- Can process hours of video, audio, and 60,000+ lines of code in single context
- Gemini 2.5 Pro, 2.5 Flash, 3.0 Pro, 3.0 Flash all support 1M tokens
Research: Quantifying GitHub Copilot's impact in the enterprise with Accenture
This is the primary source for the 30% acceptance rate benchmark and the 88% code retention statistic. The 95% enjoyment and 90% fulfillment stats are powerful for adoption justification. The 84% increase in successful builds directly supports the claim that AI doesn't sacrifice quality for speed. Published May 2024, so represents mature Copilot usage patterns.
- 95% of developers said they enjoyed coding more with GitHub Copilot
- 90% of developers felt more fulfilled with their jobs when using GitHub Copilot
- Developers accepted around 30% of GitHub Copilot's suggestions
Legal & IP Compliance
Awareness and management of legal risks including copyright, licensing, and IP ownership for AI-generated code
35% of AI code samples have licensing irregularities (Software Freedom Conservancy). GitHub Copilot litigation is ongoing. The US Copyright Office released pre-publication guidance on generative AI training and fair use in May 2025. Unmanaged legal risk can result in lawsuits and IP contamination.
Are you aware of the legal/IP risks of using AI-generated code?
Sources 3
Doe v. GitHub, Inc. - GitHub Copilot Class Action Lawsuit
Doe v. GitHub is a class action lawsuit alleging GitHub Copilot violated the DMCA and open-source licenses by training on publicly available code without proper attribution or consent. Most copyright claims were dismissed in May 2023, but breach of contract and open-source license violation claims survive, with an interlocutory appeal filed to the Ninth Circuit in October 2024. Outcome will establish legal precedent for how AI systems can use open-source materials.
- Ongoing litigation regarding Copilot training data
- Claims of copyright infringement
Copyright and Artificial Intelligence Study
This is an ongoing multi-part study by the Copyright Office. Part 2 (Copyrightability) is most relevant - it addresses whether AI-generated outputs can receive copyright protection. Part 3 (Generative AI Training) addresses fair use of copyrighted materials in training. The study is still evolving, so conclusions should be referenced with caution as 'current guidance' rather than final rulings.
- Part 1 (July 2024): Digital Replicas
- Part 2 (January 2025): Copyrightability of AI outputs
- Part 3 (May 2025 pre-publication): Generative AI Training
Thomson Reuters Enterprise Centre GmbH v. Ross Intelligence Inc.
In Thomson Reuters v. Ross Intelligence (2025), a US court ruled that training AI systems on copyrighted content (Westlaw's legal headnotes) does not qualify as fair use - the first judicial rejection of fair use defense in an AI training context. The ruling establishes that embedding copyrighted works into training data cannot be justified as fair use merely because AI outputs don't expose the original material. Critical precedent for legal_compliance dimension.
- Set precedent for AI training on copyrighted content
Does your organization have legal policies for AI-generated code?
Sources 2
State of AI-Assisted Software Development 2025
The 2025 DORA report introduces the 'AI Capabilities Model' identifying seven practices that amplify AI benefits. The core insight is that AI is an 'amplifier' - it magnifies existing organizational strengths AND weaknesses. Key stats: 89% of orgs prioritizing AI, 76% of devs using daily, but 39% have low trust. The trust research is critical: developers who trust AI more are more productive, but trust must be earned through organizational support (policies, training time, addressing concerns). The 451% adoption increase from acceptable-use policies is remarkable - clarity enables adoption.
- 89% of organizations prioritizing AI integration into applications
- 76% of technologists rely on AI for parts of their daily work
- 75% of developers report positive productivity impact from AI
GitHub Copilot IP Indemnification
GitHub Copilot Business and Enterprise tiers provide IP indemnification coverage up to $500,000 against copyright infringement claims for unmodified suggestions when public code filtering is enabled, while Individual and Pro tiers lack this protection. For enterprises evaluating AI adoption, tier-based indemnification represents a critical liability control mechanism. Key for legal_compliance dimension.
- GitHub Copilot Business/Enterprise includes IP indemnification
- Individual/free tier does not include indemnification
Do you scan AI-generated code for license contamination?
Sources 1
SFC Comments to US Copyright Office on Generative AI and Copyleft
The Software Freedom Conservancy's audit found that 35% of AI code samples have licensing irregularities, raising significant copyleft compliance concerns for organizations using AI coding tools. This is critical context for legal_compliance dimension - it quantifies the actual license compliance risk in AI-generated code.
- 35% of AI code samples have licensing irregularities
Does your AI coding tool vendor provide IP indemnification?
Sources 1
GitHub Copilot IP Indemnification
GitHub Copilot Business and Enterprise tiers provide IP indemnification coverage up to $500,000 against copyright infringement claims for unmodified suggestions when public code filtering is enabled, while Individual and Pro tiers lack this protection. For enterprises evaluating AI adoption, tier-based indemnification represents a critical liability control mechanism. Key for legal_compliance dimension.
- GitHub Copilot Business/Enterprise includes IP indemnification
- Individual/free tier does not include indemnification
Do you document human involvement in AI-assisted code for copyright purposes?
Sources 1
Copyright and Artificial Intelligence Study
This is an ongoing multi-part study by the Copyright Office. Part 2 (Copyrightability) is most relevant - it addresses whether AI-generated outputs can receive copyright protection. Part 3 (Generative AI Training) addresses fair use of copyrighted materials in training. The study is still evolving, so conclusions should be referenced with caution as 'current guidance' rather than final rulings.
- Part 1 (July 2024): Digital Replicas
- Part 2 (January 2025): Copyrightability of AI outputs
- Part 3 (May 2025 pre-publication): Generative AI Training
Model Selection & Routing
Skill in selecting the right AI model for each task type
GitHub Copilot now supports Claude (Opus/Sonnet), GPT-4, and Gemini. Cursor and Windsurf offer even more options. Mature users know that different models excel at different tasks, and the 5x cost difference between Opus and Haiku matters. IDC predicts 70% of top enterprises will use dynamic model routing by 2028.
Are you aware that GitHub Copilot supports multiple AI models (Claude, GPT, Gemini)?
Sources 1
GitHub Copilot Multi-Model Support
GitHub Copilot's multi-model support enables developers to choose the best model for each task. Key for model_routing dimension.
- GitHub Copilot supports multiple AI models from Anthropic, Google, and OpenAI
- GitHub CEO: 'The era of a single model is over'
- Developers can toggle between models during conversation
How do you select AI models for different coding tasks?
Sources 3
Developers are choosing older AI models — and the data explain why
Data-driven analysis showing that in production environments, developers are diversifying model usage rather than consolidating around the newest option. Sonnet 4.5 excels at multi-file reasoning but introduces latency; Sonnet 4.0 is faster and more consistent for structured tasks; GPT-5 excels at explanatory contexts. This supports the need for model routing strategies rather than single-model approaches.
- Model adoption is diversifying, not consolidating around one 'best' model
- Developers match models to specific task profiles rather than always using newest
- Sonnet 4.5 share dropped from 66% to 52% while Sonnet 4.0 rose from 23% to 37%
Anthropic API Pricing
Claude's tiered pricing shows a 5x cost difference between Opus ($5/$25) and Haiku ($1/$5) per million tokens, with Batch API offering 50% discounts and prompt caching up to 90% savings. Understanding these tiers is fundamental to cost-aware model routing - developers must evaluate whether tasks require Opus's advanced reasoning or if Haiku's speed and efficiency suffices. Key for model_routing dimension.
- Claude Opus 4.5: $5/$25 per million tokens (66% price drop from Opus 4.1)
- Claude Sonnet 4.5: $3/$15 per million tokens
- Claude Haiku 4.5: $1/$5 per million tokens
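A worked example of what these tiers imply at volume, using the listed prices; the monthly token volumes are illustrative assumptions.

```python
PRICES = {  # USD per million tokens: (input, output), from the list above
    "opus-4.5":   (5.00, 25.00),
    "sonnet-4.5": (3.00, 15.00),
    "haiku-4.5":  (1.00, 5.00),
}

def monthly_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    inp, out = PRICES[model]
    return (input_tokens * inp + output_tokens * out) / 1_000_000

for model in PRICES:  # assume 200M input / 20M output tokens per month
    print(f"{model}: ${monthly_cost(model, 200_000_000, 20_000_000):,.2f}")
# -> opus-4.5: $1,500.00, sonnet-4.5: $900.00, haiku-4.5: $300.00 (the 5x gap)
```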
Why the Future of AI Lies in Model Routing
IDC predicts model routing will become standard in enterprise AI. Different models excel at different tasks, and using a single model for everything means suboptimal results. Heavy users have the most to gain from routing. A routing sketch follows the list below.
- By 2028, 70% of top AI-driven enterprises will use multi-tool architectures with dynamic model routing
- AI models work best when somewhat specialized for targeted use cases
- Even SOTA models are delivered as mixtures of experts with routing
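To make routing concrete, here is a minimal sketch of cost-aware model selection. The model names and per-million-token prices match the Anthropic pricing cited above; the task-to-tier rules are illustrative assumptions, not recommendations from any of these sources.

```python
# Minimal sketch of cost-aware model routing. Prices are from the
# Anthropic pricing cited above; the routing rules are hypothetical.
from dataclasses import dataclass

@dataclass
class Model:
    name: str
    input_per_mtok: float   # USD per million input tokens
    output_per_mtok: float  # USD per million output tokens

OPUS = Model("claude-opus-4.5", 5.0, 25.0)
SONNET = Model("claude-sonnet-4.5", 3.0, 15.0)
HAIKU = Model("claude-haiku-4.5", 1.0, 5.0)

def route(task_type: str) -> Model:
    """Map a task profile to a model tier (illustrative rules only)."""
    if task_type in {"multi_file_refactor", "architecture_review"}:
        return OPUS      # complex reasoning justifies the premium tier
    if task_type in {"feature_implementation", "debugging"}:
        return SONNET    # balanced cost/capability default
    return HAIKU         # boilerplate, formatting, simple edits

print(route("debugging").name)  # claude-sonnet-4.5
```

The point is not these particular rules but that the mapping is explicit and reviewable, rather than "always use the newest model".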
Which of the following model-task pairings do you use?
Sources 4
Claude Opus 4.5
Claude Opus 4.5 sets a new bar for AI coding with 80.9% SWE-bench Verified. Key for model_routing dimension - represents current state-of-the-art for complex coding tasks.
- 80.9% on SWE-bench Verified - first AI model over 80%
- 66.3% on OSWorld (best computer-using model)
- 89.4% on Aider Polyglot Coding benchmark
Long context | Gemini API Documentation
Gemini's 1M token context window is among the largest available, enabling whole-codebase understanding; a rough sizing sketch follows this source list. Key for context_curation and model_routing dimensions.
- Gemini models support up to 1M token context window (1,048,576 tokens)
- Can process hours of video, audio, and 60,000+ lines of code in single context
- Gemini 2.5 Pro, 2.5 Flash, 3.0 Pro, 3.0 Flash all support 1M tokens
Gemini 2.5 and 3: Thinking Models with Deep Think
Gemini 2.5 Pro (March 2025) introduced 'thinking models' with 1M context. Deep Think mode extends inference time for complex reasoning tasks, achieving Bronze IMO performance. Gemini 3 Pro, announced November 2025, replaces 2.5 as the flagship. Critical for understanding the 'reasoning model' paradigm shift and extended thinking capabilities.
- Gemini 2.5 Pro released March 2025 with 1M token context window
- Deep Think mode uses extended inference time for complex reasoning
- Bronze-level performance on 2025 IMO benchmark
OpenAI o3, o4-mini, GPT-5-Codex, and Codex Platform
OpenAI's 2025 agentic coding stack evolved rapidly: o3 (April 2025, 71.7% SWE-bench) → codex-1 (72.1%/83.8%) → GPT-5-Codex (September 2025, 74.5%). The Codex platform represents OpenAI's vision for cloud-based multi-agent software engineering. Note: o3/o4-mini benchmarks are superseded by GPT-5-Codex for current capabilities.
- o3 released April 16, 2025 with 71.7% on SWE-bench Verified
- o4-mini released as successor to o3-mini
- Codex platform: cloud-based multi-agent software engineering
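As a rough companion to the Gemini long-context figures above, this sketch estimates whether a codebase fits in a 1M-token window. The ~4 characters-per-token ratio is a common heuristic, not the model's actual tokenizer, and the `./src` path and file suffixes are placeholders.

```python
# Rough check of whether a codebase fits in a 1M-token context window.
# Assumes ~4 characters per token; real counts depend on the tokenizer.
from pathlib import Path

CONTEXT_LIMIT = 1_048_576  # tokens, per the Gemini docs cited above
CHARS_PER_TOKEN = 4        # back-of-envelope heuristic, not exact

def estimate_tokens(root: str, suffixes=(".py", ".ts", ".java")) -> int:
    chars = sum(
        len(p.read_text(errors="ignore"))
        for p in Path(root).rglob("*")
        if p.is_file() and p.suffix in suffixes
    )
    return chars // CHARS_PER_TOKEN

tokens = estimate_tokens("./src")
print(f"~{tokens:,} tokens; fits in 1M window: {tokens < CONTEXT_LIMIT}")
```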
Are you aware of the cost differences between AI models?
Sources 1
Anthropic API Pricing
Claude's tiered pricing shows a 5x cost difference between Opus ($5/$25) and Haiku ($1/$5) per million tokens, with Batch API offering 50% discounts and prompt caching up to 90% savings. Understanding these tiers is fundamental to cost-aware model routing - developers must evaluate whether tasks require Opus's advanced reasoning or if Haiku's speed and efficiency suffices. Key for model_routing dimension.
- Claude Opus 4.5: $5/$25 per million tokens (66% price drop from Opus 4.1)
- Claude Sonnet 4.5: $3/$15 per million tokens
- Claude Haiku 4.5: $1/$5 per million tokens
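A quick worked example of what the tier spread means in practice, assuming a hypothetical daily volume of 2M input and 0.5M output tokens:

```python
# Back-of-envelope cost comparison for the pricing cited above.
# The 2M input / 0.5M output daily volume is an assumed workload.
PRICES = {  # USD per million tokens: (input, output)
    "opus-4.5": (5.0, 25.0),
    "sonnet-4.5": (3.0, 15.0),
    "haiku-4.5": (1.0, 5.0),
}

IN_MTOK, OUT_MTOK = 2.0, 0.5  # assumed daily volume in millions of tokens

for model, (p_in, p_out) in PRICES.items():
    cost = IN_MTOK * p_in + OUT_MTOK * p_out
    print(f"{model}: ${cost:.2f}/day")
# opus-4.5: $22.50/day, sonnet-4.5: $13.50/day, haiku-4.5: $4.50/day
# The same workload is 5x cheaper on Haiku than on Opus.
```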
When selecting a model for a task, do you consider speed/latency tradeoffs?
Sources 1
Developers are choosing older AI models — and the data explain why
Data-driven analysis showing that in production environments, developers are diversifying model usage rather than consolidating around the newest option. Sonnet 4.5 excels at multi-file reasoning but introduces latency; Sonnet 4.0 is faster and more consistent for structured tasks; GPT-5 excels at explanatory contexts. This supports the need for model routing strategies rather than single-model approaches.
- Model adoption is diversifying, not consolidating around one 'best' model
- Developers match models to specific task profiles rather than always using newest
- Sonnet 4.5 share dropped from 66% to 52% while Sonnet 4.0 rose from 23% to 37%
Organizational Integration
Team and organizational practices for AI-assisted development, including agentic workflows
Individual skill without team practices leads to inconsistent quality. DORA 2025: AI amplifies existing capabilities—strong teams get stronger, struggling teams get worse. Microsoft research via GitHub: It takes 11 weeks for developers to fully realize productivity gains from AI tools—plan for this ramp-up when calculating ROI.
Does your team have guidelines for when/how to use Copilot?
Sources 2
State of AI-Assisted Software Development 2025
The 2025 DORA report introduces the 'AI Capabilities Model' identifying seven practices that amplify AI benefits. The core insight is that AI is an 'amplifier' - it magnifies existing organizational strengths AND weaknesses. Key stats: 89% of orgs prioritizing AI, 76% of devs using daily, but 39% have low trust. The trust research is critical: developers who trust AI more are more productive, but trust must be earned through organizational support (policies, training time, addressing concerns). The 451% adoption increase from acceptable-use policies is remarkable - clarity enables adoption.
- 89% of organizations prioritizing AI integration into applications
- 76% of technologists rely on AI for parts of their daily work
- 75% of developers report positive productivity impact from AI
State of AI Code Quality 2025
This is the most comprehensive 2025 survey on AI code quality (609 developers). The key insight is the 'Confidence Flywheel' - context-rich suggestions reduce hallucinations, which improves quality, which builds trust. The finding that 80% of PRs don't receive human review when AI tools are enabled is critical for our agentic_supervision dimension. NOTE: The previously cited 1.7x issue rate and 41% commit stats were not found in the current report.
- 82% of developers use AI coding tools daily or weekly
- 65% of developers say at least a quarter of each commit is AI-generated
- 59% say AI has improved code quality
How does your team handle code review for AI-generated code?
Sources 2
State of AI Code Quality 2025
This is the most comprehensive 2025 survey on AI code quality (609 developers). The key insight is the 'Confidence Flywheel' - context-rich suggestions reduce hallucinations, which improves quality, which builds trust. The finding that 80% of PRs don't receive human review when AI tools are enabled is critical for our agentic_supervision dimension. NOTE: The previously cited 1.7x issue rate and 41% commit stats were not found in the current report.
- 82% of developers use AI coding tools daily or weekly
- 65% of developers say at least a quarter of each commit is AI-generated
- 59% say AI has improved code quality
October 2025 Update: GenAI Code Security Report
Primary source for AI code security statistics: 45% overall failure rate, 72% for Java specifically. The 'bigger models ≠ more secure code' finding is critical for model_routing - security scanning is needed regardless of model. Java's 72% rate makes it the riskiest language for AI-generated code.
- AI-generated code introduced risky security flaws in 45% of tests
- Java was the riskiest language with 72% security failure rate
- XSS (CWE-80) defense failed in 86% of relevant code samples
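Given the 45% security failure rate above, some teams gate AI-assisted changes on mandatory human review. A minimal CI sketch, assuming a team convention (hypothetical, not a GitHub feature) of adding an 'AI-Assisted: yes' line to such commit messages:

```python
# CI gate: block merge until AI-assisted commits get human sign-off.
# Assumes a hypothetical team convention of tagging commits with
# "AI-Assisted: yes" in the commit message body.
import subprocess
import sys

def ai_assisted_commits(base: str = "origin/main") -> list[str]:
    log = subprocess.run(
        ["git", "log", f"{base}..HEAD", "--format=%H %B%x00"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [
        entry.split()[0]                 # first token is the commit SHA
        for entry in log.split("\x00")
        if "AI-Assisted: yes" in entry
    ]

flagged = ai_assisted_commits()
if flagged:
    print(f"{len(flagged)} AI-assisted commit(s) require human review:")
    for sha in flagged:
        print(f"  {sha}")
    sys.exit(1)  # non-zero exit fails the CI check
```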
Does your organization have policies for agent mode / autonomous AI coding?
Sources 2
GitHub Copilot Agent Mode Documentation
GitHub Copilot's agent mode enables autonomous multi-file editing, allowing AI to plan and execute complex changes across a codebase without step-by-step human approval. This capability requires careful supervision practices since agents can introduce cascading errors across multiple files. Critical for agentic_supervision dimension - assessing how organizations manage autonomous AI coding.
- Agent mode can edit multiple files autonomously across a codebase
- Requires explicit approval before applying changes (diff review checkpoint)
- Supports iterative refinement: review, request changes, re-generate
State of AI-Assisted Software Development 2025
The 2025 DORA report introduces the 'AI Capabilities Model' identifying seven practices that amplify AI benefits. The core insight is that AI is an 'amplifier' - it magnifies existing organizational strengths AND weaknesses. Key stats: 89% of orgs prioritizing AI, 76% of devs using daily, but 39% have low trust. The trust research is critical: developers who trust AI more are more productive, but trust must be earned through organizational support (policies, training time, addressing concerns). The 451% adoption increase from acceptable-use policies is remarkable - clarity enables adoption.
- 89% of organizations prioritizing AI integration into applications
- 76% of technologists rely on AI for parts of their daily work
- 75% of developers report positive productivity impact from AI
What AI coding tool training have you received?
Sources 2
State of AI-Assisted Software Development 2025
The 2025 DORA report introduces the 'AI Capabilities Model' identifying seven practices that amplify AI benefits. The core insight is that AI is an 'amplifier' - it magnifies existing organizational strengths AND weaknesses. Key stats: 89% of orgs prioritizing AI, 76% of devs using daily, but 39% have low trust. The trust research is critical: developers who trust AI more are more productive, but trust must be earned through organizational support (policies, training time, addressing concerns). The 451% adoption increase from acceptable-use policies is remarkable - clarity enables adoption.
- 89% of organizations prioritizing AI integration into applications
- 76% of technologists rely on AI for parts of their daily work
- 75% of developers report positive productivity impact from AI
State of AI Code Quality 2025
This is the most comprehensive 2025 survey on AI code quality (609 developers). The key insight is the 'Confidence Flywheel' - context-rich suggestions reduce hallucinations, which improves quality, which builds trust. The finding that 80% of PRs don't receive human review when AI tools are enabled is critical for our agentic_supervision dimension. NOTE: The previously cited 1.7x issue rate and 41% commit stats were not found in the current report.
- 82% of developers use AI coding tools daily or weekly
- 65% of developers say at least a quarter of each commit is AI-generated
- 59% say AI has improved code quality
How does your team share AI coding tips, tricks, and learnings?
Sources 2
State of AI Code Quality 2025
This is the most comprehensive 2025 survey on AI code quality (609 developers). The key insight is the 'Confidence Flywheel' - context-rich suggestions reduce hallucinations, which improves quality, which builds trust. The finding that 80% of PRs don't receive human review when AI tools are enabled is critical for our agentic_supervision dimension. NOTE: The previously cited 1.7x issue rate and 41% commit stats were not found in the current report.
- 82% of developers use AI coding tools daily or weekly
- 65% of developers say at least a quarter of each commit is AI-generated
- 59% say AI has improved code quality
State of AI-Assisted Software Development 2025
The 2025 DORA report introduces the 'AI Capabilities Model' identifying seven practices that amplify AI benefits. The core insight is that AI is an 'amplifier' - it magnifies existing organizational strengths AND weaknesses. Key stats: 89% of orgs prioritizing AI, 76% of devs using daily, but 39% have low trust. The trust research is critical: developers who trust AI more are more productive, but trust must be earned through organizational support (policies, training time, addressing concerns). The 451% adoption increase from acceptable-use policies is remarkable - clarity enables adoption.
- 89% of organizations prioritizing AI integration into applications
- 76% of technologists rely on AI for parts of their daily work
- 75% of developers report positive productivity impact from AI
Does AI-generated code have different testing requirements?
Sources 1
October 2025 Update: GenAI Code Security Report
Primary source for AI code security statistics: 45% overall failure rate, 72% for Java specifically. The 'bigger models ≠ more secure code' finding is critical for model_routing - security scanning is needed regardless of model. Java's 72% rate makes it the riskiest language for AI-generated code.
- AI-generated code introduced risky security flaws in 45% of tests
- Java was the riskiest language with 72% security failure rate
- XSS (CWE-80) defense failed in 86% of relevant code samples
Does your organization track AI coding tool effectiveness?
Sources 2
State of AI-Assisted Software Development 2025
The 2025 DORA report introduces the 'AI Capabilities Model' identifying seven practices that amplify AI benefits. The core insight is that AI is an 'amplifier' - it magnifies existing organizational strengths AND weaknesses. Key stats: 89% of orgs prioritizing AI, 76% of devs using daily, but 39% have low trust. The trust research is critical: developers who trust AI more are more productive, but trust must be earned through organizational support (policies, training time, addressing concerns). The 451% adoption increase from acceptable-use policies is remarkable - clarity enables adoption.
- 89% of organizations prioritizing AI integration into applications
- 76% of technologists rely on AI for parts of their daily work
- 75% of developers report positive productivity impact from AI
Cognition-Windsurf Acquisition and Consolidation
The July 2025 Cognition-Windsurf deal illustrates rapid AI coding market consolidation. The bidding war (OpenAI $3B, Google $2.4B acqui-hire, Cognition acquisition) shows the strategic value of AI coding tools. Cognition's $10.2B valuation post-merger signals enterprise confidence in agentic coding.
- Cognition acquired Windsurf July 2025 after Google hired CEO in $2.4B deal
- OpenAI's $3B Windsurf offer expired just hours before Google deal
- Acquisition included IP, trademark, product, and $82M ARR
Does your organization provide dedicated time for AI tool learning?
Sources 1
State of AI-Assisted Software Development 2025
The 2025 DORA report introduces the 'AI Capabilities Model' identifying seven practices that amplify AI benefits. The core insight is that AI is an 'amplifier' - it magnifies existing organizational strengths AND weaknesses. Key stats: 89% of orgs prioritizing AI, 76% of devs using daily, but 39% have low trust. The trust research is critical: developers who trust AI more are more productive, but trust must be earned through organizational support (policies, training time, addressing concerns). The 451% adoption increase from acceptable-use policies is remarkable - clarity enables adoption.
- 89% of organizations prioritizing AI integration into applications
- 76% of technologists rely on AI for parts of their daily work
- 75% of developers report positive productivity impact from AI
Has your organization redesigned workflows specifically for AI tools?
Sources 2
The State of AI in 2025: Agents, Innovation, and Transformation
McKinsey's 2025 survey shows AI use is common (88%) but enterprise value capture is rare (only 39% see EBIT impact). The key differentiator is workflow redesign - high performers are 3x more likely to fundamentally redesign workflows. The 62% experimenting with agents stat is critical for agentic_supervision. Key insight: most organizations are still in pilots, not scaled adoption.
- 88% report regular AI use in at least one business function (up from 78%)
- Nearly two-thirds still in experimentation or piloting phases
- 62% experimenting with AI agents; 23% scaling agents
Claude Code in Slack: Agentic Coding Integration
Claude Code's Slack integration represents the 'ambient AI' pattern: AI agents triggered from natural team conversations, not dedicated coding interfaces. The $1B revenue milestone and enterprise customers (Netflix, Spotify) validate the market. Rakuten's 79% timeline reduction is a standout case study.
- Claude Code in Slack launched December 8, 2025 as research preview
- @Claude tag routes coding tasks to Claude Code on web automatically
- Analyzes Slack context (bug reports, feature requests) for repository detection
Perceived Outcomes & Satisfaction
Self-reported productivity and satisfaction metrics (SPACE framework)
These don't measure maturity directly, but they indicate whether Copilot is delivering value. DORA 2025: AI adoption now positively correlates with throughput (unlike 2024) but still negatively with stability.
Copilot helps me complete tasks faster.
Sources 2
Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity
This is the most rigorous 2025 study on AI coding productivity. The RCT methodology (16 experienced developers, 246 tasks, $150/hr compensation) makes this highly credible. The 39-44 percentage point gap between perceived and actual productivity is the key insight for our trust_calibration dimension. This directly supports recommendations about not over-trusting AI suggestions and maintaining verification practices.
- Experienced developers were 19% slower with AI
- Developers perceived 20% speedup (39-44 percentage point gap)
- Self-reported productivity may not reflect reality
Research: Quantifying GitHub Copilot's impact in the enterprise with Accenture
This is the primary source for the 30% acceptance rate benchmark and the 88% code retention statistic. The 95% enjoyment and 90% fulfillment stats are powerful for adoption justification. The 84% increase in successful builds directly supports the claim that AI doesn't sacrifice quality for speed. Published May 2024, so represents mature Copilot usage patterns.
- 95% of developers said they enjoyed coding more with GitHub Copilot
- 90% of developers felt more fulfilled with their jobs when using GitHub Copilot
- Developers accepted around 30% of GitHub Copilot's suggestions
Copilot helps me stay in flow when coding.
Sources 2
State of AI-Assisted Software Development 2025
The 2025 DORA report introduces the 'AI Capabilities Model' identifying seven practices that amplify AI benefits. The core insight is that AI is an 'amplifier' - it magnifies existing organizational strengths AND weaknesses. Key stats: 89% of orgs prioritizing AI, 76% of devs using daily, but 39% have low trust. The trust research is critical: developers who trust AI more are more productive, but trust must be earned through organizational support (policies, training time, addressing concerns). The 451% adoption increase from acceptable-use policies is remarkable - clarity enables adoption.
- 89% of organizations prioritizing AI integration into applications
- 76% of technologists rely on AI for parts of their daily work
- 75% of developers report positive productivity impact from AI
Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity
This is the most rigorous 2025 study on AI coding productivity. The RCT methodology (16 experienced developers, 246 tasks, $150/hr compensation) makes this highly credible. The 39-44 percentage point gap between perceived and actual productivity is the key insight for our trust_calibration dimension. This directly supports recommendations about not over-trusting AI suggestions and maintaining verification practices.
- Experienced developers were 19% slower with AI
- Developers perceived 20% speedup (39-44 percentage point gap)
- Self-reported productivity may not reflect reality
Compared to coding without Copilot, how much time do you estimate you save per week?
Sources 2
Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity
This is the most rigorous 2025 study on AI coding productivity. The RCT methodology (16 experienced developers, 246 tasks, $150/hr compensation) makes this highly credible. The 39-44 percentage point gap between perceived and actual productivity is the key insight for our trust_calibration dimension. This directly supports recommendations about not over-trusting AI suggestions and maintaining verification practices.
- Experienced developers were 19% slower with AI
- Developers perceived 20% speedup (39-44 percentage point gap)
- Self-reported productivity may not reflect reality
The State of Developer Ecosystem 2025
The 2025 JetBrains survey of 24,534 developers shows AI tools have become mainstream (85% regular usage). The 68% expecting AI proficiency to become a job requirement is critical for skill development. The finding that 19% save 8+ hours/week (up from 9% in 2024) shows productivity gains are real for power users. Key insight: developers want AI for mundane tasks but want control of creative/complex work.
- 85% of developers regularly use AI tools for coding and development
- 62% use at least one AI coding assistant, agent, or code editor
- 68% expect AI proficiency will become a job requirement
I enjoy my work more when using Copilot.
Sources 2
State of AI-Assisted Software Development 2025
The 2025 DORA report introduces the 'AI Capabilities Model' identifying seven practices that amplify AI benefits. The core insight is that AI is an 'amplifier' - it magnifies existing organizational strengths AND weaknesses. Key stats: 89% of orgs prioritizing AI, 76% of devs using daily, but 39% have low trust. The trust research is critical: developers who trust AI more are more productive, but trust must be earned through organizational support (policies, training time, addressing concerns). The 451% adoption increase from acceptable-use policies is remarkable - clarity enables adoption.
- 89% of organizations prioritizing AI integration into applications
- 76% of technologists rely on AI for parts of their daily work
- 75% of developers report positive productivity impact from AI
State of AI Code Quality 2025
This is the most comprehensive 2025 survey on AI code quality (609 developers). The key insight is the 'Confidence Flywheel' - context-rich suggestions reduce hallucinations, which improves quality, which builds trust. The finding that 80% of PRs don't receive human review when AI tools are enabled is critical for our agentic_supervision dimension. NOTE: The previously cited 1.7x issue rate and 41% commit stats were not found in the current report.
- 82% of developers use AI coding tools daily or weekly
- 65% of developers say at least a quarter of each commit is AI-generated
- 59% say AI has improved code quality
The code I produce with Copilot is high quality.
Sources 1
State of AI vs Human Code Generation Report
This is the most rigorous empirical comparison of AI vs human code quality to date. The 1.7x issue rate and specific vulnerability multipliers (2.74x XSS, 1.88x password handling) are critical for trust_calibration recommendations. Key insight: AI makes the same kinds of mistakes humans do, just more often at larger scale. The 8x I/O performance issue rate shows AI favors simple patterns over efficiency.
- AI-generated PRs contain 1.7x more issues overall (10.83 vs 6.45 issues per PR)
- AI PRs show 1.4-1.7x more critical and major issues
- Logic and correctness issues 75% more common in AI PRs
Have you ever measured your actual productivity with vs. without AI tools?
Sources 1
Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity
This is the most rigorous 2025 study on AI coding productivity. The RCT methodology (16 experienced developers, 246 tasks, $150/hr compensation) makes this highly credible. The 39-44 percentage point gap between perceived and actual productivity is the key insight for our trust_calibration dimension. This directly supports recommendations about not over-trusting AI suggestions and maintaining verification practices.
- Experienced developers were 19% slower with AI
- Developers perceived 20% speedup (39-44 percentage point gap)
- Self-reported productivity may not reflect reality
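Nothing here replaces an RCT like METR's, but a lightweight self-experiment beats gut feel: time comparable tasks with and without AI and compare medians. The durations below are placeholders.

```python
# Lightweight self-measurement sketch; the numbers are placeholders.
from statistics import median

with_ai = [34, 52, 41, 29, 63]     # minutes per comparable task
without_ai = [38, 47, 55, 33, 58]  # minutes per comparable task

print(f"median with AI:    {median(with_ai)} min")
print(f"median without AI: {median(without_ai)} min")
# Per the study above, perceived speedup can diverge from measured
# speedup by 39-44 percentage points, so log real timings.
```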
Does your organization track business ROI from AI coding tools (not just usage)?
Sources 1
The State of AI in 2025: Agents, Innovation, and Transformation
McKinsey's 2025 survey shows AI use is common (88%) but enterprise value capture is rare (only 39% see EBIT impact). The key differentiator is workflow redesign - high performers are 3x more likely to fundamentally redesign workflows. The 62% experimenting with agents stat is critical for agentic_supervision. Key insight: most organizations are still in pilots, not scaled adoption.
- 88% report regular AI use in at least one business function (up from 78%)
- Nearly two-thirds still in experimentation or piloting phases
- 62% experimenting with AI agents; 23% scaling agents
Has your trust in AI coding tools changed over the past 12 months?
Sources 1
2025 Stack Overflow Developer Survey
2025 Stack Overflow survey shows continued adoption (84%) but declining trust (60% positive, down from 70%+). Key insight: developers are using AI more but trusting it less. 35% use Stack Overflow as fallback when AI fails.
- 84% of respondents using or planning to use AI tools (up from 76% in 2024)
- 51% of professional developers use AI tools daily
- Positive sentiment dropped from 70%+ (2023-2024) to 60% (2025)
Team Composition & Talent Development
How AI adoption is affecting team structure, junior developer development, and hiring practices
AI presents both risks AND opportunities for junior developer development. While Stanford research shows employment shifts in AI-exposed roles, GitHub research shows juniors who adapt—supervising AI output, maintaining curiosity, building fundamentals—can thrive. The skill surface shifts from 'write code' to 'verify and supervise AI code'. Teams that invest in updated mentorship create accelerated apprenticeships where juniors gain exposure to architecture and system design earlier.
How has your team's junior-to-senior developer ratio changed with AI adoption?
Sources 2
Canaries in the Coal Mine? Six Facts about the Recent Employment Effects of Artificial Intelligence
First rigorous study of AI's labor market effects using real-time payroll data. Key insight: AI is already affecting employment, but asymmetrically - young workers in automatable roles are hit hardest (13% decline ages 22-25), while experienced workers are stable/growing. The automation vs augmentation distinction is critical: roles where AI replaces tasks are declining, roles where AI enhances human judgment are not. This is NOT about whether to use AI - it's about career positioning.
- Early-career workers (ages 22-25) in AI-exposed occupations experienced 13% relative employment decline
- Experienced workers in AI-exposed roles remained stable or grew
- Young software developers down nearly 20% from late 2022 peak by July 2025
State of AI-Assisted Software Development 2025
The 2025 DORA report introduces the 'AI Capabilities Model' identifying seven practices that amplify AI benefits. The core insight is that AI is an 'amplifier' - it magnifies existing organizational strengths AND weaknesses. Key stats: 89% of orgs prioritizing AI, 76% of devs using daily, but 39% have low trust. The trust research is critical: developers who trust AI more are more productive, but trust must be earned through organizational support (policies, training time, addressing concerns). The 451% adoption increase from acceptable-use policies is remarkable - clarity enables adoption.
- 89% of organizations prioritizing AI integration into applications
- 76% of technologists rely on AI for parts of their daily work
- 75% of developers report positive productivity impact from AI
How do you ensure junior developers build foundational skills despite AI availability?
Sources 2
AI Won't Kill Junior Devs - But Your Hiring Strategy Might
Addy Osmani reframes the junior developer AI debate from risk to opportunity. Key insight: AI accelerates careers for juniors who adapt by shifting from 'write code' to 'supervise AI code'. Teams with updated mentorship create accelerated apprenticeships. The real threat is hiring strategies, not AI itself.
- AI is a career accelerant for juniors who adapt
- Skill surface shifts from 'write code' to 'verify and supervise AI code'
- Updated mentorship: coach on integrating AI without over-dependence
Junior developers aren't obsolete: Here's how to thrive in the age of AI
GitHub's perspective on how junior developers can thrive with AI. Key insight: AI changes expectations - juniors must supervise AI output, not blindly accept it. Fundamentals and critical thinking remain essential.
- AI is closing skill gaps but creating new expectations for juniors
- Juniors expected to supervise AI's work, not just accept it blindly
- Think critically about AI-generated code, stay curious
How has AI affected your hiring criteria?
Sources 2
Canaries in the Coal Mine? Six Facts about the Recent Employment Effects of Artificial Intelligence
First rigorous study of AI's labor market effects using real-time payroll data. Key insight: AI is already affecting employment, but asymmetrically - young workers in automatable roles are hit hardest (13% decline ages 22-25), while experienced workers are stable/growing. The automation vs augmentation distinction is critical: roles where AI replaces tasks are declining, roles where AI enhances human judgment are not. This is NOT about whether to use AI - it's about career positioning.
- Early-career workers (ages 22-25) in AI-exposed occupations experienced 13% relative employment decline
- Experienced workers in AI-exposed roles remained stable or grew
- Young software developers down nearly 20% from late 2022 peak by July 2025
State of AI-Assisted Software Development 2025
The 2025 DORA report introduces the 'AI Capabilities Model' identifying seven practices that amplify AI benefits. The core insight is that AI is an 'amplifier' - it magnifies existing organizational strengths AND weaknesses. Key stats: 89% of orgs prioritizing AI, 76% of devs using daily, but 39% have low trust. The trust research is critical: developers who trust AI more are more productive, but trust must be earned through organizational support (policies, training time, addressing concerns). The 451% adoption increase from acceptable-use policies is remarkable - clarity enables adoption.
- 89% of organizations prioritizing AI integration into applications
- 76% of technologists rely on AI for parts of their daily work
- 75% of developers report positive productivity impact from AI
How is AI affecting productivity distribution across your team?
Sources 2
Research: Quantifying GitHub Copilot's impact in the enterprise with Accenture
This is the primary source for the 30% acceptance rate benchmark and the 88% code retention statistic. The 95% enjoyment and 90% fulfillment stats are powerful for adoption justification. The 84% increase in successful builds directly supports the claim that AI doesn't sacrifice quality for speed. Published May 2024, so represents mature Copilot usage patterns.
- 95% of developers said they enjoyed coding more with GitHub Copilot
- 90% of developers felt more fulfilled with their jobs when using GitHub Copilot
- Developers accepted around 30% of GitHub Copilot's suggestions
Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity
This is the most rigorous 2025 study on AI coding productivity. The RCT methodology (16 experienced developers, 246 tasks, $150/hr compensation) makes this highly credible. The 39-44 percentage point gap between perceived and actual productivity is the key insight for our trust_calibration dimension. This directly supports recommendations about not over-trusting AI suggestions and maintaining verification practices.
- Experienced developers were 19% slower with AI
- Developers perceived 20% speedup (39-44 percentage point gap)
- Self-reported productivity may not reflect reality
How has your mentorship approach adapted for AI-assisted development?
Sources 2
AI Won't Kill Junior Devs - But Your Hiring Strategy Might
Addy Osmani reframes the junior developer AI debate from risk to opportunity. Key insight: AI accelerates careers for juniors who adapt by shifting from 'write code' to 'supervise AI code'. Teams with updated mentorship create accelerated apprenticeships. The real threat is hiring strategies, not AI itself.
- AI is a career accelerant for juniors who adapt
- Skill surface shifts from 'write code' to 'verify and supervise AI code'
- Updated mentorship: coach on integrating AI without over-dependence
Junior developers aren't obsolete: Here's how to thrive in the age of AI
GitHub's perspective on how junior developers can thrive with AI. Key insight: AI changes expectations - juniors must supervise AI output, not blindly accept it. Fundamentals and critical thinking remain essential.
- AI is closing skill gaps but creating new expectations for juniors
- Juniors expected to supervise AI's work, not just accept it blindly
- Think critically about AI-generated code, stay curious
Are juniors using AI as a learning accelerator or a shortcut?
Sources 1
AI Won't Kill Junior Devs - But Your Hiring Strategy Might
Addy Osmani reframes the junior developer AI debate from risk to opportunity. Key insight: AI accelerates careers for juniors who adapt by shifting from 'write code' to 'supervise AI code'. Teams with updated mentorship create accelerated apprenticeships. The real threat is hiring strategies, not AI itself.
- AI is a career accelerant for juniors who adapt
- Skill surface shifts from 'write code' to 'verify and supervise AI code'
- Updated mentorship: coach on integrating AI without over-dependence
Trust Calibration & Verification
Appropriate skepticism and verification practices for AI-generated code, including managing review fatigue
Veracode 2025: AI-generated code introduced security flaws in 45% of controlled tests (unreviewed raw output; varies by language—Java 72%, Python 38%). Harness 2025: 67% of developers spend more time debugging AI-generated code than before. METR found experienced devs feel 20% faster while actually being 19% slower—a dangerous perception gap. Review fatigue is now a critical concern.
What do you typically do before accepting a Copilot suggestion?
Sources 2
The 'Trust, But Verify' Pattern For AI-Assisted Engineering
This article provides the conceptual framework for our trust_calibration dimension. The three principles (Blind Trust is Vulnerability, Copilot Not Autopilot, Human Accountability Remains) directly inform our survey questions. The emphasis on verification over speed aligns with METR findings. Practical guidance includes starting conservatively with AI on low-stakes tasks.
- Blind trust in AI-generated code is a vulnerability
- AI tools function as 'Copilot, Not Autopilot'
- Human verification is the new development bottleneck
October 2025 Update: GenAI Code Security Report
Primary source for AI code security statistics: 45% overall failure rate, 72% for Java specifically. The 'bigger models ≠ more secure code' finding is critical for model_routing - security scanning is needed regardless of model. Java's 72% rate makes it the riskiest language for AI-generated code.
- AI-generated code introduced risky security flaws in 45% of tests
- Java was the riskiest language with 72% security failure rate
- XSS (CWE-80) defense failed in 86% of relevant code samples
Approximately what percentage of Copilot suggestions do you accept?
Sources 2
Research: Quantifying GitHub Copilot's impact in the enterprise with Accenture
This is the primary source for the 30% acceptance rate benchmark and the 88% code retention statistic. The 95% enjoyment and 90% fulfillment stats are powerful for adoption justification. The 84% increase in successful builds directly supports the claim that AI doesn't sacrifice quality for speed. Published May 2024, so represents mature Copilot usage patterns.
- 95% of developers said they enjoyed coding more with GitHub Copilot
- 90% of developers felt more fulfilled with their jobs when using GitHub Copilot
- Developers accepted around 30% of GitHub Copilot's suggestions
State of AI Code Quality 2025
This is the most comprehensive 2025 survey on AI code quality (609 developers). The key insight is the 'Confidence Flywheel' - context-rich suggestions reduce hallucinations, which improves quality, which builds trust. The finding that 80% of PRs don't receive human review when AI tools are enabled is critical for our agentic_supervision dimension. NOTE: The previously cited 1.7x issue rate and 41% commit stats were not found in the current report.
- 82% of developers use AI coding tools daily or weekly
- 65% of developers say at least a quarter of each commit is AI-generated
- 59% say AI has improved code quality
In the past month, how often did you discover an error in AI-generated code AFTER accepting it?
Sources 2
October 2025 Update: GenAI Code Security Report
Primary source for AI code security statistics: 45% overall failure rate, 72% for Java specifically. The 'bigger models ≠ more secure code' finding is critical for model_routing - security scanning is needed regardless of model. Java's 72% rate makes it the riskiest language for AI-generated code.
- AI-generated code introduced risky security flaws in 45% of tests
- Java was the riskiest language with 72% security failure rate
- XSS (CWE-80) defense failed in 86% of relevant code samples
State of AI Code Quality 2025
This is the most comprehensive 2025 survey on AI code quality (609 developers). The key insight is the 'Confidence Flywheel' - context-rich suggestions reduce hallucinations, which improves quality, which builds trust. The finding that 80% of PRs don't receive human review when AI tools are enabled is critical for our agentic_supervision dimension. NOTE: The previously cited 1.7x issue rate and 41% commit stats were not found in the current report.
- 82% of developers use AI coding tools daily or weekly
- 65% of developers say at least a quarter of each commit is AI-generated
- 59% say AI has improved code quality
Are you aware that AI-generated code introduced security flaws in 45% of coding tests (Veracode 2025)?
Sources 1
October 2025 Update: GenAI Code Security Report
Primary source for AI code security statistics: 45% overall failure rate, 72% for Java specifically. The 'bigger models ≠ more secure code' finding is critical for model_routing - security scanning is needed regardless of model. Java's 72% rate makes it the riskiest language for AI-generated code.
- AI-generated code introduced risky security flaws in 45% of tests
- Java was the riskiest language with 72% security failure rate
- XSS (CWE-80) defense failed in 86% of relevant code samples
CodeRabbit 2025 found AI code has 2.74x more XSS, 8x more I/O performance issues. Do you check for these patterns?
Sources 1
State of AI vs Human Code Generation Report
This is the most rigorous empirical comparison of AI vs human code quality to date. The 1.7x issue rate and specific vulnerability multipliers (2.74x XSS, 1.88x password handling) are critical for trust_calibration recommendations. Key insight: AI makes the same kinds of mistakes humans do, just more often at larger scale. The 8x I/O performance issue rate shows AI favors simple patterns over efficiency.
- AI-generated PRs contain 1.7x more issues overall (10.83 vs 6.45 issues per PR)
- AI PRs show 1.4-1.7x more critical and major issues
- Logic and correctness issues 75% more common in AI PRs
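A pre-review pass for the most common XSS sinks can be as simple as grepping added diff lines. This sketch is illustrative only: the pattern list is nowhere near a real scanner (Semgrep, CodeQL), and `origin/main` is an assumed base branch.

```python
# Scan added diff lines for common XSS sinks. Illustrative patterns
# only; use a real scanner for anything production-facing.
import re
import subprocess

XSS_SINKS = [
    r"\.innerHTML\s*=",          # unescaped HTML assignment
    r"document\.write\s*\(",     # direct document writes
    r"dangerouslySetInnerHTML",  # React's escape hatch
    r"\beval\s*\(",              # code execution from strings
]

diff = subprocess.run(
    ["git", "diff", "origin/main...HEAD"],
    capture_output=True, text=True, check=True,
).stdout

added = [
    line[1:] for line in diff.splitlines()
    if line.startswith("+") and not line.startswith("+++")
]
for line in added:
    for pattern in XSS_SINKS:
        if re.search(pattern, line):
            print(f"potential XSS sink ({pattern}): {line.strip()}")
```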
GitClear found 8x increase in code duplication from AI tools. Do you monitor for this?
Sources 1
AI Copilot Code Quality: 2025 Data Suggests 4x Growth in Code Clones
GitClear's research on 211M lines of code shows AI is changing how code is written - more duplication, less refactoring. NOTE: Title says '4x Growth' (2025 projection); key finding is '8x increase' (actual 2024 data). We cite the 8x figure as it's measured data. The decline in refactoring suggests AI makes it easier to add new code than reuse existing code (limited context). Critical for understanding long-term maintainability implications.
- 8x increase in duplicated code blocks during 2024
- Refactoring dropped from 25% of changed lines (2021) to <10% (2024)
- Copy/pasted (cloned) code rose from 8.3% to 12.3% (2021-2024)
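Monitoring for duplication doesn't require GitClear; a toy detector can hash sliding windows of normalized lines and report repeats. Real clone detectors (jscpd, PMD CPD) work on token streams, so treat this as a rough approximation; the `./src` path and window size are placeholders.

```python
# Toy duplicate-block detector: hash every N-line window of stripped,
# non-blank lines and report windows that appear more than once.
import hashlib
from collections import defaultdict
from pathlib import Path

WINDOW = 6  # lines per window; tune to taste

def windows(path: Path):
    lines = [l.strip() for l in path.read_text(errors="ignore").splitlines()]
    lines = [l for l in lines if l]  # drop blank lines
    for i in range(len(lines) - WINDOW + 1):
        digest = hashlib.sha1("\n".join(lines[i:i + WINDOW]).encode()).hexdigest()
        yield digest, (path, i)

seen = defaultdict(list)
for py in Path("./src").rglob("*.py"):
    for digest, loc in windows(py):
        seen[digest].append(loc)

for locs in seen.values():
    if len(locs) > 1:
        spots = ", ".join(f"{p.name}:{i}" for p, i in locs)
        print(f"duplicated {WINDOW}-line block: {spots}")
```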
With agent mode generating larger diffs, how do you manage review fatigue?
Sources 2
State of AI Code Quality 2025
This is the most comprehensive 2025 survey on AI code quality (609 developers). The key insight is the 'Confidence Flywheel' - context-rich suggestions reduce hallucinations, which improves quality, which builds trust. The finding that 80% of PRs don't receive human review when AI tools are enabled is critical for our agentic_supervision dimension. NOTE: The previously cited 1.7x issue rate and 41% commit stats were not found in the current report.
- 82% of developers use AI coding tools daily or weekly
- 65% of developers say at least a quarter of each commit is AI-generated
- 59% say AI has improved code quality
The 'Trust, But Verify' Pattern For AI-Assisted Engineering
This article provides the conceptual framework for our trust_calibration dimension. The three principles (Blind Trust is Vulnerability, Copilot Not Autopilot, Human Accountability Remains) directly inform our survey questions. The emphasis on verification over speed aligns with METR findings. Practical guidance includes starting conservatively with AI on low-stakes tasks.
- Blind trust in AI-generated code is a vulnerability
- AI tools function as 'Copilot, Not Autopilot'
- Human verification is the new development bottleneck
How often do you 'Accept All' AI changes without reading the diff? (Karpathy's 'vibe coding' pattern)
Sources 2
Vibe Coding Definition (Original Tweet)
This tweet coined the term 'vibe coding' on February 3, 2025, defining it as a programming style where you 'forget that the code even exists' and 'Accept All' without reading diffs. Critically, Karpathy explicitly limits this to 'throwaway weekend projects' - a nuance often missed in subsequent coverage. The full quote shows he acknowledges the code grows 'beyond my usual comprehension' and he works around bugs rather than fixing them. This is essential context for our trust_calibration dimension: even the person who coined the term warns it's not for production work.
- Coined term 'vibe coding' for accepting AI changes without reading
- Described code 'beyond your comprehension'
- Pattern of full trust in AI output
The vibe coding hangover is upon us
This article documents the real-world consequences of 'vibe coding' practices going wrong. The Tea App case study is particularly powerful: a dating app built with minimal oversight leaked 72,000 images including driver's licenses due to a misconfigured Firebase bucket - a basic security error that proper review would have caught. PayPal engineer Jack Hays calls AI-generated code 'development hell' to maintain. Stack Overflow data shows declining trust (46% distrust vs 33% trust) and positive sentiment falling from 70% to 60%. This is essential evidence for our trust_calibration and agentic_supervision dimensions.
- Documented 'vibe coding hangover' phenomenon
- Teams in 'development hell' from unreviewed AI code
- Tea App data breach: 72,000 sensitive images leaked from unsecured Firebase
If you sometimes accept without reading: In what contexts?
Sources 2
Vibe Coding Definition (Original Tweet)
This tweet coined the term 'vibe coding' on February 3, 2025, defining it as a programming style where you 'forget that the code even exists' and 'Accept All' without reading diffs. Critically, Karpathy explicitly limits this to 'throwaway weekend projects' - a nuance often missed in subsequent coverage. The full quote shows he acknowledges the code grows 'beyond my usual comprehension' and he works around bugs rather than fixing them. This is essential context for our trust_calibration dimension: even the person who coined the term warns it's not for production work.
- Coined term 'vibe coding' for accepting AI changes without reading
- Described code 'beyond your comprehension'
- Pattern of full trust in AI output
The vibe coding hangover is upon us
This article documents the real-world consequences of 'vibe coding' practices going wrong. The Tea App case study is particularly powerful: a dating app built with minimal oversight leaked 72,000 images including driver's licenses due to a misconfigured Firebase bucket - a basic security error that proper review would have caught. PayPal engineer Jack Hays calls AI-generated code 'development hell' to maintain. Stack Overflow data shows declining trust (46% distrust vs 33% trust) and positive sentiment falling from 70% to 60%. This is essential evidence for our trust_calibration and agentic_supervision dimensions.
- Documented 'vibe coding hangover' phenomenon
- Teams in 'development hell' from unreviewed AI code
- Tea App data breach: 72,000 sensitive images leaked from unsecured Firebase
Submit Incomplete Assessment?
Your results will be marked as partial and may not reflect your full maturity level.
Start Over?
This will clear all your answers and reset the assessment to the beginning.
This action cannot be undone.
About This Assessment
This assessment evaluates your team's maturity across 12 research-backed dimensions of AI-assisted development:
- Foundational Adoption — Basic usage patterns and tool activation
- Context Curation — Providing AI with the right information
- Model Routing — Matching tasks to appropriate models
- Agentic Supervision — Managing autonomous AI workflows
- Trust Calibration — Appropriate trust and verification
- Appropriate Non-Use — Knowing when to skip AI
- Organizational Integration — Enterprise policies and governance
- Legal & Compliance — Regulatory and licensing considerations
- Team Composition — Experience balance across the team
- Legacy Modernization — Using AI for legacy code
- Advanced Workflows — Multi-model and multi-tool patterns
- Outcomes — Measuring actual results
Scoring Methodology
Each question is weighted based on 2025 industry research from:
- DORA State of AI-Assisted Software Development 2025
- Veracode GenAI Code Security Report 2025
- GitHub/Accenture Enterprise Research
- METR Developer Productivity Study
Your results include personalized recommendations based on your specific gaps.