Tempered AI Assessment
Choose Your Assessment
Take the full assessment for the most complete picture, or filter by role for a shorter experience.
▸ Or filter by role for a shorter assessment
12 dimensions, 87 questions
Foundational Adoption
Basic activation and consistent usage of Copilot tools across interfaces
You can't improve what you don't use. But adoption alone doesn't indicate maturity - it's table stakes. With 82-89% industry adoption in 2025, this dimension has decreased in weight. Gartner predicts 90% of enterprise software engineers will use AI assistants by 2028.
How many days per week do you typically use GitHub Copilot?
Sources 2
State of AI-Assisted Software Development 2025
The 2025 DORA report introduces the 'AI Capabilities Model' identifying seven practices that amplify AI benefits. The core insight is that AI is an 'amplifier' - it magnifies existing organizational strengths AND weaknesses. Key stats: 89% of orgs prioritizing AI, 76% of devs using daily, but 39% have low trust. The trust research is critical: developers who trust AI more are more productive, but trust must be earned through organizational support (policies, training time, addressing concerns). The 451% adoption increase from acceptable-use policies is remarkable - clarity enables adoption.
- 89% of organizations prioritizing AI integration into applications
- 76% of technologists rely on AI for parts of their daily work
- 75% of developers report positive productivity impact from AI
State of AI Code Quality 2025
This is the most comprehensive 2025 survey on AI code quality (609 developers). The key insight is the 'Confidence Flywheel' - context-rich suggestions reduce hallucinations, which improves quality, which builds trust. The finding that 80% of PRs don't receive human review when AI tools are enabled is critical for our agentic_supervision dimension. NOTE: The previously cited 1.7x issue rate and 41% commit stats were not found in the current report.
- 82% of developers use AI coding tools daily or weekly
- 65% of developers say at least a quarter of each commit is AI-generated
- 59% say AI has improved code quality
Which Copilot interfaces do you actively use?
Note: Agentic interfaces score higher as they represent the 2025 paradigm shift
Sources 2
GitHub Copilot Agent Mode Documentation
GitHub Copilot's agent mode enables autonomous multi-file editing, allowing AI to plan and execute complex changes across a codebase without step-by-step human approval. This capability requires careful supervision practices since agents can introduce cascading errors across multiple files. Critical for agentic_supervision dimension - assessing how organizations manage autonomous AI coding.
- Agent mode can edit multiple files autonomously
- Supervision best practices
GitHub Copilot CLI Documentation
Copilot CLI brings agentic capabilities to the terminal, enabling natural language interaction with GitHub and code. Similar to Claude Code but with GitHub-native integration. Represents the shift toward terminal-based AI development workflows. Key for advanced workflow maturity assessment.
- Terminal-native agentic development
- GitHub integration (repos, issues, PRs) via natural language
- Preview every action before execution
When you use Copilot, how much of your coding session involves it?
Note: 75%+ usage no longer scores higher than 50% - excessive reliance can indicate over-trust
Sources 1
Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity
This is the most rigorous 2025 study on AI coding productivity. The RCT methodology (16 experienced developers, 246 tasks, $150/hr compensation) makes this highly credible. The 39-44 percentage point gap between perceived and actual productivity is the key insight for our trust_calibration dimension. This directly supports recommendations about not over-trusting AI suggestions and maintaining verification practices.
- Experienced developers were 19% slower with AI
- Developers perceived 20% speedup (39-44 percentage point gap)
- Self-reported productivity may not reflect reality
How many different AI coding tools does your team use?
Note: ~50% of teams use 2+ tools. Some diversity is healthy; too many creates inconsistency.
Sources 1
2025 Stack Overflow Developer Survey
2025 Stack Overflow survey shows continued adoption (84%) but declining trust (60% positive, down from 70%+). Key insight: developers are using AI more but trusting it less. 35% use Stack Overflow as fallback when AI fails.
- 84% of respondents using or planning to use AI tools (up from 76% in 2024)
- 51% of professional developers use AI tools daily
- Positive sentiment dropped from 70%+ (2023-2024) to 60% (2025)
Do you use MCP (Model Context Protocol) servers to connect AI tools to external data sources?
Note: MCP became industry standard in late 2025 (Linux Foundation/AAIF). Security review is critical. Context optimization (disabling unused servers) is a power user technique - MCP tools consume context on every request.
Sources 2
Model Context Protocol Specification
Model Context Protocol (MCP) is an open-source standard providing a 'USB-C port for AI applications' - a standardized protocol for connecting AI systems to external data sources, tools, and workflows. MCP became an industry standard in late 2025, enabling composable, interoperable AI systems. Foundational for advanced AI development maturity - assessing MCP adoption indicates sophistication in AI tool integration.
- MCP became industry standard in late 2025
- Protocol for connecting AI tools to external data
Claude Code: Keep the Context Clean
Explains the hidden context cost of MCP integrations. Each MCP server's tools are injected into every prompt, consuming context window. Power users should disable unused servers and trim exposed tools to maximize available context for actual work.
- MCP tools are injected into prompt on every request
- A couple MCP servers can eat 50% of context window before you type
- Disable MCP servers you don't use frequently
If you use MCP: Are you aware of MCP security risks (prompt injection, tool poisoning)?
Note: April 2025 security research identified tool poisoning, cross-server shadowing, and prompt injection as MCP attack vectors.
Sources 2
Model Context Protocol (MCP): Landscape, Security Threats, and Future Research Directions
First systematic academic study of MCP security risks. Identifies attack vectors including tool poisoning, puppet attacks, rug pull attacks. 43% of tested implementations had command injection flaws. Critical for adoption dimension's MCP security awareness questions.
- Systematic security analysis of MCP lifecycle (creation, deployment, operation, maintenance)
- 16 key activities identified with security implications
- Four major attacker types: malicious developers, external attackers, malicious users, others
Beyond the Protocol: Unveiling Attack Vectors in the Model Context Protocol (MCP) Ecosystem
End-to-end empirical evaluation of MCP attack vectors. User study demonstrates that even experienced developers struggle to identify malicious MCP servers. Critical CVE with CVSS 9.6 shows real-world risk.
- Four categories of attacks: Tool Poisoning, Puppet Attacks, Rug Pull Attacks, Exploitation via Malicious External Resources
- User study with 20 participants showed users struggle to identify malicious MCP servers
- Users often unknowingly install malicious servers from aggregator platforms
Advanced Multi-Session Workflows
Proficiency with spec-driven development, orchestrated agents, and persistent session management
Simple chat-based AI assistance is table stakes. Advanced workflows deliver measurable results: GitHub Spec Kit achieves 95%+ first-attempt accuracy with detailed specs, Rakuten reduced development timelines from 24 to 5 days (79% reduction) using structured agent workflows, and 7 hours of sustained autonomous coding is now achievable on complex projects. These patterns enable longer-horizon, more complex tasks with dramatically better outcomes.
Are you familiar with GitHub Spec Kit for spec-driven development?
Note: Spec Kit represents GitHub's recommended approach for production-quality AI-assisted development
Sources 1
Spec-Driven Development with AI: Get Started with a New Open Source Toolkit
GitHub Spec Kit formalizes the spec-driven development approach where detailed specifications precede AI code generation. The four-phase workflow (Specify → Plan → Tasks → Implement) ensures human oversight at each checkpoint. This is the antidote to 'vibe coding' - structured, auditable AI development. Key for assessing advanced workflow maturity.
- Four-phase workflow: Specify, Plan, Tasks, Implement
- Specifications become executable artifacts
- Supports GitHub Copilot, Claude Code, Gemini CLI
When using spec-driven development, which phases do you follow?
Note: Full four-phase workflow (Specify → Plan → Tasks → Implement) ensures quality and traceability
Sources 1
Spec-Driven Development with AI: Get Started with a New Open Source Toolkit
GitHub Spec Kit formalizes the spec-driven development approach where detailed specifications precede AI code generation. The four-phase workflow (Specify → Plan → Tasks → Implement) ensures human oversight at each checkpoint. This is the antidote to 'vibe coding' - structured, auditable AI development. Key for assessing advanced workflow maturity.
- Four-phase workflow: Specify, Plan, Tasks, Implement
- Specifications become executable artifacts
- Supports GitHub Copilot, Claude Code, Gemini CLI
Do you use multi-agent orchestration patterns?
Note: BMAD and similar frameworks use specialized agents (Analyst, Architect, Developer, QA) coordinated by an orchestrator
Sources 1
BMAD-METHOD: Breakthrough Method for Agile AI Driven Development
BMAD represents the multi-agent orchestration approach to AI development. Unlike simple chat-based AI assistance, BMAD uses specialized agents (Analyst, Architect, Developer, QA) coordinated by an orchestrator. Key innovation: zero context loss between tasks. Represents advanced maturity in agentic workflows.
- 19+ specialized AI agents with distinct roles (Analyst, Architect, Developer, QA)
- 50+ workflows covering development scenarios
- Scale-adaptive intelligence adjusts to task complexity
How do you handle long-horizon tasks that span multiple days or sessions?
Note: Beads provides Git-backed, dependency-aware task tracking that surfaces 'ready' work automatically
Sources 1
Introducing Beads: A Coding Agent Memory System
Beads solves the 'context loss' problem in multi-session AI development. Rather than storing tasks in unstructured markdown, Beads uses Git-backed JSONL files that agents can query for 'ready' work. Key for long-horizon tasks spanning multiple days or sessions. Represents the frontier of AI workflow tooling for persistent memory.
- Git-backed issue tracker designed for AI coding agents
- Persistent session memory across restarts
- Dependency-aware task graph (DAG)
Do you use AI-native task tracking tools designed for agent workflows?
Note: AI-native task trackers (Beads, Linear AI, etc.) are designed for agent workflows with features like dependency-aware task graphs and automatic context surfacing
Sources 1
Introducing Beads: A Coding Agent Memory System
Beads solves the 'context loss' problem in multi-session AI development. Rather than storing tasks in unstructured markdown, Beads uses Git-backed JSONL files that agents can query for 'ready' work. Key for long-horizon tasks spanning multiple days or sessions. Represents the frontier of AI workflow tooling for persistent memory.
- Git-backed issue tracker designed for AI coding agents
- Persistent session memory across restarts
- Dependency-aware task graph (DAG)
How structured is your AI development workflow?
Note: Rigorous workflows with explicit phases and checkpoints correlate with higher code quality and fewer rework cycles
Sources 1
Spec-Driven Development with AI: Get Started with a New Open Source Toolkit
GitHub Spec Kit formalizes the spec-driven development approach where detailed specifications precede AI code generation. The four-phase workflow (Specify → Plan → Tasks → Implement) ensures human oversight at each checkpoint. This is the antidote to 'vibe coding' - structured, auditable AI development. Key for assessing advanced workflow maturity.
- Four-phase workflow: Specify, Plan, Tasks, Implement
- Specifications become executable artifacts
- Supports GitHub Copilot, Claude Code, Gemini CLI
Agentic Workflow Supervision
Skill in supervising autonomous AI agents that make multi-file changes
Agent-first IDEs (Cursor 2.0, Google Antigravity, Devin 2.0) can now orchestrate multiple autonomous agents in parallel. The new risks include review fatigue (impossible to review 500+ lines of diffs), agentic security threats (OWASP Top 10), and multi-agent coordination failures. Mature users know how to supervise agents, apply 'Least Agency' principles, and use layered verification (AI review + human review + tests).
How often do you use agent mode or autonomous coding features?
Sources 2
GitHub Copilot Agent Mode Documentation
GitHub Copilot's agent mode enables autonomous multi-file editing, allowing AI to plan and execute complex changes across a codebase without step-by-step human approval. This capability requires careful supervision practices since agents can introduce cascading errors across multiple files. Critical for agentic_supervision dimension - assessing how organizations manage autonomous AI coding.
- Agent mode can edit multiple files autonomously
- Supervision best practices
State of AI-Assisted Software Development 2025
The 2025 DORA report introduces the 'AI Capabilities Model' identifying seven practices that amplify AI benefits. The core insight is that AI is an 'amplifier' - it magnifies existing organizational strengths AND weaknesses. Key stats: 89% of orgs prioritizing AI, 76% of devs using daily, but 39% have low trust. The trust research is critical: developers who trust AI more are more productive, but trust must be earned through organizational support (policies, training time, addressing concerns). The 451% adoption increase from acceptable-use policies is remarkable - clarity enables adoption.
- 89% of organizations prioritizing AI integration into applications
- 76% of technologists rely on AI for parts of their daily work
- 75% of developers report positive productivity impact from AI
What scope of tasks do you delegate to agent mode?
Sources 2
BMAD-METHOD: Breakthrough Method for Agile AI Driven Development
BMAD represents the multi-agent orchestration approach to AI development. Unlike simple chat-based AI assistance, BMAD uses specialized agents (Analyst, Architect, Developer, QA) coordinated by an orchestrator. Key innovation: zero context loss between tasks. Represents advanced maturity in agentic workflows.
- 19+ specialized AI agents with distinct roles (Analyst, Architect, Developer, QA)
- 50+ workflows covering development scenarios
- Scale-adaptive intelligence adjusts to task complexity
GitHub Copilot Agent Mode Documentation
GitHub Copilot's agent mode enables autonomous multi-file editing, allowing AI to plan and execute complex changes across a codebase without step-by-step human approval. This capability requires careful supervision practices since agents can introduce cascading errors across multiple files. Critical for agentic_supervision dimension - assessing how organizations manage autonomous AI coding.
- Agent mode can edit multiple files autonomously
- Supervision best practices
How do you review changes made by agent mode?
Note: Using AI to review AI (e.g., asking Opus to review changes made by Flash) is an emerging best practice
Sources 2
AI Code Review and the Best AI Code Review Tools in 2025
Comprehensive overview of AI code review tools and AI-reviewing-AI patterns. Key for agentic_supervision dimension - validates that AI reviewing AI is an emerging best practice.
- 84% of developers now using AI tools, 41% of code is AI-generated
- Leading AI review tools: CodeRabbit, Codacy Guardrails, Snyk DeepCode
- AI-to-AI review is an emerging pattern (AI reviews AI-generated code)
Multi-Agent Collaboration Best Practices
Official guidance on multi-agent patterns including debate (multiple models reviewing each other), TDD splits (one writes tests, another implements), and Architect/Implementer separation. Research shows diverse model debate (Claude + Gemini + GPT) achieves 91% on GSM-8K vs 82% for identical models.
- Separate Claude instances can communicate via shared scratchpads
- Multi-agent debate with diverse models outperforms single-model approaches
- Writer/Reviewer and TDD splits improve output quality
How often do you intervene during an agent mode session?
Note: 'Pair programming' approach scores highest—active but not micromanaging
Sources 2
The 'Trust, But Verify' Pattern For AI-Assisted Engineering
This article provides the conceptual framework for our trust_calibration dimension. The three principles (Blind Trust is Vulnerability, Copilot Not Autopilot, Human Accountability Remains) directly inform our survey questions. The emphasis on verification over speed aligns with METR findings. Practical guidance includes starting conservatively with AI on low-stakes tasks.
- Blind trust in AI-generated code is a vulnerability
- AI tools function as 'Copilot, Not Autopilot'
- Human verification is the new development bottleneck
State of AI-Assisted Software Development 2025
The 2025 DORA report introduces the 'AI Capabilities Model' identifying seven practices that amplify AI benefits. The core insight is that AI is an 'amplifier' - it magnifies existing organizational strengths AND weaknesses. Key stats: 89% of orgs prioritizing AI, 76% of devs using daily, but 39% have low trust. The trust research is critical: developers who trust AI more are more productive, but trust must be earned through organizational support (policies, training time, addressing concerns). The 451% adoption increase from acceptable-use policies is remarkable - clarity enables adoption.
- 89% of organizations prioritizing AI integration into applications
- 76% of technologists rely on AI for parts of their daily work
- 75% of developers report positive productivity impact from AI
How often do you reject or rollback agent mode changes?
Note: 10-30% rollback rate indicates calibrated supervision. Never or >50% suggests miscalibration.
Sources 1
Research: Quantifying GitHub Copilot's impact in the enterprise with Accenture
This is the primary source for the 30% acceptance rate benchmark and the 88% code retention statistic. The 95% enjoyment and 90% fulfillment stats are powerful for adoption justification. The 84% increase in successful builds directly supports the claim that AI doesn't sacrifice quality for speed. Published May 2024, so represents mature Copilot usage patterns.
- 95% of developers said they enjoyed coding more with GitHub Copilot
- 90% of developers felt more fulfilled with their jobs when using GitHub Copilot
- Developers accepted around 30% of GitHub Copilot's suggestions
For complex tasks spanning multiple sessions, how do you maintain context?
Note: Multi-session memory is a 2025 frontier capability. Beads (Steve Yegge) and BMAD represent mature approaches to context continuity.
Sources 2
Introducing Beads: A Coding Agent Memory System
Beads solves the 'context loss' problem in multi-session AI development. Rather than storing tasks in unstructured markdown, Beads uses Git-backed JSONL files that agents can query for 'ready' work. Key for long-horizon tasks spanning multiple days or sessions. Represents the frontier of AI workflow tooling for persistent memory.
- Git-backed issue tracker designed for AI coding agents
- Persistent session memory across restarts
- Dependency-aware task graph (DAG)
BMAD-METHOD: Breakthrough Method for Agile AI Driven Development
BMAD represents the multi-agent orchestration approach to AI development. Unlike simple chat-based AI assistance, BMAD uses specialized agents (Analyst, Architect, Developer, QA) coordinated by an orchestrator. Key innovation: zero context loss between tasks. Represents advanced maturity in agentic workflows.
- 19+ specialized AI agents with distinct roles (Analyst, Architect, Developer, QA)
- 50+ workflows covering development scenarios
- Scale-adaptive intelligence adjusts to task complexity
Do you review agent execution plans BEFORE letting agents make changes?
Note: Devin 2.0 introduced interactive planning: users can edit/approve execution plans before task execution. Google Antigravity uses Artifacts to make agent reasoning visible. Pre-execution review is a critical control.
Sources 3
Devin 2.0: Performance Review and Enterprise Metrics
Devin 2.0 shows maturation of autonomous coding agents: 4x faster, 67% merge rate, enterprise adoption (Goldman Sachs, Nubank). The $20/month pricing democratizes access. The 'interactive planning' feature addresses human oversight concerns. Critical for understanding the enterprise autonomous coding landscape.
- Devin 2.0 released April 2025 with $20/month Core plan (down from $500)
- 4x faster problem solving, 2x more efficient resource consumption
- 67% PR merge rate (up from 34% in 2024)
Google Antigravity: Agent-First Development Platform
Google Antigravity represents the 'agent-first IDE' paradigm: agents work autonomously while humans supervise via Manager view. The Artifacts system addresses trust by making agent reasoning visible. Multi-model support (Gemini, Claude, GPT) shows the future is model-agnostic. Critical for agentic_supervision dimension.
- Announced November 18, 2025 alongside Gemini 3
- Agent-first IDE paradigm (vs AI-assisted coding)
- Two views: Editor view (traditional) and Manager view (agent orchestration)
Claude Code: Best practices for agentic coding
Official best practices for Claude Code agentic development. Key for agentic_supervision dimension - demonstrates multi-file autonomous editing capabilities and supervision approaches.
- Claude Code is a command line tool for agentic coding
- CLAUDE.md provides project-specific context and instructions
- Plan mode shows intentions before making changes
Do you apply 'Least Agency' security principles when using AI agents?
Note: OWASP Top 10 for Agentic AI (Dec 2025) identifies critical risks: goal hijacking, tool misuse, privilege escalation. 'Least Agency' principle: only grant agents the minimum autonomy required. Layered controls combine scope limits + permissions + monitoring.
Sources 1
OWASP Top 10 for Agentic Applications 2026
The first OWASP security framework specifically for agentic AI systems. The 'Least Agency' principle is critical for our agentic_supervision dimension. Key risks (goal hijacking, tool misuse, rogue agents) directly inform supervision recommendations. Released December 2025, this is the authoritative security guide for AI coding agents.
- ASI01 - Agent Goal Hijack: Prompt injection manipulates agent goals
- ASI02 - Tool Misuse: Agents misuse legitimate tools for data exfiltration
- ASI03 - Identity & Privilege Abuse: Confused deputy and privilege escalation
How do you approach multi-agent workflows (multiple AI agents working in parallel)?
Note: Cursor 2.0's 8-parallel-agent architecture and Google Antigravity's Manager view represent the multi-agent frontier. Proper isolation (git worktrees) prevents agents from interfering with each other.
Sources 2
Cursor 2.0: Composer Model and Multi-Agent Architecture
Cursor 2.0 represents the shift to agent-first IDEs: purpose-built Composer model for low-latency coding, up to 8 parallel agents, native browser integration, sandboxed execution. The MoE architecture and RL training show vendor investment in specialized coding models. Critical for understanding the 'agentic IDE' paradigm.
- Cursor 2.0 released October 29, 2025
- Composer: first in-house large coding model, 4x faster than comparable models
- Multi-agent: up to 8 independent AI agents in parallel via git worktrees
Google Antigravity: Agent-First Development Platform
Google Antigravity represents the 'agent-first IDE' paradigm: agents work autonomously while humans supervise via Manager view. The Artifacts system addresses trust by making agent reasoning visible. Multi-model support (Gemini, Claude, GPT) shows the future is model-agnostic. Critical for agentic_supervision dimension.
- Announced November 18, 2025 alongside Gemini 3
- Agent-first IDE paradigm (vs AI-assisted coding)
- Two views: Editor view (traditional) and Manager view (agent orchestration)
Under what conditions do you use autonomous/YOLO modes (skip permission prompts)?
Note: YOLO/autonomous modes are powerful for prototyping but dangerous in production. Anthropic guidance: only in containers/VMs with network disabled. Sandboxed + safeguards scores highest.
Sources 1
YOLO Mode Safety Guidelines for AI Agents
Safety guidelines for running AI agents in autonomous mode. Key rule: only skip permission prompts in sandboxed environments (containers, VMs) without network access to production. Acceptable for prototyping; never for production work.
- YOLO mode (--dangerously-skip-permissions) only safe in sandboxed environments
- Never use with network access to sensitive systems
- Never use with access to production credentials
Appropriate Non-Use & Managed Use
Recognition of when NOT to use AI assistance, or when to use specialized/security-tuned tools
Security-sensitive code, complex business logic, and regulated environments require careful AI usage. But 2025 update: complete 'AI avoidance' is being replaced by 'managed use with specialized tools'—security-tuned agents can audit AI-generated code.
Are there specific types of code where you deliberately avoid or limit Copilot use?
Note: Updated for 2025: 'Managed use with specialized tools' scores higher than blanket avoidance
Sources 2
October 2025 Update: GenAI Code Security Report
This is the primary source for our 45% security vulnerability claim. The October 2025 update confirms that AI code security issues persist even with newer models. The finding that 'bigger models ≠ more secure code' is important for our model_routing dimension - it suggests security scanning is needed regardless of which model is used. The 72% Java-specific rate mentioned in our citations may be from the full PDF report.
- AI-generated code introduced risky security flaws in 45% of tests
- 100+ LLMs tested across Java, JavaScript, Python, and C#
- Larger, newer AI models didn't improve security
State of AI-Assisted Software Development 2025
The 2025 DORA report introduces the 'AI Capabilities Model' identifying seven practices that amplify AI benefits. The core insight is that AI is an 'amplifier' - it magnifies existing organizational strengths AND weaknesses. Key stats: 89% of orgs prioritizing AI, 76% of devs using daily, but 39% have low trust. The trust research is critical: developers who trust AI more are more productive, but trust must be earned through organizational support (policies, training time, addressing concerns). The 451% adoption increase from acceptable-use policies is remarkable - clarity enables adoption.
- 89% of organizations prioritizing AI integration into applications
- 76% of technologists rely on AI for parts of their daily work
- 75% of developers report positive productivity impact from AI
For which scenarios do you avoid or limit Copilot use?
Sources 1
HIPAA Security Rule Notice of Proposed Rulemaking to Strengthen Cybersecurity for Electronic Protected Health Information
IMPORTANT: This is a PROPOSED rule (NPRM), not a finalized regulation. Published December 27, 2024 with 60-day comment period. While it addresses cybersecurity broadly (encryption, MFA, audits), it does not specifically address AI coding tools. The relevance to our survey is about the broader compliance environment healthcare organizations must consider when using AI tools that might touch ePHI.
- PROPOSED rule (NPRM) - not yet finalized
- Major update to strengthen cybersecurity for ePHI
- Requires encryption of ePHI at rest and in transit
If you work in healthcare or with PHI: Are you aware of the HIPAA implications?
Note: GitHub Copilot is NOT HIPAA compliant. Microsoft 365 Copilot is covered under BAA with proper configuration.
Sources 1
Microsoft HIPAA/HITECH Compliance Documentation
Microsoft 365 Copilot is HIPAA compliant when organizations sign a Business Associate Agreement (BAA), but GitHub Copilot is explicitly NOT covered under BAA and cannot be used with Protected Health Information (PHI). This distinction is critical for healthcare developers - using non-compliant tools with PHI exposes organizations to regulatory penalties. Key for appropriate_nonuse dimension.
- Microsoft 365 Copilot is HIPAA compliant with BAA
- GitHub Copilot is NOT covered under BAA
Do you work in a regulated industry beyond healthcare?
Note: This is a classification question - no scoring. Used to trigger follow-up questions.
Sources 2
Regulation (EU) 2024/1689 - Artificial Intelligence Act
The EU AI Act establishes a comprehensive risk-based regulatory framework for AI systems, classifying them into prohibited, high-risk, and general-risk categories with varying compliance requirements. Enforcement begins in 2025, with organizations using AI coding tools needing to assess whether their implementations fall under high-risk categories. This regulation sets global precedent for AI governance and directly impacts how development teams can deploy AI-assisted development tools.
- Enforcement begins 2025
- Regulates AI systems by risk level
- Applies to AI coding tools used in EU
HIPAA Security Rule Notice of Proposed Rulemaking to Strengthen Cybersecurity for Electronic Protected Health Information
IMPORTANT: This is a PROPOSED rule (NPRM), not a finalized regulation. Published December 27, 2024 with 60-day comment period. While it addresses cybersecurity broadly (encryption, MFA, audits), it does not specifically address AI coding tools. The relevance to our survey is about the broader compliance environment healthcare organizations must consider when using AI tools that might touch ePHI.
- PROPOSED rule (NPRM) - not yet finalized
- Major update to strengthen cybersecurity for ePHI
- Requires encryption of ePHI at rest and in transit
If in a regulated industry: Does your organization have AI-specific compliance policies?
Note: Regulated industries face heightened AI compliance requirements. EU AI Act enforcement begins 2025.
Sources 1
AI Act Implementation Timeline
EU AI Act enforcement began in 2025 with prohibited practices and GPAI rules. Full application by 2026. Critical for appropriate_nonuse and legal_compliance dimensions.
- EU AI Act entered into force August 1, 2024
- Prohibited AI practices effective February 2, 2025
- GPAI model obligations effective August 2, 2025
Do you work on safety-critical systems where AI code errors could cause physical harm?
Note: Safety-critical systems (automotive, medical devices, aviation) require highest verification or AI prohibition.
Sources 2
October 2025 Update: GenAI Code Security Report
This is the primary source for our 45% security vulnerability claim. The October 2025 update confirms that AI code security issues persist even with newer models. The finding that 'bigger models ≠ more secure code' is important for our model_routing dimension - it suggests security scanning is needed regardless of which model is used. The 72% Java-specific rate mentioned in our citations may be from the full PDF report.
- AI-generated code introduced risky security flaws in 45% of tests
- 100+ LLMs tested across Java, JavaScript, Python, and C#
- Larger, newer AI models didn't improve security
Regulation (EU) 2024/1689 - Artificial Intelligence Act
The EU AI Act establishes a comprehensive risk-based regulatory framework for AI systems, classifying them into prohibited, high-risk, and general-risk categories with varying compliance requirements. Enforcement begins in 2025, with organizations using AI coding tools needing to assess whether their implementations fall under high-risk categories. This regulation sets global precedent for AI governance and directly impacts how development teams can deploy AI-assisted development tools.
- Enforcement begins 2025
- Regulates AI systems by risk level
- Applies to AI coding tools used in EU
How well do you understand Copilot's limitations for your specific tech stack and work?
Sources 2
Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity
This is the most rigorous 2025 study on AI coding productivity. The RCT methodology (16 experienced developers, 246 tasks, $150/hr compensation) makes this highly credible. The 39-44 percentage point gap between perceived and actual productivity is the key insight for our trust_calibration dimension. This directly supports recommendations about not over-trusting AI suggestions and maintaining verification practices.
- Experienced developers were 19% slower with AI
- Developers perceived 20% speedup (39-44 percentage point gap)
- Self-reported productivity may not reflect reality
The 'Trust, But Verify' Pattern For AI-Assisted Engineering
This article provides the conceptual framework for our trust_calibration dimension. The three principles (Blind Trust is Vulnerability, Copilot Not Autopilot, Human Accountability Remains) directly inform our survey questions. The emphasis on verification over speed aligns with METR findings. Practical guidance includes starting conservatively with AI on low-stakes tasks.
- Blind trust in AI-generated code is a vulnerability
- AI tools function as 'Copilot, Not Autopilot'
- Human verification is the new development bottleneck
How do you ensure your coding skills don't atrophy from AI reliance?
Note: Addy Osmani's research shows skill atrophy is real. Junior devs especially at risk.
Sources 2
AI Won't Kill Junior Devs - But Your Hiring Strategy Might
Addy Osmani reframes the junior developer AI debate from risk to opportunity. Key insight: AI accelerates careers for juniors who adapt by shifting from 'write code' to 'supervise AI code'. Teams with updated mentorship create accelerated apprenticeships. The real threat is hiring strategies, not AI itself.
- AI is a career accelerant for juniors who adapt
- Skill surface shifts from 'write code' to 'verify and supervise AI code'
- Updated mentorship: coach on integrating AI without over-dependence
The 'Trust, But Verify' Pattern For AI-Assisted Engineering
This article provides the conceptual framework for our trust_calibration dimension. The three principles (Blind Trust is Vulnerability, Copilot Not Autopilot, Human Accountability Remains) directly inform our survey questions. The emphasis on verification over speed aligns with METR findings. Practical guidance includes starting conservatively with AI on low-stakes tasks.
- Blind trust in AI-generated code is a vulnerability
- AI tools function as 'Copilot, Not Autopilot'
- Human verification is the new development bottleneck
If you mentor junior developers: Do you guide them on appropriate AI use?
Note: Junior developer skill atrophy is a major 2025 concern. Mentors need to actively prevent over-reliance.
Sources 1
Canaries in the Coal Mine? Six Facts about the Recent Employment Effects of Artificial Intelligence
This is the most credible source on AI's employment impact on junior developers. The 13% relative decline for ages 22-25 in AI-exposed roles is significant but more nuanced than previously cited '25% decrease'. Key insight: the impact is concentrated where AI automates rather than augments - this supports our team_composition dimension's focus on mentorship and skill development. Updated November 2025.
- Early-career workers (ages 22-25) in AI-exposed occupations experienced 13% relative decline in employment
- Adjustments occur primarily through employment rather than compensation
- Employment declines concentrated in occupations where AI automates rather than augments labor
Context Curation & Specification
Skill in providing effective context, specifications, and configuration for AI tools
With 1M+ token context windows (Gemini) and automatic chain-of-thought (Opus 4.5, o3), 'prompt engineering' has evolved into 'context curation'. The skill is no longer tricking the model—it's providing the right codebase context, creating rule files, and writing clear specifications. Context engineering is replacing prompt engineering as the control mechanism for production AI.
What scope of codebase context do you typically provide to AI tools?
Note: Measures progression from no context to full codebase awareness. Rules files and visual context are separate questions.
Sources 2
GitHub Copilot Documentation - Supported AI Models
GitHub Copilot now supports multiple AI models including Claude 3.5 Sonnet, GPT-4o, and Gemini 1.5 Pro, allowing developers to switch models per conversation based on task requirements. This multi-model capability is foundational for model routing maturity - understanding that different models excel at different tasks enables cost-effective and capability-optimized AI development workflows.
- Multi-model support (Claude, GPT, Gemini)
- Model selection per conversation
Cursor Documentation - Rules for AI
Cursor's Rules for AI feature enables project-level AI configuration via .cursorrules files, allowing teams to define coding standards, conventions, and context that the AI follows automatically. Similar to GitHub's copilot-instructions.md but for the Cursor IDE ecosystem. Key for context_curation dimension - demonstrates cross-tool pattern of configuration-driven AI behavior.
- .cursorrules file format and usage
- Project-level AI configuration
Do you use visual context (screenshots, diagrams, images) when working with AI tools?
Note: Visual context is a power user technique - models interpret images with surprising accuracy
Sources 1
Power User AI Coding Workflow Tips
Practical power user tips for AI coding workflows. Key insights: (1) paste screenshots instead of describing bugs in text, (2) create reusable slash commands for repeated workflows. Both patterns dramatically reduce friction in AI-assisted development.
- Paste screenshots liberally - models handle images extremely well
- Create custom slash commands for repeated workflows
- Screenshot error messages, UI mockups, architecture diagrams
Do you use project-level AI configuration files?
Note: Team-shared configuration indicates organizational maturity. Custom slash commands are reusable macros for AI workflows.
Sources 3
Cursor Documentation - Rules for AI
Cursor's Rules for AI feature enables project-level AI configuration via .cursorrules files, allowing teams to define coding standards, conventions, and context that the AI follows automatically. Similar to GitHub's copilot-instructions.md but for the Cursor IDE ecosystem. Key for context_curation dimension - demonstrates cross-tool pattern of configuration-driven AI behavior.
- .cursorrules file format and usage
- Project-level AI configuration
How Many Instructions Can LLMs Follow?
Empirical research on LLM instruction-following limits. Key finding: even frontier models reliably follow only 150-200 instructions. Implications: keep instruction files focused, use hierarchical structure, include only genuinely useful guidance.
- Frontier LLMs can follow ~150-200 instructions with reasonable consistency
- Smaller models degrade much more quickly with instruction count
- Keep CLAUDE.md/instructions files under 500 lines
Power User AI Coding Workflow Tips
Practical power user tips for AI coding workflows. Key insights: (1) paste screenshots instead of describing bugs in text, (2) create reusable slash commands for repeated workflows. Both patterns dramatically reduce friction in AI-assisted development.
- Paste screenshots liberally - models handle images extremely well
- Create custom slash commands for repeated workflows
- Screenshot error messages, UI mockups, architecture diagrams
What level of specification do you provide when asking AI to implement features?
Sources 2
BMAD-METHOD: Breakthrough Method for Agile AI Driven Development
BMAD represents the multi-agent orchestration approach to AI development. Unlike simple chat-based AI assistance, BMAD uses specialized agents (Analyst, Architect, Developer, QA) coordinated by an orchestrator. Key innovation: zero context loss between tasks. Represents advanced maturity in agentic workflows.
- 19+ specialized AI agents with distinct roles (Analyst, Architect, Developer, QA)
- 50+ workflows covering development scenarios
- Scale-adaptive intelligence adjusts to task complexity
Spec-Driven Development with AI: Get Started with a New Open Source Toolkit
GitHub Spec Kit formalizes the spec-driven development approach where detailed specifications precede AI code generation. The four-phase workflow (Specify → Plan → Tasks → Implement) ensures human oversight at each checkpoint. This is the antidote to 'vibe coding' - structured, auditable AI development. Key for assessing advanced workflow maturity.
- Four-phase workflow: Specify, Plan, Tasks, Implement
- Specifications become executable artifacts
- Supports GitHub Copilot, Claude Code, Gemini CLI
When AI output doesn't meet your needs, what do you typically do?
Note: Model switching indicates advanced multi-model awareness
Sources 2
The 'Trust, But Verify' Pattern For AI-Assisted Engineering
This article provides the conceptual framework for our trust_calibration dimension. The three principles (Blind Trust is Vulnerability, Copilot Not Autopilot, Human Accountability Remains) directly inform our survey questions. The emphasis on verification over speed aligns with METR findings. Practical guidance includes starting conservatively with AI on low-stakes tasks.
- Blind trust in AI-generated code is a vulnerability
- AI tools function as 'Copilot, Not Autopilot'
- Human verification is the new development bottleneck
State of AI Code Quality 2025
This is the most comprehensive 2025 survey on AI code quality (609 developers). The key insight is the 'Confidence Flywheel' - context-rich suggestions reduce hallucinations, which improves quality, which builds trust. The finding that 80% of PRs don't receive human review when AI tools are enabled is critical for our agentic_supervision dimension. NOTE: The previously cited 1.7x issue rate and 41% commit stats were not found in the current report.
- 82% of developers use AI coding tools daily or weekly
- 65% of developers say at least a quarter of each commit is AI-generated
- 59% say AI has improved code quality
For complex tasks, how do you structure your AI interactions?
Note: Supervised agent usage shows mature agentic workflow understanding
Sources 2
BMAD-METHOD: Breakthrough Method for Agile AI Driven Development
BMAD represents the multi-agent orchestration approach to AI development. Unlike simple chat-based AI assistance, BMAD uses specialized agents (Analyst, Architect, Developer, QA) coordinated by an orchestrator. Key innovation: zero context loss between tasks. Represents advanced maturity in agentic workflows.
- 19+ specialized AI agents with distinct roles (Analyst, Architect, Developer, QA)
- 50+ workflows covering development scenarios
- Scale-adaptive intelligence adjusts to task complexity
Introducing Beads: A Coding Agent Memory System
Beads solves the 'context loss' problem in multi-session AI development. Rather than storing tasks in unstructured markdown, Beads uses Git-backed JSONL files that agents can query for 'ready' work. Key for long-horizon tasks spanning multiple days or sessions. Represents the frontier of AI workflow tooling for persistent memory.
- Git-backed issue tracker designed for AI coding agents
- Persistent session memory across restarts
- Dependency-aware task graph (DAG)
How do you manage AI context during long or complex coding sessions?
Sources 2
Document & Clear Method for AI Context Management
Introduces the Document & Clear pattern for managing AI context. Key insight: rather than trusting automatic context compaction, explicitly clear context and persist important state to external files. This produces more reliable outputs and gives you control over what the AI 'remembers'.
- Clear context aggressively with /clear when <50% of prior context is relevant
- Document & Clear method: dump plan to .md file, clear, restart with file reference
- Auto-compaction is unreliable - explicit clearing produces better outputs
Advanced Context Engineering for AI Agents
Introduces the Research → Plan → Implement workflow for AI-assisted development. Key insight: reviewing plans is higher leverage than reviewing code. The workflow explicitly clears context between phases to maintain focus and quality.
- Research → Plan → Implement workflow produces better results than direct coding
- Review the plan before implementation for maximum leverage
- Keep context utilization at 40-60% - clear between phases
For complex, unfamiliar tasks, do you run exploratory AI attempts first?
Sources 1
7 Prompting Habits of Highly Effective Engineers
Introduces the 'scout pattern' for AI-assisted development. Before committing to a complex task, run a throwaway attempt to discover where complexity lies, which files are involved, and what questions arise. This reconnaissance produces valuable context for the real implementation.
- Send out a scout before committing to learn where complexity lies
- Use throwaway attempts to learn which files get modified
- Failed attempts provide valuable context for the 'real' attempt
What workflow do you use for complex AI-assisted development tasks?
Sources 1
Advanced Context Engineering for AI Agents
Introduces the Research → Plan → Implement workflow for AI-assisted development. Key insight: reviewing plans is higher leverage than reviewing code. The workflow explicitly clears context between phases to maintain focus and quality.
- Research → Plan → Implement workflow produces better results than direct coding
- Review the plan before implementation for maximum leverage
- Keep context utilization at 40-60% - clear between phases
How do you use AI research capabilities (web search, citations) during planning phases?
Note: AI research capabilities with web search and citations are critical for planning phases. Copilot has @web, Claude Code has built-in search. Verifying citations is essential - AI can hallucinate sources.
Sources 1
Advanced Context Engineering for AI Agents
Introduces the Research → Plan → Implement workflow for AI-assisted development. Key insight: reviewing plans is higher leverage than reviewing code. The workflow explicitly clears context between phases to maintain focus and quality.
- Research → Plan → Implement workflow produces better results than direct coding
- Review the plan before implementation for maximum leverage
- Keep context utilization at 40-60% - clear between phases
Legacy System & Specialized Use Cases
Effective use of AI for legacy code modernization, COBOL migration, and complex codebase understanding
Legacy systems remain critical enterprise infrastructure. AI tools like GitHub Copilot and Claude Code are increasingly used for legacy code comprehension, documentation, and modernization—enabling organizations to reduce modernization timelines significantly.
Do you work with legacy codebases (COBOL, older Java, mainframe systems)?
Sources 1
State of AI-Assisted Software Development 2025
The 2025 DORA report introduces the 'AI Capabilities Model' identifying seven practices that amplify AI benefits. The core insight is that AI is an 'amplifier' - it magnifies existing organizational strengths AND weaknesses. Key stats: 89% of orgs prioritizing AI, 76% of devs using daily, but 39% have low trust. The trust research is critical: developers who trust AI more are more productive, but trust must be earned through organizational support (policies, training time, addressing concerns). The 451% adoption increase from acceptable-use policies is remarkable - clarity enables adoption.
- 89% of organizations prioritizing AI integration into applications
- 76% of technologists rely on AI for parts of their daily work
- 75% of developers report positive productivity impact from AI
How do you use AI tools for legacy code?
Sources 2
State of AI-Assisted Software Development 2025
The 2025 DORA report introduces the 'AI Capabilities Model' identifying seven practices that amplify AI benefits. The core insight is that AI is an 'amplifier' - it magnifies existing organizational strengths AND weaknesses. Key stats: 89% of orgs prioritizing AI, 76% of devs using daily, but 39% have low trust. The trust research is critical: developers who trust AI more are more productive, but trust must be earned through organizational support (policies, training time, addressing concerns). The 451% adoption increase from acceptable-use policies is remarkable - clarity enables adoption.
- 89% of organizations prioritizing AI integration into applications
- 76% of technologists rely on AI for parts of their daily work
- 75% of developers report positive productivity impact from AI
GitHub Copilot Documentation - Supported AI Models
GitHub Copilot now supports multiple AI models including Claude 3.5 Sonnet, GPT-4o, and Gemini 1.5 Pro, allowing developers to switch models per conversation based on task requirements. This multi-model capability is foundational for model routing maturity - understanding that different models excel at different tasks enables cost-effective and capability-optimized AI development workflows.
- Multi-model support (Claude, GPT, Gemini)
- Model selection per conversation
For AI-assisted legacy migration, what verification do you apply?
Note: Legacy migration requires highest verification—business logic is often undocumented
Sources 1
Research: Quantifying GitHub Copilot's impact in the enterprise with Accenture
This is the primary source for the 30% acceptance rate benchmark and the 88% code retention statistic. The 95% enjoyment and 90% fulfillment stats are powerful for adoption justification. The 84% increase in successful builds directly supports the claim that AI doesn't sacrifice quality for speed. Published May 2024, so represents mature Copilot usage patterns.
- 95% of developers said they enjoyed coding more with GitHub Copilot
- 90% of developers felt more fulfilled with their jobs when using GitHub Copilot
- Developers accepted around 30% of GitHub Copilot's suggestions
How do you use AI to document legacy code?
Sources 2
Long context | Gemini API Documentation
Gemini's 1M token context window is among the largest available, enabling whole-codebase understanding. Key for context_curation and model_routing dimensions.
- Gemini models support up to 1M token context window (1,048,576 tokens)
- Can process hours of video, audio, and 60,000+ lines of code in single context
- Gemini 2.5 Pro, 2.5 Flash, 3.0 Pro, 3.0 Flash all support 1M tokens
Research: Quantifying GitHub Copilot's impact in the enterprise with Accenture
This is the primary source for the 30% acceptance rate benchmark and the 88% code retention statistic. The 95% enjoyment and 90% fulfillment stats are powerful for adoption justification. The 84% increase in successful builds directly supports the claim that AI doesn't sacrifice quality for speed. Published May 2024, so represents mature Copilot usage patterns.
- 95% of developers said they enjoyed coding more with GitHub Copilot
- 90% of developers felt more fulfilled with their jobs when using GitHub Copilot
- Developers accepted around 30% of GitHub Copilot's suggestions
Legal & IP Compliance
Awareness and management of legal risks including copyright, licensing, and IP ownership for AI-generated code
35% of AI code samples have licensing irregularities (Software Freedom Conservancy). GitHub Copilot litigation is ongoing. US Copyright Office clarified AI training fair use in May 2025. Unmanaged legal risk can result in lawsuits and IP contamination.
Are you aware of the legal/IP risks of using AI-generated code?
Sources 3
Doe v. GitHub, Inc. - GitHub Copilot Class Action Lawsuit
Doe v. GitHub is a class action lawsuit alleging GitHub Copilot violated the DMCA and open-source licenses by training on publicly available code without proper attribution or consent. Most copyright claims were dismissed in May 2023, but breach of contract and open-source license violation claims survive, with an interlocutory appeal filed to the Ninth Circuit in October 2024. Outcome will establish legal precedent for how AI systems can use open-source materials.
- Ongoing litigation regarding Copilot training data
- Claims of copyright infringement
Copyright and Artificial Intelligence Study
This is an ongoing multi-part study by the Copyright Office. Part 2 (Copyrightability) is most relevant - it addresses whether AI-generated outputs can receive copyright protection. Part 3 (Generative AI Training) addresses fair use of copyrighted materials in training. The study is still evolving, so conclusions should be referenced with caution as 'current guidance' rather than final rulings.
- Part 1 (July 2024): Digital Replicas
- Part 2 (January 2025): Copyrightability of AI outputs
- Part 3 (May 2025 pre-publication): Generative AI Training
Thomson Reuters Enterprise Centre GmbH v. Ross Intelligence Inc.
In Thomson Reuters v. Ross Intelligence (2025), a US court ruled that training AI systems on copyrighted content (Westlaw's legal headnotes) does not qualify as fair use - the first judicial rejection of fair use defense in an AI training context. The ruling establishes that embedding copyrighted works into training data cannot be justified as fair use merely because AI outputs don't expose the original material. Critical precedent for legal_compliance dimension.
- Set precedent for AI training on copyrighted content
Does your organization have legal policies for AI-generated code?
Sources 2
State of AI-Assisted Software Development 2025
The 2025 DORA report introduces the 'AI Capabilities Model' identifying seven practices that amplify AI benefits. The core insight is that AI is an 'amplifier' - it magnifies existing organizational strengths AND weaknesses. Key stats: 89% of orgs prioritizing AI, 76% of devs using daily, but 39% have low trust. The trust research is critical: developers who trust AI more are more productive, but trust must be earned through organizational support (policies, training time, addressing concerns). The 451% adoption increase from acceptable-use policies is remarkable - clarity enables adoption.
- 89% of organizations prioritizing AI integration into applications
- 76% of technologists rely on AI for parts of their daily work
- 75% of developers report positive productivity impact from AI
GitHub Copilot IP Indemnification
GitHub Copilot Business and Enterprise tiers provide IP indemnification coverage up to $500,000 against copyright infringement claims for unmodified suggestions when public code filtering is enabled, while Individual and Pro tiers lack this protection. For enterprises evaluating AI adoption, tier-based indemnification represents a critical liability control mechanism. Key for legal_compliance dimension.
- GitHub Copilot Business/Enterprise includes IP indemnification
- Individual/free tier does not include indemnification
Do you scan AI-generated code for license contamination?
Note: 35% of AI code has licensing irregularities. Scanning is increasingly essential.
Sources 1
SFC Comments to US Copyright Office on Generative AI and Copyleft
The Software Freedom Conservancy's audit found that 35% of AI code samples have licensing irregularities, raising significant copyleft compliance concerns for organizations using AI coding tools. This is critical context for legal_compliance dimension - it quantifies the actual license compliance risk in AI-generated code.
- 35% of AI code samples have licensing irregularities
Does your AI coding tool vendor provide IP indemnification?
Note: GitHub Copilot Business/Enterprise and some Microsoft offerings include indemnification. Open-source tools typically don't.
Sources 1
GitHub Copilot IP Indemnification
GitHub Copilot Business and Enterprise tiers provide IP indemnification coverage up to $500,000 against copyright infringement claims for unmodified suggestions when public code filtering is enabled, while Individual and Pro tiers lack this protection. For enterprises evaluating AI adoption, tier-based indemnification represents a critical liability control mechanism. Key for legal_compliance dimension.
- GitHub Copilot Business/Enterprise includes IP indemnification
- Individual/free tier does not include indemnification
Do you document human involvement in AI-assisted code for copyright purposes?
Note: US Copyright Office requires 'meaningful human authorship' for copyright. Pure AI output may not be protectable.
Sources 1
Copyright and Artificial Intelligence Study
This is an ongoing multi-part study by the Copyright Office. Part 2 (Copyrightability) is most relevant - it addresses whether AI-generated outputs can receive copyright protection. Part 3 (Generative AI Training) addresses fair use of copyrighted materials in training. The study is still evolving, so conclusions should be referenced with caution as 'current guidance' rather than final rulings.
- Part 1 (July 2024): Digital Replicas
- Part 2 (January 2025): Copyrightability of AI outputs
- Part 3 (May 2025 pre-publication): Generative AI Training
Model Selection & Routing
Skill in selecting the right AI model for each task type
GitHub Copilot now supports Claude (Opus/Sonnet), GPT-4, and Gemini. Cursor and Windsurf offer even more options. Mature users know that different models excel at different tasks, and the 100x cost difference between models matters. IDC predicts 70% of top enterprises will use dynamic model routing by 2028.
Are you aware that GitHub Copilot supports multiple AI models (Claude, GPT, Gemini)?
Sources 1
GitHub Copilot Multi-Model Support
GitHub Copilot's multi-model support enables developers to choose the best model for each task. Key for model_routing dimension.
- GitHub Copilot supports multiple AI models from Anthropic, Google, and OpenAI
- GitHub CEO: 'The era of a single model is over'
- Developers can toggle between models during conversation
How do you select AI models for different coding tasks?
Sources 3
Developers are choosing older AI models — and the data explain why
Data-driven analysis showing that in production environments, developers are diversifying model usage rather than consolidating around the newest option. Sonnet 4.5 excels at multi-file reasoning but introduces latency; Sonnet 4.0 is faster and more consistent for structured tasks; GPT-5 excels at explanatory contexts. This supports the need for model routing strategies rather than single-model approaches.
- Model adoption is diversifying, not consolidating around one 'best' model
- Developers match models to specific task profiles rather than always using newest
- Sonnet 4.5 share dropped from 66% to 52% while Sonnet 4.0 rose from 23% to 37%
Anthropic API Pricing
Claude's tiered pricing shows a 25x cost difference from Opus ($5/$25) to Haiku ($1/$5) per million tokens, with Batch API offering 50% discounts and prompt caching up to 90% savings. Understanding these tiers is fundamental to cost-aware model routing - developers must evaluate whether tasks require Opus's advanced reasoning or if Haiku's speed and efficiency suffices. Key for model_routing dimension.
- Claude Opus 4.5: $5/$25 per million tokens (66% price drop from Opus 4.1)
- Claude Sonnet 4.5: $3/$15 per million tokens
- Claude Haiku 4.5: $1/$5 per million tokens
Why the Future of AI Lies in Model Routing
IDC predicts model routing will become standard in enterprise AI. Different models excel at different tasks, and using a single model for everything means suboptimal results. Heavy users have the most to gain from routing.
- By 2028, 70% of top AI-driven enterprises will use multi-tool architectures with dynamic model routing
- AI models work best when somewhat specialized for targeted use cases
- Even SOTA models are delivered as mixtures of experts with routing
Which of the following model-task pairings do you use?
Note: Thinking triggers (ultrathink) activate extended reasoning budgets in Claude Code. Gemini Deep Think uses parallel reasoning. Different cognitive styles for different tasks.
Sources 4
Claude Opus 4.5
Claude Opus 4.5 sets a new bar for AI coding with 80.9% SWE-bench Verified. Key for model_routing dimension - represents current state-of-the-art for complex coding tasks.
- 80.9% on SWE-bench Verified - first AI model over 80%
- 66.3% on OSWorld (best computer-using model)
- 89.4% on Aider Polyglot Coding benchmark
Long context | Gemini API Documentation
Gemini's 1M token context window is among the largest available, enabling whole-codebase understanding. Key for context_curation and model_routing dimensions.
- Gemini models support up to 1M token context window (1,048,576 tokens)
- Can process hours of video, audio, and 60,000+ lines of code in single context
- Gemini 2.5 Pro, 2.5 Flash, 3.0 Pro, 3.0 Flash all support 1M tokens
Gemini 2.5 and 3: Thinking Models with Deep Think
Gemini 2.5 Pro (March 2025) introduced 'thinking models' with 1M context. Deep Think mode extends inference time for complex reasoning tasks, achieving Bronze IMO performance. Gemini 3 Pro announced November 2025 replaces 2.5 as the flagship. Critical for understanding the 'reasoning model' paradigm shift and extended thinking capabilities.
- Gemini 2.5 Pro released March 2025 with 1M token context window
- Deep Think mode uses extended inference time for complex reasoning
- Bronze-level performance on 2025 IMO benchmark
OpenAI o3, o4-mini, GPT-5-Codex, and Codex Platform
OpenAI's 2025 agentic coding stack evolved rapidly: o3 (April 2025, 71.7% SWE-bench) → codex-1 (72.1%/83.8%) → GPT-5-Codex (September 2025, 74.5%). The Codex platform represents OpenAI's vision for cloud-based multi-agent software engineering. Note: o3/o4-mini benchmarks are superseded by GPT-5-Codex for current capabilities.
- o3 released April 16, 2025 with 71.7% on SWE-bench Verified
- o4-mini released as successor to o3-mini
- Codex platform: cloud-based multi-agent software engineering
Are you aware of the cost differences between AI models?
Sources 1
Anthropic API Pricing
Claude's tiered pricing shows a 25x cost difference from Opus ($5/$25) to Haiku ($1/$5) per million tokens, with Batch API offering 50% discounts and prompt caching up to 90% savings. Understanding these tiers is fundamental to cost-aware model routing - developers must evaluate whether tasks require Opus's advanced reasoning or if Haiku's speed and efficiency suffices. Key for model_routing dimension.
- Claude Opus 4.5: $5/$25 per million tokens (66% price drop from Opus 4.1)
- Claude Sonnet 4.5: $3/$15 per million tokens
- Claude Haiku 4.5: $1/$5 per million tokens
When selecting a model for a task, do you consider speed/latency tradeoffs?
Sources 1
Developers are choosing older AI models — and the data explain why
Data-driven analysis showing that in production environments, developers are diversifying model usage rather than consolidating around the newest option. Sonnet 4.5 excels at multi-file reasoning but introduces latency; Sonnet 4.0 is faster and more consistent for structured tasks; GPT-5 excels at explanatory contexts. This supports the need for model routing strategies rather than single-model approaches.
- Model adoption is diversifying, not consolidating around one 'best' model
- Developers match models to specific task profiles rather than always using newest
- Sonnet 4.5 share dropped from 66% to 52% while Sonnet 4.0 rose from 23% to 37%
Organizational Integration
Team and organizational practices for AI-assisted development, including agentic workflows
Individual skill without team practices leads to inconsistent quality. DORA 2025: AI amplifies existing capabilities—strong teams get stronger, struggling teams get worse. Organizational readiness determines whether AI helps or hurts.
Does your team have guidelines for when/how to use Copilot?
Sources 2
State of AI-Assisted Software Development 2025
The 2025 DORA report introduces the 'AI Capabilities Model' identifying seven practices that amplify AI benefits. The core insight is that AI is an 'amplifier' - it magnifies existing organizational strengths AND weaknesses. Key stats: 89% of orgs prioritizing AI, 76% of devs using daily, but 39% have low trust. The trust research is critical: developers who trust AI more are more productive, but trust must be earned through organizational support (policies, training time, addressing concerns). The 451% adoption increase from acceptable-use policies is remarkable - clarity enables adoption.
- 89% of organizations prioritizing AI integration into applications
- 76% of technologists rely on AI for parts of their daily work
- 75% of developers report positive productivity impact from AI
State of AI Code Quality 2025
This is the most comprehensive 2025 survey on AI code quality (609 developers). The key insight is the 'Confidence Flywheel' - context-rich suggestions reduce hallucinations, which improves quality, which builds trust. The finding that 80% of PRs don't receive human review when AI tools are enabled is critical for our agentic_supervision dimension. NOTE: The previously cited 1.7x issue rate and 41% commit stats were not found in the current report.
- 82% of developers use AI coding tools daily or weekly
- 65% of developers say at least a quarter of each commit is AI-generated
- 59% say AI has improved code quality
How does your team handle code review for AI-generated code?
Note: 2025 update: AI-assisted code review (CodeRabbit, etc.) is emerging practice for managing volume
Sources 2
State of AI Code Quality 2025
This is the most comprehensive 2025 survey on AI code quality (609 developers). The key insight is the 'Confidence Flywheel' - context-rich suggestions reduce hallucinations, which improves quality, which builds trust. The finding that 80% of PRs don't receive human review when AI tools are enabled is critical for our agentic_supervision dimension. NOTE: The previously cited 1.7x issue rate and 41% commit stats were not found in the current report.
- 82% of developers use AI coding tools daily or weekly
- 65% of developers say at least a quarter of each commit is AI-generated
- 59% say AI has improved code quality
October 2025 Update: GenAI Code Security Report
This is the primary source for our 45% security vulnerability claim. The October 2025 update confirms that AI code security issues persist even with newer models. The finding that 'bigger models ≠ more secure code' is important for our model_routing dimension - it suggests security scanning is needed regardless of which model is used. The 72% Java-specific rate mentioned in our citations may be from the full PDF report.
- AI-generated code introduced risky security flaws in 45% of tests
- 100+ LLMs tested across Java, JavaScript, Python, and C#
- Larger, newer AI models didn't improve security
Does your organization have policies for agent mode / autonomous AI coding?
Sources 2
GitHub Copilot Agent Mode Documentation
GitHub Copilot's agent mode enables autonomous multi-file editing, allowing AI to plan and execute complex changes across a codebase without step-by-step human approval. This capability requires careful supervision practices since agents can introduce cascading errors across multiple files. Critical for agentic_supervision dimension - assessing how organizations manage autonomous AI coding.
- Agent mode can edit multiple files autonomously
- Supervision best practices
State of AI-Assisted Software Development 2025
The 2025 DORA report introduces the 'AI Capabilities Model' identifying seven practices that amplify AI benefits. The core insight is that AI is an 'amplifier' - it magnifies existing organizational strengths AND weaknesses. Key stats: 89% of orgs prioritizing AI, 76% of devs using daily, but 39% have low trust. The trust research is critical: developers who trust AI more are more productive, but trust must be earned through organizational support (policies, training time, addressing concerns). The 451% adoption increase from acceptable-use policies is remarkable - clarity enables adoption.
- 89% of organizations prioritizing AI integration into applications
- 76% of technologists rely on AI for parts of their daily work
- 75% of developers report positive productivity impact from AI
What AI coding tool training have you received?
Sources 2
State of AI-Assisted Software Development 2025
The 2025 DORA report introduces the 'AI Capabilities Model' identifying seven practices that amplify AI benefits. The core insight is that AI is an 'amplifier' - it magnifies existing organizational strengths AND weaknesses. Key stats: 89% of orgs prioritizing AI, 76% of devs using daily, but 39% have low trust. The trust research is critical: developers who trust AI more are more productive, but trust must be earned through organizational support (policies, training time, addressing concerns). The 451% adoption increase from acceptable-use policies is remarkable - clarity enables adoption.
- 89% of organizations prioritizing AI integration into applications
- 76% of technologists rely on AI for parts of their daily work
- 75% of developers report positive productivity impact from AI
State of AI Code Quality 2025
This is the most comprehensive 2025 survey on AI code quality (609 developers). The key insight is the 'Confidence Flywheel' - context-rich suggestions reduce hallucinations, which improves quality, which builds trust. The finding that 80% of PRs don't receive human review when AI tools are enabled is critical for our agentic_supervision dimension. NOTE: The previously cited 1.7x issue rate and 41% commit stats were not found in the current report.
- 82% of developers use AI coding tools daily or weekly
- 65% of developers say at least a quarter of each commit is AI-generated
- 59% say AI has improved code quality
How does your team share AI coding tips, tricks, and learnings?
Sources 2
State of AI Code Quality 2025
This is the most comprehensive 2025 survey on AI code quality (609 developers). The key insight is the 'Confidence Flywheel' - context-rich suggestions reduce hallucinations, which improves quality, which builds trust. The finding that 80% of PRs don't receive human review when AI tools are enabled is critical for our agentic_supervision dimension. NOTE: The previously cited 1.7x issue rate and 41% commit stats were not found in the current report.
- 82% of developers use AI coding tools daily or weekly
- 65% of developers say at least a quarter of each commit is AI-generated
- 59% say AI has improved code quality
State of AI-Assisted Software Development 2025
The 2025 DORA report introduces the 'AI Capabilities Model' identifying seven practices that amplify AI benefits. The core insight is that AI is an 'amplifier' - it magnifies existing organizational strengths AND weaknesses. Key stats: 89% of orgs prioritizing AI, 76% of devs using daily, but 39% have low trust. The trust research is critical: developers who trust AI more are more productive, but trust must be earned through organizational support (policies, training time, addressing concerns). The 451% adoption increase from acceptable-use policies is remarkable - clarity enables adoption.
- 89% of organizations prioritizing AI integration into applications
- 76% of technologists rely on AI for parts of their daily work
- 75% of developers report positive productivity impact from AI
Does AI-generated code have different testing requirements?
Note: Given 45% vulnerability rate (Veracode 2025), security scanning for AI code is recommended
Sources 1
October 2025 Update: GenAI Code Security Report
This is the primary source for our 45% security vulnerability claim. The October 2025 update confirms that AI code security issues persist even with newer models. The finding that 'bigger models ≠ more secure code' is important for our model_routing dimension - it suggests security scanning is needed regardless of which model is used. The 72% Java-specific rate mentioned in our citations may be from the full PDF report.
- AI-generated code introduced risky security flaws in 45% of tests
- 100+ LLMs tested across Java, JavaScript, Python, and C#
- Larger, newer AI models didn't improve security
Does your organization track AI coding tool effectiveness?
Note: DORA 2025 recommends three-layer measurement: adoption, behavior, outcomes. Leading AI tool vendors track ARR growth as adoption proxy.
Sources 2
State of AI-Assisted Software Development 2025
The 2025 DORA report introduces the 'AI Capabilities Model' identifying seven practices that amplify AI benefits. The core insight is that AI is an 'amplifier' - it magnifies existing organizational strengths AND weaknesses. Key stats: 89% of orgs prioritizing AI, 76% of devs using daily, but 39% have low trust. The trust research is critical: developers who trust AI more are more productive, but trust must be earned through organizational support (policies, training time, addressing concerns). The 451% adoption increase from acceptable-use policies is remarkable - clarity enables adoption.
- 89% of organizations prioritizing AI integration into applications
- 76% of technologists rely on AI for parts of their daily work
- 75% of developers report positive productivity impact from AI
Cognition-Windsurf Acquisition and Consolidation
The July 2025 Cognition-Windsurf deal illustrates rapid AI coding market consolidation. The bidding war (OpenAI $3B, Google $2.4B acqui-hire, Cognition acquisition) shows the strategic value of AI coding tools. Cognition's $10.2B valuation post-merger signals enterprise confidence in agentic coding.
- Cognition acquired Windsurf July 2025 after Google hired CEO in $2.4B deal
- OpenAI's $3B Windsurf offer expired just hours before Google deal
- Acquisition included IP, trademark, product, and $82M ARR
Does your organization provide dedicated time for AI tool learning?
Note: DORA 2025: Dedicated learning time leads to 131% increase in team AI adoption
Sources 1
State of AI-Assisted Software Development 2025
The 2025 DORA report introduces the 'AI Capabilities Model' identifying seven practices that amplify AI benefits. The core insight is that AI is an 'amplifier' - it magnifies existing organizational strengths AND weaknesses. Key stats: 89% of orgs prioritizing AI, 76% of devs using daily, but 39% have low trust. The trust research is critical: developers who trust AI more are more productive, but trust must be earned through organizational support (policies, training time, addressing concerns). The 451% adoption increase from acceptable-use policies is remarkable - clarity enables adoption.
- 89% of organizations prioritizing AI integration into applications
- 76% of technologists rely on AI for parts of their daily work
- 75% of developers report positive productivity impact from AI
Has your organization redesigned workflows specifically for AI tools?
Note: McKinsey 2025: High performers are 3x more likely to redesign workflows for AI. Rakuten achieved 79% time reduction with Slack-integrated AI coding.
Sources 2
The State of AI in 2025: Agents, Innovation, and Transformation
McKinsey's 2025 survey shows AI use is common (88%) but enterprise value capture is rare (only 39% see EBIT impact). The key differentiator is workflow redesign - high performers are 3x more likely to fundamentally redesign workflows. The 62% experimenting with agents stat is critical for agentic_supervision. Key insight: most organizations are still in pilots, not scaled adoption.
- 88% report regular AI use in at least one business function (up from 78%)
- Nearly two-thirds still in experimentation or piloting phases
- 62% experimenting with AI agents; 23% scaling agents
Claude Code in Slack: Agentic Coding Integration
Claude Code's Slack integration represents the 'ambient AI' pattern: AI agents triggered from natural team conversations, not dedicated coding interfaces. The $1B revenue milestone and enterprise customers (Netflix, Spotify) validate the market. Rakuten's 79% timeline reduction is a standout case study.
- Claude Code in Slack launched December 8, 2025 as research preview
- @Claude tag routes coding tasks to Claude Code on web automatically
- Analyzes Slack context (bug reports, feature requests) for repository detection
Perceived Outcomes & Satisfaction
Self-reported productivity and satisfaction metrics (SPACE framework)
These don't measure maturity directly, but they indicate whether Copilot is delivering value. DORA 2025: AI adoption now positively correlates with throughput (unlike 2024) but still negatively with stability.
Copilot helps me complete tasks faster.
Sources 2
Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity
This is the most rigorous 2025 study on AI coding productivity. The RCT methodology (16 experienced developers, 246 tasks, $150/hr compensation) makes this highly credible. The 39-44 percentage point gap between perceived and actual productivity is the key insight for our trust_calibration dimension. This directly supports recommendations about not over-trusting AI suggestions and maintaining verification practices.
- Experienced developers were 19% slower with AI
- Developers perceived 20% speedup (39-44 percentage point gap)
- Self-reported productivity may not reflect reality
Research: Quantifying GitHub Copilot's impact in the enterprise with Accenture
This is the primary source for the 30% acceptance rate benchmark and the 88% code retention statistic. The 95% enjoyment and 90% fulfillment stats are powerful for adoption justification. The 84% increase in successful builds directly supports the claim that AI doesn't sacrifice quality for speed. Published May 2024, so represents mature Copilot usage patterns.
- 95% of developers said they enjoyed coding more with GitHub Copilot
- 90% of developers felt more fulfilled with their jobs when using GitHub Copilot
- Developers accepted around 30% of GitHub Copilot's suggestions
Copilot helps me stay in flow when coding.
Sources 2
State of AI-Assisted Software Development 2025
The 2025 DORA report introduces the 'AI Capabilities Model' identifying seven practices that amplify AI benefits. The core insight is that AI is an 'amplifier' - it magnifies existing organizational strengths AND weaknesses. Key stats: 89% of orgs prioritizing AI, 76% of devs using daily, but 39% have low trust. The trust research is critical: developers who trust AI more are more productive, but trust must be earned through organizational support (policies, training time, addressing concerns). The 451% adoption increase from acceptable-use policies is remarkable - clarity enables adoption.
- 89% of organizations prioritizing AI integration into applications
- 76% of technologists rely on AI for parts of their daily work
- 75% of developers report positive productivity impact from AI
Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity
This is the most rigorous 2025 study on AI coding productivity. The RCT methodology (16 experienced developers, 246 tasks, $150/hr compensation) makes this highly credible. The 39-44 percentage point gap between perceived and actual productivity is the key insight for our trust_calibration dimension. This directly supports recommendations about not over-trusting AI suggestions and maintaining verification practices.
- Experienced developers were 19% slower with AI
- Developers perceived 20% speedup (39-44 percentage point gap)
- Self-reported productivity may not reflect reality
Compared to coding without Copilot, how much time do you estimate you save per week?
Note: JetBrains 2025: 19% of developers save 8+ hours/week (up from 9% in 2024). 6+ hours indicates power user territory.
Sources 2
Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity
This is the most rigorous 2025 study on AI coding productivity. The RCT methodology (16 experienced developers, 246 tasks, $150/hr compensation) makes this highly credible. The 39-44 percentage point gap between perceived and actual productivity is the key insight for our trust_calibration dimension. This directly supports recommendations about not over-trusting AI suggestions and maintaining verification practices.
- Experienced developers were 19% slower with AI
- Developers perceived 20% speedup (39-44 percentage point gap)
- Self-reported productivity may not reflect reality
The State of Developer Ecosystem 2025
The 2025 JetBrains survey of 24,534 developers shows AI tools have become mainstream (85% regular usage). The 68% expecting AI proficiency to become a job requirement is critical for skill development. The finding that 19% save 8+ hours/week (up from 9% in 2024) shows productivity gains are real for power users. Key insight: developers want AI for mundane tasks but want control of creative/complex work.
- 85% of developers regularly use AI tools for coding and development
- 62% use at least one AI coding assistant, agent, or code editor
- 68% expect AI proficiency will become a job requirement
I enjoy my work more when using Copilot.
Sources 2
State of AI-Assisted Software Development 2025
The 2025 DORA report introduces the 'AI Capabilities Model' identifying seven practices that amplify AI benefits. The core insight is that AI is an 'amplifier' - it magnifies existing organizational strengths AND weaknesses. Key stats: 89% of orgs prioritizing AI, 76% of devs using daily, but 39% have low trust. The trust research is critical: developers who trust AI more are more productive, but trust must be earned through organizational support (policies, training time, addressing concerns). The 451% adoption increase from acceptable-use policies is remarkable - clarity enables adoption.
- 89% of organizations prioritizing AI integration into applications
- 76% of technologists rely on AI for parts of their daily work
- 75% of developers report positive productivity impact from AI
State of AI Code Quality 2025
This is the most comprehensive 2025 survey on AI code quality (609 developers). The key insight is the 'Confidence Flywheel' - context-rich suggestions reduce hallucinations, which improves quality, which builds trust. The finding that 80% of PRs don't receive human review when AI tools are enabled is critical for our agentic_supervision dimension. NOTE: The previously cited 1.7x issue rate and 41% commit stats were not found in the current report.
- 82% of developers use AI coding tools daily or weekly
- 65% of developers say at least a quarter of each commit is AI-generated
- 59% say AI has improved code quality
The code I produce with Copilot is high quality.
Sources 1
State of AI vs Human Code Generation Report
This is the most rigorous empirical comparison of AI vs human code quality to date. The 1.7x issue rate and specific vulnerability multipliers (2.74x XSS, 1.88x password handling) are critical for trust_calibration recommendations. Key insight: AI makes the same kinds of mistakes humans do, just more often at larger scale. The 8x I/O performance issue rate shows AI favors simple patterns over efficiency.
- AI-generated PRs contain 1.7x more issues overall (10.83 vs 6.45 issues per PR)
- AI PRs show 1.4-1.7x more critical and major issues
- Logic and correctness issues 75% more common in AI PRs
Have you ever measured your actual productivity with vs. without AI tools?
Sources 1
Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity
This is the most rigorous 2025 study on AI coding productivity. The RCT methodology (16 experienced developers, 246 tasks, $150/hr compensation) makes this highly credible. The 39-44 percentage point gap between perceived and actual productivity is the key insight for our trust_calibration dimension. This directly supports recommendations about not over-trusting AI suggestions and maintaining verification practices.
- Experienced developers were 19% slower with AI
- Developers perceived 20% speedup (39-44 percentage point gap)
- Self-reported productivity may not reflect reality
Does your organization track business ROI from AI coding tools (not just usage)?
Note: McKinsey 2025: 88% use AI but only 39% see EBIT impact. Most organizations haven't connected AI adoption to business outcomes.
Sources 1
The State of AI in 2025: Agents, Innovation, and Transformation
McKinsey's 2025 survey shows AI use is common (88%) but enterprise value capture is rare (only 39% see EBIT impact). The key differentiator is workflow redesign - high performers are 3x more likely to fundamentally redesign workflows. The 62% experimenting with agents stat is critical for agentic_supervision. Key insight: most organizations are still in pilots, not scaled adoption.
- 88% report regular AI use in at least one business function (up from 78%)
- Nearly two-thirds still in experimentation or piloting phases
- 62% experimenting with AI agents; 23% scaling agents
Has your trust in AI coding tools changed over the past 12 months?
Note: Stack Overflow 2025: Positive sentiment dropped from 70%+ to 60%. Calibrated trust (stable or slightly cautious) indicates maturity. Both extremes (burned or heavy reliance) may indicate issues.
Sources 1
2025 Stack Overflow Developer Survey
2025 Stack Overflow survey shows continued adoption (84%) but declining trust (60% positive, down from 70%+). Key insight: developers are using AI more but trusting it less. 35% use Stack Overflow as fallback when AI fails.
- 84% of respondents using or planning to use AI tools (up from 76% in 2024)
- 51% of professional developers use AI tools daily
- Positive sentiment dropped from 70%+ (2023-2024) to 60% (2025)
Team Composition & Talent Development
How AI adoption is affecting team structure, junior developer development, and hiring practices
AI presents both risks AND opportunities for junior developer development. While Stanford research shows employment shifts in AI-exposed roles, GitHub research shows juniors who adapt—supervising AI output, maintaining curiosity, building fundamentals—can thrive. The skill surface shifts from 'write code' to 'verify and supervise AI code'. Teams that invest in updated mentorship create accelerated apprenticeships where juniors gain exposure to architecture and system design earlier.
How has your team's junior-to-senior developer ratio changed with AI adoption?
Note: Dramatic reduction in juniors is a long-term talent pipeline risk
Sources 2
Canaries in the Coal Mine? Six Facts about the Recent Employment Effects of Artificial Intelligence
This is the most credible source on AI's employment impact on junior developers. The 13% relative decline for ages 22-25 in AI-exposed roles is significant but more nuanced than previously cited '25% decrease'. Key insight: the impact is concentrated where AI automates rather than augments - this supports our team_composition dimension's focus on mentorship and skill development. Updated November 2025.
- Early-career workers (ages 22-25) in AI-exposed occupations experienced 13% relative decline in employment
- Adjustments occur primarily through employment rather than compensation
- Employment declines concentrated in occupations where AI automates rather than augments labor
State of AI-Assisted Software Development 2025
The 2025 DORA report introduces the 'AI Capabilities Model' identifying seven practices that amplify AI benefits. The core insight is that AI is an 'amplifier' - it magnifies existing organizational strengths AND weaknesses. Key stats: 89% of orgs prioritizing AI, 76% of devs using daily, but 39% have low trust. The trust research is critical: developers who trust AI more are more productive, but trust must be earned through organizational support (policies, training time, addressing concerns). The 451% adoption increase from acceptable-use policies is remarkable - clarity enables adoption.
- 89% of organizations prioritizing AI integration into applications
- 76% of technologists rely on AI for parts of their daily work
- 75% of developers report positive productivity impact from AI
How do you ensure junior developers build foundational skills despite AI availability?
Note: Teams with no special junior development practices risk long-term skill atrophy
Sources 2
AI Won't Kill Junior Devs - But Your Hiring Strategy Might
Addy Osmani reframes the junior developer AI debate from risk to opportunity. Key insight: AI accelerates careers for juniors who adapt by shifting from 'write code' to 'supervise AI code'. Teams with updated mentorship create accelerated apprenticeships. The real threat is hiring strategies, not AI itself.
- AI is a career accelerant for juniors who adapt
- Skill surface shifts from 'write code' to 'verify and supervise AI code'
- Updated mentorship: coach on integrating AI without over-dependence
Junior developers aren't obsolete: Here's how to thrive in the age of AI
GitHub's perspective on how junior developers can thrive with AI. Key insight: AI changes expectations - juniors must supervise AI output, not blindly accept it. Fundamentals and critical thinking remain essential.
- AI is closing skill gaps but creating new expectations for juniors
- Juniors expected to supervise AI's work, not just accept it blindly
- Think critically about AI-generated code, stay curious
How has AI affected your hiring criteria?
Note: Dramatic changes in hiring criteria may indicate short-term optimization at expense of long-term capability
Sources 2
Canaries in the Coal Mine? Six Facts about the Recent Employment Effects of Artificial Intelligence
This is the most credible source on AI's employment impact on junior developers. The 13% relative decline for ages 22-25 in AI-exposed roles is significant but more nuanced than previously cited '25% decrease'. Key insight: the impact is concentrated where AI automates rather than augments - this supports our team_composition dimension's focus on mentorship and skill development. Updated November 2025.
- Early-career workers (ages 22-25) in AI-exposed occupations experienced 13% relative decline in employment
- Adjustments occur primarily through employment rather than compensation
- Employment declines concentrated in occupations where AI automates rather than augments labor
State of AI-Assisted Software Development 2025
The 2025 DORA report introduces the 'AI Capabilities Model' identifying seven practices that amplify AI benefits. The core insight is that AI is an 'amplifier' - it magnifies existing organizational strengths AND weaknesses. Key stats: 89% of orgs prioritizing AI, 76% of devs using daily, but 39% have low trust. The trust research is critical: developers who trust AI more are more productive, but trust must be earned through organizational support (policies, training time, addressing concerns). The 451% adoption increase from acceptable-use policies is remarkable - clarity enables adoption.
- 89% of organizations prioritizing AI integration into applications
- 76% of technologists rely on AI for parts of their daily work
- 75% of developers report positive productivity impact from AI
How is AI affecting productivity distribution across your team?
Sources 2
Research: Quantifying GitHub Copilot's impact in the enterprise with Accenture
This is the primary source for the 30% acceptance rate benchmark and the 88% code retention statistic. The 95% enjoyment and 90% fulfillment stats are powerful for adoption justification. The 84% increase in successful builds directly supports the claim that AI doesn't sacrifice quality for speed. Published May 2024, so represents mature Copilot usage patterns.
- 95% of developers said they enjoyed coding more with GitHub Copilot
- 90% of developers felt more fulfilled with their jobs when using GitHub Copilot
- Developers accepted around 30% of GitHub Copilot's suggestions
Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity
This is the most rigorous 2025 study on AI coding productivity. The RCT methodology (16 experienced developers, 246 tasks, $150/hr compensation) makes this highly credible. The 39-44 percentage point gap between perceived and actual productivity is the key insight for our trust_calibration dimension. This directly supports recommendations about not over-trusting AI suggestions and maintaining verification practices.
- Experienced developers were 19% slower with AI
- Developers perceived 20% speedup (39-44 percentage point gap)
- Self-reported productivity may not reflect reality
How has your mentorship approach adapted for AI-assisted development?
Sources 2
AI Won't Kill Junior Devs - But Your Hiring Strategy Might
Addy Osmani reframes the junior developer AI debate from risk to opportunity. Key insight: AI accelerates careers for juniors who adapt by shifting from 'write code' to 'supervise AI code'. Teams with updated mentorship create accelerated apprenticeships. The real threat is hiring strategies, not AI itself.
- AI is a career accelerant for juniors who adapt
- Skill surface shifts from 'write code' to 'verify and supervise AI code'
- Updated mentorship: coach on integrating AI without over-dependence
Junior developers aren't obsolete: Here's how to thrive in the age of AI
GitHub's perspective on how junior developers can thrive with AI. Key insight: AI changes expectations - juniors must supervise AI output, not blindly accept it. Fundamentals and critical thinking remain essential.
- AI is closing skill gaps but creating new expectations for juniors
- Juniors expected to supervise AI's work, not just accept it blindly
- Think critically about AI-generated code, stay curious
Are juniors using AI as a learning accelerator or a shortcut?
Sources 1
AI Won't Kill Junior Devs - But Your Hiring Strategy Might
Addy Osmani reframes the junior developer AI debate from risk to opportunity. Key insight: AI accelerates careers for juniors who adapt by shifting from 'write code' to 'supervise AI code'. Teams with updated mentorship create accelerated apprenticeships. The real threat is hiring strategies, not AI itself.
- AI is a career accelerant for juniors who adapt
- Skill surface shifts from 'write code' to 'verify and supervise AI code'
- Updated mentorship: coach on integrating AI without over-dependence
Trust Calibration & Verification
Appropriate skepticism and verification practices for AI-generated code, including managing review fatigue
Veracode 2025: 45% of AI-generated code contains security vulnerabilities. Qodo 2025: 25% of developers estimate 1 in 5 AI suggestions contain errors. Yet METR found experienced devs feel 20% faster while actually being 19% slower—a dangerous perception gap. Review fatigue is now a critical concern.
What do you typically do before accepting a Copilot suggestion?
Sources 2
The 'Trust, But Verify' Pattern For AI-Assisted Engineering
This article provides the conceptual framework for our trust_calibration dimension. The three principles (Blind Trust is Vulnerability, Copilot Not Autopilot, Human Accountability Remains) directly inform our survey questions. The emphasis on verification over speed aligns with METR findings. Practical guidance includes starting conservatively with AI on low-stakes tasks.
- Blind trust in AI-generated code is a vulnerability
- AI tools function as 'Copilot, Not Autopilot'
- Human verification is the new development bottleneck
October 2025 Update: GenAI Code Security Report
This is the primary source for our 45% security vulnerability claim. The October 2025 update confirms that AI code security issues persist even with newer models. The finding that 'bigger models ≠ more secure code' is important for our model_routing dimension - it suggests security scanning is needed regardless of which model is used. The 72% Java-specific rate mentioned in our citations may be from the full PDF report.
- AI-generated code introduced risky security flaws in 45% of tests
- 100+ LLMs tested across Java, JavaScript, Python, and C#
- Larger, newer AI models didn't improve security
Approximately what percentage of Copilot suggestions do you accept?
Note: 2025 benchmark: 30-33% is healthy. Healthcare/regulated at 50-60%. Startups at 75% (too high). >70% is a red flag.
Sources 2
Research: Quantifying GitHub Copilot's impact in the enterprise with Accenture
This is the primary source for the 30% acceptance rate benchmark and the 88% code retention statistic. The 95% enjoyment and 90% fulfillment stats are powerful for adoption justification. The 84% increase in successful builds directly supports the claim that AI doesn't sacrifice quality for speed. Published May 2024, so represents mature Copilot usage patterns.
- 95% of developers said they enjoyed coding more with GitHub Copilot
- 90% of developers felt more fulfilled with their jobs when using GitHub Copilot
- Developers accepted around 30% of GitHub Copilot's suggestions
State of AI Code Quality 2025
This is the most comprehensive 2025 survey on AI code quality (609 developers). The key insight is the 'Confidence Flywheel' - context-rich suggestions reduce hallucinations, which improves quality, which builds trust. The finding that 80% of PRs don't receive human review when AI tools are enabled is critical for our agentic_supervision dimension. NOTE: The previously cited 1.7x issue rate and 41% commit stats were not found in the current report.
- 82% of developers use AI coding tools daily or weekly
- 65% of developers say at least a quarter of each commit is AI-generated
- 59% say AI has improved code quality
In the past month, how often did you discover an error in AI-generated code AFTER accepting it?
Note: 1-2 times scores highest—indicates you use AI enough to encounter issues but catch most before acceptance
Sources 2
October 2025 Update: GenAI Code Security Report
This is the primary source for our 45% security vulnerability claim. The October 2025 update confirms that AI code security issues persist even with newer models. The finding that 'bigger models ≠ more secure code' is important for our model_routing dimension - it suggests security scanning is needed regardless of which model is used. The 72% Java-specific rate mentioned in our citations may be from the full PDF report.
- AI-generated code introduced risky security flaws in 45% of tests
- 100+ LLMs tested across Java, JavaScript, Python, and C#
- Larger, newer AI models didn't improve security
State of AI Code Quality 2025
This is the most comprehensive 2025 survey on AI code quality (609 developers). The key insight is the 'Confidence Flywheel' - context-rich suggestions reduce hallucinations, which improves quality, which builds trust. The finding that 80% of PRs don't receive human review when AI tools are enabled is critical for our agentic_supervision dimension. NOTE: The previously cited 1.7x issue rate and 41% commit stats were not found in the current report.
- 82% of developers use AI coding tools daily or weekly
- 65% of developers say at least a quarter of each commit is AI-generated
- 59% say AI has improved code quality
Are you aware that 45% of AI-generated code contains security vulnerabilities (Veracode 2025)?
Sources 1
October 2025 Update: GenAI Code Security Report
This is the primary source for our 45% security vulnerability claim. The October 2025 update confirms that AI code security issues persist even with newer models. The finding that 'bigger models ≠ more secure code' is important for our model_routing dimension - it suggests security scanning is needed regardless of which model is used. The 72% Java-specific rate mentioned in our citations may be from the full PDF report.
- AI-generated code introduced risky security flaws in 45% of tests
- 100+ LLMs tested across Java, JavaScript, Python, and C#
- Larger, newer AI models didn't improve security
CodeRabbit 2025 found AI code has 2.74x more XSS, 8x more I/O performance issues. Do you check for these patterns?
Note: CodeRabbit analyzed 470 PRs: AI code has 1.7x overall issues, with security (2.74x XSS) and performance (8x I/O) as top concerns
Sources 1
State of AI vs Human Code Generation Report
This is the most rigorous empirical comparison of AI vs human code quality to date. The 1.7x issue rate and specific vulnerability multipliers (2.74x XSS, 1.88x password handling) are critical for trust_calibration recommendations. Key insight: AI makes the same kinds of mistakes humans do, just more often at larger scale. The 8x I/O performance issue rate shows AI favors simple patterns over efficiency.
- AI-generated PRs contain 1.7x more issues overall (10.83 vs 6.45 issues per PR)
- AI PRs show 1.4-1.7x more critical and major issues
- Logic and correctness issues 75% more common in AI PRs
GitClear found 8x increase in code duplication from AI tools. Do you monitor for this?
Note: GitClear 2025: AI makes it easier to add new code than reuse existing (limited context). Refactoring dropped from 25% to <10% of changes.
Sources 1
AI Copilot Code Quality: 2025 Data Suggests 4x Growth in Code Clones
GitClear's research on 211M lines of code shows AI is changing how code is written - more duplication, less refactoring. The 8x increase in code clones and decline in refactoring suggest AI makes it easier to add new code than reuse existing code (limited context). Critical for understanding long-term maintainability implications.
- 8x increase in duplicated code blocks during 2024
- Refactoring dropped from 25% of changed lines (2021) to <10% (2024)
- Copy/pasted (cloned) code rose from 8.3% to 12.3% (2021-2024)
With agent mode generating larger diffs, how do you manage review fatigue?
Note: This is a new critical question for 2025. Layered AI+human review is emerging best practice.
Sources 2
State of AI Code Quality 2025
This is the most comprehensive 2025 survey on AI code quality (609 developers). The key insight is the 'Confidence Flywheel' - context-rich suggestions reduce hallucinations, which improves quality, which builds trust. The finding that 80% of PRs don't receive human review when AI tools are enabled is critical for our agentic_supervision dimension. NOTE: The previously cited 1.7x issue rate and 41% commit stats were not found in the current report.
- 82% of developers use AI coding tools daily or weekly
- 65% of developers say at least a quarter of each commit is AI-generated
- 59% say AI has improved code quality
The 'Trust, But Verify' Pattern For AI-Assisted Engineering
This article provides the conceptual framework for our trust_calibration dimension. The three principles (Blind Trust is Vulnerability, Copilot Not Autopilot, Human Accountability Remains) directly inform our survey questions. The emphasis on verification over speed aligns with METR findings. Practical guidance includes starting conservatively with AI on low-stakes tasks.
- Blind trust in AI-generated code is a vulnerability
- AI tools function as 'Copilot, Not Autopilot'
- Human verification is the new development bottleneck
How often do you 'Accept All' AI changes without reading the diff? (Karpathy's 'vibe coding' pattern)
Note: Karpathy coined 'vibe coding' (Feb 2025) for accepting without reading. Fast Company reported 'vibe coding hangover' (Sep 2025) with 'development hell' consequences. Score of 2+ is a critical risk flag.
Sources 2
Vibe Coding Definition (Original Tweet)
This tweet coined the term 'vibe coding' on February 3, 2025, defining it as a programming style where you 'forget that the code even exists' and 'Accept All' without reading diffs. Critically, Karpathy explicitly limits this to 'throwaway weekend projects' - a nuance often missed in subsequent coverage. The full quote shows he acknowledges the code grows 'beyond my usual comprehension' and he works around bugs rather than fixing them. This is essential context for our trust_calibration dimension: even the person who coined the term warns it's not for production work.
- Coined term 'vibe coding' for accepting AI changes without reading
- Described code 'beyond your comprehension'
- Pattern of full trust in AI output
The vibe coding hangover is upon us
This article documents the real-world consequences of 'vibe coding' practices going wrong. The Tea App case study is particularly powerful: a dating app built with minimal oversight leaked 72,000 images including driver's licenses due to a misconfigured Firebase bucket - a basic security error that proper review would have caught. PayPal engineer Jack Hays calls AI-generated code 'development hell' to maintain. Stack Overflow data shows declining trust (46% distrust vs 33% trust) and positive sentiment falling from 70% to 60%. This is essential evidence for our trust_calibration and agentic_supervision dimensions.
- Documented 'vibe coding hangover' phenomenon
- Teams in 'development hell' from unreviewed AI code
- Tea App data breach: 72,000 sensitive images leaked from unsecured Firebase
If you sometimes accept without reading: In what contexts?
Note: Vibe coding on prototypes/docs is lower risk. On production/features is critical risk. Score 0 for high-risk contexts.
Sources 2
Vibe Coding Definition (Original Tweet)
This tweet coined the term 'vibe coding' on February 3, 2025, defining it as a programming style where you 'forget that the code even exists' and 'Accept All' without reading diffs. Critically, Karpathy explicitly limits this to 'throwaway weekend projects' - a nuance often missed in subsequent coverage. The full quote shows he acknowledges the code grows 'beyond my usual comprehension' and he works around bugs rather than fixing them. This is essential context for our trust_calibration dimension: even the person who coined the term warns it's not for production work.
- Coined term 'vibe coding' for accepting AI changes without reading
- Described code 'beyond your comprehension'
- Pattern of full trust in AI output
The vibe coding hangover is upon us
This article documents the real-world consequences of 'vibe coding' practices going wrong. The Tea App case study is particularly powerful: a dating app built with minimal oversight leaked 72,000 images including driver's licenses due to a misconfigured Firebase bucket - a basic security error that proper review would have caught. PayPal engineer Jack Hays calls AI-generated code 'development hell' to maintain. Stack Overflow data shows declining trust (46% distrust vs 33% trust) and positive sentiment falling from 70% to 60%. This is essential evidence for our trust_calibration and agentic_supervision dimensions.
- Documented 'vibe coding hangover' phenomenon
- Teams in 'development hell' from unreviewed AI code
- Tea App data breach: 72,000 sensitive images leaked from unsecured Firebase
Submit Incomplete Assessment?
Your results will be marked as partial and may not reflect your full maturity level.
Start Over?
This will clear all your answers and reset the assessment to the beginning.
This action cannot be undone.
About This Assessment
Section titled “About This Assessment”This assessment evaluates your team’s maturity across 12 research-backed dimensions of AI-assisted development:
- Foundational Adoption — Basic usage patterns and tool activation
- Context Curation — Providing AI with the right information
- Model Routing — Matching tasks to appropriate models
- Agentic Supervision — Managing autonomous AI workflows
- Trust Calibration — Appropriate trust and verification
- Appropriate Non-Use — Knowing when to skip AI
- Organizational Integration — Enterprise policies and governance
- Legal & Compliance — Regulatory and licensing considerations
- Team Composition — Experience balance across the team
- Legacy Modernization — Using AI for legacy code
- Advanced Workflows — Multi-model and multi-tool patterns
- Outcomes — Measuring actual results
Scoring Methodology
Section titled “Scoring Methodology”Each question is weighted based on 2025 industry research from:
- DORA State of AI-Assisted Software Development 2025
- Veracode GenAI Code Security Report 2025
- GitHub/Accenture Enterprise Research
- METR Developer Productivity Study
Your results include personalized recommendations based on your specific gaps.