AI & ML Dec 5, 2025 11 min read

Automating Code Reviews with AI: Our Internal Tool

How we built an AI-powered code review assistant that catches 87% of common issues before human review, saving 15+ hours per sprint.

Dr. Sarah Chen

Head of AI Engineering

Why We Built This

Code reviews are a bottleneck. Our team of 80+ engineers generates 200+ pull requests per week. Senior engineers were spending 30% of their time on code reviews, and the average PR sat for 8 hours before receiving its first review.

We asked ourselves: what if AI could handle the repetitive parts of code review — style consistency, common bug patterns, security issues — so human reviewers could focus on architecture and design decisions?

Architecture Overview

The system has three layers:

  1. Static Analysis Layer: Enhanced ESLint/SonarQube rules that catch structural issues
  2. ML Pattern Detection Layer: Custom models trained on our historical review data that identify code smells, potential bugs, and performance issues
  3. LLM Review Layer: GPT-4 with custom prompts for contextual code review — understanding intent, suggesting improvements, and explaining issues

Each layer runs independently and produces scored findings. A consolidation service merges results, deduplicates, and ranks by severity.
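The merge-deduplicate-rank step can be sketched roughly like this; the `Finding` shape, the `(file, line, rule)` dedup key, and the numeric severity scale are illustrative assumptions, not the actual internal schema:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Finding:
    file: str
    line: int
    rule: str
    severity: int   # e.g. 3 = critical, 2 = warning, 1 = info
    score: float    # layer-specific confidence in [0, 1]
    message: str

def consolidate(findings):
    """Merge findings from all layers, dedupe, and rank by severity."""
    merged = {}
    for f in findings:
        key = (f.file, f.line, f.rule)
        # When two layers report the same issue on the same line,
        # keep the higher-confidence copy.
        if key not in merged or f.score > merged[key].score:
            merged[key] = f
    # Critical first, then by confidence within each severity band.
    return sorted(merged.values(), key=lambda f: (-f.severity, -f.score))
```

A real consolidation service would also handle cross-line duplicates (the same root cause flagged on adjacent lines), but the core idea is a keyed merge followed by a severity sort.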

Training Data & Models

We trained our pattern detection models on:

  • 18 months of internal code review data: 12,000+ PRs with reviewer comments mapped to code changes
  • Public code review datasets: Augmented with open-source code review data for broader coverage
  • Manual labeling: Our senior engineers labeled 2,000 code snippets for specific issue categories

The models are surprisingly simple — gradient-boosted trees on code embeddings outperformed complex transformer models for pattern detection. The LLM layer handles the nuanced cases that require understanding context and intent.
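As a rough illustration of that approach (with synthetic stand-in data, since the internal embeddings and review labels aren't public), a gradient-boosted classifier over fixed-size embedding vectors looks like:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)

# Stand-in for code embeddings: one 32-dim vector per snippet.
# Label 1 means "reviewers flagged this snippet", 0 means clean.
X = rng.normal(size=(200, 32))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # synthetic signal

# Shallow boosted trees: fast to train, easy to retrain on new review data.
clf = GradientBoostingClassifier(n_estimators=50, max_depth=3, random_state=0)
clf.fit(X[:150], y[:150])
accuracy = clf.score(X[150:], y[150:])
```

The embedding model, dimensionality, and labels here are placeholders; the point is only that once code changes are embedded as fixed-size vectors, a standard tree ensemble is a strong, cheap baseline for pattern detection.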

CI/CD Integration

The tool integrates seamlessly into our GitHub workflow:

  • Triggered on PR open/update: Runs automatically on every push to an open PR
  • Inline comments: Posts findings as inline review comments on the exact lines of concern
  • Severity labels: Critical (blocks merge), Warning (should fix), Info (style suggestion)
  • Auto-approve: Low-risk PRs (documentation, config changes) are auto-approved when there are no critical findings
  • Learning loop: When a human reviewer dismisses a finding, it feeds back into model training

Average processing time: 45 seconds for a 500-line PR.
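The gating logic behind the severity labels above can be sketched as follows; the label constants, the `change_type` categories, and the `AUTO_APPROVE_SAFE` set are hypothetical names for illustration:

```python
CRITICAL, WARNING, INFO = "critical", "warning", "info"

# Change categories considered low-risk enough for auto-approval
# (documentation and config changes, per the policy above).
AUTO_APPROVE_SAFE = {"docs", "config"}

def review_decision(severities, change_type):
    """Return (blocks_merge, auto_approve) for a PR's findings.

    severities: list of severity labels emitted by the review layers.
    change_type: coarse category of the PR ("docs", "config", "code", ...).
    """
    blocks_merge = CRITICAL in severities
    auto_approve = (not blocks_merge) and change_type in AUTO_APPROVE_SAFE
    return blocks_merge, auto_approve
```

In the real pipeline this decision would be made after consolidation, and the dismissal feedback from human reviewers would be logged alongside it for retraining.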

Results After 6 Months

The impact exceeded our expectations:

  • 87% of common issues caught before human review (null checks, error handling, SQL injection risks)
  • 15+ hours per sprint saved across the engineering team
  • Average PR review time reduced from 8 hours to 2.5 hours
  • 4x fewer production bugs related to code quality issues
  • Developer satisfaction: 92% of engineers rate the tool as "helpful" or "very helpful"

The biggest surprise: junior developers improved faster because they received instant, consistent feedback on every PR.

Open Source Plans

We're preparing to open-source the core framework in Q2 2026. The release will include:

  • Pattern detection models (pre-trained on public data only)
  • LLM prompt templates for code review
  • GitHub Action for easy integration
  • Configuration framework for custom rules

Stay tuned on our GitHub organization for the release announcement. We believe every engineering team deserves better code review tooling.

Tags
AI, Code Review, Developer Tools, Automation

Written by

Dr. Sarah Chen

Head of AI Engineering

Part of the Fixl engineering team, sharing insights from building production-grade software for startups and enterprises.
