Why We Built This
Code reviews are a bottleneck. Our team of 80+ engineers generates 200+ pull requests per week. Senior engineers were spending 30% of their time on code reviews, and the average PR sat for 8 hours before receiving its first review.
We asked ourselves: what if AI could handle the repetitive parts of code review — style consistency, common bug patterns, security issues — so human reviewers could focus on architecture and design decisions?
Architecture Overview
The system has three layers:
- Static Analysis Layer: Enhanced ESLint/SonarQube rules that catch structural issues
- ML Pattern Detection Layer: Custom models trained on our historical review data that identify code smells, potential bugs, and performance issues
- LLM Review Layer: GPT-4 with custom prompts for contextual code review — understanding intent, suggesting improvements, and explaining issues
Each layer runs independently and produces scored findings. A consolidation service merges results, deduplicates, and ranks by severity.
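The consolidation step can be sketched roughly as follows. This is an illustrative reconstruction, not the production service: the `Finding` fields, severity names, and dedup key are assumptions about what such a merge would look like.

```python
from dataclasses import dataclass

# Severity ordering for ranking: lower rank sorts first.
# Names are illustrative, not the production schema.
SEVERITY_RANK = {"critical": 0, "warning": 1, "info": 2}

@dataclass(frozen=True)
class Finding:
    layer: str      # "static", "ml", or "llm"
    file: str
    line: int
    rule: str
    severity: str
    score: float    # layer-specific confidence score

def consolidate(findings):
    """Merge findings from all layers: deduplicate on (file, line, rule),
    keep the highest-confidence copy, then rank by severity and score."""
    best = {}
    for f in findings:
        key = (f.file, f.line, f.rule)
        if key not in best or f.score > best[key].score:
            best[key] = f
    return sorted(best.values(),
                  key=lambda f: (SEVERITY_RANK[f.severity], -f.score))

findings = [
    Finding("static", "app.py", 10, "no-unused-var", "info", 0.90),
    Finding("ml",     "app.py", 10, "no-unused-var", "info", 0.95),
    Finding("llm",    "db.py",  42, "sql-injection", "critical", 0.80),
]
ranked = consolidate(findings)
# The duplicate finding collapses to one entry, and the
# critical SQL-injection finding sorts to the top.
```

The key design point is that each layer stays independent; only this final step knows about all three, which keeps the layers easy to swap or retrain separately.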
Training Data & Models
We trained our pattern detection models on:
- 18 months of internal code review data: 12,000+ PRs with reviewer comments mapped to code changes
- Public code review datasets: open-source review data used to augment the internal corpus and broaden coverage beyond our own codebase
- Manual labeling: Our senior engineers labeled 2,000 code snippets for specific issue categories
The models are surprisingly simple — gradient-boosted trees on code embeddings outperformed complex transformer models for pattern detection. The LLM layer handles the nuanced cases that require understanding context and intent.
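A minimal sketch of the pattern-detection setup, using scikit-learn's gradient boosting on embedding vectors. The synthetic data, 64-dimension embedding size, and binary "has issue" label are placeholders; the real models train on code embeddings labeled from review history.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Stand-in data: random vectors in place of real code embeddings,
# with a synthetic "has issue" label for demonstration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 64))                 # 64-dim "embeddings"
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)   # synthetic labels

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Gradient-boosted trees: shallow learners, no GPU, fast to retrain.
clf = GradientBoostingClassifier(n_estimators=100, max_depth=3, random_state=0)
clf.fit(X_train, y_train)
print(f"held-out accuracy: {clf.score(X_test, y_test):.2f}")
```

The appeal of this shape of model is operational as much as statistical: trees on fixed embeddings retrain in minutes on CPU, which matters when dismissed findings feed back into training continuously.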
CI/CD Integration
The tool plugs directly into our GitHub workflow:
- Triggered on PR open/update: Runs automatically on every push to an open PR
- Inline comments: Posts findings as inline review comments on the exact lines of concern
- Severity labels: Critical (blocks merge), Warning (should fix), Info (style suggestion)
- Auto-approve: Low-risk PRs (documentation, config changes) can be auto-approved if no critical findings
- Learning loop: When a human reviewer dismisses a finding, it feeds back into model training
Average processing time: 45 seconds for a 500-line PR.
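The gating rules above (Critical blocks merge; low-risk PRs auto-approve when clean) can be sketched as a small decision function. The function name, finding shape, and the file-extension heuristic for "low-risk" are all hypothetical simplifications.

```python
# Illustrative "low-risk" heuristic: docs and config files only.
# The real classification of low-risk PRs is presumably richer.
LOW_RISK_SUFFIXES = (".md", ".yml", ".yaml", ".json")

def review_decision(findings, changed_files):
    """Map a PR's findings and changed files to a merge decision:
    'block', 'auto-approve', or 'needs-human-review'."""
    if any(f["severity"] == "critical" for f in findings):
        return "block"                      # Critical always blocks merge
    if all(path.endswith(LOW_RISK_SUFFIXES) for path in changed_files):
        return "auto-approve"               # clean docs/config change
    return "needs-human-review"             # warnings/info go to a human

review_decision([], ["README.md"])                          # "auto-approve"
review_decision([{"severity": "critical"}], ["app.py"])     # "block"
review_decision([{"severity": "warning"}], ["app.py"])      # "needs-human-review"
```

Note that warnings never block on their own; they surface as inline comments and leave the merge decision with the human reviewer.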
Results After 6 Months
The impact exceeded our expectations:
- 87% of common issues caught before human review (null checks, error handling, SQL injection risks)
- 15+ hours per sprint saved across the engineering team
- Average PR review time reduced from 8 hours to 2.5 hours
- 4x fewer production bugs related to code quality issues
- Developer satisfaction: 92% of engineers rate the tool as "helpful" or "very helpful"
The biggest surprise: junior developers improved faster because they received instant, consistent feedback on every PR.
Open Source Plans
We're preparing to open-source the core framework in Q2 2026. The release will include:
- Pattern detection models (pre-trained on public data only)
- LLM prompt templates for code review
- GitHub Action for easy integration
- Configuration framework for custom rules
Stay tuned on our GitHub organization for the release announcement. We believe every engineering team deserves better code review tooling.
Written by
Dr. Sarah Chen
Head of AI Engineering
Part of the Fixl engineering team, sharing insights from building production-grade software for startups and enterprises.