
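AI Code Review in Practice: Building an Automated Code Review Pipeline from Scratch

Author: 重庆投肯小刚 | Last updated: May 2026

Contents

  1. Project overview: why build AI code review
  2. System architecture
  3. Environment setup and dependency installation
  4. GitHub Webhook configuration
  5. Core code review logic
  6. Prompt engineering: getting professional-quality reviews from the AI
  7. Docker deployment and production configuration
  8. Monitoring, alerting, and continuous improvement
  9. Common problems and troubleshooting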

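Project overview: why build AI code review

1.1 Problems with traditional code review

Code review is an important way to safeguard code quality, but the traditional approach has three major pain points.

AI-assisted review can address these problems: it runs around the clock, responds quickly, and applies the same standard to every change.

1.2 Project goals

We will build an AI code review system with the following capabilities:

  1. When a team member pushes code to GitHub, an AI review is triggered automatically
  2. The AI analyzes the change and flags syntax errors, potential bugs, security vulnerabilities, performance problems, and code style issues
  3. Review results are posted automatically as comments on the GitHub Pull Request
  4. Multiple programming languages are supported (Python, JavaScript, Java, Go, and others)
  5. A web interface shows review history and statistics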

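1.3 Technology choices

Component        | Choice                        | Reason
Language         | Python 3.11                   | Richest AI ecosystem; LangChain and PyTorch are both Python libraries
AI model         | Claude 3.5 Sonnet / GPT-4o    | Strong coding ability; long-context support
Web framework    | FastAPI                       | Async and fast, type-safe, automatic API docs
Database         | PostgreSQL + pgvector         | Stores review records; vector search over past cases
Message queue    | Redis Queue                   | Smooths traffic spikes; helps avoid GitHub API rate limits
Containerization | Docker + Docker Compose       | One-command deployment; environment isolation
Deployment       | Any (ECS / local / K8s)       | The architecture is decoupled from the platform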

System architecture

2.1 Overall architecture

Architecture overview:
┌─────────────────────────────────────────────────────────────────────┐
│                         User / Developer                            │
│                    Pushes code to a GitHub PR                       │
└──────────────────────────────────┬──────────────────────────────────┘
                                   │
                                   ▼
┌─────────────────────────────────────────────────────────────────────┐
│                         GitHub Webhook                              │
│              Sends event → POST /api/webhook/github                 │
└──────────────────────────────────┬──────────────────────────────────┘
                                   │
                                   ▼
┌─────────────────────────────────────────────────────────────────────┐
│                    FastAPI Web Server                               │
│     Receive webhook → parse PR info → enqueue to Redis Queue        │
└──────────────────────────────────┬──────────────────────────────────┘
                                   │
                                   ▼
┌─────────────────────────────────────────────────────────────────────┐
│                        Redis Queue                                  │
│                   Task queue: review_tasks                          │
└──────────────────────────────────┬──────────────────────────────────┘
                                   │
                                   ▼
┌─────────────────────────────────────────────────────────────────────┐
│                    Worker Process                                   │
│  ┌─────────────┐   ┌─────────────┐   ┌─────────────┐                │
│  │ GitHub API  │   │ Claude/GPT  │   │ Analyzer    │                │
│  │ Fetch diff  │   │ Review code │   │ Parse AST   │                │
│  └─────────────┘   └─────────────┘   └─────────────┘                │
└──────────────────────────────────┬──────────────────────────────────┘
                                   │
                                   ▼
┌─────────────────────────────────────────────────────────────────────┐
│                      PostgreSQL                                     │
│          Stores: review records, statistics, configuration          │
└──────────────────────────────────┬──────────────────────────────────┘
                                   │
                                   ▼
┌─────────────────────────────────────────────────────────────────────┐
│                       GitHub API                                    │
│                 Post review comments to the PR                      │
└─────────────────────────────────────────────────────────────────────┘
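
The hand-off between the web server and the worker can be sketched with the standard library alone. This is a toy stand-in rather than the real pipeline: `queue.Queue` plays the role of Redis Queue, and `handle_webhook`, `run_worker`, and the `acme/demo` repository name are hypothetical names used only for illustration.

```python
import queue
from typing import Dict, List

# In-process stand-in for the Redis "review_tasks" queue (assumption for illustration).
review_tasks: "queue.Queue[Dict]" = queue.Queue()


def handle_webhook(payload: Dict) -> Dict:
    """Web tier: check the event, enqueue a task, and return without calling the AI."""
    if payload.get("action") not in {"opened", "synchronize", "reopened"}:
        return {"status": "skipped"}
    review_tasks.put({
        "repository": payload["repository"],
        "pr_number": payload["pr_number"],
    })
    return {"status": "enqueued"}


def run_worker() -> List[str]:
    """Worker tier: drain the queue; the slow AI call would happen here."""
    processed = []
    while not review_tasks.empty():
        task = review_tasks.get()
        processed.append(f"{task['repository']}#{task['pr_number']}")
        review_tasks.task_done()
    return processed


responses = [
    handle_webhook({"action": "opened", "repository": "acme/demo", "pr_number": 1}),
    handle_webhook({"action": "closed", "repository": "acme/demo", "pr_number": 2}),
]
processed = run_worker()
```

The point of the split is that the webhook handler answers GitHub immediately, while the time-consuming review happens later in a separate process.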

2.2 Core data models

python
# ============================================
# Data model definitions
# File: models.py
# ============================================

from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum
from typing import List, Optional

class ReviewStatus(Enum):
    """Review status"""
    PENDING = "pending"          # waiting to be reviewed
    IN_PROGRESS = "in_progress"  # review in progress
    COMPLETED = "completed"      # finished
    FAILED = "failed"            # failed

class Severity(Enum):
    """Issue severity"""
    CRITICAL = "critical"        # critical (must fix)
    HIGH = "high"                # high (strongly recommended to fix)
    MEDIUM = "medium"            # medium (should fix)
    LOW = "low"                  # low (optional)
    INFO = "info"                # informational only

class IssueType(Enum):
    """Issue type"""
    BUG = "bug"                  # potential bug
    SECURITY = "security"        # security issue
    PERFORMANCE = "performance"  # performance issue
    STYLE = "style"              # code style
    BEST_PRACTICE = "best_practice"  # best practice
    DOCUMENTATION = "documentation"  # documentation issue

@dataclass
class FileChange:
    """Change to a single file"""
    filename: str                # file path
    diff: str                    # Git diff content
    status: str                  # added/modified/deleted
    language: str                # programming language

@dataclass
class CodeIssue:
    """A single code issue"""
    issue_type: IssueType
    severity: Severity
    file: str                    # file the issue belongs to
    line_start: int              # first line of the issue
    line_end: int                # last line of the issue
    title: str                   # short title
    description: str             # detailed description
    suggestion: str              # suggested fix
    rule_id: Optional[str] = None  # ID of the matched rule (used for statistics)

@dataclass
class ReviewResult:
    """Result of one review"""
    pr_number: int
    repository: str
    commit_sha: str
    status: ReviewStatus
    created_at: datetime
    completed_at: Optional[datetime] = None
    issues: List[CodeIssue] = field(default_factory=list)
    summary: str = ""            # overall assessment
    stats: dict = field(default_factory=dict)  # statistics

    def to_dict(self) -> dict:
        """Convert to a dict for JSON serialization"""
        return {
            "pr_number": self.pr_number,
            "repository": self.repository,
            "commit_sha": self.commit_sha,
            "status": self.status.value,
            "created_at": self.created_at.isoformat(),
            "completed_at": self.completed_at.isoformat() if self.completed_at else None,
            "issues_count": len(self.issues),
            "issues_by_severity": self._count_by_severity(),
            "summary": self.summary
        }

    def _count_by_severity(self) -> dict:
        """Count issues per severity level, including levels with zero issues"""
        counts = {s.value: 0 for s in Severity}
        for issue in self.issues:
            counts[issue.severity.value] += 1
        return counts

@dataclass
class ReviewRequest:
    """Review request (built from the webhook)"""
    event_type: str              # push/pull_request
    action: str                  # opened/closed/synchronize
    repository: str
    pr_number: int
    commit_sha: str
    diff_url: str
    changes: List[FileChange] = field(default_factory=list)
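
Because these models use enums and integer line numbers, raw values coming back from the AI need some defensive conversion before they fit. The following is a minimal sketch of that step, not code from the original project: `ParsedIssue` is a hypothetical, trimmed-down stand-in for `CodeIssue`, and the fallback rules (unknown severities become `info`, a missing `line_end` defaults to `line_start`) are assumptions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Dict


class Severity(Enum):
    CRITICAL = "critical"
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"
    INFO = "info"


@dataclass
class ParsedIssue:
    """Trimmed-down stand-in for CodeIssue (hypothetical, for illustration)."""
    severity: Severity
    file: str
    line_start: int
    line_end: int
    title: str


def parse_issue(raw: Dict[str, Any]) -> ParsedIssue:
    """Convert one raw issue dict into typed fields, tolerating sloppy input."""
    try:
        severity = Severity(str(raw.get("severity", "")).strip().lower())
    except ValueError:
        severity = Severity.INFO  # unknown labels are downgraded, not dropped
    start = int(raw.get("line_start") or 0)
    end = int(raw.get("line_end") or start)
    return ParsedIssue(
        severity=severity,
        file=str(raw.get("file", "")),
        line_start=start,
        line_end=max(start, end),  # never let the range end before it starts
        title=str(raw.get("title", "")),
    )


issue = parse_issue({"severity": "HIGH", "file": "app/db.py", "line_start": 42,
                     "title": "SQL built by string concatenation"})
unknown = parse_issue({"severity": "blocker", "file": "a.py", "line_start": "7", "line_end": 3})
```

Downgrading unknown labels instead of raising keeps one malformed issue from discarding an entire review.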

Environment setup and dependency installation

3.1 System requirements

bash
# ============================================
# Requirements
# ============================================
# OS: Ubuntu 20.04+ / macOS 12+ / CentOS 7+
# Python: 3.11+
# Memory: 4 GB minimum, 8 GB+ recommended
# Disk: 10 GB+

# Check the Python version
python3 --version
# Expected output: Python 3.11.x or later

3.2 Installing dependencies

bash
# ============================================
# Create and initialize the project directory
# ============================================
mkdir -p ~/ai-code-review && cd ~/ai-code-review

# Create a virtual environment (recommended)
python3 -m venv venv
source venv/bin/activate  # Linux/macOS
# venv\Scripts\activate   # Windows

# Upgrade pip
pip install --upgrade pip setuptools wheel

# ============================================
# Install core dependencies
# ============================================

# Web framework (the quotes keep shells such as zsh from expanding the brackets)
pip install fastapi==0.115.0 "uvicorn[standard]==0.30.0"

# Background task queue
pip install redis==5.0.0 rq==1.16.0

# GitHub API
pip install PyGithub==2.3.0

# AI integration
pip install langchain-openai==0.1.0 langchain-anthropic==0.1.0

# Database ORM
pip install SQLAlchemy==2.0.0 asyncpg==0.29.0 alembic==1.13.0

# Vector search (for retrieving similar past cases)
pip install pgvector==0.2.0

# Configuration
pip install PyYAML==6.0.1 python-dotenv==1.0.0 pydantic==2.6.0 pydantic-settings==2.2.0

# HTTP clients
pip install httpx==0.27.0 aiohttp==3.9.0

# Logging
pip install loguru==0.7.0

# Testing (httpx is already installed above)
pip install pytest==8.0.0 pytest-asyncio==0.23.0

# Verify the installation. The integration packages are imported directly
# because the top-level "langchain" package is not installed by the commands above.
python -c "import fastapi, redis, github, langchain_openai, langchain_anthropic; print('✅ All dependencies installed')"

3.3 Project structure

bash
# ============================================
# Project directory layout
# ============================================

cd ~/ai-code-review
mkdir -p app/{api,core,services,models,workers} tests configs
touch app/__init__.py app/api/__init__.py app/core/__init__.py
touch app/services/__init__.py app/models/__init__.py app/workers/__init__.py

tree -L 3 --dirsfirst

# Target layout once the files from the following sections have been added:
# .
# ├── app/
# │   ├── __init__.py
# │   ├── api/           # API routes
# │   │   ├── __init__.py
# │   │   ├── webhooks.py    # GitHub Webhook handling
# │   │   └── review.py     # Review result queries
# │   ├── core/          # Core configuration
# │   │   ├── __init__.py
# │   │   ├── config.py     # Configuration management
# │   │   └── security.py   # Security helpers
# │   ├── models/        # Data models
# │   │   ├── __init__.py
# │   │   └── database.py   # Database models
# │   ├── services/      # Business logic
# │   │   ├── __init__.py
# │   │   ├── github_service.py    # GitHub interaction
# │   │   ├── review_service.py    # Review logic
# │   │   └── ai_service.py        # AI model calls
# │   └── workers/       # Background tasks
# │       ├── __init__.py
# │       └── review_worker.py     # Review worker
# ├── configs/           # Configuration files
# │   └── settings.yaml
# ├── tests/             # Tests
# ├── requirements.txt
# ├── Dockerfile
# ├── docker-compose.yml
# └── README.md

3.4 Configuration file

yaml
# ============================================
# configs/settings.yaml
# ============================================

app:
  name: "AI Code Review System"
  host: "0.0.0.0"
  port: 8000
  debug: false
  secret_key: "your-secret-key-change-in-production"  # must be changed

database:
  host: "localhost"
  port: 5432
  name: "code_review"
  user: "postgres"
  password: "postgres"  # read from an environment variable in production

redis:
  host: "localhost"
  port: 6379
  db: 0
  password: null

github:
  # Note: PyYAML does not expand ${...} placeholders by itself; the config
  # loader has to substitute environment variables.
  app_id: "${GITHUB_APP_ID}"        # GitHub App mode (optional)
  private_key_path: "${GITHUB_PRIVATE_KEY_PATH}"
  webhook_secret: "${GITHUB_WEBHOOK_SECRET}"  # webhook signing secret
  # Simple mode: use an access token directly. Generate one under
  # GitHub Settings → Developer settings → Personal access tokens
  access_token: "${GITHUB_ACCESS_TOKEN}"

ai:
  provider: "anthropic"  # anthropic or openai
  model: "claude-3-5-sonnet-20240620"
  max_tokens: 8192
  temperature: 0.3
  # Cost control: maximum number of reviews per day
  daily_limit: 100

review:
  # Review rules
  enabled_rules:
    - bug
    - security
    - performance
    - style
    - best_practice
  # Files to ignore (glob patterns)
  ignore_patterns:
    - "*.md"
    - "*.txt"
    - "node_modules/**"
    - "dist/**"
    - "build/**"
    - "*.min.js"
  # Maximum number of files reviewed per PR
  max_files_per_pr: 50
  # Maximum lines per file (longer files are truncated)
  max_lines_per_file: 2000

notifications:
  # Notify on failure (extensible: DingTalk / Feishu / Slack)
  enabled: false
  webhook_url: ""
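
The `${GITHUB_WEBHOOK_SECRET}`-style placeholders in `settings.yaml` stay as literal strings unless something substitutes them, since a YAML parser does not read environment variables. One possible approach, sketched here under the assumption that substitution happens on the raw text before parsing, is shown below. `expand_env_placeholders` is a hypothetical helper name, not part of the original project.

```python
import re
from typing import Mapping

# Matches ${NAME} where NAME is a conventional environment variable name.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env_placeholders(text: str, env: Mapping[str, str]) -> str:
    """Replace every ${NAME} with env[NAME]; fail loudly on a missing variable."""
    def substitute(match: "re.Match[str]") -> str:
        name = match.group(1)
        if name not in env:
            raise KeyError(f"environment variable {name} is not set")
        return env[name]
    return _PLACEHOLDER.sub(substitute, text)


raw_yaml = 'github:\n  webhook_secret: "${GITHUB_WEBHOOK_SECRET}"\n'
expanded = expand_env_placeholders(raw_yaml, {"GITHUB_WEBHOOK_SECRET": "s3cret"})
```

In real use the mapping would be `os.environ`, and the expanded text would then be passed to `yaml.safe_load`. Raising on a missing variable surfaces a misconfigured deployment at startup rather than at the first webhook.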

3.5 Environment variables

bash
# ============================================
# Create the .env file (do not commit it to Git!)
# ============================================

cat > .env << 'EOF'
# GitHub settings (at least one of these is required)
GITHUB_ACCESS_TOKEN=ghp_your_token_here
GITHUB_WEBHOOK_SECRET=your_webhook_secret

# AI model API keys
OPENAI_API_KEY=sk-xxxxx
ANTHROPIC_API_KEY=sk-ant-xxxxx

# Database (use a stronger password in production)
DATABASE_URL=postgresql://postgres:postgres@localhost:5432/code_review

# Redis
REDIS_URL=redis://localhost:6379/0

# Application secret (generate with: python -c "import secrets; print(secrets.token_hex(32))")
APP_SECRET_KEY=change_this_to_a_secure_random_string

# Log level
LOG_LEVEL=INFO
EOF

# Protect the sensitive file
chmod 600 .env
echo ".env" >> .gitignore

GitHub Webhook configuration

4.1 How GitHub Webhooks work

A GitHub Webhook lets GitHub's servers send an HTTP POST request to our service whenever a specific event occurs (for example, someone pushes commits to a PR).

python
# ============================================
# GitHub Webhook handling
# File: app/api/webhooks.py
# ============================================

import hmac
import hashlib
from fastapi import APIRouter, Request, HTTPException, Header
from typing import Optional
from pydantic import BaseModel
from loguru import logger

# The documented endpoint is /api/webhook/github, so this router is assumed
# to be mounted under an /api prefix by the application.
router = APIRouter(prefix="/webhook", tags=["webhooks"])

class GitHubWebhookPayload(BaseModel):
    """GitHub Webhook payload model"""
    action: Optional[str] = None
    pull_request: Optional[dict] = None
    repository: Optional[dict] = None
    commits: Optional[list] = None

def verify_github_signature(payload_bytes: bytes, signature: str, secret: str) -> bool:
    """
    Verify a GitHub Webhook signature
    GitHub signs the body with HMAC-SHA256; header format: sha256=<hex digest>
    """
    if not signature:
        return False

    # Extract the digest (strip the "sha256=" prefix)
    if signature.startswith("sha256="):
        expected_sig = signature[7:]
    else:
        return False

    # Compute the HMAC of the body we actually received
    mac = hmac.new(
        key=secret.encode('utf-8'),
        msg=payload_bytes,
        digestmod=hashlib.sha256
    )
    actual_sig = mac.hexdigest()

    # Constant-time comparison to avoid timing attacks
    return hmac.compare_digest(expected_sig, actual_sig)

@router.post("/github")
async def handle_github_webhook(
    request: Request,
    x_github_event: Optional[str] = Header(None),
    x_github_signature: Optional[str] = Header(None, alias="X-Hub-Signature-256"),
):
    """
    Handle a GitHub Webhook request

    Required headers:
    - X-GitHub-Event: event type (push, pull_request, etc.)
    - X-Hub-Signature-256: HMAC signature
    """

    # 1. Read the raw request body (needed for signature verification)
    body_bytes = await request.body()

    # 2. Load the webhook configuration
    from app.core.config import settings
    webhook_secret = settings.github.webhook_secret

    # 3. Verify the signature (mandatory in production)
    if webhook_secret:
        # Only the SHA-256 header is accepted: verify_github_signature cannot
        # validate the legacy SHA-1 X-Hub-Signature header.
        if not verify_github_signature(body_bytes, x_github_signature or "", webhook_secret):
            logger.warning("GitHub webhook signature verification failed")
            # Reject unsigned or tampered requests. Comment this out only
            # while debugging locally.
            raise HTTPException(status_code=403, detail="Invalid signature")

    # 4. Determine the event type
    event_type = x_github_event
    logger.info(f"Received GitHub webhook: event={event_type}")

    # 5. Dispatch by event type
    if event_type == "pull_request":
        return await handle_pull_request(body_bytes)
    elif event_type == "push":
        return await handle_push(body_bytes)
    else:
        logger.info(f"Ignoring event type: {event_type}")
        return {"status": "ignored", "event": event_type}

async def handle_pull_request(body_bytes: bytes):
    """Handle a Pull Request event"""
    import json
    payload = json.loads(body_bytes)

    action = payload.get("action")
    pr = payload.get("pull_request", {})
    repo = payload.get("repository", {})

    logger.info(f"PR event: action={action}, pr=#{pr.get('number')}, repo={repo.get('full_name')}")

    # Only trigger a review when the PR is opened, updated, or reopened
    if action not in ["opened", "synchronize", "reopened"]:
        return {"status": "skipped", "reason": f"action={action} does not trigger a review"}

    # Build the review request
    review_request = {
        "event_type": "pull_request",
        "action": action,
        "repository": repo.get("full_name"),
        "pr_number": pr.get("number"),
        "pr_title": pr.get("title"),
        "pr_body": pr.get("body"),
        "head_branch": pr.get("head", {}).get("ref"),
        "base_branch": pr.get("base", {}).get("ref"),
        "commit_sha": pr.get("head", {}).get("sha"),
        "author": pr.get("user", {}).get("login"),
        "diff_url": pr.get("diff_url"),
    }

    # Send it to the task queue (enqueue_review is defined in app/workers/review_worker.py)
    from app.workers.review_worker import enqueue_review
    enqueue_review(review_request)

    logger.info(f"Review enqueued for PR #{pr.get('number')} in {repo.get('full_name')}")

    return {"status": "enqueued", "pr": pr.get("number")}

async def handle_push(body_bytes: bytes):
    """Handle a Push event (optional)"""
    # Push events can also trigger a review; enable this if you need it.
    # The logic mirrors the PR handler and is omitted here.
    return {"status": "handled"}
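
To test the endpoint without waiting for GitHub, a request can be signed locally in the same way GitHub signs it. The sketch below computes an `X-Hub-Signature-256` value and checks it. The helper names `sign_payload` and `is_valid_signature` and the `test-secret` value are illustrative assumptions, not part of the original code.

```python
import hashlib
import hmac
import json


def sign_payload(payload_bytes: bytes, secret: str) -> str:
    """Build the header value: 'sha256=' followed by the HMAC-SHA256 hex digest."""
    digest = hmac.new(secret.encode("utf-8"), payload_bytes, hashlib.sha256).hexdigest()
    return f"sha256={digest}"


def is_valid_signature(payload_bytes: bytes, header_value: str, secret: str) -> bool:
    """Constant-time comparison of the received header with the expected one."""
    expected = sign_payload(payload_bytes, secret)
    return hmac.compare_digest(expected.encode("utf-8"), (header_value or "").encode("utf-8"))


body = json.dumps({"action": "opened", "pull_request": {"number": 1}}).encode("utf-8")
header = sign_payload(body, "test-secret")
```

Sending `body` with the headers `X-GitHub-Event: pull_request` and `X-Hub-Signature-256: <header>` should pass verification when the service is configured with the same secret. The signature covers the exact bytes, so the body must be sent unmodified.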

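4.2 Configuring the Webhook on GitHub

bash
# ============================================
# GitHub Webhook setup steps
# ============================================

# 1. Deploy your service (it needs a publicly reachable URL)

# 2. In the GitHub repository: Settings → Webhooks → Add webhook
#    - Payload URL: https://your-domain.com/api/webhook/github
#    - Content type: application/json
#    - Secret: the GITHUB_WEBHOOK_SECRET value you set in .env
#    - SSL verification: Enable (requires a valid SSL certificate)
#    - Events: select "Pull requests"

# 3. Test the webhook
# GitHub sends a "ping" event as soon as the webhook is created. Open the
# webhook's "Recent Deliveries" list to inspect the request and the response;
# "Redeliver" resends a delivery. To exercise the review flow itself, open or
# update a pull request.

# 4. Local development and debugging
# Use ngrok to expose the local service to the internet:
ngrok http 8000
# It prints a public URL such as https://xxxx.ngrok.io
# Use that URL as the Payload URL for temporary testing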

Core code review logic

5.1 GitHub service: fetching PR details and diffs

python
# ============================================
# GitHub service wrapper
# File: app/services/github_service.py
# ============================================

from github import Auth, Github
from github.GithubException import GithubException
from loguru import logger
from typing import List, Optional, Dict, Any
from dataclasses import dataclass

@dataclass
class PullRequestInfo:
    """PR information"""
    repo_name: str
    pr_number: int
    title: str
    body: str
    author: str
    head_sha: str
    head_branch: str
    base_branch: str
    diff_url: str

@dataclass
class FileDiff:
    """Change to a single file"""
    filename: str
    status: str  # added, removed, modified, renamed
    patch: str   # the actual diff content
    additions: int
    deletions: int
    language: str

class GitHubService:
    """GitHub API wrapper"""

    def __init__(self, access_token: str):
        # PyGithub 2.x deprecates passing the token positionally; use Auth.Token
        self.github = Github(auth=Auth.Token(access_token))
        logger.info("GitHub service initialized")
    
    def get_pr_info(self, repo_name: str, pr_number: int) -> PullRequestInfo:
        """
        Fetch basic information about a PR

        Args:
            repo_name: repository name in "owner/repo" format
            pr_number: PR number

        Returns:
            A PullRequestInfo object
        """
        repo = self.github.get_repo(repo_name)
        pr = repo.get_pull(pr_number)

        return PullRequestInfo(
            repo_name=repo_name,
            pr_number=pr.number,
            title=pr.title,
            body=pr.body or "",
            author=pr.user.login,
            head_sha=pr.head.sha,
            head_branch=pr.head.ref,
            base_branch=pr.base.ref,
            diff_url=pr.diff_url
        )

    def get_pr_files(self, repo_name: str, pr_number: int) -> List[FileDiff]:
        """
        Fetch all files changed in a PR

        Returns:
            A list of file changes
        """
        repo = self.github.get_repo(repo_name)
        pr = repo.get_pull(pr_number)

        files = []
        for file in pr.get_files():
            diff = FileDiff(
                filename=file.filename,
                status=file.status,
                patch=file.patch or "",  # patch is the unified diff; it can be missing, hence the fallback
                additions=file.additions,
                deletions=file.deletions,
                language=self._detect_language(file.filename)
            )
            files.append(diff)

        logger.info(f"Fetched {len(files)} changed files for PR #{pr_number}")
        return files
    
    def post_review_comment(
        self,
        repo_name: str,
        pr_number: int,
        commit_sha: str,
        body: str,
        line: Optional[int] = None,
        path: Optional[str] = None,
        event: str = "COMMENT"
    ):
        """
        Post a review comment on a PR

        Args:
            repo_name: repository name
            pr_number: PR number
            commit_sha: commit SHA (an inline comment is attached to this commit)
            body: comment text (Markdown is supported)
            line: line number (to comment on a specific line of code)
            path: file path (to comment on a specific file)
            event: review type, used for PR-level reviews
                - COMMENT: plain comment
                - APPROVE: approve the PR
                - REQUEST_CHANGES: request changes
        """
        repo = self.github.get_repo(repo_name)
        pr = repo.get_pull(pr_number)

        try:
            if path and line:
                # Inline comment on a single line. create_review() does not accept
                # path/line arguments, so use create_review_comment(), which
                # expects a Commit object rather than a SHA string.
                pr.create_review_comment(
                    body=body,
                    commit=repo.get_commit(commit_sha),
                    path=path,
                    line=line,
                    side="RIGHT",  # RIGHT refers to the new version of the file
                )
            else:
                # Review on the PR as a whole (APPROVE, REQUEST_CHANGES, COMMENT)
                pr.create_review(body=body, event=event)
            logger.info(f"Posted {event} comment on PR #{pr_number}")
        except GithubException as e:
            logger.error(f"Failed to post comment: {e}")
            raise
    
    def _detect_language(self, filename: str) -> str:
        """Detect the programming language from the file extension"""
        ext_map = {
            '.py': 'python',
            '.js': 'javascript',
            '.ts': 'typescript',
            '.jsx': 'javascript',
            '.tsx': 'typescript',
            '.java': 'java',
            '.go': 'go',
            '.rs': 'rust',
            '.cpp': 'cpp',
            '.c': 'c',
            '.cs': 'csharp',
            '.rb': 'ruby',
            '.php': 'php',
            '.swift': 'swift',
            '.kt': 'kotlin',
            '.scala': 'scala',
            '.vue': 'vue',
            '.svelte': 'svelte',
            '.css': 'css',
            '.scss': 'scss',
            '.html': 'html',
            '.sql': 'sql',
            '.sh': 'bash',
            '.yaml': 'yaml',
            '.yml': 'yaml',
            '.json': 'json',
            '.xml': 'xml',
            '.md': 'markdown',
        }
        
        for ext, lang in ext_map.items():
            if filename.endswith(ext):
                return lang
        return 'unknown'
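
An inline comment only succeeds if its `line` refers to a line that is part of the diff, so it helps to know which new-file line numbers a patch actually touches. The sketch below is an assumption-laden helper, not part of the original service. `commentable_lines` is a hypothetical name, it expects the patch to start at an `@@` hunk header (without `+++`/`---` file headers), and it conservatively returns only added lines.

```python
import re
from typing import List

# Hunk header, e.g. "@@ -10,2 +11,3 @@"; group 1 is the first new-file line number.
_HUNK_HEADER = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@")


def commentable_lines(patch: str) -> List[int]:
    """Return the new-file line numbers of added ('+') lines in a unified diff patch."""
    lines: List[int] = []
    new_line = 0
    for row in patch.splitlines():
        header = _HUNK_HEADER.match(row)
        if header:
            new_line = int(header.group(1))
            continue
        if row.startswith("+"):
            lines.append(new_line)
            new_line += 1
        elif row.startswith("-") or row.startswith("\\"):
            continue  # removed lines and "\ No newline" markers do not advance the counter
        else:
            new_line += 1  # context line
    return lines


patch = "\n".join([
    "@@ -1,3 +1,4 @@",
    " import os",
    "+import sys",
    " ",
    " def main():",
    "@@ -10,2 +11,3 @@",
    "     x = 1",
    "-    y = 2",
    "+    y = 3",
    "+    z = 4",
])
added = commentable_lines(patch)
```

A line number reported by the AI can be checked against this list before an inline comment is attempted; anything outside it can be folded into the PR-level summary instead.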

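Prompt engineering: getting professional-quality reviews from the AI

6.1 Prompt design principles

Prompt design is the key to getting high-quality code reviews from the AI. The prompt needs to:

  1. Define the role: have the AI act as a senior code reviewer
  2. Provide context: the changed files, the repository, and the PR information
  3. Define the review dimensions: bugs, security, performance, style, best practices
  4. Specify the output format: structured JSON that a program can parse
  5. Set constraints: avoid generalities; require specific line numbers and code references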
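
Providing context (principle 2) runs into a practical limit: a large PR may not fit comfortably into one prompt. One option, sketched here as an assumption rather than something the original system does, is to group files into batches under a character budget and review each batch separately. `batch_files` and the 1000-character budget are hypothetical.

```python
from typing import Dict, List


def batch_files(files: List[Dict[str, str]], max_chars: int) -> List[List[Dict[str, str]]]:
    """Greedily group files so each batch's total patch length stays within max_chars."""
    batches: List[List[Dict[str, str]]] = []
    current: List[Dict[str, str]] = []
    used = 0
    for f in files:
        size = len(f["patch"])
        if current and used + size > max_chars:
            batches.append(current)
            current, used = [], 0
        current.append(f)  # a single oversized file still gets a batch of its own
        used += size
    if current:
        batches.append(current)
    return batches


files = [
    {"filename": "a.py", "patch": "x" * 400},
    {"filename": "b.py", "patch": "x" * 500},
    {"filename": "c.py", "patch": "x" * 300},
    {"filename": "d.py", "patch": "x" * 1500},
]
batches = batch_files(files, max_chars=1000)
names = [[f["filename"] for f in batch] for batch in batches]
```

Characters are only a rough proxy for tokens, so a real budget would be set with some margin below the model's context limit.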

6.2 The complete review prompt template

python
# ============================================
# Prompt template management
# File: app/services/prompt_template.py
# ============================================

import json  # needed by build_summary_prompt
from typing import List
from dataclasses import dataclass

@dataclass
class ReviewContext:
    """Review context"""
    repo_name: str
    pr_number: int
    pr_title: str
    pr_description: str
    author: str
    files: List[dict]  # [{"filename": "...", "patch": "...", "language": "..."}]

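def build_code_review_prompt(ctx: ReviewContext) -> str:
    """
    Build the code review prompt

    Design goals of this prompt:
    1. Have the AI act as a senior code reviewer
    2. Cover several dimensions: bugs, security, performance, style, best practices
    3. Produce structured JSON that a program can parse
    4. Require every issue to carry: file, line numbers, description, severity, suggested fix
    """

    # Build the per-file change summary (patches longer than 3000 characters are truncated)
    files_summary = "\n".join([
        f"### File {i+1}: {f['filename']} ({f['language']})\n```diff\n{f['patch'][:3000]}...\n```"
        if len(f['patch']) > 3000 else f"### File {i+1}: {f['filename']} ({f['language']})\n```diff\n{f['patch']}\n```"
        for i, f in enumerate(ctx.files)
    ])

    prompt = f"""You are a senior code review expert with more than 10 years of software development experience. You are reviewing a Pull Request, and your review directly affects whether the code can be merged into the main branch.

## Review task

**Repository**: {ctx.repo_name}
**PR title**: {ctx.pr_title}
**PR description**: {ctx.pr_description or "(no description)"}
**Author**: {ctx.author}
**Number of files**: {len(ctx.files)}

## Review dimensions

Review the code along the following 5 dimensions:

1. **Bug detection** 🔴
   - Potential runtime errors
   - Null pointers / out-of-bounds array access
   - Logic errors
   - Missing boundary conditions
   - Improper exception handling

2. **Security issues** 🔴🔴
   - SQL injection
   - Cross-site scripting (XSS)
   - Leaked secrets (API keys, passwords, tokens)
   - Authorization bypass
   - Insecure dependencies

3. **Performance issues** 🟠
   - Repeated computation inside loops
   - Unnecessary database queries (the N+1 problem)
   - Memory leaks
   - Poor handling of large data volumes

4. **Code style** 🟡
   - Non-standard naming
   - Missing or incorrect comments
   - Overly long functions
   - Duplicated code

5. **Best practices** 🟡
   - Incomplete error handling
   - Insufficient logging
   - Insufficient test coverage
   - Hard-coded configuration

## Code changes

{files_summary}

## Output requirements

Return the review result as JSON:

```json
{{
  "review_summary": "Overall assessment: one sentence on the quality of this PR",
  "recommendation": "approve | request_changes | comment",
  "issues": [
    {{
      "severity": "critical | high | medium | low | info",
      "category": "bug | security | performance | style | best_practice",
      "file": "file name (including path)",
      "line_start": <line number>,
      "line_end": <line number>,
      "title": "issue title (short, one sentence)",
      "description": "detailed description of the issue",
      "code_snippet": "the problematic code (if applicable)",
      "suggestion": "suggested fix"
    }}
  ],
  "statistics": {{
    "total_issues": <total number of issues>,
    "critical": <number of critical issues>,
    "high": <number of high issues>,
    "medium": <number of medium issues>,
    "low": <number of low issues>,
    "info": <number of info items>
  }}
}}
```

## Review principles

1. **Specific over abstract**: every issue must point to a concrete file and line number
2. **Constructive suggestions**: every issue must come with a concrete fix
3. **Focus on what matters**: prioritize bugs that genuinely affect functionality or security
4. **Differentiate severity**: critical and high must be fixed; low and info may be left as they are
5. **Keep a professional tone**: criticize the code, not the person

Begin the review.
"""
    return prompt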

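def build_summary_prompt(ctx: ReviewContext, issues: List[dict]) -> str:
    """
    Build the prompt for summarizing the review result
    Used to generate a readable summary before the comment is posted
    """

    issues_json = json.dumps(issues, ensure_ascii=False, indent=2)

    return f"""You are a senior software engineer. Based on the detailed code review results below, write a concise summary comment for this Pull Request.

## PR information
**Repository**: {ctx.repo_name}
**PR title**: {ctx.pr_title}
**Author**: {ctx.author}

## Review findings

{issues_json}

## Requirements

1. Write the summary in Markdown
2. Highlight the 3-5 most important issues
3. Include statistics (grouped by severity)
4. End with a clear recommendation (approve/request_changes)
5. Keep it under 500 words
6. Use emoji to improve readability

Write the summary.
"""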

6.3 AI service wrapper

python
# ============================================
# AI service wrapper
# File: app/services/ai_service.py
# ============================================

import json
from typing import Optional, List, Dict, Any
from langchain_openai import ChatOpenAI
from langchain_anthropic import ChatAnthropic
from loguru import logger

class AIService:
    """Wrapper around the AI model"""

    def __init__(self, provider: str, api_key: str, model: str, **kwargs):
        self.provider = provider
        self.model = model
        self.kwargs = kwargs

        # Initialize the client for the selected provider
        if provider == "openai":
            self.llm = ChatOpenAI(
                model=model,
                api_key=api_key,
                **kwargs
            )
        elif provider == "anthropic":
            self.llm = ChatAnthropic(
                model=model,
                anthropic_api_key=api_key,
                **kwargs
            )
        else:
            raise ValueError(f"Unsupported AI provider: {provider}")

        logger.info(f"AI service initialized: provider={provider}, model={model}")

    def review_code(self, prompt: str) -> Dict[str, Any]:
        """
        Call the AI to review code

        Synchronous on purpose: llm.invoke() is a blocking call, and the RQ
        worker calls this method without awaiting it.

        Args:
            prompt: the fully built review prompt

        Returns:
            The parsed review result (a dict)
        """
        logger.info("Calling the AI for code review...")

        try:
            # Call the model
            response = self.llm.invoke(prompt)
            content = response.content if hasattr(response, 'content') else str(response)

            # Parse the JSON response
            # The AI may wrap the JSON in a markdown code block
            json_str = self._extract_json(content)

            if json_str:
                result = json.loads(json_str)
                logger.info(f"AI review finished, found {len(result.get('issues', []))} issues")
                return result
            else:
                logger.warning("Could not extract JSON from the AI response")
                return self._create_fallback_result()

        except json.JSONDecodeError as e:
            logger.error(f"JSON parsing failed: {e}")
            return self._create_fallback_result()
        except Exception as e:
            logger.error(f"AI call failed: {e}")
            raise
    
    def _extract_json(self, text: str) -> Optional[str]:
        """Extract JSON from text (it may be inside a markdown code block)"""
        import re

        # Try to match a ```json ... ``` code block
        json_match = re.search(r'```json\s*(.*?)\s*```', text, re.DOTALL)
        if json_match:
            return json_match.group(1)

        # Try to match a plain ``` ... ``` code block
        code_match = re.search(r'```\s*(.*?)\s*```', text, re.DOTALL)
        if code_match:
            potential_json = code_match.group(1)
            # Check whether it looks like JSON
            if potential_json.strip().startswith('{'):
                return potential_json

        # Fall back to treating the whole text as JSON
        text = text.strip()
        if text.startswith('{') and text.endswith('}'):
            return text

        return None

    def _create_fallback_result(self) -> Dict[str, Any]:
        """Return a default result when parsing fails"""
        return {
            "review_summary": "The review completed, but the detailed result could not be parsed because of a formatting problem",
            "recommendation": "comment",
            "issues": [],
            "statistics": {
                "total_issues": 0,
                "critical": 0,
                "high": 0,
                "medium": 0,
                "low": 0,
                "info": 0
            }
        }

# Factory function: create the AI service
def create_ai_service(config: dict) -> AIService:
    """Create the AI service from configuration"""
    provider = config.get("provider", "anthropic")
    # Prefer an explicit api_key; otherwise use the key that matches the provider,
    # so an OpenAI setup never picks up the Anthropic key (or vice versa)
    api_key = config.get("api_key") or config.get(f"{provider}_api_key")
    model = config.get("model", "claude-3-5-sonnet-20240620")

    return AIService(
        provider=provider,
        api_key=api_key,
        model=model,
        temperature=config.get("temperature", 0.3),
        max_tokens=config.get("max_tokens", 8192)
    )
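
Even when the response parses as JSON, individual issues may be malformed, and the `statistics` block the model reports may not match the issues it lists. A possible post-processing step, sketched here as an assumption rather than part of the original service, drops unusable issues and recomputes the counts. `normalize_result` and the choice of required fields are hypothetical.

```python
from collections import Counter
from typing import Any, Dict, List

SEVERITIES = ("critical", "high", "medium", "low", "info")
REQUIRED_FIELDS = ("severity", "file", "title", "description")


def normalize_result(result: Dict[str, Any]) -> Dict[str, Any]:
    """Drop malformed issues and rebuild statistics from the issues that remain."""
    issues: List[Dict[str, Any]] = []
    for issue in result.get("issues") or []:
        if not isinstance(issue, dict):
            continue
        if any(not issue.get(key) for key in REQUIRED_FIELDS):
            continue  # a required field is missing or empty
        if issue["severity"] not in SEVERITIES:
            continue  # unknown severity label
        issues.append(issue)
    counts = Counter(issue["severity"] for issue in issues)
    statistics = {level: counts.get(level, 0) for level in SEVERITIES}
    statistics["total_issues"] = len(issues)
    return {
        "review_summary": result.get("review_summary", ""),
        "recommendation": result.get("recommendation", "comment"),
        "issues": issues,
        "statistics": statistics,
    }


raw = {
    "review_summary": "Mostly fine",
    "recommendation": "request_changes",
    "issues": [
        {"severity": "high", "file": "a.py", "title": "Unchecked input", "description": "No validation"},
        {"severity": "low", "file": "b.py", "description": "Title is missing"},
        {"severity": "blocker", "file": "c.py", "title": "Unknown level", "description": "Bad label"},
    ],
    "statistics": {"total_issues": 5},
}
clean = normalize_result(raw)
```

Recomputing the statistics locally means the numbers shown in the PR comment always agree with the issues actually listed.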

6.4 Review worker implementation

python
# ============================================
# Review worker: background task processing
# File: app/workers/review_worker.py
# ============================================

from redis import Redis
from rq import Queue, Worker
from rq.job import Job
from loguru import logger
import json
from datetime import datetime
from typing import List, Dict, Any

from app.services.github_service import GitHubService, PullRequestInfo, FileDiff
from app.services.ai_service import create_ai_service
from app.services.prompt_template import build_code_review_prompt, ReviewContext
from app.models.database import ReviewResult, CodeIssue, db_save_review
from app.core.config import settings

# Connect to the Redis queue
redis_conn = Redis(
    host=settings.redis.host,
    port=settings.redis.port,
    db=settings.redis.db,
    password=settings.redis.password
)
review_queue = Queue("review_tasks", connection=redis_conn)

def enqueue_review(review_request: dict):
    """Add a review task to the queue"""
    job = review_queue.enqueue(
        "app.workers.review_worker.process_review",
        review_request,
        job_timeout="10m",  # at most 10 minutes
        result_ttl=86400     # keep the result for 24 hours
    )
    logger.info(f"Review task enqueued: job_id={job.id}")
    return job.id

def process_review(review_request: dict) -> Dict[str, Any]:
    """
    Process a code review task (main function)

    Flow:
    1. Fetch the PR information and file changes
    2. Build the review prompt
    3. Call the AI to review the code
    4. Parse the result and post comments to GitHub
    5. Save the result to the database
    """
    start_time = datetime.now()

    repo_name = review_request["repository"]
    pr_number = review_request["pr_number"]
    commit_sha = review_request["commit_sha"]

    logger.info(f"Starting review: repo={repo_name}, PR={pr_number}, commit={commit_sha[:8]}")

    try:
        # ========== Step 1: initialize services ==========
        github_service = GitHubService(settings.github.access_token)
        ai_service = create_ai_service(settings.ai.to_dict())

        # ========== Step 2: fetch PR information and changed files ==========
        pr_info = github_service.get_pr_info(repo_name, pr_number)
        files = github_service.get_pr_files(repo_name, pr_number)

        # Filter out files that do not need review
        files = filter_files(files)

        if not files:
            logger.info("No files to review")
            return {"status": "skipped", "reason": "no files to review"}

        logger.info(f"Files to review: {len(files)}")

        # ========== Step 3: build the prompt and call the AI ==========
        files_data = [
            {"filename": f.filename, "patch": f.patch, "language": f.language}
            for f in files
        ]

        ctx = ReviewContext(
            repo_name=repo_name,
            pr_number=pr_number,
            pr_title=pr_info.title,
            pr_description=pr_info.body,
            author=pr_info.author,
            files=files_data
        )

        prompt = build_code_review_prompt(ctx)
        ai_result = ai_service.review_code(prompt)

        # ========== Step 4: post comments to GitHub ==========
        post_review_comments(github_service, repo_name, pr_number, commit_sha, ai_result)

        # ========== Step 5: save the result to the database ==========
        review_result = parse_ai_result(ai_result, pr_info, commit_sha)
        db_save_review(review_result)

        elapsed = (datetime.now() - start_time).total_seconds()
        logger.info(f"Review finished in {elapsed:.1f}s, found {len(ai_result.get('issues', []))} issues")

        return {
            "status": "success",
            "repo": repo_name,
            "pr": pr_number,
            "issues_count": len(ai_result.get("issues", [])),
            "elapsed_seconds": elapsed
        }

    except Exception as e:
        logger.error(f"Review failed: {e}")
        raise

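def filter_files(files: List[FileDiff]) -> List[FileDiff]:
    """Filter out files that do not need review and truncate oversized patches"""
    import fnmatch

    ignore_patterns = settings.review.ignore_patterns
    max_lines = settings.review.max_lines_per_file

    filtered = []
    for f in files:
        # Skip files that match an ignore pattern
        if any(fnmatch.fnmatch(f.filename, pattern) for pattern in ignore_patterns):
            continue

        # Skip files whose patch is empty
        if not f.patch.strip():
            continue

        # Truncate overly long patches to max_lines lines
        patch_lines = f.patch.split('\n')
        if len(patch_lines) > max_lines:
            logger.warning(f"File {f.filename} is too long (>{max_lines} lines) and will be truncated")
            f.patch = '\n'.join(patch_lines[:max_lines])

        filtered.append(f)

    return filtered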

def post_review_comments(
    github_service: GitHubService,
    repo_name: str,
    pr_number: int,
    commit_sha: str,
    ai_result: dict
):
    """Post the review result to the pull request as a GitHub review."""

    issues = ai_result.get("issues", [])
    recommendation = ai_result.get("recommendation", "comment")

    # Build the comment body
    if issues:
        # Issues found: list each one in detail
        comment_body = build_detailed_comment(ai_result)
    else:
        # No issues: post a short approval comment
        comment_body = build_approval_comment(ai_result)

    # Post the main review
    try:
        github_service.post_review_comment(
            repo_name=repo_name,
            pr_number=pr_number,
            commit_sha=commit_sha,
            body=comment_body,
            event=map_recommendation_to_github_event(recommendation)
        )
    except Exception as e:
        logger.error(f"Failed to post the review: {e}")
        # On failure, fall back to a plain COMMENT review
        github_service.post_review_comment(
            repo_name=repo_name,
            pr_number=pr_number,
            commit_sha=commit_sha,
            body=comment_body,
            event="COMMENT"
        )

def build_detailed_comment(ai_result: dict) -> str:
    """Build the detailed review comment."""
    stats = ai_result.get("statistics", {})
    issues = ai_result.get("issues", [])

    # Group issues by severity
    critical = [i for i in issues if i.get("severity") == "critical"]
    high = [i for i in issues if i.get("severity") == "high"]
    medium = [i for i in issues if i.get("severity") == "medium"]

    comment = f"""## 🤖 AI Code Review Results

**{ai_result.get('review_summary', 'Code review complete')}**

### 📊 Statistics

| Severity | Count |
|---------|------|
| 🔴 Critical | {stats.get('critical', 0)} |
| 🟠 High | {stats.get('high', 0)} |
| 🟡 Medium | {stats.get('medium', 0)} |
| 🔵 Low | {stats.get('low', 0)} |
| ⚪ Info | {stats.get('info', 0)} |

"""
    
    # List critical issues in full
    if critical:
        comment += "\n### 🔴 Critical issues that must be fixed\n\n"
        for issue in critical:
            comment += format_issue(issue)

    if high:
        comment += "\n### 🟠 Strongly recommended fixes\n\n"
        for issue in high:
            comment += format_issue(issue)

    if medium:
        comment += "\n### 🟡 Suggested improvements (optional)\n\n"
        for issue in medium[:5]:  # cap how many are shown
            comment += format_issue(issue)
        if len(medium) > 5:
            comment += f"\n_{len(medium) - 5} more medium-severity issue(s) are listed in the full report_"
    
    return comment

def format_issue(issue: dict) -> str:
    """Format a single issue as Markdown."""
    category_icon = {
        "bug": "🐛",
        "security": "🔒",
        "performance": "⚡",
        "style": "🎨",
        "best_practice": "✨"
    }
    
    icon = category_icon.get(issue.get("category", ""), "📝")
    # Show a line reference only when the AI supplied one
    line_ref = f" (line {issue['line_start']})" if issue.get("line_start") else ""

    return f"""**{icon} {issue.get('title', 'Issue')}**

📁 `{issue.get('file', 'unknown')}`{line_ref}

{issue.get('description', '')}

💡 **Suggestion**: {issue.get('suggestion', 'Please fix this issue')}

"""

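def build_approval_comment(ai_result: dict) -> str:
    """Build the comment posted when the review passes."""
    return f"""## ✅ AI Code Review Passed

{ai_result.get('review_summary', 'Code review complete; no serious issues found.')}

---

_This review was generated automatically by the AI Code Review system_"""

def map_recommendation_to_github_event(recommendation: str) -> str:
    """Map the AI recommendation to a GitHub review event."""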
    mapping = {
        "approve": "APPROVE",
        "request_changes": "REQUEST_CHANGES",
        "comment": "COMMENT"
    }
    return mapping.get(recommendation, "COMMENT")

def parse_ai_result(ai_result: dict, pr_info: PullRequestInfo, commit_sha: str) -> ReviewResult:
    """Convert the raw AI result into a ReviewResult object."""
    issues = []
    for issue_data in ai_result.get("issues", []):
        issue = CodeIssue(
            issue_type=issue_data.get("category", "bug"),
            severity=issue_data.get("severity", "medium"),
            file=issue_data.get("file", ""),
            line_start=issue_data.get("line_start", 0),
            line_end=issue_data.get("line_end", 0),
            title=issue_data.get("title", ""),
            description=issue_data.get("description", ""),
            suggestion=issue_data.get("suggestion", "")
        )
        issues.append(issue)
    
    return ReviewResult(
        pr_number=pr_info.pr_number,
        repository=pr_info.repo_name,
        commit_sha=commit_sha,
        status=ReviewStatus.COMPLETED,
        created_at=datetime.now(),
        completed_at=datetime.now(),
        issues=issues,
        summary=ai_result.get("review_summary", ""),
        stats=ai_result.get("statistics", {})
    )
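
The statistics table in `build_detailed_comment` relies on the `statistics` field returned by the model, which is not guaranteed to agree with the `issues` list. Below is a minimal sketch of recomputing the counts locally instead. It assumes the issue dicts use the severity values shown above; `compute_statistics` and the `total_issues` key are hypothetical names introduced for illustration.

```python
from collections import Counter

# Severity levels used by the statistics table above
SEVERITIES = ("critical", "high", "medium", "low", "info")

def compute_statistics(issues: list) -> dict:
    """Count issues per severity; unknown or missing severities count as 'info'."""
    counts = Counter(
        issue.get("severity") if issue.get("severity") in SEVERITIES else "info"
        for issue in issues
    )
    stats = {level: counts.get(level, 0) for level in SEVERITIES}
    stats["total_issues"] = len(issues)
    return stats

if __name__ == "__main__":
    sample = [{"severity": "critical"}, {"severity": "high"}, {"severity": "unknown"}]
    print(compute_statistics(sample))
```

One possible use is to overwrite `ai_result["statistics"]` with this result before building the comment and saving the review, so the table and the stored stats always match the listed issues.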

Docker Deployment and Production Configuration

7.1 Docker Configuration

dockerfile
# ============================================
# Dockerfile
# ============================================

FROM python:3.11-slim

# Set the working directory
WORKDIR /app

# Install system dependencies (gcc and libpq-dev are needed when a
# PostgreSQL driver such as psycopg2 is built from source)
RUN apt-get update && apt-get install -y --no-install-recommends \
    gcc \
    libpq-dev \
    && rm -rf /var/lib/apt/lists/*

# Copy the dependency list
COPY requirements.txt .

# Install Python dependencies
RUN pip install --no-cache-dir -r requirements.txt

# Copy the application code
COPY app/ ./app/
COPY configs/ ./configs/

# Create a non-root user (security best practice)
RUN useradd -m -u 1000 appuser && chown -R appuser:appuser /app
USER appuser

# Expose the API port
EXPOSE 8000

# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
    CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')" || exit 1

# Start command
CMD ["uvicorn", "app.api.main:app", "--host", "0.0.0.0", "--port", "8000"]

yaml
# ============================================
# docker-compose.yml - production deployment
# ============================================

services:
  # Main application
  api:
    build:
      context: .
      dockerfile: Dockerfile
    container_name: ai-code-review-api
    ports:
      - "8000:8000"
    environment:
      - GITHUB_ACCESS_TOKEN=${GITHUB_ACCESS_TOKEN}
      - GITHUB_WEBHOOK_SECRET=${GITHUB_WEBHOOK_SECRET}
      - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
      - DATABASE_URL=postgresql://postgres:${POSTGRES_PASSWORD}@db:5432/code_review
      - REDIS_URL=redis://redis:6379/0
      - APP_SECRET_KEY=${APP_SECRET_KEY}
      - LOG_LEVEL=${LOG_LEVEL:-INFO}
    depends_on:
      db:
        condition: service_healthy
      redis:
        condition: service_started
    healthcheck:
      test: ["CMD", "python", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')"]
      interval: 30s
      timeout: 10s
      retries: 3
    restart: unless-stopped
    networks:
      - code-review-network

  # Worker process (runs the review jobs)
  worker:
    build:
      context: .
      dockerfile: Dockerfile
    container_name: ai-code-review-worker
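    # rq's -w option expects a Worker class, not a module, so it is not used here;
    # the job function is imported from its dotted path when a job runs
    command: ["rq", "worker", "--url", "redis://redis:6379/0", "review_tasks"]
    # The image-level HEALTHCHECK probes the API on port 8000, which the worker does not serve
    healthcheck:
      disable: true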
    environment:
      - GITHUB_ACCESS_TOKEN=${GITHUB_ACCESS_TOKEN}
      - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
      - DATABASE_URL=postgresql://postgres:${POSTGRES_PASSWORD}@db:5432/code_review
      - REDIS_URL=redis://redis:6379/0
      - LOG_LEVEL=${LOG_LEVEL:-INFO}
    depends_on:
      db:
        condition: service_healthy
      redis:
        condition: service_started
    restart: unless-stopped
    networks:
      - code-review-network

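  # PostgreSQL database (the stock postgres image does not ship pgvector,
  # so use the pgvector image for the `vector` extension that init.sql enables)
  db:
    image: pgvector/pgvector:pg16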
    container_name: ai-code-review-db
    environment:
      - POSTGRES_DB=code_review
      - POSTGRES_USER=postgres
      - POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
    volumes:
      - postgres_data:/var/lib/postgresql/data
      - ./init.sql:/docker-entrypoint-initdb.d/init.sql:ro
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 10s
      timeout: 5s
      retries: 5
    restart: unless-stopped
    networks:
      - code-review-network

  # Redis queue
  redis:
    image: redis:7-alpine
    container_name: ai-code-review-redis
    volumes:
      - redis_data:/data
    command: redis-server --appendonly yes
    restart: unless-stopped
    networks:
      - code-review-network

  # Nginx reverse proxy (optional but recommended)
  nginx:
    image: nginx:alpine
    container_name: ai-code-review-nginx
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf:ro
      - ./ssl:/etc/nginx/ssl:ro
    depends_on:
      - api
    restart: unless-stopped
    networks:
      - code-review-network

networks:
  code-review-network:
    driver: bridge

volumes:
  postgres_data:
  redis_data:
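
The compose file above substitutes several values from `.env`, and a missing one only surfaces as an empty variable at runtime. Here is a minimal sketch of checking them before starting the stack. It assumes a plain `KEY=VALUE` file; `missing_env_vars` is a hypothetical helper, and the variable list is copied from the compose file.

```python
# Variables the compose file references without a default value
REQUIRED_VARS = (
    "GITHUB_ACCESS_TOKEN",
    "GITHUB_WEBHOOK_SECRET",
    "ANTHROPIC_API_KEY",
    "POSTGRES_PASSWORD",
    "APP_SECRET_KEY",
)

def missing_env_vars(env_text: str, required=REQUIRED_VARS) -> list:
    """Return the required variable names that are absent or empty in env_text."""
    defined = set()
    for line in env_text.splitlines():
        line = line.strip()
        # Ignore blank lines, comments, and lines that are not assignments
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if value.strip():
            defined.add(key.strip())
    return [name for name in required if name not in defined]

if __name__ == "__main__":
    sample = "GITHUB_ACCESS_TOKEN=ghp_example\n# comment\nAPP_SECRET_KEY=\n"
    print(missing_env_vars(sample))
```

In practice the text would come from reading the `.env` file, and a non-empty result would abort the deployment with a clear message.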

7.2 Database Initialization

sql
-- init.sql - database initialization script

-- Enable the pgvector extension (used for vector search)
CREATE EXTENSION IF NOT EXISTS vector;

-- Review records table
CREATE TABLE IF NOT EXISTS reviews (
    id SERIAL PRIMARY KEY,
    repository VARCHAR(255) NOT NULL,
    pr_number INTEGER NOT NULL,
    commit_sha VARCHAR(40) NOT NULL,
    author VARCHAR(100),
    status VARCHAR(20) DEFAULT 'pending',
    summary TEXT,
    stats JSONB,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    completed_at TIMESTAMP,
    
    -- One review per (repository, PR, commit); the constraint also creates a unique index
    CONSTRAINT unique_review UNIQUE (repository, pr_number, commit_sha)
);

CREATE INDEX IF NOT EXISTS idx_reviews_repo_pr ON reviews(repository, pr_number);
CREATE INDEX IF NOT EXISTS idx_reviews_created ON reviews(created_at DESC);
CREATE INDEX IF NOT EXISTS idx_reviews_status ON reviews(status);

-- Code issues table
CREATE TABLE IF NOT EXISTS code_issues (
    id SERIAL PRIMARY KEY,
    review_id INTEGER REFERENCES reviews(id) ON DELETE CASCADE,
    issue_type VARCHAR(30) NOT NULL,
    severity VARCHAR(20) NOT NULL,
    file VARCHAR(500) NOT NULL,
    line_start INTEGER,
    line_end INTEGER,
    title VARCHAR(500),
    description TEXT,
    suggestion TEXT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE INDEX IF NOT EXISTS idx_issues_review ON code_issues(review_id);
CREATE INDEX IF NOT EXISTS idx_issues_severity ON code_issues(severity);

-- Statistics view: per-day review totals
CREATE OR REPLACE VIEW daily_stats AS
SELECT 
    DATE(created_at) AS date,
    COUNT(*) AS total_reviews,
    SUM(COALESCE((stats->>'total_issues')::int, 0)) AS total_issues,
    -- Number of reviews with at least one critical issue (compared as integers, not text)
    SUM(CASE WHEN COALESCE((stats->>'critical')::int, 0) > 0 THEN 1 ELSE 0 END) AS has_critical
FROM reviews
GROUP BY DATE(created_at)
ORDER BY date DESC;
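
The `unique_review` constraint means a plain `INSERT` fails when a job is retried for the same commit. One way to handle this is an upsert, sketched below. SQLite's in-memory database stands in for PostgreSQL so the example runs anywhere, and the table is a trimmed copy of `reviews`. The `ON CONFLICT ... DO UPDATE` clause has the same shape in PostgreSQL, although placeholders there depend on the driver (`%s` with psycopg2). `save_review` and `make_demo_db` are hypothetical helpers.

```python
import sqlite3

def make_demo_db() -> sqlite3.Connection:
    """Create an in-memory table with the same unique key as `reviews`."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE reviews (
            id INTEGER PRIMARY KEY,
            repository TEXT NOT NULL,
            pr_number INTEGER NOT NULL,
            commit_sha TEXT NOT NULL,
            status TEXT DEFAULT 'pending',
            summary TEXT,
            UNIQUE (repository, pr_number, commit_sha)
        )
        """
    )
    return conn

def save_review(conn, repository, pr_number, commit_sha, status, summary):
    """Insert a review, or update the existing row for the same commit."""
    conn.execute(
        """
        INSERT INTO reviews (repository, pr_number, commit_sha, status, summary)
        VALUES (?, ?, ?, ?, ?)
        ON CONFLICT (repository, pr_number, commit_sha)
        DO UPDATE SET status = excluded.status, summary = excluded.summary
        """,
        (repository, pr_number, commit_sha, status, summary),
    )
    conn.commit()

if __name__ == "__main__":
    db = make_demo_db()
    save_review(db, "owner/repo", 1, "abc123", "pending", "")
    save_review(db, "owner/repo", 1, "abc123", "completed", "2 issues")
    print(db.execute("SELECT status, summary FROM reviews").fetchall())
```

With this pattern a retried job overwrites its earlier row instead of raising a uniqueness error.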

7.3 One-Click Deployment Script

bash
#!/bin/bash
# deploy.sh - one-click deployment of the AI code review system

set -e

echo "=========================================="
echo "🚀 Deploying the AI Code Review system"
echo "=========================================="

# 1. Check the environment file
if [ ! -f .env ]; then
    echo "❌ Error: .env file not found"
    echo "   Create a .env file first (see configs/settings.yaml)"
    exit 1
fi

# 2. Check Docker
if ! command -v docker &> /dev/null; then
    echo "❌ Error: Docker is not installed"
    exit 1
fi

if ! command -v docker-compose &> /dev/null; then
    echo "❌ Error: Docker Compose is not installed"
    exit 1
fi

# 3. Generate required secrets (only when .env does not define them yet)
# The check reads .env rather than the shell environment, so re-running the
# script never appends a second, conflicting value
if ! grep -q '^APP_SECRET_KEY=' .env; then
    printf '\nAPP_SECRET_KEY=%s\n' "$(python3 -c 'import secrets; print(secrets.token_hex(32))')" >> .env
fi

if ! grep -q '^POSTGRES_PASSWORD=' .env; then
    printf '\nPOSTGRES_PASSWORD=%s\n' "$(python3 -c 'import secrets; print(secrets.token_urlsafe(16))')" >> .env
fi

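# 4. Build the images
echo ""
echo "🔨 Building Docker images..."
docker-compose build

# 5. Start the services
echo ""
echo "🐳 Starting services..."
docker-compose up -d

# 6. Give the services time to come up
echo ""
echo "⏳ Waiting for services to start..."
sleep 15

# 7. Check health
echo ""
echo "🏥 Checking service status..."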

check_service() {
    local name=$1
    local url=$2
    local max_attempts=10
    local attempt=1
    
    while [ $attempt -le $max_attempts ]; do
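        if curl -sf "$url" > /dev/null 2>&1; then
            echo "  ✅ $name is up"
            return 0
        fi
        echo "  ⏳ Waiting for $name... ($attempt/$max_attempts)"
        sleep 3
        attempt=$((attempt + 1))
    done
    
    echo "  ❌ $name failed to start"
    return 1
}

check_service "API" "http://localhost:8000/health"

# Redis does not speak HTTP and its port is not published to the host,
# so probe it with redis-cli inside the container instead of curl
if docker-compose exec -T redis redis-cli ping | grep -q PONG; then
    echo "  ✅ Redis is up"
else
    echo "  ❌ Redis failed to start"
    exit 1
fi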

# 8. Show the running containers
echo ""
echo "📋 Running containers:"
docker-compose ps

echo ""
echo "=========================================="
echo "✅ Deployment complete!"
echo ""
echo "📝 Next steps:"
echo "   1. Configure the GitHub webhook (see section 4)"
echo "   2. Open http://localhost:8000/docs for the API docs"
echo ""
echo "📌 Common commands:"
echo "   View logs:        docker-compose logs -f api"
echo "   Restart services: docker-compose restart"
echo "   Stop services:    docker-compose down"
echo "=========================================="
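
The `check_service` loop in the script polls an endpoint a fixed number of times with a pause between attempts. The same idea is sketched below for use from a test or a CI job. `wait_until_healthy` and `http_ok` are hypothetical helpers; the attempt count and delay mirror the script's values, and the `sleep` parameter exists only so the pause can be replaced in tests.

```python
import time
import urllib.request
from typing import Callable

def http_ok(url: str, timeout: float = 5.0) -> bool:
    """Return True when the URL answers with a 2xx status."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return 200 <= resp.status < 300

def wait_until_healthy(
    check: Callable[[], bool],
    attempts: int = 10,
    delay: float = 3.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call check() up to `attempts` times, pausing `delay` seconds between tries."""
    for attempt in range(1, attempts + 1):
        try:
            if check():
                return True
        except Exception:
            # Connection refused or a timeout just means "not ready yet"
            pass
        if attempt < attempts:
            sleep(delay)
    return False
```

A call such as `wait_until_healthy(lambda: http_ok("http://localhost:8000/health"))` would then play the role of the shell loop.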

Monitoring, Alerting, and Continuous Optimization

8.1 Health Check Endpoint

python
# ============================================
# Health checks and monitoring
# File: app/api/monitoring.py
# ============================================

from fastapi import APIRouter, Response
from pydantic import BaseModel
from datetime import datetime
import psutil
import os

router = APIRouter()

class HealthStatus(BaseModel):
    """Health status response"""
    status: str
    timestamp: str
    services: dict
    system: dict

@router.get("/health", response_model=HealthStatus)
def health_check():
    """Health check endpoint for load balancers and health probes"""
    # Declared with plain `def` so FastAPI runs the blocking checks below
    # in its thread pool instead of on the event loop

    # Check each component
    services = {
        "api": "healthy",
        "redis": check_redis(),
        "database": check_database()
    }

    # Derive the overall status
    all_healthy = all(v == "healthy" for v in services.values())

    return HealthStatus(
        status="healthy" if all_healthy else "degraded",
        timestamp=datetime.now().isoformat(),
        services=services,
        system=get_system_metrics()
    )

@router.get("/metrics")
def metrics():
    """Metrics endpoint in the Prometheus text format"""
    from app.models.database import ReviewSession

    # Collect the statistics
    total_reviews = ReviewSession.query.count()
    pending_reviews = ReviewSession.query.filter_by(status="pending").count()

    # Render the Prometheus text exposition format
    metrics_text = f"""# HELP code_review_total_reviews Total number of reviews
# TYPE code_review_total_reviews counter
code_review_total_reviews {total_reviews}

# HELP code_review_pending_reviews Current pending reviews
# TYPE code_review_pending_reviews gauge
code_review_pending_reviews {pending_reviews}

# HELP code_review_up Whether the service is up
# TYPE code_review_up gauge
code_review_up 1
"""
    
    return Response(content=metrics_text, media_type="text/plain")

def check_redis() -> str:
    """Check the Redis connection"""
    try:
        from app.core.database import redis_client
        redis_client.ping()
        return "healthy"
    except Exception:
        return "unhealthy"

def check_database() -> str:
    """Check the database connection"""
    try:
        # Assumes get_db() returns a SQLAlchemy session; SQLAlchemy 2.x
        # rejects raw SQL strings, so the statement is wrapped in text()
        from sqlalchemy import text
        from app.core.database import get_db
        db = get_db()
        db.execute(text("SELECT 1"))
        return "healthy"
    except Exception:
        return "unhealthy"

def get_system_metrics() -> dict:
    """Collect basic system metrics"""
    return {
        "cpu_percent": psutil.cpu_percent(interval=0.1),
        "memory_percent": psutil.virtual_memory().percent,
        "disk_percent": psutil.disk_usage('/').percent,
        "pid": os.getpid()
    }
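
The `/metrics` handler builds the Prometheus text format by hand, which becomes error-prone as metrics are added. Below is a minimal sketch of a small renderer that produces the same `# HELP` / `# TYPE` / value layout. `render_metrics` is a hypothetical helper and does not handle labels.

```python
def render_metrics(metrics: dict) -> str:
    """Render {name: (type, help_text, value)} as Prometheus text exposition format."""
    lines = []
    for name, (metric_type, help_text, value) in metrics.items():
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} {metric_type}")
        lines.append(f"{name} {value}")
    # The format expects a trailing newline after the last sample
    return "\n".join(lines) + "\n"

if __name__ == "__main__":
    print(render_metrics({
        "code_review_pending_reviews": ("gauge", "Current pending reviews", 4),
        "code_review_up": ("gauge", "Whether the service is up", 1),
    }))
```

For anything beyond a few values, the official `prometheus_client` package is the usual choice, since it also handles labels and escaping.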

8.2 Logging Configuration

python
# ============================================
# Logging configuration
# File: app/core/logging.py
# ============================================

from loguru import logger
import sys

# Remove loguru's default handler
logger.remove()

# Console sink
logger.add(
    sys.stdout,
    format="{time:YYYY-MM-DD HH:mm:ss} | {level: <8} | {name}:{function}:{line} - {message}",
    level="INFO",
    filter=lambda record: "uzsq" not in (record["name"] or "")  # drop noisy loggers by module name
)

# File sink (JSON, convenient for log collectors)
logger.add(
    "logs/app_{time:YYYY-MM-DD}.log",
    format="{time:YYYY-MM-DD HH:mm:ss} | {level: <8} | {name}:{function}:{line} - {message}",
    level="DEBUG",
    rotation="00:00",  # rotate daily at midnight
    retention="30 days",  # keep 30 days of logs
    compression="gz",  # compress rotated logs
    serialize=True  # emit JSON
)

# Separate file for errors
logger.add(
    "logs/error_{time:YYYY-MM-DD}.log",
    format="{time:YYYY-MM-DD HH:mm:ss} | {level: <8} | {name}:{function}:{line} - {message}",
    level="ERROR",
    rotation="00:00",
    retention="90 days",
    serialize=True
)

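# Structured logging helper
def log_review_event(event_type: str, data: dict):
    """Record a review event as a structured log entry"""
    logger.bind(
        event_type=event_type,
        **data
    ).info(f"Review event: {event_type}")

# Usage example
if __name__ == "__main__":
    logger.info("Application started")
    logger.debug("Debug message")
    logger.warning("Warning message")
    try:
        1 / 0
    except ZeroDivisionError:
        # loguru has no exc_info argument; logger.exception() attaches the traceback
        logger.exception("Error message")

    log_review_event("review_started", {
        "pr_number": 123,
        "repository": "owner/repo",
        "files_count": 5
    })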

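Common Problems and Troubleshooting Guide

9.1 Webhook Receives No Requests

bash
# ============================================
# Problem 1: The webhook receives no requests
# ============================================

# Troubleshooting steps:

# 1. Confirm the webhook is configured correctly
# GitHub → Settings → Webhooks → click your webhook
# Check that:
#   - The Payload URL is correct (it must be reachable from the public internet; use HTTPS)
#   - Content type is application/json
#   - Events include Pull requests
#   - The Recent Deliveries tab shows the response GitHub got for each delivery

# 2. Check whether the server received the request
# The nginx container logs to stdout (for a host-installed nginx,
# tail /var/log/nginx/access.log instead)
docker-compose logs -f nginx | grep -i webhook

# Look for webhook entries in the application log
docker-compose logs -f api | grep -i webhook

# 3. Check the firewall / security group
# Alibaba Cloud: Security Group → add an inbound rule → allow port 443
#   (or 8000 if you expose the API directly)
# firewalld: sudo firewall-cmd --list-all

# 4. Test the endpoint with curl (requires public access)
# Without a valid X-Hub-Signature-256 header the app may answer 401/403,
# which still proves the endpoint is reachable
curl -X POST https://your-domain.com/api/webhook/github \
  -H "Content-Type: application/json" \
  -H "X-GitHub-Event: pull_request" \
  -d '{"action": "opened", "pull_request": {"number": 1}}'

# 5. Use ngrok for local debugging
# If the service runs on your machine, expose it to the internet with ngrok
ngrok http 8000
# Then use the URL ngrok prints as the webhook Payload URL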
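
When the endpoint verifies signatures, a hand-made test request also needs an `X-Hub-Signature-256` header. GitHub computes it as an HMAC-SHA256 of the raw request body keyed with the webhook secret, prefixed with `sha256=`. The sketch below shows both sides of that check; `sign_payload` and `verify_signature` are hypothetical helper names, and the secret in the usage example is a placeholder.

```python
import hashlib
import hmac

def sign_payload(secret: str, payload: bytes) -> str:
    """Return the value GitHub would send in X-Hub-Signature-256 for this body."""
    digest = hmac.new(secret.encode("utf-8"), payload, hashlib.sha256).hexdigest()
    return f"sha256={digest}"

def verify_signature(secret: str, payload: bytes, header_value: str) -> bool:
    """Compare the received header with the expected signature in constant time."""
    expected = sign_payload(secret, payload)
    return hmac.compare_digest(expected, header_value or "")

if __name__ == "__main__":
    body = b'{"action": "opened", "pull_request": {"number": 1}}'
    print(sign_payload("my-webhook-secret", body))
```

The printed value can be passed to curl as `-H "X-Hub-Signature-256: <value>"`, provided the body sent with `-d` is byte-for-byte identical to the one that was signed.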

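9.2 GitHub API Rate Limiting

python
# ============================================
# Problem 2: GitHub API rate limiting (403 or 429 responses)
# ============================================

# GitHub REST API limits:
# - Unauthenticated: 60 requests/hour
# - Personal access token: 5,000 requests/hour
# - GitHub App installation: 5,000 requests/hour per installation as a baseline,
#   scaling with repository and user count up to 12,500, and 15,000 on
#   GitHub Enterprise Cloud organizations

# How to investigate:

# 1. Check the remaining GitHub API quota
# It is also returned in the X-RateLimit-Remaining response header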
import requests

response = requests.get(
    "https://api.github.com/rate_limit",
    headers={"Authorization": "Bearer YOUR_TOKEN"}
)
print(response.json())

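# 2. Fix: use a GitHub App (recommended)
# A GitHub App's quota is counted per installation rather than per user, so it
# does not compete with your personal token and can scale above 5,000 requests/hour

# Steps to create a GitHub App:
# 1. GitHub Settings → Developer settings → GitHub Apps → New GitHub App
# 2. Fill in:
#    - Homepage URL: https://your-domain.com
#    - Webhook URL: https://your-domain.com/api/webhook/github
#    - Permissions: Pull requests → Read & Write
# 3. Generate a private key (used to sign the JWT)
# 4. Install the app on your organization

# 3. Using the GitHub App from code
# RS256 signing needs PyJWT with its crypto extra:
# pip install "PyJWT[crypto]"
import time

import jwt

def get_github_app_installation_token(app_id, private_key_path, install_id):
    """Exchange a GitHub App JWT for an installation access token"""
    # Read the private key
    with open(private_key_path, 'r') as f:
        private_key = f.read()

    # Build the JWT: backdate iat by 60s to tolerate clock drift and keep
    # exp inside GitHub's 10-minute maximum
    now = int(time.time())
    payload = {
        'iat': now - 60,
        'exp': now + 540,
        'iss': str(app_id)
    }
    jwt_token = jwt.encode(payload, private_key, algorithm='RS256')

    # Request the installation token
    response = requests.post(
        f"https://api.github.com/app/installations/{install_id}/access_tokens",
        headers={
            "Authorization": f"Bearer {jwt_token}",
            "Accept": "application/vnd.github+json"
        },
        timeout=10
    )
    response.raise_for_status()

    return response.json()['token']

# 4. Degrading gracefully under rate limits
# If the quota is still not enough, you can:
# - Review less often (e.g. only when the PR is opened, not on every push)
# - Add caching to avoid repeated API calls
# - Limit the number of files per review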
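
Rather than failing when the quota runs out, the worker can read the rate-limit headers on each GitHub response and pause until the window resets. The sketch below shows that calculation. `seconds_until_reset` is a hypothetical helper that takes a plain dict using the header spelling shown; the headers object in `requests` is case-insensitive, but a plain dict is not.

```python
import time
from typing import Optional

def seconds_until_reset(headers: dict, now: Optional[float] = None) -> float:
    """Return how long to pause before the next GitHub call (0 if quota remains)."""
    now = time.time() if now is None else now

    # Secondary rate limits answer with Retry-After (in seconds)
    if "Retry-After" in headers:
        return max(0.0, float(headers["Retry-After"]))

    # Primary limit: nothing to wait for while requests remain
    remaining = int(headers.get("X-RateLimit-Remaining", 1))
    if remaining > 0:
        return 0.0

    # X-RateLimit-Reset is the reset time in UTC epoch seconds
    reset_at = float(headers.get("X-RateLimit-Reset", now))
    return max(0.0, reset_at - now)

if __name__ == "__main__":
    exhausted = {"X-RateLimit-Remaining": "0", "X-RateLimit-Reset": "1060"}
    print(seconds_until_reset(exhausted, now=1000.0))
```

In a queue-based setup, the returned delay could be used to reschedule the job instead of sleeping inside the worker.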

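9.3 AI Response Timeouts or Failures

python
# ============================================
# Problem 3: AI responses time out or fail
# ============================================

# Troubleshooting steps:

# 1. Check that the API key is valid
# OpenAI: https://platform.openai.com/api-keys
# Anthropic: https://console.anthropic.com/settings/keys

# Test that the API works (the client reads ANTHROPIC_API_KEY from the environment)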
import anthropic

client = anthropic.Anthropic()
message = client.messages.create(
    model="claude-3-5-sonnet-20240620",
    max_tokens=100,
    messages=[{"role": "user", "content": "Hello"}]
)
print(message)

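# 2. Check network connectivity (run these in a shell)
# The server must be able to reach the OpenAI/Anthropic API
#   ping api.openai.com
#   curl -I https://api.openai.com/v1/models
#   curl -I https://api.anthropic.com

# 3. Adjust the timeout
# Raise the timeout in the config (YAML):
#   ai:
#     timeout: 120     # 120-second timeout
#     max_retries: 3   # retry at most 3 times

# 4. Add a fallback strategy
# If the AI fails entirely, still return basic static-analysis results
from typing import List

def review_with_fallback(files: List[FileDiff]) -> dict:
    """Fallback review: use the rule engine when the AI call fails"""
    try:
        return review_with_ai(files)
    except Exception as e:
        logger.error(f"AI review failed, falling back to static analysis: {e}")
        return basic_static_analysis(files)

# 5. Turn on verbose logging for more detail
# LangChain debug mode
from langchain.globals import set_debug
set_debug(True)

# Or enable debug logs from the HTTP client and the SDKs
import logging
logging.basicConfig()
logging.getLogger("httpx").setLevel(logging.DEBUG)
logging.getLogger("openai").setLevel(logging.DEBUG)
logging.getLogger("anthropic").setLevel(logging.DEBUG)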
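
The `max_retries` setting above implies a retry loop around the AI call. Below is a minimal sketch of one with exponential backoff. `call_with_retries` is a hypothetical helper, the delays (1s, 2s, 4s) are illustrative, and the `sleep` parameter exists only so the pause can be replaced in tests.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

def call_with_retries(
    fn: Callable[[], T],
    max_retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(); on failure retry up to max_retries times, doubling the delay each time."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except Exception:
            # Out of retries: surface the last error to the caller
            if attempt == max_retries:
                raise
            sleep(base_delay * (2 ** attempt))
    raise RuntimeError("unreachable")  # the loop always returns or raises

if __name__ == "__main__":
    print(call_with_retries(lambda: "ok"))
```

The Anthropic Python client also accepts `timeout` and `max_retries` arguments of its own, so a wrapper like this is mainly useful around code that does not already retry.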

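9.4 High Memory Usage

bash
# ============================================
# Problem 4: High memory usage
# ============================================

# Monitor memory usage
docker stats

# Likely causes:
# 1. Memory leaks in the worker process
# 2. Large files inflating the memory used by a ChromaDB vector store
#    (only relevant if you added ChromaDB; the stack above uses pgvector)

# Fixes:

# 1. Limit how much is reviewed at once
# In settings.yaml:
#   review:
#     max_lines_per_file: 2000  # at most 2000 lines per file
#     max_files_per_pr: 20      # at most 20 files per review

# 2. Tune the ChromaDB configuration
# If you use ChromaDB for vector retrieval, pass the settings when creating the client (Python):
#   import chromadb
#   from chromadb.config import Settings
#   client = chromadb.PersistentClient(
#       path="./chroma_data",
#       settings=Settings(allow_reset=True, anonymized_telemetry=False),
#   )
# Do not set chroma_db_impl="duckdb+parquet"; ChromaDB 0.4+ rejects that legacy option.

# 3. Add a Docker memory limit
# In docker-compose.yml:
#   services:
#     worker:
#       deploy:
#         resources:
#           limits:
#             memory: 2G
#           reservations:
#             memory: 1G

# 4. Clean up periodically
# Restart the worker to release memory
docker-compose restart worker

# Or use autoheal to restart unhealthy containers automatically.
# Add this service to docker-compose.yml; it only watches containers that
# carry the label autoheal=true and define a healthcheck:
#   autoheal:
#     image: willfarrell/autoheal:latest
#     restart: always
#     environment:
#       - AUTOHEAL_CONTAINER_LABEL=autoheal
#     volumes:
#       - /var/run/docker.sock:/var/run/docker.sock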
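
The `max_files_per_pr` and `max_lines_per_file` limits only help if the worker enforces them before building the prompt. The sketch below applies both caps. `cap_review_scope` is a hypothetical helper that works on plain `(filename, patch)` pairs rather than the project's `FileDiff` objects.

```python
from typing import List, Tuple

def cap_review_scope(
    files: List[Tuple[str, str]],
    max_files: int = 20,
    max_lines: int = 2000,
) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Keep at most max_files patches, each cut to max_lines lines.

    Returns (kept, skipped): the capped (filename, patch) pairs and the
    names of the files left out of this review.
    """
    kept, skipped = [], []
    for name, patch in files:
        if len(kept) >= max_files:
            skipped.append(name)
            continue
        lines = patch.split("\n")
        kept.append((name, "\n".join(lines[:max_lines])))
    return kept, skipped

if __name__ == "__main__":
    demo = [("a.py", "1\n2\n3"), ("b.py", "x"), ("c.py", "y")]
    print(cap_review_scope(demo, max_files=2, max_lines=2))
```

Listing the skipped file names in the review comment keeps the truncation visible to the PR author.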

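Summary

This article walked through a complete implementation of an AI code review system:

  1. Architecture: Webhook → FastAPI → Redis Queue → Worker → GitHub API
  2. Core components: GitHub service, AI service, prompt templates, review worker
  3. Deployment: one-command Docker Compose deployment behind an Nginx reverse proxy
  4. Monitoring and operations: health checks, Prometheus metrics, structured logs
  5. Troubleshooting: fixes for webhook, rate-limit, AI timeout, and memory problems

The system can be put to use in a team as it stands. Adjust it to your own stack and needs, for example by supporting more languages or sending notifications to the IM tools your team already uses.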
