API Performance Monitoring and Testing Guide

This guide explains the performance monitoring, load/stress testing, and optimization approach for the Yuanfeng backend API, helping you keep the system fast and stable.

📋 Overview

API performance monitoring and testing are essential to safeguarding system quality. This guide covers the complete performance testing workflow from development to production, including baseline tests, stress tests, performance monitoring, and optimization recommendations.

🎯 Performance Metrics

Key Performance Indicators (KPIs)

| Metric | Target | Description |
| --- | --- | --- |
| Response time | < 200 ms (95th percentile) | Response time for 95% of requests |
| Throughput | > 1000 QPS | Requests handled per second |
| Concurrent users | > 500 | Users online at the same time |
| Error rate | < 0.1% | Ratio of failed requests |
| CPU usage | < 70% | Server CPU utilization |
| Memory usage | < 80% | Server memory utilization |
| DB connection pool | < 80% | Database connection pool utilization |
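
To make the "95th percentile" target concrete, the sketch below measures p95 latency for a single endpoint using only the Go standard library. It is a minimal illustration rather than project code: the endpoint URL, request count, and the 200 ms threshold are assumptions taken from the table above.

go
// p95check.go - minimal sketch for measuring 95th-percentile latency.
package main

import (
    "fmt"
    "net/http"
    "sort"
    "time"
)

func main() {
    const requests = 200 // assumed sample size
    client := &http.Client{Timeout: 5 * time.Second}
    durations := make([]time.Duration, 0, requests)

    for i := 0; i < requests; i++ {
        start := time.Now()
        resp, err := client.Get("http://localhost:8080/api/v1/health") // assumed endpoint
        if err != nil {
            continue
        }
        resp.Body.Close()
        durations = append(durations, time.Since(start))
    }

    if len(durations) == 0 {
        fmt.Println("no successful requests")
        return
    }

    // Sort the samples and read the value at the 95% position.
    sort.Slice(durations, func(i, j int) bool { return durations[i] < durations[j] })
    p95 := durations[int(float64(len(durations))*0.95)]
    fmt.Printf("p95 latency: %v (target: < 200ms)\n", p95)
}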

Business Performance Metrics

| Business Scenario | Target | Description |
| --- | --- | --- |
| User login | < 100 ms | Login endpoint response time |
| Data query | < 200 ms | List query endpoint response time |
| File upload | < 5 s | Upload time for a 10 MB file |
| Report generation | < 30 s | Generation time for complex reports |
| Batch operations | < 60 s | Processing time for 1,000 records |

🛠️ Performance Testing Tools

1. Apache Bench (ab)

bash
# Basic load test
ab -n 1000 -c 10 http://localhost:8080/api/v1/health

# Test with authentication
ab -n 1000 -c 10 -H "Authorization: Bearer your_token" http://localhost:8080/api/v1/users/profile

# POST request test
ab -n 100 -c 5 -T application/json -p post_data.json http://localhost:8080/api/v1/auth/login

2. wrk

bash
# Basic test
wrk -t12 -c400 -d30s http://localhost:8080/api/v1/health

# Test with authentication
wrk -t12 -c400 -d30s -H "Authorization: Bearer your_token" http://localhost:8080/api/v1/users/profile

# Test with a custom script
wrk -t12 -c400 -d30s -s post_script.lua http://localhost:8080/api/v1/todos

Example Lua script (post_script.lua):

lua
wrk.method = "POST"
wrk.body = '{"title":"Perf test task","description":"A task created for performance testing"}'
wrk.headers["Content-Type"] = "application/json"
wrk.headers["Authorization"] = "Bearer your_token"

3. hey

bash
# Basic test
hey -n 1000 -c 10 http://localhost:8080/api/v1/health

# Test with authentication
hey -n 1000 -c 10 -H "Authorization: Bearer your_token" http://localhost:8080/api/v1/users/profile

# POST test
hey -n 100 -c 5 -m POST -d '{"email":"test@example.com","password":"test123"}' \
  -H "Content-Type: application/json" http://localhost:8080/api/v1/auth/login

4. JMeter

JMeter Test Plan Configuration

xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Example JMeter test plan -->
<jmeterTestPlan version="1.2" properties="5.0" jmeter="5.5">
  <hashTree>
    <TestPlan guiclass="TestPlanGui" testclass="TestPlan" testname="Yuanfeng backend API performance test" enabled="true">
      <stringProp name="TestPlan.comments">API performance test</stringProp>
      <boolProp name="TestPlan.functional_mode">false</boolProp>
      <boolProp name="TestPlan.tearDown_on_shutdown">true</boolProp>
      <boolProp name="TestPlan.serialize_threadgroups">false</boolProp>
      <elementProp name="TestPlan.user_defined_variables" elementType="Arguments" guiclass="ArgumentsPanel" testclass="Arguments" testname="用户定义的变量" enabled="true">
        <collectionProp name="Arguments.arguments">
          <elementProp name="BASE_URL" elementType="Argument">
            <stringProp name="Argument.name">BASE_URL</stringProp>
            <stringProp name="Argument.value">http://localhost:8080</stringProp>
          </elementProp>
          <elementProp name="TOKEN" elementType="Argument">
            <stringProp name="Argument.name">TOKEN</stringProp>
            <stringProp name="Argument.value">your_jwt_token</stringProp>
          </elementProp>
        </collectionProp>
      </elementProp>
    </TestPlan>
    <hashTree>
      <ThreadGroup guiclass="ThreadGroupGui" testclass="ThreadGroup" testname="User login test" enabled="true">
        <stringProp name="ThreadGroup.on_sample_error">continue</stringProp>
        <elementProp name="ThreadGroup.main_controller" elementType="LoopController" guiclass="LoopControlPanel" testclass="LoopController" testname="循环控制器" enabled="true">
          <boolProp name="LoopController.continue_forever">false</boolProp>
          <stringProp name="LoopController.loops">100</stringProp>
        </elementProp>
        <stringProp name="ThreadGroup.num_threads">10</stringProp>
        <stringProp name="ThreadGroup.ramp_time">1</stringProp>
        <boolProp name="ThreadGroup.scheduler">false</stringProp>
        <stringProp name="ThreadGroup.duration"></stringProp>
        <stringProp name="ThreadGroup.delay"></stringProp>
      </ThreadGroup>
      <hashTree>
        <HTTPSamplerProxy guiclass="HttpTestSampleGui" testclass="HTTPSamplerProxy" testname="User login" enabled="true">
          <elementProp name="HTTPsampler.Arguments" elementType="Arguments" guiclass="HTTPArgumentsPanel" testclass="Arguments" testname="User Defined Variables" enabled="true">
            <collectionProp name="Arguments.arguments">
              <elementProp name="" elementType="HTTPArgument">
                <boolProp name="HTTPArgument.always_encode">false</boolProp>
                <stringProp name="Argument.value">{"email":"test@example.com","password":"test123"}</stringProp>
                <stringProp name="Argument.metadata">=</stringProp>
              </elementProp>
            </collectionProp>
          </elementProp>
          <stringProp name="HTTPSampler.domain">${BASE_URL}</stringProp>
          <stringProp name="HTTPSampler.port">8080</stringProp>
          <stringProp name="HTTPSampler.protocol">http</stringProp>
          <stringProp name="HTTPSampler.contentEncoding">UTF-8</stringProp>
          <stringProp name="HTTPSampler.path">/api/v1/auth/login</stringProp>
          <stringProp name="HTTPSampler.method">POST</stringProp>
          <boolProp name="HTTPSampler.follow_redirects">true</boolProp>
          <boolProp name="HTTPSampler.auto_redirects">false</boolProp>
          <boolProp name="HTTPSampler.use_keepalive">true</boolProp>
          <boolProp name="HTTPSampler.DO_MULTIPART_POST">false</boolProp>
          <stringProp name="HTTPSampler.embedded_url_re"></stringProp>
          <stringProp name="HTTPSampler.connect_timeout"></stringProp>
          <stringProp name="HTTPSampler.response_timeout"></stringProp>
        </HTTPSamplerProxy>
        <hashTree/>
      </hashTree>
    </hashTree>
  </hashTree>
</jmeterTestPlan>

📊 Performance Monitoring

1. Application-Level Monitoring

Prometheus Metrics Collection

go
// metrics.go
package metrics

import (
    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promauto"
)

var (
    // Total number of HTTP requests
    httpRequestsTotal = promauto.NewCounterVec(
        prometheus.CounterOpts{
            Name: "http_requests_total",
            Help: "Total number of HTTP requests",
        },
        []string{"method", "endpoint", "status"},
    )

    // HTTP request duration
    httpRequestDuration = promauto.NewHistogramVec(
        prometheus.HistogramOpts{
            Name:    "http_request_duration_seconds",
            Help:    "HTTP request duration in seconds",
            Buckets: prometheus.DefBuckets,
        },
        []string{"method", "endpoint"},
    )

    // Database query duration
    dbQueryDuration = promauto.NewHistogramVec(
        prometheus.HistogramOpts{
            Name:    "db_query_duration_seconds",
            Help:    "Database query duration in seconds",
            Buckets: []float64{0.001, 0.01, 0.1, 0.5, 1, 2, 5},
        },
        []string{"query_type", "table"},
    )

    // Number of active connections
    activeConnections = promauto.NewGauge(
        prometheus.GaugeOpts{
            Name: "active_connections",
            Help: "Number of active connections",
        },
    )
)

// RecordHTTPRequest records metrics for an HTTP request
func RecordHTTPRequest(method, endpoint, status string, duration float64) {
    httpRequestsTotal.WithLabelValues(method, endpoint, status).Inc()
    httpRequestDuration.WithLabelValues(method, endpoint).Observe(duration)
}

// RecordDBQuery records metrics for a database query
func RecordDBQuery(queryType, table string, duration float64) {
    dbQueryDuration.WithLabelValues(queryType, table).Observe(duration)
}
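
RecordDBQuery is meant to be called from the data-access layer, wrapping each query with a timer. The fragment below is a hedged sketch assuming database/sql and the metrics package above; the query, table name, and function name are illustrative rather than taken from the project.

go
// repository/employee.go - illustrative sketch, not actual project code.
package repository

import (
    "context"
    "database/sql"
    "time"

    "your-project/metrics"
)

// GetEmployeeIDs times a query and feeds the duration into RecordDBQuery.
func GetEmployeeIDs(ctx context.Context, db *sql.DB, departmentID int) ([]int, error) {
    start := time.Now()
    rows, err := db.QueryContext(ctx,
        "SELECT id FROM employees WHERE department_id = ?", departmentID)
    metrics.RecordDBQuery("select", "employees", time.Since(start).Seconds())
    if err != nil {
        return nil, err
    }
    defer rows.Close()

    var ids []int
    for rows.Next() {
        var id int
        if err := rows.Scan(&id); err != nil {
            return nil, err
        }
        ids = append(ids, id)
    }
    return ids, rows.Err()
}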

Middleware Integration

go
// middleware.go
package middleware

import (
    "fmt"
    "time"

    "github.com/gin-gonic/gin"

    "your-project/metrics"
)

func MetricsMiddleware() gin.HandlerFunc {
    return func(c *gin.Context) {
        start := time.Now()

        c.Next()

        duration := time.Since(start).Seconds()
        status := c.Writer.Status()

        metrics.RecordHTTPRequest(
            c.Request.Method,
            c.FullPath(),
            fmt.Sprintf("%d", status),
            duration,
        )
    }
}
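
For Prometheus to scrape these metrics, the middleware has to be registered on the router and a /metrics endpoint exposed. Below is a minimal wiring sketch, assuming the Gin router and the promhttp handler; the module path and routes are illustrative.

go
// main.go - minimal wiring sketch; module path and routes are assumptions.
package main

import (
    "net/http"

    "github.com/gin-gonic/gin"
    "github.com/prometheus/client_golang/prometheus/promhttp"

    "your-project/middleware"
)

func main() {
    r := gin.New()
    r.Use(gin.Recovery())
    r.Use(middleware.MetricsMiddleware())

    // Expose the Prometheus scrape endpoint alongside the API routes.
    r.GET("/metrics", gin.WrapH(promhttp.Handler()))

    r.GET("/api/v1/health", func(c *gin.Context) {
        c.JSON(http.StatusOK, gin.H{"status": "ok"})
    })

    r.Run(":8080")
}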

2. System-Level Monitoring

Node Exporter Configuration

yaml
# docker-compose.yml
version: '3.8'
services:
  node-exporter:
    image: prom/node-exporter:latest
    container_name: node-exporter
    ports:
      - "9100:9100"
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    command:
      - '--path.procfs=/host/proc'
      - '--path.rootfs=/rootfs'
      - '--path.sysfs=/host/sys'
      - '--collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/)'
    restart: unless-stopped

  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus_data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--web.console.libraries=/etc/prometheus/console_libraries'
      - '--web.console.templates=/etc/prometheus/consoles'
      - '--storage.tsdb.retention.time=200h'
      - '--web.enable-lifecycle'
    restart: unless-stopped

volumes:
  prometheus_data:

Prometheus Configuration

yaml
# prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

rule_files:
  - "alert_rules.yml"

scrape_configs:
  - job_name: 'yuanfeng-backend'
    static_configs:
      - targets: ['localhost:8080']
    metrics_path: '/metrics'
    scrape_interval: 5s

  - job_name: 'node-exporter'
    static_configs:
      - targets: ['localhost:9100']

alerting:
  alertmanagers:
    - static_configs:
        - targets:
          - alertmanager:9093

3. Alerting Rules

yaml
# alert_rules.yml
groups:
  - name: yuanfeng-backend-alerts
    rules:
      # High error rate alert
      - alert: HighErrorRate
        expr: rate(http_requests_total{status=~"5.."}[5m]) > 0.01
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High error rate detected"
          description: "Error rate is {{ $value }} errors per second"

      # High response time alert
      - alert: HighResponseTime
        expr: histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m])) > 0.5
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High response time detected"
          description: "95th percentile response time is {{ $value }} seconds"

      # High CPU usage alert
      - alert: HighCPUUsage
        expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage detected"
          description: "CPU usage is {{ $value }}%"

      # High memory usage alert
      - alert: HighMemoryUsage
        expr: (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100 > 85
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High memory usage detected"
          description: "Memory usage is {{ $value }}%"

      # Low disk space alert
      - alert: DiskSpaceLow
        expr: (1 - (node_filesystem_avail_bytes / node_filesystem_size_bytes)) * 100 > 90
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Disk space low"
          description: "Disk usage is {{ $value }}%"

🧪 Performance Test Scenarios

1. Baseline Performance Tests

Single-Endpoint Benchmark

bash
#!/bin/bash
# benchmark.sh

# Test configuration
BASE_URL="http://localhost:8080"
TOKEN="your_jwt_token"

# Endpoints to test
declare -A endpoints=(
    ["/health"]="GET"
    ["/api/v1/users/profile"]="GET"
    ["/api/v1/employees"]="GET"
    ["/api/v1/todos"]="POST"
    ["/api/v1/attendance/clock-in"]="POST"
)

# Run a benchmark against a single endpoint
run_benchmark() {
    local endpoint=$1
    local method=$2

    echo "测试 $method $endpoint"

    if [ "$method" = "GET" ]; then
        hey -n 1000 -c 10 -H "Authorization: Bearer $TOKEN" "$BASE_URL$endpoint"
    else
        hey -n 100 -c 5 -m POST -H "Authorization: Bearer $TOKEN" \
            -H "Content-Type: application/json" \
            -d '{"test": "data"}' "$BASE_URL$endpoint"
    fi

    echo "----------------------------------------"
}

# Run all tests
for endpoint in "${!endpoints[@]}"; do
    run_benchmark "$endpoint" "${endpoints[$endpoint]}"
done

2. Load Tests

Ramp-Up Load Test

bash
#!/bin/bash
# load_test.sh

BASE_URL="http://localhost:8080"
TOKEN="your_jwt_token"

# Gradually increase the load
for users in 10 50 100 200 500; do
    echo "测试并发用户数: $users"

    hey -n 5000 -c $users -H "Authorization: Bearer $TOKEN" \
        "$BASE_URL/api/v1/users/profile"

    echo "等待系统恢复..."
    sleep 30
done

3. Stress Tests

Extreme Stress Test

bash
#!/bin/bash
# stress_test.sh

BASE_URL="http://localhost:8080"
TOKEN="your_jwt_token"

# Sustained stress test
echo "Starting a 5-minute sustained stress test..."

hey -z 5m -c 1000 -H "Authorization: Bearer $TOKEN" \
    "$BASE_URL/api/v1/users/profile"

echo "压力测试完成"

4. Mixed-Scenario Tests

Simulating Real Business Scenarios

python
# mixed_scenario_test.py
import asyncio
import aiohttp
import random
import time

class MixedScenarioTest:
    def __init__(self, base_url, token):
        self.base_url = base_url
        self.token = token
        self.session = None

    async def setup(self):
        self.session = aiohttp.ClientSession(
            headers={'Authorization': f'Bearer {self.token}'}
        )

    async def cleanup(self):
        if self.session:
            await self.session.close()

    async def user_login(self):
        """用户登录场景"""
        async with self.session.post(
            f'{self.base_url}/api/v1/auth/login',
            json={'email': 'test@example.com', 'password': 'test123'}
        ) as response:
            return await response.json()

    async def get_profile(self):
        """获取用户资料"""
        async with self.session.get(
            f'{self.base_url}/api/v1/users/profile'
        ) as response:
            return await response.json()

    async def create_todo(self):
        """创建待办事项"""
        todo_data = {
            'title': f'测试任务_{int(time.time())}',
            'description': '这是一个性能测试任务',
            'priority': random.choice(['low', 'medium', 'high'])
        }

        async with self.session.post(
            f'{self.base_url}/api/v1/todos',
            json=todo_data
        ) as response:
            return await response.json()

    async def clock_in(self):
        """上班打卡"""
        clock_data = {
            'latitude': 39.908823 + random.random() * 0.01,
            'longitude': 116.397470 + random.random() * 0.01,
            'note': 'Normal clock-in'
        }

        async with self.session.post(
            f'{self.base_url}/api/v1/attendance/clock-in',
            json=clock_data
        ) as response:
            return await response.json()

    async def simulate_user_behavior(self):
        """模拟用户行为"""
        # 随机选择操作
        operations = [
            self.get_profile,
            self.create_todo,
            self.clock_in
        ]

        operation = random.choice(operations)
        try:
            result = await operation()
            return True
        except Exception as e:
            print(f"操作失败: {e}")
            return False

async def run_mixed_test():
    """运行混合场景测试"""
    base_url = "http://localhost:8080"
    token = "your_jwt_token"

    test = MixedScenarioTest(base_url, token)
    await test.setup()

    try:
        # Create tasks for the simulated users (all scheduled concurrently)
        tasks = []
        for i in range(50):  # 50 simulated users
            for j in range(20):  # 20 operations per user
                task = asyncio.create_task(test.simulate_user_behavior())
                tasks.append(task)

        start_time = time.time()
        results = await asyncio.gather(*tasks)
        end_time = time.time()

        success_count = sum(results)
        total_count = len(results)
        duration = end_time - start_time

        print(f"测试完成:")
        print(f"总请求数: {total_count}")
        print(f"成功请求数: {success_count}")
        print(f"失败请求数: {total_count - success_count}")
        print(f"成功率: {success_count/total_count*100:.2f}%")
        print(f"总耗时: {duration:.2f}秒")
        print(f"平均QPS: {total_count/duration:.2f}")

    finally:
        await test.cleanup()

if __name__ == "__main__":
    asyncio.run(run_mixed_test())

📈 Performance Analysis and Optimization

1. Profiling Tools

Go Profiling

bash
# Enable pprof (assuming the server implements these profiling flags)
go run cmd/server/main.go -cpuprofile=cpu.prof -memprofile=mem.prof

# Analyze the CPU profile
go tool pprof cpu.prof

# Analyze the memory profile
go tool pprof mem.prof

# Generate profile graphs
go tool pprof -png cpu.prof > cpu.png
go tool pprof -png mem.prof > mem.png
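
If the server does not implement such profiling flags, an alternative is to expose pprof over HTTP with net/http/pprof and pull profiles with go tool pprof http://localhost:6060/debug/pprof/profile?seconds=30. A minimal sketch follows; the package name and the :6060 port are assumptions.

go
// profiling.go - optional HTTP-based profiling; port and package name are illustrative.
package profiling

import (
    "log"
    "net/http"
    _ "net/http/pprof" // registers /debug/pprof/* handlers on http.DefaultServeMux
)

// Start serves the pprof endpoints on a separate port so profiling traffic
// stays off the API listener. Call it once during server startup.
func Start() {
    go func() {
        log.Println(http.ListenAndServe("localhost:6060", nil))
    }()
}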

Database Performance Analysis

sql
-- View recent slow queries
SELECT * FROM mysql.slow_log ORDER BY start_time DESC LIMIT 10;

-- Analyze a query execution plan
EXPLAIN SELECT * FROM employees WHERE department_id = 1;

-- Review index statistics
SELECT
    TABLE_NAME,
    INDEX_NAME,
    CARDINALITY,
    SUB_PART,
    PACKED,
    NULLABLE,
    INDEX_TYPE
FROM information_schema.STATISTICS
WHERE TABLE_SCHEMA = 'yuanfeng_backend';

2. Optimization Recommendations

Database Optimization

sql
-- Add indexes
CREATE INDEX idx_employees_department_id ON employees(department_id);
CREATE INDEX idx_attendance_user_date ON attendance(user_id, attendance_date);
CREATE INDEX idx_todos_status_priority ON todos(status, priority);

-- Optimize queries
-- Avoid SELECT *
SELECT id, name, email FROM employees WHERE department_id = 1;

-- Use LIMIT for pagination
SELECT * FROM todos ORDER BY created_at DESC LIMIT 20 OFFSET 0;

-- Batch operations
INSERT INTO todos (title, description, user_id) VALUES
    ('Task 1', 'Description 1', 1),
    ('Task 2', 'Description 2', 1),
    ('Task 3', 'Description 3', 1);

Application-Level Optimization

go
// Tune the database connection pool
db.SetMaxOpenConns(100)
db.SetMaxIdleConns(10)
db.SetConnMaxLifetime(time.Hour)

// Cache optimization (sketch: assumes a go-redis client rdb and a context ctx in scope)
func GetUserProfile(userID int) (*User, error) {
    // Check the cache first
    cacheKey := fmt.Sprintf("user:profile:%d", userID)
    if cached, err := rdb.Get(ctx, cacheKey).Result(); err == nil {
        var user User
        if err := json.Unmarshal([]byte(cached), &user); err == nil {
            return &user, nil
        }
    }

    // Cache miss: load from the database
    user, err := getUserFromDB(userID)
    if err != nil {
        return nil, err
    }

    // Write back to the cache
    userData, _ := json.Marshal(user)
    rdb.Set(ctx, cacheKey, userData, 5*time.Minute)

    return user, nil
}

// Concurrency optimization
func GetEmployeeList(departmentID int) ([]Employee, error) {
    var wg sync.WaitGroup
    var employees []Employee
    var mu sync.Mutex

    // Fetch employee details concurrently
    employeeIDs := getEmployeeIDs(departmentID)

    for _, id := range employeeIDs {
        wg.Add(1)
        go func(empID int) {
            defer wg.Done()
            emp, err := getEmployee(empID)
            if err == nil {
                mu.Lock()
                employees = append(employees, *emp)
                mu.Unlock()
            }
        }(id)
    }

    wg.Wait()
    return employees, nil
}
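
The GetEmployeeList example above launches one goroutine per employee, which can overwhelm the database for a large department. The sketch below is a bounded variant of the same idea, using a buffered channel as a semaphore; it assumes the same getEmployeeIDs/getEmployee helpers and Employee type as the snippet above.

go
// Bounded variant of the concurrency example above (same assumed helpers and types).
func GetEmployeeListBounded(departmentID int) ([]Employee, error) {
    employeeIDs := getEmployeeIDs(departmentID)

    sem := make(chan struct{}, 10) // allow at most 10 concurrent lookups
    var (
        wg        sync.WaitGroup
        mu        sync.Mutex
        employees []Employee
    )

    for _, id := range employeeIDs {
        wg.Add(1)
        sem <- struct{}{} // acquire a slot
        go func(empID int) {
            defer wg.Done()
            defer func() { <-sem }() // release the slot
            if emp, err := getEmployee(empID); err == nil {
                mu.Lock()
                employees = append(employees, *emp)
                mu.Unlock()
            }
        }(id)
    }

    wg.Wait()
    return employees, nil
}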

3. Caching Strategy

Redis Cache Configuration

go
// cache.go
package cache

import (
    "context"
    "time"

    "github.com/go-redis/redis/v8"
)

type Cache struct {
    client *redis.Client
}

func NewCache(addr, password string, db int) *Cache {
    rdb := redis.NewClient(&redis.Options{
        Addr:     addr,
        Password: password,
        DB:       db,
        PoolSize: 10,
    })

    return &Cache{client: rdb}
}

// Set stores a value with an expiration
func (c *Cache) Set(key string, value interface{}, expiration time.Duration) error {
    return c.client.Set(context.Background(), key, value, expiration).Err()
}

// Get retrieves a cached value
func (c *Cache) Get(key string) (string, error) {
    return c.client.Get(context.Background(), key).Result()
}

// Delete removes a single key
func (c *Cache) Delete(key string) error {
    return c.client.Del(context.Background(), key).Err()
}

// DeletePattern removes all keys matching a pattern.
// Note: KEYS blocks the Redis server; prefer SCAN for large keyspaces in production.
func (c *Cache) DeletePattern(pattern string) error {
    keys, err := c.client.Keys(context.Background(), pattern).Result()
    if err != nil {
        return err
    }

    if len(keys) > 0 {
        return c.client.Del(context.Background(), keys...).Err()
    }

    return nil
}

Cache Usage Example

go
// Using the cache in the service layer
func (s *employeeService) GetEmployee(id int) (*models.Employee, error) {
    cacheKey := fmt.Sprintf("employee:%d", id)

    // Try the cache first
    cached, err := s.cache.Get(cacheKey)
    if err == nil {
        var employee models.Employee
        if err := json.Unmarshal([]byte(cached), &employee); err == nil {
            return &employee, nil
        }
    }

    // Cache miss: load from the database
    employee, err := s.repo.GetByID(id)
    if err != nil {
        return nil, err
    }

    // Write back to the cache
    if data, err := json.Marshal(employee); err == nil {
        s.cache.Set(cacheKey, data, 10*time.Minute)
    }

    return employee, nil
}

// Invalidate caches on update
func (s *employeeService) UpdateEmployee(employee *models.Employee) error {
    err := s.repo.Update(employee)
    if err != nil {
        return err
    }

    // Remove the cached entity
    cacheKey := fmt.Sprintf("employee:%d", employee.ID)
    s.cache.Delete(cacheKey)

    // Remove cached list entries
    s.cache.DeletePattern("employees:*")

    return nil
}

📋 Performance Testing Checklist

Pre-Test Preparation

  • [ ] Confirm the test environment is configured correctly
  • [ ] Prepare test data
  • [ ] Set up monitoring tools
  • [ ] Configure alert thresholds
  • [ ] Prepare a rollback plan

Test Execution

  • [ ] Run baseline performance tests
  • [ ] Run load tests
  • [ ] Run stress tests
  • [ ] Run mixed-scenario tests
  • [ ] Record test results

Post-Test Analysis

  • [ ] Analyze performance metrics
  • [ ] Identify performance bottlenecks
  • [ ] Draft an optimization plan
  • [ ] Verify the optimization results
  • [ ] Write a test report

Continuous Monitoring

  • [ ] Deploy monitoring tools
  • [ ] Configure alerting rules
  • [ ] Review performance regularly
  • [ ] Keep optimizing and improving

Last updated: 2024-01-24 · Document version: v1.0 · Maintained by: Yuanfeng backend development team
