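API Performance Monitoring and Testing Guide
This guide describes performance monitoring, stress testing, and optimization for the Yuanfeng backend API, to help you keep the system fast and stable.
📋 Overview
Performance monitoring and testing are essential to system quality. This guide covers the full performance testing workflow from development to production: benchmarking, stress testing, performance monitoring, and optimization recommendations.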
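🎯 Performance Metrics
Key performance indicators (KPIs)
| Metric | Target | Description |
|---|---|---|
| Response time | < 200 ms (P95) | Latency within which 95% of requests complete |
| Throughput | > 1000 QPS | Requests handled per second |
| Concurrent users | > 500 | Users online at the same time |
| Error rate | < 0.1% | Share of requests that fail |
| CPU usage | < 70% | Server CPU utilization |
| Memory usage | < 80% | Server memory utilization |
| Database connection pool | < 80% | Connection pool utilization |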
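The response-time and error-rate targets only mean something once they are computed the same way every time. The following is a minimal sketch, assuming latencies are collected as `time.Duration` samples and that P95 is taken with the nearest-rank method; the function names (`Percentile`, `ErrorRate`, `MeetsKPI`) and the sample data are hypothetical and not part of the backend.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// ms converts whole milliseconds to a time.Duration.
func ms(n int) time.Duration { return time.Duration(n) * time.Millisecond }

// Percentile returns the p-th percentile (1..100) of samples using the
// nearest-rank method. It returns 0 when samples is empty.
func Percentile(samples []time.Duration, p int) time.Duration {
	if len(samples) == 0 {
		return 0
	}
	sorted := append([]time.Duration(nil), samples...) // do not reorder the caller's slice
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })
	rank := (p*len(sorted) + 99) / 100 // ceil(p/100 * n) in integer arithmetic
	if rank < 1 {
		rank = 1
	}
	if rank > len(sorted) {
		rank = len(sorted)
	}
	return sorted[rank-1]
}

// ErrorRate returns failed/total as a fraction, or 0 when total is 0.
func ErrorRate(failed, total int) float64 {
	if total == 0 {
		return 0
	}
	return float64(failed) / float64(total)
}

// MeetsKPI applies the two request-level targets from the table:
// P95 below 200 ms and error rate below 0.1%.
func MeetsKPI(p95 time.Duration, errorRate float64) bool {
	return p95 < ms(200) && errorRate < 0.001
}

// sampleLatencies returns 20 made-up latencies (10 ms to 200 ms), unsorted.
func sampleLatencies() []time.Duration {
	out := make([]time.Duration, 0, 20)
	for i := 20; i >= 1; i-- {
		out = append(out, ms(i*10))
	}
	return out
}

func main() {
	p95 := Percentile(sampleLatencies(), 95)
	rate := ErrorRate(1, 2000)
	fmt.Println("p95:", p95, "error rate:", rate, "meets KPI:", MeetsKPI(p95, rate))
}
```

Nearest-rank is one of several percentile definitions. Load-testing tools and Prometheus histograms may interpolate, so small differences between reported values are expected.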
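Business performance metrics
| Business scenario | Target | Description |
|---|---|---|
| User login | < 100 ms | Login endpoint response time |
| Data query | < 200 ms | List query endpoint response time |
| File upload | < 5 s | Time to upload a 10 MB file |
| Report generation | < 30 s | Time to generate a complex report |
| Batch operation | < 60 s | Time to process a batch of 1000 records |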
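The business targets above can be kept in code so that a test run fails loudly when a scenario goes over budget. This is a sketch under assumptions: the scenario keys (`login`, `list_query`, and so on) and the `OverBudget` helper are hypothetical names, and the measured values are invented for illustration.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// budgets mirrors the business targets table. The keys are hypothetical labels.
var budgets = map[string]time.Duration{
	"login":       100 * time.Millisecond,
	"list_query":  200 * time.Millisecond,
	"upload_10mb": 5 * time.Second,
	"report":      30 * time.Second,
	"batch_1000":  60 * time.Second,
}

// OverBudget returns, in sorted order, the scenarios whose measured time does
// not stay strictly below its budget. Unknown scenarios are ignored.
func OverBudget(measured map[string]time.Duration) []string {
	var out []string
	for name, got := range measured {
		if limit, ok := budgets[name]; ok && got >= limit {
			out = append(out, name)
		}
	}
	sort.Strings(out)
	return out
}

// sampleMeasurements returns made-up results from one test run.
func sampleMeasurements() map[string]time.Duration {
	return map[string]time.Duration{
		"login":       120 * time.Millisecond, // over the 100 ms budget
		"list_query":  150 * time.Millisecond,
		"upload_10mb": 4 * time.Second,
		"report":      30 * time.Second, // equal to the budget, so not strictly below it
	}
}

func main() {
	fmt.Println("over budget:", OverBudget(sampleMeasurements()))
}
```

Because the targets are written as strict upper bounds, a measurement equal to the budget is reported as a violation.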
🛠️ Performance Testing Tools
1. Apache Bench (ab)
bash
# Basic stress test: 1000 requests, 10 concurrent
ab -n 1000 -c 10 http://localhost:8080/api/v1/health
# Authenticated test
ab -n 1000 -c 10 -H "Authorization: Bearer your_token" http://localhost:8080/api/v1/users/profile
# POST request test (request body is read from post_data.json)
ab -n 100 -c 5 -T application/json -p post_data.json http://localhost:8080/api/v1/auth/login
2. wrk
bash
# Basic test: 12 threads, 400 connections, 30 seconds
wrk -t12 -c400 -d30s http://localhost:8080/api/v1/health
# Authenticated test
wrk -t12 -c400 -d30s -H "Authorization: Bearer your_token" http://localhost:8080/api/v1/users/profile
# Custom script test
wrk -t12 -c400 -d30s -s post_script.lua http://localhost:8080/api/v1/todos
Lua script example (post_script.lua):
lua
wrk.method = "POST"
wrk.body = '{"title":"Performance test task","description":"Task used for performance testing"}'
wrk.headers["Content-Type"] = "application/json"
wrk.headers["Authorization"] = "Bearer your_token"
3. hey
bash
# Basic test
hey -n 1000 -c 10 http://localhost:8080/api/v1/health
# Authenticated test
hey -n 1000 -c 10 -H "Authorization: Bearer your_token" http://localhost:8080/api/v1/users/profile
# POST test
hey -n 100 -c 5 -m POST -d '{"email":"test@example.com","password":"test123"}' \
    -H "Content-Type: application/json" http://localhost:8080/api/v1/auth/login
4. JMeter
JMeter test plan configuration
xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Example JMeter test plan -->
<jmeterTestPlan version="1.2" properties="5.0" jmeter="5.5">
  <hashTree>
    <TestPlan guiclass="TestPlanGui" testclass="TestPlan" testname="Yuanfeng Backend API Performance Test" enabled="true">
      <stringProp name="TestPlan.comments">API performance test</stringProp>
      <boolProp name="TestPlan.functional_mode">false</boolProp>
      <boolProp name="TestPlan.tearDown_on_shutdown">true</boolProp>
      <boolProp name="TestPlan.serialize_threadgroups">false</boolProp>
      <elementProp name="TestPlan.user_defined_variables" elementType="Arguments" guiclass="ArgumentsPanel" testclass="Arguments" testname="User Defined Variables" enabled="true">
        <collectionProp name="Arguments.arguments">
          <!-- HOST is a bare host name: HTTPSampler.domain takes no scheme or port, so a full base URL would not resolve -->
          <elementProp name="HOST" elementType="Argument">
            <stringProp name="Argument.name">HOST</stringProp>
            <stringProp name="Argument.value">localhost</stringProp>
          </elementProp>
          <elementProp name="TOKEN" elementType="Argument">
            <stringProp name="Argument.name">TOKEN</stringProp>
            <stringProp name="Argument.value">your_jwt_token</stringProp>
          </elementProp>
        </collectionProp>
      </elementProp>
    </TestPlan>
    <hashTree>
      <!-- 10 threads, each looping 100 times, ramped up over 1 second -->
      <ThreadGroup guiclass="ThreadGroupGui" testclass="ThreadGroup" testname="User Login Test" enabled="true">
        <stringProp name="ThreadGroup.on_sample_error">continue</stringProp>
        <elementProp name="ThreadGroup.main_controller" elementType="LoopController" guiclass="LoopControlPanel" testclass="LoopController" testname="Loop Controller" enabled="true">
          <boolProp name="LoopController.continue_forever">false</boolProp>
          <stringProp name="LoopController.loops">100</stringProp>
        </elementProp>
        <stringProp name="ThreadGroup.num_threads">10</stringProp>
        <stringProp name="ThreadGroup.ramp_time">1</stringProp>
        <boolProp name="ThreadGroup.scheduler">false</boolProp>
        <stringProp name="ThreadGroup.duration"></stringProp>
        <stringProp name="ThreadGroup.delay"></stringProp>
      </ThreadGroup>
      <hashTree>
        <HTTPSamplerProxy guiclass="HttpTestSampleGui" testclass="HTTPSamplerProxy" testname="User Login" enabled="true">
          <elementProp name="HTTPsampler.Arguments" elementType="Arguments" guiclass="HTTPArgumentsPanel" testclass="Arguments" testname="User Defined Variables" enabled="true">
            <collectionProp name="Arguments.arguments">
              <!-- A single unnamed argument is sent as the request body -->
              <elementProp name="" elementType="HTTPArgument">
                <boolProp name="HTTPArgument.always_encode">false</boolProp>
                <stringProp name="Argument.value">{"email":"test@example.com","password":"test123"}</stringProp>
                <stringProp name="Argument.metadata">=</stringProp>
              </elementProp>
            </collectionProp>
          </elementProp>
          <stringProp name="HTTPSampler.domain">${HOST}</stringProp>
          <stringProp name="HTTPSampler.port">8080</stringProp>
          <stringProp name="HTTPSampler.protocol">http</stringProp>
          <stringProp name="HTTPSampler.contentEncoding">UTF-8</stringProp>
          <stringProp name="HTTPSampler.path">/api/v1/auth/login</stringProp>
          <stringProp name="HTTPSampler.method">POST</stringProp>
          <boolProp name="HTTPSampler.follow_redirects">true</boolProp>
          <boolProp name="HTTPSampler.auto_redirects">false</boolProp>
          <boolProp name="HTTPSampler.use_keepalive">true</boolProp>
          <boolProp name="HTTPSampler.DO_MULTIPART_POST">false</boolProp>
          <stringProp name="HTTPSampler.embedded_url_re"></stringProp>
          <stringProp name="HTTPSampler.connect_timeout"></stringProp>
          <stringProp name="HTTPSampler.response_timeout"></stringProp>
        </HTTPSamplerProxy>
        <hashTree/>
      </hashTree>
    </hashTree>
  </hashTree>
</jmeterTestPlan>
📊 Performance Monitoring
1. Application-level monitoring
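Application-level monitoring comes down to recording the method, path, status code, and duration of every request. As a framework-neutral illustration, here is a sketch that uses only the Go standard library. The names `Instrument`, `Observation`, and `statusRecorder` are hypothetical, and the `record` callback stands in for whatever metrics backend is used.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// statusRecorder wraps http.ResponseWriter to capture the status code,
// which the standard interface does not expose after the fact.
type statusRecorder struct {
	http.ResponseWriter
	status int
}

func (r *statusRecorder) WriteHeader(code int) {
	r.status = code
	r.ResponseWriter.WriteHeader(code)
}

// Observation is what gets recorded for one request.
type Observation struct {
	Method   string
	Path     string
	Status   int
	Duration time.Duration
}

// Instrument wraps next and hands one Observation per request to record.
func Instrument(next http.Handler, record func(Observation)) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// Default to 200: handlers that never call WriteHeader implicitly send 200.
		rec := &statusRecorder{ResponseWriter: w, status: http.StatusOK}
		start := time.Now()
		next.ServeHTTP(rec, r)
		record(Observation{
			Method:   r.Method,
			Path:     r.URL.Path,
			Status:   rec.status,
			Duration: time.Since(start),
		})
	})
}

// demo sends one in-memory request through an instrumented handler that
// answers 404 and returns what was recorded.
func demo() Observation {
	var got Observation
	notFound := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusNotFound)
	})
	h := Instrument(notFound, func(o Observation) { got = o })
	h.ServeHTTP(httptest.NewRecorder(), httptest.NewRequest(http.MethodGet, "/api/v1/health", nil))
	return got
}

func main() {
	o := demo()
	fmt.Println(o.Method, o.Path, o.Status)
}
```

Recording the raw URL path is fine for a demo, but paths that embed IDs create one label value per ID. A real setup would normally record the route template instead.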
Prometheus metrics collection
go
// metrics.go
package metrics

import (
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

var (
	// Total number of HTTP requests
	httpRequestsTotal = promauto.NewCounterVec(
		prometheus.CounterOpts{
			Name: "http_requests_total",
			Help: "Total number of HTTP requests",
		},
		[]string{"method", "endpoint", "status"},
	)
	// HTTP request duration
	httpRequestDuration = promauto.NewHistogramVec(
		prometheus.HistogramOpts{
			Name:    "http_request_duration_seconds",
			Help:    "HTTP request duration in seconds",
			Buckets: prometheus.DefBuckets,
		},
		[]string{"method", "endpoint"},
	)
	// Database query duration
	dbQueryDuration = promauto.NewHistogramVec(
		prometheus.HistogramOpts{
			Name:    "db_query_duration_seconds",
			Help:    "Database query duration in seconds",
			Buckets: []float64{0.001, 0.01, 0.1, 0.5, 1, 2, 5},
		},
		[]string{"query_type", "table"},
	)
	// Number of active connections
	activeConnections = promauto.NewGauge(
		prometheus.GaugeOpts{
			Name: "active_connections",
			Help: "Number of active connections",
		},
	)
)

// RecordHTTPRequest records the count and duration (in seconds) of one HTTP request.
func RecordHTTPRequest(method, endpoint, status string, duration float64) {
	httpRequestsTotal.WithLabelValues(method, endpoint, status).Inc()
	httpRequestDuration.WithLabelValues(method, endpoint).Observe(duration)
}

// RecordDBQuery records the duration (in seconds) of one database query.
func RecordDBQuery(queryType, table string, duration float64) {
	dbQueryDuration.WithLabelValues(queryType, table).Observe(duration)
}
Middleware integration
go
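// middleware.go
package middleware

import (
	"strconv"
	"time"

	"github.com/gin-gonic/gin"

	"your-project/metrics"
)

// MetricsMiddleware records the method, route, status, and duration of every request.
func MetricsMiddleware() gin.HandlerFunc {
	return func(c *gin.Context) {
		start := time.Now()
		c.Next()
		duration := time.Since(start).Seconds()
		status := c.Writer.Status()
		// FullPath returns the matched route template, which keeps label
		// cardinality bounded; it is empty when no route matched.
		endpoint := c.FullPath()
		if endpoint == "" {
			endpoint = "unmatched"
		}
		metrics.RecordHTTPRequest(
			c.Request.Method,
			endpoint,
			strconv.Itoa(status),
			duration,
		)
	}
}
2. System-level monitoring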
Node Exporter configuration
yaml
# docker-compose.yml
version: '3.8'
services:
  node-exporter:
    image: prom/node-exporter:latest
    container_name: node-exporter
    ports:
      - "9100:9100"
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    command:
      - '--path.procfs=/host/proc'
      - '--path.rootfs=/rootfs'
      - '--path.sysfs=/host/sys'
      - '--collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/)'
    restart: unless-stopped
  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      # Mounted so the rule_files entry in prometheus.yml can find the alert rules
      - ./alert_rules.yml:/etc/prometheus/alert_rules.yml
      - prometheus_data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--web.console.libraries=/etc/prometheus/console_libraries'
      - '--web.console.templates=/etc/prometheus/consoles'
      - '--storage.tsdb.retention.time=200h'
      - '--web.enable-lifecycle'
    restart: unless-stopped
volumes:
  prometheus_data:
Prometheus configuration
yaml
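# prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s
rule_files:
  - "alert_rules.yml"
scrape_configs:
  - job_name: 'yuanfeng-backend'
    # Assumes the API serves /metrics on its main port 8080 (9090 is Prometheus itself).
    # When Prometheus runs in a container, replace localhost with an address that reaches the backend.
    static_configs:
      - targets: ['localhost:8080']
    metrics_path: '/metrics'
    scrape_interval: 5s
  - job_name: 'node-exporter'
    static_configs:
      # Service name from docker-compose.yml
      - targets: ['node-exporter:9100']
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - alertmanager:9093
3. Alert rules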
yaml
# alert_rules.yml
groups:
  - name: yuanfeng-backend-alerts
    rules:
      # High error rate: more than 1% of requests return 5xx
      - alert: HighErrorRate
        expr: sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) > 0.01
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High error rate detected"
          description: "Error ratio is {{ $value }} (fraction of all requests)"
      # High response time
      - alert: HighResponseTime
        expr: histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m])) > 0.5
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High response time detected"
          description: "95th percentile response time is {{ $value }} seconds"
      # High CPU usage
      - alert: HighCPUUsage
        expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage detected"
          description: "CPU usage is {{ $value }}%"
      # High memory usage
      - alert: HighMemoryUsage
        expr: (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100 > 85
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High memory usage detected"
          description: "Memory usage is {{ $value }}%"
      # Low disk space
      - alert: DiskSpaceLow
        expr: (1 - (node_filesystem_avail_bytes / node_filesystem_size_bytes)) * 100 > 90
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Disk space low"
          description: "Disk usage is {{ $value }}%"
🧪 Performance Test Scenarios
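All scenarios in this section rest on the same two knobs that `hey -n` and `-c` expose: a total request count and a number of concurrent workers. To make that model concrete, here is a minimal sketch of a load generator built on the Go standard library. `RunLoad` is a hypothetical helper, the target is a local stub server rather than the real API, and no latency statistics are collected.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"sync"
	"sync/atomic"
)

// RunLoad sends total GET requests to url using concurrency workers and
// returns how many of them received a 2xx response.
func RunLoad(url string, total, concurrency int) int64 {
	// Pre-fill a closed channel with one token per request; workers drain it.
	jobs := make(chan struct{}, total)
	for i := 0; i < total; i++ {
		jobs <- struct{}{}
	}
	close(jobs)

	var succeeded int64
	var wg sync.WaitGroup
	for w := 0; w < concurrency; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for range jobs {
				resp, err := http.Get(url)
				if err != nil {
					continue // counted as a failure
				}
				io.Copy(io.Discard, resp.Body) // drain so the connection can be reused
				resp.Body.Close()
				if resp.StatusCode >= 200 && resp.StatusCode < 300 {
					atomic.AddInt64(&succeeded, 1)
				}
			}
		}()
	}
	wg.Wait()
	return succeeded
}

// demo runs 50 requests with 5 workers against a stub server that always answers 200.
func demo() int64 {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	}))
	defer srv.Close()
	return RunLoad(srv.URL, 50, 5)
}

func main() {
	fmt.Println("successful requests:", demo())
}
```

Dedicated tools remain the better choice for real measurements. A sketch like this is mainly useful when a scenario needs custom logic between requests.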
1. Baseline performance test
Single-endpoint baseline test
bash
#!/bin/bash
# benchmark.sh (requires bash 4+ for associative arrays)
# Test configuration
BASE_URL="http://localhost:8080"
TOKEN="your_jwt_token"
# Endpoints to test, mapped to their HTTP method
declare -A endpoints=(
    ["/api/v1/health"]="GET"
    ["/api/v1/users/profile"]="GET"
    ["/api/v1/employees"]="GET"
    ["/api/v1/todos"]="POST"
    ["/api/v1/attendance/clock-in"]="POST"
)
# Run a benchmark against one endpoint
run_benchmark() {
    local endpoint=$1
    local method=$2
    echo "Testing $method $endpoint"
    if [ "$method" = "GET" ]; then
        hey -n 1000 -c 10 -H "Authorization: Bearer $TOKEN" "$BASE_URL$endpoint"
    else
        # Placeholder body; replace it with a payload the endpoint accepts
        hey -n 100 -c 5 -m POST -H "Authorization: Bearer $TOKEN" \
            -H "Content-Type: application/json" \
            -d '{"test": "data"}' "$BASE_URL$endpoint"
    fi
    echo "----------------------------------------"
}
# Run all tests
for endpoint in "${!endpoints[@]}"; do
    run_benchmark "$endpoint" "${endpoints[$endpoint]}"
done
2. Load testing
Progressive load test
bash
#!/bin/bash
# load_test.sh
BASE_URL="http://localhost:8080"
TOKEN="your_jwt_token"
# Increase the load step by step
for users in 10 50 100 200 500; do
    echo "Testing with concurrent users: $users"
    hey -n 5000 -c "$users" -H "Authorization: Bearer $TOKEN" \
        "$BASE_URL/api/v1/users/profile"
    echo "Waiting for the system to recover..."
    sleep 30
done
3. Stress testing
Peak stress test
bash
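#!/bin/bash
# stress_test.sh
BASE_URL="http://localhost:8080"
TOKEN="your_jwt_token"
# Sustained stress test: 1000 concurrent workers for 5 minutes
echo "Starting 5-minute sustained stress test..."
hey -z 5m -c 1000 -H "Authorization: Bearer $TOKEN" \
    "$BASE_URL/api/v1/users/profile"
echo "Stress test finished"
4. Mixed-scenario testing
Simulating real business scenarios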
python
# mixed_scenario_test.py
import asyncio
import random
import time

import aiohttp


class MixedScenarioTest:
    def __init__(self, base_url, token):
        self.base_url = base_url
        self.token = token
        self.session = None

    async def setup(self):
        self.session = aiohttp.ClientSession(
            headers={'Authorization': f'Bearer {self.token}'}
        )

    async def cleanup(self):
        if self.session:
            await self.session.close()

    async def user_login(self):
        """User login scenario."""
        async with self.session.post(
            f'{self.base_url}/api/v1/auth/login',
            json={'email': 'test@example.com', 'password': 'test123'}
        ) as response:
            response.raise_for_status()
            return await response.json()

    async def get_profile(self):
        """Fetch the user profile."""
        async with self.session.get(
            f'{self.base_url}/api/v1/users/profile'
        ) as response:
            response.raise_for_status()
            return await response.json()

    async def create_todo(self):
        """Create a to-do item."""
        todo_data = {
            'title': f'Test task_{int(time.time())}',
            'description': 'This is a performance test task',
            'priority': random.choice(['low', 'medium', 'high'])
        }
        async with self.session.post(
            f'{self.base_url}/api/v1/todos',
            json=todo_data
        ) as response:
            response.raise_for_status()
            return await response.json()

    async def clock_in(self):
        """Clock in for work."""
        clock_data = {
            'latitude': 39.908823 + random.random() * 0.01,
            'longitude': 116.397470 + random.random() * 0.01,
            'note': 'Regular clock-in'
        }
        async with self.session.post(
            f'{self.base_url}/api/v1/attendance/clock-in',
            json=clock_data
        ) as response:
            response.raise_for_status()
            return await response.json()

    async def simulate_user_behavior(self):
        """Run one randomly chosen operation; return True on success."""
        operation = random.choice([
            self.get_profile,
            self.create_todo,
            self.clock_in,
        ])
        try:
            # HTTP error statuses raise via raise_for_status and count as failures
            await operation()
            return True
        except Exception as e:
            print(f"Operation failed: {e}")
            return False

    async def run_user(self, operations):
        """Simulate one user performing its operations one after another."""
        return [await self.simulate_user_behavior() for _ in range(operations)]


async def run_mixed_test():
    """Run the mixed-scenario test."""
    base_url = "http://localhost:8080"
    token = "your_jwt_token"
    test = MixedScenarioTest(base_url, token)
    await test.setup()
    try:
        start_time = time.time()
        # 50 concurrent users, each performing 20 operations in sequence
        tasks = [asyncio.create_task(test.run_user(20)) for _ in range(50)]
        per_user = await asyncio.gather(*tasks)
        end_time = time.time()
        results = [ok for user in per_user for ok in user]
        success_count = sum(results)
        total_count = len(results)
        duration = end_time - start_time
        print("Test finished:")
        print(f"Total requests: {total_count}")
        print(f"Successful requests: {success_count}")
        print(f"Failed requests: {total_count - success_count}")
        print(f"Success rate: {success_count / total_count * 100:.2f}%")
        print(f"Total time: {duration:.2f} s")
        print(f"Average QPS: {total_count / duration:.2f}")
    finally:
        await test.cleanup()


if __name__ == "__main__":
    asyncio.run(run_mixed_test())
📈 Performance Analysis and Optimization
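Profiling with `-cpuprofile` only works if the program itself writes the profile; `go run` just passes the flag through to it. As a hedged sketch of what that wiring can look like with the standard `runtime/pprof` package: `ProfileCPU` and `demo` are hypothetical helpers, and the busy loop stands in for real server work.

```go
package main

import (
	"fmt"
	"os"
	"runtime/pprof"
)

// ProfileCPU runs work while writing a CPU profile to path.
func ProfileCPU(path string, work func()) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer f.Close() // runs after StopCPUProfile, which flushes the profile
	if err := pprof.StartCPUProfile(f); err != nil {
		return err
	}
	defer pprof.StopCPUProfile()
	work()
	return nil
}

// demo profiles a small busy loop into a temporary file and returns the
// size of the resulting profile in bytes.
func demo() (int64, error) {
	tmp, err := os.CreateTemp("", "cpu-*.prof")
	if err != nil {
		return 0, err
	}
	path := tmp.Name()
	tmp.Close()
	defer os.Remove(path)

	err = ProfileCPU(path, func() {
		sum := 0
		for i := 0; i < 5_000_000; i++ {
			sum += i % 7
		}
		_ = sum
	})
	if err != nil {
		return 0, err
	}
	info, err := os.Stat(path)
	if err != nil {
		return 0, err
	}
	return info.Size(), nil
}

func main() {
	size, err := demo()
	if err != nil {
		fmt.Println("profiling failed:", err)
		return
	}
	fmt.Println("profile bytes written:", size)
}
```

In a server, `path` would typically come from a command-line flag and `work` would be the call that runs the server until shutdown. For long-running services, exposing profiles over HTTP with `net/http/pprof` is a common alternative.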
1. Profiling tools
Go profiling
bash
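# Run the server with profiling enabled
# (main.go must define and handle -cpuprofile and -memprofile itself; go run only passes them through)
go run cmd/server/main.go -cpuprofile=cpu.prof -memprofile=mem.prof
# Analyze the CPU profile
go tool pprof cpu.prof
# Analyze the memory profile
go tool pprof mem.prof
# Render the profiles as images (requires Graphviz)
go tool pprof -png cpu.prof > cpu.png
go tool pprof -png mem.prof > mem.png
Database performance analysis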
sql
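-- Show the most recent slow queries (requires slow_query_log = ON and log_output = 'TABLE')
SELECT * FROM mysql.slow_log ORDER BY start_time DESC LIMIT 10;
-- Inspect the execution plan of a query
EXPLAIN SELECT * FROM employees WHERE department_id = 1;
-- Inspect index definitions and cardinality for the schema
SELECT
    TABLE_NAME,
    INDEX_NAME,
    CARDINALITY,
    SUB_PART,
    PACKED,
    NULLABLE,
    INDEX_TYPE
FROM information_schema.STATISTICS
WHERE TABLE_SCHEMA = 'yuanfeng_backend';
2. Optimization recommendations
Database optimization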
sql
-- Add indexes
CREATE INDEX idx_employees_department_id ON employees(department_id);
CREATE INDEX idx_attendance_user_date ON attendance(user_id, attendance_date);
CREATE INDEX idx_todos_status_priority ON todos(status, priority);
-- Optimize queries
-- Avoid SELECT *; name only the columns you need
SELECT id, name, email FROM employees WHERE department_id = 1;
-- Paginate with LIMIT
SELECT id, title, status, priority, created_at FROM todos ORDER BY created_at DESC LIMIT 20 OFFSET 0;
-- Batch inserts in a single statement
INSERT INTO todos (title, description, user_id) VALUES
    ('Task 1', 'Description 1', 1),
    ('Task 2', 'Description 2', 1),
    ('Task 3', 'Description 3', 1);
Application-layer optimization
go
// Database connection pool tuning
db.SetMaxOpenConns(100)
db.SetMaxIdleConns(10)
db.SetConnMaxLifetime(time.Hour)

// Caching: read through the cache before querying the database
func GetUserProfile(userID int) (*User, error) {
	// Check the cache first
	cacheKey := fmt.Sprintf("user:profile:%d", userID)
	if cached, err := redis.Get(cacheKey).Result(); err == nil {
		var user User
		// An entry that cannot be decoded is treated as a cache miss
		if err := json.Unmarshal([]byte(cached), &user); err == nil {
			return &user, nil
		}
	}
	// Cache miss: query the database
	user, err := getUserFromDB(userID)
	if err != nil {
		return nil, err
	}
	// Write the result to the cache; a failed write only costs a later miss
	if userData, err := json.Marshal(user); err == nil {
		redis.Set(cacheKey, userData, 5*time.Minute)
	}
	return user, nil
}

// Concurrency: fetch employees in parallel
func GetEmployeeList(departmentID int) ([]Employee, error) {
	var wg sync.WaitGroup
	var employees []Employee
	var mu sync.Mutex
	// Fetch each employee concurrently. Result order is not deterministic,
	// and employees whose lookup fails are skipped.
	employeeIDs := getEmployeeIDs(departmentID)
	for _, id := range employeeIDs {
		wg.Add(1)
		go func(empID int) {
			defer wg.Done()
			emp, err := getEmployee(empID)
			if err == nil {
				mu.Lock()
				employees = append(employees, *emp)
				mu.Unlock()
			}
		}(id)
	}
	wg.Wait()
	return employees, nil
}
3. Caching strategy
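The caching in this guide follows the cache-aside pattern: read from the cache, fall back to the source of truth on a miss, store the result with a TTL, and invalidate on update. The pattern can be exercised without Redis. Below is a minimal in-memory sketch in which `TTLCache`, `GetOrLoad`, and `Invalidate` are hypothetical names and the injected clock is an assumption made so that expiry can be tested without sleeping.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

type entry struct {
	value     string
	expiresAt time.Time
}

// TTLCache is a tiny in-memory stand-in for Redis, used only to show the pattern.
type TTLCache struct {
	mu    sync.Mutex
	items map[string]entry
	now   func() time.Time // injectable clock
}

func NewTTLCache(now func() time.Time) *TTLCache {
	return &TTLCache{items: make(map[string]entry), now: now}
}

// GetOrLoad returns the cached value for key while it is fresh. Otherwise it
// calls load, stores the result for ttl, and returns it. Errors are not cached.
func (c *TTLCache) GetOrLoad(key string, ttl time.Duration, load func() (string, error)) (string, error) {
	c.mu.Lock()
	e, ok := c.items[key]
	c.mu.Unlock()
	if ok && c.now().Before(e.expiresAt) {
		return e.value, nil // cache hit
	}
	v, err := load() // miss or expired entry: go to the source of truth
	if err != nil {
		return "", err
	}
	c.mu.Lock()
	c.items[key] = entry{value: v, expiresAt: c.now().Add(ttl)}
	c.mu.Unlock()
	return v, nil
}

// Invalidate removes key, mirroring the delete-on-update step.
func (c *TTLCache) Invalidate(key string) {
	c.mu.Lock()
	delete(c.items, key)
	c.mu.Unlock()
}

// demo returns how often the loader ran across a miss, a hit, an expiry,
// and a read after invalidation.
func demo() int {
	clock := time.Unix(0, 0)
	cache := NewTTLCache(func() time.Time { return clock })
	loads := 0
	load := func() (string, error) { loads++; return "alice", nil }

	cache.GetOrLoad("user:profile:1", 5*time.Minute, load) // miss: loader runs
	cache.GetOrLoad("user:profile:1", 5*time.Minute, load) // hit: loader skipped
	clock = clock.Add(6 * time.Minute)
	cache.GetOrLoad("user:profile:1", 5*time.Minute, load) // expired: loader runs
	cache.Invalidate("user:profile:1")
	cache.GetOrLoad("user:profile:1", 5*time.Minute, load) // invalidated: loader runs
	return loads
}

func main() {
	fmt.Println("loader calls:", demo())
}
```

The lock is released while `load` runs, so concurrent misses on the same key can each hit the source of truth. That keeps the sketch short; collapsing duplicate loads is a separate concern.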
Redis cache configuration
go
// cache.go
package cache

import (
	"context"
	"time"

	"github.com/go-redis/redis/v8"
)

type Cache struct {
	client *redis.Client
}

func NewCache(addr, password string, db int) *Cache {
	rdb := redis.NewClient(&redis.Options{
		Addr:     addr,
		Password: password,
		DB:       db,
		PoolSize: 10,
	})
	return &Cache{client: rdb}
}

// Set stores value under key with the given expiration.
func (c *Cache) Set(key string, value interface{}, expiration time.Duration) error {
	return c.client.Set(context.Background(), key, value, expiration).Err()
}

// Get returns the value stored under key.
func (c *Cache) Get(key string) (string, error) {
	return c.client.Get(context.Background(), key).Result()
}

// Delete removes key.
func (c *Cache) Delete(key string) error {
	return c.client.Del(context.Background(), key).Err()
}

// DeletePattern removes every key that matches pattern.
// It iterates with SCAN rather than KEYS, because KEYS blocks Redis while
// it walks the whole keyspace.
func (c *Cache) DeletePattern(pattern string) error {
	ctx := context.Background()
	iter := c.client.Scan(ctx, 0, pattern, 100).Iterator()
	for iter.Next(ctx) {
		if err := c.client.Del(ctx, iter.Val()).Err(); err != nil {
			return err
		}
	}
	return iter.Err()
}
Cache usage example
go
// Using the cache in the service layer
func (s *employeeService) GetEmployee(id int) (*models.Employee, error) {
	cacheKey := fmt.Sprintf("employee:%d", id)
	// Try the cache first
	cached, err := s.cache.Get(cacheKey)
	if err == nil {
		var employee models.Employee
		if err := json.Unmarshal([]byte(cached), &employee); err == nil {
			return &employee, nil
		}
	}
	// Cache miss: load from the database
	employee, err := s.repo.GetByID(id)
	if err != nil {
		return nil, err
	}
	// Write to the cache; a failed write is ignored and only costs a later miss
	if data, err := json.Marshal(employee); err == nil {
		s.cache.Set(cacheKey, data, 10*time.Minute)
	}
	return employee, nil
}

// Invalidate the cache on update
func (s *employeeService) UpdateEmployee(employee *models.Employee) error {
	err := s.repo.Update(employee)
	if err != nil {
		return err
	}
	// Invalidate the entry for this employee
	cacheKey := fmt.Sprintf("employee:%d", employee.ID)
	s.cache.Delete(cacheKey)
	// Invalidate cached employee lists
	s.cache.DeletePattern("employees:*")
	return nil
}
📋 Performance Testing Checklist
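Before testing
- [ ] Confirm the test environment is configured correctly
- [ ] Prepare test data
- [ ] Configure monitoring tools
- [ ] Set alert thresholds
- [ ] Prepare a rollback plan
Test execution
- [ ] Run baseline performance tests
- [ ] Run load tests
- [ ] Run stress tests
- [ ] Run mixed-scenario tests
- [ ] Record test results
Post-test analysis
- [ ] Analyze performance metrics
- [ ] Identify performance bottlenecks
- [ ] Draw up an optimization plan
- [ ] Verify the effect of optimizations
- [ ] Write the test report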
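Verifying the effect of an optimization means comparing the same metric before and after the change, with a tolerance for run-to-run noise. As a rough sketch of that comparison: `ImprovementPct` and `Regressed` are hypothetical helpers, the example figures are invented, and the 5% tolerance is an assumption rather than a recommendation from this guide.

```go
package main

import (
	"fmt"
	"time"
)

// ms converts whole milliseconds to a time.Duration.
func ms(n int) time.Duration { return time.Duration(n) * time.Millisecond }

// ImprovementPct returns by how many percent (rounded toward zero) after is
// faster than before. The result is negative when after is slower.
// before must be greater than zero.
func ImprovementPct(before, after time.Duration) int64 {
	return int64(before-after) * 100 / int64(before)
}

// Regressed reports whether after is more than tolerancePct percent slower
// than before. Integer arithmetic avoids floating-point edge cases.
func Regressed(before, after time.Duration, tolerancePct int64) bool {
	return int64(after)*100 > int64(before)*(100+tolerancePct)
}

func main() {
	before, after := ms(200), ms(150) // made-up P95 values from two test runs
	fmt.Println("improvement %:", ImprovementPct(before, after))
	fmt.Println("regressed:", Regressed(before, after, 5))
}
```

Comparing only two runs is fragile. Repeating each measurement and comparing medians makes the verdict less sensitive to a single noisy run.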
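Continuous monitoring
- [ ] Deploy monitoring tools
- [ ] Configure alert rules
- [ ] Run regular performance reviews
- [ ] Keep optimizing and improving
Last updated: 2024-01-24 | Document version: v1.0 | Maintained by: Yuanfeng Backend Development Team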