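RAG (Retrieval-Augmented Generation) follows a simple core flow: the user asks a question → relevant documents are retrieved → their content is injected into the LLM context → the model generates an answer. Running a complete RAG system locally takes three kinds of components working together: a model runtime for the LLM and the embedding model (Ollama), a vector database (Qdrant), and an orchestration platform (Dify).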
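The flow described above can be sketched in a few lines of Python. This is a toy illustration only: the word-count "embedding", the function names and the prompt wording are all assumptions made for the example, and a real deployment would call an embedding model and an LLM instead.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag of lowercase words."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question: str, docs: list[str], top_k: int = 2) -> list[str]:
    """Step 2 of the flow: rank documents by similarity to the question."""
    query_vec = embed(question)
    ranked = sorted(docs, key=lambda d: cosine(query_vec, embed(d)), reverse=True)
    return ranked[:top_k]


def build_prompt(question: str, context_docs: list[str]) -> str:
    """Step 3 of the flow: inject the retrieved text into the LLM context."""
    context = "\n".join(f"- {doc}" for doc in context_docs)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


if __name__ == "__main__":
    corpus = ["qdrant stores vectors", "ollama runs the language model"]
    question = "which database stores vectors"
    print(build_prompt(question, retrieve(question, corpus, top_k=1)))
```

In the stack built in this guide, Ollama would replace `embed` and the final generation call, Qdrant would replace the in-memory ranking, and Dify would assemble the prompt.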
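The version combination used in this guide has been tested in practice:
| Component | Recommended version | Notes |
|---|---|---|
| Ollama | ≥ 0.5.0 | Supports tool calling (function calling) |
| Qdrant | ≥ 1.7.0 | Docker deployment recommended |
| Dify | ≥ 1.0.0 | Community Edition, deployed with Docker Compose |
| Embedding model | m3e-base / bge-large-zh | Strong on Chinese text, runs locally |
| LLM | qwen2.5-7b-instruct | 7B parameters, fits on a single local GPU |
Minimum hardware requirements (recommended for a single-GPU 7B model):
The commands below were verified on Ubuntu 22.04 / 20.04. On CentOS or Alibaba Cloud Linux, substitute the corresponding package manager commands.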
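# ===== Step 1: Check the NVIDIA driver =====
nvidia-smi
# Expected output: GPU model, driver version, and VRAM information
# If it fails with "NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver",
# the driver is not installed and must be installed first
# ===== Step 2: Install the NVIDIA driver (if it is missing) =====
sudo apt update
sudo apt install -y nvidia-driver-535 nvidia-dkms-535
# Alternatively, install from a .run file (download the matching installer from the NVIDIA website):
# chmod +x NVIDIA-Linux-x86_64-535.xxx.run
# sudo ./NVIDIA-Linux-x86_64-535.xxx.run --silent
# Reboot once the installation finishes
sudo reboot
# ===== Step 3: Verify the driver installation =====
nvidia-smi
# Example of normal output: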
# +-----------------------------------------------------------------------------+
# | NVIDIA-SMI 535.54.03 Driver Version: 535.54.03 CUDA Version: 12.2 |
# |-------------------------------+----------------------+----------------------+
# | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
# | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
# |===============================+======================+======================|
# | 0 NVIDIA GeForce ... Off | 00000000:01:00.0 Off | N/A |
# +-------------------------------+----------------------+----------------------+
# ===== Step 4: Install Docker =====
sudo apt install -y apt-transport-https ca-certificates curl gnupg lsb-release
# Add the Docker GPG key
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
# Add the Docker repository
echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt update
sudo apt install -y docker-ce docker-ce-cli containerd.io docker-compose-plugin
# Start Docker and enable it at boot
sudo systemctl start docker
sudo systemctl enable docker
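# ===== Step 5: Install the NVIDIA Container Toolkit =====
# Add NVIDIA's apt repository and its signing key
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list > /dev/null
sudo apt update
sudo apt install -y nvidia-container-toolkit
# Register the NVIDIA runtime with Docker, then restart the daemon
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
# ===== Step 6: Verify the NVIDIA Container Toolkit =====
docker run --rm --gpus all nvidia/cuda:12.1.0-base-ubuntu22.04 nvidia-smi
# Expected: nvidia-smi also succeeds inside the Docker container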
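If the container reports "nvidia-smi: command not found" or a CUDA error, check the following:

- whether sudo nvidia-ctk runtime configure --runtime=docker has been run;
- whether the nvidia runtime is configured correctly in /etc/docker/daemon.json.

Deploying Ollama with Docker is recommended: it keeps Ollama isolated from the host environment and avoids dependency conflicts.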
# Create the Ollama data directories (used to store downloaded models)
mkdir -p /data/ollama/models
mkdir -p /data/ollama/data
# Write docker-compose.yml
cat > /data/ollama/docker-compose.yml << 'EOF'
version: '3.8'
services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    ports:
      - "11434:11434"   # Ollama API port
    volumes:
      # Mount the host model directory into the container
      - /data/ollama/models:/root/.ollama/models
      - /data/ollama/data:/data
    environment:
      # GPUs visible to Ollama (all GPUs if unset)
      - CUDA_VISIBLE_DEVICES=0
      # OLLAMA_HOST=0.0.0.0 lets services outside the container reach the API
      - OLLAMA_HOST=0.0.0.0
      # OLLAMA_MODELS=/root/.ollama/models would set the model storage path
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    networks:
      - rag-network
networks:
  rag-network:
    driver: bridge
EOF
# Start Ollama
cd /data/ollama
docker compose up -d
# Follow the logs to confirm a successful start
docker compose logs -f ollama
# A healthy log includes a line showing the server listening on port 11434
# If you see "Error: could not sync gpu driver", the driver or CUDA setup has a problem
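# ===== List the models already downloaded inside the Ollama container =====
docker exec -it ollama ollama list
# ===== Download the embedding model (used to vectorize documents) =====
# m3e-base: an open-source Chinese embedding model from Moka AI, suitable for local use
docker exec -it ollama ollama pull m3e-base
# When the download completes you see: pulling manifest ... pulling 8f6d7a1d2f3e ... success
# ===== Download the LLM =====
# qwen2.5-7b-instruct: Alibaba's Qwen 2.5, 7B parameters, strong on Chinese
# NOTE: the name must match what the Ollama registry (or your own imported model) provides;
# official library tags use the form name:tag, for example qwen2.5:7b-instruct.
# Run "ollama list" after pulling and use the exact name shown there in all later steps.
docker exec -it ollama ollama pull qwen2.5-7b-instruct
# You can also try qwen2.5-14b-instruct if you have enough VRAM
# docker exec -it ollama ollama pull qwen2.5-14b-instruct
# ===== Verify that the model is available =====
docker exec -it ollama ollama show qwen2.5-7b-instruct
# Shows model details: size, parameters, number of layers, and so on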
# Check that Ollama is running (execute on the host)
curl http://localhost:11434/api/tags
# Expected output: {"models": [{"name": "qwen2.5-7b-instruct", ...}, {"name": "m3e-base", ...}]}
# Test LLM inference
curl http://localhost:11434/api/generate -d '{
"model": "qwen2.5-7b-instruct",
"prompt": "你好,请用一句话介绍RAG是什么?",
"stream": false
}'
# Expected: a JSON object containing the reply generated by the model
# Test embedding generation
curl http://localhost:11434/api/embeddings -d '{
"model": "m3e-base",
"prompt": "这是一段测试文本"
}'
# Returns a JSON object whose "embedding" field is a vector such as [0.0231, -0.0945, ...]
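The length of the returned vector is the number you later need for the Qdrant collection, so it is worth measuring rather than guessing. The sketch below is a minimal helper under stated assumptions: it targets the `/api/embeddings` endpoint on `localhost:11434` used above and reads the `embedding` field of the response; the function names are made up for this example.

```python
import json
import urllib.request

OLLAMA_EMBEDDINGS_URL = "http://localhost:11434/api/embeddings"


def build_embedding_request(model: str, text: str) -> urllib.request.Request:
    """Build the same POST request that the curl test above sends."""
    body = json.dumps({"model": model, "prompt": text}).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_EMBEDDINGS_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embedding_dimension(response_body: str) -> int:
    """Return the vector length found in an /api/embeddings response body."""
    vector = json.loads(response_body)["embedding"]
    if not vector:
        raise ValueError("the response contained an empty embedding")
    return len(vector)


def fetch_dimension(model: str, text: str = "dimension probe") -> int:
    """Call the local Ollama server and report the embedding dimension."""
    request = build_embedding_request(model, text)
    with urllib.request.urlopen(request, timeout=60) as response:
        return embedding_dimension(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Requires a running Ollama server with the model already pulled.
    print(fetch_dimension("m3e-base"))
```

Use the printed value as the vector size when creating the collection.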
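# Ollama runs in Docker here, so each command is executed inside the container
# List downloaded models
docker exec -it ollama ollama list
# Delete a model (frees disk space)
docker exec -it ollama ollama rm qwen2.5-7b-instruct
# Copy a model under a new name (a starting point for a customized variant)
docker exec -it ollama ollama cp qwen2.5-7b-instruct my-custom-qwen
# Show model information
docker exec -it ollama ollama show qwen2.5-7b-instruct
# Start an interactive chat
docker exec -it ollama ollama run qwen2.5-7b-instruct
# Create the Qdrant data directories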
mkdir -p /data/qdrant/storage
mkdir -p /data/qdrant/snapshots
# Write docker-compose.yml
cat > /data/qdrant/docker-compose.yml << 'EOF'
version: '3.8'
services:
  qdrant:
    image: qdrant/qdrant:latest
    container_name: qdrant
    restart: unless-stopped
    ports:
      - "6333:6333"   # REST API port
      - "6334:6334"   # gRPC port (for high-throughput retrieval)
    volumes:
      - /data/qdrant/storage:/qdrant/storage       # data directory
      - /data/qdrant/snapshots:/qdrant/snapshots   # snapshot backup directory
    environment:
      # Qdrant log level: DEBUG, INFO, WARN or ERROR
      - QDRANT__LOG_LEVEL=INFO
      # Enable performance measurement (turn off in production)
      # - QDRANT__SERVICE__MEASUREMENT_INTERVAL=10000
    # The default qdrant/qdrant image runs on the CPU, so no GPU reservation is declared here
    networks:
      - rag-network
networks:
  rag-network:
    driver: bridge
EOF
# Start Qdrant
cd /data/qdrant
docker compose up -d
# Follow the logs
docker compose logs -f qdrant
# A healthy log reports the HTTP API listening on port 6333
# and the gRPC API listening on port 6334
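A Collection is the container in which Qdrant stores the vectors of documents of the same kind. Before importing documents, create the Collection and configure its vector parameters.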
# Create a Collection named "knowledge_base"
# size: must equal the output dimension of the embedding model
#       (m3e-base produces 768-dimensional vectors, bge-large-zh 1024; confirm with the embedding test)
# distance: Cosine ranks results by cosine similarity
curl -X PUT "http://localhost:6333/collections/knowledge_base" \
-H "Content-Type: application/json" \
-d '{
"vectors": {
"size": 768,
"distance": "Cosine",
"on_disk": false
},
"optimizers_config": {
"default_segment_number": 2,
"memmap_threshold": 50000,
"indexing_threshold": 20000
}
}'
# Expected response:
# {"result": true, "status": "ok", "time": 0.001}
# Inspect the Collection you created
curl http://localhost:6333/collections/knowledge_base
# Returns Collection details: vector dimension, point count, status, and so on
# Delete the Collection (if you need to rebuild it)
curl -X DELETE "http://localhost:6333/collections/knowledge_base"
# {"result": true, "status": "ok", "time": 0.001}
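To make the Cosine distance setting concrete, here is a small worked example of cosine similarity in plain Python. It is only an illustration of the formula, not Qdrant's implementation, and the function name is an assumption made for this sketch. It also shows why the two vectors must have the same length.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """cos(a, b) = (a · b) / (|a| * |b|), defined only for equal-length vectors."""
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # same direction
    print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # orthogonal
    print(cosine_similarity([1.0, 1.0], [1.0, 0.0]))  # 45 degrees apart
```

A score near 1 means the two texts point in nearly the same direction in embedding space, which is what the retrieval step ranks by.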
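Note: the size parameter must match the output dimension of the embedding model. A dimension mismatch causes inserts and searches to fail or to return meaningless results.

For large datasets (more than 1 million vectors), a single Qdrant node may not be enough, and you can set up a distributed cluster: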
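# Enable cluster mode on every node through a config file, then start the nodes so they join each other
mkdir -p /data/qdrant/config
cat > /data/qdrant/config/cluster.yml << 'EOF'
# Qdrant cluster configuration
# Node 1: 192.168.1.101
# Node 2: 192.168.1.102
# Node 3: 192.168.1.103
cluster:
  # Turn on distributed mode
  enabled: true
  # Port used for node-to-node (p2p) traffic
  p2p:
    port: 6335
  # Consensus parameters
  consensus:
    tick_period_ms: 100
EOF
# Mount the file into the container as /qdrant/config/production.yaml and publish port 6335.
# The first node starts with:   ./qdrant --uri 'http://192.168.1.101:6335'
# Every other node joins with:  ./qdrant --uri 'http://192.168.1.102:6335' --bootstrap 'http://192.168.1.101:6335'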
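# Clone the Dify source (pinning a release tag is recommended)
cd /data
git clone https://github.com/langgenius/dify.git
cd dify
# Check out the release used in this guide
git checkout v1.0.0
# Look at the Docker Compose files shipped with Dify
ls docker/
Dify Community Edition needs PostgreSQL and Redis alongside its API, worker and web services. Dify 1.x releases also ship further services (for example a sandbox and a plugin daemon) in the official Compose file under docker/, so compare the trimmed file below against the official one before relying on it. A complete hand-written docker-compose.yml for this setup looks like this: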
# Write a production-style docker-compose.yml
cat > /data/dify/docker-compose.yml << 'EOF'
version: '3.8'
services:
  # Nginx reverse proxy
  # (supply ./nginx/nginx.conf yourself; it should forward API paths to api:5001 and everything else to web:3000)
  nginx:
    image: nginx:latest
    container_name: dify-nginx
    restart: unless-stopped
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./nginx/nginx.conf:/etc/nginx/nginx.conf:ro
    depends_on:
      - api
      - web
    networks:
      - dify-network
  # PostgreSQL database
  db:
    image: postgres:15-alpine
    container_name: dify-db
    restart: unless-stopped
    environment:
      # POSTGRES_USER / POSTGRES_DB create the role and database that .env refers to
      POSTGRES_USER: dify
      POSTGRES_DB: dify
      POSTGRES_PASSWORD: dify_secure_password_change_me
      POSTGRES_INITDB_ARGS: '--encoding=UTF8'
    volumes:
      - db_data:/var/lib/postgresql/data
    networks:
      - dify-network
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U dify"]
      interval: 10s
      timeout: 5s
      retries: 5
  # Redis cache
  redis:
    image: redis:7-alpine
    container_name: dify-redis
    restart: unless-stopped
    volumes:
      - redis_data:/data
    networks:
      - dify-network
    # The password must match REDIS_PASSWORD in .env
    command: redis-server --appendonly yes --requirepass dify_secure_password_change_me
    healthcheck:
      test: ["CMD-SHELL", "redis-cli ping"]
      interval: 10s
      timeout: 5s
      retries: 5
  # Dify backend API
  api:
    image: langgenius/dify-api:1.0.0
    container_name: dify-api
    restart: unless-stopped
    ports:
      - "5001:5001"
    env_file:
      - .env
    environment:
      # The same image runs as API server or Celery worker depending on MODE
      MODE: api
      # Ollama is registered later in the Dify UI with the base URL http://ollama:11434
    volumes:
      # Uploaded files must be visible to both api and worker
      - app_storage:/app/api/storage
    depends_on:
      db:
        condition: service_healthy
      redis:
        condition: service_healthy
    networks:
      - dify-network
  # Dify frontend
  web:
    image: langgenius/dify-web:1.0.0
    container_name: dify-web
    restart: unless-stopped
    ports:
      - "3000:3000"
    env_file:
      - .env
    depends_on:
      - api
    networks:
      - dify-network
  # Worker (asynchronous task processing)
  worker:
    image: langgenius/dify-api:1.0.0
    container_name: dify-worker
    restart: unless-stopped
    env_file:
      - .env
    environment:
      MODE: worker
    volumes:
      - app_storage:/app/api/storage
    depends_on:
      db:
        condition: service_healthy
      redis:
        condition: service_healthy
    networks:
      - dify-network
  # Ollama service (reached over the internal network)
  # If the standalone container from /data/ollama is still running, stop it first:
  # both definitions use container_name "ollama"
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    volumes:
      - /data/ollama/models:/root/.ollama/models
    environment:
      - CUDA_VISIBLE_DEVICES=0
      - OLLAMA_HOST=0.0.0.0
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    networks:
      - dify-network
volumes:
  db_data:
  redis_data:
  app_storage:
networks:
  dify-network:
    driver: bridge
EOF
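Generate the environment variable file:
# Write the Dify environment file
cat > /data/dify/.env << 'EOF'
# =======================
# Basic settings
# =======================
# Secret key (use a randomly generated strong value in production)
SECRET_KEY=change_me_to_a_strong_random_secret_key_at_least_32_chars
# Current version
DIFY_VERSION=1.0.0
# =======================
# Database settings (PostgreSQL)
# =======================
DB_USERNAME=dify
DB_PASSWORD=dify_secure_password_change_me
DB_HOST=db
DB_PORT=5432
DB_DATABASE=dify
# =======================
# Redis settings
# =======================
REDIS_HOST=redis
REDIS_PORT=6379
REDIS_PASSWORD=dify_secure_password_change_me
# =======================
# Vector store settings
# =======================
# The qdrant host name only resolves if the Qdrant container is attached to Dify's network
VECTOR_STORE=qdrant
QDRANT_URL=http://qdrant:6333
QDRANT_API_KEY=
# =======================
# Model provider settings
# =======================
# Model providers are normally configured in the Dify UI (Settings → Model Provider);
# treat the variables below as optional hints that Dify may ignore
OLLAMA_BASE_URL=http://ollama:11434
# Default LLM (must match the model name shown by "ollama list")
DEFAULT_LLM_MODEL=qwen2.5-7b-instruct
# Default embedding model
DEFAULT_EMBEDDING_MODEL=m3e-base
# =======================
# Logging
# =======================
LOG_LEVEL=INFO
EOF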
# Start all Dify services
cd /data/dify
docker compose up -d
# Check service status
docker compose ps
# Expected: every service shows a STATUS of running (healthy where a health check is defined)
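The placeholder SECRET_KEY and passwords in the .env file should be replaced with random values before the stack is exposed to anyone. One way to produce them, as a minimal sketch using Python's standard `secrets` module (the helper names are assumptions made for this example):

```python
import secrets


def generate_secret_key(num_bytes: int = 42) -> str:
    """Return a URL-safe random string; 42 random bytes encode to 56 characters."""
    return secrets.token_urlsafe(num_bytes)


def render_env_line(name: str, value: str) -> str:
    """Format a NAME=value line ready to paste into the .env file."""
    return f"{name}={value}"


if __name__ == "__main__":
    print(render_env_line("SECRET_KEY", generate_secret_key()))
    print(render_env_line("DB_PASSWORD", generate_secret_key(24)))
```

Remember that the database and Redis passwords appear both in .env and in docker-compose.yml, so both files need the same new values.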
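After logging in to Dify, open the "Settings → Model Provider" page to configure the models:
# Register Ollama through the Dify HTTP API. This request is illustrative: the exact endpoint and
# payload depend on your Dify version, and the UI page mentioned above is the reliable path.
# ACCESS_TOKEN must already hold a valid access token; Dify is assumed to listen on port 5001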
curl -X POST "http://localhost:5001/api/v1/workspaces/current/model-providers" \
-H "Authorization: Bearer ${ACCESS_TOKEN}" \
-H "Content-Type: application/json" \
-d '{
"provider": "ollama",
"base_url": "http://ollama:11434",
"models": [
{
"model_name": "qwen2.5-7b-instruct",
"model_type": "llm",
"enabled": true
},
{
"model_name": "m3e-base",
"model_type": "text_embedding",
"enabled": true
}
]
}'
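# Use Qdrant as the vector database. Dify normally selects the vector store through environment
# variables (VECTOR_STORE=qdrant, QDRANT_URL=http://qdrant:6333); the request below is illustrative
# only, so check that this endpoint exists in your Dify version before using it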
curl -X PUT "http://localhost:5001/api/v1/datasets/vector-stores/qdrant" \
-H "Authorization: Bearer ${ACCESS_TOKEN}" \
-H "Content-Type: application/json" \
-d '{
"client_params": {
"url": "http://qdrant:6333",
"api_key": "",
"timeout": 30,
"https": false,
"grpc_port": 6334
},
"vector_field": "embedding",
"distance_field": "distance"
}'
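# Upload documents to a Dify knowledge base through the API
# Assume a dataset named "my-knowledge" has already been created
# NOTE: Dify's published Knowledge API addresses a dataset by its UUID under /v1/datasets/...;
# adjust the paths below to match the API reference of your Dify version
# Step 1: Upload a file
curl -X POST "http://localhost:5001/api/v1/datasets/my-knowledge/documents" \
-H "Authorization: Bearer ${ACCESS_TOKEN}" \
-F "file=@/path/to/your/document.pdf" \
-F "process_rule={\"mode\":\"automatic\",\"rules\":{}}"
# The response contains a document id that later steps need
# {"document": {"id": "uuid-of-document", "name": "document.pdf", "status": "processing"}}
# Step 2: Query the processing status of the document
curl "http://localhost:5001/api/v1/datasets/my-knowledge/documents/uuid-of-document" \
-H "Authorization: Bearer ${ACCESS_TOKEN}"
# Wait until the status field becomes "completed"
# During processing Dify automatically: 1. parses the document 2. splits it into chunks 3. generates embeddings and stores them in Qdrant
# Step 3: List the processed documents
curl "http://localhost:5001/api/v1/datasets/my-knowledge/documents" \
-H "Authorization: Bearer ${ACCESS_TOKEN}"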
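Waiting for the status to change is easier to script than to do by hand. The sketch below shows one possible polling loop; `fetch_status` is a hypothetical callable that you would implement to call the status endpoint and return the status string, and the terminal values "completed" and "failed" are taken from this guide rather than from an API reference.

```python
import time
from typing import Callable


def wait_until_completed(
    fetch_status: Callable[[], str],
    interval_s: float = 2.0,
    timeout_s: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll fetch_status() until it returns a terminal status or the timeout expires."""
    deadline = clock() + timeout_s
    while True:
        status = fetch_status()
        if status in ("completed", "failed"):
            return status
        if clock() >= deadline:
            raise TimeoutError(f"document still '{status}' after {timeout_s} seconds")
        sleep(interval_s)


if __name__ == "__main__":
    # Simulated status sequence standing in for real HTTP calls.
    simulated = iter(["processing", "processing", "completed"])
    print(wait_until_completed(lambda: next(simulated), interval_s=0.1))
```

Injecting `sleep` and `clock` keeps the loop testable without real waiting; in normal use the defaults are fine.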
# Page through the points stored in a Collection (scroll API)
curl -X POST "http://localhost:6333/collections/knowledge_base/points/scroll" \
-H "Content-Type: application/json" \
-d '{"limit": 3, "with_payload": true, "with_vector": true}'
# Example response (sanitized; the vector has 768 values for m3e-base):
# {
# "result": {
# "points": [
# {
# "id": 1,
# "vector": [0.0231, -0.0945, ...],
# "payload": {
# "text": "RAG的核心是检索增强生成...",
# "document_id": "uuid-of-document",
# "chunk_order": 1,
# "metadata": {
# "source": "document.pdf",
# "page": 1
# }
# }
# }
# ]
# },
# "status": "ok",
# "time": 0.001
# }
# Search test: find the document chunks most relevant to a query
# "vector" must be the complete embedding of the query text. The values below are truncated,
# so paste the full array returned by the embedding API before running the command
curl -X POST "http://localhost:6333/collections/knowledge_base/points/search" \
-H "Content-Type: application/json" \
-d '{
"vector": [0.0231, -0.0945, ...],
"limit": 5,
"score_threshold": 0.7
}'
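Because the search body has to carry the complete query embedding, it is usually generated by a script rather than typed. The following sketch builds such a body and refuses vectors of the wrong length; the function name and the `expected_dim` parameter are assumptions made for this example, while the field names mirror the search request above.

```python
import json


def build_search_body(
    query_vector: list[float],
    expected_dim: int,
    limit: int = 5,
    score_threshold: float = 0.7,
) -> dict:
    """Build the JSON body for a points/search request, checking the dimension first."""
    if len(query_vector) != expected_dim:
        raise ValueError(
            f"query vector has {len(query_vector)} values, collection expects {expected_dim}"
        )
    return {
        "vector": list(query_vector),
        "limit": limit,
        "score_threshold": score_threshold,
        "with_payload": True,  # return the stored text along with each hit
    }


if __name__ == "__main__":
    # A 4-value vector stands in for a real embedding here.
    body = build_search_body([0.1, 0.2, 0.3, 0.4], expected_dim=4)
    print(json.dumps(body))
```

The printed JSON can be saved to a file and sent with curl using `-d @file.json`.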
# Create a complete RAG application through the API
# Step 1: Create a dataset
curl -X POST "http://localhost:5001/api/v1/datasets" \
-H "Authorization: Bearer ${ACCESS_TOKEN}" \
-H "Content-Type: application/json" \
-d '{
"name": "投肯AI知识库",
"description": "投肯智能技术文档知识库",
"indexing_technique": "high_quality",
"embedding_model": "m3e-base",
"embedding_model_retrieval_resource": true,
"permission": "only_me"
}'
# Step 2: Create the app
curl -X POST "http://localhost:5001/api/v1/apps" \
-H "Authorization: Bearer ${ACCESS_TOKEN}" \
-H "Content-Type: application/json" \
-d '{
"name": "投肯RAG助手",
"description": "基于本地知识库的AI问答助手",
"icon": "🤖",
"app_mode": "chatbot",
"model_config": {
"provider": "ollama",
"model_id": "qwen2.5-7b-instruct",
"temperature": 0.7,
"max_tokens": 2048
},
"dataset_configs": {
"datasets": [
{
"dataset_id": "dataset-uuid-here",
"rules": {
"top_k": 5,
"score_threshold": 0.7,
"reranking_model": null
}
}
],
"retrieval_method": "semantic_search"
}
}'
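# Step 3: Test a conversation
# The published chat endpoint is /v1/chat-messages. The app is identified by its API key
# (APP_API_KEY, created on the app's API access page), so no app_id field is sent
curl -X POST "http://localhost:5001/v1/chat-messages" \
-H "Authorization: Bearer ${APP_API_KEY}" \
-H "Content-Type: application/json" \
-d '{
"inputs": {},
"query": "RAG系统的核心组件有哪些?",
"response_mode": "blocking",
"conversation_id": "",
"user": "test-user"
}'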
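# Problem 1: model download fails, for example with "no such file or directory" or a timeout
# Cause: ollama pull downloads from the Ollama registry, which can be slow or unreachable from some networks in mainland China
# Solution 1: use a proxy. The download is performed by the Ollama server inside the container,
# so the proxy must be set in the container environment, not in the host shell.
# Add this line to the environment section of /data/ollama/docker-compose.yml:
#   - HTTPS_PROXY=http://your-proxy:7890
cd /data/ollama && docker compose up -d
docker exec -it ollama ollama pull qwen2.5-7b-instruct
# Solution 2: download the model file manually and import it
# Download a GGUF model from ModelScope (modelscope.cn) into /data/ollama/data,
# which the compose file mounts at /data inside the container.
# "ollama create -f" expects a Modelfile, so write one that points at the GGUF file:
echo 'FROM /data/qwen2.5-7b-instruct-q4_k_m.gguf' > /data/ollama/data/Modelfile
docker exec -it ollama ollama create my-model -f /data/Modelfile
# ============
# Problem 2: the Ollama service is up but the API returns a 500 error
# Cause: out of VRAM (OOM), the model failed to load
# Check GPU memory usage
nvidia-smi
# VRAM usage above 95% indicates that memory is exhausted
# Solution: use a smaller quantization or reduce concurrency
# qwen2.5-7b uses q4_K_M quantization by default and needs roughly 4 to 5 GB of VRAM
# If that is too much, switch to q2_K (smaller, slightly less accurate).
# The exact tag name depends on what the registry publishes, so check the model's tag list first:
docker exec -it ollama ollama pull qwen2.5-7b-instruct:q2_k
# ============
# Problem 3: the container cannot access the GPU
# Check whether the GPU is visible inside the container
docker exec -it ollama nvidia-smi
# If starting the container fails with: Error response from daemon: could not select device driver "nvidia"...
# the NVIDIA Container Toolkit is not configured. Solution: reconfigure the nvidia runtime
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
# Then restart the ollama container
cd /data/ollama
docker compose down && docker compose up -d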
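The relationship between quantization level and VRAM use follows from simple arithmetic: weights take roughly (number of parameters × bits per weight ÷ 8) bytes, plus some headroom for the KV cache and runtime buffers. The sketch below works this out; the 1.5 GB overhead default is an assumed round number for illustration, not a measured value, and the function names are made up for the example.

```python
def estimate_weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Memory taken by the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


def fits_in_vram(
    num_params: float,
    bits_per_weight: float,
    vram_gb: float,
    overhead_gb: float = 1.5,  # assumed allowance for KV cache and buffers
) -> bool:
    """Rough check: do the weights plus the assumed overhead fit in the given VRAM?"""
    return estimate_weight_memory_gb(num_params, bits_per_weight) + overhead_gb <= vram_gb


if __name__ == "__main__":
    for bits in (16, 8, 4, 2):
        size = estimate_weight_memory_gb(7e9, bits)
        print(f"7B model at {bits} bits per weight: about {size:.2f} GB of weights")
    print("7B at 4 bits on an 8 GB card:", fits_in_vram(7e9, 4, vram_gb=8))
    print("14B at 4 bits on an 8 GB card:", fits_in_vram(14e9, 4, vram_gb=8))
```

Real quantization formats such as q4_K_M use slightly more than 4 bits per weight on average, so actual files are somewhat larger than this estimate.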
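# Problem 1: port 6333 is unreachable after Qdrant starts
# Check whether the container is running
docker compose ps
# A status of "restarting" means startup failed
# Look at the detailed logs
docker compose logs qdrant
# Common cause: insufficient permissions on the storage directory
# Solution: fix the directory ownership
sudo chown -R 1000:1000 /data/qdrant/storage
cd /data/qdrant
docker compose up -d
# ============
# Problem 2: retrieval results are inaccurate after importing vectors
# Possible cause 1: the embedding model is not tuned for Chinese
# bge-large-zh and m3e-base are both tuned for Chinese; make sure you use one of them
# Possible cause 2: poorly chosen retrieval parameters
# A top_k that is too small can miss relevant content
# A score_threshold that is too high can filter out correct results
# Suggested starting point: top_k=10 and score_threshold=0.5, followed by a second filtering pass in the application
# Diagnosis: run a search that also returns the stored vectors and confirm they are what you expect
curl -X POST "http://localhost:6333/collections/knowledge_base/points/search" \
-H "Content-Type: application/json" \
-d '{"vector": [...], "limit": 10, "with_vector": true}'
# Compare the returned vectors with the embeddings you actually generated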
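The effect of the two retrieval parameters is easy to see on a small list of scored hits. The sketch below is a plain-Python imitation of the filtering step, with made-up ids and scores; it is meant to illustrate the trade-off, not to reproduce Qdrant's internals.

```python
def filter_hits(hits: list[dict], top_k: int, score_threshold: float) -> list[dict]:
    """Keep hits scoring at least score_threshold, best first, at most top_k of them."""
    kept = [hit for hit in hits if hit["score"] >= score_threshold]
    kept.sort(key=lambda hit: hit["score"], reverse=True)
    return kept[:top_k]


if __name__ == "__main__":
    # Hypothetical search results for one query.
    hits = [
        {"id": "a", "score": 0.82},
        {"id": "b", "score": 0.66},
        {"id": "c", "score": 0.55},
        {"id": "d", "score": 0.41},
    ]
    strict = filter_hits(hits, top_k=5, score_threshold=0.7)
    relaxed = filter_hits(hits, top_k=10, score_threshold=0.5)
    print("strict :", [hit["id"] for hit in strict])
    print("relaxed:", [hit["id"] for hit in relaxed])
```

With the strict threshold only one chunk reaches the LLM; the relaxed setting passes three, which is why a lower threshold combined with a later filtering or reranking pass is often a safer starting point.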
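# Problem 1: the Dify page shows "cannot connect to the model provider"
# Cause: the Ollama container and Dify are not on the same Docker network
# Check connectivity
docker exec -it dify-api ping -c 2 ollama
# If the host name cannot be resolved, check the networks configuration in docker-compose.yml
# Solution: make sure all services share one network.
# A container started from a different compose project (for example the standalone qdrant container)
# can be attached to Dify's network. Compose prefixes the network with the project name,
# so verify the name with "docker network ls" first:
# docker network connect dify_dify-network qdrant
# ============
# Problem 2: document parsing fails and every document shows the "failed" status
# Cause: the Dify document-parsing worker needs a fair amount of memory and CPU
# Look at the worker logs
docker compose logs worker | grep -A 10 "parsing"
# Solution: give the worker more resources
# Add the following to the worker service in docker-compose.yml:
# deploy:
#   resources:
#     limits:
#       memory: 4G
#     reservations:
#       memory: 2G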
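# 1. Enable hybrid retrieval (sparse + dense vectors)
# Qdrant's Query API (POST /points/query, available from Qdrant 1.10) can run a sparse and a dense
# search in one request and fuse the two result lists. The collection must define both a dense
# vector and a sparse vector; the names "embedding" and "sparse" below are placeholders.
# The sparse vector carries BM25-style term weights that you compute on the client side.
curl -X POST "http://localhost:6333/collections/knowledge_base/points/query" \
-H "Content-Type: application/json" \
-d '{
"prefetch": [
{"query": [...], "using": "embedding", "limit": 20},
{"query": {"indices": [...], "values": [...]}, "using": "sparse", "limit": 20}
],
"query": {"fusion": "rrf"},
"limit": 5
}'
# 2. Enable reranking
# After Qdrant's first-pass retrieval, a rerank model reorders the candidates
# bge-reranker-large is a common choice. Whether Ollama can serve it depends on your Ollama
# version and on the model being published in its registry, so treat this command as unverified:
# docker exec -it ollama ollama pull bge-reranker-large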
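One common way to merge a dense result list with a sparse one is reciprocal rank fusion (RRF). The sketch below implements the textbook formula, score(d) = Σ 1 / (k + rank of d in each list), in plain Python; it is an illustration of the idea rather than Qdrant's own code, and the constant k = 60 is the value conventionally used in the literature, assumed here.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked id lists; ids ranked high in many lists come first."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by fused score (descending); ties are broken by id for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


if __name__ == "__main__":
    dense_ranking = ["a", "b", "c"]   # hypothetical ids from vector search
    sparse_ranking = ["b", "d", "a"]  # hypothetical ids from keyword search
    print(reciprocal_rank_fusion([dense_ranking, sparse_ranking]))
```

Document "b" wins because it ranks near the top of both lists, even though it is first in only one of them. RRF needs no score normalization, which is why it is a popular default for hybrid search.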
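# 1. Use Ollama's streaming mode (lowers the time to the first token)
curl http://localhost:11434/api/generate -d '{
"model": "qwen2.5-7b-instruct",
"prompt": "你好",
"stream": true
}' | head -20
# 2. Keep the model loaded
# With the Docker deployment used in this guide, add the variable to the environment section of
# /data/ollama/docker-compose.yml (for a native systemd install, put
# Environment="OLLAMA_KEEP_ALIVE=24h" in /etc/systemd/system/ollama.service instead):
#   - OLLAMA_KEEP_ALIVE=24h
# The model then stays in VRAM between requests instead of being reloaded each time
# 3. Parallel inference
# OLLAMA_NUM_PARALLEL=4 sets the number of requests processed in parallel
# OLLAMA_MAX_LOADED_MODELS=2 caps how many models are loaded at the same time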
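In streaming mode, `/api/generate` returns one JSON object per line, each carrying a piece of the answer in its `response` field, and the last object has `done` set to true. A client has to stitch these pieces together. The following sketch does that for any iterable of lines; the function name is an assumption made for this example, and the sample lines are hand-written stand-ins for real server output.

```python
import json
from typing import Iterable, Union


def collect_stream(lines: Iterable[Union[str, bytes]]) -> str:
    """Concatenate the "response" pieces of a newline-delimited JSON stream."""
    parts = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank keep-alive lines
        chunk = json.loads(line)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break  # the final object marks the end of the answer
    return "".join(parts)


if __name__ == "__main__":
    sample = [
        '{"response": "Hel", "done": false}',
        '{"response": "lo", "done": false}',
        '{"response": "", "done": true}',
    ]
    print(collect_stream(sample))
```

With a real server, the same function can be fed the response object returned by `urllib.request.urlopen`, since iterating over it yields the stream line by line.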
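This guide covered the full process of building a local RAG knowledge base with Dify + Ollama + Qdrant, from environment preparation to orchestration in Dify. Key points to remember:

- When something goes wrong, check the logs first: docker compose logs -f service_name

Once the local RAG system is up and running, record the key configuration and commands in your operations documentation to simplify later maintenance and upgrades.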