28.1 轻量化服务端部署（Flask/FastAPI）

TensorFlow轻量化服务端部署：Flask与FastAPI模型API封装实战

TensorFlow 中文手册

本章节详细讲解如何使用Flask和FastAPI将TensorFlow模型轻量化部署为RESTful API，包括单模型与多模型封装、参数校验、异常处理、API测试工具（Postman/curl）以及性能优化技巧（如缓存、多线程），适合初学者快速掌握模型部署基础。

TensorFlow轻量化服务端部署：Flask与FastAPI实战指南

引言

在深度学习项目中，将训练好的TensorFlow模型部署到服务端，使其能够通过API对外提供服务，是一个关键步骤。轻量化部署意味着使用高效、易于维护的框架（如Flask或FastAPI）来封装模型，提供RESTful接口，支持快速响应和扩展。本章节将引导新人逐步掌握轻量化服务端部署的核心技能，从基础设置到高级优化。

使用Flask封装TensorFlow模型为API

Flask是一个轻量级的Python Web框架，适合快速构建API。以下是封装TensorFlow模型的步骤：

1. 安装Flask和TensorFlow

首先，确保已安装Python和必要的库：

pip install flask tensorflow

2. 加载TensorFlow模型

假设你有一个已训练的TensorFlow模型保存为 model.h5，使用Keras加载它：

import tensorflow as tf
from flask import Flask, request, jsonify

# 加载模型
model = tf.keras.models.load_model('model.h5')

# 初始化Flask应用
app = Flask(__name__)

3. 定义API端点

创建一个简单的RESTful端点来处理预测请求。例如，单模型预测：

@app.route('/predict', methods=['POST'])
def predict():
    # 获取请求数据
    data = request.get_json()
    
    # 参数校验：检查是否有'input'字段
    if 'input' not in data:
        return jsonify({'error': 'Missing input field'}), 400
    
    # 进行预测
    try:
        # 假设输入是一个列表
        input_data = [data['input']]
        prediction = model.predict(input_data)
        return jsonify({'prediction': prediction.tolist()}), 200
    except Exception as e:
        # 异常处理
        return jsonify({'error': str(e)}), 500

4. 多模型部署示例

如果你需要部署多个模型，可以在Flask应用中定义不同端点：

model1 = tf.keras.models.load_model('model1.h5')
model2 = tf.keras.models.load_model('model2.h5')

@app.route('/predict/model1', methods=['POST'])
def predict_model1():
    # 类似单模型逻辑，使用model1
    pass

@app.route('/predict/model2', methods=['POST'])
def predict_model2():
    # 使用model2
    pass

运行Flask应用：

if __name__ == '__main__':
    app.run(debug=True, host='0.0.0.0', port=5000)

现在，通过 http://localhost:5000/predict 发送POST请求即可进行预测。

使用FastAPI封装TensorFlow模型为API

FastAPI是一个现代、高性能的Web框架，内置自动文档生成和异步支持，非常适合高并发场景。

1. 安装FastAPI和Uvicorn

pip install fastapi uvicorn

2. 加载模型并定义API

FastAPI使用Pydantic进行数据验证，使得参数校验更加简洁：

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
import tensorflow as tf

# 加载模型
model = tf.keras.models.load_model('model.h5')

# 定义输入数据模型
class PredictionInput(BaseModel):
    input: list  # 假设输入是一个列表

# 初始化FastAPI应用
app = FastAPI()

# 定义预测端点，支持异步
@app.post('/predict')
async def predict(input_data: PredictionInput):
    try:
        prediction = model.predict([input_data.input])
        return {'prediction': prediction.tolist()}
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

3. 自动文档

FastAPI会自动生成交互式API文档，访问 http://localhost:8000/docs 即可查看和测试。

4. 运行FastAPI应用

使用Uvicorn作为服务器，支持异步处理：

uvicorn main:app --reload --host 0.0.0.0 --port 8000

API接口的参数校验与异常处理

参数校验：在Flask中使用 request.get_json() 和条件判断，在FastAPI中使用Pydantic模型进行自动验证。确保输入数据的格式正确，避免模型错误。
异常处理：使用try-except块捕获错误，如TensorFlow预测错误或数据解析错误，并返回适当的HTTP状态码（如400表示客户端错误，500表示服务器错误）。

API测试：使用Postman和curl

使用Postman：在Postman中创建新请求，选择POST方法，设置URL为API端点（如 http://localhost:5000/predict），在Body中选择raw JSON，输入类似 {'input': [1, 2, 3]} 的数据，发送请求查看响应。

使用curl：在终端运行以下命令进行单样本测试：

curl -X POST http://localhost:5000/predict -H "Content-Type: application/json" -d '{"input": [1, 2, 3]}'

批量预测：在API设计中，可以支持批量输入，例如在请求中传递一个列表的列表，并在服务端循环处理。例如，FastAPI端点可以修改为接受 List[list] 类型输入。

轻量化部署的性能优化

为了提升API性能，可以考虑以下优化策略：

请求缓存：对重复的请求进行缓存，减少模型计算。例如，使用Python的 functools.lru_cache 或在服务端实现简单的缓存机制。
多线程处理：Flask默认是单线程的，可以通过使用Gunicorn等WSGI服务器启用多进程。FastAPI基于Starlette，支持异步操作，天然适合高并发。在部署时，可以配置Uvicorn使用多个工作进程。
静态图优化：TensorFlow模型在预测时可以使用静态图模式提升性能。通过 tf.function 装饰器或使用SavedModel格式加载模型，以减少动态图的开销。例如：
```
# 在加载模型后，使用tf.function
@tf.function
def predict_fn(input_data):
    return model(input_data)
```

总结

本章节介绍了TensorFlow模型的轻量化服务端部署，包括使用Flask和FastAPI封装RESTful API。通过参数校验、异常处理和API测试，确保API的健壮性和可用性。性能优化技巧如缓存、多线程和静态图使用，能进一步提升服务效率。这些步骤对新人友好，帮助快速上手实际部署。

接下来，你可以尝试修改代码，适配自己的模型和数据，并探索更多高级功能，如使用Docker容器化部署或结合云服务进行扩展。

上一章 27.4 TFLite 实战：端侧图像分类

下一章 28.2 企业级服务端部署（TensorFlow Serving）

TensorFlow 中文手册

28.1 轻量化服务端部署（Flask/FastAPI）

TensorFlow轻量化服务端部署：Flask与FastAPI实战指南

引言

使用Flask封装TensorFlow模型为API

1. 安装Flask和TensorFlow

2. 加载TensorFlow模型

3. 定义API端点

4. 多模型部署示例

使用FastAPI封装TensorFlow模型为API

1. 安装FastAPI和Uvicorn

2. 加载模型并定义API

3. 自动文档

4. 运行FastAPI应用

API接口的参数校验与异常处理

API测试：使用Postman和curl

轻量化部署的性能优化

总结

相关文档

Python 教程

FastAPI 教程

Django 6中文教程

Flask 中文教程

NumPy 中文教程

Scikit-learn 中文教程