@Cimix posted in "Deploying DiffusionGemma with Transformers, with an OpenAI-format API"

Just do it, let's go!

As for why I'm not using vLLM…
This is an intranet machine that can't pull the image, and I'm too lazy to do much compiling.
import asyncio
import json
import os
import re
import time
import uuid
from typing import Any

import torch
import uvicorn
from fastapi import FastAPI, HTTPException, Request
from fastapi.middleware.cors import CORSMiddleware
from fastapi.responses import StreamingResponse
from pydantic import BaseModel
from transformers import AutoProcessor, Diffu...