# Unique9-SE

**Repository Path**: software-engineering-pre-project/unique9-se

## Basic Information

- **Project Name**: Unique9-SE
- **Description**: Speech-to-text
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-09-05
- **Last Updated**: 2025-12-17

## Categories & Tags

**Categories**: Uncategorized

**Tags**: Python

## README

# Unique9-SE Speech Recognition Service

A real-time speech recognition API service built on FunASR, with streaming recognition and automatic punctuation.

## Project Structure

```
.
├── .python-version          # Python version pin
├── client.py                # Test client example
├── main.py                  # Service entry point
├── service.py               # Core speech recognition service
├── pyproject.toml           # Project dependencies
├── uv.lock                  # Dependency lock file
├── README.md                # Project documentation
├── api/                     # API code
│   ├── generate_openapi.py  # OpenAPI spec generator
│   └── routes/              # API routes
│       └── routes.py
└── doc/                     # Documentation
    └── openapi.json         # OpenAPI spec file
```

---

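## Features

- ✅ Real-time speech recognition (based on the Paraformer-large model)
- ✅ Automatic punctuation
- ✅ Parallel model loading for fast startup
- ✅ Singleton audio recorder to avoid resource conflicts
- ✅ Streaming over SSE (Server-Sent Events)
- ✅ RESTful API built with FastAPI
- ✅ Faster word segmentation with jieba-fast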

---

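## Quick Start

### Install Dependencies

```bash
# Clone the repository
git clone <your-repo>
cd unique9-se

# Install dependencies with uv (recommended)
uv sync
```

### Start the Service

```bash
python main.py
```

The service starts at `http://localhost:30703`. Open `/docs` to view the Swagger API documentation.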

---

## API Endpoints

### 📡 SSE Streaming Recognition

**URL**: `GET /stream/recognize/{session_id}`

**Description**: Opens an SSE connection that delivers recognition results in real time.

#### Event Format

| Event | Data format | Description |
|------|---------|------|
| `start` | `{"status": "connected", "timestamp": str, "session_id": str}` | Connection established |
| `message` | `{"type": "partial/final", "text": str, "timestamp": str, "session_id": str}` | Recognition result (`type` is either `"partial"` or `"final"`) |
| `ping` | `{"status": "ping", "timestamp": str, "session_id": str}` | Heartbeat (every 5 seconds) |
| `end` | `{"status": "completed", "timestamp": str, "session_id": str}` | Session ended |
| `error` | `{"code": int, "message": str, "timestamp": str, "session_id": str}` | Error |
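
On the wire, each row of the table corresponds to one SSE frame: an `event:` line, a `data:` line, and a blank line. The following is a minimal sketch of how such a frame could be encoded, not the code in `api/routes/routes.py`. The helper names `format_sse` and `make_payload` are hypothetical, and the ISO 8601 timestamp format is an assumption based on the sample responses in this README.

```python
import json
from datetime import datetime, timezone


def format_sse(event: str, payload: dict) -> str:
    """Encode one Server-Sent Event: event line, data line, blank line."""
    data = json.dumps(payload, ensure_ascii=False)
    return f"event: {event}\ndata: {data}\n\n"


def make_payload(session_id: str, **fields) -> dict:
    """Add the timestamp and session_id fields that every event in the table carries."""
    # Assumption: ISO 8601 timestamps; the service's actual format may differ.
    return {
        **fields,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "session_id": session_id,
    }


# Build a final-result frame for a hypothetical session.
frame = format_sse(
    "message",
    make_payload("abc-123", type="final", text="今天天气怎么样？"),
)
print(frame)
```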

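#### ⚠️ message Event Types (Important)

- **`type: "partial"`**: An interim result produced during recognition (no punctuation)
  - Use it to show live text to the user
  - **Do not send it to llm_chat**

- **`type: "final"`**: The final recognition result (with punctuation) ✅
  - The complete result after the user finishes a sentence
  - **This is the text to send to llm_chat**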
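
A client that does not run in a browser has to parse the SSE text format itself. The sketch below does that with the standard library only and keeps just the `final` texts. It assumes the service emits standard `event:` and `data:` lines with a JSON payload; `parse_sse` and `final_texts` are illustrative names, not functions from this repository.

```python
import json
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[tuple[str, dict]]:
    """Yield (event_name, payload) pairs from the lines of an SSE stream."""
    event, data = "message", []            # "message" is the SSE default event name
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":                     # a blank line terminates one event
            if data:
                yield event, json.loads("\n".join(data))
            event, data = "message", []
        elif line.startswith(":"):         # comment line, ignored
            continue
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            data.append(value[1:] if value.startswith(" ") else value)


def final_texts(events: Iterable[tuple[str, dict]]) -> Iterator[str]:
    """Keep only the text of final results; partial results are skipped."""
    for name, payload in events:
        if name == "message" and payload.get("type") == "final":
            yield payload["text"]


# Hand-written sample stream, not captured from the real service.
sample = [
    'event: start', 'data: {"status": "connected", "session_id": "abc-123"}', '',
    'event: message', 'data: {"type": "partial", "text": "今天天气"}', '',
    'event: message', 'data: {"type": "final", "text": "今天天气怎么样？"}', '',
]
print(list(final_texts(parse_sse(sample))))
```

In a real client, `lines` would be the decoded lines of the HTTP response body for `GET /stream/recognize/{session_id}`.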

---

### 🎤 Recording Control

#### Start Recording
```http
POST /recording/start/{session_id}
```

**Description**: Starts recording and pushes recognition results to the given `session_id`.

**Path parameters**:
- `session_id` (string): Unique session identifier

**Response**:
```json
{
  "status": "recording",
  "session_id": "abc-123",
  "timestamp": "2025-12-01T12:00:00"
}
```

#### Stop Recording
```http
POST /recording/stop
```

**Description**: Stops the current recording and triggers the `final` event.

**Response**:
```json
{
  "status": "stopped",
  "timestamp": "2025-12-01T12:00:05"
}
```
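
For scripted testing, the two control calls can be built with the standard library. This is a sketch under the assumption that the service runs at the default address from the Quick Start section; `start_request` and `stop_request` are hypothetical helper names. The requests are only constructed here, so the snippet runs without a live server.

```python
from urllib.parse import quote
from urllib.request import Request

BASE_URL = "http://localhost:30703"  # default address from the Quick Start section


def start_request(session_id: str, base_url: str = BASE_URL) -> Request:
    """Build POST /recording/start/{session_id}, escaping the path segment."""
    return Request(
        f"{base_url}/recording/start/{quote(session_id, safe='')}",
        method="POST",
    )


def stop_request(base_url: str = BASE_URL) -> Request:
    """Build POST /recording/stop (no session_id in the path)."""
    return Request(f"{base_url}/recording/stop", method="POST")


req = start_request("abc-123")
print(req.get_method(), req.full_url)

# To send it while the service is running:
#   from urllib.request import urlopen
#   with urlopen(req) as resp:
#       print(resp.read().decode("utf-8"))
```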

---

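## Frontend Call Flow

### 1. Initialization

```
1. Generate a unique session_id (using crypto.randomUUID() or a uuid library)
2. Open the SSE connection: GET /stream/recognize/{session_id}
3. Listen for the start event to confirm the connection
```

### 2. Recording

```
User presses the button
    ↓
Call POST /recording/start/{session_id}
    ↓
User speaks (recognition runs in real time)
    ↓
partial events arrive → show live text (gray / tentative state)
    ↓
User releases the button
    ↓
Call POST /recording/stop
    ↓
final event arrives → show the final result (black / confirmed state)
```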
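
The display logic of the recording stage comes down to a small piece of state: the confirmed sentences plus the current tentative text. The sketch below models that state. It assumes each `partial` event carries the whole utterance so far (so it replaces the previous partial), which this README does not state explicitly; the `Transcript` class is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Transcript:
    """Tracks confirmed sentences and the in-progress partial text."""
    confirmed: list = field(default_factory=list)  # texts from final events
    pending: str = ""                              # text from the latest partial event

    def apply(self, payload: dict) -> None:
        """Update state from the payload of one message event."""
        kind = payload.get("type")
        if kind == "partial":
            # Assumption: a partial replaces the previous partial.
            self.pending = payload.get("text", "")
        elif kind == "final":
            self.confirmed.append(payload.get("text", ""))
            self.pending = ""

    def render(self) -> str:
        """Confirmed text as-is; pending text in brackets (stand-in for gray styling)."""
        parts = list(self.confirmed)
        if self.pending:
            parts.append(f"[{self.pending}]")
        return " ".join(parts)


view = Transcript()
for payload in [
    {"type": "partial", "text": "今天"},
    {"type": "partial", "text": "今天天气怎么样"},
    {"type": "final", "text": "今天天气怎么样？"},
]:
    view.apply(payload)
    print(view.render())
```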

### 3. Calling llm_chat

```
final event arrives
    ↓
Extract data.text
    ↓
Call llm_chat's POST /chat/stream
    body: {
        "message": data.text,      ← the final result from ASR
        "thread_id": session_id    ← reuse the same ID
    }
    ↓
Receive llm_chat's SSE stream → show the AI reply
```

---

## Key Points

### ✅ DO

1. **Send only final results to llm_chat**
   ```javascript
   if (data.type === 'final') {
       sendToLLM(data.text);  // call llm_chat only at this point
   }
   ```

2. **Use the same session_id**
   - The ASR `session_id` is the llm_chat `thread_id`
   - This makes the whole conversation easy to trace

3. **Distinguish display states**
   - `partial`: gray or dimmed (live feedback)
   - `final`: black or bold (confirmed)

### ❌ DON'T

1. **Do not send partial results to llm_chat**
   ```javascript
   // ❌ Wrong
   if (data.type === 'partial') {
       sendToLLM(data.text);  // don't do this!
   }
   ```

2. **Do not ignore the final event**
   - partial is only work in progress; final is the result
   - The user's complete input is known only once final arrives

---

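## Data Flow

```
User speech
    ↓
ASR service
    ↓
partial event (live result, no punctuation) → frontend display (gray)
    ↓
final event (final result, with punctuation) → frontend display (black)
    ↓
Extract final.text
    ↓
llm_chat service
    ↓
AI reply (streamed)
```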

---

## Event Handling Example

```javascript
// showRealtimeText, showFinalText, callLLMChat, and button are provided by the application.

// 1. Open the SSE connection
const sessionId = crypto.randomUUID();
const eventSource = new EventSource(`/stream/recognize/${sessionId}`);

// 2. Listen for message events
eventSource.addEventListener('message', (event) => {
    const data = JSON.parse(event.data);

    if (data.type === 'partial') {
        // Live display (gray)
        showRealtimeText(data.text);
    }
    else if (data.type === 'final') {
        // Show the final result (black)
        showFinalText(data.text);

        // ✅ Call llm_chat
        callLLMChat({
            message: data.text,
            thread_id: sessionId
        });
    }
});

// 3. Recording control
button.addEventListener('mousedown', () => {
    fetch(`/recording/start/${sessionId}`, { method: 'POST' });
});

button.addEventListener('mouseup', () => {
    fetch('/recording/stop', { method: 'POST' });
});
```

---

## Integration with llm_chat

### ASR Output (final)
```json
{
  "type": "final",
  "text": "今天天气怎么样？",
  "timestamp": "2025-12-01T12:00:00",
  "session_id": "abc-123"
}
```

### Request to llm_chat

`message` is the `text` of the ASR `final` event, and `thread_id` is the same `session_id`.

```http
POST /chat/stream
Content-Type: application/json

{
  "message": "今天天气怎么样？",
  "thread_id": "abc-123"
}
```
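
The mapping from an ASR event to an llm_chat request body can be written as one small function. This is a sketch, assuming the event payload has the fields shown above; `build_chat_payload` is a hypothetical name, and skipping empty transcripts is an added assumption rather than documented behavior.

```python
import json
from typing import Optional


def build_chat_payload(event: dict) -> Optional[dict]:
    """Return the llm_chat request body for a final event, or None otherwise."""
    if event.get("type") != "final":
        return None                      # partial results are never forwarded
    text = event.get("text", "").strip()
    if not text:
        return None                      # assumption: nothing to send for empty text
    return {"message": text, "thread_id": event["session_id"]}


final_event = {
    "type": "final",
    "text": "今天天气怎么样？",
    "timestamp": "2025-12-01T12:00:00",
    "session_id": "abc-123",
}
payload = build_chat_payload(final_event)
print(json.dumps(payload, ensure_ascii=False))
```

Returning `None` for anything that is not a usable final result keeps the "only send final" rule in one place instead of repeating the check at every call site.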

---

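## Core Modules

### `service.py`
The core speech recognition service. It contains:
- **`RealtimeSpeechRecognizer`**: real-time speech recognizer
  - Loads the ASR and punctuation models in parallel
  - Supports streaming recognition and punctuation
- **`AudioRecorder`**: audio recorder (singleton)
  - Avoids microphone resource conflicts
  - Supports real-time audio stream processing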
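
To illustrate the singleton idea, here is a minimal sketch of a recorder that can exist only once per process. It is not the `AudioRecorder` from `service.py`: the class name `SingletonRecorder`, its methods, and the choice to raise on a second `start` are all assumptions, and no audio is captured.

```python
import threading


class SingletonRecorder:
    """Process-wide recorder stand-in: every construction returns the same object."""
    _instance = None
    _lock = threading.Lock()

    def __new__(cls):
        with cls._lock:                       # guard against concurrent first use
            if cls._instance is None:
                instance = super().__new__(cls)
                instance.session_id = None
                instance.recording = False
                cls._instance = instance
        return cls._instance

    def start(self, session_id: str) -> None:
        with self._lock:
            if self.recording:
                # Assumption: a second start is rejected while one is active.
                raise RuntimeError("recorder is already in use")
            self.recording, self.session_id = True, session_id

    def stop(self):
        """Stop recording and return the session that was active, if any."""
        with self._lock:
            session_id, self.session_id = self.session_id, None
            self.recording = False
            return session_id


a, b = SingletonRecorder(), SingletonRecorder()
a.start("abc-123")
print(a is b, b.recording, b.stop())
```

A single shared recorder also fits the shape of `POST /recording/stop`, which takes no `session_id`.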

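### `api/routes/routes.py`
API route definitions. It provides:
- The SSE streaming recognition endpoint
- The recording control endpoints (start/stop)

### `main.py`
Service entry point; initializes the FastAPI application.

### `client.py`
Test client example showing how to use the speech recognition service.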

---

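## Tech Stack

- **FastAPI**: web framework
- **FunASR**: speech recognition engine (Paraformer-large + CT-Punc)
- **jieba-fast**: fast Chinese word segmentation (replaces jieba to speed up startup)
- **SoundDevice**: audio capture
- **NumPy**: numerical computing
- **Pydantic**: data validation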

---

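## Performance Optimizations

- ✅ `jieba-fast` replaces `jieba`, making word segmentation 3-5x faster
- ✅ The ASR and punctuation models load in parallel, cutting startup time by about 50%
- ✅ Final results are processed in a thread pool so the main thread is not blocked
- ✅ The recorder is a singleton, avoiding wasted resources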
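
Parallel loading can be sketched with a thread pool. The two loader functions below are placeholders that sleep instead of loading FunASR models, so the names and timings are illustrative only; the actual loading code lives in `service.py` and may differ.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def load_asr_model() -> str:
    time.sleep(0.2)            # placeholder for loading the ASR model
    return "asr-model"


def load_punc_model() -> str:
    time.sleep(0.2)            # placeholder for loading the punctuation model
    return "punc-model"


def load_models_parallel() -> tuple:
    """Start both loads at once and wait for both results."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        asr_future = pool.submit(load_asr_model)
        punc_future = pool.submit(load_punc_model)
        return asr_future.result(), punc_future.result()


started = time.perf_counter()
asr, punc = load_models_parallel()
elapsed = time.perf_counter() - started
print(f"loaded {asr} and {punc} in {elapsed:.2f}s")
```

With two loads of similar duration, running them side by side takes roughly as long as the slower one, which is consistent with the startup reduction of about 50% stated above. How much real model loading benefits depends on how much of it is I/O or native code that releases the GIL.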

---

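## Testing

### Backend Testing
```bash
# Run the test client
python client.py
```

### Frontend Test Flow
1. Hold the button and speak
2. Watch the live text update (partial)
3. Release the button
4. See the complete result (final)
5. Confirm that llm_chat received final.text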

---

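## FAQ

**Q: When should llm_chat be called?**  
A: When a message event with `type: "final"` arrives.

**Q: What is partial for?**  
A: It gives the user live feedback that recognition is in progress. It is not sent to llm_chat.

**Q: How are multiple recordings in one conversation handled?**  
A: 
- First recording → final1 → sent to llm_chat (thread_id: xxx)
- Second recording → final2 → sent to llm_chat (thread_id: xxx)
- With the same thread_id, llm_chat keeps the conversation context

**Q: How do I know the user has finished speaking?**  
A: A `type: "final"` event means the current sentence is complete.

**Q: Why does the service start so quickly?**  
A: It uses jieba-fast and loads the models in parallel.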

---

## API Documentation

After starting the service, the API documentation is available at:
- **Swagger UI**: `http://localhost:30703/docs`
- **ReDoc**: `http://localhost:30703/redoc`
- **OpenAPI JSON**: `doc/openapi.json` (file in the repository)
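
The OpenAPI file can also be inspected from a script. The sketch below lists the operations in a spec dictionary by reading its top-level `paths` object. The `example_spec` here is a hand-written stand-in built from the endpoints in this README, not the actual content of `doc/openapi.json`, and `list_operations` is a hypothetical helper.

```python
import json
from pathlib import Path

HTTP_METHODS = {"get", "post", "put", "patch", "delete", "head", "options"}


def list_operations(spec: dict) -> list:
    """Return 'METHOD /path' strings for every operation in an OpenAPI spec."""
    operations = []
    for path, item in sorted(spec.get("paths", {}).items()):
        for key in item:
            if key in HTTP_METHODS:          # skip non-method keys such as "parameters"
                operations.append(f"{key.upper()} {path}")
    return operations


# Stand-in spec containing only the endpoints documented in this README.
example_spec = {
    "paths": {
        "/stream/recognize/{session_id}": {"get": {}},
        "/recording/start/{session_id}": {"post": {}},
        "/recording/stop": {"post": {}},
    }
}
for operation in list_operations(example_spec):
    print(operation)

# With the repository checked out:
#   spec = json.loads(Path("doc/openapi.json").read_text(encoding="utf-8"))
#   print(list_operations(spec))
```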

---

## Dependencies

See `pyproject.toml` for the full list.

Main dependencies:
- `fastapi >= 0.120.4`
- `funasr[all] >= 1.2.7`
- `jieba-fast >= 0.53`
- `numpy >= 2.3.4`
- `sounddevice >= 0.5.1`

---

## Support

- **ASR module issues**: contact the backend team
- **llm_chat integration**: see the llm_chat README
- **Bug reports and feedback**: open an issue or contact the development team

---

## License

Apache License 2.0

Copyright (c) 2025 Unique9

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.