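Notes on Adapting the ESP32-S3 to the Bailian (百炼) Large Model
API message reference documentation
Documentation link: message reference documentation
Call log
Below is the log captured during debugging: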
[00010636][UT][I][app_cmd_init]done
[00010640][UT][W][_mmi_event_callback]event 0
[00010645][UT][D][c_mmi_data_set_prompt_param]prompt_param: {"name":"阿云","skill":"唱歌"}
[00010654][UT][I][c_mmi_ringbuffer_init]rb_p[0x3c155a78] rb_rb[0x3c157a90] write_mode[1] list_flag[0]
[00010664][UT][I][c_mmi_ringbuffer_init]rb_p[0x3c157acc] rb_rb[0x3c159ae4] write_mode[0] list_flag[1]
[00010673][UT][I][_mmi_set_ringbuffer]rb recorder[0x3c155a78] player[0x3c157acc]
[00010681][UT][I][c_mmi_init]header [Authorization: Bearer sk-5eee8c7fa0dc463d9632ed6b2b821850
]
[00010691][UT][I][c_mmi_init]ws_id[llm-z2uhjku27i6thqe3] app_id[mm_6ea692550a8045e498146ab50b2c]
[00010700][UT][W][_mmi_event_callback]event 1
[00010705][UT][I][util_malloc]size 245760
[00010709][UT][I][_websocket_init]mbedtls init
[00010721][UT][I][_websocket_init]done
[00010722][UT][I][c_websocket_task_init][ws_send_mmi] send task [0x3c19a35c]
[00010728][UT][I][c_websocket_task_init][ws_recv_mmi] recv task [0x3c19a37c]
[00012941][UT][I][_websocket_connect]wss update
[00012942][UT][I][_websocket_connect]request[239][GET /api-ws/v1/inference HTTP/1.1
Host: dashscope.aliyuncs.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: 1B03f+kP9Sy38j4qTwgjRg==
Sec-WebSocket-Version: 13
Authorization: Bearer sk-5eee8c7fa0dc463d9632ed6b2b821850
]
[00012964][UT][I][_websocket_connect]Reading response...
[00013246][UT][I][_websocket_connect]response[mmi][187][HTTP/1.1 101 Switching Protocols
upgrade: websocket
connection: upgrade
sec-websocket-accept: 4e1EbQwcfW9goaAPN/YZ1NpAxMY=
date: Fri, 09 Jan 2026 02:26:52 GMT
server: istio-envoy
]
[00013259][UT][I][_websocket_connect][mmi]done
[00013263][UT][I][_gen_cmd_start]task_id [ce3e4c16af0b3375539acb058d1a475d]
[00013273][UT][I][_gen_cmd_start]out [884][{"header":{"action":"run-task","streaming":"duplex","task_id":"ce3e4c16af0b3375539acb058d1a475d"},"payload":{"task_group":"aigc","task":"multimodal-generation","function":"generation","model":"multimodal-dialog","input":{"workspace_id":"llm-z2uhjku27i6thqe3","app_id":"mm_6ea692550a8045e498146ab50b2c","directive":"Start"},"parameters":{"upstream":{"sample_rate":16000,"type":"AudioOnly","mode":"tap2talk","audio_format":"pcm","enable_server_vad_start":false},"downstream":{"voice":"longanhuan","sample_rate":16000,"audio_format":"mp3","volume":50,"speech_rate":100,"pitch_rate":100,"intermediate_text":"transcript,dialog","transmit_rate_limit":16000},"client_info":{"user_id":"lichuang_dev_s3","device":{"uuid":"lichuang_dev_s3"}},"biz_params":{"user_prompt_params":{"name":"阿云","skill":"唱歌"},"user_defined_params":{"children_story":{"speaker_1_voice":"longxiaochun_v2"}}}}}}]
[00013352][UT][I][_send_cmd_start]send [run-task] [0-2] [0]
[00014143][UT][D][c_mmi_analyze_recv_data]recv[109][{"header":{"task_id":"ce3e4c16af0b3375539acb058d1a475d","event":"task-started","attributes":{}},"payload":{}}]
[00014148][UT][I][c_mmi_analyze_recv_data]recv [task-started] [0-2]
[00014156][UT][D][c_mmi_analyze_recv_data]recv[192][{"header":{"task_id":"ce3e4c16af0b3375539acb058d1a475d","event":"result-generated","attributes":{}},"payload":{"output":{"event":"Started","dialog_id":"026a36a7-f23c-494a-9d0a-6b10f7fe8176"}}}]
[00014177][UT][I][_on_payload_event_start]get dialog_id[026a36a7-f23c-494a-9d0a-6b10f7fe8176]
[00014186][UT][I][c_mmi_storage_set_dialog_id]dialog_id [026a36a7-f23c-494a-9d0a-6b10f7fe8176]
[00014614][UT][I][c_mmi_storage_save]ver[0x00010100][1.1.0] done
[00014615][UT][I][_on_payload_event_start]recv [Started] [0-3]
[00014616][UT][W][_mmi_event_callback]event 2
[00014624][UT][D][c_mmi_analyze_recv_data]recv[223][{"header":{"task_id":"ce3e4c16af0b3375539acb058d1a475d","event":"result-generated","attributes":{}},"payload":{"output":{"event":"DialogStateChanged","state":"Listening","dialog_id":"026a36a7-f23c-494a-9d0a-6b10f7fe8176"}}}]
[00014646][UT][W][_mmi_event_callback]event 3
[00014651][UT][D][_mmi_event_callback]speech prepare, work 0
[00014657][UT][I][c_mmi_speech_start]done [1-3] [4]
[00014662][UT][I][_on_payload_event_state_change]recv [Listening] [1-4]
[00014671][UT][I][_send_cmd_req2spk]ready to send [1-4]
[00014682][UT][D][_gen_cmd_common]out [201][{"header":{"action":"continue-task","streaming":"duplex","task_id":"ce3e4c16af0b3375539acb058d1a475d"},"payload":{"input":{"directive":"SendSpeech","dialog_id":"026a36a7-f23c-494a-9d0a-6b10f7fe8176"}}}]
[00014698][UT][I][_send_cmd_speech]send [SendSpeech] [1-5] [0]
[00014704][UT][W][_mmi_event_callback]event 4
[00014709][UT][D][_mmi_event_callback]************************ enable recorder when send speech ************************
[00014720][UT][I][c_audio_queue_register]id [2]
[00014725][UT][I][_player_register]_audio_data_info.queue_id = [2]
[00014734][UT][I][c_player_file_mp3_start]file size = [3200]
[00014756][UT][I][_decoder_task_entry]decoder switch [0/16/0]
[00014757][UT][D][_player_data_get]play progress [100.00]% (3200/3200 bytes)
[00014759][UT][I][c_decoder_update_param_mp3]format [br/sr/ch][32/16000/1] [576/144]
[00014768][UT][I][_player_init]frame [0x3c19a3fc/640]
[00014773][UT][I][_task_create_ext]stack[hal_player] info 0x3fcd7ac8 0x3c19a690
[00015071][UT][D][c_mmi_analyze_recv_data]recv[244][{"header":{"task_id":"ce3e4c16af0b3375539acb058d1a475d","event":"result-generated","attributes":{}},"payload":{"output":{"event":"SpeechStarted","dialog_id":"026a36a7-f23c-494a-9d0a-6b10f7fe8176","round_id":"9364aae897fc4e1cb0014ca1b0bde0ff"}}}]
[00015088][UT][I][_on_payload_event_speech_start]recv [SpeechStarted][ASR Start] [1-5]
[00015096][UT][W][_mmi_event_callback]event 7
[00015101][UT][I][_mmi_event_callback]************************ C_MMI_EVENT_ASR_START ************************
E (20964) task_wdt: Task watchdog got triggered. The following tasks/users did not reset the watchdog in time:
E (20964) task_wdt: - IDLE0 (CPU 0)
E (20964) task_wdt: Tasks currently running:
E (20964) task_wdt: CPU 0: hal_recorder
E (20964) task_wdt: CPU 1: hal_player
E (20964) task_wdt: Print CPU 0 (current core) backtrace
Backtrace: 0x4205292E:0x3FCA0860 0x42052D44:0x3FCA0880 0x40377CA9:0x3FCA08B0 0x400559DD:0x3C14AE40 0x40380672:0x3C14AE50 0x403794EA:0x3C14AE70 0x40379511:0x3C14AE90 0x403824FE:0x3C14AEB0 0x4200EB00:0x3C14AED0 0x42011D92:0x3C14AEF0 0x40380311:0x3C14AF30
Analysis of the debug messages
Authentication (the handshake)
Connection handshake with the server
[00012941][UT][I][_websocket_connect]wss update
[00012942][UT][I][_websocket_connect]request[239][GET /api-ws/v1/inference HTTP/1.1
Host: dashscope.aliyuncs.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: 1B03f+kP9Sy38j4qTwgjRg==
Sec-WebSocket-Version: 13
Authorization: Bearer sk-5eee8c7fa0dc463d9632ed6b2b821850
]
[00012964][UT][I][_websocket_connect]Reading response...
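The request above is a standard WebSocket upgrade (RFC 6455) with an extra Authorization header, and the 101 response in the call log carries a sec-websocket-accept value with lower-case header names. As a rough host-side sketch for checking such a handshake, the snippet below rebuilds the request and validates the response. The function names are hypothetical and the API key is a placeholder; only the header layout is taken from the log.

```python
import base64
import hashlib
import os

# Fixed GUID defined by RFC 6455 for computing Sec-WebSocket-Accept.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def make_ws_key() -> str:
    """Random 16-byte nonce, base64 encoded, for Sec-WebSocket-Key."""
    return base64.b64encode(os.urandom(16)).decode("ascii")


def expected_accept(ws_key: str) -> str:
    """Value the server must return in Sec-WebSocket-Accept for ws_key."""
    digest = hashlib.sha1((ws_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def build_upgrade_request(host: str, path: str, api_key: str, ws_key: str) -> bytes:
    """Build the upgrade request with the same header order as the log."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Upgrade: websocket",
        "Connection: Upgrade",
        f"Sec-WebSocket-Key: {ws_key}",
        "Sec-WebSocket-Version: 13",
        f"Authorization: Bearer {api_key}",
        "",
        "",  # the two empty items produce the terminating blank line
    ]
    return "\r\n".join(lines).encode("ascii")


def handshake_ok(response: str, ws_key: str) -> bool:
    """Check for status 101 and a matching accept value (names compared case-insensitively)."""
    head, _, _ = response.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    parts = status_line.split()
    if len(parts) < 2 or parts[1] != "101":
        return False
    headers = {}
    for line in header_lines:
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    return headers.get("sec-websocket-accept") == expected_accept(ws_key)


if __name__ == "__main__":
    key = make_ws_key()
    # "sk-PLACEHOLDER" is not a real key; substitute your own.
    req = build_upgrade_request("dashscope.aliyuncs.com", "/api-ws/v1/inference",
                                "sk-PLACEHOLDER", key)
    print(req.decode("ascii"))
```

On the device the same check can be applied to the Sec-WebSocket-Key and sec-websocket-accept pair printed by _websocket_connect.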

Dialog configuration for the large model
{
  "header": {
    "action": "run-task",  // message type: start a task
    "streaming": "duplex",  // streaming type: full duplex
    "task_id": "ce3e4c16af0b3375539acb058d1a475d"  // unique task identifier
  },
  "payload": {
    "task_group": "aigc",  // task group name: always aigc
    "task": "multimodal-generation",  // task type: multimodal generation
    "function": "generation",  // function to call: generation
    "model": "multimodal-dialog",  // model name: multimodal dialog
    "input": {
      "workspace_id": "llm-z2uhjku27i6thqe3",  // workspace ID
      "app_id": "mm_6ea692550a8045e498146ab50b2c",  // application ID
      "directive": "Start"  // directive: start the session
    },
    "parameters": {
      "upstream": {  // upstream parameters (client to server)
        "sample_rate": 16000,  // speech recognition sample rate: 16 kHz
        "type": "AudioOnly",  // upstream type: audio only
        "mode": "tap2talk",  // interaction mode: tap to talk
        "audio_format": "pcm",  // audio format: PCM
        "enable_server_vad_start": false  // server-side VAD start-of-speech detection: off
      },
      "downstream": {  // downstream parameters (server to client)
        "voice": "longanhuan",  // TTS voice: longanhuan (龙安欢)
        "sample_rate": 16000,  // synthesized speech sample rate: 16 kHz
        "audio_format": "mp3",  // audio format: MP3
        "volume": 50,  // volume: 50%
        "speech_rate": 100,  // speech rate: 100% (normal)
        "pitch_rate": 100,  // pitch: 100% (normal)
        "intermediate_text": "transcript,dialog",  // intermediate text to return: speech recognition results and dialog results
        "transmit_rate_limit": 16000  // downstream audio rate limit: 16,000 bytes per second
      },
      "client_info": {  // client information
        "user_id": "lichuang_dev_s3",  // end-user ID
        "device": {
          "uuid": "lichuang_dev_s3"  // unique device identifier
        }
      },
      "biz_params": {  // business parameters
        "user_prompt_params": {  // user-defined prompt variables
          "name": "阿云",  // custom variable: name
          "skill": "唱歌"  // custom variable: skill
        },
        "user_defined_params": {  // parameters passed through to the agent
          "children_story": {
            "speaker_1_voice": "longxiaochun_v2"  // voice of the first speaker in the children's story
          }
        }
      }
    }
  }
}
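As an illustration of how a client could assemble the run-task message above, here is a minimal sketch. The field names and values mirror the logged message; the helper name build_run_task and the use of uuid4().hex for task_id are assumptions (the log only shows that task_id is a 32-character hex string), and the optional user_defined_params block is left out for brevity.

```python
import json
import uuid
from typing import Dict, Optional, Tuple


def build_run_task(workspace_id: str, app_id: str, user_id: str, device_uuid: str,
                   prompt_params: Optional[Dict[str, str]] = None) -> Tuple[str, str]:
    """Return (task_id, compact JSON text) for a run-task / Start message."""
    # Assumption: any 32-character hex string is acceptable as a task_id.
    task_id = uuid.uuid4().hex
    message = {
        "header": {"action": "run-task", "streaming": "duplex", "task_id": task_id},
        "payload": {
            "task_group": "aigc",
            "task": "multimodal-generation",
            "function": "generation",
            "model": "multimodal-dialog",
            "input": {"workspace_id": workspace_id, "app_id": app_id,
                      "directive": "Start"},
            "parameters": {
                "upstream": {"sample_rate": 16000, "type": "AudioOnly",
                             "mode": "tap2talk", "audio_format": "pcm",
                             "enable_server_vad_start": False},
                "downstream": {"voice": "longanhuan", "sample_rate": 16000,
                               "audio_format": "mp3", "volume": 50,
                               "speech_rate": 100, "pitch_rate": 100,
                               "intermediate_text": "transcript,dialog",
                               "transmit_rate_limit": 16000},
                "client_info": {"user_id": user_id,
                                "device": {"uuid": device_uuid}},
                "biz_params": {"user_prompt_params": dict(prompt_params or {})},
            },
        },
    }
    # ensure_ascii=False keeps values such as "阿云" as UTF-8 text, as in the log.
    return task_id, json.dumps(message, ensure_ascii=False, separators=(",", ":"))


if __name__ == "__main__":
    tid, text = build_run_task("llm-xxxx", "mm_xxxx", "dev_user", "dev_uuid",
                               {"name": "阿云", "skill": "唱歌"})
    print(tid, len(text.encode("utf-8")))
```

The returned task_id has to be kept, because every later continue-task message must carry the same value.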
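The server acknowledges the task
After run-task is sent, the server replies with a task-started event (the recv[109] entry in the call log). This is a typical task-start acknowledgement: it tells the client that the task has been accepted and processing has begun.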
Starting the dialog task
{
  "header": {
    "task_id": "ce3e4c16af0b3375539acb058d1a475d",  // unique task identifier
    "event": "result-generated",  // event type: result generated
    "attributes": {}  // event attributes: none
  },
  "payload": {
    "output": {  // output content
      "event": "Started",  // event status: started
      "dialog_id": "026a36a7-f23c-494a-9d0a-6b10f7fe8176"  // dialog ID, used to associate subsequent messages
    }
  }
}
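According to the Alibaba Cloud Bailian multimodal interaction protocol, this response indicates that the dialog has started and returns a dialog ID (dialog_id), which the client must include in subsequent messages.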
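The two-level event structure seen so far (header.event for the task, payload.output.event for the dialog) can be unpacked with a small helper. This is only a sketch: parse_event and extract_dialog_id are hypothetical names, and the sample strings are the task-started and Started messages copied from the log.

```python
import json
from typing import Optional, Tuple


def parse_event(raw: str) -> Tuple[str, Optional[str], dict]:
    """Split a server message into (header event, output event, output dict)."""
    msg = json.loads(raw)
    header = msg.get("header", {})
    output = msg.get("payload", {}).get("output", {})
    return header.get("event", ""), output.get("event"), output


def extract_dialog_id(raw: str) -> Optional[str]:
    """Return dialog_id if raw is a result-generated / Started message, else None."""
    header_event, output_event, output = parse_event(raw)
    if header_event == "result-generated" and output_event == "Started":
        return output.get("dialog_id")
    return None


if __name__ == "__main__":
    task_started = ('{"header":{"task_id":"ce3e4c16af0b3375539acb058d1a475d",'
                    '"event":"task-started","attributes":{}},"payload":{}}')
    started = ('{"header":{"task_id":"ce3e4c16af0b3375539acb058d1a475d",'
               '"event":"result-generated","attributes":{}},"payload":{"output":'
               '{"event":"Started","dialog_id":"026a36a7-f23c-494a-9d0a-6b10f7fe8176"}}}')
    print(parse_event(task_started)[0])
    print(extract_dialog_id(started))
```

In the firmware log, the corresponding step is c_mmi_storage_set_dialog_id, which stores the ID right after the Started event arrives.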
Listening for speech
{
  "header": {
    "task_id": "ce3e4c16af0b3375539acb058d1a475d",  // unique task identifier
    "event": "result-generated",  // event type: result generated
    "attributes": {}  // event attributes: none
  },
  "payload": {
    "output": {  // output content
      "event": "DialogStateChanged",  // event type: dialog state changed
      "state": "Listening",  // new dialog state: listening
      "dialog_id": "026a36a7-f23c-494a-9d0a-6b10f7fe8176"  // dialog ID
    }
  }
}
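According to the Alibaba Cloud multimodal interaction protocol, the DialogStateChanged event indicates that the dialog state has changed. The state is now Listening, which means:
The server is ready to receive user input.
In a voice interaction scenario, the user's speech input can now be accepted.
The UI can show a matching status hint (for example, a blinking microphone icon).
This is a state synchronization message: it tells the client where the dialog currently stands so that it can update the UI and control the interaction accordingly.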
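A client typically reacts to this message by recording the new state and deciding what to do next. The sketch below handles only the Listening state that appears in the log; the class name DialogSession and the action strings "start_capture" and "ignore" are hypothetical placeholders for whatever the application does (the firmware log shows it preparing speech and enabling the recorder at this point).

```python
import json
from typing import Optional


class DialogSession:
    """Tracks dialog_id and the last dialog state reported by the server."""

    def __init__(self) -> None:
        self.dialog_id: Optional[str] = None
        self.state: Optional[str] = None

    def on_message(self, raw: str) -> Optional[str]:
        """Return an action name for DialogStateChanged messages, else None."""
        msg = json.loads(raw)
        if msg.get("header", {}).get("event") != "result-generated":
            return None
        output = msg.get("payload", {}).get("output", {})
        if output.get("dialog_id"):
            self.dialog_id = output["dialog_id"]
        if output.get("event") != "DialogStateChanged":
            return None
        self.state = output.get("state")
        if self.state == "Listening":
            # Server is ready for input: start the recorder, update the UI.
            return "start_capture"
        # States other than Listening are not covered by this log.
        return "ignore"


if __name__ == "__main__":
    listening = ('{"header":{"task_id":"ce3e4c16af0b3375539acb058d1a475d",'
                 '"event":"result-generated","attributes":{}},"payload":{"output":'
                 '{"event":"DialogStateChanged","state":"Listening",'
                 '"dialog_id":"026a36a7-f23c-494a-9d0a-6b10f7fe8176"}}}')
    session = DialogSession()
    print(session.on_message(listening), session.state)
```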
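Telling the server to get ready for speech data
{
  "header": {
    "action": "continue-task",  // action type: continue the task (send a follow-up directive within an established task)
    "streaming": "duplex",  // streaming type: full-duplex communication
    "task_id": "ce3e4c16af0b3375539acb058d1a475d"  // unique task identifier; must match the one used in run-task
  },
  "payload": {
    "input": {
      "directive": "SendSpeech",  // directive: tells the server that the client is about to start sending the speech stream
      "dialog_id": "026a36a7-f23c-494a-9d0a-6b10f7fe8176"  // unique dialog identifier, assigned by the server in the Started event and used to associate the message with the same dialog
    }
  }
}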
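This is a continue-task message in the Alibaba Cloud Bailian multimodal interaction protocol; it advances the task over the already established WebSocket connection. When upstream.mode is set to push2talk, the client sends the SendSpeech directive to tell the server to get ready for audio, and should start uploading speech data immediately. Note that the run-task message in this session sets upstream.mode to tap2talk with enable_server_vad_start set to false, and the firmware still sends SendSpeech as soon as it receives the Listening state.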
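To make the step concrete, the sketch below builds the SendSpeech message and splits captured PCM into upload-sized chunks. The key order matches the logged message, so with the logged IDs the output is the same 201-byte string. The chunking part rests on assumptions not stated in the log: 16-bit mono samples (the log only gives 16 kHz PCM) and 100 ms per chunk, which works out to 16000 × 2 × 0.1 = 3200 bytes. Function names are hypothetical.

```python
import json
from typing import Iterator


def build_send_speech(task_id: str, dialog_id: str) -> str:
    """Build the continue-task / SendSpeech message as compact JSON."""
    message = {
        "header": {"action": "continue-task", "streaming": "duplex",
                   "task_id": task_id},
        "payload": {"input": {"directive": "SendSpeech",
                              "dialog_id": dialog_id}},
    }
    return json.dumps(message, separators=(",", ":"))


def pcm_chunks(pcm: bytes, sample_rate: int = 16000, bytes_per_sample: int = 2,
               channels: int = 1, chunk_ms: int = 100) -> Iterator[bytes]:
    """Yield consecutive chunks of chunk_ms milliseconds; the last may be shorter.

    Assumption: 16-bit mono PCM; adjust bytes_per_sample/channels otherwise.
    """
    chunk_size = sample_rate * bytes_per_sample * channels * chunk_ms // 1000
    for offset in range(0, len(pcm), chunk_size):
        yield pcm[offset:offset + chunk_size]


if __name__ == "__main__":
    text = build_send_speech("ce3e4c16af0b3375539acb058d1a475d",
                             "026a36a7-f23c-494a-9d0a-6b10f7fe8176")
    print(len(text))
    one_second_of_silence = bytes(16000 * 2)
    print([len(c) for c in pcm_chunks(one_second_of_silence)])
```

How the audio chunks are framed on the wire is not shown in the log, so the sketch stops at producing the byte chunks.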
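The server detects the start of speech (ASR)
{
  "header": {
    "task_id": "ce3e4c16af0b3375539acb058d1a475d",  // unique task identifier, same as in the start request
    "event": "result-generated",  // message event type: result generated
    "attributes": {}  // extended attributes; an empty object here
  },
  "payload": {
    "output": {
      "event": "SpeechStarted",  // specific event: speech recognition has started
      "dialog_id": "026a36a7-f23c-494a-9d0a-6b10f7fe8176",  // dialog ID, ties the message to a specific session
      "round_id": "9364aae897fc4e1cb0014ca1b0bde0ff"  // unique identifier of the current speech round
    }
  }
}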
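This is the server's response to the client's SendSpeech directive, indicating that speech recognition has started. round_id identifies the current round of speech input; the subsequent intermediate and final results carry the same identifier, so that recognition results belonging to the same round can be correlated.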
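One way to use round_id on the client is to group incoming events by round. The following is a minimal sketch: RoundTracker is a hypothetical name, and because the log ends before any recognition results arrive, the follow-up event in the usage example ("HypotheticalResult") is made up purely to show the grouping.

```python
import json
from collections import defaultdict
from typing import Dict, List, Optional


class RoundTracker:
    """Groups server output events by the round_id they carry."""

    def __init__(self) -> None:
        self.current_round: Optional[str] = None
        self.events_by_round: Dict[str, List[str]] = defaultdict(list)

    def feed(self, raw: str) -> Optional[str]:
        """Record one server message; return its round_id, or None if absent."""
        output = json.loads(raw).get("payload", {}).get("output", {})
        round_id = output.get("round_id")
        event = output.get("event")
        if not round_id:
            return None
        if event == "SpeechStarted":
            # A new speech round begins with SpeechStarted.
            self.current_round = round_id
        self.events_by_round[round_id].append(event)
        return round_id


if __name__ == "__main__":
    speech_started = ('{"header":{"task_id":"ce3e4c16af0b3375539acb058d1a475d",'
                      '"event":"result-generated","attributes":{}},"payload":{"output":'
                      '{"event":"SpeechStarted",'
                      '"dialog_id":"026a36a7-f23c-494a-9d0a-6b10f7fe8176",'
                      '"round_id":"9364aae897fc4e1cb0014ca1b0bde0ff"}}}')
    # Hypothetical follow-up message, not taken from the log.
    follow_up = speech_started.replace("SpeechStarted", "HypotheticalResult")
    tracker = RoundTracker()
    tracker.feed(speech_started)
    tracker.feed(follow_up)
    print(tracker.current_round, dict(tracker.events_by_round))
```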
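FAQ
If running make esp32s3 in the qwenSDK directory does nothing, the most likely cause is a stale or incorrect PLATFORM environment variable. Unset it first (for example, with unset PLATFORM), then run make esp32s3 again.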

