Streaming STT

This guide explains how to implement streaming speech-to-text (STT). Streaming is available over two protocols, gRPC and WebSocket; for protocol-specific details, see Streaming STT - gRPC and Streaming STT - WebSocket. For file-based transcription, see Batch STT.

caution

The number of concurrent Streaming STT channels is limited. See Rate limit.

Supported encodings

LINEAR16, FLAC, MULAW, ALAW, AMR, AMR_WB, OGG_OPUS, OPUS.

Name	Type (gRPC / WebSocket)	Description	Required	Default
sample_rate	int	Sample rate in Hz, from 8000 to 48000	Yes	-
encoding	AudioEncoding / string	See supported encodings	Yes	-
model_name	string	sommers_ko (Korean), sommers_ja (Japanese), whisper (multilingual)	No	sommers_ko
domain	string	See Domain	No	CALL
use_itn	bool	See ITN	No	true
use_disfluency_filter	bool	See Disfluency	No	false
use_profanity_filter	bool	See Profanity	No	false
use_punctuation	bool	Use punctuation	No	false
keywords	string[] / string	See Keyword boosting	No	-
language	string	Language code; required when model_name is whisper (see the supported language list)	No	ko
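
The constraints in the table above can be checked on the client before a stream is opened. The following is a minimal sketch, not part of the service's API: the helper name build_stream_config and the choice to raise ValueError are assumptions made for illustration, and only the parameter names, defaults, and ranges come from the table.

```python
# Encodings listed under "Supported encodings".
SUPPORTED_ENCODINGS = {
    "LINEAR16", "FLAC", "MULAW", "ALAW", "AMR", "AMR_WB", "OGG_OPUS", "OPUS",
}

# Defaults copied from the parameter table. "keywords" has no default, and
# "language" is left to the service default unless the caller sets it.
DEFAULTS = {
    "model_name": "sommers_ko",
    "domain": "CALL",
    "use_itn": True,
    "use_disfluency_filter": False,
    "use_profanity_filter": False,
    "use_punctuation": False,
}

OPTIONAL_KEYS = set(DEFAULTS) | {"keywords", "language"}


def build_stream_config(sample_rate, encoding, **options):
    """Hypothetical helper: validate parameters and return them as a dict."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(sample_rate, bool) or not isinstance(sample_rate, int):
        raise ValueError("sample_rate must be an int")
    if not 8000 <= sample_rate <= 48000:
        raise ValueError("sample_rate must be between 8000 and 48000 Hz")
    if encoding not in SUPPORTED_ENCODINGS:
        raise ValueError(f"unsupported encoding: {encoding}")

    unknown = set(options) - OPTIONAL_KEYS
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")

    config = {"sample_rate": sample_rate, "encoding": encoding}
    config.update(DEFAULTS)
    config.update(options)

    # The table marks language as required for the whisper model.
    if config["model_name"] == "whisper" and "language" not in options:
        raise ValueError("language must be set when model_name is whisper")
    return config


if __name__ == "__main__":
    print(build_stream_config(16000, "LINEAR16", use_punctuation=True))
```

How the resulting values are sent (a gRPC message or WebSocket connection parameters) depends on the protocol and is covered in the protocol-specific guides.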

Keyword boosting

Boost or suppress recognition for specific words.

caution

This feature is available only for the sommers_ko model.

Each keyword can be a word on its own or a word with a score appended after a colon (word:score), as in the following examples:

// gRPC
["부스팅", "리턴제로:3.5", "에스티티:-1"]

// WebSocket
"부스팅,리턴제로:3.5,에스티티:-1"
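
Because the two protocols take the same keywords in different shapes, it can help to keep one source list and derive both forms from it. The sketch below assumes hypothetical helper names (format_keyword, to_grpc_keywords, to_websocket_keywords). Rejecting words that contain "," or ":" is also an assumption, made because those characters act as separators in the formats shown above.

```python
def format_keyword(word, score=None):
    """Return "word" or "word:score" (hypothetical helper)."""
    # Assumption: "," and ":" are separators, so a word cannot contain them.
    if not word or "," in word or ":" in word:
        raise ValueError(f"invalid keyword: {word!r}")
    return word if score is None else f"{word}:{score}"


def to_grpc_keywords(pairs):
    """Build the list-of-strings form shown for gRPC."""
    return [format_keyword(word, score) for word, score in pairs]


def to_websocket_keywords(pairs):
    """Build the comma-separated string form shown for WebSocket."""
    return ",".join(to_grpc_keywords(pairs))


if __name__ == "__main__":
    # (word, score) pairs; a score of None means the word is sent on its own.
    keywords = [("부스팅", None), ("리턴제로", 3.5), ("에스티티", -1)]
    print(to_grpc_keywords(keywords))
    print(to_websocket_keywords(keywords))
```

With the sample pairs, the two functions reproduce the gRPC list and the WebSocket string shown above.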

Choose based on your input environment.