Speech-to-Text API
High-accuracy transcription API with speaker diarization and affordable prices. Designed for call centers and voice analytics.
Perfect Diarization with Multichannel Calls
Record each speaker on a separate audio channel and get transcription with zero speaker-identification errors, since each channel carries exactly one speaker.
How It Works
Modern PBX systems record calls in two channels: the agent on one, the customer on the other.
Our model processes each channel separately, following every utterance in sequence.
It predicts the next response using previous dialogue context — dramatically reducing errors.
We can also help fine-tune transcription for your specific domain and terminology.
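As a rough illustration of the per-channel flow described above, the sketch below merges utterances recognized separately on two channels into a single conversation ordered by start time. The function name `merge_channels`, the convention that channel 0 is the agent and channel 1 the customer, and the segment fields are assumptions made for this example; it is not the service's actual implementation.

```python
from typing import Dict, List

Segment = Dict[str, object]


def merge_channels(channels: List[List[Segment]]) -> List[Segment]:
    """Merge per-channel utterances into one timeline.

    Each inner list holds the utterances recognized on one audio channel.
    The channel index is used as the speaker label, so no speaker
    detection is needed (assumption: exactly one speaker per channel).
    """
    merged = []
    for speaker, segments in enumerate(channels):
        for seg in segments:
            merged.append({**seg, "speaker": speaker})
    # Order by start time; sorted() is stable, so ties keep channel order.
    return sorted(merged, key=lambda s: s["start"])


# Hypothetical per-channel results: channel 0 = agent, channel 1 = customer.
agent = [{"start": 0.0, "end": 2.34, "text": "Good afternoon!"}]
customer = [{"start": 2.8, "end": 5.12, "text": "Hi, how are you?"}]
timeline = merge_channels([agent, customer])
```

Because the speaker label comes from the channel index rather than from the audio content, overlapping speech cannot cause one speaker to be mistaken for the other.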
Speaker Diarization
Two options available:
- stereo: Left/right channel separation. Use when your PBX records agent and customer on separate channels.
- ai: Neural network-based detection. For single-channel recordings.
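One way to pick between the two options is to inspect the recording before uploading it. The sketch below uses Python's standard `wave` module to count channels in a WAV file. The helper name `choose_diarization_mode` is hypothetical, and treating the option names `stereo` and `ai` as plain strings is an assumption about how they would be passed to the API.

```python
import wave


def choose_diarization_mode(wav_path: str) -> str:
    """Return "stereo" for two-channel WAV files, otherwise "ai".

    Assumption: a two-channel file has the agent and the customer on
    separate channels. A stereo file with mixed speakers on both
    channels would still need the "ai" option.
    """
    with wave.open(wav_path, "rb") as wav:
        channels = wav.getnchannels()
    return "stereo" if channels == 2 else "ai"
```

This only handles WAV input; other container formats would need a different way to read the channel count.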
Response Example
[
  {
    "start": 0.00,
    "end": 2.34,
    "speaker": 0,
    "text": "Good afternoon!"
  },
  {
    "start": 2.80,
    "end": 5.12,
    "speaker": 1,
    "text": "Hi, how are you?"
  }
]
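A response of this shape can be turned into a readable transcript with a few lines of Python. The sketch below is a minimal example that parses a copy of the sample response and prints one line per utterance. The `format_transcript` helper and the mapping of speaker 0 to "Agent" and speaker 1 to "Customer" are assumptions for illustration; the response itself only carries numeric speaker IDs.

```python
import json

# Sample response in the format shown above.
sample = """[
  {"start": 0.00, "end": 2.34, "speaker": 0, "text": "Good afternoon!"},
  {"start": 2.80, "end": 5.12, "speaker": 1, "text": "Hi, how are you?"}
]"""


def format_transcript(segments, names=None):
    """Render segments as '[start-end] Speaker: text' lines.

    `names` optionally maps numeric speaker IDs to display labels
    (hypothetical labels; unmapped IDs fall back to 'Speaker N').
    """
    names = names or {}
    lines = []
    for seg in segments:
        label = names.get(seg["speaker"], f"Speaker {seg['speaker']}")
        lines.append(
            f"[{seg['start']:.2f}-{seg['end']:.2f}] {label}: {seg['text']}"
        )
    return "\n".join(lines)


segments = json.loads(sample)
transcript = format_transcript(segments, {0: "Agent", 1: "Customer"})
print(transcript)
# [0.00-2.34] Agent: Good afternoon!
# [2.80-5.12] Customer: Hi, how are you?
```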
Try it
Register, upload a file via the web interface, and check the result. No code required.
For any questions, please contact us.