Speech-to-Text API

High-accuracy transcription API with speaker diarization and affordable prices. Designed for call centers and voice analytics.

Perfect Diarization with Multichannel Calls

Record each speaker on a separate audio channel, and every utterance is attributed to its speaker by channel, with no speaker-identification errors.

  • 100% speaker diarization accuracy: Speaker 1 in channel 1, Speaker 2 in channel 2, no guesswork.
  • State-of-the-art transcription quality: The model tracks the dialogue chain in each channel and predicts responses from context.
  • Domain-tuned recognition: Pass context and keywords so the model knows your terminology.
  • Unbeatable prices: Find out how much it costs for your volume.

How It Works

1. Modern PBX systems record calls in two channels: agent on one, customer on the other.

2. Our model processes each channel separately, following every utterance in sequence.

3. It predicts the next response from the preceding dialogue context, which reduces recognition errors.
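To make the two-channel idea concrete, here is a minimal Python sketch that de-interleaves a 2-channel PCM WAV recording into two mono files using only the standard library. It only illustrates what "one speaker per channel" means at the audio level. The function names `split_stereo` and `split_wav_file` are hypothetical and not part of the API, and which channel carries the agent depends on your PBX.

```python
import wave


def split_stereo(frames: bytes, sample_width: int) -> tuple[bytes, bytes]:
    """De-interleave raw 2-channel PCM frames into (left, right) byte strings."""
    frame_size = 2 * sample_width  # one sample per channel in every frame
    if len(frames) % frame_size != 0:
        raise ValueError("frame data is not a whole number of stereo frames")
    left = bytearray()
    right = bytearray()
    for i in range(0, len(frames), frame_size):
        left += frames[i:i + sample_width]
        right += frames[i + sample_width:i + frame_size]
    return bytes(left), bytes(right)


def split_wav_file(path: str, left_path: str, right_path: str) -> None:
    """Write each channel of a stereo WAV file to its own mono WAV file."""
    with wave.open(path, "rb") as src:
        if src.getnchannels() != 2:
            raise ValueError("expected a 2-channel recording")
        width = src.getsampwidth()
        rate = src.getframerate()
        frames = src.readframes(src.getnframes())

    left, right = split_stereo(frames, width)
    for out_path, data in ((left_path, left), (right_path, right)):
        with wave.open(out_path, "wb") as dst:
            dst.setnchannels(1)
            dst.setsampwidth(width)
            dst.setframerate(rate)
            dst.writeframes(data)
```

Listening to the two output files is a quick way to confirm that your PBX really puts the agent and the customer on separate channels.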


We can also help fine-tune transcription for your specific domain and terminology.

[Figure: How multichannel transcription works: two audio channels become a perfect transcript]

Speaker Diarization

Two options available:

  • stereo: Left/right channel separation. Use when your PBX records agent and customer on separate channels.
  • ai: Neural network-based detection. Use for single-channel recordings.
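One way to pick between the two options automatically is to look at the channel count of the recording. The sketch below does that for WAV files with Python's standard `wave` module. The option names `stereo` and `ai` come from the list above; the helper names are hypothetical, and how the chosen value is passed to the API is not described on this page, so that part is left out.

```python
import wave


def choose_diarization_mode(channel_count: int) -> str:
    """Map a recording's channel count to one of the two diarization options."""
    if channel_count == 2:
        # Assumes each speaker really is on a separate channel,
        # as in a PBX recording with agent and customer split.
        return "stereo"
    if channel_count == 1:
        return "ai"
    raise ValueError(f"unsupported channel count: {channel_count}")


def mode_for_wav(path: str) -> str:
    """Inspect a WAV file header and return the matching diarization option."""
    with wave.open(path, "rb") as wav:
        return choose_diarization_mode(wav.getnchannels())
```

A 2-channel file in which both speakers were mixed into both channels would still need the `ai` option, so treat the channel count as a first check, not a guarantee.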

Response Example

[
    {
      "start": 0.00,
      "end": 2.34,
      "speaker": 0,
      "text": "Good afternoon!"
    },
    {
      "start": 2.80,
      "end": 5.12,
      "speaker": 1,
      "text": "Hi, how are you?"
    }
]
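A response in this shape is straightforward to turn into a readable transcript. The following Python sketch parses the segments and prints one line per utterance. The field names (`start`, `end`, `speaker`, `text`) are taken from the example above; the function name `format_transcript` and the mapping of speaker 0 to "Agent" and speaker 1 to "Customer" are assumptions for illustration only.

```python
import json


def format_transcript(segments, speaker_names=None):
    """Render diarized segments as '[start-end] Speaker: text' lines."""
    names = speaker_names or {}
    lines = []
    # Sort by start time so the output follows the order of the conversation.
    for seg in sorted(segments, key=lambda s: s["start"]):
        label = names.get(seg["speaker"], f"Speaker {seg['speaker']}")
        lines.append(f"[{seg['start']:.2f}-{seg['end']:.2f}] {label}: {seg['text']}")
    return "\n".join(lines)


# Same data as the response example, as a JSON string.
response_body = """
[
    {"start": 0.00, "end": 2.34, "speaker": 0, "text": "Good afternoon!"},
    {"start": 2.80, "end": 5.12, "speaker": 1, "text": "Hi, how are you?"}
]
"""

segments = json.loads(response_body)
# The speaker-to-role mapping is an assumption; it depends on your recording setup.
print(format_transcript(segments, {0: "Agent", 1: "Customer"}))
# [0.00-2.34] Agent: Good afternoon!
# [2.80-5.12] Customer: Hi, how are you?
```

Speakers without an entry in the mapping fall back to a generic "Speaker N" label.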

Try it

Register, upload a file through the web interface, and check the result. No code required.


For any questions, please contact us.