Audio Transcription API

The Audio Transcription service converts audio files to text and can optionally translate the result into other languages.

Overview

The transcription API accepts an audio file and returns its text transcription. Audio in multiple languages is supported.

This endpoint is OpenAI-compatible.

Transcription Endpoint

POST /v1/audio/transcriptions

This endpoint transcribes an uploaded audio file into text, with an option to translate the result into a different language.

Request Parameters

Parameter        Type    Required  Description
---------------  ------  --------  -------------------------------------------------------
file             file    Yes       The audio file to transcribe (mp3, mp4, wav, m4a, webm)
model            string  No        Model to use (default: "KrosMLingualSTT1.0.0")
language         string  No        Optional language specification
prompt           string  No        Text to guide the transcription
response_format  string  No        Output format (default: "json")
temperature      float   No        Model temperature (default: 0.0)
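
The parameters above can be collected and checked on the client before a request is sent. The following is a minimal sketch, not part of the API: the helper name `build_form_fields` is hypothetical, the "en" language value in the usage note is an assumed format, and the extension check only mirrors the formats listed for `file`.

```python
from pathlib import Path

# Values copied from the parameter table; the helper itself is hypothetical.
SUPPORTED_EXTENSIONS = {".mp3", ".mp4", ".wav", ".m4a", ".webm"}
DEFAULTS = {
    "model": "KrosMLingualSTT1.0.0",
    "response_format": "json",
    "temperature": 0.0,
}


def build_form_fields(file_path, language=None, prompt=None, **overrides):
    """Return the non-file form fields for a transcription request."""
    extension = Path(file_path).suffix.lower()
    if extension not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unsupported audio format: {extension or '(none)'}")

    # Only parameters that have a documented default may be overridden here.
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")

    fields = {**DEFAULTS, **overrides}
    if language is not None:
        fields["language"] = language
    if prompt is not None:
        fields["prompt"] = prompt

    # Form fields travel as strings, so convert every value.
    return {name: str(value) for name, value in fields.items()}
```

For example, `build_form_fields("meeting.mp3", language="en")` returns the three defaults plus the `language` field, and a path ending in `.txt` raises `ValueError` before any network call is made.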

Request Body

  • file (file, required): The audio file to be transcribed. Supported formats: mp3, mp4, wav, m4a, webm.

Response

  • choices (array):

    • text (string): The resulting transcribed and optionally translated text.

    • index (integer): The array index of the transcription choice.

    • finish_reason (string): Explanation of why the transcription process concluded.

  • model (string): Identifies the transcription/translation model used.

  • object (string): Specifies the type of response object.

Example Request
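
A minimal sketch of a request using only the Python standard library. Several details are assumptions rather than documented facts: the host `api.example.com` is a placeholder, the body is sent as multipart/form-data and the API key as a Bearer token (both inferred from the endpoint being OpenAI-compatible), and the helper names are hypothetical.

```python
import json
import os
import urllib.request
import uuid

# Placeholder host: replace with the real base URL of the service.
API_URL = "https://api.example.com/v1/audio/transcriptions"


def encode_multipart(fields, file_name, file_bytes):
    """Encode text fields plus one file part as multipart/form-data."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n".encode()
        )
    parts.append(
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{file_name}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n".encode()
    )
    parts.append(file_bytes)
    parts.append(f"\r\n--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe(path, api_key, language=None):
    """POST an audio file to the transcription endpoint and return the JSON body."""
    fields = {"model": "KrosMLingualSTT1.0.0", "response_format": "json"}
    if language:
        fields["language"] = language
    with open(path, "rb") as audio:
        body, content_type = encode_multipart(
            fields, os.path.basename(path), audio.read()
        )
    request = urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Content-Type": content_type,
            # Assumed auth scheme; the docs only say a 401 means "check API key".
            "Authorization": f"Bearer {api_key}",
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

A call would look like `transcribe("meeting.mp3", api_key="YOUR_API_KEY", language="en")`, where the key and the language value are placeholders.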

Example Response
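
The payload below is illustrative only. Its field names follow the Response section, but every value (including the `object` and `finish_reason` strings and the transcript text) is a made-up placeholder, and the helper `transcript_texts` is hypothetical. It sketches how a client might read the transcribed text out of the `choices` array.

```python
import json

# Illustrative payload: field names from the Response section, values invented.
SAMPLE_RESPONSE = """
{
  "object": "transcription",
  "model": "KrosMLingualSTT1.0.0",
  "choices": [
    {
      "index": 0,
      "text": "Hello, and welcome to the meeting.",
      "finish_reason": "stop"
    }
  ]
}
"""


def transcript_texts(payload):
    """Return the text of every choice, ordered by its index field."""
    choices = sorted(payload.get("choices", []), key=lambda c: c.get("index", 0))
    return [choice["text"] for choice in choices]


payload = json.loads(SAMPLE_RESPONSE)
for text in transcript_texts(payload):
    print(text)
```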

Best Practices

Transcription

  • Use high-quality audio for better results

  • Specify the language when known for improved accuracy

  • Keep background noise to a minimum

Error Handling

All APIs return standard HTTP status codes:

  • 200: Success

  • 400: Bad request (check parameters)

  • 401: Unauthorized (check API key)

  • 429: Rate limit exceeded

  • 500: Server error

Error responses include a detail field with more information about the error.
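
One way a client might act on these status codes is sketched below. The names `TranscriptionError`, `parse_response`, and `call_with_retry` are hypothetical, the error body is assumed to be JSON with a top-level `detail` key, and retrying 429 and 500 with exponential backoff is a design choice of this sketch rather than documented behavior.

```python
import json
import time

# Statuses this sketch treats as worth retrying (an assumption, not API policy).
RETRYABLE_STATUSES = {429, 500}


class TranscriptionError(Exception):
    """Raised for any non-200 response; carries the status and detail text."""

    def __init__(self, status, detail):
        super().__init__(f"HTTP {status}: {detail}")
        self.status = status
        self.detail = detail


def parse_response(status, body):
    """Return the parsed JSON body on 200, otherwise raise TranscriptionError."""
    if status == 200:
        return json.loads(body)
    try:
        detail = json.loads(body).get("detail", "no detail provided")
    except (ValueError, AttributeError):
        # Body was empty, not JSON, or not a JSON object.
        detail = "no detail provided"
    raise TranscriptionError(status, detail)


def call_with_retry(send, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call send() until it succeeds, backing off on retryable errors.

    send is any callable that returns a (status_code, body_text) pair.
    """
    for attempt in range(attempts):
        status, body = send()
        try:
            return parse_response(status, body)
        except TranscriptionError as err:
            last_attempt = attempt == attempts - 1
            if err.status not in RETRYABLE_STATUSES or last_attempt:
                raise
            sleep(base_delay * 2 ** attempt)
```

With this split, 400 and 401 fail immediately so the caller can fix the parameters or the API key, while 429 and 500 are retried a bounded number of times.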
