The Speech-i transcription platform exposes REST APIs to upload media files and retrieve the
resulting transcriptions.
All API calls require either a Bearer token in the Authorization header or an API key passed as the api_key URL parameter. The Bearer token and the API key are obtained by submitting access credentials to the /auth method.
Successful API calls return HTTP status code 200 and JSON-formatted output. The HTTP status codes section describes the responses to unsuccessful calls.
The base URL for API calls is https://developer.speech-i.com/asr/api/v1
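The following Python function is a minimal sketch of the two authentication options, using only the standard library. It builds a request that carries either the Bearer token in the Authorization header or the API key as the api_key URL parameter. The function name `build_request` is hypothetical and not part of any Speech-i SDK. The request is only constructed here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "https://developer.speech-i.com/asr/api/v1"


def build_request(path, token=None, api_key=None, query=None, data=None, headers=None):
    """Build (but do not send) a Speech-i API request.

    Pass exactly one of `token` (the full "Bearer ..." value returned by /auth)
    or `api_key` (sent as the api_key URL parameter instead of the header).
    """
    if (token is None) == (api_key is None):
        raise ValueError("pass exactly one of token or api_key")
    params = dict(query or {})
    hdrs = dict(headers or {})
    if api_key is not None:
        params["api_key"] = api_key      # key travels in the query string
    else:
        hdrs["Authorization"] = token    # token travels in the header
    url = BASE_URL + path
    if params:
        url += "?" + urlencode(params)
    return Request(url, data=data, headers=hdrs)
```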
The /auth method generates an authentication bearer token and an API key from a username and password. The token expires after a predetermined time interval. Once it has expired, a new token must be requested.
METHOD | RESOURCE
POST | /auth

HEADER | VALUE
Content-Type | application/x-www-form-urlencoded

PARAMETER | TYPE | DESCRIPTION
username | String | Mandatory parameter provided by Speech-i
password | String | Mandatory parameter provided by Speech-i
Successful example response:
{"Authorization": "Bearer xxx.yyy.zzz", "api_key": "9LbfF21YvVyondsH6rMPqA"}
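Below is a minimal Python sketch of the /auth call, using only the standard library. It form-encodes the credentials and reads the two fields shown in the example response. The helper names are hypothetical. The expiry interval is not stated here, so one reasonable approach (an assumption, not documented behaviour) is to call /auth again whenever a request returns 401 Unauthorized.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

AUTH_URL = "https://developer.speech-i.com/asr/api/v1/auth"


def auth_request(username, password):
    """Build the POST /auth request with form-encoded credentials."""
    body = urlencode({"username": username, "password": password}).encode("ascii")
    return Request(AUTH_URL, data=body,
                   headers={"Content-Type": "application/x-www-form-urlencoded"})


def parse_auth_response(raw):
    """Return (Authorization header value, api_key) from the /auth JSON body."""
    data = json.loads(raw)
    return data["Authorization"], data["api_key"]

# Sending it (network call, not executed here):
#   from urllib.request import urlopen
#   with urlopen(auth_request("user", "secret")) as resp:
#       token, api_key = parse_auth_response(resp.read())
```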
Uploads a file, runs a synchronous transcription and returns the transcription.
The maximum accepted audio duration is 3 minutes. Use the asynchronous transcription methods for longer files.
METHOD | RESOURCE
POST | /transcribe

HEADER | VALUE
Authorization* | Bearer xxx.yyy.zzz (retrieved in /auth)
Content-Type | multipart/form-data
*The Authorization header is not required if api_key is specified as a URL parameter.

PARAMETER | TYPE | DESCRIPTION
file | File | Mandatory. Media file to transcribe
model | String | Mandatory. Language model to use
callback_uri | String | Optional. Notification callback URI. If specified, a POST request including the transcript in the sttOutput field is sent to this URI
format | String | Optional. Transcription format (xml or json)
remove_noise_words | boolean | Optional. Specifies whether noise words are removed from the transcription
diarize | boolean | Optional. Specifies whether the transcription is diarized (segmented by speaker)
nbests | int | Optional. Maximum number of recognition hypotheses
Examples of successful transcriptions are shown in the transcription format section at the end of this document.
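Because of the 3-minute limit, a client has to choose between the synchronous and asynchronous methods before uploading. The Python sketch below makes that choice for uncompressed WAV files only, using the standard `wave` module. The helper names are hypothetical, and other media formats would need a different way to measure duration. Treating exactly 3 minutes as still acceptable is an assumption.

```python
import io
import wave

# Documented maximum audio duration for the synchronous /transcribe method.
SYNC_LIMIT_SECONDS = 3 * 60


def wav_duration_seconds(wav_bytes):
    """Duration of an uncompressed PCM WAV file, read from its header."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as w:
        return w.getnframes() / w.getframerate()


def choose_transcription_path(wav_bytes):
    """Return "/transcribe" for audio within the sync limit, else "/transcriptions"."""
    if wav_duration_seconds(wav_bytes) <= SYNC_LIMIT_SECONDS:
        return "/transcribe"
    return "/transcriptions"
```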
Uploads a file or a URL and starts an asynchronous transcription process.
Upload a URL
METHOD | RESOURCE
POST | /transcriptions

HEADER | VALUE
Authorization* | Bearer xxx.yyy.zzz (retrieved in /auth)
Content-Type | application/x-www-form-urlencoded
*The Authorization header is not required if api_key is specified as a URL parameter.

PARAMETER | TYPE | DESCRIPTION
url | String | Mandatory. URL of the media file to transcribe
model | String | Mandatory. Language model to use
callback_uri | String | Optional. Notification callback URI. If specified, a POST request including the transcript in the sttOutput field is sent to this URI
format | String | Optional. Transcription format (xml or json)
remove_noise_words | boolean | Optional. Specifies whether noise words are removed from the transcription
diarize | boolean | Optional. Specifies whether the transcription is diarized (segmented by speaker)
nbests | int | Optional. Maximum number of recognition hypotheses
Successful example response:
{"transcription_id": 1325}
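The sketch below shows the URL variant in Python (standard library only). It form-encodes the parameters from the table and reads `transcription_id` from the response. The helper names are hypothetical. This document does not specify how boolean parameters are encoded, so sending them as lowercase "true"/"false" is an assumption.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "https://developer.speech-i.com/asr/api/v1"


def url_transcription_request(token, media_url, model, **options):
    """Build (without sending) POST /transcriptions for a remotely hosted file."""
    fields = {"url": media_url, "model": model}
    for name, value in options.items():
        # Assumption: booleans are accepted as lowercase "true"/"false".
        fields[name] = str(value).lower() if isinstance(value, bool) else value
    return Request(
        BASE_URL + "/transcriptions",
        data=urlencode(fields).encode("ascii"),
        headers={"Authorization": token,
                 "Content-Type": "application/x-www-form-urlencoded"},
    )


def parse_transcription_id(raw):
    """Extract the id from a response such as {"transcription_id": 1325}."""
    return json.loads(raw)["transcription_id"]
```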
Upload a File
METHOD | RESOURCE
POST | /transcriptions

HEADER | VALUE
Authorization* | Bearer xxx.yyy.zzz (retrieved in /auth)
Content-Type | multipart/form-data
*The Authorization header is not required if api_key is specified as a URL parameter.

PARAMETER | TYPE | DESCRIPTION
file | File | Mandatory. Media file to transcribe
model | String | Mandatory. Language model to use
callback_uri | String | Optional. Notification callback URI. If specified, a POST request including the transcript in the sttOutput field is sent to this URI
format | String | Optional. Transcription format (xml or json)
remove_noise_words | boolean | Optional. Specifies whether noise words are removed from the transcription
diarize | boolean | Optional. Specifies whether the transcription is diarized (segmented by speaker)
nbests | int | Optional. Maximum number of recognition hypotheses
Successful example response:
{"transcription_id": 1325}
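Python's urllib has no built-in multipart/form-data encoder, so this sketch builds the body by hand for the file upload above. The helper names are hypothetical. The filename is inserted into the part header unescaped, so names containing double quotes would need extra handling. The same encoder could also serve /transcribe and /reports/media, which take a multipart file field as well.

```python
import uuid
from urllib.request import Request

BASE_URL = "https://developer.speech-i.com/asr/api/v1"


def encode_multipart(fields, file_field, filename, file_bytes,
                     file_type="application/octet-stream"):
    """Encode text fields plus one file; return (body, Content-Type value)."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(f"--{boundary}\r\n"
                     f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
                     f"{value}\r\n".encode("utf-8"))
    parts.append(f"--{boundary}\r\n"
                 f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"\r\n'
                 f"Content-Type: {file_type}\r\n\r\n".encode("utf-8"))
    parts.append(file_bytes)
    parts.append(f"\r\n--{boundary}--\r\n".encode("utf-8"))  # closing delimiter
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def file_transcription_request(token, filename, file_bytes, model):
    """Build (without sending) POST /transcriptions with an uploaded file."""
    body, content_type = encode_multipart({"model": model}, "file", filename, file_bytes)
    return Request(BASE_URL + "/transcriptions", data=body,
                   headers={"Authorization": token, "Content-Type": content_type})
```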
Returns the status of an asynchronous transcription.
METHOD | RESOURCE
GET | /transcriptions/{transcription_id}/status

HEADER | VALUE
Authorization* | Bearer xxx.yyy.zzz (retrieved in /auth)
*The Authorization header is not required if api_key is specified as a URL parameter.
Successful example response:
{"status": "queued"}
Transcription status can be one of the following:

STATUS | DESCRIPTION
queued | Audio is in the transcription queue
processing | Audio is being transcribed
completed | Transcription is completed
error | An error occurred during the transcription process
audio-error | An error occurred during the transcription process due to the audio file format
call-back-uri-error | An error occurred sending the callback request
audio-uri-error | An error occurred fetching audio from the provided URL
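If no callback_uri is used, the status table suggests a simple polling loop: keep polling while the status is queued or processing, and stop on anything else. The Python sketch below assumes that design. `fetch_status` is a hypothetical callable that returns the raw body of GET /transcriptions/{transcription_id}/status. The interval and timeout values are arbitrary. The sketch also treats call-back-uri-error as terminal, although whether results are still available in that case is not documented.

```python
import json
import time

# Non-terminal states from the status table; anything else ends the wait.
PENDING = {"queued", "processing"}


def wait_for_transcription(fetch_status, poll_seconds=5.0, timeout_seconds=600.0,
                           sleep=time.sleep):
    """Poll until the transcription leaves the pending states; return that status."""
    waited = 0.0
    while True:
        status = json.loads(fetch_status())["status"]
        if status not in PENDING:
            return status  # "completed" or one of the error states
        if waited >= timeout_seconds:
            raise TimeoutError(f"still {status} after {waited:.0f}s")
        sleep(poll_seconds)
        waited += poll_seconds
```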
Returns the results of an asynchronous transcription.
METHOD | RESOURCE
GET | /transcriptions/{transcription_id}/results

HEADER | VALUE
Authorization* | Bearer xxx.yyy.zzz (retrieved in /auth)
*The Authorization header is not required if api_key is specified as a URL parameter.
Examples of successful transcriptions are shown in the transcription format section at the end of this document.
Sets the default transcription properties for the user.
METHOD | RESOURCE
POST | /properties

HEADER | VALUE
Authorization* | Bearer xxx.yyy.zzz (retrieved in /auth)
Content-Type | application/x-www-form-urlencoded
*The Authorization header is not required if api_key is specified as a URL parameter.

PARAMETER | TYPE | DESCRIPTION
callback_uri | String | Optional. Notification callback URI. If specified, a POST request including the transcript in the sttOutput field is sent to this URI
format | String | Optional. Transcription format (xml or json)
remove_noise_words | boolean | Optional. Specifies whether noise words are removed from the transcription
diarize | boolean | Optional. Specifies whether the transcription is diarized (segmented by speaker)
nbests | int | Optional. Maximum number of recognition hypotheses
Successful example response:
{"nbests": 2, "format": "json", "callback_uri": "http://www.example.com/callback.php", "remove_noise_words": true, "diarize": true}
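A minimal Python sketch for building the POST /properties body. It rejects parameter names and format values that are not in the table above before anything is sent. The function name is hypothetical, and encoding booleans as lowercase "true"/"false" is an assumption that this document does not confirm.

```python
from urllib.parse import urlencode

# Parameter names and format values as listed in the /properties table.
ALLOWED = {"callback_uri", "format", "remove_noise_words", "diarize", "nbests"}
FORMATS = {"xml", "json"}


def properties_body(**props):
    """Form-encode default transcription properties for POST /properties."""
    unknown = set(props) - ALLOWED
    if unknown:
        raise ValueError(f"unknown properties: {sorted(unknown)}")
    if "format" in props and props["format"] not in FORMATS:
        raise ValueError("format must be 'xml' or 'json'")
    # Assumption: booleans are sent as lowercase "true"/"false".
    encoded = {k: str(v).lower() if isinstance(v, bool) else v for k, v in props.items()}
    return urlencode(encoded).encode("ascii")
```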
Returns the transcription properties, quotas and limits for the user.
METHOD | RESOURCE
GET | /properties

HEADER | VALUE
Authorization* | Bearer xxx.yyy.zzz (retrieved in /auth)
Content-Type | application/x-www-form-urlencoded
*The Authorization header is not required if api_key is specified as a URL parameter.
Successful example response:
{"nbests": 2, "format": "json", "callback_uri": "http://www.example.com/callback.php", "remove_noise_words": true, "diarize": true, "duration_limit": 1000, "duration_left": 600}
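The quota fields in the GET /properties response can be summarised client-side. The units of duration_limit and duration_left are not documented here, so this Python sketch only compares the two values with each other. The function name is hypothetical.

```python
import json


def quota_summary(raw):
    """Summarise duration_limit / duration_left from a GET /properties body."""
    props = json.loads(raw)
    limit, left = props["duration_limit"], props["duration_left"]
    used = limit - left
    return {
        "used": used,
        "left": left,
        "used_fraction": used / limit if limit else None,
        "exhausted": left <= 0,
    }
```

With the example response above, the summary reports 400 units used out of 1000 (a fraction of 0.4), and the quota is not exhausted.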
Uploads a single report or a bulk (CSV) report.
Upload a single report
METHOD | RESOURCE
POST | /reports

HEADER | VALUE
Authorization* | Bearer xxx.yyy.zzz (retrieved in /auth)
Content-Type | application/x-www-form-urlencoded
*The Authorization header is not required if api_key is specified as a URL parameter.

PARAMETER | TYPE | DESCRIPTION
model | String | Mandatory. Language model
input | String | Mandatory. What the speaker said (e.g. N Y U)
stt_output | String | Mandatory. What the system wrote (e.g. and why you)
expected_output | String | Mandatory. What was expected (e.g. NYU)
examples | String | Optional. Context examples. Please provide some examples containing the sentence or word
comments | String | Optional. Comments and observations
media_ids | String | Optional. Comma-separated media IDs (see POST /reports/media)
Successful example response:
{"report_id": 18}
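Here is a minimal Python sketch of the single-report body. It checks the four mandatory fields from the table and joins any media IDs into the comma-separated media_ids string. The function name and its dict-based signature are hypothetical.

```python
from urllib.parse import urlencode

# Mandatory fields from the single report parameter table.
MANDATORY = ("model", "input", "stt_output", "expected_output")


def single_report_body(fields, media_ids=()):
    """Form-encode a single POST /reports request body."""
    missing = [name for name in MANDATORY if not fields.get(name)]
    if missing:
        raise ValueError(f"missing mandatory fields: {missing}")
    data = dict(fields)
    if media_ids:
        data["media_ids"] = ",".join(str(i) for i in media_ids)
    return urlencode(data).encode("ascii")
```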
Upload a bulk CSV report
METHOD | RESOURCE
POST | /reports

HEADER | VALUE
Authorization* | Bearer xxx.yyy.zzz (retrieved in /auth)
Content-Type | multipart/form-data
*The Authorization header is not required if api_key is specified as a URL parameter.

PARAMETER | TYPE | DESCRIPTION
file | File | Mandatory. CSV report file. Use the Speech-i CSV template (do not remove the header row)
model | String | Mandatory. Language model

Media files cannot be attached to bulk CSV reports. To attach media, use the single report upload.
Successful example response:
{"report_ids": "19,20,21"}
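The two report uploads return IDs in different shapes: a single integer report_id, or report_ids as one comma-separated string. This Python sketch normalises both shapes to a list of integers. The function name is hypothetical.

```python
import json


def parse_report_ids(raw):
    """Return report IDs as a list of ints for single or bulk POST /reports responses."""
    data = json.loads(raw)
    if "report_id" in data:
        return [int(data["report_id"])]
    return [int(part) for part in data["report_ids"].split(",") if part.strip()]
```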
Uploads one or more files to attach to a report. Call this method before POST /reports and pass the returned IDs in the media_ids parameter.
METHOD | RESOURCE
POST | /reports/media

HEADER | VALUE
Authorization* | Bearer xxx.yyy.zzz (retrieved in /auth)
Content-Type | multipart/form-data
*The Authorization header is not required if api_key is specified as a URL parameter.

PARAMETER | TYPE | DESCRIPTION
file | File | Mandatory. File to attach
tag | String | Optional. Tag for the file. Use "REPORT_VOICE" for voice samples and "REPORT_MEDIA" for other media types
Successful example response:
{
  "id": 1,
  "filename": "test.mp3",
  "url": "",
  "tag": "REPORT_VOICE",
  "duration": 0,
  "upload_time": "2020-06-03T18:41:05.923"
}
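To link uploaded media to a single report, the id field of each POST /reports/media response becomes part of the media_ids string. Only the single-object response shape is documented. This Python sketch also accepts a JSON list, in case one call returns several files, but that is an assumption. The function name is hypothetical.

```python
import json


def media_ids_param(raw_bodies):
    """Join the ids from POST /reports/media responses into a "1,2,3" string."""
    ids = []
    for raw in raw_bodies:
        data = json.loads(raw)
        # Assumption: a list response holds one object per uploaded file.
        for item in data if isinstance(data, list) else [data]:
            ids.append(str(item["id"]))
    return ",".join(ids)
```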
Returns the details of a report.
METHOD | RESOURCE
GET | /reports/{report_id}

HEADER | VALUE
Authorization* | Bearer xxx.yyy.zzz (retrieved in /auth)
*The Authorization header is not required if api_key is specified as a URL parameter.
Successful example response:
{
  "id": 1,
  "input": "test input",
  "stt_output": "test stt_output",
  "expected_output": "test expected_output",
  "examples": "test examples",
  "comments": "test comments",
  "status": "new",
  "answer": null,
  "upload_time": "2020-06-03T18:20:50",
  "estimated_resolution_time": null,
  "update_time": null
}
Returns all the reports submitted by the user.
METHOD | RESOURCE
GET | /reports

HEADER | VALUE
Authorization* | Bearer xxx.yyy.zzz (retrieved in /auth)
*The Authorization header is not required if api_key is specified as a URL parameter.
Successful example response:
[
  {
    "id": 1,
    "input": "test1 input",
    "stt_output": "test1 stt_output",
    "expected_output": "test1 expected_output",
    "examples": "test1 examples",
    "comments": "test1 comments",
    "status": "new",
    "answer": null,
    "upload_time": "2020-06-03T18:20:50",
    "estimated_resolution_time": null,
    "update_time": null
  },
  {
    "id": 2,
    "input": "test2 input",
    "stt_output": "test2 stt_output",
    "expected_output": "test2 expected_output",
    "examples": "test2 examples",
    "comments": "test2 comments",
    "status": "new",
    "answer": null,
    "upload_time": "2020-06-03T18:20:50",
    "estimated_resolution_time": null,
    "update_time": null
  }
]
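The GET /reports list can be summarised client-side. This Python sketch counts reports per status and lists the ones whose answer is still null. Only the "new" status appears in this document, so the sketch does not assume any other status values. The function name is hypothetical.

```python
import json
from collections import Counter


def summarize_reports(raw):
    """Return (count of reports per status, ids of reports with no answer yet)."""
    reports = json.loads(raw)
    by_status = Counter(report["status"] for report in reports)
    unanswered = [report["id"] for report in reports if report.get("answer") is None]
    return by_status, unanswered
```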
The following table shows supported language models (other languages are available on request):
MODEL | DESCRIPTION
en-GB_16k | British English language model 16kHz
en-US_16k | American English language model 16kHz
en-IE_16k | Irish English language model 16kHz
fr-FR_16k | French language model 16kHz
de-DE_16k | German language model 16kHz
it-IT_16k | Italian language model 16kHz
es-ES_16k | Spanish language model 16kHz
pl-PL_16k | Polish language model 16kHz
nl-NL_16k | Dutch language model 16kHz
pt-PT_16k | European Portuguese language model 16kHz
pt-BR_16k | Brazilian Portuguese language model 16kHz
el-EL_16k | Greek language model 16kHz
ro-RO_16k | Romanian language model 16kHz
sl-SL_16k | Slovenian language model 16kHz
sk-SK_16k | Slovak language model 16kHz
cs-CS_16k | Czech language model 16kHz
lt-LT_16k | Lithuanian language model 16kHz
bg-BG_16k | Bulgarian language model 16kHz
hr-HR_16k | Croatian language model 16kHz
hu-HU_16k | Hungarian language model 16kHz
fi-FI_16k | Finnish language model 16kHz
sv-SV_16k | Swedish language model 16kHz
uk-UK_16k | Ukrainian language model 16kHz
ru-RU_16k | Russian language model 16kHz
zh-ZH_16k | Chinese language model 16kHz
ar-AR_16k | Arabic language model 16kHz
da-DA_16k | Danish language model 16kHz
mt-MT_16k | Maltese language model 16kHz
ko-KR_16k | Korean language model 16kHz
fa-IR_16k | Persian language model 16kHz
et-EE_16k | Estonian language model 16kHz
lv-LV_16k | Latvian language model 16kHz
ga-IE_16k | Irish language model 16kHz
sq-AL_16k | Albanian language model 16kHz
The following table lists the HTTP response status codes and their descriptions:
STATUS | DESCRIPTION
200 OK | Standard response for successful HTTP requests
400 Bad Request | The server cannot process the request due to an apparent client error (e.g. missing parameters)
401 Unauthorized | The authentication bearer token is invalid or was not provided
403 Forbidden | The request cannot be authorized (e.g. quota limit exceeded)
404 Not Found | The requested resource was not found
500 Internal Server Error | Generic server error. Please retry later or contact Speech-i support if the problem persists

For 4xx status codes, the response body may contain a description of the error.
Unsuccessful example response:
{"error": "invalid format"}
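With urllib, a non-2xx response raises `urllib.error.HTTPError`, and its body can be read like a response. This Python sketch extracts the optional "error" field and maps the codes in the table to simple actions: authenticate again on 401, and retry on 5xx. The helper names and the retry policy are assumptions for illustration, not documented Speech-i behaviour.

```python
import json
from urllib.error import HTTPError


def describe_http_error(err):
    """Return "HTTP <code>" plus the body's "error" field when one is present."""
    detail = None
    try:
        detail = json.loads(err.read()).get("error")
    except (ValueError, AttributeError):
        pass  # empty, non-JSON or non-object body
    return f"HTTP {err.code}: {detail}" if detail else f"HTTP {err.code}"


def should_reauthenticate(err):
    """401: the bearer token is invalid or missing, so request a new one via /auth."""
    return err.code == 401


def should_retry(err):
    """5xx: the table suggests retrying later."""
    return err.code >= 500
```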
The transcription can be produced either in JSON or XML format.
JSON example
{
  "duration": 0.99,
  "media": "0.wav",
  "model": "en-UK_16k",
  "transcription_id": "26",
  "results": [
    {
      "hypotheses": [
        {
          "transcript": "The mountain is there",
          "conf": 0.887,
          "word-alignment": [
            {
              "start": 0,
              "end": 0.48,
              "word": "The",
              "conf": 0.496
            },
            {
              "start": 0.48,
              "end": 0.81,
              "word": "mountain",
              "conf": 0.794
            },
            {
              "start": 0.81,
              "end": 0.93,
              "word": "is",
              "conf": 0.789
            },
            {
              "start": 0.93,
              "end": 0.99,
              "word": "there",
              "conf": 0.871
            }
          ]
        }
      ],
      "speaker-id": "S1",
      "segment-start": 0,
      "segment-length": 0.99
    }
  ]
}
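A minimal Python sketch that reads the JSON format above. It returns one tuple per segment with the speaker, the segment start, the transcript and the word timings (in seconds, as in the example). The list under "hypotheses" is presumably the n-best list controlled by nbests. Its ordering is not documented, so the sketch picks the hypothesis with the highest "conf" rather than assuming the first one is best. The function name is hypothetical.

```python
import json


def best_transcript(raw):
    """Return [(speaker, segment_start, transcript, [(word, start, end), ...]), ...]."""
    doc = json.loads(raw)
    segments = []
    for seg in doc["results"]:
        best = max(seg["hypotheses"], key=lambda h: h["conf"])
        words = [(w["word"], w["start"], w["end"]) for w in best["word-alignment"]]
        # speaker-id is presumably present only when diarization is enabled.
        segments.append((seg.get("speaker-id"), seg.get("segment-start"),
                         best["transcript"], words))
    return segments
```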
XML example
<?xml version='1.0' encoding='UTF-8'?>
<annotated-call>
  <call-data>
    <callid>20</callid>
    <agentid channel="0">1</agentid>
    <url/>
    <call-back-uri/>
  </call-data>
  <decoder-options>
    <model>en-UK_16k</model>
    <code-page>UTF-8</code-page>
    <nbest>1</nbest>
    <out-type>xml-c85</out-type>
  </decoder-options>
  <annotation>
    <type channel="0" id="transcription" nbest="1" reading="0">
      <sentence start="0" speakerID="S1" end="3630">
        <item start="0" end="480">The</item>
        <item start="480" end="810">mountain</item>
        <item start="810" end="930">is</item>
        <item start="930" end="990">there</item>
      </sentence>
    </type>
  </annotation>
</annotated-call>
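The XML format carries the same word timings as integer attributes. Comparing the two examples (480 in the XML and 0.48 in the JSON) suggests the XML values are in milliseconds. This Python sketch, using `xml.etree.ElementTree`, converts them to seconds on that assumption. It reads every sentence in the document and does not try to interpret the nbest attribute when several hypotheses are requested. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def xml_words(raw):
    """Return [(speaker, word, start_s, end_s), ...] for every <item> element."""
    root = ET.fromstring(raw)
    words = []
    for sentence in root.iter("sentence"):
        speaker = sentence.get("speakerID")
        for item in sentence.iter("item"):
            # Assumption: start/end are milliseconds, converted here to seconds.
            words.append((speaker, item.text,
                          int(item.get("start")) / 1000, int(item.get("end")) / 1000))
    return words
```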