Offline API

The Speech-i transcription platform exposes REST APIs to upload media files and retrieve the resulting transcriptions.

All API calls except /auth require either a Bearer Token in the Authorization header or an API key specified as a URL parameter. Both the Bearer Token and the API key can be obtained by sending access credentials to the /auth method.

Successful API calls return HTTP status code 200 and JSON-formatted output. The HTTP status codes section describes the responses for unsuccessful calls.

The base URL for API calls is https://developer.speech-i.com/asr/api/v1
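As a minimal sketch of the two authentication options described above, the following Python snippet builds (without sending) a request that carries either the Authorization header or the api_key URL parameter. The helper name build_request is hypothetical and not part of the API; the api_key parameter name is taken from the header notes in this document.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "https://developer.speech-i.com/asr/api/v1"


def build_request(resource, bearer=None, api_key=None, method="GET"):
    """Build (but do not send) a request authenticated by header or URL parameter.

    Hypothetical helper: `bearer` is the full header value returned by /auth
    (for example "Bearer xxx.yyy.zzz"), `api_key` is the key returned by /auth.
    """
    url = BASE_URL + resource
    headers = {}
    if api_key is not None:
        # Option 1: API key as URL parameter, no Authorization header needed.
        url += "?" + urlencode({"api_key": api_key})
    elif bearer is not None:
        # Option 2: Bearer Token in the Authorization header.
        headers["Authorization"] = bearer
    else:
        raise ValueError("either bearer or api_key is required")
    return Request(url, headers=headers, method=method)
```

The returned object can be passed to urllib.request.urlopen to perform the call.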

| METHOD | RESOURCE | DESCRIPTION |
| --- | --- | --- |
| POST | /auth | Generates an authentication bearer token |
| POST | /transcribe | Uploads a file, starts a sync transcription and returns the transcription |
| POST | /transcriptions | Uploads a file or URL and starts an async transcription |
| GET | /transcriptions/{transcription_id}/status | Returns the status of a particular async transcription |
| GET | /transcriptions/{transcription_id}/results | Returns the results of a particular async transcription |
| POST | /properties | Sets the default transcription properties for the user |
| GET | /properties | Returns the transcription properties, quotas and limits for the user |
| POST | /reports | Uploads a single report or a bulk (CSV) report |
| POST | /reports/media | Uploads one or more attachments to a single report |
| GET | /reports/{report_id} | Returns a single report |
| GET | /reports | Returns all the reports sent by the user |

 POST /auth

This method generates an authentication bearer token and an API key from a username and password. The token expires after a predetermined time interval; once it has expired, a new token must be requested.

| METHOD | RESOURCE |
| --- | --- |
| POST | /auth |

| HEADER | VALUE |
| --- | --- |
| Content-Type | application/x-www-form-urlencoded |

| PARAMETER | TYPE | DESCRIPTION |
| --- | --- | --- |
| username | String | Mandatory. Provided by Speech-i |
| password | String | Mandatory. Provided by Speech-i |

Successful example response:

{"Authorization": "Bearer xxx.yyy.zzz", "api_key": "9LbfF21YvVyondsH6rMPqA"}
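A minimal sketch of the /auth call using only the Python standard library might look as follows. The function names are hypothetical; the field names username, password, Authorization and api_key come from the tables and the example response above.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "https://developer.speech-i.com/asr/api/v1"


def build_auth_request(username, password):
    """Build (but do not send) the form-encoded POST /auth request."""
    body = urlencode({"username": username, "password": password}).encode("ascii")
    return Request(
        BASE_URL + "/auth",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def parse_auth_response(response_text):
    """Return (authorization_header_value, api_key) from the JSON response."""
    payload = json.loads(response_text)
    return payload["Authorization"], payload["api_key"]
```

To send the request, pass it to urllib.request.urlopen and feed the decoded response body to parse_auth_response. Because the token expires, a client would typically repeat this step when a call returns 401 Unauthorized.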

 POST /transcribe

Uploads a file, starts a sync transcription and returns the transcription.

The maximum allowed audio duration is 3 minutes. Use the async transcription methods for longer files.

| METHOD | RESOURCE |
| --- | --- |
| POST | /transcribe |

| HEADER | VALUE |
| --- | --- |
| Authorization* | Bearer xxx.yyy.zzz (retrieved from /auth) |
| Content-Type | multipart/form-data |

*The Authorization header is not required if api_key is specified as a URL parameter.

| PARAMETER | TYPE | DESCRIPTION |
| --- | --- | --- |
| file | File | Mandatory. Media file to transcribe |
| model | String | Mandatory. Language model to use |
| callback_uri | String | Optional. Notification callback: if specified, a POST request containing the transcript in the sttOutput field is sent to this URI |
| format | String | Optional. Transcription format (xml or json) |
| remove_noise_words | boolean | Optional. Specifies whether the transcription must contain noise words |
| diarize | boolean | Optional. Specifies whether the transcription must be diarized |
| nbests | int | Optional. Maximum number of recognition hypotheses |

Successful transcription examples can be found in the Transcription formats section.
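The multipart upload described above can be sketched with the Python standard library as shown below. This is only an illustration under stated assumptions: the helper names are hypothetical, the file part is sent as application/octet-stream, and only string-valued options are passed because this document does not specify how boolean parameters should be encoded.

```python
import uuid
from urllib.request import Request

BASE_URL = "https://developer.speech-i.com/asr/api/v1"


def encode_multipart(fields, filename, file_bytes):
    """Encode text fields plus one file part as multipart/form-data."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"'
            f"\r\n\r\n{value}\r\n".encode("utf-8")
        )
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    parts.append(head + file_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("ascii"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_transcribe_request(bearer, filename, file_bytes, model, **options):
    """Build (but do not send) a POST /transcribe request."""
    fields = {"model": model, **options}
    body, content_type = encode_multipart(fields, filename, file_bytes)
    return Request(
        BASE_URL + "/transcribe",
        data=body,
        headers={"Authorization": bearer, "Content-Type": content_type},
        method="POST",
    )
```

A call such as build_transcribe_request(token, "0.wav", audio_bytes, "en-GB_16k", format="json") yields a request that can be sent with urllib.request.urlopen.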

 POST /transcriptions

Uploads a file or a URL and starts the async transcription process.

Upload a URL

| METHOD | RESOURCE |
| --- | --- |
| POST | /transcriptions |

| HEADER | VALUE |
| --- | --- |
| Authorization* | Bearer xxx.yyy.zzz (retrieved from /auth) |
| Content-Type | application/x-www-form-urlencoded |

*The Authorization header is not required if api_key is specified as a URL parameter.

| PARAMETER | TYPE | DESCRIPTION |
| --- | --- | --- |
| url | String | Mandatory. URL of the media file to transcribe |
| model | String | Mandatory. Language model to use |
| callback_uri | String | Optional. Notification callback: if specified, a POST request containing the transcript in the sttOutput field is sent to this URI |
| format | String | Optional. Transcription format (xml or json) |
| remove_noise_words | boolean | Optional. Specifies whether the transcription must contain noise words |
| diarize | boolean | Optional. Specifies whether the transcription must be diarized |
| nbests | int | Optional. Maximum number of recognition hypotheses |

Successful example response:

{"transcription_id": 1325}
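Submitting a media URL for async transcription could be sketched as follows. The function names are hypothetical; the parameter names and the transcription_id field are those documented above.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "https://developer.speech-i.com/asr/api/v1"


def build_submit_url_request(bearer, media_url, model, callback_uri=None):
    """Build (but do not send) a form-encoded POST /transcriptions request."""
    params = {"url": media_url, "model": model}
    if callback_uri is not None:
        params["callback_uri"] = callback_uri
    return Request(
        BASE_URL + "/transcriptions",
        data=urlencode(params).encode("ascii"),
        headers={
            "Authorization": bearer,
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


def parse_transcription_id(response_text):
    """Extract the transcription id from a successful JSON response."""
    return json.loads(response_text)["transcription_id"]
```

The returned id is the value to substitute for {transcription_id} in the status and results resources.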

Upload a File

| METHOD | RESOURCE |
| --- | --- |
| POST | /transcriptions |

| HEADER | VALUE |
| --- | --- |
| Authorization* | Bearer xxx.yyy.zzz (retrieved from /auth) |
| Content-Type | multipart/form-data |

*The Authorization header is not required if api_key is specified as a URL parameter.

| PARAMETER | TYPE | DESCRIPTION |
| --- | --- | --- |
| file | File | Mandatory. Media file to transcribe |
| model | String | Mandatory. Language model to use |
| callback_uri | String | Optional. Notification callback: if specified, a POST request containing the transcript in the sttOutput field is sent to this URI |
| format | String | Optional. Transcription format (xml or json) |
| remove_noise_words | boolean | Optional. Specifies whether the transcription must contain noise words |
| diarize | boolean | Optional. Specifies whether the transcription must be diarized |
| nbests | int | Optional. Maximum number of recognition hypotheses |

Successful example response:

{"transcription_id": 1325}

 GET status

The method returns the status of a particular async transcription.

| METHOD | RESOURCE |
| --- | --- |
| GET | /transcriptions/{transcription_id}/status |

| HEADER | VALUE |
| --- | --- |
| Authorization* | Bearer xxx.yyy.zzz (retrieved from /auth) |

*The Authorization header is not required if api_key is specified as a URL parameter.

Successful example response:

{"status": "queued"}

Transcription status can be one of the following:

| STATUS | DESCRIPTION |
| --- | --- |
| queued | Audio is in the transcription queue |
| processing | Audio is being transcribed |
| completed | Transcription is completed |
| error | An error occurred during the transcription process |
| audio-error | An error occurred during the transcription process due to the audio file format |
| call-back-uri-error | An error occurred while sending the callback request |
| audio-uri-error | An error occurred while fetching audio from the provided URL |
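One way a client might use these statuses is to poll until the transcription leaves the queued and processing states. The sketch below is an assumption about client behaviour, not something the API prescribes: the function name, polling interval and poll limit are hypothetical, and every status other than queued or processing is treated as final.

```python
import time

# Statuses that mean the transcription is still in progress.
IN_PROGRESS = {"queued", "processing"}


def wait_for_completion(fetch_status, interval_s=5.0, max_polls=120, sleep=time.sleep):
    """Poll until the status is final and return it.

    `fetch_status` is any zero-argument callable returning the current status
    string, for example a function that calls
    GET /transcriptions/{transcription_id}/status and reads the "status" field.
    """
    for _ in range(max_polls):
        status = fetch_status()
        if status not in IN_PROGRESS:
            return status
        sleep(interval_s)
    raise TimeoutError("transcription did not reach a final status in time")
```

If a callback_uri was supplied when the transcription was started, polling may be unnecessary, since the transcript is posted to that URI.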

 GET /results

Returns the results of a particular async transcription.

| METHOD | RESOURCE |
| --- | --- |
| GET | /transcriptions/{transcription_id}/results |

| HEADER | VALUE |
| --- | --- |
| Authorization* | Bearer xxx.yyy.zzz (retrieved from /auth) |

*The Authorization header is not required if api_key is specified as a URL parameter.

Successful transcription examples can be found in the Transcription formats section.

 POST properties

Sets the default transcription properties for the user.

| METHOD | RESOURCE |
| --- | --- |
| POST | /properties |

| HEADER | VALUE |
| --- | --- |
| Authorization* | Bearer xxx.yyy.zzz (retrieved from /auth) |
| Content-Type | application/x-www-form-urlencoded |

*The Authorization header is not required if api_key is specified as a URL parameter.

| PARAMETER | TYPE | DESCRIPTION |
| --- | --- | --- |
| callback_uri | String | Optional. Notification callback: if specified, a POST request containing the transcript in the sttOutput field is sent to this URI |
| format | String | Optional. Transcription format (xml or json) |
| remove_noise_words | boolean | Optional. Specifies whether the transcription must contain noise words |
| diarize | boolean | Optional. Specifies whether the transcription must be diarized |
| nbests | int | Optional. Maximum number of recognition hypotheses |

Successful example response:

{"nbests": 2, "format": "json", "callback_uri": "http://www.example.com/callback.php", "remove_noise_words": true, "diarize": true}

 GET properties

Returns the transcription properties, quotas and limits for the user.

| METHOD | RESOURCE |
| --- | --- |
| GET | /properties |

| HEADER | VALUE |
| --- | --- |
| Authorization* | Bearer xxx.yyy.zzz (retrieved from /auth) |
| Content-Type | application/x-www-form-urlencoded |

*The Authorization header is not required if api_key is specified as a URL parameter.

Successful example response:

{"nbests": 2, "format": "json", "callback_uri": "http://www.example.com/callback.php", "remove_noise_words": true, "diarize": true, "duration_limit": 1000, "duration_left": 600}

 POST /reports

Uploads a single report or a bulk (CSV) report.

Upload a single report

| METHOD | RESOURCE |
| --- | --- |
| POST | /reports |

| HEADER | VALUE |
| --- | --- |
| Authorization* | Bearer xxx.yyy.zzz (retrieved from /auth) |
| Content-Type | application/x-www-form-urlencoded |

*The Authorization header is not required if api_key is specified as a URL parameter.

| PARAMETER | TYPE | DESCRIPTION |
| --- | --- | --- |
| model | String | Mandatory. Language model |
| input | String | Mandatory. What the speaker said (e.g. N Y U) |
| stt_output | String | Mandatory. What the system wrote (e.g. and why you) |
| expected_output | String | Mandatory. What was expected (e.g. NYU) |
| examples | String | Optional. Context examples: please provide some examples containing the sentence or word |
| comments | String | Optional. Comments and observations |
| media_ids | String | Optional. Comma-separated media ids (see /reports/media) |

Successful example response:

{"report_id": 18}

Upload a bulk CSV report

| METHOD | RESOURCE |
| --- | --- |
| POST | /reports |

| HEADER | VALUE |
| --- | --- |
| Authorization* | Bearer xxx.yyy.zzz (retrieved from /auth) |
| Content-Type | multipart/form-data |

*The Authorization header is not required if api_key is specified as a URL parameter.

| PARAMETER | TYPE | DESCRIPTION |
| --- | --- | --- |
| file | File | Mandatory. CSV report file. Use this CSV template (do not remove the header row) |
| model | String | Mandatory. Language model |

Media files cannot be attached to bulk CSV reports. To attach media files, use the single report upload.

Successful example response:

{"report_ids": "19,20,21"}

 POST /reports/media

Uploads one or more files to attach to a report. Call this method before POST /reports.

| METHOD | RESOURCE |
| --- | --- |
| POST | /reports/media |

| HEADER | VALUE |
| --- | --- |
| Authorization* | Bearer xxx.yyy.zzz (retrieved from /auth) |
| Content-Type | multipart/form-data |

*The Authorization header is not required if api_key is specified as a URL parameter.

| PARAMETER | TYPE | DESCRIPTION |
| --- | --- | --- |
| file | String | Mandatory. File to attach |
| tag | String | Optional. Tag for the file. Please use "REPORT_VOICE" for voice samples and "REPORT_MEDIA" for other media types |

Successful example response:

{ "id": 1, "filename": "test.mp3", "url": "", "tag": "REPORT_VOICE", "duration": 0, "upload_time": "2020-06-03T18:41:05.923" }
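Since media must be uploaded before the report that references it, a client would first call POST /reports/media, collect the returned ids, and then pass them as media_ids to POST /reports. The sketch below covers the second step only; the function name is hypothetical, and the field names are those documented for the single report upload.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "https://developer.speech-i.com/asr/api/v1"


def build_report_request(bearer, model, spoken, stt_output, expected_output,
                         media_responses=()):
    """Build (but do not send) a single-report POST /reports request.

    `media_responses` holds the JSON bodies returned by earlier
    POST /reports/media calls; their "id" values become media_ids.
    """
    params = {
        "model": model,
        "input": spoken,
        "stt_output": stt_output,
        "expected_output": expected_output,
    }
    ids = [str(json.loads(body)["id"]) for body in media_responses]
    if ids:
        params["media_ids"] = ",".join(ids)  # comma-separated, as documented
    return Request(
        BASE_URL + "/reports",
        data=urlencode(params).encode("ascii"),
        headers={
            "Authorization": bearer,
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )
```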

 GET report

The method returns the details of a particular report.

| METHOD | RESOURCE |
| --- | --- |
| GET | /reports/{report_id} |

| HEADER | VALUE |
| --- | --- |
| Authorization* | Bearer xxx.yyy.zzz (retrieved from /auth) |

*The Authorization header is not required if api_key is specified as a URL parameter.

Successful example response:

{
    "id": 1,
    "input": "test input",
    "stt_output": "test stt_output",
    "expected_output": "test expected_output",
    "examples": "test examples",
    "comments": "test comments",
    "status": "new",
    "answer": null,
    "upload_time": "2020-06-03T18:20:50",
    "estimated_resolution_time": null,
    "update_time": null
}

 GET reports

The method returns all the reports sent by the user.

| METHOD | RESOURCE |
| --- | --- |
| GET | /reports |

| HEADER | VALUE |
| --- | --- |
| Authorization* | Bearer xxx.yyy.zzz (retrieved from /auth) |

*The Authorization header is not required if api_key is specified as a URL parameter.

Successful example response:

[
    {
        "id": 1,
        "input": "test1 input",
        "stt_output": "test1 stt_output",
        "expected_output": "test1 expected_output",
        "examples": "test1 examples",
        "comments": "test1 comments",
        "status": "new",
        "answer": null,
        "upload_time": "2020-06-03T18:20:50",
        "estimated_resolution_time": null,
        "update_time": null
    },
    {
        "id": 2,
        "input": "test2 input",
        "stt_output": "test2 stt_output",
        "expected_output": "test2 expected_output",
        "examples": "test2 examples",
        "comments": "test2 comments",
        "status": "new",
        "answer": null,
        "upload_time": "2020-06-03T18:20:50",
        "estimated_resolution_time": null,
        "update_time": null
    }
]

 Language models

The following table shows supported language models (other languages are available on request):

| MODEL | DESCRIPTION |
| --- | --- |
| en-GB_16k | British English language model 16kHz |
| en-US_16k | American English language model 16kHz |
| en-IE_16k | Irish English language model 16kHz |
| fr-FR_16k | French language model 16kHz |
| de-DE_16k | German language model 16kHz |
| it-IT_16k | Italian language model 16kHz |
| es-ES_16k | Spanish language model 16kHz |
| pl-PL_16k | Polish language model 16kHz |
| nl-NL_16k | Dutch language model 16kHz |
| pt-PT_16k | European Portuguese language model 16kHz |
| pt-BR_16k | Brazilian Portuguese language model 16kHz |
| el-EL_16k | Greek language model 16kHz |
| ro-RO_16k | Romanian language model 16kHz |
| sl-SL_16k | Slovenian language model 16kHz |
| sk-SK_16k | Slovak language model 16kHz |
| cs-CS_16k | Czech language model 16kHz |
| lt-LT_16k | Lithuanian language model 16kHz |
| bg-BG_16k | Bulgarian language model 16kHz |
| hr-HR_16k | Croatian language model 16kHz |
| hu-HU_16k | Hungarian language model 16kHz |
| fi-FI_16k | Finnish language model 16kHz |
| sv-SV_16k | Swedish language model 16kHz |
| uk-UK_16k | Ukrainian language model 16kHz |
| ru-RU_16k | Russian language model 16kHz |
| zh-ZH_16k | Chinese language model 16kHz |
| ar-AR_16k | Arabic language model 16kHz |
| da-DA_16k | Danish language model 16kHz |
| mt-MT_16k | Maltese language model 16kHz |
| ko-KR_16k | Korean language model 16kHz |
| fa-IR_16k | Persian language model 16kHz |
| et-EE_16k | Estonian language model 16kHz |
| lv-LV_16k | Latvian language model 16kHz |
| ga-IE_16k | Irish language model 16kHz |
| sq-AL_16k | Albanian language model 16kHz |

 HTTP status codes

The following table lists the HTTP response status codes and their descriptions:

| STATUS | DESCRIPTION |
| --- | --- |
| 200 OK | Standard response for successful HTTP requests |
| 400 Bad Request | The server cannot process the request due to an apparent client error (e.g. missing parameters) |
| 401 Unauthorized | The authentication bearer token is not valid or was not provided |
| 403 Forbidden | The request cannot be authorized (e.g. quota limit exceeded) |
| 404 Not Found | The requested resource was not found |
| 500 Internal Server Error | Generic server error. Please retry later, or contact Speech-i support if the problem persists |

For 4xx status codes, the response may contain an error description.

Unsuccessful example response:

{"error": "invalid format"}
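A client could turn these status codes into exceptions along the following lines. This is a sketch under assumptions: the exception class and function names are hypothetical, the success body is assumed to be JSON (which does not hold for transcriptions requested in XML format), and the error field is read only when present, since the text above says the response may contain it.

```python
import json


class SpeechApiError(Exception):
    """Raised for any response whose HTTP status code is not 200."""

    def __init__(self, status, message):
        super().__init__(f"HTTP {status}: {message}")
        self.status = status
        self.message = message


def check_response(status, body_text):
    """Return the decoded JSON body on 200, otherwise raise SpeechApiError."""
    if status == 200:
        return json.loads(body_text)
    message = "no error detail provided"
    try:
        payload = json.loads(body_text)
        if isinstance(payload, dict) and "error" in payload:
            message = payload["error"]
    except ValueError:
        # Body was empty or not JSON; keep the generic message.
        pass
    raise SpeechApiError(status, message)
```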

 Transcription formats

The transcription can be produced either in JSON or XML format.

JSON example

{
  "duration": 0.99,
  "media": "0.wav",
  "model": "en-UK_16k",
  "transcription_id": "26",
  "results": [
    {
      "hypotheses": [
        {
          "transcript": "The mountain is there",
          "conf": 0.887,
          "word-alignment": [
            {
              "start": 0,
              "end": 0.48,
              "word": "The",
              "conf": 0.496
            },
            {
              "start": 0.48,
              "end": 0.81,
              "word": "mountain",
              "conf": 0.794
            },
            {
              "start": 0.81,
              "end": 0.93,
              "word": "is",
              "conf": 0.789
            },
            {
              "start": 0.93,
              "end": 0.99,
              "word": "there",
              "conf": 0.871
            }
          ]
        }
      ],
      "speaker-id": "S1",
      "segment-start": 0,
      "segment-length": 0.99
    }
  ]
}
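To illustrate how the JSON structure above might be consumed, the following sketch extracts one transcript per segment. The function name is hypothetical, and because this document does not state how hypotheses are ordered when nbests is greater than 1, the sketch simply picks the hypothesis with the highest conf value.

```python
def best_transcripts(transcription):
    """Return a list of (speaker_id, segment_start, transcript) tuples.

    `transcription` is the decoded JSON document (a dict) in the format
    shown in the example above.
    """
    segments = []
    for segment in transcription["results"]:
        hypotheses = segment["hypotheses"]
        if not hypotheses:
            continue  # nothing recognised in this segment
        best = max(hypotheses, key=lambda h: h["conf"])
        segments.append(
            (segment.get("speaker-id"), segment["segment-start"], best["transcript"])
        )
    return segments
```

Word-level timings are available in the same way through the word-alignment list of each hypothesis.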

XML example

<?xml  version='1.0'  encoding='UTF-8'?>
<annotated-call>
    <call-data>
        <callid>20</callid>
        <agentid  channel="0">1</agentid>
        <url/>
        <call-back-uri/>
    </call-data>
    <decoder-options>
        <model>en-UK_16k</model>
        <code-page>UTF-8</code-page>
        <nbest>1</nbest>
        <out-type>xml-c85</out-type>
    </decoder-options>
    <annotation>
        <type  channel="0"  id="transcription"  nbest="1"  reading="0">
            <sentence  start="0"  speakerID="S1"  end="3630">
                <item  start="0"  end="480">The</item>
                <item  start="480"  end="810">mountain</item>
                <item  start="810"  end="930">is</item>
                <item  start="930"  end="990">there</item>
            </sentence>
        </type>
    </annotation>
</annotated-call>
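The XML output can be read with the Python standard library, as in the sketch below. The function name is hypothetical. The start and end attributes are returned as integers exactly as they appear; comparing them with the JSON example suggests they are milliseconds, but this document does not state the unit.

```python
import xml.etree.ElementTree as ET


def sentences_from_xml(xml_bytes):
    """Return one dict per <sentence> with its speaker, text and word timings.

    `xml_bytes` is the raw response body in the XML format shown above.
    """
    root = ET.fromstring(xml_bytes)
    sentences = []
    for sentence in root.iter("sentence"):
        words = [
            (item.text, int(item.get("start")), int(item.get("end")))
            for item in sentence.iter("item")
        ]
        sentences.append(
            {
                "speaker": sentence.get("speakerID"),
                "text": " ".join(word for word, _, _ in words),
                "words": words,
            }
        )
    return sentences
```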