Speech-i

The Speech-i transcription platform exposes REST APIs to upload media files and retrieve the resulting transcriptions.

All API calls except /auth require either a bearer token in the Authorization header or an API key passed as the api_key URL parameter. Both the bearer token and the API key are returned by the /auth method in exchange for valid access credentials.

Successful API calls return HTTP status code 200 and JSON-formatted output. The HTTP status codes section describes the responses for unsuccessful calls.

The base URL for all API calls is https://developer.speech-i.com/asr/api/v1
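
The two authentication options can be illustrated with a small Python sketch that uses only the standard library. The helper name build_request is hypothetical, and the sketch assumes the bearer value is passed exactly as returned by /auth (including the "Bearer " prefix). It only builds the request; sending it requires network access and valid credentials.

```python
import urllib.parse
import urllib.request

BASE_URL = "https://developer.speech-i.com/asr/api/v1"


def build_request(path, *, token=None, api_key=None, data=None,
                  headers=None, method=None):
    """Build (but do not send) a request to a Speech-i endpoint.

    Pass exactly one of:
      token:   the full Authorization value returned by /auth ("Bearer ...")
      api_key: the api_key returned by /auth, sent as a URL parameter
    """
    if (token is None) == (api_key is None):
        raise ValueError("pass exactly one of token or api_key")
    url = BASE_URL + path
    all_headers = dict(headers or {})
    if api_key is not None:
        # With the api_key URL parameter, no Authorization header is needed.
        url += "?" + urllib.parse.urlencode({"api_key": api_key})
    else:
        all_headers["Authorization"] = token
    return urllib.request.Request(url, data=data, headers=all_headers,
                                  method=method)


# Example: a status check authenticated via the URL parameter.
status_req = build_request("/transcriptions/1325/status", api_key="YOUR_API_KEY")
# urllib.request.urlopen(status_req) would send it.
```

Requiring exactly one credential keeps a request from accidentally carrying both an expired token and a valid key, which would make authentication failures harder to diagnose.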

METHOD	RESOURCE	DESCRIPTION
POST	/auth	Generates an authentication bearer token
POST	/transcribe	Upload a file, run a synchronous transcription and receive the transcription
POST	/transcriptions	Upload a file or URL and start an asynchronous transcription
GET	/transcriptions/{transcription_id}/status	Returns the status of a particular async transcription
GET	/transcriptions/{transcription_id}/results	Returns the results of a particular async transcription
POST	/properties	Sets the default transcription properties for the user
GET	/properties	Returns the transcription properties, quotas and limits for the user
POST	/reports	Upload a single or bulk (CSV) report
POST	/reports/media	Upload one or more attachments to a single report
GET	/reports/{report_id}	Returns a single report
GET	/reports	Returns all the reports sent by the user

POST /auth

This method generates an authentication bearer token and an API key from a username and password. The token expires after a predetermined time interval; once it has expired, request a new one.

METHOD	RESOURCE
POST	/auth

HEADER	VALUE
Content-Type	application/x-www-form-urlencoded

PARAMETER	TYPE	DESCRIPTION
username	String	Mandatory parameter provided by Speech-i
password	String	Mandatory parameter provided by Speech-i

Successful example response:

{"Authorization": "Bearer xxx.yyy.zzz", "api_key": "9LbfF21YvVyondsH6rMPqA"}
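
A minimal Python sketch of the /auth exchange, assuming the credentials are sent as the form-urlencoded body described above. The function names are hypothetical; the response fields Authorization and api_key are taken from the example response.

```python
import json
import urllib.parse
import urllib.request

AUTH_URL = "https://developer.speech-i.com/asr/api/v1/auth"


def build_auth_request(username, password):
    """Build the POST /auth request with a form-urlencoded body."""
    body = urllib.parse.urlencode(
        {"username": username, "password": password}
    ).encode("ascii")
    return urllib.request.Request(
        AUTH_URL,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def parse_auth_response(raw):
    """Return (authorization_header_value, api_key) from the /auth JSON."""
    payload = json.loads(raw)
    return payload["Authorization"], payload["api_key"]


# Sending it requires network access and valid credentials:
# with urllib.request.urlopen(build_auth_request("user", "secret")) as resp:
#     auth_header, api_key = parse_auth_response(resp.read())
```

Because the token expires, a long-running client would call these again when a request starts returning 401.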

POST /transcribe

Uploads a file, runs a synchronous transcription and returns the transcription in the response.

The maximum allowed audio duration is 3 minutes. Use the asynchronous transcription methods (POST /transcriptions) for longer files.

METHOD	RESOURCE
POST	/transcribe

HEADER	VALUE
Authorization*	Bearer xxx.yyy.zzz (retrieved in /auth)
Content-Type	multipart/form-data

*Authorization header is not required if api_key is specified as URL parameter

PARAMETER	TYPE	DESCRIPTION
file	File	Mandatory media file to transcribe
model	String	Mandatory language model to use
callback_uri	String	Optional notification callback. If specified, a POST request will be sent to this URI including the transcript as `sttOutput` field
format	String	Optional transcription format (xml or json)
remove_noise_words	boolean	Optional flag specifying whether noise words are removed from the transcription
diarize	boolean	Optional flag specifying whether the transcription must be diarized (speaker-labelled)
nbests	int	Optional maximum number of recognition hypotheses to return

Examples of successful transcriptions are shown in the Transcription formats section.
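
Because /transcribe expects multipart/form-data, the request body has to be encoded accordingly. The sketch below does this with the Python standard library only. The helper names are hypothetical, the file part is sent as application/octet-stream, and booleans are sent as the strings "true" and "false"; this page does not specify the expected boolean encoding, so treat that as an assumption.

```python
import urllib.request
import uuid

TRANSCRIBE_URL = "https://developer.speech-i.com/asr/api/v1/transcribe"


def encode_multipart(fields, file_field, filename, file_bytes):
    """Encode text fields plus one file part as multipart/form-data.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        if isinstance(value, bool):
            # Assumption: the API accepts booleans as "true"/"false".
            value = "true" if value else "false"
        parts.append(f"--{boundary}\r\n".encode())
        parts.append(f'Content-Disposition: form-data; name="{name}"\r\n\r\n'.encode())
        parts.append(f"{value}\r\n".encode())
    # The media file itself, sent as generic binary data.
    parts.append(f"--{boundary}\r\n".encode())
    parts.append((f'Content-Disposition: form-data; name="{file_field}"; '
                  f'filename="{filename}"\r\n').encode())
    parts.append(b"Content-Type: application/octet-stream\r\n\r\n")
    parts.append(file_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_transcribe_request(token, model, filename, audio_bytes, **options):
    """Build POST /transcribe; options are the optional table parameters."""
    body, content_type = encode_multipart({"model": model, **options},
                                          "file", filename, audio_bytes)
    return urllib.request.Request(
        TRANSCRIBE_URL,
        data=body,
        headers={"Authorization": token, "Content-Type": content_type},
        method="POST",
    )
```

Third-party libraries such as requests can build the same body through their files= argument; the hand-written encoder only keeps the sketch dependency-free. The same encoding applies to the file variant of POST /transcriptions.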

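POST /transcriptions

Uploads a file or a URL and starts an asynchronous transcription. Both variants return a transcription_id, which identifies the transcription in the status and results methods.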

Upload a URL

METHOD	RESOURCE
POST	/transcriptions

HEADER	VALUE
Authorization*	Bearer xxx.yyy.zzz (retrieved in /auth)
Content-Type	application/x-www-form-urlencoded

*Authorization header is not required if api_key is specified as URL parameter

PARAMETER	TYPE	DESCRIPTION
url	String	Mandatory URL of the media file to transcribe
model	String	Mandatory language model to use
callback_uri	String	Optional notification callback. If specified, a POST request will be sent to this URI including the transcript as `sttOutput` field
format	String	Optional transcription format (xml or json)
remove_noise_words	boolean	Optional flag specifying whether noise words are removed from the transcription
diarize	boolean	Optional flag specifying whether the transcription must be diarized (speaker-labelled)
nbests	int	Optional maximum number of recognition hypotheses to return

Successful example response:

{"transcription_id": 1325}
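
The URL variant needs no multipart encoding. This Python sketch (hypothetical helper names, standard library only) authenticates with the api_key URL parameter instead of the Authorization header and reads transcription_id from the response. Sending booleans as "true"/"false" is an assumption.

```python
import json
import urllib.parse
import urllib.request

TRANSCRIPTIONS_URL = "https://developer.speech-i.com/asr/api/v1/transcriptions"


def build_url_transcription_request(api_key, media_url, model, **options):
    """POST /transcriptions with a media URL, authenticated via api_key."""
    form = {"url": media_url, "model": model}
    for name, value in options.items():
        if isinstance(value, bool):
            value = "true" if value else "false"  # assumed boolean encoding
        form[name] = value
    query = urllib.parse.urlencode({"api_key": api_key})
    return urllib.request.Request(
        f"{TRANSCRIPTIONS_URL}?{query}",
        data=urllib.parse.urlencode(form).encode("ascii"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def parse_transcription_id(raw):
    """Extract the ID used by the status and results methods."""
    return json.loads(raw)["transcription_id"]
```

Keep the returned ID: it is the only handle for the status and results calls.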

Upload a File

METHOD	RESOURCE
POST	/transcriptions

HEADER	VALUE
Authorization*	Bearer xxx.yyy.zzz (retrieved in /auth)
Content-Type	multipart/form-data

*Authorization header is not required if api_key is specified as URL parameter

PARAMETER	TYPE	DESCRIPTION
file	File	Mandatory media file to transcribe
model	String	Mandatory language model to use
callback_uri	String	Optional notification callback. If specified, a POST request will be sent to this URI including the transcript as `sttOutput` field
format	String	Optional transcription format (xml or json)
remove_noise_words	boolean	Optional flag specifying whether noise words are removed from the transcription
diarize	boolean	Optional flag specifying whether the transcription must be diarized (speaker-labelled)
nbests	int	Optional maximum number of recognition hypotheses to return

Successful example response:

{"transcription_id": 1325}

GET /transcriptions/{transcription_id}/status

This method returns the status of a particular asynchronous transcription.

METHOD	RESOURCE
GET	/transcriptions/{transcription_id}/status

HEADER	VALUE
Authorization*	Bearer xxx.yyy.zzz (retrieved in /auth)

*Authorization header is not required if api_key is specified as URL parameter

Successful example response:

{"status": "queued"}

Transcription status can be one of the following:

STATUS	DESCRIPTION
queued	Audio is in the transcription queue
processing	Audio is being transcribed
completed	Transcription is complete
error	An error occurred during transcription
audio-error	An error occurred during transcription due to the audio file format
call-back-uri-error	An error occurred while sending the callback request
audio-uri-error	An error occurred while fetching the audio from the provided URL
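
Without a callback_uri, a client typically polls the status method until the transcription leaves the queued and processing states. The sketch below is one way to do that in Python. fetch_status is a hypothetical caller-supplied function that performs GET /transcriptions/{transcription_id}/status and returns the status string; the interval and timeout defaults are arbitrary choices, not documented limits.

```python
import time

# Non-final statuses from the table above; every other value is treated as final.
PENDING_STATUSES = {"queued", "processing"}


def wait_for_transcription(fetch_status, interval=5.0, timeout=600.0,
                           sleep=time.sleep, clock=time.monotonic):
    """Poll until the status is no longer pending and return it.

    fetch_status: hypothetical callable performing the status request and
    returning its "status" value.
    Raises TimeoutError if still pending after `timeout` seconds.
    """
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        if status not in PENDING_STATUSES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"transcription still {status!r} after {timeout} s")
        sleep(interval)
```

A completed status means the results can be fetched; how to treat call-back-uri-error, where only the callback failed, is left to the caller. The sleep and clock parameters exist so the loop can be tested without waiting; setting callback_uri avoids polling altogether.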

GET /transcriptions/{transcription_id}/results

This method returns the results of a particular asynchronous transcription.

METHOD	RESOURCE
GET	/transcriptions/{transcription_id}/results

HEADER	VALUE
Authorization*	Bearer xxx.yyy.zzz (retrieved in /auth)

*Authorization header is not required if api_key is specified as URL parameter

Examples of successful transcriptions are shown in the Transcription formats section.

POST /properties

Sets the default transcription properties for the user.

METHOD	RESOURCE
POST	/properties

HEADER	VALUE
Authorization*	Bearer xxx.yyy.zzz (retrieved in /auth)
Content-Type	application/x-www-form-urlencoded

*Authorization header is not required if api_key is specified as URL parameter

PARAMETER	TYPE	DESCRIPTION
callback_uri	String	Optional notification callback. If specified, a POST request will be sent to this URI including the transcript as `sttOutput` field
format	String	Optional transcription format (xml or json)
remove_noise_words	boolean	Optional flag specifying whether noise words are removed from the transcription
diarize	boolean	Optional flag specifying whether the transcription must be diarized (speaker-labelled)
nbests	int	Optional maximum number of recognition hypotheses to return

Successful example response:

{"nbests": 2, "format": "json", "callback_uri": "http://www.example.com/callback.php", "remove_noise_words": true, "diarize": true}
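
A Python sketch of the form-urlencoded body for POST /properties. encode_properties is a hypothetical helper: it rejects parameter names not listed in the table, checks the format value, omits properties that are not given, and assumes booleans are sent as "true"/"false".

```python
import urllib.parse

ALLOWED_PROPERTIES = {"callback_uri", "format", "remove_noise_words",
                      "diarize", "nbests"}


def encode_properties(**props):
    """Form-encode the given default properties (all are optional)."""
    unknown = set(props) - ALLOWED_PROPERTIES
    if unknown:
        raise ValueError(f"unknown properties: {sorted(unknown)}")
    if props.get("format") not in (None, "xml", "json"):
        raise ValueError("format must be 'xml' or 'json'")
    form = {}
    for name, value in props.items():
        if value is None:
            continue  # not given: omitted from the request
        if isinstance(value, bool):
            value = "true" if value else "false"  # assumed boolean encoding
        form[name] = value
    return urllib.parse.urlencode(form)
```

Validating locally catches typos such as "diarise" before they reach the server, where the page does not say how unknown parameters are handled.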

GET /properties

Returns the transcription properties, quotas and limits for the user.

METHOD	RESOURCE
GET	/properties

HEADER	VALUE
Authorization*	Bearer xxx.yyy.zzz (retrieved in /auth)
Content-Type	application/x-www-form-urlencoded

*Authorization header is not required if api_key is specified as URL parameter

Successful example response:

{"nbests": 2, "format": "json", "callback_uri": "http://www.example.com/callback.php", "remove_noise_words": true, "diarize": true, "duration_limit": 1000, "duration_left": 600}

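POST /reports

Uploads a single report or a bulk (CSV) report. A report describes a recognition problem: what the speaker said, what the system transcribed and what was expected.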

Upload a single report

METHOD	RESOURCE
POST	/reports

HEADER	VALUE
Authorization*	Bearer xxx.yyy.zzz (retrieved in /auth)
Content-Type	application/x-www-form-urlencoded

*Authorization header is not required if api_key is specified as URL parameter

PARAMETER	TYPE	DESCRIPTION
model	String	Mandatory language model
input	String	Mandatory what the speaker said (e.g. N Y U)
stt_output	String	Mandatory what the system wrote (e.g. and why you)
expected_output	String	Mandatory what was expected (e.g. NYU)
examples	String	Optional context examples. Please provide some examples containing the sentence or word
comments	String	Optional comments and observations
media_ids	String	Optional comma-separated media IDs, as returned by POST /reports/media

Successful example response:

{"report_id": 18}
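
The single-report form can be encoded as follows in Python. encode_single_report is a hypothetical helper; optional fields are sent only when given, and media_ids are joined into the comma-separated string the parameter expects.

```python
import urllib.parse


def encode_single_report(model, input_text, stt_output, expected_output,
                         examples=None, comments=None, media_ids=()):
    """Form-encode a single report for POST /reports."""
    form = {
        "model": model,
        "input": input_text,          # what the speaker said
        "stt_output": stt_output,     # what the system wrote
        "expected_output": expected_output,
    }
    if examples:
        form["examples"] = examples
    if comments:
        form["comments"] = comments
    if media_ids:
        form["media_ids"] = ",".join(str(media_id) for media_id in media_ids)
    return urllib.parse.urlencode(form)
```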

Upload a bulk CSV report

METHOD	RESOURCE
POST	/reports

HEADER	VALUE
Authorization*	Bearer xxx.yyy.zzz (retrieved in /auth)
Content-Type	multipart/form-data

*Authorization header is not required if api_key is specified as URL parameter

PARAMETER	TYPE	DESCRIPTION
file	File	Mandatory CSV report file. Use this CSV template (do not remove the header row)
model	String	Mandatory language model

Media files cannot be attached to bulk CSV reports. To attach media, use the single report upload.

Successful example response:

{"report_ids": "19,20,21"}
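
The bulk upload returns the new IDs as one comma-separated string rather than a JSON array, so clients need to split it. A minimal Python sketch (parse_report_ids is a hypothetical name); it assumes a single-row upload returns the ID without commas, which this page does not show, and tolerates a bare number as well.

```python
import json


def parse_report_ids(raw):
    """Split the comma-separated report_ids value into integers."""
    value = str(json.loads(raw)["report_ids"])
    return [int(part) for part in value.split(",") if part.strip()]
```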

POST /reports/media

Uploads one or more files to attach to a report. Call this method before POST /reports, then pass the returned id values in the media_ids parameter.

METHOD	RESOURCE
POST	/reports/media

HEADER	VALUE
Authorization*	Bearer xxx.yyy.zzz (retrieved in /auth)
Content-Type	multipart/form-data

*Authorization header is not required if api_key is specified as URL parameter

PARAMETER	TYPE	DESCRIPTION
file	File	Mandatory file to attach
tag	String	Optional tag for the file. Please use "REPORT_VOICE" for voice samples, "REPORT_MEDIA" for other media types

Successful example response:

{
    "id": 1,
    "filename": "test.mp3",
    "url": "",
    "tag": "REPORT_VOICE",
    "duration": 0,
    "upload_time": "2020-06-03T18:41:05.923"
}
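
When several files are attached, each upload returns its own id. Assuming one POST /reports/media call per file, the IDs can be collected into the media_ids value for POST /reports as sketched below in Python (media_ids_param is a hypothetical helper).

```python
import json


def media_ids_param(responses):
    """Join the "id" of each POST /reports/media response for media_ids."""
    return ",".join(str(json.loads(raw)["id"]) for raw in responses)
```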

GET /reports/{report_id}

This method returns the details of a particular report.

METHOD	RESOURCE
GET	/reports/{report_id}

HEADER	VALUE
Authorization*	Bearer xxx.yyy.zzz (retrieved in /auth)

*Authorization header is not required if api_key is specified as URL parameter

Successful example response:

{
    "id": 1,
    "input": "test input",
    "stt_output": "test stt_output",
    "expected_output": "test expected_output",
    "examples": "test examples",
    "comments": "test comments",
    "status": "new",
    "answer": null,
    "upload_time": "2020-06-03T18:20:50",
    "estimated_resolution_time": null,
    "update_time": null
}

GET /reports

This method returns all the reports sent by the user.

METHOD	RESOURCE
GET	/reports

HEADER	VALUE
Authorization*	Bearer xxx.yyy.zzz (retrieved in /auth)

*Authorization header is not required if api_key is specified as URL parameter

Successful example response:

[
	{
	"id": 1,
	"input": "test1 input",
	"stt_output": "test1 stt_output",
	"expected_output": "test1 expected_output",
	"examples": "test1 examples",
	"comments": "test1 comments",
	"status": "new",
	"answer": null,
	"upload_time": "2020-06-03T18:20:50",
	"estimated_resolution_time": null,
	"update_time": null
	},
	{
	"id": 2,
	"input": "test2 input",
	"stt_output": "test2 stt_output",
	"expected_output": "test2 expected_output",
	"examples": "test2 examples",
	"comments": "test2 comments",
	"status": "new",
	"answer": null,
	"upload_time": "2020-06-03T18:20:50",
	"estimated_resolution_time": null,
	"update_time": null
	}
]
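
Since GET /reports returns a plain JSON array, a client can summarise it locally, for example to count reports per status and list those that already carry an answer. A Python sketch with a hypothetical function name; "new" is the only status value shown on this page.

```python
import json
from collections import Counter


def summarize_reports(raw):
    """Count reports per status and list the IDs of answered reports."""
    reports = json.loads(raw)
    per_status = Counter(report["status"] for report in reports)
    answered = [report["id"] for report in reports
                if report.get("answer") is not None]
    return per_status, answered
```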

Language models

The following table lists the supported language models (other languages are available on request):

MODEL	DESCRIPTION
en-GB_16k	British English language model 16kHz
en-US_16k	American English language model 16kHz
en-IE_16k	Irish English language model 16kHz
fr-FR_16k	French language model 16kHz
de-DE_16k	German language model 16kHz
it-IT_16k	Italian language model 16kHz
es-ES_16k	Spanish language model 16kHz
pl-PL_16k	Polish language model 16kHz
nl-NL_16k	Dutch language model 16kHz
pt-PT_16k	European Portuguese language model 16kHz
pt-BR_16k	Brazilian Portuguese language model 16kHz
el-EL_16k	Greek language model 16kHz
ro-RO_16k	Romanian language model 16kHz
sl-SL_16k	Slovenian language model 16kHz
sk-SK_16k	Slovak language model 16kHz
cs-CS_16k	Czech language model 16kHz
lt-LT_16k	Lithuanian language model 16kHz
bg-BG_16k	Bulgarian language model 16kHz
hr-HR_16k	Croatian language model 16kHz
hu-HU_16k	Hungarian language model 16kHz
fi-FI_16k	Finnish language model 16kHz
sv-SV_16k	Swedish language model 16kHz
uk-UK_16k	Ukrainian language model 16kHz
ru-RU_16k	Russian language model 16kHz
zh-ZH_16k	Chinese language model 16kHz
ar-AR_16k	Arabic language model 16kHz
da-DA_16k	Danish language model 16kHz
mt-MT_16k	Maltese language model 16kHz
ko-KR_16k	Korean language model 16kHz
fa-IR_16k	Persian language model 16kHz
et-EE_16k	Estonian language model 16kHz
lv-LV_16k	Latvian language model 16kHz
ga-IE_16k	Irish language model 16kHz
sq-AL_16k	Albanian language model 16kHz

HTTP status codes

The following table lists the HTTP response status codes and their meaning:

STATUS	DESCRIPTION
200 OK	Standard response for successful HTTP requests
400 Bad Request	The server cannot process the request due to an apparent client error (e.g. missing parameters)
401 Unauthorized	The authentication bearer token is missing or invalid
403 Forbidden	The request cannot be authorized (e.g. quota limit exceeded)
404 Not Found	The requested resource is not found
500 Internal Server Error	Generic server error message. Please retry later or contact Speech-i support if the problem persists

For 40x status codes, the response body may contain an error description.

Unsuccessful example response:

{"error": "invalid format"}
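
A Python sketch for surfacing these errors with urllib; describe_error and call are hypothetical helpers. The error field is read only for 4xx responses, matching the note above, and bodies that are not the documented {"error": ...} object fall back to a generic message.

```python
import json
import urllib.error
import urllib.request


def describe_error(status, body):
    """Build a readable message from an unsuccessful response."""
    detail = None
    if 400 <= status < 500 and body:
        try:
            detail = json.loads(body).get("error")
        except (ValueError, AttributeError):
            detail = None  # not the documented {"error": ...} object
    return f"HTTP {status}: {detail or 'no error details'}"


def call(request):
    """Send a prepared request; return decoded JSON or raise with details."""
    try:
        with urllib.request.urlopen(request) as response:
            return json.loads(response.read())
    except urllib.error.HTTPError as exc:
        raise RuntimeError(describe_error(exc.code, exc.read())) from exc
```

Per the table, a 500 is worth retrying later, while 400, 401 and 403 usually need a change on the client side (parameters, token or quota) before retrying.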

Transcription formats

The transcription can be produced in either JSON or XML format.

JSON example

{
  "duration": 0.99,
  "media": "0.wav",
  "model": "en-UK_16k",
  "transcription_id": "26",
  "results": [
    {
      "hypotheses": [
        {
          "transcript": "The mountain is there",
          "conf": 0.887,
          "word-alignment": [
            {
              "start": 0,
              "end": 0.48,
              "word": "The",
              "conf": 0.496
            },
            {
              "start": 0.48,
              "end": 0.81,
              "word": "mountain",
              "conf": 0.794
            },
            {
              "start": 0.81,
              "end": 0.93,
              "word": "is",
              "conf": 0.789
            },
            {
              "start": 0.93,
              "end": 0.99,
              "word": "there",
              "conf": 0.871
            }
          ]
        }
      ],
      "speaker-id": "S1",
      "segment-start": 0,
      "segment-length": 0.99
    }
  ]
}
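
A Python sketch that walks the JSON format and returns, per segment, the speaker, start time, transcript and word timings. It picks the hypothesis with the highest conf because this page does not say whether hypotheses are already sorted; best_transcripts is a hypothetical name. Times in this format appear to be in seconds.

```python
import json


def best_transcripts(raw):
    """Yield (speaker, segment_start, transcript, words) for each segment.

    words is a list of (word, start, end) from the chosen hypothesis.
    """
    document = json.loads(raw)
    for segment in document["results"]:
        # Choose the most confident hypothesis (ordering is not documented).
        best = max(segment["hypotheses"], key=lambda hyp: hyp["conf"])
        words = [(item["word"], item["start"], item["end"])
                 for item in best.get("word-alignment", [])]
        yield (segment.get("speaker-id"), segment["segment-start"],
               best["transcript"], words)
```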

XML example

<?xml version='1.0' encoding='UTF-8'?>
<annotated-call>
    <call-data>
        <callid>20</callid>
        <agentid channel="0">1</agentid>
        <url/>
        <call-back-uri/>
    </call-data>
    <decoder-options>
        <model>en-UK_16k</model>
        <code-page>UTF-8</code-page>
        <nbest>1</nbest>
        <out-type>xml-c85</out-type>
    </decoder-options>
    <annotation>
        <type channel="0" id="transcription" nbest="1" reading="0">
            <sentence start="0" speakerID="S1" end="3630">
                <item start="0" end="480">The</item>
                <item start="480" end="810">mountain</item>
                <item start="810" end="930">is</item>
                <item start="930" end="990">there</item>
            </sentence>
        </type>
    </annotation>
</annotated-call>
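
The XML format carries the same word timings, apparently in milliseconds: 480 in the XML example corresponds to 0.48 in the JSON example, an inference from the two examples rather than a documented rule. A Python sketch using xml.etree.ElementTree that converts them to seconds; xml_words is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def xml_words(raw):
    """Return (speaker, word, start_s, end_s) for every <item>.

    Assumes start/end attributes are milliseconds and converts to seconds.
    """
    root = ET.fromstring(raw)
    words = []
    for sentence in root.iter("sentence"):
        speaker = sentence.get("speakerID")
        for item in sentence.iter("item"):
            start_s = int(item.get("start")) / 1000
            end_s = int(item.get("end")) / 1000
            words.append((speaker, item.text, start_s, end_s))
    return words
```

Converting to seconds makes the output directly comparable with the word-alignment entries of the JSON format.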
