Android SDK

The Android SDK provides APIs that make it easy to integrate speech recognition features into Android apps. The API automatically handles WebSocket access to the speech server, audio capture, encoding, transmission, and real-time retrieval of transcriptions.

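It gives you three standard ways to interact with the ASR engine from your app: a recognition intent, a recognition service, and an embedded (offline) recognition service. Each is described in its own section below.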

The SDK requires Android 4.0.3 (API level 15) or higher.

 Get Started

  1. Download the latest SDK version here.

  2. In Android Studio, choose File → New → New Module → Import JAR/AAR Package. Click Next and select the downloaded file CedatSTT.aar.


  3. Open your application module's Gradle file (build.gradle) and add the following line to the dependencies block:

    dependencies {
      implementation project(':CedatSTT')
      ...
    }

  4. Check that the library is listed in the settings.gradle file:

    include ':app', ':CedatSTT'

 Recognition Intent

Call startActivityForResult() with an explicit intent for RecognitionActivity.class, and set the extra parameter Cedat85Recognizer.EXTRA_API_KEY on that intent.

In the onActivityResult() callback, read the list of transcriptions from RecognizerIntent.EXTRA_RESULTS and the array of confidence values from RecognizerIntent.EXTRA_CONFIDENCE_SCORES.

The following code snippet shows how to use the library for online recognition via intent:

private static final int SPEECH_REQUEST_CODE = 0;

private void startVoiceRecognitionActivity() {
  Intent intent = new Intent(this, RecognitionActivity.class);
  intent.putExtra(Cedat85Recognizer.EXTRA_API_KEY, "YOUR_API_KEY");
  startActivityForResult(intent, SPEECH_REQUEST_CODE);
}

@Override
protected void onActivityResult(int requestCode, int resultCode, Intent data) {
  super.onActivityResult(requestCode, resultCode, data);
  if (requestCode == SPEECH_REQUEST_CODE && resultCode == RESULT_OK) {
    ArrayList<String> result = data.getStringArrayListExtra(RecognizerIntent.EXTRA_RESULTS);
    float[] confidence = data.getFloatArrayExtra(RecognizerIntent.EXTRA_CONFIDENCE_SCORES);
    // log transcription result and confidence value
    Log.i("STT", result.get(0) + ":" + confidence[0]);
  }
}

The recognition intent requires the RECORD_AUDIO and INTERNET permissions.
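
When more than one result is requested, the transcription list and the confidence array are read side by side. The following is a minimal sketch in plain Java, not part of the SDK, of how the most confident transcription could be selected. The class name BestTranscription and its pick() method are hypothetical, and the fallback to the first entry when scores are missing is an assumption about how an app might want to behave.

```java
import java.util.List;

// Hypothetical helper (not part of the SDK): selects the transcription
// whose confidence value is highest.
final class BestTranscription {
    private BestTranscription() {}

    /**
     * Returns the entry of results with the highest confidence.
     * Falls back to the first entry when the scores are missing or do not
     * line up with the results, and returns null when there are no results.
     */
    static String pick(List<String> results, float[] confidence) {
        if (results == null || results.isEmpty()) {
            return null;
        }
        if (confidence == null || confidence.length != results.size()) {
            return results.get(0);
        }
        int best = 0;
        for (int i = 1; i < confidence.length; i++) {
            if (confidence[i] > confidence[best]) {
                best = i;
            }
        }
        return results.get(best);
    }
}
```

In onActivityResult(), such a helper could be called with the list and the array read from the result intent, in place of result.get(0).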

 Recognition service

Use the SpeechRecognizer class to access the speech recognition service. The methods of this class must be invoked only from the main application thread.

Call the static factory method createSpeechRecognizer(), passing a ComponentName for OnlineRecognitionService.class. Then set your RecognitionListener implementation and start the recognition service with startListening(), providing the extra parameters in the recognizer Intent. The extra parameter Cedat85Recognizer.EXTRA_API_KEY is mandatory.

By overriding the callbacks onPartialResults() and onResults(), you can read the list of transcriptions from SpeechRecognizer.RESULTS_RECOGNITION and the array of confidence values from SpeechRecognizer.CONFIDENCE_SCORES.

The speech recognition process stops automatically at the endpoint, that is, when the end of speech is detected. To stop or cancel it earlier, use the SpeechRecognizer methods stopListening() or cancel(). Finally, to release speech recognition resources and avoid memory leaks, call destroy() in the onResults() method of the RecognitionListener.

The following code snippet shows how to use the library for online recognition via service:

private static final String TAG = "STT";
private SpeechRecognizer speechRecognizer;
private RecognitionListener recognitionListener;

private void startOnlineVoiceRecognitionService() {
  Intent intent = new Intent();
  intent.putExtra(Cedat85Recognizer.EXTRA_API_KEY, "YOUR_API_KEY");
  recognitionListener = new RecognitionListener() {
    ...
    @Override
    public void onResults(Bundle data) {
      ArrayList<String> result = data.getStringArrayList(SpeechRecognizer.RESULTS_RECOGNITION);
      float[] confidence = data.getFloatArray(SpeechRecognizer.CONFIDENCE_SCORES);
      // log the final transcription and its confidence value
      Log.i(TAG, "result: " + result.get(0) + ":" + confidence[0]);
      // release the recognizer once the final result has arrived
      speechRecognizer.destroy();
    }

    @Override
    public void onPartialResults(Bundle data) {
      ArrayList<String> result = data.getStringArrayList(SpeechRecognizer.RESULTS_RECOGNITION);
      Log.i(TAG, "partial result: " + result.get(0));
    }
    ...
  };
  speechRecognizer.setRecognitionListener(recognitionListener);
  speechRecognizer.startListening(intent);
}

@Override
public void onCreate(Bundle savedInstanceState) {
  super.onCreate(savedInstanceState);
  speechRecognizer = SpeechRecognizer.createSpeechRecognizer(
    this, new ComponentName(getApplicationContext(), OnlineRecognitionService.class));
  startOnlineVoiceRecognitionService();
}

@Override
public void onPause() {
  super.onPause();
  if (speechRecognizer != null) {
    speechRecognizer.cancel();
    speechRecognizer.destroy();
  }
}

The recognition service requires the RECORD_AUDIO and INTERNET permissions.

 Embedded Recognition Service

The Speech-i STT library includes an API for offline continuous keyphrase recognition, typically used to trigger online voice recognition.

Use the SpeechRecognizer class to access the speech recognition service. The methods of this class must be invoked only from the main application thread.

Call the static factory method createSpeechRecognizer(), passing a ComponentName for OfflineRecognitionService.class. Then set your RecognitionListener implementation and start the recognition service with startListening(), providing the extra parameters in the recognizer Intent. The extra parameter Cedat85Recognizer.EXTRA_API_KEY is mandatory. You can customize the key phrase with Cedat85Recognizer.EXTRA_KEY_PHRASE (only English words are supported) and the recognition sensitivity with Cedat85Recognizer.EXTRA_THRESHOLD. For better recognition, a key phrase of 2-3 words is suggested.
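
The key phrase guidance above (2-3 words, English only) can be checked before the phrase is passed to the recognizer. The following is a rough sketch in plain Java, not part of the SDK. The class KeyPhraseCheck and its isRecommended() method are hypothetical, and treating "English words" as words made only of ASCII letters is a simplifying assumption; the engine's actual vocabulary rules are not documented here.

```java
// Hypothetical helper (not part of the SDK): checks a key phrase against
// the 2-3 word suggestion before it is set as EXTRA_KEY_PHRASE.
final class KeyPhraseCheck {
    private KeyPhraseCheck() {}

    /**
     * Returns true when the phrase has 2 or 3 whitespace-separated words
     * made only of ASCII letters (an assumed proxy for "English words").
     */
    static boolean isRecommended(String phrase) {
        if (phrase == null) {
            return false;
        }
        String trimmed = phrase.trim();
        if (trimmed.isEmpty()) {
            return false;
        }
        String[] words = trimmed.split("\\s+");
        if (words.length < 2 || words.length > 3) {
            return false;
        }
        for (String word : words) {
            if (!word.matches("[A-Za-z]+")) {
                return false;
            }
        }
        return true;
    }
}
```

A phrase that fails this check may still work; the check only mirrors the suggestion given above.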

The callback onResults() will be invoked on match. Partial results will not be provided.

The speech recognition process stops automatically on success. To stop or cancel it earlier, use the SpeechRecognizer methods stopListening() or cancel(). Finally, to release speech recognition resources and avoid memory leaks, call the destroy() method.

The following code snippet shows how to use the library for offline recognition via service:

private static final String TAG = "STT";
private SpeechRecognizer offlineSpeechRecognizer;
private RecognitionListener recognitionListener;

private void startOfflineVoiceRecognitionService() {
  Intent intent = new Intent();
  intent.putExtra(Cedat85Recognizer.EXTRA_API_KEY, "YOUR_API_KEY");
  recognitionListener = new RecognitionListener() {
    ...
    @Override
    public void onResults(Bundle data) {
      ArrayList<String> result = data.getStringArrayList(SpeechRecognizer.RESULTS_RECOGNITION);
      float[] confidence = data.getFloatArray(SpeechRecognizer.CONFIDENCE_SCORES);
      // log the matched key phrase and its confidence value
      Log.i(TAG, "result: " + result.get(0) + ":" + confidence[0]);
      // release the recognizer once the key phrase has been matched
      offlineSpeechRecognizer.destroy();
    }
    ...
  };
  offlineSpeechRecognizer.setRecognitionListener(recognitionListener);
  offlineSpeechRecognizer.startListening(intent);
}

@Override
public void onCreate(Bundle savedInstanceState) {
  super.onCreate(savedInstanceState);
  offlineSpeechRecognizer = SpeechRecognizer.createSpeechRecognizer(
    this, new ComponentName(getApplicationContext(), OfflineRecognitionService.class));
  startOfflineVoiceRecognitionService();
}

@Override
public void onPause() {
  super.onPause();
  if (offlineSpeechRecognizer != null) {
    offlineSpeechRecognizer.cancel();
    offlineSpeechRecognizer.destroy();
  }
}

The embedded recognition service requires the RECORD_AUDIO and WRITE_EXTERNAL_STORAGE permissions.

 Configuration

You can customize the library by passing the extra parameters defined in the Cedat85Recognizer class to the Intent method putExtra().

The only mandatory parameter is Cedat85Recognizer.EXTRA_API_KEY, which must be set to the API key provided by Speech-i.

All other parameters are optional and fall back to a default value when not specified.

The following table lists the customizable parameters of the Cedat85Recognizer class.

PARAMETER | TYPE | DEFAULT | DESCRIPTION
EXTRA_API_KEY | String | None (mandatory parameter) | Mandatory API key provided by Speech-i
EXTRA_PROMPT | String | Message “Speak now” in the device language | Prompt for the user
EXTRA_MAX_RESULTS | int | 1 | Maximum number of results
EXTRA_MIN_SILENCE | float | 2.0f | Seconds of silence that terminate the speech recognition
EXTRA_DECODER_URL | String | DEFAULT_DECODER_URL | Server URL. To enable an SSL connection, use DEFAULT_DECODER_URL_HTTPS
EXTRA_LANGUAGE_MODEL | String | Device language, or LANGUAGE_MODEL_en_US_8k if it is not supported | Language model. Demo 8 kHz values: LANGUAGE_MODEL_it_IT_8k, LANGUAGE_MODEL_en_GB_8k, LANGUAGE_MODEL_en_US_8k. Demo 16 kHz values: LANGUAGE_MODEL_it_IT_16k, LANGUAGE_MODEL_en_GB_16k, LANGUAGE_MODEL_en_US_16k, LANGUAGE_MODEL_es_ES_16k, LANGUAGE_MODEL_pt_BR_16k, LANGUAGE_MODEL_pt_PT_16k
EXTRA_KEY_PHRASE | String | “OK Cedat” | Key phrase used in embedded speech recognition
EXTRA_THRESHOLD | float | THRESHOLD_SENSITIVITY_HIGH = 1.0e-35f | Sensitivity threshold used in embedded speech recognition
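
As an illustration of how the mandatory key and a few optional parameters from the table might be collected and validated before being passed to putExtra(), here is a minimal sketch in plain Java. It is not part of the SDK. The class RecognitionExtras is hypothetical, the map keys are placeholder strings standing in for the real Cedat85Recognizer constants, and the range checks (at least one result, positive silence) are assumptions rather than documented limits.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of the SDK): gathers recognizer extras,
// starting from the defaults listed in the table.
final class RecognitionExtras {
    private final Map<String, Object> extras = new LinkedHashMap<>();

    RecognitionExtras(String apiKey) {
        if (apiKey == null || apiKey.trim().isEmpty()) {
            throw new IllegalArgumentException("EXTRA_API_KEY is mandatory");
        }
        // Placeholder keys; a real app would use the Cedat85Recognizer constants.
        extras.put("EXTRA_API_KEY", apiKey);
        extras.put("EXTRA_MAX_RESULTS", 1);     // default from the table
        extras.put("EXTRA_MIN_SILENCE", 2.0f);  // default from the table, in seconds
    }

    RecognitionExtras maxResults(int count) {
        if (count < 1) { // assumed lower bound
            throw new IllegalArgumentException("maxResults must be at least 1");
        }
        extras.put("EXTRA_MAX_RESULTS", count);
        return this;
    }

    RecognitionExtras minSilence(float seconds) {
        if (seconds <= 0f) { // assumed lower bound
            throw new IllegalArgumentException("minSilence must be positive");
        }
        extras.put("EXTRA_MIN_SILENCE", seconds);
        return this;
    }

    /** Read-only view of the collected extras, in insertion order. */
    Map<String, Object> asMap() {
        return Collections.unmodifiableMap(extras);
    }
}
```

In an app, each entry would be copied onto the recognizer Intent with putExtra() under the matching Cedat85Recognizer constant. Setting the defaults explicitly is redundant, since the library applies them itself; they appear here only to make the starting values visible.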