Speech to text possible?

I have an idea for a voice control app but can't find anything in the APIs.

Any plans to implement it? Or any available workaround? (Besides Objective-C)

Thanks!

7 Answers

I would also like to know. Or maybe a paid module?

— answered June 21st 2010 by Richard Venneman

0 Comments

0 Votes
Yup! Still an issue…

Any updates? A paid module is also of interest.

Thanks!

/J

— answered August 17th 2010 by Joacim Boive

0 Comments

0 Votes
Ok, this is where I'm at currently.

Found a couple of online services that might help:

http://create.spinvox.com/index.php?pageid=0
http://www.voicecloud.com/api.php

For my purposes Spinvox seems to fit the bill (it is also used by Facebook).

However… the audio MUST be an A-law 8 kHz mono WAV file or a G.711 μ-law 8 kHz mono WAV file, Base64-encoded into the request. Currently Titanium is unable to Base64-encode files (as far as I can tell), but it does have support for the A-law codec: Titanium.Media.AUDIO_FORMAT_ALAW

I have been unable to find out whether this records in stereo or mono (A-law allows both), but since its intended use is for low-bandwidth devices it s-h-o-u-l-d be in mono. I've started a thread with this question here:
http://developer.appcelerator.com/question/53501/audioformatalaw–mono-sound
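One way to settle the mono/stereo question without guessing is to look at the header of a file the recorder actually produced. Below is a minimal sketch in plain JavaScript, assuming the recording is a standard RIFF/WAVE file and that you can get its first bytes into a Uint8Array (how to do that inside Titanium is not shown here). The function name readWavFormat is made up for this example. The format codes 6 (A-law) and 7 (μ-law) are the standard WAVE format tags.

```javascript
// Sketch: read format code, channel count and sample rate from a WAV header.
// `bytes` is assumed to be a Uint8Array holding the start of the file.
function readWavFormat(bytes) {
  var view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);

  // Read a four-character chunk tag such as "RIFF" or "fmt ".
  function tag(offset) {
    return String.fromCharCode(
      bytes[offset], bytes[offset + 1], bytes[offset + 2], bytes[offset + 3]);
  }

  if (bytes.length < 12 || tag(0) !== 'RIFF' || tag(8) !== 'WAVE') {
    throw new Error('Not a RIFF/WAVE file');
  }

  // Walk the chunk list until the "fmt " chunk turns up.
  var offset = 12;
  while (offset + 8 <= bytes.length) {
    var id = tag(offset);
    var size = view.getUint32(offset + 4, true); // little-endian
    if (id === 'fmt ') {
      if (offset + 8 + 16 > bytes.length) {
        throw new Error('Truncated "fmt " chunk');
      }
      var code = view.getUint16(offset + 8, true);
      var names = { 1: 'PCM', 6: 'A-law', 7: 'mu-law' };
      return {
        formatCode: code,
        formatName: names[code] || 'other',
        channels: view.getUint16(offset + 10, true),
        sampleRate: view.getUint32(offset + 12, true),
        bitsPerSample: view.getUint16(offset + 22, true)
      };
    }
    offset += 8 + size + (size % 2); // chunks are padded to an even length
  }
  throw new Error('No "fmt " chunk found');
}
```

If channels comes back as 1 and sampleRate as 8000 with formatName 'A-law', the file matches what Spinvox asks for.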

In my current app I only need very short voice recordings (a couple of letters), so maybe an online Base64 encoder can do the trick:

http://www.motobit.com/util/base64-decoder-encoder.asp
http://www.opinionatedgeek.com/dotnet/tools/base64encode/
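For short recordings the encoding could also be done in plain JavaScript instead of an online tool. This is only a sketch: it assumes you already have the WAV contents as an array of byte values (0-255), and the name base64Encode is made up for the example. It uses the standard Base64 alphabet with "=" padding.

```javascript
var B64_ALPHABET =
  'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

// Sketch: Base64-encode an array (or Uint8Array) of byte values.
function base64Encode(bytes) {
  var out = '';
  // Every 3 input bytes become 4 output characters.
  for (var i = 0; i < bytes.length; i += 3) {
    var has1 = i + 1 < bytes.length;
    var has2 = i + 2 < bytes.length;
    var b0 = bytes[i];
    var b1 = has1 ? bytes[i + 1] : 0;
    var b2 = has2 ? bytes[i + 2] : 0;

    out += B64_ALPHABET.charAt(b0 >> 2);
    out += B64_ALPHABET.charAt(((b0 & 3) << 4) | (b1 >> 4));
    // Missing trailing bytes are marked with "=" padding.
    out += has1 ? B64_ALPHABET.charAt(((b1 & 15) << 2) | (b2 >> 6)) : '=';
    out += has2 ? B64_ALPHABET.charAt(b2 & 63) : '=';
  }
  return out;
}
```

For example, base64Encode([77, 97, 110]) (the bytes of "Man") gives "TWFu". String concatenation like this is fine for a clip of a few seconds but would be slow for large files.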

It should also be possible to create a Titanium plugin using the iPhone SDK:
http://stackoverflow.com/questions/392464/any-base64-library-on-iphone-sdk

Tutorial:
http://iphone.zcentric.com/2008/08/29/post-a-uiimage-to-the-web/

Hopefully it will help someone. Please post your results!

/J

— answered August 17th 2010 by Joacim Boive

0 Comments

0 Votes
I was wondering if anyone knows of a text to speech engine for Titanium?

— answered August 18th 2010 by Peter Lum

0 Comments

0 Votes
Any updates? I would be willing to pay for a good module.
— answered January 8th 2011 by Peter Lum

1 Comment
- Nothing yet…
  
  — commented January 10th 2011 by Joacim Boive
0 Votes

I got this to work using LumenVox as a service. I purchased the LumenVox Lite Subscription License for Asterisk (actually two subscriptions since you'll want to use the speech tuner, $16/month).

The code I created interfaces with the SimpleClient example supplied with the kit. [The Asterisk interface is a bonus ;-)]

My Appcelerator app sends the WAV file to my Linux host, which converts it into words (speech to text) and then processes the words against a grammar (text to action), giving me an action to trigger.

It works well on Wi-Fi, with the server connected to that network directly.

However, at 48k the WAV files are too large to transmit over AT&T's 3G and get the response in a reasonable amount of time. The app works on 3G, but is too slow to use. I'd like to lower the recording rate to 8k (a rate LumenVox is actually built to work with), but haven't found out how to do that in Appcelerator. Any ideas?
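To put rough numbers on the size difference, here is a small back-of-the-envelope sketch. It assumes "48k" and "8k" mean sample rates of 48 kHz and 8 kHz, that the 48 kHz recording is 16-bit mono PCM, and that the 8 kHz recording is 8-bit mono (as μ-law and A-law are). The 2-second clip length and the function name wavPayloadBytes are just picked for illustration, and the WAV header is ignored.

```javascript
// Sketch: size of the raw audio data in an uncompressed-style WAV recording.
function wavPayloadBytes(sampleRate, channels, bitsPerSample, seconds) {
  return sampleRate * channels * (bitsPerSample / 8) * seconds;
}

// Assumed current recording: 48 kHz, mono, 16-bit, 2 seconds.
var currentSize = wavPayloadBytes(48000, 1, 16, 2); // 192000 bytes

// Assumed target recording: 8 kHz, mono, 8-bit (mu-law or A-law), 2 seconds.
var targetSize = wavPayloadBytes(8000, 1, 8, 2); // 16000 bytes

var ratio = currentSize / targetSize; // 12
```

Under those assumptions the upload shrinks by a factor of 12, which is the kind of difference that could make the 3G round trip usable.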

I suspect that latency becomes part of the usability formula next..

Here's some code to get you started…

–snip

var recorder=Ti.Media.createAudioRecorder({});
recorder.compression = Ti.Media.AUDIO_FORMAT_ULAW;
recorder.format = Ti.Media.AUDIO_FILEFORMAT_WAVE;

var start = Ti.UI.createButton({title: "Record", top: 100, height:40, width:180});
win1.add(start);
start.addEventListener('touchstart', function(e) {
  Ti.Media.audioSessionMode = Titanium.Media.AUDIO_SESSION_MODE_RECORD;
  recorder.start();
  start.title = "Recording...";
  win1.backgroundColor='red';
});
start.addEventListener('touchend', function(e) {
  file = recorder.stop();
  Ti.Media.audioSessionMode = Titanium.Media.AUDIO_SESSION_MODE_PLAYBACK;
  start.title = "Recorded: " + file.size;
  win1.backgroundColor='#FFFFFF';
  Ti.API.info("Recorded: " + file.size);
  xhr.onerror = function(e)
  {
    Ti.UI.createAlertDialog({title:'Error', message:e.error}).show();
    Ti.API.info('IN ERROR ' + e.error);
    win1.backgroundColor='#FFFFFF';
  };

– snip

  Ti.API.info('sending http://' + server + '/uploadfile.php');
  xhr.open('POST', 'http://' + server + '/uploadfile.php');
  xhr.setRequestHeader("Content-Type", "audio/x-wav");
  var data_to_send = {
    "file": file.blob
  };
  // send the data
  xhr.send(data_to_send);

– snip

// php code to return a json file with up to 4 grammar rule outputs
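$xv2 = "path_file"; // placeholder: path + file name (no extension)

// Note: the upload limit directive is upload_max_filesize, and it cannot be
// changed at runtime with ini_set(); set it in php.ini or .htaccess (e.g. 8M).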
  if ($_FILES["file"]["error"] > 0)
    {
    echo "Return Code: " . $_FILES["file"]["error"] . "<br />";
    }
  else
    {
    echo "Upload: " . $_FILES["file"]["name"] . "<br />";
    echo "Type: " . $_FILES["file"]["type"] . "<br />";
    echo "Size: " . ($_FILES["file"]["size"] / 1024) . " Kb<br />";
    echo "Temp file: " . $_FILES["file"]["tmp_name"] . "<br />";
    if (file_exists("upload/" . $_FILES["file"]["name"]))
      {
      echo $_FILES["file"]["name"] . " already exists. ";
      }
    else
      {
       $xv3=  (int)gmdate('U');
       move_uploaded_file($_FILES["file"]["tmp_name"],
      $xv3."_".$xv2.".wav");
      echo "Stored in: " . "upload/" . $_FILES["file"]["name"];
      }
    }
}
$command1 = "examples/SimpleClient rules rules.gram 1 ".$xv3."_".$xv2.".wav 127.0.0.1";
$output = exec('echo "'.$command1.'" > command.sh');
// echo '#1:'.$output;
$output = exec("sh command.sh >".$xv2.".txt"); // *** this file contains the speech text ****
$output = exec("grep 'Interpretation String:' ".$xv2.".txt > ".$xv2."0.txt");
$output = exec("awk '{ if (NR==1) print $0 }' ".$xv2."0.txt >".$xv2."1.txt");
$output = exec("awk '{ if (NR==2) print $0 }' ".$xv2."0.txt >".$xv2."2.txt");
$output = exec("awk '{ if (NR==3) print $0 }' ".$xv2."0.txt >>".$xv2."2.txt");
$output = exec("awk '{ if (NR==4) print $0 }' ".$xv2."0.txt >>".$xv2."2.txt");
$output = exec("sed -s 's/    Interpretation String: /{\"gram\":[{\"reply\":\"/g' ".$xv2."1.txt >".$xv2."3.txt");
$output = exec("sed -s 's/    Interpretation String: /\"},{\"reply\":\"/g' ".$xv2."2.txt >".$xv2."4.txt");
$output = exec("tr -d '\n' < ".$xv2."3.txt >".$xv2."5.txt");
$output = exec("tr -d '\n' < ".$xv2."4.txt >".$xv2."6.txt");
$output = exec("echo '\"}]}' >>  ".$xv2."6.txt");
$output = exec("cat ".$xv2."6.txt >> ".$xv2."5.txt");

?>

– snip

The PHP returns a JSON-formatted list of the top 4 actions. However, if you look at
the original file.txt produced, it contains the words spoken, so take it from there.

This is the file.txt when I said "Go inside". It shows the possible things my grammar logic
thinks I could be trying to tell the computer to do:

access
Loaded Grammar!
Loaded Audio!
Decode returned!
Alternative 1:
  Interpretation 1 of 1:
    Grammar Label        : rules
    Input Sentence       : go inside
    Interpretation String: inside
    Interpretation Score : 999
Alternative 2:
  Interpretation 1 of 1:
    Grammar Label        : rules
    Input Sentence       : go inside house
    Interpretation String: inside
    Interpretation Score : 999
Alternative 3:
  Interpretation 1 of 1:
    Grammar Label        : rules
    Input Sentence       : go inside the
    Interpretation String: inside
    Interpretation Score : 999
Alternative 4:
  Interpretation 1 of 1:
    Grammar Label        : rules
    Input Sentence       : go inside the house
    Interpretation String: inside
    Interpretation Score : 999
Alternative 5:
  Interpretation 1 of 1:
    Grammar Label        : rules
    Input Sentence       : go inside well
    Interpretation String: inside
    Interpretation Score : 999

http://developer.appcelerator.com/question/128380/speech-to-text

— answered August 1st 2013 by pragnesh patel

0 Comments

0 Votes

Titanium Community Questions & Answer Archive

We felt that 6+ years of knowledge should not die so this is the Titanium Community Questions & Answer Archive
