Speech to text possible?
I have an idea for a voice control app but can't find anything in the APIs.
Any plans to implement it? Or any available workaround? (Besides Objective-C)
Thanks!
/J
7 Answers
-
I would also like to know. Or maybe a paid module?
-
Yup! Still an issue…
Any updates? A paid module would also be of interest.
Thanks!
/J
-
Ok, this is where I'm at currently.
Found a couple of online services that might help:
http://create.spinvox.com/index.php?pageid=0
http://www.voicecloud.com/api.php
For my purposes Spinvox seems to fit the bill (also used by Facebook).
However… The audio MUST be an A-Law mono 8 kHz WAV file or a G.711 μ-Law 8 kHz mono WAV file, Base64-encoded into the request. Currently Titanium is unable to encode files (as far as I can tell), but it does have support for the A-law codec: Titanium.Media.AUDIO_FORMAT_ALAW
I have been unable to find out if this is in stereo or mono (as A-law allows both), but since its intended use is for low-bandwidth devices it s-h-o-u-l-d be in mono. I've started a thread with this question here:
http://developer.appcelerator.com/question/53501/audioformatalaw–mono-sound
In my current app I only need very short voice recordings (a couple of letters), so maybe an online Base64 encoder can do the trick:
http://www.motobit.com/util/base64-decoder-encoder.asp
http://www.opinionatedgeek.com/dotnet/tools/base64encode/
It should be possible to create a plugin for Titanium using the iPhone SDK:
http://stackoverflow.com/questions/392464/any-base64-library-on-iphone-sdk
Tutorial:
http://iphone.zcentric.com/2008/08/29/post-a-uiimage-to-the-web/
Hopefully it will help someone. Please post your results!
/J
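For reference, Base64 encoding itself is simple enough to do in plain JavaScript without a native plugin. The following is only a minimal sketch: base64EncodeBytes is a hypothetical helper name, it assumes the recording is already available as an array of byte values (0-255), and how to get those bytes out of a Titanium recording is not shown here.

```javascript
// Plain-JavaScript Base64 encoder for an array of byte values (0-255).
// Hypothetical helper; not part of the Titanium API.
var B64_CHARS = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

function base64EncodeBytes(bytes) {
    var out = '';
    // Every group of 3 input bytes becomes 4 output characters.
    for (var i = 0; i < bytes.length; i += 3) {
        var hasB1 = i + 1 < bytes.length;
        var hasB2 = i + 2 < bytes.length;
        var b0 = bytes[i];
        var b1 = hasB1 ? bytes[i + 1] : 0;
        var b2 = hasB2 ? bytes[i + 2] : 0;
        out += B64_CHARS.charAt(b0 >> 2);
        out += B64_CHARS.charAt(((b0 & 0x03) << 4) | (b1 >> 4));
        // Missing bytes at the end are marked with '=' padding.
        out += hasB1 ? B64_CHARS.charAt(((b1 & 0x0f) << 2) | (b2 >> 6)) : '=';
        out += hasB2 ? B64_CHARS.charAt(b2 & 0x3f) : '=';
    }
    return out;
}

// The first four bytes of any WAV file spell "RIFF".
console.log(base64EncodeBytes([82, 73, 70, 70])); // UklGRg==
```

For a recording of a couple of letters the input is only a few kilobytes, so a simple loop like this should be fast enough.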
-
I was wondering if anyone knows of a text-to-speech engine for Titanium?
-
Any updates? I would be interested in paying for a good module.
-
I got this to work using LumenVox as a service. I purchased the LumenVox Lite Subscription License for Asterisk (actually two subscriptions, since you'll want to use the speech tuner, $16/month).
The code I created interfaces with the SimpleClient example supplied with the kit. [The Asterisk interface is a bonus ;-)]
My Appcelerator app sends the WAV file to my Linux host, which processes it into words (speech to text) and then runs the words through a grammar (text to action), giving me an action to trigger.
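The last step, turning the server's reply into an action, could look roughly like the sketch below. It assumes the reply has the shape {"gram":[{"reply":"..."}]}, which is inferred from the sed commands in the PHP excerpt further down. The actions table, its handlers and the dispatchReply name are all hypothetical.

```javascript
// Hypothetical action table: keys are grammar replies, values are handlers.
var actions = {
    inside: function () { return 'going inside'; },
    outside: function () { return 'going outside'; }
};

// Parse the server's JSON reply and run the handler for the first
// reply that has one. Returns null when nothing matches.
function dispatchReply(responseText, table) {
    var parsed;
    try {
        parsed = JSON.parse(responseText);
    } catch (err) {
        return null; // reply was not valid JSON
    }
    var list = (parsed && parsed.gram) || [];
    for (var i = 0; i < list.length; i++) {
        var item = list[i];
        if (!item) { continue; }
        var reply = String(item.reply).trim();
        if (Object.prototype.hasOwnProperty.call(table, reply)) {
            return table[reply]();
        }
    }
    return null;
}

console.log(dispatchReply('{"gram":[{"reply":"inside"}]}', actions)); // going inside
```

In the app this would presumably be called from the HTTP client's onload callback with the response text.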
The setup works well on Wi-Fi, with the server connected directly to that network.
However, at 48k the WAV files are too large to transmit over AT&T's 3G and get the response in a reasonable amount of time. The app works on 3G, but is too slow to use. I'd like to lower the recording bit-rate to 8k (a rate LumenVox is actually built to work with), but haven't found out how to do that in Appcelerator. Any ideas?
I suspect that latency becomes the next part of the usability formula.
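To put rough numbers on the 48k versus 8k difference, here is a back-of-the-envelope sketch. It assumes "48k" and "8k" mean samples per second, one byte per sample (which is what μ-law uses), a 2-second clip and a 128 kbit/s uplink. The clip length and uplink speed are made-up illustration values, not measurements, and the function names are hypothetical.

```javascript
// Bytes of audio payload for a mono recording (WAV header ignored).
function audioBytes(seconds, samplesPerSecond, bytesPerSample) {
    return seconds * samplesPerSecond * bytesPerSample;
}

// Seconds needed to upload a payload at a given uplink speed.
function uploadSeconds(bytes, uplinkBitsPerSecond) {
    return (bytes * 8) / uplinkBitsPerSecond;
}

var clipSeconds = 2;   // assumed length of a short voice command
var uplink = 128000;   // assumed 3G uplink in bits per second

var at48k = audioBytes(clipSeconds, 48000, 1);
var at8k = audioBytes(clipSeconds, 8000, 1);

console.log(at48k, uploadSeconds(at48k, uplink)); // 96000 6
console.log(at8k, uploadSeconds(at8k, uplink));   // 16000 1
```

Under these assumptions the upload alone drops from about 6 seconds to about 1 second, before any network latency or recognition time is added.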
Here's some code to get you started…
–snip
var recorder = Ti.Media.createAudioRecorder({});
recorder.compression = Ti.Media.AUDIO_FORMAT_ULAW;
recorder.format = Ti.Media.AUDIO_FILEFORMAT_WAVE;

var start = Ti.UI.createButton({title: "Record", top: 100, height: 40, width: 180});
win1.add(start);

// Press and hold the button to record
start.addEventListener('touchstart', function(e) {
    Ti.Media.audioSessionMode = Titanium.Media.AUDIO_SESSION_MODE_RECORD;
    recorder.start();
    start.title = "Recording...";
    win1.backgroundColor = 'red';
});

// Release the button to stop recording and send the file
start.addEventListener('touchend', function(e) {
    var file = recorder.stop();
    Ti.Media.audioSessionMode = Titanium.Media.AUDIO_SESSION_MODE_PLAYBACK;
    start.title = "Recorded: " + file.size;
    win1.backgroundColor = '#FFFFFF';
    Ti.API.info("Recorded: " + file.size);

    // xhr (the HTTP client) is not defined in this excerpt
    xhr.onerror = function(e) {
        Ti.UI.createAlertDialog({title: 'Error', message: e.error}).show();
        Ti.API.info('IN ERROR ' + e.error);
        win1.backgroundColor = '#FFFFFF';
    };
– snip
    // server (the host name) is not defined in this excerpt
    Ti.API.info('sending http://' + server + '/uploadfile.php');
    xhr.open('POST', 'http://' + server + '/uploadfile.php');
    xhr.setRequestHeader("Content-Type", "audio/x-wav");
    var data_to_send = { "file": file.blob };
    // send the data
    xhr.send(data_to_send);
– snip
<?php
// PHP code to return a JSON file with up to 4 grammar rule outputs
$xv2 = 'path_file'; // placeholder: path + file name (no extension)
// Note: the upload size limit is the upload_max_filesize directive; it has to
// be raised in php.ini or .htaccess and cannot be changed with ini_set().

if ($_FILES["file"]["error"] > 0) {
    echo "Return Code: " . $_FILES["file"]["error"] . "<br />";
} else {
    echo "Upload: " . $_FILES["file"]["name"] . "<br />";
    echo "Type: " . $_FILES["file"]["type"] . "<br />";
    echo "Size: " . ($_FILES["file"]["size"] / 1024) . " Kb<br />";
    echo "Temp file: " . $_FILES["file"]["tmp_name"] . "<br />";
    if (file_exists("upload/" . $_FILES["file"]["name"])) {
        echo $_FILES["file"]["name"] . " already exists. ";
    } else {
        $xv3 = (int)gmdate('U'); // Unix timestamp, used as a file name prefix
        move_uploaded_file($_FILES["file"]["tmp_name"], $xv3 . "_" . $xv2 . ".wav");
        echo "Stored in: " . $xv3 . "_" . $xv2 . ".wav";
    }
}

// Run the LumenVox SimpleClient example on the uploaded WAV file
$command1 = "examples/SimpleClient rules rules.gram 1 " . $xv3 . "_" . $xv2 . ".wav 127.0.0.1";
$output = exec('echo "'.$command1.'" > command.sh');
// echo '#1:'.$output;
$output = exec("sh command.sh >".$xv2.".txt"); // *** this file contains the speech text ****

// Keep only the "Interpretation String" lines: line 1 goes to one file, lines 2-4 to another
$output = exec("grep 'Interpretation String:' ".$xv2.".txt > ".$xv2."0.txt");
$output = exec("awk '{ if (NR==1) print $0 }' ".$xv2."0.txt >".$xv2."1.txt");
$output = exec("awk '{ if (NR==2) print $0 }' ".$xv2."0.txt >".$xv2."2.txt");
$output = exec("awk '{ if (NR==3) print $0 }' ".$xv2."0.txt >>".$xv2."2.txt");
$output = exec("awk '{ if (NR==4) print $0 }' ".$xv2."0.txt >>".$xv2."2.txt");

// Rewrite the lines as JSON fragments, strip newlines, and join them in the "5.txt" file
$output = exec("sed -s 's/ Interpretation String: /{\"gram\":[{\"reply\":\"/g' ".$xv2."1.txt >".$xv2."3.txt");
$output = exec("sed -s 's/ Interpretation String: /\"},{\"reply\":\"/g' ".$xv2."2.txt >".$xv2."4.txt");
$output = exec("tr -d '\n' < ".$xv2."3.txt >".$xv2."5.txt");
$output = exec("tr -d '\n' < ".$xv2."4.txt >".$xv2."6.txt");
$output = exec("echo '\"}]}' >> ".$xv2."6.txt");
$output = exec("cat ".$xv2."6.txt >> ".$xv2."5.txt");
?>
– snip
The PHP returns a JSON-formatted list of the top 4 actions. However, if you look at the original file.txt it produces, it contains the words spoken, so take it from there.
This is the file.txt when I said "Go inside"; it shows the possible things my grammar logic thinks I could be trying to tell the computer to do:
access
Loaded Grammar!
Loaded Audio!
Decode returned!
Alternative 1:
  Interpretation 1 of 1:
    Grammar Label : rules
    Input Sentence : go inside
    Interpretation String: inside
    Interpretation Score : 999
Alternative 2:
  Interpretation 1 of 1:
    Grammar Label : rules
    Input Sentence : go inside house
    Interpretation String: inside
    Interpretation Score : 999
Alternative 3:
  Interpretation 1 of 1:
    Grammar Label : rules
    Input Sentence : go inside the
    Interpretation String: inside
    Interpretation Score : 999
Alternative 4:
  Interpretation 1 of 1:
    Grammar Label : rules
    Input Sentence : go inside the house
    Interpretation String: inside
    Interpretation Score : 999
Alternative 5:
  Interpretation 1 of 1:
    Grammar Label : rules
    Input Sentence : go inside well
    Interpretation String: inside
    Interpretation Score : 999
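If you would rather pull the fields out of this output in one place instead of chaining grep, awk and sed, the sketch below shows one way to do it in JavaScript. It is not the code used above: parseAlternatives and toGramJson are hypothetical names, and the pattern assumes the field labels appear exactly as in the sample output.

```javascript
// Extract each alternative's sentence, interpretation and score from the
// recognizer's text output. Spaces and line breaks are treated alike.
function parseAlternatives(text) {
    var pattern = /Input Sentence\s*:\s*([\s\S]*?)\s*Interpretation String\s*:\s*([\s\S]*?)\s*Interpretation Score\s*:\s*(\d+)/g;
    var results = [];
    var match;
    while ((match = pattern.exec(text)) !== null) {
        results.push({
            sentence: match[1],
            reply: match[2],
            score: parseInt(match[3], 10)
        });
    }
    return results;
}

// Build a {"gram":[{"reply":...}]} string limited to `max` entries.
function toGramJson(alternatives, max) {
    var gram = alternatives.slice(0, max).map(function (alt) {
        return { reply: alt.reply };
    });
    return JSON.stringify({ gram: gram });
}

// Two alternatives copied from the sample output.
var sample = 'Alternative 1: Interpretation 1 of 1: Grammar Label : rules ' +
    'Input Sentence : go inside Interpretation String: inside ' +
    'Interpretation Score : 999 Alternative 2: Interpretation 1 of 1: ' +
    'Grammar Label : rules Input Sentence : go inside house ' +
    'Interpretation String: inside Interpretation Score : 999';

var alternatives = parseAlternatives(sample);
console.log(alternatives[1].sentence);    // go inside house
console.log(toGramJson(alternatives, 4)); // {"gram":[{"reply":"inside"},{"reply":"inside"}]}
```

Keeping the sentence and score next to each reply also makes it possible to show the recognized words or to ignore low-scoring alternatives.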
-
http://developer.appcelerator.com/question/128380/speech-to-text