Titanium Community Questions & Answer Archive

We felt that 6+ years of knowledge should not die so this is the Titanium Community Questions & Answer Archive

Speech to text possible?

I have an idea for a voice control app but can't find anything in the APIs.

Any plans to implement it? Or any available workaround? (Besides Objective-C)

Thanks!

/J

— asked June 1st 2010 by Joacim Boive
  • control
  • iphone
  • mobile
  • speech
  • voice
0 Comments

7 Answers

  • I would also like to know. Or maybe a paid module?

    — answered June 21st 2010 by Richard Venneman
    0 Comments
  • Yup! Still an issue…

    Any updates? A paid module would also be of interest.

    Thanks!

    /J

    — answered August 17th 2010 by Joacim Boive
    0 Comments
  • Ok, this is where I'm at currently.

    Found a couple of online services that might help:

    http://create.spinvox.com/index.php?pageid=0
    http://www.voicecloud.com/api.php

    For my purposes Spinvox seems to fit the bill (it is also used by Facebook).

    However, the audio MUST be an A-law mono 8 kHz WAV file or a G.711 μ-law 8 kHz mono WAV file, Base64-encoded into the request. Currently Titanium is unable to Base64-encode files (as far as I can tell), but it does have support for the A-law codec: Titanium.Media.AUDIO_FORMAT_ALAW
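
    For anyone who needs the encoding step in plain JavaScript, here is a minimal sketch of a Base64 encoder. It assumes the WAV data is already available as an array of byte values (0-255); base64Encode is a hypothetical helper name, not a Titanium API, and how you get the bytes out of the recorded file is left open.

    ```javascript
    // Minimal Base64 encoder for an array of byte values (0-255).
    // base64Encode is a hypothetical helper, not part of Titanium.
    var B64_CHARS = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

    function base64Encode(bytes) {
      var out = '';
      for (var i = 0; i < bytes.length; i += 3) {
        var hasB1 = i + 1 < bytes.length;
        var hasB2 = i + 2 < bytes.length;
        var b0 = bytes[i];
        var b1 = hasB1 ? bytes[i + 1] : 0;
        var b2 = hasB2 ? bytes[i + 2] : 0;
        // Split 3 bytes (24 bits) into four 6-bit groups.
        out += B64_CHARS.charAt(b0 >> 2);
        out += B64_CHARS.charAt(((b0 & 0x03) << 4) | (b1 >> 4));
        // Missing input bytes are marked with '=' padding.
        out += hasB1 ? B64_CHARS.charAt(((b1 & 0x0f) << 2) | (b2 >> 6)) : '=';
        out += hasB2 ? B64_CHARS.charAt(b2 & 0x3f) : '=';
      }
      return out;
    }
    ```

    For example, base64Encode([77, 97, 110]) (the bytes of "Man") gives "TWFu".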

    I have been unable to find out whether this is stereo or mono (A-law allows both), but since its intended use is for low-bandwidth devices it s-h-o-u-l-d be mono. I've started a thread with this question here:
    http://developer.appcelerator.com/question/53501/audioformatalaw–mono-sound
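
    One way to settle the mono/stereo question is to look at the header of a recorded file. The sketch below reads the format code, channel count and sample rate from the start of a WAV file. It assumes the bytes are available as an array of numbers and that the "fmt " chunk directly follows the 12-byte RIFF header (the usual layout, but not guaranteed); describeWavHeader and readLE are hypothetical helper names.

    ```javascript
    // Read a little-endian unsigned integer of `size` bytes at `offset`.
    function readLE(bytes, offset, size) {
      var value = 0;
      for (var i = size - 1; i >= 0; i--) {
        value = value * 256 + bytes[offset + i];
      }
      return value;
    }

    // Assumes the "fmt " chunk starts at byte 12, so its fields sit at
    // offsets 20 (format code), 22 (channels) and 24 (sample rate).
    function describeWavHeader(bytes) {
      var formatNames = { 1: 'PCM', 6: 'A-law', 7: 'mu-law' };
      var code = readLE(bytes, 20, 2);
      return {
        format: formatNames[code] || ('unknown (' + code + ')'),
        channels: readLE(bytes, 22, 2),
        sampleRate: readLE(bytes, 24, 4)
      };
    }
    ```

    A channels value of 1 means mono and 2 means stereo.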

    In my current app I only need very short voice recordings (a couple of letters), so maybe an online Base64 encoder can do the trick:

    http://www.motobit.com/util/base64-decoder-encoder.asp
    http://www.opinionatedgeek.com/dotnet/tools/base64encode/

    It should be possible to create a Titanium plugin using the iPhone SDK:
    http://stackoverflow.com/questions/392464/any-base64-library-on-iphone-sdk

    Tutorial:
    http://iphone.zcentric.com/2008/08/29/post-a-uiimage-to-the-web/

    Hopefully it will help someone. Please post your results!

    /J

    — answered August 17th 2010 by Joacim Boive
    0 Comments
  • I was wondering whether anyone knows of a text-to-speech engine for Titanium?

    — answered August 18th 2010 by Peter Lum
    0 Comments
  • Any updates? I would be interested to pay for a good module..

    — answered January 8th 2011 by Peter Lum
    1 Comment
    • Nothing yet…

      — commented January 10th 2011 by Joacim Boive
  • I got this to work using LumenVox as a service. I purchased the LumenVox Lite Subscription License for Asterisk (actually two subscriptions, since you'll want to use the speech tuner, $16/month).

    The code I created interfaces with the simpleclient example supplied with the kit. [the asterisk interface is a bonus ;-)]

    My Appcelerator app sends the WAV file to my Linux host, which processes it into words (speech to text) and then runs those words against a grammar (text to action), giving me an action to trigger.

    It works well on wifi, with the server connected to that network directly.

    However, at 48k the WAV files are too large to transmit over AT&T's 3G and get the response in a reasonable amount of time. The app works on 3G, but is too slow to use. I'd like to adjust the recording bit rate to 8k (a rate LumenVox is actually built to work with), but haven't found out how in Appcelerator. Any ideas?
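
    To put rough numbers on the size difference, here is a small worked calculation. It is only a sketch under stated assumptions: "48k" and "8k" are read as sample rates of 48 kHz and 8 kHz, the audio is mono μ-law (1 byte per sample), the clip lasts 3 seconds, and the 100 kbit/s uplink is a made-up figure for illustration. audioBytes and uploadSeconds are hypothetical helper names.

    ```javascript
    // Raw audio payload size in bytes (ignores the small WAV header).
    function audioBytes(sampleRateHz, bytesPerSample, channels, seconds) {
      return sampleRateHz * bytesPerSample * channels * seconds;
    }

    // Time to push `bytes` over a link of the given speed (ignores latency).
    function uploadSeconds(bytes, uplinkBitsPerSecond) {
      return (bytes * 8) / uplinkBitsPerSecond;
    }

    var ASSUMED_UPLINK = 100000;            // hypothetical 100 kbit/s 3G uplink
    var big = audioBytes(48000, 1, 1, 3);   // 144000 bytes
    var small = audioBytes(8000, 1, 1, 3);  // 24000 bytes

    console.log(big / small);                                  // size ratio: 6
    console.log(uploadSeconds(big, ASSUMED_UPLINK).toFixed(1));   // seconds at 48 kHz
    console.log(uploadSeconds(small, ASSUMED_UPLINK).toFixed(1)); // seconds at 8 kHz
    ```

    Under these assumptions the 8 kHz recording is one sixth of the size, so the upload takes roughly 2 seconds instead of roughly 12.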

    I suspect that latency becomes part of the usability formula next..

    Here's some code to get you started…

    –snip

    // Assumption: the HTTP client is created in the omitted part of the script.
    var xhr = Ti.Network.createHTTPClient();
    var file; // set when recording stops

    var recorder = Ti.Media.createAudioRecorder({});
    recorder.compression = Ti.Media.AUDIO_FORMAT_ULAW;
    recorder.format = Ti.Media.AUDIO_FILEFORMAT_WAVE;

    var start = Ti.UI.createButton({title: "Record", top: 100, height: 40, width: 180});
    win1.add(start);

    // Hold the button to record; release it to stop and upload.
    start.addEventListener('touchstart', function(e) {
      Ti.Media.audioSessionMode = Ti.Media.AUDIO_SESSION_MODE_RECORD;
      recorder.start();
      start.title = "Recording...";
      win1.backgroundColor = 'red';
    });
    start.addEventListener('touchend', function(e) {
      file = recorder.stop();
      Ti.Media.audioSessionMode = Ti.Media.AUDIO_SESSION_MODE_PLAYBACK;
      start.title = "Recorded: " + file.size;
      win1.backgroundColor = '#FFFFFF';
      Ti.API.info("Recorded: " + file.size);
      xhr.onerror = function(e) {
        Ti.UI.createAlertDialog({title: 'Error', message: e.error}).show();
        Ti.API.info('IN ERROR ' + e.error);
        win1.backgroundColor = '#FFFFFF';
      };
    

    – snip

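      // `server` (the host name) is assumed to be defined in the omitted code.
      Ti.API.info('sending http://' + server + '/uploadfile.php');
      xhr.open('POST', 'http://' + server + '/uploadfile.php');
      // Note: the PHP below reads $_FILES, which expects a multipart/form-data
      // body; if the upload arrives empty, try dropping this header.
      xhr.setRequestHeader("Content-Type", "audio/x-wav");
      var data_to_send = {
        "file": file.blob
      };
      // send the data
      xhr.send(data_to_send);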
    

    – snip

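    <?php
    // PHP code that builds a JSON file with up to 4 grammar rule outputs
    $xv2 = 'path_file'; // placeholder: replace with path+file (no .ext)

    // Note: the directive is named upload_max_filesize and cannot be changed
    // with ini_set() at runtime; set it in php.ini or .htaccess (e.g. 8M).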
    if ($_FILES["file"]["error"] > 0)
      {
      echo "Return Code: " . $_FILES["file"]["error"] . "<br />";
      }
    else
      {
      echo "Upload: " . $_FILES["file"]["name"] . "<br />";
      echo "Type: " . $_FILES["file"]["type"] . "<br />";
      echo "Size: " . ($_FILES["file"]["size"] / 1024) . " Kb<br />";
      echo "Temp file: " . $_FILES["file"]["tmp_name"] . "<br />";
      if (file_exists("upload/" . $_FILES["file"]["name"]))
        {
        echo $_FILES["file"]["name"] . " already exists. ";
        }
      else
        {
        // Prefix the stored name with a Unix timestamp to keep it unique.
        $xv3 = (int)gmdate('U');
        move_uploaded_file($_FILES["file"]["tmp_name"],
          $xv3."_".$xv2.".wav");
        echo "Stored in: " . $xv3."_".$xv2.".wav";
        }
      }
    $command1 = "examples/SimpleClient rules rules.gram 1 ".$xv3."_".$xv2.".wav 127.0.0.1";
    $output = exec('echo "'.$command1.'" > command.sh');
    // echo '#1:'.$output;
    $output = exec("sh command.sh >".$xv2.".txt"); // *** this file contains the speech text ****
    $output = exec("grep 'Interpretation String:' ".$xv2.".txt > ".$xv2."0.txt");
    $output = exec("awk '{ if (NR==1) print $0 }' ".$xv2."0.txt >".$xv2."1.txt");
    $output = exec("awk '{ if (NR==2) print $0 }' ".$xv2."0.txt >".$xv2."2.txt");
    $output = exec("awk '{ if (NR==3) print $0 }' ".$xv2."0.txt >>".$xv2."2.txt");
    $output = exec("awk '{ if (NR==4) print $0 }' ".$xv2."0.txt >>".$xv2."2.txt");
    $output = exec("sed -s 's/    Interpretation String: /{\"gram\":[{\"reply\":\"/g' ".$xv2."1.txt >".$xv2."3.txt");
    $output = exec("sed -s 's/    Interpretation String: /\"},{\"reply\":\"/g' ".$xv2."2.txt >".$xv2."4.txt");
    $output = exec("tr -d '\n' < ".$xv2."3.txt >".$xv2."5.txt");
    $output = exec("tr -d '\n' < ".$xv2."4.txt >".$xv2."6.txt");
    $output = exec("echo '\"}]}' >>  ".$xv2."6.txt");
    $output = exec("cat ".$xv2."6.txt >> ".$xv2."5.txt");
    
    ?>
    

    – snip

    The PHP returns a JSON-formatted list of the top 4 actions. However, if you look at
    the original file.txt it produces, it contains the words spoken, so take it from there.
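
    On the app side, the reply could be unpacked along these lines. This is a sketch only: the {"gram":[{"reply":...}]} shape is inferred from the sed commands above, and extractReplies is a hypothetical helper name.

    ```javascript
    // Pull the "reply" strings out of the JSON text built by the PHP script.
    // Returns an empty array if the text is not valid JSON or has no "gram" list.
    function extractReplies(jsonText) {
      var parsed;
      try {
        parsed = JSON.parse(jsonText);
      } catch (e) {
        return [];
      }
      if (!parsed || !Array.isArray(parsed.gram)) {
        return [];
      }
      return parsed.gram.map(function (entry) {
        return entry.reply;
      });
    }
    ```

    In the app this would typically be called on the response text inside the HTTP client's onload handler, and the first entry used as the action to trigger.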

    This is the file.txt from when I said "Go inside"; it shows the possible things my grammar logic
    thinks I could be trying to tell the computer to do:

    access
    Loaded Grammar!
    Loaded Audio!
    Decode returned!
    Alternative 1:
      Interpretation 1 of 1:
        Grammar Label        : rules
        Input Sentence       : go inside
        Interpretation String: inside
        Interpretation Score : 999
    Alternative 2:
      Interpretation 1 of 1:
        Grammar Label        : rules
        Input Sentence       : go inside house
        Interpretation String: inside
        Interpretation Score : 999
    Alternative 3:
      Interpretation 1 of 1:
        Grammar Label        : rules
        Input Sentence       : go inside the
        Interpretation String: inside
        Interpretation Score : 999
    Alternative 4:
      Interpretation 1 of 1:
        Grammar Label        : rules
        Input Sentence       : go inside the house
        Interpretation String: inside
        Interpretation Score : 999
    Alternative 5:
      Interpretation 1 of 1:
        Grammar Label        : rules
        Input Sentence       : go inside well
        Interpretation String: inside
        Interpretation Score : 999
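
    If you want the spoken words rather than just the action, output like the above can be parsed directly. A minimal sketch, assuming the text always follows the layout shown here; parseAlternatives is a hypothetical helper name.

    ```javascript
    // Turn the recognizer's text output into a list of alternatives,
    // each with the recognized sentence, interpretation and score.
    function parseAlternatives(text) {
      var results = [];
      var current = null;
      text.split('\n').forEach(function (line) {
        var trimmed = line.trim();
        // "Alternative N:" starts a new entry.
        if (/^Alternative \d+:$/.test(trimmed)) {
          current = { sentence: null, interpretation: null, score: null };
          results.push(current);
          return;
        }
        // Other lines of interest look like "Label   : value".
        var match = /^([^:]+?)\s*:\s*(.*)$/.exec(trimmed);
        if (!current || !match) {
          return;
        }
        if (match[1] === 'Input Sentence') current.sentence = match[2];
        if (match[1] === 'Interpretation String') current.interpretation = match[2];
        if (match[1] === 'Interpretation Score') current.score = parseInt(match[2], 10);
      });
      return results;
    }
    ```

    Each returned object carries the sentence field (the words spoken) alongside the interpretation the grammar mapped it to.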
    
    — answered April 13th 2011 by Richard Rutenberg
    0 Comments
  • http://developer.appcelerator.com/question/128380/speech-to-text

    — answered August 1st 2013 by pragnesh patel
    0 Comments
The ownership of individual contributions to this community generated content is retained by the authors of their contributions.
All trademarks remain the property of the respective owner.