TODO

update docs to be clear that SpeechLion does not depend on Gnome, but is merely being developed and tested under it.

put bzr repo on website, maybe even on Launchpad?

release bug - make sure scripts/dosub is called on all docs before creating the tarball. I think this is fixed, but I'm not sure.

automatic loading of app-specific grammar based on the currently focused window. Look into Gnome AT-SPI (Assistive Technology Service Provider Interface), which also has a CORBA interface that may be accessible via Java CORBA bindings, and GOK (Gnome On-screen Keyboard). If JNI is needed to reach some C/C++ code, it looks like SWIG can generate the JNI code. Simple way: xprop -root _NET_ACTIVE_WINDOW finds the window with focus, and xprop -id <id> WM_CLASS WM_NAME gets its app name and window name. xprop -spy -root ... gives updates as the focus changes, rather than polling with xprop -root each time. For starters, try polling in the outermost while 1 loop.
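A minimal sketch of the simple xprop polling approach, written as plain Python 3 rather than Jython, and assuming xprop prints its usual "window id # 0x..." and WM_CLASS(STRING) = "instance", "Class" lines. The GRAMMAR_FOR_CLASS table and the grammar names in it are hypothetical; how SpeechLion actually switches grammars is not shown here.

```python
import re
import subprocess
import time

# Hypothetical mapping from WM_CLASS class names to SpeechLion grammar names.
GRAMMAR_FOR_CLASS = {
    "Firefox": "browser",
    "Gnome-terminal": "shell",
}


def parse_active_window(output):
    """Return the window id from `xprop -root _NET_ACTIVE_WINDOW` output,
    or None when no window has focus (xprop then reports id 0x0)."""
    match = re.search(r"window id # (0x[0-9a-fA-F]+)", output)
    if match is None or int(match.group(1), 16) == 0:
        return None
    return match.group(1)


def parse_class_and_name(output):
    """Return (class, name) from `xprop -id <id> WM_CLASS WM_NAME` output."""
    wm_class, wm_name = None, None
    for line in output.splitlines():
        if line.startswith("WM_CLASS"):
            # WM_CLASS holds two strings: the instance name, then the class name.
            parts = re.findall(r'"([^"]*)"', line)
            if parts:
                wm_class = parts[-1]
        elif line.startswith("WM_NAME"):
            match = re.search(r'=\s*"(.*)"\s*$', line)
            if match:
                wm_name = match.group(1)
    return wm_class, wm_name


def grammar_for(wm_class):
    """Pick a grammar for the focused app, falling back to a default one."""
    return GRAMMAR_FOR_CLASS.get(wm_class, "default")


def run_xprop(args):
    """Run xprop with the given arguments and return its standard output."""
    return subprocess.run(["xprop"] + args, capture_output=True, text=True).stdout


def focused_window():
    """Poll xprop once and return (class, name) of the focused window."""
    window_id = parse_active_window(run_xprop(["-root", "_NET_ACTIVE_WINDOW"]))
    if window_id is None:
        return None, None
    return parse_class_and_name(run_xprop(["-id", window_id, "WM_CLASS", "WM_NAME"]))


def watch_focus(switch_grammar, interval=1.0):
    """Poll forever, calling switch_grammar(name) only when the grammar changes.
    In SpeechLion this check would live in the outermost while 1 loop instead."""
    current = None
    while True:
        grammar = grammar_for(focused_window()[0])
        if grammar != current:
            switch_grammar(grammar)
            current = grammar
        time.sleep(interval)
```

Polling once per loop keeps this simple; switching to xprop -spy -root would mean reading xprop's output line by line instead of re-running it.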

gmail mode - uses all browser commands too

more commands for launching different apps

refine spoken feedback - only give feedback when visual feedback is not immediate, such as when switching grammars. Or always give feedback when switching grammars, even if acknowledgements are off.

consider making an Ubuntu package for SpeechLion - the only problem is that there are no packages for sphinx4 or jython.

simpler selection of which microphone to use, via a selectMicrophone property or a command-line option: see http://cmusphinx.sourceforge.net/sphinx4/doc/Sphinx4-faq.html#microphone_selection

test on Windows - think about a way of managing and configuring Windows- and Linux-specific grammars

see if accuracy can be improved by tweaking the sphinx configuration file and the microphone settings.

dictation mode - try creating language model from sent emails and IM logs
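A rough sketch of preparing sent emails as language model training text, assuming plain-text message bodies where quoted replies start with ">". It writes one sentence per line wrapped in <s> and </s> markers, the style of input that n-gram toolkits such as the CMU-Cambridge Statistical Language Modeling Toolkit work from, and counts bigrams to show what such a model is built from. The function names and the crude sentence splitting are illustrative assumptions.

```python
import re
from collections import Counter


def email_sentences(body):
    """Yield word lists (with <s>/</s> markers) from an email body,
    skipping blank lines and quoted reply lines that start with '>'."""
    for line in body.splitlines():
        line = line.strip()
        if not line or line.startswith(">"):
            continue
        # Crude sentence split on end punctuation; good enough for a first try.
        for sentence in re.split(r"[.!?]+", line):
            words = re.findall(r"[a-z']+", sentence.lower())
            if words:
                yield ["<s>"] + words + ["</s>"]


def corpus_text(bodies):
    """Turn many email bodies into training lines, one sentence per line."""
    return [" ".join(words) for body in bodies for words in email_sentences(body)]


def bigram_counts(sentences):
    """Count adjacent word pairs, the raw material of a bigram model."""
    counts = Counter()
    for words in sentences:
        counts.update(zip(words, words[1:]))
    return counts
```

IM logs would need their own stripping of timestamps and nicknames before going through the same path.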

commands for time and date status - probably spoken status is best
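One possible shape for spoken time and date status, as a sketch: build a sentence and hand it to whatever TTS SpeechLion already uses for feedback (not shown). Spelling out "a m"/"p m" and "oh 5" is an assumption meant to keep the TTS engine from reading "am" as a word or "05" oddly; the month and day names are spelled out explicitly so the result does not depend on the locale.

```python
import datetime

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]


def spoken_time(now):
    """Build a sentence like 'It is 3 oh 5 p m' for the TTS engine."""
    hour = now.hour % 12 or 12  # 0 and 12 both read as 12
    period = "a m" if now.hour < 12 else "p m"
    if now.minute == 0:
        minutes = "o'clock"
    elif now.minute < 10:
        minutes = "oh %d" % now.minute
    else:
        minutes = "%d" % now.minute
    return "It is %d %s %s" % (hour, minutes, period)


def spoken_date(now):
    """Build a sentence like 'Today is Saturday, January 1'."""
    # datetime.weekday() counts Monday as 0, matching the order of DAYS.
    return "Today is %s, %s %d" % (DAYS[now.weekday()], MONTHS[now.month - 1], now.day)
```

A "what time is it" command would then call spoken_time(datetime.datetime.now()).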

google search mode - want to be able to speak a list of words; the language model is not very important

numbers mode "one hundred two", etc.
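Once the recognizer returns the spoken words, numbers mode still has to turn them into digits. A minimal sketch of the usual accumulate-and-scale approach, assuming the grammar only produces the English number words listed here (plus an optional "and"); the function name is hypothetical.

```python
UNITS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10, "eleven": 11,
    "twelve": 12, "thirteen": 13, "fourteen": 14, "fifteen": 15,
    "sixteen": 16, "seventeen": 17, "eighteen": 18, "nineteen": 19,
}
TENS = {
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}
SCALES = {"thousand": 1000, "million": 1000000}


def words_to_number(words):
    """Convert e.g. 'one hundred two' to 102."""
    total = 0    # completed thousands/millions groups
    current = 0  # the group currently being spoken (below 1000)
    for word in words.split():
        if word in UNITS:
            current += UNITS[word]
        elif word in TENS:
            current += TENS[word]
        elif word == "hundred":
            current *= 100
        elif word in SCALES:
            # Close off the current group, e.g. 'five hundred thousand'.
            total += current * SCALES[word]
            current = 0
        elif word == "and":
            continue
        else:
            raise ValueError("not a number word: " + word)
    return total + current
```

For example, "two thousand five hundred sixty one" closes the "two" group at "thousand" and then builds 561 in the current group, giving 2561.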

use FreeTTS algorithms for pronunciation of words not in the dictionary. Will Walker says somebody has created a patch to do this. This would allow shell mode to create pronunciations for the filenames in the current directory, for example.

addenda dictionary for custom words - look into voxapl usage in the dictionary section of conf/sphinx.xml. This is only supported in the CVS version of S4, and I don't want to require that just yet.

define a word/command - say 'define word' and you are prompted to spell the word, which is basically a key sequence. Once it is correct, you say 'all done'. Then it prompts for the 'sound' of the word, and you say the word 3 times. Using the pronunciation code that Will Walker has, it generates the phone set for the word. That phone set is added to the custom dictionary along with an ASCII version of the word (since non-printing keys may be involved if it is a command), and a corresponding entry with the exact key sequence is created in the command grammar. This could be used to add proper names or shortcut commands on the fly. For proper names, hopefully there is a way to add them to the dictation language model, at least crudely, even if the probabilities aren't there.
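A small sketch of the dictionary step, assuming the pronunciation code hands back one phone list per utterance and that the custom dictionary uses cmudict-style lines (uppercase word, then space-separated phones, with alternates marked WORD(2), WORD(3)). Identical pronunciations from the 3 utterances are collapsed; the function name is hypothetical.

```python
def dictionary_entries(word, pronunciations):
    """Turn a word and its recorded phone sequences into dictionary lines.

    Duplicate pronunciations are dropped; alternates are numbered
    WORD(2), WORD(3), ... in the cmudict style."""
    entries = []
    seen = []
    for phones in pronunciations:
        phones = tuple(p.upper() for p in phones)
        if phones in seen:
            continue  # the same pronunciation was heard again
        seen.append(phones)
        if len(seen) == 1:
            label = word.upper()
        else:
            label = "%s(%d)" % (word.upper(), len(seen))
        entries.append("%s\t%s" % (label, " ".join(phones)))
    return entries
```

Keeping distinct pronunciations as alternates, rather than picking one, is a guess at what would help recognition of names people say inconsistently.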

GUI?