Wednesday, June 12, 2013

Control Your Computer Using Speech

This project was inspired by the Iron Man series. I was fascinated to see Tony Stark talking to his computer, JARVIS, and getting quick and witty replies. Building a full JARVIS may be possible, but it is beyond the scope of this blog. Let's make a simpler application that performs specific tasks using only voice commands: opening a new window, pausing a video, typing automatically as you speak, or scrolling through text and giving intelligent feedback by cross-referencing a particular knowledge base.

Prerequisites:

1. JDK 1.6 or higher
2. Eclipse IDE
3. Sphinx 4.0 

Steps:

1. Configure Sphinx 4.0 with a Java Application
2. Custom Code for Specific Applications Written in C
3. Combine C and Java
4. Deployment

Step 1 : Configure Sphinx 4.0 with a Java Application

Information about Sphinx 4.0 can be found at http://cmusphinx.sourceforge.net/sphinx4/

Download the latest version of Sphinx (sphinx4-1.0beta6-src.zip at the time of writing).

See the video for more details on setting up Sphinx 4.0 with the Eclipse IDE.

Once you have done the basic setup as illustrated in the video above, we need to change the grammar file and the dictionary to suit our application.

The hello.gram file is inside the HelloWorld.jar file in

C:\sphinx4-1.0beta6\bin

Extract the contents of the jar file into the same location. The hello.gram file will then be at

C:\sphinx4-1.0beta6\bin\edu\cmu\sphinx\demo\helloworld

Open hello.gram in WordPad or a similar editor, change it to look like the figure below, and then click Save. You are done with the first part of the configuration.


See the Notes section to learn more about the JSGF grammar format.
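The figure with the edited grammar is not reproduced here, so as a rough illustration only, the sketch below generates a JSGF grammar that accepts one command word per utterance. The class name GrammarSketch, the grammar name hello, the rule name command, and the choice of words (borrowed from the dictionary entries later in this post) are all assumptions and not the author's actual hello.gram.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: builds the text of a small JSGF grammar.
class GrammarSketch {

    /** Builds a JSGF grammar whose single public rule accepts any one of the given words. */
    public static String buildGrammar(String grammarName, String ruleName, List<String> words) {
        StringBuilder sb = new StringBuilder();
        sb.append("#JSGF V1.0;\n\n");                              // JSGF header line
        sb.append("grammar ").append(grammarName).append(";\n\n");  // grammar declaration
        sb.append("public <").append(ruleName).append("> = ( ");
        for (int i = 0; i < words.size(); i++) {
            if (i > 0) {
                sb.append(" | ");                                   // alternatives are separated by |
            }
            sb.append(words.get(i));
        }
        sb.append(" );\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        // Assumed word list; every word used here needs an entry in the dictionary file.
        List<String> words = Arrays.asList("UP", "DOWN", "PLAY", "PAUSE", "MUTE");
        System.out.print(buildGrammar("hello", "command", words));
    }
}
```

The printed text could be pasted into hello.gram. Whatever words the grammar lists, each of them also needs a pronunciation in the dictionary file described next.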

Now we need to modify the dictionary file (cmudict.0.6D) found at

C:\sphinx4-1.0beta6\lib\WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz\WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz\dict

To get to it, you will first need to extract the file WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.jar found under

C:\sphinx4-1.0beta6\lib\

See the image below for more details.

Once you have opened the dictionary file in WordPad, it should look something like this:


!EXCLAMATION-POINT   EH K S K L AH M EY SH AH N P OY N T
"OPEN-QUOTE          OW P AH N K W OW T
"CLOSE-QUOTE         K L OW Z K W OW T
"DOUBLE-QUOTE        D AH B AH L K W OW T
"END-OF-QUOTE        EH N D AH V K W OW T
"END-QUOTE           EH N D K W OW T

followed by many more entries of the same kind for other words.

Select all of the content and delete it. Now open a new file in Notepad and fill it in as shown in the image below.



Save the Notepad file under any name on your desktop. Now browse to 


and upload the Notepad file you just saved to your desktop. Click 'Choose File' to select the .txt file, then click the 'Compile Knowledge Base' button. If the process went smoothly, you should see a new window that resembles the figure below.

Click on Dictionary to open a new window with text resembling the original cmudict.0.6D file. Copy that text into the emptied cmudict.0.6D file and save it.

If you cannot get the dictionary-creation process to work, you can simply copy and paste the following content into the cmudict.0.6D file and click Save:

COMMAND    K AH M AE N D
DOWN       D AW N
FACEBOOK   F EY S B UH K
GMAIL      G M EY L
GOOGLE     G UW G AH L
MUTE       M Y UW T
PAUSE      P AO Z
PLAY       P L EY
SOLVE      S AA L V
UP         AH P
WRITE      R AY T
YAHOO      Y AA HH UW
YOUTUBE    Y AW T Y UW B
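Each line of the dictionary is a word followed by its phonemes, separated by whitespace. As a small sanity check before repackaging, the sketch below parses lines in that layout and reports which command words have no entry. The class and method names are hypothetical, and it assumes one pronunciation per line with no comments or alternate-pronunciation markers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for checking a hand-edited pronunciation dictionary.
class DictionarySketch {

    /** Parses lines of the form "WORD PH1 PH2 ..." into a word -> phoneme-list map. */
    public static Map<String, List<String>> parse(List<String> lines) {
        Map<String, List<String>> entries = new LinkedHashMap<String, List<String>>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            String[] parts = trimmed.split("\\s+"); // any run of spaces or tabs separates fields
            List<String> phonemes =
                    new ArrayList<String>(Arrays.asList(parts).subList(1, parts.length));
            entries.put(parts[0], phonemes);
        }
        return entries;
    }

    /** Returns the words that have no entry in the dictionary. */
    public static List<String> missingWords(Map<String, List<String>> dict, List<String> words) {
        List<String> missing = new ArrayList<String>();
        for (String word : words) {
            if (!dict.containsKey(word)) {
                missing.add(word);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("DOWN D AW N", "UP         AH P", "PLAY         P L EY");
        Map<String, List<String>> dict = parse(lines);
        // CLOSE is deliberately absent from the sample lines above.
        System.out.println(missingWords(dict, Arrays.asList("UP", "DOWN", "CLOSE")));
    }
}
```

Any word that shows up as missing would need a line added to the dictionary, or would need to be removed from the grammar.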


You will now need to convert the folder WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz back into a .jar file. The conversion is folder -> zip -> jar: compress the folder to a .zip archive, then change the extension to .jar.
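Since a .jar file is a zip archive, the same repackaging can also be done from Java. The following is only a sketch under a few assumptions: the class name JarPackSketch is made up, entry names are written relative to the folder you pass in, no manifest is generated beyond whatever files the folder already contains, and the output file must live outside the folder being packed.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper: zips the contents of a folder into a .jar file.
class JarPackSketch {

    /** Writes every file under sourceDir into targetJar, with entry names relative to sourceDir. */
    public static void pack(File sourceDir, File targetJar) throws IOException {
        ZipOutputStream out = new ZipOutputStream(new FileOutputStream(targetJar));
        try {
            addFiles(sourceDir, "", out);
        } finally {
            out.close();
        }
    }

    private static void addFiles(File dir, String prefix, ZipOutputStream out) throws IOException {
        File[] children = dir.listFiles();
        if (children == null) {
            return; // not a directory, or it could not be read
        }
        for (File child : children) {
            String name = prefix + child.getName();
            if (child.isDirectory()) {
                addFiles(child, name + "/", out); // zip entry names always use forward slashes
            } else {
                out.putNextEntry(new ZipEntry(name));
                InputStream in = new FileInputStream(child);
                try {
                    byte[] buffer = new byte[8192];
                    int n;
                    while ((n = in.read(buffer)) != -1) {
                        out.write(buffer, 0, n);
                    }
                } finally {
                    in.close();
                }
                out.closeEntry();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java JarPackSketch <folder-to-pack> <output.jar>
        pack(new File(args[0]), new File(args[1]));
    }
}
```

Pass the folder whose contents should sit at the root of the archive, so that the paths inside the new jar match the paths inside the original one.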

Now that the basic setup is complete, let's see how to use this newly configured Sphinx library in our project.

Start the Eclipse IDE and create a new Java project. Name it HelloWorld, click Next, and add the external JARs as specified in the video above.
Instead of using the existing HelloWorld.class file, we will write our own code in HelloWorld.java.


Step 2 : Custom Code for Specific Applications Written in C


Imagine you say "up" and want the page to scroll up. How would you do that? One way is to automate a key press, i.e. make the software think that a key was pressed even though it was not physically pressed. We write a small C/C++ program for this purpose.

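The outline below simulates a key press in C on Windows. The keybd_event call and the Page Up key are one possible way to fill in the press and release steps:

#include <windows.h>

int main(void)
{
    /* Key press: Page Up (VK_PRIOR) is used here as an example key */
    keybd_event(VK_PRIOR, 0, 0, 0);
    Sleep(50); /* hold the key for 50 milliseconds (example value) */
    /* Key release */
    keybd_event(VK_PRIOR, 0, KEYEVENTF_KEYUP, 0);
    return 0;
}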

You can use a combination of key presses too. For example, 'Alt + F4' can close a window when you say 'Close', or 'Page Up' can scroll up when you say 'Scroll Up'.
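The post does this part in C. Purely to illustrate the pattern for combinations (press the keys in order, hold briefly, release in reverse order), here is a sketch that swaps in Java's standard java.awt.Robot class instead. The class name KeyComboSketch, the method names, and the 50 ms hold time are assumptions. The command strings come from the examples above.

```java
import java.awt.AWTException;
import java.awt.Robot;
import java.awt.event.KeyEvent;

// Hypothetical helper: maps spoken commands to key combinations and sends them.
class KeyComboSketch {

    /** Key codes for a spoken command, listed in press order; null if the command is unknown. */
    public static int[] keysFor(String command) {
        if ("CLOSE".equals(command)) {
            return new int[] { KeyEvent.VK_ALT, KeyEvent.VK_F4 };
        }
        if ("SCROLL UP".equals(command)) {
            return new int[] { KeyEvent.VK_PAGE_UP };
        }
        return null;
    }

    /** Presses the keys in order, holds them briefly, then releases them in reverse order. */
    public static void send(Robot robot, int[] keys, int holdMillis) {
        for (int i = 0; i < keys.length; i++) {
            robot.keyPress(keys[i]);
        }
        robot.delay(holdMillis);
        for (int i = keys.length - 1; i >= 0; i--) {
            robot.keyRelease(keys[i]);
        }
    }

    public static void main(String[] args) throws AWTException {
        int[] keys = keysFor("SCROLL UP");
        // Creating a Robot needs a desktop session; it throws an exception in headless mode.
        send(new Robot(), keys, 50);
    }
}
```

Releasing in reverse order keeps the modifier (Alt) held down for the whole time the main key (F4) is pressed, which is what a person does when typing the combination by hand.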

Virtual-key codes for almost all keys are listed at

http://msdn.microsoft.com/en-us/library/windows/desktop/dd375731(v=vs.85).aspx


Step 3 : Combine C and Java

The Java code handles speech recognition. When a specific speech pattern produces a specific result, we either use Java to launch an external application directly, or run the compiled C code (.exe) for actions that Java does not trigger itself.
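One way to wire this together is a lookup table from recognized text to the command line of an external program. The sketch below is an assumption about how that could look, not the author's code: the class name, the helper executable keypress.exe and its argument are hypothetical, and the "cmd /c start" entry is Windows-specific. In the Sphinx HelloWorld demo the recognized text would come from the recognition result (for instance via getBestFinalResultNoFiller()); here it is just a plain string.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical dispatcher: maps recognized phrases to external programs.
class CommandDispatcherSketch {
    private final Map<String, List<String>> commands = new HashMap<String, List<String>>();

    /** Associates a spoken phrase with the command line of an external program. */
    public void register(String phrase, String... commandLine) {
        commands.put(normalize(phrase), Arrays.asList(commandLine));
    }

    /** Returns the command line for the recognized text, or null if nothing is registered. */
    public List<String> lookup(String recognizedText) {
        return commands.get(normalize(recognizedText));
    }

    /** Starts the external program mapped to the recognized text; returns false if there is none. */
    public boolean dispatch(String recognizedText) throws IOException {
        List<String> commandLine = lookup(recognizedText);
        if (commandLine == null) {
            return false;
        }
        new ProcessBuilder(commandLine).start();
        return true;
    }

    /** Trims, upper-cases, and collapses whitespace so lookups ignore case and spacing. */
    private static String normalize(String text) {
        if (text == null) {
            return "";
        }
        return text.trim().toUpperCase(Locale.ENGLISH).replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        CommandDispatcherSketch dispatcher = new CommandDispatcherSketch();
        dispatcher.register("UP", "keypress.exe", "pageup");                         // hypothetical compiled C helper
        dispatcher.register("GOOGLE", "cmd", "/c", "start", "http://www.google.com"); // Windows-specific
        String heard = args.length > 0 ? args[0] : "up";
        // Only prints the mapped command line; call dispatch(heard) to actually run it.
        System.out.println(dispatcher.lookup(heard));
    }
}
```

Keeping the table separate from the recognition loop means new voice commands can be added with a single register call, whether they launch a Java-started application or the compiled C helper.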


Step 4 : Deployment

Package the compiled Java code as a .jar or .exe file so it can be used on any system with a Java runtime installed, which covers almost every desktop platform.

Check out the video of the working project here:

http://youtu.be/5vqFWqCK1-A