Voice Controlling Hexy

(This post is a bit rough, but I figured I should get it out there, since I won't be working on Hexy for a little while.)
Before we get started:
  1. The code used to make this all work is on GitHub. I'm new to ROS, so I don't claim that it's set up correctly.
  2. To get things to roslaunch nicely, I had to make some pretty hacky changes to how PoMoCo deals with directories. Basically, it now has to use a full path to every directory (these paths are built at runtime from __file__). I haven't seen it fail yet, and I can't think of a specific failure mode, but it makes me very uncomfortable, and I'm looking for a way to change it.
  3. ROSPoMoCo isn't a full substitute for PoMoCo yet. Offsets, for instance, can't be set in ROS, though they are loaded from the .cfg file.
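
As a rough illustration of the full-path trick from item 2, here is a minimal sketch of building absolute directory paths from a module's own location. The function name package_paths and the "Moves" subdirectory are hypothetical, chosen for the example; the real ROSPoMoCo code may be organized differently.

```python
import os


def package_paths(module_file, subdirs=("Moves",)):
    """Return absolute paths for subdirectories that sit next to module_file.

    module_file is normally __file__; the subdirectory names are
    hypothetical placeholders, not PoMoCo's confirmed layout.
    """
    # abspath resolves relative paths against the current working directory,
    # so the result no longer depends on where roslaunch starts the process.
    base = os.path.dirname(os.path.abspath(module_file))
    return {name: os.path.join(base, name) for name in subdirs}
```

Inside a module, this would be called as package_paths(__file__), and every later file access would go through the returned paths instead of relative ones.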

After a bit of hacking, I've ported most of PoMoCo over to run as a ROS node. This node listens for new moves on the /moves topic and runs them if it can. If it can't find a move, it issues a warning and ignores the request.
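
The listen-and-dispatch behavior described above can be sketched roughly as follows. This is not the actual ROSPoMoCo code: the moves dictionary, the node name, and the use of std_msgs/String on /moves are all assumptions made for illustration.

```python
def make_move_handler(moves, warn):
    """Build a callback that runs a named move, or warns if it is unknown.

    moves: dict mapping move names to zero-argument callables (hypothetical).
    warn:  function taking a message string, e.g. rospy.logwarn.
    """
    def handle(name):
        move = moves.get(name)
        if move is None:
            # Unknown move: warn and ignore it rather than crash the node.
            warn("Unknown move %r, ignoring" % name)
            return False
        move()
        return True
    return handle


def run_node(moves):
    """Wire the handler to the /moves topic. Needs a working ROS install."""
    import rospy
    from std_msgs.msg import String  # message type is an assumption

    rospy.init_node("pomoco_moves")  # node name is an assumption
    handle = make_move_handler(moves, rospy.logwarn)
    rospy.Subscriber("/moves", String, lambda msg: handle(msg.data))
    rospy.spin()
```

Keeping the lookup in a plain function means the warn-and-ignore logic can be exercised without a running ROS master; run_node is only needed on the robot.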

At this point, it's time to take advantage of ROS to do neat stuff with Hexy. At first, I wasn't sure what exactly I wanted to do, but some Googling brought up this blog post, in which an iRobot Create is controlled by voice. The package used to do this is perfect for controlling something like Hexy: pocketsphinx will broadcast any recognized phrase on the recognizer/output topic, and the set of recognized phrases can be set with a text file.

Sadly, that tutorial is for ROS Electric. I'm running ROS Groovy on a much newer version of Ubuntu, so some of the instructions need tweaking. In the interest of passing on inspiration and aid, I've included the rough equivalent of the Pi Robot instructions below. Pi Robot deserves credit for these. I mean this section only as a tutorial, and I'm not claiming it's my original work.

First, grab the pocketsphinx gstreamer plugin.

$ sudo apt-get install gstreamer0.10-pocketsphinx
Next, grab the ROS wrapper for pocketsphinx. The ROS Wiki says this is located in an SVN repo at http://albany-ros-pkg.googlecode.com/, but that repo says it is no longer maintained and recommends using https://github.com/mikeferguson/pocketsphinx.git instead. Since the GitHub repo is actively maintained as of the time of writing, I will use it.
cd into the appropriate ROS workspace (it should be in your ROS_PACKAGE_PATH) and run the following:

$ git clone https://github.com/mikeferguson/pocketsphinx.git
$ rospack profile
Everything should be set up now. To check, type the following:

$ roscd pocketsphinx

You should land in the package's directory. If you get an error, check your ROS_PACKAGE_PATH.
At this point, you should be able to demo pocketsphinx:

$ roslaunch pocketsphinx voice_cmd.launch

Once it starts, try saying things like "forward", "backward", "move right", etc. You should see these phrases printed on the terminal shortly after you say them, and they are also broadcast on recognizer/output. Note that pocketsphinx is quite sensitive to noise.

Congrats! The setup work is done, and there's very little coding left to hook everything up. In fact, the hacky translator I wrote takes only 51 lines of Python, most of them a dictionary that matches phrases to the moves Hexy knows how to do.
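
To give a feel for what such a translator looks like, here is a much shorter sketch under stated assumptions. It is not the 51-line script itself: the move names on the right-hand side of the dictionary and the node name are hypothetical, and the sketch assumes both topics carry std_msgs/String.

```python
# Phrases come from the pocketsphinx demo vocabulary; the move names
# they map to are hypothetical placeholders for Hexy's real moves.
PHRASE_TO_MOVE = {
    "get up": "Get Up",
    "forward": "Move Forward",
    "backward": "Move Backward",
    "move right": "Rotate Right",
}


def translate(phrase):
    """Return the move name for a recognized phrase, or None if unmapped."""
    return PHRASE_TO_MOVE.get(phrase.strip().lower())


def run_translator():
    """Relay recognized phrases to /moves. Needs a working ROS install."""
    import rospy
    from std_msgs.msg import String  # message type is an assumption

    rospy.init_node("voice_translator")  # node name is an assumption
    # Newer ROS releases also expect a queue_size argument here.
    pub = rospy.Publisher("/moves", String)

    def on_phrase(msg):
        move = translate(msg.data)
        if move is not None:
            pub.publish(String(move))

    rospy.Subscriber("recognizer/output", String, on_phrase)
    rospy.spin()
```

Phrases with no entry in the dictionary are dropped silently here, so stray recognitions from background noise never reach the move node.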

I've written a .launch file for all of the above, so assuming you've got my (pre-alpha) ROSPoMoCo package profiled, you should be able to plug in Hexy and a mic, and type

$ roslaunch ROSPoMoCo voice-control.launch

After giving the ROS nodes a few seconds to start, say "get up" or similar. Be warned that as of the time of writing, there is no voice command or ROS service to disengage the servos. You'll need to pull the plug manually.
