Robot Speech Recognition

Social interaction is a major concern in Artificial Intelligence and the robotics industry, which makes the development of socially interactive robots one of the most challenging projects for roboticists. Although robots have become more sophisticated during the last decade, a truly effective companion robot, perhaps somewhere along the lines of Rosie in “The Jetsons,” still remains to be seen.

Robot companions present a deeper challenge because, for the robot to perform its tasks, communication between a human and an assistant robot must be as natural as possible, and this requires difficult and very detailed work in several areas. Aside from the content of the message uttered by the user, the user’s gestures and bodily motions should all be taken into consideration. Robot speech recognition is not all that is needed.

The robot companion must be able to understand the meaning of a message in relation to its context, as represented by a specific task, place, object, action, set of objects or other people involved. Noisy environments, accents, non-verbal messages and orders necessitating action all require highly sophisticated audio and video technology, as well as a host of other components. If a mobile robot needs to understand a command that includes recognizing color, for example, it will need to be provided with color vision.

In some of the earliest models of robot speech recognition, researchers worked with instructions built on a very simple grammar, such as “Move,” “Go ahead” or “Turn left.” One example is VERBOT (an isolated-word, speaker-dependent speech recognition robot), a hobbyist robot that was sold in the early 1980s. Today, however, researchers are working on instructions with complex grammar, which is what people normally use in their daily lives. Modern-day researchers are also focusing on speech and gesture recognition, interpretation, and fusion for improving Human-Robot Interaction (HRI) and/or Human-Computer Interaction (HCI).
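To make the idea of a "very simple grammar" concrete, here is a minimal sketch in Python. It assumes the recognizer has already turned the user's speech into text, and it only accepts a fixed list of phrases. The command table and the action names ("drive", "turn") are hypothetical and chosen for illustration; they are not taken from VERBOT or any other product.

```python
# Hypothetical command table: recognized phrase -> (action, argument).
# The phrases come from the examples above; the actions are made up.
COMMANDS = {
    "move": ("drive", "forward"),
    "go ahead": ("drive", "forward"),
    "turn left": ("turn", "left"),
}


def parse_command(utterance):
    """Normalize a recognized utterance and look it up in the command table."""
    # Lowercase the text and drop everything except letters and spaces.
    cleaned = "".join(
        ch for ch in utterance.lower() if ch.isalpha() or ch.isspace()
    )
    # Collapse repeated whitespace so "Go   ahead" matches "go ahead".
    key = " ".join(cleaned.split())
    # Anything outside the fixed grammar is rejected (returns None).
    return COMMANDS.get(key)
```

With this table, `parse_command("Turn left.")` yields the turn-left action, while a free-form request such as "Please fetch my slippers" returns `None`. That rejection is exactly the limitation that pushes researchers toward more complex grammars.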

The goal of the service or companion robot is to help people in everyday life. As such, it is vital for the robot to be able to understand and communicate effectively with its human users. Robot Speech Recognition (SR) technology offers the most natural way to facilitate communication between robots and humans, even for novice users without proper training. To accomplish this, it is ideal to use a natural language such as English rather than a programming language, as this is the most natural means of interaction. Because humans find it easiest and most sensible to use Natural Language (NL) when interacting with robots in a social context, roboticists have worked to build a Natural Language interface to improve Human-Robot Interaction.


This interface is now starting to appear in several standard software applications, and is recommended for beginners because it is relatively easy to use. Through Robot Speech Recognition, robots gain initial knowledge that they can build upon in order to effectively process the tasks they are programmed to accomplish.

Generally, robot Speech Recognition is the process of converting an acoustic signal, usually captured by a microphone or a telephone, into a set of words. It has two basic but very important functions: i) to recognize the series of sounds and ii) to identify the word from the sound. One of its primary components is the microphone, which can usually “hear” everything. The noisy data is handled by the Speech Recognition system, which depends strongly on the robot’s design. Another primary component is the speaker or loudspeaker, which the robot can use to “speak” to the user.
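The second function, identifying a word from a series of sounds, can be illustrated with a toy sketch in Python. It assumes the first function has already produced a list of sound labels, possibly with errors caused by noise, and it picks the vocabulary word whose stored pronunciation is closest. The three-word lexicon and the sound labels are invented for illustration, and real recognizers rely on statistical models rather than this simple edit-distance match.

```python
# Hypothetical lexicon: word -> expected series of sound labels.
LEXICON = {
    "move": ["m", "uw", "v"],
    "left": ["l", "eh", "f", "t"],
    "turn": ["t", "er", "n"],
}


def edit_distance(a, b):
    """Count the insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from an empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def identify_word(sounds):
    """Return the lexicon word whose pronunciation best matches the sounds."""
    return min(LEXICON, key=lambda word: edit_distance(sounds, LEXICON[word]))
```

For example, if noise causes the final sound of "move" to be heard as "f", `identify_word(["m", "uw", "f"])` still selects "move", because that entry differs by only one sound while the others differ by three.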

Several Speech Recognition products are available in online shops. One is the VRbot Speech Recognition Module for Robonova and other robots, which is designed to easily add versatile voice command functionality to robots ($57 online). Although specifically designed for ROBONOVA-I and ROBOZAK, the VRbot module can also be used to implement speech recognition capabilities on virtually any host platform. The module is completely self-contained and interacts with the host through a simple yet robust serial protocol, enabling speech recognition on relatively low-power processors such as ATMEGA and PIC. The SR-06 Speech Recognition Kit ($100 online) is a stand-alone circuit that can recognize up to 40 user-selected words lasting one second each, or 20 user-selected words or phrases lasting two seconds each. The speech recognition circuit is multilingual, so words to be trained for recognition may be in any language. And then there is the Tigal VoiceGP + DK-T2SI Speech Recognition Development Kit ($426 online) for the VoiceGP Module, which includes all of the hardware and software one needs to build voice and speech recognition capabilities into a desired application. It also includes the software needed to develop speaker-independent vocabularies from text-based input in multiple languages, including US English, UK English, German, French, Italian, LA Spanish, Korean and Japanese.

But even people without robots can take advantage of the benefits of Speech Recognition technology. Microsoft Windows Speech Recognition (which comes free with Windows Vista and Windows 7), for example, empowers users to interact with their computers by voice. It’s designed for people who want to significantly limit their use of the mouse and keyboard while maintaining or increasing their overall productivity. Using the program, one can dictate documents and emails in mainstream applications, use voice commands to start and switch between applications, control the operating system, and even fill out forms on the Web. Very similar is the Dragon speech recognition software ($45 - $85 online), which also makes it easier for people to become more productive on a computer. As the user talks, the software types, supposedly capturing thoughts and ideas faster. The human voice may also be used to create and edit documents or emails, launch applications, open files, and control the mouse, among other things. For Macintosh users, there’s MacSpeech Dictate ($150 online), which testers say is as accurate as Dragon, although, unlike with Dragon, files can’t be edited by spoken command. And for the iPhone there is the web-based Siri.

Because of its potential, Speech Recognition technology promises to change the way we interact with machines, both robots and computers, and perhaps bring us closer to having companion robots that behave more like Rosie around the house.