
Voice-recognition technology starting to come into its own

Phenelope: “OK, I think I’m ready to start.”
Boss: “OK, let’s take a train from Detroit to Washington.”
Phenelope: “What route would you like to get from Detroit to Washington?”
Boss: “Let’s go via Toledo and Pittsburgh.”
Phenelope: “The terminal at city Scranton is delaying traffic due to localized heavy winds. The terminal at city Baltimore is delaying traffic due to localized heavy winds. An additional five hours will be needed to travel through them due to decreased visibility.”
Maybe this conversation isn’t quite so typical, however, since Phenelope is a computer.
According to local computer experts, the day is rapidly approaching when the common user interface is the human voice, not the keyboard and not the mouse.
The first voice-recognition computer systems were developed to allow disabled people to use computers, says Terry Martin, president of VOILA Technology Inc., a consulting, sales and development company specializing in hands-free/eyes-free computer operation. Blind since birth, Martin began the company with an initial objective of enabling blind people to get into the high-tech work force. His work has since expanded to development of voice-recognition systems for professionals who need to be able to dictate while performing job-related functions–pathologists and radiologists, for instance.
The technology has even evolved so that portable digital audio recorders can be brought almost anywhere for dictation, then brought back and plugged into the computer to generate transcribed text. In addition, computers can be programmed to respond to simple verbal cues, or macros. Macros can generate standard text or wording in response to a voice command such as “opening paragraph,” or they can be programmed to execute specific computer commands such as signing onto or off of the system.
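The macro idea described above can be sketched in a few lines. This is a minimal illustration, assuming the recognizer delivers each utterance as plain text; the command names and boilerplate wording are invented examples, not any vendor's actual macro set.

```python
# Hypothetical voice-macro table: spoken phrases expand to standard
# text or trigger a system command; anything else is plain dictation.

STANDARD_TEXT = {
    "opening paragraph": "Dear Sir or Madam,\n\nThank you for your letter of",
    "closing paragraph": "Sincerely yours,",
}

COMMANDS = {
    "sign on": lambda: "SYSTEM: signed on",
    "sign off": lambda: "SYSTEM: signed off",
}

def handle_utterance(utterance: str) -> str:
    """Expand a macro if the utterance matches one; otherwise
    treat it as ordinary dictation and return it unchanged."""
    phrase = utterance.strip().lower()
    if phrase in STANDARD_TEXT:
        return STANDARD_TEXT[phrase]
    if phrase in COMMANDS:
        return COMMANDS[phrase]()
    return utterance
```

A real system would route command results to the operating system rather than return strings, but the lookup structure is the same.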
“Although these systems were initially developed for people with disabilities, now manufacturers are going after commercial markets,” Martin says.
But the voice-recognition systems available today generally require considerable user and (yes!) computer training, as well as a specialized mode of speaking called “discrete speech,” which requires that the user pause after each utterance.
According to Martin, when a computer registers an utterance, it must compare it with an active vocabulary–generally 30,000 to 60,000 words–a process that occurs within one-sixth of a second and results in a processing speed of 60 words per minute.
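The lookup step Martin describes can be illustrated with a toy sketch. Real recognizers compare acoustic features, not spellings; ordinary string similarity stands in here purely to show the idea of scoring an utterance against an active word list. The vocabulary and cutoff are invented for the example.

```python
import difflib

# A hypothetical five-word "active vocabulary"; a production system
# would hold tens of thousands of entries.
ACTIVE_VOCABULARY = ["train", "terminal", "route", "toledo", "pittsburgh"]

def best_match(utterance: str, vocabulary=ACTIVE_VOCABULARY):
    """Return the vocabulary entry most similar to the utterance,
    or None if nothing scores above the similarity cutoff."""
    candidates = difflib.get_close_matches(
        utterance.lower(), vocabulary, n=1, cutoff=0.6
    )
    return candidates[0] if candidates else None
```

A misheard “trane,” for instance, still resolves to the nearest vocabulary entry, while an utterance resembling nothing in the list is rejected.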
Jean Barber, an analyst specialist at Eastman Kodak Co., has been using a voice-recognition computer developed especially to enable her to return to work after a year of disability caused by tendinitis. She uses the system for 65 percent to 70 percent of her inputting requirements.
“It was frustrating for me at first because it has to learn your voice. The more you use it, the more it learns your voice,” Barber says. “I’m always saying, ‘Now listen,’ to it.”
In addition, Barber’s computer had to be programmed to desensitize it to environmental sounds. Ringing phones, clearing throats and slamming doors all had to be added to the “non-react-to repertoire.” Barber says she and her computer have some “interesting” times when she has a cold, because it does not recognize her altered voice.
Why aren’t there more systems in use today? John O’Leary, voice-recognition manager for Wilmac Co., thinks it is because people either are not aware of the technology or have not looked into it in the past year.
“The initial system reviews were not good. It used to take 30 days to get up to speed,” O’Leary says.
Desktop systems developed 10 years ago cost between $80,000 and $90,000, and only produced 20 to 30 words per minute, a reputation that apparently deterred interest, O’Leary says. Even three years ago, voice-recognition productivity speeds maxed at 50 words per minute with a relatively low accuracy rate of approximately 90 percent.
Today’s systems, he says, output between 75 and 90 words per minute on high-end PCs with an accuracy rate of 98 percent to 99 percent. Costs range from approximately $695 (including software, microphone, headset and documentation) to $2,300 for users without an existing computer.
The talking computers on “Star Trek” may have sprung from some creative minds, but they really are not so far-fetched, says James Allen, a professor of computer science at the University of Rochester, where he heads the development of an experimental voice-recognition program.
“I’ve been involved in this for 20 years, but only recently has the field really moved forward,” Allen says.
Enter the conversational computer, aka Phenelope.
“You talk to it like a person, and it talks back,” says Allen, describing the continuous-speech technique used with the system. Systems like Phenelope do not require the use of discrete speech; instead, they enable conversational dialogue with “real-time” verbal responses. And if the system does not understand, it asks for clarification–the first evidence of a system that adapts to users instead of the other way around.
Currently, such systems can deal with limited subject matter. Train routing is Phenelope’s specialty, with travel-agency or telephone-shopping functions a possibility in the not-too-distant future.
The key to Phenelope’s success and the success of future continuous-speech applications is the computer’s programmed ability to decipher the speaker’s intention.
Allen gives this question as an example: Do you know what time it is?
“There could be different intentions behind this question,” he says. “The computer has to figure out what makes sense in context of the entire conversation.”
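Allen’s point can be made concrete with a toy sketch: the same words carry different intentions, and only conversational context picks between them. The context labels and rules below are invented purely for illustration; real systems infer intention from the dialogue itself rather than from a supplied label.

```python
# Hypothetical intent resolver: one question, two possible
# intentions, disambiguated by a one-word description of context.

def interpret(question: str, context: str) -> str:
    """Guess the intention behind an utterance given the situation."""
    if question.lower().startswith("do you know what time it is"):
        if context == "stranger_on_street":
            return "request: tell me the time"
        if context == "employee_arriving_late":
            return "complaint: you are late"
    return "unknown intention"
```

The hard part, as Allen notes, is that a real system must derive that context from the whole conversation rather than be handed it.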
That said, the “Star Trek” vision is not quite around the corner just yet; Allen estimates it would take 30 to 40 years to encode much of human knowledge.
But in the meantime, HAL of “2001” has some pals out there.
“I talk to my computer like they do on ‘Star Trek,’” says Martin. “I call it HAL. Because I’m disabled, the computer is a vital part of my existence. Disabled people can’t just go from computer to computer.
“I say to it, ‘HAL, how are you feeling?’ It responds, ‘I’m fine.’”
(Mary Anne Donovan is a Rochester-area free-lance writer.)

