
Captain Kirk couldn't type. He didn't need to. In his fictional world, the computer was smart enough to do that laborious task for him. He just told it what he needed, and in a flash, the computer spoke the answer. By comparison, the way we interface with today's computer-riddled world is primitive: keyboards are still how we convert our thoughts into a form the computer can understand.

Think about the computer sitting on your desk right now. How much of it is really computer and how much of it is input or output devices? The keyboard, monitor, printer, fax machine and speakers are all things you need to communicate with a computer. Even the laptop is mostly screen and keyboard. Going mobile with data is not going to be easy as long as you have to carry along all of the baggage. Some have said that the perfect wireless device will have a large screen, a full keyboard and will fit nicely in a small pocket. That just isn't going to happen. People don't need a better computer; they need a better way to interface with it.

COMPUTER INTERACTION You can talk to your computer today. Software programs such as those from IBM and Dragon allow you to dictate to your computer, and the words you speak magically appear on the screen. The technology is not perfect. Some words are transcribed incorrectly, the formatting can come out wrong, and the programs take a huge processing byte out of your computer. But it can be done.

Having your computer talk to you is more complex, and it is an important part of effortless mobile computing. E-mail, for example, is mostly text, so it can be converted to speech. Using sophisticated services such as Portico, Webley or Wildfire, you can have your e-mail read to you and then respond.

However, even with these services, there are still hurdles to overcome. Portico "listens" to every sound you make. If you say "um," it thinks you mean it. It also has a bad habit of responding to background noise. Using Portico in a noisy environment is nearly impossible, and finding a quiet environment is not always an option. Although the voice is robotic and unnatural, it is understandable, and applications for this technology are huge.
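Portico's trouble with background noise comes down to deciding which sounds are speech. Below is a minimal sketch of one common first step, an energy-based voice activity detector. It assumes raw audio as a list of float samples, a hypothetical 160-sample frame (20 ms at 8 kHz), and that the first few frames contain only background noise; the function names are illustrative, not any vendor's API. An energy gate alone would still pass an "um," which is loud enough to count as speech, so filtering fillers needs help from the recognizer.

```python
import math
from typing import List, Sequence


def frame_rms(samples: Sequence[float], frame_len: int) -> List[float]:
    """Split samples into non-overlapping frames and return each frame's RMS energy."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return energies


def speech_frames(samples: Sequence[float], frame_len: int = 160,
                  noise_frames: int = 5, margin: float = 3.0) -> List[bool]:
    """Flag frames whose energy exceeds `margin` times the estimated noise floor.

    Assumption: the first `noise_frames` frames are background noise only,
    so their mean RMS is used as the noise floor.
    """
    energies = frame_rms(samples, frame_len)
    if len(energies) < noise_frames:
        raise ValueError("need at least noise_frames frames to estimate the noise floor")
    floor = sum(energies[:noise_frames]) / noise_frames
    return [e > margin * floor for e in energies]
```

Estimating the floor from the opening frames lets the gate adapt to a noisy room, but it fails if the caller starts talking immediately; a real system would keep updating the estimate.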

While traveling, users can have e-mail read to them via any phone. They can reply by voice, with the response sent as an attached media file. They also can filter their e-mail so that only the most important messages reach them wirelessly. Although these services offer more features, such as a news-clipping service and an appointment calendar, right now access to e-mail is the No. 1 application for them.
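Filtering so that only important messages reach the handset can be as simple as a rule check on each message. The sketch below assumes two hypothetical user-configured rules, a VIP sender list and a set of subject keywords; the services named above may use entirely different criteria.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass
class Message:
    sender: str
    subject: str


def important_messages(inbox: Iterable[Message], vip_senders: Set[str],
                       keywords: Set[str]) -> List[Message]:
    """Keep messages from a VIP sender or whose subject contains a keyword.

    Both comparisons are case-insensitive; the rules themselves are hypothetical.
    """
    vips = {s.lower() for s in vip_senders}
    words = {k.lower() for k in keywords}
    kept = []
    for msg in inbox:
        subject = msg.subject.lower()
        if msg.sender.lower() in vips or any(k in subject for k in words):
            kept.append(msg)
    return kept
```

Everything the rules reject stays in the mailbox for later, so a conservative rule set costs the user nothing but delay.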

General Magic, which makes Portico, offers a free version of it: an e-mail service called MyTalk. Although MyTalk doesn't offer all of the bells and whistles of the pay service, it does let users check e-mail from any phone and respond by voice. Users can even make free long-distance calls via the service. The catch? The calls can last only two minutes, and users have to listen to an advertisement first. The MyTalk Web site also is advertiser-supported. Perhaps, like many Web-based services, the future virtual assistant will be a free service supported by advertising.

GOING VERTICAL E-mail access, however, is only the tip of the iceberg. There are specialized vertical applications for this technology as well.

Conita plans to apply text-to-speech technology to several vertical markets. Its products include V-Enterprise, V-Medical, V-Financial, V-Legal and V-Insurance. Each offers the standard virtual-assistant capabilities that competitors have traditionally provided, but also gives users voice access to specialized databases and internal information.

Consider the effect this technology could have on these vertical markets. Doctors could use Conita's service to query medical databases, check patient status, schedule procedures and keep up with all incoming voice and e-mail messages. Attorneys could use it to research case law, search the vast Lexis legal library and schedule court dates. Insurance adjusters could handle the demands of claims adjustment while working remotely.

The technology also could be powerful for corporate users. V-Enterprise integrates with existing systems, so users could check a shared calendar remotely, query company databases for order status or place an order over the phone. They also could reach sales applications that provide customer status, preferences, order history, contact information and reminders of important events.

Services such as those offered by Portico and Conita are part of the larger trend toward unified messaging, which has huge potential. Ovum recently estimated the fixed/mobile unified messaging market at $2 billion and predicted that it will soar to $35 billion by 2005. A big part of this trend is the migration of more traffic to mobile networks and customers' demand for a single place to check all message types. Text-to-speech and voice-command technology will be an important piece of the puzzle in delivering that functionality to end-users.

To the wireless industry, this trend means more airtime usage. As e-mail and Internet access become more embedded in the mainstream of business and personal use, voice-based access to those services will open the door to millions of users who otherwise would not attempt to tackle the learning curve. Not everyone is comfortable with a smart phone or wireless PalmPilot. Like the good captain, they just want to be able to tell the computer what they want and let the machine do the hard part. Given the choice, who wouldn't?

Better databases, less-expensive computer memory and more processing power have enabled linguists and phoneticists to implement more advanced solutions than were possible with traditional text-to-speech (TTS) technology. Developers now use next-generation speech engines to create voice interfaces that lay the foundation for new applications such as e-mail and Web readers. Because these speech engines generate words from phonetic rules, their vocabularies are unlimited. A truly natural-sounding human voice already is making current TTS applications much more compelling, but the future of the voice interface hinges on the computer's ability to interact with the user as a human would.
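Rule-based letter-to-sound conversion is what lets such an engine speak words it has never seen. The toy sketch below applies a greedy longest-match over a small, hypothetical rule table with ARPAbet-style phoneme labels. Real English rule sets are far larger, context-sensitive and backed by an exception dictionary, since spellings such as "though" and "tough" defeat simple rules.

```python
from typing import Dict, List

# Toy, hypothetical rule table: grapheme -> phoneme label (ARPAbet-style).
# Letters without a rule (e.g. "j", "q", "x", "y") are simply skipped.
RULES: Dict[str, str] = {
    "sh": "SH", "ch": "CH", "th": "TH", "ee": "IY", "oo": "UW",
    "a": "AE", "b": "B", "c": "K", "d": "D", "e": "EH", "f": "F", "g": "G",
    "h": "HH", "i": "IH", "k": "K", "l": "L", "m": "M", "n": "N", "o": "AA",
    "p": "P", "r": "R", "s": "S", "t": "T", "u": "AH", "v": "V", "w": "W",
    "z": "Z",
}


def letters_to_phonemes(word: str) -> List[str]:
    """Convert a word to phonemes by greedy longest-match over RULES."""
    word = word.lower()
    longest = max(len(g) for g in RULES)
    phonemes: List[str] = []
    i = 0
    while i < len(word):
        # Try the longest grapheme first so "sh" wins over "s" + "h".
        for size in range(min(longest, len(word) - i), 0, -1):
            chunk = word[i:i + size]
            if chunk in RULES:
                phonemes.append(RULES[chunk])
                i += size
                break
        else:
            i += 1  # no rule matches this letter: skip it
    return phonemes
```

Because the rules work on spelling rather than a word list, any new word gets a pronunciation, which is the "unlimited vocabulary" property; whether that pronunciation is right depends entirely on the quality of the rules.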

Computers must be able to ask questions to clarify what they've heard, just as humans do. Until recently, computer responses were pre-recorded, which gave the interface a realistic voice but restricted the computer to answering only the questions the developer anticipated. The newest synthesizers enable the computer to generate any follow-up question, and automatic speech recognition (ASR) is evolving alongside them. Next-generation ASR uses natural-language understanding, an artificial-intelligence-based technology, both to recognize words and to understand their context.
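The simplest version of a clarifying question is a confidence check on the recognizer's output. The sketch assumes a hypothetical n-best list of (text, confidence) pairs and an arbitrary 0.6 threshold; it only illustrates the dialogue decision and does none of the natural-language understanding described above.

```python
from typing import List, Optional, Tuple


def clarifying_question(hypotheses: List[Tuple[str, float]],
                        threshold: float = 0.6) -> Optional[str]:
    """Return None if the top hypothesis is confident enough, else a follow-up question.

    `hypotheses` is a hypothetical recognizer n-best list of (text, confidence).
    """
    if not hypotheses:
        return "Sorry, I didn't catch that. Could you repeat it?"
    ranked = sorted(hypotheses, key=lambda h: h[1], reverse=True)
    best_text, best_conf = ranked[0]
    if best_conf >= threshold:
        return None  # confident: act on best_text without asking
    if len(ranked) > 1:
        # Offer the two most likely readings so the user can pick one.
        return f"Did you say '{best_text}' or '{ranked[1][0]}'?"
    return f"Did you say '{best_text}'?"
```

Because the question is built from the recognizer's own hypotheses, it could only be spoken by a synthesizer that can voice arbitrary text, not by a set of pre-recorded prompts.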

There are two main TTS technologies: formant and concatenative synthesis. Formant synthesis models speech on the way humans produce sound using their lungs, vocal cords and vocal tract. Concatenative systems use chips to store segments of recorded human speech in the form of phonemes, diphones and triphones. Phonemes are the smallest units of speech that distinguish one utterance from another; diphones and triphones are fragments and combinations of adjacent phonemes.
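To make the unit types concrete, the sketch below labels the diphones and triphones needed to cover a phoneme sequence, padding both ends with silence. The label formats ("a-b" and "left-center+right") and the silence symbol "sil" are illustrative choices, not any product's inventory.

```python
from typing import List


def to_diphones(phonemes: List[str]) -> List[str]:
    """Label each adjacent phoneme pair (one diphone per transition), silence-padded."""
    padded = ["sil"] + phonemes + ["sil"]
    return [f"{a}-{b}" for a, b in zip(padded, padded[1:])]


def to_triphones(phonemes: List[str]) -> List[str]:
    """Label each phoneme together with its left and right neighbors."""
    padded = ["sil"] + phonemes + ["sil"]
    return [f"{padded[i - 1]}-{padded[i]}+{padded[i + 1]}"
            for i in range(1, len(padded) - 1)]
```

A four-phoneme word needs five diphones but four triphones; each triphone label is more specific, which is why a triphone inventory is much larger.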

Developers have realized that the larger the speech segments they use, the more natural the voice sounds. However, more memory is needed to store and access these segments.
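A rough calculation shows why larger segments cost memory. The sketch assumes about 40 phonemes (a commonly quoted figure for English), an average unit length of 100 ms, and 16-bit samples at 8 kHz, the same for every unit type. All of these are assumptions, and the counts are upper bounds, since many phoneme combinations never occur in the language.

```python
def inventory_upper_bound(num_phonemes: int, context: int) -> int:
    """Upper bound on distinct units spanning `context` phonemes (1, 2 or 3)."""
    return num_phonemes ** context


def storage_bytes(units: int, avg_unit_ms: float, sample_rate_hz: int,
                  bytes_per_sample: int) -> int:
    """Raw (uncompressed) PCM storage if one recording of every unit is kept."""
    samples_per_unit = round(sample_rate_hz * avg_unit_ms / 1000)
    return units * samples_per_unit * bytes_per_sample


N = 40  # assumption: typical phoneme count quoted for English
for name, k in [("phonemes", 1), ("diphones", 2), ("triphones", 3)]:
    units = inventory_upper_bound(N, k)
    mb = storage_bytes(units, avg_unit_ms=100, sample_rate_hz=8000,
                       bytes_per_sample=2) / 1e6
    print(f"{name}: up to {units} units, about {mb:.1f} MB")
```

Under these assumptions the diphone set comes to about 2.6 MB of raw audio and the full triphone set to about 102 MB, before compression, and storing several recordings of each unit multiplies those figures.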

The new concatenative speech synthesizers join larger segments, such as syllables, words and phrases, and each unit can have several hundred thousand possible segment combinations. The challenge is to achieve the highest-quality speech with the smallest database and the least amount of processing. The computer must quickly find the best segments to use and then glue them together so that end-users don't hear the concatenation points.
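Finding the best segments quickly is usually framed as a search. Each candidate unit has a target cost (how well it matches the desired sound) and each pair of neighbors has a join cost (how audible the seam would be). The sketch below uses Viterbi-style dynamic programming to choose one candidate per position with the lowest total cost. This is a widely used unit-selection formulation, not necessarily the method of any product mentioned here, and the cost functions are supplied by the caller.

```python
from typing import Callable, List, Sequence, Tuple


def select_units(
    candidates: Sequence[Sequence[str]],
    target_cost: Callable[[int, str], float],
    join_cost: Callable[[str, str], float],
) -> Tuple[List[str], float]:
    """Pick one candidate per position, minimizing total target + join cost."""
    if not candidates or any(len(c) == 0 for c in candidates):
        raise ValueError("every position needs at least one candidate")
    # best[i][j]: cheapest cost of any path ending at candidates[i][j].
    # back[i][j]: index of the predecessor on that cheapest path.
    best = [[target_cost(0, u) for u in candidates[0]]]
    back: List[List[int]] = [[-1] * len(candidates[0])]
    for i in range(1, len(candidates)):
        row, ptr = [], []
        for u in candidates[i]:
            costs = [best[i - 1][k] + join_cost(prev, u)
                     for k, prev in enumerate(candidates[i - 1])]
            k_best = min(range(len(costs)), key=costs.__getitem__)
            row.append(costs[k_best] + target_cost(i, u))
            ptr.append(k_best)
        best.append(row)
        back.append(ptr)
    # Trace back from the cheapest unit in the final position.
    j = min(range(len(best[-1])), key=best[-1].__getitem__)
    total = best[-1][j]
    path = []
    for i in range(len(candidates) - 1, -1, -1):
        path.append(candidates[i][j])
        j = back[i][j]
    return path[::-1], total
```

For example, with candidates [["a1", "a2"], ["b1", "b2"]], target costs a1=0, a2=1, b1=3, b2=0, and a join cost of 0 between units sharing the same trailing digit (2 otherwise), the search picks a2 then b2 at a total cost of 1, even though a1 has the lower target cost, because the cheaper join outweighs it. The work grows with the number of positions times the square of the candidates per position, which is why the database size matters so much.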

© 2008 Penton Media Inc.
