
Winning enterprises over with speech to text

MyCaption combines voice-recognition technology with human ears and deep integration into the handset to create an enterprise-class mobile speech-to-text platform


MyCaption Chief Executive Officer Vipul Bhatt thinks it’s time to get serious about speech to text. The technology has been around for years powering auto-dictation and voice-search applications like Vlingo’s and voicemail-to-text services such as SpinVox’s, but most of those services have been targeted at consumers and prosumers. The most obvious target for speech-to-text technology, the enterprise, has been reluctant to embrace it for the simple reason that there is no enterprise-caliber platform to embrace, Bhatt said.

MyCaption is hoping to change that. The Bay Area startup has launched what it claims is the first speech-to-text service for the serious business user. It combines the network-based voice-recognition technology common to all speech-to-text engines with human editors who refine the translation. But Bhatt said its platform is more than just an auto-dictation service, describing the technology as voice-to-data rather than speech-to-text.

MyCaption relies on a robust middleware client that integrates closely with the productivity applications of a BlackBerry smartphone as well as the Outlook Exchange server back in the office. A MyCaption user can set calendar entries and initiate new tasks as well as dictate email and personal memos up to 3 minutes in length. Once a voice message is converted, it is sent back to the phone for review, where the user can make any edits entirely by voice. And once approved, the message or calendar entry isn’t sent through a proxy server or saved in a separate client but slotted right back into the BlackBerry productivity app and synced back to the enterprise server.

The BlackBerry client records speech and instructions and sends them as a file over the packet data network. The file is run through a voice-recognition engine and then passed on to a human editor. Due to the complexity of the messages, the vast majority of them have to be heard by human ears, Bhatt said.

“Honestly, we are considering doing away with the voice recognition,” Bhatt said. “Business users’ demands for accuracy are so high, we need the precision of a human editor to meet those standards.”

All speech-to-text applications rely on human translators to varying degrees to handle garbled phrases and nuances of speech that a speech engine simply can’t handle, but Bhatt said MyCaption has de-emphasized its speech-recognition technology for all but the most basic of messages. Whenever the speech engine encounters a sound or word order it does not recognize or even a homonym it cannot contextualize, it immediately shoots the message up to a translator. For an enterprise user sending an email to an important client or booking a crucial meeting, that level of redundancy is key, Bhatt said.
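
The escalation rule Bhatt describes can be pictured with a short sketch. This is an illustration only, not MyCaption's code: the class names, the per-word confidence score and the 0.90 threshold are all assumptions, since the company disclosed no implementation details.

```python
from dataclasses import dataclass, field

# Hypothetical threshold: the article gives no figures.
MIN_WORD_CONFIDENCE = 0.90


@dataclass
class RecognizedWord:
    text: str
    confidence: float                 # assumed engine score, 0.0 to 1.0
    ambiguous_homonym: bool = False   # engine could not pick a meaning from context


@dataclass
class Transcript:
    words: list[RecognizedWord] = field(default_factory=list)


def needs_human_editor(transcript: Transcript) -> bool:
    """Escalate when nothing was recognized, or any single word is doubtful."""
    if not transcript.words:
        return True
    return any(
        w.confidence < MIN_WORD_CONFIDENCE or w.ambiguous_homonym
        for w in transcript.words
    )


def route(transcript: Transcript) -> str:
    """Return the name of the (hypothetical) queue the message goes to."""
    return "human_editor" if needs_human_editor(transcript) else "auto_deliver"
```

Under these assumptions a single doubtful word is enough to send the whole message to a person, which matches Bhatt's point that only the most basic messages stay fully automated.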

But while the human element meets the accuracy requirements of an enterprise, it can also compromise its security requirements. A completely automated voice-to-text system is as secure as its transmission channels and server protection. With human translators, a live person is listening to what could be confidential communications. MyCaption has taken precautions to limit that exposure: Its human translators only receive the raw voice files and aren’t given any clue as to the individual identity, company or phone number of the person speaking. No messages are stored either in voice or text form—all are erased as soon as a transaction is finalized. But Bhatt acknowledged that companies in highly sensitive industries will continue to have security concerns. In the future, he said, MyCaption hopes to implement even further technological and methodological safeguards that will mask the context of the message without masking the content itself.

© 2009 Penton Media Inc.
