Janne In Osaka: Google Voice

Friday, August 10, 2012

Google Voice

Google voice is interesting, I think, and surprisingly you soon. It lets you use your voice no just the web search is not for any text input. And with jelly bean, the newest version of android, the voice recognition can happen with your phone, with no network connection needed. Of course, as these paragraphs shows, recognition is not quite there yet at least not for a second language speak in such as myself.

Google Voice is interesting, I think, and surprisingly useful. It lets you use your voice not just for web searches but for any text input. And with Jellybean, the newest version of Android, the voice recognition can happen all on your phone, with no network connection needed. Of course, as this paragraph shows, the recognition is not quite there yet — at least not for second-language speakers such as myself.

The results above are pretty good, but I read a pre-written text and spoke slowly and clearly while in a quiet room, and I used the correction feature to select the right word when I could. With my usual fast, sloppy speech or in a noisy environment it works much worse. The biggest hurdle is ad-libbing your text. A large part of recognition is to match your speech to lots of existing texts and figure out what words you most likely meant to say. When you "umm" and "aaa" and "no, wait", break off mid-sentence, repeat yourself and mix words together then you completely ruin that. The result is probably even worse than if it didn't try to match to other texts at all.
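
To make the matching idea concrete, here is a minimal sketch in Python. It is a toy bigram model and not how Google's recognizer actually works: the three-sentence corpus, the function names score and pick, and the add-one smoothing are all assumptions made for illustration. It rates each candidate transcription by how well its word pairs match the existing text, then keeps the best one.

```python
import math
from collections import Counter

# Tiny stand-in for the "lots of existing texts" (a made-up corpus).
CORPUS = (
    "it lets you use your voice for web searches . "
    "you can use your voice for any text input . "
    "the voice recognition can happen on your phone ."
).split()

UNIGRAMS = Counter(CORPUS)
BIGRAMS = Counter(zip(CORPUS, CORPUS[1:]))
VOCAB = len(UNIGRAMS)


def score(words):
    """Mean log-probability of the word pairs, with add-one smoothing."""
    pairs = list(zip(words, words[1:]))
    if not pairs:
        # A single word has no pairs to match against the corpus.
        return float("-inf")
    total = 0.0
    for prev, cur in pairs:
        # Unseen pairs still get a small, non-zero probability.
        prob = (BIGRAMS[(prev, cur)] + 1) / (UNIGRAMS[prev] + VOCAB)
        total += math.log(prob)
    return total / len(pairs)


def pick(candidates):
    """Return the candidate transcription the toy model finds most likely."""
    return max(candidates, key=lambda text: score(text.split()))


if __name__ == "__main__":
    print(pick(["use your voice four web searches",
                "use your voice for web searches"]))
    print(score("use your voice for web searches".split()))
    print(score("use your umm voice no wait for web searches".split()))
```

With this toy corpus, the fluent sentence scores higher than the same sentence broken up with "umm" and "no wait", because the filler words form pairs the model has never seen.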

With some way to train it, better recording and processing hardware and more familiarity on my part I'm sure it would get better. The ideal is to dictate an email in my usual hurried, broken English, while rushing through a crowded airport, with the phone still in my pocket. We're clearly still nowhere close.

Some people speculate that voice input will eventually completely replace the keyboard. That's not going to happen. You can't speak aloud for long stretches when at home, when on the train or in public without disturbing other people. Imagine the noise in a large office with dozens of people talking to their computers all day long. You couldn't work on anything the least bit confidential or sensitive without hiding in a separate room with a closed door.

A keyboard is faster than speaking; it is quiet and precise (try to dictate a research report and see the word salad that results from all the specialized vocabulary) and is essential for things such as programming or editing. The keyboard will remain the best choice for most kinds of writing. True, you can get RSI from using a keyboard all day long, but you risk laryngitis if you spend all day using your voice.

While it'll never be the main input method, it'll be a good alternative for short texts, especially when a keyboard is impractical. If you want to send a quick message about your schedule when moving through a terminal then it's faster to dictate than to stop and slowly peck it out on your phone. When you make a shopping list it is far easier to read the items out loud as you rummage through the fridge than to write them down. We'll never have only one way to input things. Voice will coexist with touch screens, with gesture interfaces, with physical keyboards and probably with other modes we've yet to even see.

3 comments:

kamo, 19 August, 2012 18:05
Heh. My first thought on reading this was that EFL teachers should be fine with this, used as we are to making each. Word. Stand. Alone. In. Its. Own. Sentence.

Which leads to the further thought that it might actually work better for 2nd language speakers, as they're generally far less likely to elide or use colloquialisms than native speakers, which are generally bigger barriers to comprehension.

Which ties in to a final thought concerning a book I've just read, which basically claims that English is on its way out as the world's lingua franca because machine translation will make it possible for everyone to converse in their native tongues and let computers pick up the slack. I'm not sure how it ties in, admittedly, but it seems relevant. I'm sceptical about that claim, but don't know whether this makes me more or less so...

Jan Moren, 19 August, 2012 19:26
Kamo, the voice input system (apparently "Google Voice" is the name of some separate phone app) has just been updated with Swedish among other new languages. And my own input is decidedly more accurate in Swedish than in English, despite English arguably being an easier target language.

Also, they have separate input systems for different English dialects (US, UK and so on), and have that for a reason. So my guess is that the accent and irregular pronunciation of us second and third-language speakers is at most only partially compensated for by our "textbook approach" when speaking.

I doubt English will cease to be the lingua franca - though I would guess that it will no longer be unopposed to the degree it's been for the past half-century. Machine translation is still translation; you lose a sense of immediacy that gives true speakers an edge. I have seen two other, related arguments that may be relevant:

- Lowest-common-denominator-English - something like Basic English, with domain-specific added vocabulary - will increasingly be the default, as more and more international conversations are between people none of whom are native speakers or proficient. That will paradoxically mean that being proficient or a native speaker may even become a slight drawback as they would tend to use vocabulary, slang and expressions not understood by their counterparts.

- As English as a second language spreads further, being at least bilingual will be expected in international business, education and other cross-border contexts. This will again tend to penalize native English speakers as they've had less of an incentive to learn a second language in the first place.

kamo, 26 August, 2012 10:58
Oh well, so much for that theory. Let's try another one instead ;)

Your last two points might actually be two flavours of the same thing. As you say, the vast majority of conversations in English now involve at least one non-native speaker (this one, for example, and in fact maybe 95% of all conversations I have in English right now).

Wikipedia has Simple English listed as a language option, and it'd be possible to argue that Simple English is at the very least a dialect by itself. I know that when family and friends come to visit quite often they simply don't realise how much they need to adapt their native English in order to be understood (not just slower, but simple grammar and vocabulary, and no idioms).

Which is by way of saying that - maybe - 'being bilingual' could encompass speaking both Native English and Simple English (or English for International Communication), and many native speakers fall down even on that level.
