I recently posted a blog discussing the capabilities of machine speech recognition and how those relate to the idea of auto-transcription of in-depth interviews (you can find that here). But speech recognition is one thing and actual understanding is another: what do the words mean, and what is the context of what is being said, or even implied?
Many of us are now used to our 'smart assistants' (Alexa, Siri, OK Google etc.) recognising the verbal input we provide and interpreting it correctly as instructions or questions, then carrying out the appropriate action or response. So as well as recognition there is 'understanding'. In this blog I want to consider a demonstration from Google at their recent keynote. Here Google are showcasing the next phase in the smart assistant sphere: an ability to conduct a communication on your behalf, to make contact, hold a conversation, and complete an action with another human contact.
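To make the distinction between recognition and understanding concrete, here is a minimal, purely hypothetical sketch (the intent labels and patterns are my own invention, and say nothing about how any real assistant works). Speech recognition hands back a transcript string; 'understanding' is the separate step of mapping that string to an intent the assistant can act on.

```python
import re

# Purely illustrative: recognition gives us a transcript (a string);
# "understanding" means mapping that string to an actionable intent.
# These hypothetical patterns are not from any real assistant.
INTENT_PATTERNS = {
    "book_appointment": re.compile(
        r"\b(book|make|schedule)\b.*\b(appointment|booking)\b", re.I
    ),
    "ask_opening_hours": re.compile(r"\b(open|opening|close)\b", re.I),
}

def understand(transcript: str) -> str:
    """Map a recognised transcript to an intent label, or 'unknown'."""
    for intent, pattern in INTENT_PATTERNS.items():
        if pattern.search(transcript):
            return intent
    return "unknown"

# A recogniser may transcribe both sentences perfectly, but only the
# intent-mapping step tells the assistant what to *do* with the words.
print(understand("Could you book me an appointment for Tuesday?"))  # book_appointment
print(understand("What time do you close on Sundays?"))             # ask_opening_hours
```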
Amazing huh? The first example demonstrated shows Google Assistant calling a hair salon to make an appointment for a haircut. The machine interacts with a human in a natural and fluid way. In the example there is no suggestion that the call recipient has either been informed, or is aware, that the caller they are dealing with is non-human. The interaction is natural (mostly; "a woman's hair cut" is a bit jarring). The caller handles the unpredictable flow of the conversation in a natural manner to reach the desired outcome: a confirmed appointment booking for the desired service. The second example is arguably even more impressive in that the conversation takes a rather different track. It appears the receiver's first language might not be English, and this is noticeable both in the speech, which would not necessarily have been clear to me without subtitles, and in how the machine handles the interaction. I can relate to many similar calls where the background noise of a busy restaurant, the call quality, the mutual understanding of both parties and so on haven't made the intent so clear, requiring a degree of interpretation and repetition to reach the desired outcome. Here the machine handles a transaction that did not follow the expected path.
The Turing Test is widely known. It was proposed by Alan Turing in 1950 as a test of a machine's ability to exhibit intelligent behaviour equivalent to, or indistinguishable from, that of a human. An evaluator holds a text conversation with a partner and judges whether that partner is human or machine. If the evaluator cannot reliably tell the machine from the human, the machine has passed the test. The test itself is highly influential, widely criticised, and an important concept in AI.
The Turing Test hasn't yet been passed (although there are some claims to the contrary). The definition of what the Turing Test is actually seeking (an intelligent response of a type equivalent to a human's) isn't even universally agreed. It is, of course, possible to game the test in its simplest form: a series of canned responses might lead one to believe there is a human on the other end when there is not. Chatbots, for example, are rife. But this is not passing the Turing Test in the true sense of the meaning; the machine is not providing an actually intelligent response. So has this Google demonstration passed the Turing Test? More than that, has it exceeded the Turing Test by not being limited to simple text but operating in spoken conversation? The answer, I believe, is no. But it doesn't matter.
Why doesn't it matter? Well firstly this isn't an example designed to demonstrate the passing of the Turing Test. It is a demo designed to show how advances in speech recognition and AI can deliver simple assistance: a real-world working example of a smart assistant performing a service beyond what your smart speaker/device can currently perform. If the human responder on the other end of the line had been tasked with trying to determine whether she was speaking to a human or a machine, then I daresay that a few directed questions could have very quickly established this. In fact I would suggest that if she had been made aware that the call might have come from either a human or a machine, she would very likely have surmised that the caller was not human. And that's OK. It doesn't matter. And it doesn't matter because the parameters of the conversation were reasonably defined and the outcome was the important thing, i.e. the confirmation of a haircut appointment.
So whilst this isn't evidence of machine intelligence of the type expected to pass the Turing Test, it is still impressive in its application. And it opens up some interesting questions. First for me is whether it is morally acceptable for a machine to pass itself off as human to an unsuspecting call receiver. Is that not a little exploitative? I might feel a slight degree of discomfort at the prospect of being 'duped' into believing I was interacting with a human rather than a machine. More so if I attempted to initiate a degree of small talk! Might it be incumbent upon the machine to announce itself as such at the initiation of the call? Ultimately it wouldn't matter much in the realm of the functions Google is demonstrating, i.e. the booking of a service; both parties would typically just be pleased to conduct the business in a smooth transaction. On this progression you can foresee a time when the communication would be between machines anyway, in which case a traditional telephone conversation would not be required.
But if we accept that machine-to-human conversation is acceptable (and it is already out there, and has been for some time, for example in the form of, to give this article the Market Research slant, automated IVR telephone surveys), then it brings me to the question this blog post set out to answer at the beginning. Is the death of the call centre on the horizon?
Well, sorry to disappoint, but obviously I don't know the answer to that. My best guess would be: not in the near term. What the Google demo does do is show a potential future application for the technology. In instances where the interaction is defined within certain expected parameters, it is wholly possible that machines could conduct CATI (Computer Assisted Telephone Interviewing) interviews in a natural interview environment. At that point the Computer Assisted part of that acronym really does take on a whole different level of meaning! The technology could be wholly suited to this type of interaction, and you could consider that much call centre usage is a similar case. It's unclear how ready the technology Google has demonstrated is at this point in time, and clearly they'll control what they want us to see: the successes and not the missteps. But it demonstrates a future that won't be too far off, and the applications of the technology will spread.
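As a hypothetical illustration of why 'defined parameters' make this tractable (this is my own sketch and says nothing about how Google's system, or any CATI platform, actually works), a constrained booking or survey conversation can be modelled as a small state machine: the machine only ever needs to understand replies relative to its current question, including the detour where a requested slot is unavailable.

```python
# Hypothetical sketch: a constrained booking dialogue as a state machine.
# The states and matching rules below are invented for illustration only.
from enum import Enum, auto

class State(Enum):
    ASK_SERVICE = auto()
    ASK_TIME = auto()
    CONFIRM = auto()
    DONE = auto()

def next_state(state: State, reply: str) -> State:
    """Advance the dialogue based on a (heavily simplified) human reply."""
    reply = reply.lower()
    if state is State.ASK_SERVICE:
        return State.ASK_TIME  # any named service moves the call on
    if state is State.ASK_TIME:
        # If the requested slot is unavailable, stay here and negotiate
        # an alternative: the "unexpected path" seen in the second demo.
        return State.ASK_TIME if "not available" in reply else State.CONFIRM
    if state is State.CONFIRM:
        return State.DONE if "yes" in reply else State.ASK_TIME
    return State.DONE

# Walk through a conversation that takes a detour before succeeding.
state = State.ASK_SERVICE
for reply in ["a haircut", "that slot is not available", "how about 11am?", "yes"]:
    state = next_state(state, reply)
print(state)  # State.DONE
```

The point of the sketch is that within such a frame the machine never has to exhibit open-ended intelligence, only to keep the exchange moving towards the one defined outcome.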
The other consideration is how this technology propagates. What other players are involved that can push a similar service? Amazon surely must be in the mix. Apple's Siri service has lagged behind Google and Amazon (in part because, being primarily hardware-focused, Apple does not seek as heavily to monetise its users' data input, which likely slows down its machine learning ability). Who else is in this position, or will be able to compete in this way in the future, and how will that capability make its way to business use cases of the type we're considering here?
That's the unknown element right now. All that we do know is that the technology is seemingly getting to a place where this is all possible in theory. I don't foresee call centre closures in the near term off the back of this one controlled demonstration, but it's another interesting future consideration for the market research industry.