Text Analytics and The Human Touch


“Taking a human out of the driver’s seat requires putting a lot of humans to work behind the scenes.”

I read that phrase last week, and it got me thinking that self-driving cars and text analytics have something fundamental in common.

The comment came from a fascinating article in Automotive News on self-driving vehicles, which examined the need for human beings to train the machines. Just as self-driving software needs humans to teach it, at the outset, what it should recognise as a person, a signpost, roadworks and so on, text analytics needs humans to train a lexicon so that it can interpret the context of verbatim comments accurately.

The article explained: “The artificial intelligence in computers that operate self-driving vehicles is developed using vast amounts of data collected from public road tests. But to be useful, the data must be extensively labeled — a process known as annotation that can require hundreds of man-hours for a single hour of data collected.”

The same is true of text analytics. A machine needs to be taught what phrases mean, and only a human can do this. For the data to be useful, phrases must be labelled (or ‘coded’, as we call it). Once a phrase has been coded, the machine can code it correctly every time it appears in future. This is labour-intensive but necessary to produce accurate results.
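To make the idea of “code once, reuse forever” concrete, here is a deliberately simple sketch in Python. It is a hypothetical illustration, not Feedback Ferret’s engine: the `LEXICON` dictionary, its phrases and topic names, and the `code_comment` function are all invented for this example. Each phrase a human has coded is stored with its topic, and any later comment containing that phrase is coded automatically.

```python
import re

# Hypothetical lexicon: each phrase a human coder has labelled, mapped to a topic code.
LEXICON = {
    "long wait": "Waiting Time",
    "kept waiting": "Waiting Time",
    "friendly staff": "Staff Attitude",
    "rude": "Staff Attitude",
}


def code_comment(comment: str, lexicon: dict[str, str]) -> set[str]:
    """Return the set of topic codes whose coded phrases appear in the comment."""
    text = comment.lower()
    codes = set()
    for phrase, topic in lexicon.items():
        # Whole-word match, so "rude" does not fire inside "prudent".
        if re.search(r"\b" + re.escape(phrase) + r"\b", text):
            codes.add(topic)
    return codes


print(sorted(code_comment("Friendly staff but a long wait at the desk.", LEXICON)))
```

Plain phrase matching like this ignores context (“not friendly staff” would still be coded as Staff Attitude), which is exactly why the article argues that human-trained lexicons need to capture context rather than keywords alone.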

Many consider text analytics to be a wholly automated process, and for some providers it is. However, Feedback Ferret’s text analysis engine is trained and updated daily by humans, which is what makes its level of accuracy world-leading.

The article in Automotive News quotes Sameep Tandon, CEO of the Californian self-driving start-up Drive.ai, who says: “We’re very likely going to need some form of data annotation in the long term. New situations will come up in the future that cars today would not regularly see.”

New situations will arise for self-driving cars, just as new language will arise for text analytics. Our Lexicon Team monitors the quality and accuracy of all automated text analysis results and continuously updates the text coding. After each update, all historical data is reprocessed, so every customer comment in the database is coded against the latest version of the lexicon.
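The reprocessing step can be pictured with the hypothetical sketch below. The `Lexicon` and `CodedComment` classes, the version numbers and the simple substring matching are assumptions made for illustration only. When the lexicon team publishes a new version, every stored comment is re-coded and stamped with that version, so older comments pick up phrases that were added after they arrived.

```python
from dataclasses import dataclass, field


@dataclass
class Lexicon:
    version: int
    phrases: dict[str, str]  # hypothetical structure: phrase -> topic code


@dataclass
class CodedComment:
    text: str
    codes: set[str] = field(default_factory=set)
    lexicon_version: int = 0  # which lexicon version last coded this comment


def code(text: str, lexicon: Lexicon) -> set[str]:
    """Naive substring coding, kept simple for illustration."""
    lowered = text.lower()
    return {topic for phrase, topic in lexicon.phrases.items() if phrase in lowered}


def reprocess_all(comments: list[CodedComment], lexicon: Lexicon) -> None:
    """Re-code every stored comment so it reflects the given lexicon version."""
    for comment in comments:
        comment.codes = code(comment.text, lexicon)
        comment.lexicon_version = lexicon.version


history = [CodedComment("The app keeps crashing"), CodedComment("Great delivery speed")]

v1 = Lexicon(1, {"delivery": "Delivery"})
reprocess_all(history, v1)  # "crashing" is not yet in the lexicon, so the first comment is uncoded

# A (hypothetical) lexicon update adds a new phrase; all history is re-coded.
v2 = Lexicon(2, {**v1.phrases, "crashing": "App Reliability"})
reprocess_all(history, v2)

for c in history:
    print(c.text, sorted(c.codes), c.lexicon_version)
```

Re-coding everything, rather than only new comments, keeps trend reports consistent: a topic that appears in this month’s figures is measured the same way in last year’s.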

To ensure verbatim comments are correctly coded and categorised for text analytics, some companies do it the old-fashioned way: by hand. This method is very labour-intensive, costly and time-consuming. Other companies use automated Natural Language Processing (NLP) methods. Whilst these are fast and cost-effective, side-by-side comparisons show that NLP methods do not always produce consistently accurate results.

We prefer a hybrid method: the data is processed automatically, but humans stay in the loop to train and update our Lexicon, so that it learns and improves all the time. As a result, Feedback Ferret can achieve high accuracy across the full set of Reporting Topics.
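One way to picture a human-in-the-loop workflow is the hypothetical sketch below. The `lexicon`, `review_queue`, `auto_code` and `human_teaches` names are invented for illustration and do not describe Feedback Ferret’s actual system. Comments are coded automatically where the lexicon already knows a phrase; anything it cannot code is queued for a human, whose decision is added to the lexicon so the same language is handled automatically next time.

```python
# Hypothetical lexicon: phrase -> topic code.
lexicon: dict[str, str] = {"refund": "Refunds"}

# Comments the machine could not code, waiting for a human coder.
review_queue: list[str] = []


def auto_code(comment: str) -> set[str]:
    """Code a comment from the lexicon; route it to a human if nothing matches."""
    text = comment.lower()
    codes = {topic for phrase, topic in lexicon.items() if phrase in text}
    if not codes:
        review_queue.append(comment)  # unknown language: a human needs to look at it
    return codes


def human_teaches(phrase: str, topic: str) -> None:
    """A human coder adds a new phrase, so the machine can code it from now on."""
    lexicon[phrase.lower()] = topic


print(auto_code("Still waiting for my refund"))  # known phrase, coded automatically
print(auto_code("Parking was impossible"))       # unknown, so it goes to the review queue

human_teaches("parking", "Car Parking")
print(auto_code("No parking spaces left"))        # now coded without human help
```

In practice a review queue might also take comments where the automatic coding is uncertain, not only those with no match at all; that refinement is left out here to keep the loop easy to follow.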

The holy grail, of course, is machines that can learn and adapt to new language and new situations on their own, allowing humans to be taken out of the loop. But when that will happen is anyone’s guess.