Natural Language Processing (English)

Below is a short explanation of what the terms NLP, NLU and NLG mean. All of these terms matter for a chatbot and are used throughout. I am posting this explanation so that the terms are easier to understand. The text comes from the Aspect FAQ.

Natural Language Processing (NLP) is a discipline in the field of Artificial Intelligence. It covers all aspects of handling “natural languages”, such as English or Mandarin, with a computer, as opposed to artificial languages such as programming languages (Java, C, …).

Natural Language Understanding (NLU) is a subdiscipline of NLP that focuses on programming a computer to interpret natural language so that it can act upon it (e.g. translate it into another language, execute commands, or simply converse with the user). Its counterpart is Natural Language Generation (NLG), which deals with programming a computer to generate language, e.g. when giving answers to a user or creating texts automatically from structured information (think of software that writes news articles based on a database of information).

NLU and NLG (or some form of response generation) are needed to build fully conversational chatbot applications, because users sometimes expect to converse with the computer the same way they would with another human being. Some therefore argue that “chatbot” is a misleading term for many of today’s implementations that rely on buttons to advance the dialog.

To properly answer or comment on a statement or question from the user, the intent of the message, all entities mentioned (such as things, people, situations), and optionally the sentiment of the message need to be extracted and analyzed. Various techniques can be applied to achieve NLU, such as:

a neural net trained on hundreds of thousands of similar messages that have been manually tagged to teach the computer their meaning
a strictly rules-based approach that uses linguistic knowledge, such as part-of-speech and other lexical and syntactic information about the words and phrases of a language
a hybrid solution
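
Whichever technique is used, the outcome is the same kind of information: an intent, the entities mentioned, and optionally a sentiment. A minimal sketch of how such a result might be represented is shown here. The class names, field names, and labels are all hypothetical, and the example values are filled in by hand rather than produced by a real NLU component.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Entity:
    kind: str  # hypothetical label such as "thing", "person" or "situation"
    text: str  # the words in the message that mention the entity


@dataclass
class NLUResult:
    intent: str                                    # what the user wants
    entities: List[Entity] = field(default_factory=list)
    sentiment: Optional[str] = None                # optional, so it may stay empty


# Hand-written result for "Can I read an electronic book on my flight?".
# The intent name is an illustrative assumption, not taken from any product.
result = NLUResult(
    intent="ask_ebook_policy",
    entities=[
        Entity(kind="thing", text="electronic book"),
        Entity(kind="situation", text="flight"),
    ],
)

print(result.intent)
print([entity.text for entity in result.entities])
print(result.sentiment)
```

Keeping sentiment optional mirrors the text: a chatbot can often respond correctly from the intent and the entities alone.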

Keyword spotting (or simply word spotting) is a simplified version of the rules-based approach in which the computer only checks whether keywords that carry meaning occur in a message and, if so, classifies the entire message accordingly. While this approach can produce first results very quickly, it can just as easily produce misclassifications. As an example, consider the linguistic nuance that separates “Can I read an electronic book on my flight?” from “Can I book a flight?”. Both contain the same keywords but require completely different answers. Only by knowing that “flight” is the direct object of the verb “book” can you tell the second message apart from the first, where “book” is used as a noun and thus stands in a different relationship to “flight.”
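
The difference between the two approaches can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the keyword table, the intent names, and the `VERB_CUES` word list are hypothetical, and the "rule" is a crude stand-in for real part-of-speech or dependency analysis, which a production system would get from a proper linguistic parser.

```python
import re

# Hypothetical keyword table: both words point at the same illustrative intent.
KEYWORD_INTENTS = {"book": "book_flight", "flight": "book_flight"}

# Crude substitute for part-of-speech tagging (an assumption of this sketch):
# "book" is treated as a verb only when one of these words directly precedes it.
VERB_CUES = {"i", "we", "you", "to"}


def tokenize(message):
    """Lowercase the message and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", message.lower())


def spot_intent(message):
    """Keyword spotting: return the intent of the first keyword found."""
    for token in tokenize(message):
        if token in KEYWORD_INTENTS:
            return KEYWORD_INTENTS[token]
    return None


def rule_based_intent(message):
    """Return "book_flight" only if "book" looks like a verb followed by "flight"."""
    tokens = tokenize(message)
    for i, token in enumerate(tokens):
        if token == "book" and i > 0 and tokens[i - 1] in VERB_CUES:
            # "flight" has to come after the verb to act as its object.
            if "flight" in tokens[i + 1:]:
                return "book_flight"
    return "other"


ebook_question = "Can I read an electronic book on my flight?"
booking_request = "Can I book a flight?"

for message in (ebook_question, booking_request):
    print(message, "|", spot_intent(message), "|", rule_based_intent(message))
```

The keyword spotter assigns the same intent to both messages, while the rule separates them because "book" follows "electronic" in the first message and "I" in the second. A word list like this breaks quickly on other phrasings, which is why the rules-based approach described above relies on fuller linguistic knowledge.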