The number of expressions that reflect the same user intent can be very large, sometimes exceeding 100 phrases. Chatbot developers cannot predict every sentence or expression a user might type, whether long or short. It is therefore essential to check, using automated tests, how the chatbot classifies language that was not part of its training data.
Suppose we have two categories covering different insurance questions: life insurance and vehicle insurance. The two are obviously related, as both are types of insurance, but the chatbot needs to be trained to understand that, however similar they are, the two concepts differ significantly. Let’s also assume that these categories contain the following training data:
Life insurance:
- Can I get information on life insurance?
- Could I get some information about life insurance?
- Are you able to provide information about life insurance?
- I am looking for information about life insurance, can you help?
- Tell me about life insurance!
- Life insurance info!
- Do you know anything about life insurance?
- Will you tell me something about life insurance?
Vehicle insurance:
- Can you pass on information on vehicle insurance?
- I am looking to get information about vehicle insurance.
- Vehicle insurance info please.
- Can I get information about vehicle insurance?
- Explain to me your vehicle insurance!
- Give me info re vehicle insurance!
- Gimme something about vehicle insurance please.
- Tell me something about vehicle insurance.
Each category above contains eight phrases. Tests that verify whether classification is correct on unseen (non-training) data may take the following form, with the expected category followed by the phrase:
- Test_1: [Life insurance: Can I get information on life insurance?],
- Test_2: [Life insurance: Could I get some information about life insurance?],
- Test_3: [Life insurance: Are you able to provide information about life insurance?],
- Test_4: [Life insurance: I am looking for information about life insurance, can you help?],
- Test_5: [Life insurance: Tell me about life insurance!],
- Test_6: [Vehicle insurance: Can you pass on information on vehicle insurance?],
- Test_7: [Vehicle insurance: I am looking to get information about vehicle insurance.],
- Test_8: [Vehicle insurance: Vehicle insurance info please.],
- Test_9: [Vehicle insurance: Can I get information about vehicle insurance?],
- Test_10: [Vehicle insurance: Explain to me your vehicle insurance!].
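Tests written in this form can be fed to a small automated harness. The following is a minimal sketch in Python, not a description of any particular chatbot platform: `parse_tests`, `run_tests` and especially `keyword_classifier` are hypothetical names, and the keyword classifier is only a stand-in for whatever call returns the chatbot's predicted category.

```python
import re

# A few test cases in the same "[category: phrase]" form as the list above.
RAW_TESTS = """
Test_1: [Life insurance: Can I get information on life insurance?],
Test_5: [Life insurance: Tell me about life insurance!],
Test_6: [Vehicle insurance: Can you pass on information on vehicle insurance?],
Test_10: [Vehicle insurance: Explain to me your vehicle insurance!],
"""

# name: [expected category: phrase], with an optional trailing comma or period.
TEST_LINE = re.compile(r"^(Test_\d+): \[([^:\]]+): (.+)\][,.]?$")


def parse_tests(raw):
    """Turn the text form into (name, expected_category, phrase) tuples."""
    cases = []
    for line in raw.strip().splitlines():
        match = TEST_LINE.match(line.strip())
        if match:
            cases.append(match.groups())
    return cases


def keyword_classifier(phrase):
    """Hypothetical stand-in for the real chatbot: one keyword per category."""
    text = phrase.lower()
    if "life" in text:
        return "Life insurance"
    if "vehicle" in text:
        return "Vehicle insurance"
    return None  # the phrase was not recognised


def run_tests(cases, classify):
    """Return (name, expected, actual) for every misclassified phrase."""
    failures = []
    for name, expected, phrase in cases:
        actual = classify(phrase)
        if actual != expected:
            failures.append((name, expected, actual))
    return failures


cases = parse_tests(RAW_TESTS)
failures = run_tests(cases, keyword_classifier)
print(f"{len(cases) - len(failures)} of {len(cases)} tests passed")
for name, expected, actual in failures:
    print(f"{name}: expected {expected!r}, got {actual!r}")
```

Because the classifier is passed in as a function, the same harness could be pointed at the real chatbot by swapping `keyword_classifier` for a function that queries it.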
Once testing has finished and the results indicate that some phrases have been correctly identified, those phrases should be added to the learning data and new test data should be written. This cycle should be repeated until the chatbot correctly identifies the test data. By continually adding new phrases to the language the chatbot has already learnt, one reinforces its grasp of synonyms and paraphrases, so that it becomes proficient at understanding a broad range of language.
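The cycle described here can be sketched as a loop. The example below is an illustration under stated assumptions: `classify` is a toy word-overlap model invented for this sketch (a real chatbot engine will behave differently), `run_round` is a hypothetical helper, and the phrases are taken from the vehicle insurance examples earlier in the article. Each round promotes the correctly identified phrases into the learning data and keeps the rest for the next round.

```python
def tokens(phrase):
    """Lower-case a phrase and strip basic punctuation from each word."""
    return {word.strip("?!.,").lower() for word in phrase.split()} - {""}


def classify(training, phrase):
    """Toy stand-in for the chatbot (an assumption, not a real NLU engine).

    Scores each category by how many of the phrase's words appear in that
    category's learning data but not in every category's learning data.
    A tie or a zero score returns None.
    """
    vocab = {cat: set().union(*(tokens(p) for p in phrases))
             for cat, phrases in training.items()}
    shared = set.intersection(*vocab.values())  # words seen in every category
    query = tokens(phrase)
    scores = {cat: len((query & words) - shared) for cat, words in vocab.items()}
    best = max(scores.values())
    winners = [cat for cat, score in scores.items() if score == best]
    return winners[0] if best > 0 and len(winners) == 1 else None


def run_round(training, tests):
    """Run one round; promote passes to learning data, return the failures."""
    passed, failed = [], []
    for expected, phrase in tests:
        bucket = passed if classify(training, phrase) == expected else failed
        bucket.append((expected, phrase))
    for expected, phrase in passed:
        training[expected].append(phrase)
    return failed


training = {
    "Life insurance": ["Tell me about life insurance!"],
    "Vehicle insurance": ["Vehicle insurance info please."],
}
tests = [
    ("Vehicle insurance", "Gimme something about vehicle insurance please."),
    ("Vehicle insurance", "Give me info re vehicle insurance!"),
    ("Vehicle insurance", "Tell me something about vehicle insurance."),
]

round_number = 0
while tests and round_number < 5:  # cap the rounds so the loop always ends
    round_number += 1
    tests = run_round(training, tests)
    print(f"Round {round_number}: {len(tests)} phrase(s) still misclassified")
```

With this toy data, the third phrase is misclassified in the first round and recognised in the second, once the two promoted phrases have widened the vehicle insurance vocabulary. In practice, new test phrases would also be appended to `tests` between rounds.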
It is important that the learning data in each category is consistent with the purpose of that category. For chatbot content writers and testers, this means that phrases need to be on point and correspond accurately to the category in which they are placed. Moreover, if some of the learning phrases that match the life insurance category also appear in a different category, the chatbot cannot determine the correct meaning, and those phrases will keep confusing it. Using the same phrase in more than one category is one of the gravest errors content creators can commit. Such errors should be caught as early as possible by means of well-constructed reports and careful observation, because they have a highly detrimental effect on a chatbot’s training.
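One simple report of this kind lists every phrase that appears in more than one category. The sketch below is a minimal example, assuming the learning data is available as a mapping from category name to phrases; `normalise` and `find_cross_category_phrases` are hypothetical helper names, and the duplicated phrase in the sample data is invented purely for illustration.

```python
import string
from collections import defaultdict


def normalise(phrase):
    """Lower-case, drop ASCII punctuation and collapse whitespace."""
    cleaned = phrase.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(cleaned.split())


def find_cross_category_phrases(training):
    """Map each normalised phrase used in 2+ categories to those categories."""
    seen = defaultdict(set)
    for category, phrases in training.items():
        for phrase in phrases:
            seen[normalise(phrase)].add(category)
    return {phrase: sorted(cats) for phrase, cats in seen.items() if len(cats) > 1}


# Sample learning data; the "about insurance" phrase is an invented duplicate.
training = {
    "Life insurance": [
        "Tell me about life insurance!",
        "Can I get information about insurance?",
    ],
    "Vehicle insurance": [
        "Tell me something about vehicle insurance.",
        "can I get information about insurance",
    ],
}

conflicts = find_cross_category_phrases(training)
for phrase, categories in conflicts.items():
    print(f"'{phrase}' appears in: {', '.join(categories)}")
```

Normalising before comparing means that differences in capitalisation or punctuation alone do not hide a duplicate. Near-duplicates with different wording would need a fuzzier comparison than this sketch provides.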
Read other articles in the series Technically Speaking:
- Part I Reading Between the Lines: Checking the Accuracy of Chatbot Phrases
- Part II Chatbots: The Cutting-Edge Synergy of Human Genius and Imagination with Technological Precision and Efficiency
- Part III Reflections on Testing Natural Language, Natural Language Processing and Dealing with Language Peculiarities