Saturday, April 19, 2008

Building a chat bot or talking machine

I am writing this blog for the people who want to venture in the world of Artificial Intelligence.
The scope of this post will be to educate people about building chatter bots, incorporating intelligence into machine, natural language processing, knowledge representation techniques.

I will cover various aspect of artificial intelligence like natural language processing, knowledge representation, machine learning, logic representation, context free grammar, feature based grammar etc by explaining step by step mechanism and algorithms.
The purpose of this article is to provide people path to explore the world of artificial intelligence and chat bots.

In order to build a chat bot or talking machine we need following ingredients.

1. Language vocabulary with categorization of words.
2. Algorithms to parse the input statement.
3. Domain knowledge representation.
4. Logic or fact representation.
5. Mechanism to generate output statement.

Now lets us explore every aspect one by one.
1. Language vocabulary with categorization of words.

Before venturing into natural language processing one must insure that he has a word database with categorization of words like noun, pronoun , verb etc. Wordnet build by Prinston
University is such a data base. A java based application with databasae can be downloaded from
http://sourceforge.net/project/showfiles.php?group_id=131896&package_id=231561&release_id=509765



Once you have a word database with categorization of words you are ready to go ahead.

2. Algorithms to parse the input statement.

Natural language parsing is a process of understanding the input statement. There are few steps involved in natural language processing, these are
1. Morphological analysis: It involves validating input statement against spelling and grammar.
2. Lexical analysis: It involves understanding the category of component words like noun, pronoun, adjective. We can have our own category according to our grammar.
3. Syntactic analysis: It involves determining relationship between components.
4. Semantic analysis: Mapping of components to the domain objects.

In order to do above mentioned 4 steps we need to decide the grammar that we want to use.
The grammar that I have used is combination of context free grammar and feature based grammar.

Context free grammar:
A context free grammar is a collection of productions of the format S -> NP VP, where components NP and VP in turn can have components. Take an example of following grammar which I have designed for question, answer system.

Example Grammar: G1
Ex sentences:
What is KM ? What is your name? How many friends do you have?
Rules:
S--> QP VP
VP --> HVP ObP
ObP --> Ob At
QP --> QW QWP

Where:

QP - Question Phrase like “how much” etc.
HVP - Helping verb phrase like is, am , are etc.
ObP - Object phrase like “is your name”.
Ob – Object phrase.
At – attribute like “name, age, sex”.
QW - Question word like “ what , where etc.
QWP - Quantifier phrase “ many, much” etc.
Det - Determinar or article “a, an, the”.
O – Object ( any object in context).


Using above grammar we can understand statement like “what is your name”, “ who is your father”.

Now let us see Feature based grammar.

Feature based grammar.

In feature based grammar we assign various attributes to the words then we determine their meaning
For example word “those” use alias “w”

w.spelling = “those”
w.number = “plural”
w.pos = preposition. (part of speech).

I will not be describing feature based grammar in great detail since I have used context free grammar for the example in this tutorial.


Now I will explain the algorithm to parse the input statement.

This algorithm is called Left corner parsing. This is hybrid of top down and bottom up parsing.
It is based on assumption that you have a class to represent tree with following methods.
getPeer, getFather, addLeftChild, addRightChild. This algorithm does not do morphological analysis.

It uses the grammar G1.

Terminal element: One who can be broken further.


Algorithm:


While(true){
Make S as current element and root of tree.
// current element in this algorithm means current grammar element like S, VP, ObP
Make first word of input sentence as current word.
While(current element is not a terminal element){
Break current element into components.
Make left component as left child and right component as right child.
Make left child as current element
Continue;
}
If(current element is left child of tree and a terminal element){
If(current word is of the type of current element) {
Make this word as left child of current tree node. Take the current word pointer to next word.
If(current tree node has peer node)
Make peer node as current tree node;
Else
Make father’s peer as current tree node:
}
Else{
If(current tree node has peer node)
Make peer node as current tree node;
Else
Make father’s peer as current tree node:
}
}
If(current element is right child of tree and a terminal element){
If(current word is of the type of current element){
Make this word as left child of current tree node.
Take the current word pointer to next word.
If(current tree node’s father’s peer is it’s father itself)
Terminate algorithm;
Else
Make father’s peer as current tree node:
}
Else{
If(current tree node’s father’s peer is it’s father itself)
Terminate algorithm;
Else
Make father’s peer as current tree node:
}
}
}// end of algorithm

Taking “What is your name” as a input statement and using grammar G1. We get tree as follows.



So far we have done lexical and syntactic analysis. Now we have to do Semantic analysis.
In order to do semantic analysis we need to find way of knowledge representation, which is our next topic.

3. Domain knowledge representation.
There are following ways of knowledge representation.
a. Database
b. Se mantic network.
c. Frames.
I find frames as most natural way of reprehensive knowledge using multi way graph data structure. I will not discuss way to represent knowledge. There are a lot of online resources available for knowledge representation purpose.


4. Fact and logic representation.

In order to present facts and logic we use lambda calculus, first order logic and preposition logic. If you remember Boolean algebra you can understand these terms.
I hope you can read internet for knowledge regarding these topics.
5. Answer generation.
In order to generate answer we will have to use some template based system. For a certain category of input sentence we can have a certain type of answer template .
For example question: “what is your name” we can have answer template as

("Det", "O", "At" , "HVP", "value of asked attribute")

Meaning of the element are there in grammar G1.

Now we can map question “what is your name” to answer “my name is Mukesh”.
Meaning of input sentence can be found out by generating the tree.


1 comment:

Kavi Priya said...

Well written articles like yours renews my faith in today's writers. The article is very informative. Thanks for sharing such beautiful information.
Chatbot Developers in Dubai
Facebook Chatbot in Dubai
AI Chatbot in Dubai
Artificial intelligence Company in Dubai