A Virtual Robotic Agent that learns Natural Language Commands
J. Kontos, I. Malagardi and M. Bouligaraki
5th European Systems Science Congress, Heraklion, Crete, October 2002.
Res Systemica, Volume 2, Special Issue. (http://www.afscet.asso.fr/resSystemica/accueil.html)
(abridged)
1. Introduction
This paper presents the design and implementation of a motion command understanding system with a learning interface for communication with its user. The system is related to the learning human-robot systems reviewed in [3], and the work is part of our NLP project [1], [2]. The system accepts motion commands expressed in Greek or English and executes them. It could be applied to the communication between a user and an artificial agent that exists in a virtual environment and accepts commands and knowledge about the objects and the actions possible in this environment. The commands express three kinds of action. The first kind is a change of position, e.g. the movement of an object; the second is a change of state, e.g. the opening or closing of some object; and the third is a change of a relation between objects, e.g. the placement of an object on top of or inside another object.
When the system is given a command that specifies a task, such as “open the door”, “open the bottle”, “close the box” or “put the book on the desk”, it has “to understand” the command before attempting to perform the task. Understanding such commands requires understanding the meaning of actions such as “open”, “close” and “put”, and the meaning of prepositional words such as “on”. The meanings of the constituents must be combined, and the meaning of the sentence as a whole must be obtained by taking into consideration knowledge of the environment or “microcosm” in which the commands must be performed. The agent may initially execute a command wrongly. Through its interaction with the human user, the agent learns the correct way of executing a given command.
The main contribution of this paper is the ability of the implemented system to learn from its user how to understand and correctly execute motion commands that go beyond its initial capabilities. This learning takes place when the system faces unknown words, unknown senses of words or underspecified positions of objects. The system was implemented in Prolog, which has some simple facilities for computer graphics. Using these facilities, the system displays a room with a door, some furniture and some manipulable objects. The objects are a table, a door, a desk, a bottle, a box and a book. The bottle, the box and the book are examples of manipulable objects. It is supposed that there is an invisible agent in the room who can move around and execute the user’s motion commands. These commands may refer directly or indirectly to the movement of specific objects or to a change of their state. The agent knows the names of these objects and their positions in the room displayed on the screen. The agent also knows how to execute some basic commands.
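To make the microcosm and the three kinds of action concrete, the following Python sketch models the room as a table of named objects and gives one function per kind of action. It is only an illustration under assumptions: the paper's system is written in Prolog, and the class WorldObject, the function names, the coordinates and the initial states used here are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class WorldObject:
    name: str
    position: Tuple[int, int]           # hypothetical screen coordinates
    manipulable: bool = False
    state: Optional[str] = None         # e.g. "open" or "closed"
    supported_by: Optional[str] = None  # name of the object this one rests on


def make_room():
    """Build the microcosm; the object names come from the text, the rest is assumed."""
    objects = [
        WorldObject("door", (0, 5), state="closed"),
        WorldObject("table", (4, 2)),
        WorldObject("desk", (8, 2)),
        WorldObject("bottle", (3, 1), manipulable=True, state="closed"),
        WorldObject("box", (1, 1), manipulable=True, state="closed"),
        WorldObject("book", (2, 1), manipulable=True),
    ]
    return {obj.name: obj for obj in objects}


def require_manipulable(obj):
    if not obj.manipulable:
        raise ValueError(f"{obj.name} cannot be moved by the agent")


def change_position(room, name, new_position):
    """First kind of action: move an object to a new position."""
    obj = room[name]
    require_manipulable(obj)
    obj.position = new_position
    obj.supported_by = None


def change_state(room, name, new_state):
    """Second kind of action: open or close an object."""
    room[name].state = new_state


def change_relation(room, name, landmark):
    """Third kind of action: place an object on another one (only "on" is modelled)."""
    obj = room[name]
    require_manipulable(obj)
    obj.supported_by = landmark
    obj.position = room[landmark].position
```

With this representation, “open the door” would correspond to change_state(room, "door", "open") and “put the book on the desk” to change_relation(room, "book", "desk").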
2. Motion Verbs
Motion may be specified by a verb either directly or indirectly. We call such verbs “motion verbs”; see [5] for a different approach. The simplest way to specify the motion of an object is to use a verb that specifies motion directly. An example is “move”, as used in the sentence “move the box”. This sentence implies that a motion must be executed by the system with the box as the affected object. Indirect specification of motion can be done in two ways: either in terms of geometric goals or in terms of a force. Indirect specification of motion in terms of a goal involving a physical relationship among objects is quite common. Consider the command “put the bottle on the table”. This command requires that a physical object, “the bottle”, be moved with the goal of establishing the physical relationship “on” between it and another physical object, “the table”. Performing such an instruction shows that the goal of establishing a physical relationship drives the motion of the first object. For verbs such as “put” that specify motion in terms of a geometric goal, the properties of the objects that participate in the underlying action are of crucial importance. Indirect specification of motion in terms of a force uses verbs such as “push” and “pull”. Objects affected by motion commands may also be specified either directly or indirectly. Direct specification is based on the names of objects known to the system, such as box, table, etc. Indirect specification can be accomplished using complex noun phrases such as “the book on the desk”. The representation of the meaning of motion verbs was addressed in [6]. The ideas of that work have been implemented as a component of a system that accepts natural language commands as input and produces graphical animation as output. Its authors used a fixed lexicon, which they encoded manually using their representation method.
The authors of [6] state that their long-term goal is to investigate how semantic information for motion verbs can be automatically derived from machine-readable dictionaries. They also state that at present their system has no feedback from the graphical animation system to the semantic processor. Finally, their system has no learning capabilities. Our system exhibits some novel features for treating motion verbs: its lexicon is created automatically from a machine-readable dictionary, and the correct interpretation of commands with more than one meaning is learned through supervised machine learning techniques based on visual feedback. One source of the multiplicity of meaning of a command is the multiplicity of the senses of a word as recorded in a machine-readable dictionary. Another source is the possibility of placing an object on a surface in different ways. When the user submits a command, the agent, in order to satisfy the constraints of the verb’s meaning, may ask for new information and knowledge about objects and verbs, which may be used in the future. A machine-readable dictionary with possibly ambiguous entries is used, which provides the analysis of complex verbs into basic ones. In particular, in the case of Greek, about 600 motion verbs were analysed automatically in terms of about 50 basic verbs. Finally, every time a command that is amenable to more than one interpretation is executed, the system allows the user to observe the graphical output and state their approval or disapproval, which helps the system to learn by supervision.
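The distinction drawn in this section between direct specification of motion and indirect specification through a geometric goal or a force can be summarised in a small data structure. The Python sketch below is an illustration only, not the authors' Prolog representation; the names MotionSpec, VERB_SPEC, MotionCommand and interpret are hypothetical, and only the four verbs mentioned in the text are listed.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class MotionSpec(Enum):
    DIRECT = "direct"                  # e.g. "move the box"
    GEOMETRIC_GOAL = "geometric goal"  # e.g. "put the bottle on the table"
    FORCE = "force"                    # e.g. "push", "pull"


# Verbs taken from the text; the table itself is an assumed representation.
VERB_SPEC = {
    "move": MotionSpec.DIRECT,
    "put": MotionSpec.GEOMETRIC_GOAL,
    "push": MotionSpec.FORCE,
    "pull": MotionSpec.FORCE,
}


@dataclass(frozen=True)
class MotionCommand:
    verb: str
    spec: MotionSpec
    affected: str                   # the object that is moved
    relation: Optional[str] = None  # e.g. "on"
    landmark: Optional[str] = None  # e.g. "table"


def interpret(verb, affected, relation=None, landmark=None):
    """Build a command record, checking that goal verbs come with a goal."""
    spec = VERB_SPEC[verb]
    if spec is MotionSpec.GEOMETRIC_GOAL and (relation is None or landmark is None):
        raise ValueError(f"'{verb}' needs a relation and a landmark object")
    return MotionCommand(verb, spec, affected, relation, landmark)
```

For example, interpret("put", "bottle", "on", "table") records that the bottle is the affected object and that the goal is the relation “on” with respect to the table, while interpret("put", "bottle") is rejected because the geometric goal is missing.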
3. System Architecture and Operation
The system is composed of a number of modules, each one performing a different task. These modules are:
Machine readable dictionary processor
Lexical processor
Syntactic processor
Semantic processor
Basic motion command processor
Graphics processor
Learning module
The operation of these modules is supported by a number of databases. These are:
Machine readable dictionary
Basic Lexicon
Stems Base
Objects Attributes Base
Knowledge Base
The user enters commands in natural language using the keyboard. The commands are imperative sentences with one motion verb, which declares the action, and some other words such as nouns, prepositions or adverbs that complement the motion verb. Prior to the syntactic and semantic analysis of the sentence, the system checks whether each word of the sentence belongs to its lexicon. Stemming is used at this stage because of the complex morphology of Greek words. When the command contains a word unknown to the system, the system produces a message asking the user for information about the unknown word and terminates the processing of the present command. After having recognized all the words in a command, the system performs its syntactic analysis. If the input sentence is syntactically correct, the system recognizes the motion verb, the object or objects, and the adverb or preposition related to the verb. After this, the module for “processing of motion commands” tries to satisfy all the constraints and conditions for the specified motion. This processing requires searching the knowledge base, from which the system retrieves information about the object’s properties (e.g. position, weight, size, state, subparts etc.). At this point, when some information is unavailable or ambiguous, the system interacts with the user in order to acquire the missing knowledge. The system asks two different types of question. The first type concerns information that is missing from the knowledge base and must be supplied by the user. The second type demands a Yes or No answer. This happens when more than one interpretation of an input command is possible and the system cannot decide which is the correct one. In these cases the system, using the machine learning mechanism, suggests one of the different solutions at a time and requests an answer from the user.
The Yes or No answers generate appropriate entries in the knowledge base, which can be used the next time the user submits a similar command without requesting any more information. This process is based on the “learning by taking advice” technique of machine learning. Some examples of the operation of the implemented system are presented in the following section. Prolog predicates that retrieve each object’s position (i.e. its coordinates) from the object database have been implemented. Each predicate subsequently calculates the coordinates of the points that constitute the shape with which the object is drawn. A more general predicate redraws all the objects after the processing of the user’s command. All the knowledge and the current state of each object can be saved in external files, which are available for future use through the menu options of the interface.
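Two of the steps described in this section, the lexical check with stemming and the Yes/No dialogue whose answers are stored for reuse, can be sketched in Python as follows. This is a minimal illustration under assumptions and not the Prolog implementation: the stem base STEMS, the English suffix list SUFFIXES and the function names are hypothetical, and the user is represented by a callback ask(key, candidate) that returns True for a “Yes” answer.

```python
# Hypothetical stem base and suffix list; neither is taken from the paper.
STEMS = {"open", "close", "put", "the", "door", "bottle", "box", "book", "on", "desk"}
SUFFIXES = ("es", "s", "ing")


def candidate_stems(word):
    """Yield the word itself and every form obtained by stripping one suffix."""
    yield word
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 1:
            yield word[: -len(suffix)]


def unknown_words(command):
    """Return the words of a command that match no entry in the stem base."""
    return [
        word
        for word in command.lower().split()
        if not any(stem in STEMS for stem in candidate_stems(word))
    ]


def resolve(key, candidates, knowledge, ask):
    """Pick an interpretation for key, asking the user only when necessary.

    A stored answer is reused directly. Otherwise the candidates are proposed
    in order until ask(key, candidate) returns True, and that answer is
    recorded in the knowledge dictionary.
    """
    if key in knowledge:
        return knowledge[key]
    for candidate in candidates:
        if ask(key, candidate):
            knowledge[key] = candidate
            return candidate
    return None


if __name__ == "__main__":
    knowledge = {}

    def scripted_user(key, candidate):
        # Stand-in for the Yes/No dialogue: this user accepts only "pull".
        return candidate == "pull"

    print(unknown_words("open the window"))
    print(resolve(("open", "door"), ["push", "pull"], knowledge, scripted_user))
    print(knowledge)
```

In this sketch a command with a non-empty unknown_words result would be abandoned with a request for information, and a second call to resolve with the same key returns the stored answer without asking any question.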
4. Examples of Operation of the System
Suppose that the user enters the command “open the door”. The system isolates the words of the command and recognizes the verb “open” and the noun phrase “the door”. The verb “open” appears in the lexicon with a number of different definitions. For example, in the LDOCE [4] we find, among others, the following senses of “open”: a: to cause to become open, b: to make a passage by removing the things that are blocking it. The Greek dictionary we used contains similar sets of senses for this verb, and the sense selection mechanism is practically the same for the two languages. The only difference is the wording of the sense selection rules, since the objects and their properties have different names in the two languages. The system selects sense “b” because it knows that a door blocks a passage. The next decision the system has to take concerns the way the opening action is executed. The system finds in the knowledge base that there are two alternative ways of interpreting the verb “open”, using either a “push” or a “pull” basic motion. It selects the first one and asks the user whether this is the right one. If the answer is “No”, the system selects the next available interpretation and prompts the user again for an answer. When the answer is “Yes”, a fact is recorded in the knowledge base which denotes that for the verb “open” and the object “door” the appropriate motion is, e.g., “pull”, in case the “Yes” answer was given for the “pull” interpretation.
The second example refers to the movement of a book that exists in the microcosm of the system. When the command “put the book on the desk” is given, the system searches the knowledge base to find a side of the book that can be used as a base for it. The book has 6 sides, and when the system selects one of them it presents the book graphically on the desk with this side as its base. Then it asks the user whether the result of the motion is the correct one. When the user enters a “Yes” answer, this is recorded in the knowledge base and the process terminates. When the user enters a “No” answer, the process continues, trying all the available sides of the book in sequence until the user gives a “Yes” answer.
The graphical user interface that was implemented was very helpful during development. It was easier to see the result graphically on the screen than to read listings of the knowledge base in order to find the changes recorded during program execution and the machine learning process.
5. The Implementation of the ID3 Algorithm with Logic Programming
A program implementing the ID3 algorithm was used for the learning applications. As an indication, we present some sections of the program that correspond to the main parts of the ID3 algorithm. The problem is to determine a decision tree that, on the basis of answers to questions about the non-category attributes, correctly predicts the value of the category attribute. Usually the category attribute takes only the values {true, false}, or {success, failure}, or {Yes, No}, or something equivalent. In any case, one of its values will mean failure.
The computation of the entropy has been implemented with the following rule:
compute_set_entropy(Data,Entropy):-
    count_positive(Data,Pnum),             % number of positive examples
    length(Data,Dnum),                     % total number of examples
    Pp=Pnum/Dnum, Pn=1-Pp,                 % proportions of the two classes
    xlogx(Pp,PpLogPp), xlogx(Pn,PnLogPn),
    Temp=PpLogPp+PnLogPn,
    Entropy = -Temp.
where Data is the input data and Entropy is the value of the entropy for the data. The predicate “count_positive” finds the number of examples in the data that belong to the category that must be recognised after learning, using the rules:
count_positive([],0).
count_positive([dat("P",_)|More],Pnum):- !, count_positive(More,Pnum1), Pnum=Pnum1+1.
count_positive([dat("N",_)|More],Pnum):- count_positive(More,Pnum).
The predicate 'length' finds the total number of examples in the data, using the rules:
length([],0).
length([Dat|Moredat],Dnum):- !, length(Moredat,Dnum1), Dnum=Dnum1+1.
The product 'xlogx' is computed with the rules:
xlogx(X,N):- X=0.0E+00, !, N=0.
xlogx(X,N):- N=X*log(X).
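For readers more familiar with Python, the entropy computation defined by the rules above can be sketched as follows. This is a transcription under assumptions and not the authors' code: examples are represented as (label, attributes) pairs that mirror dat("P",_) and dat("N",_), the attribute values in the usage lines are made up, and the logarithm is taken to base 2, which the listing does not specify. A different base only rescales every entropy by the same factor, so it does not change which partition has the smallest entropy.

```python
import math


def xlogx(x):
    """Return x * log2(x), with the convention that the value at x = 0 is 0."""
    return 0.0 if x == 0 else x * math.log2(x)


def count_positive(data):
    """Count the examples labelled "P" in a list of (label, attributes) pairs."""
    return sum(1 for label, _ in data if label == "P")


def set_entropy(data):
    """Entropy of a non-empty two-class data set, as in compute_set_entropy."""
    p_pos = count_positive(data) / len(data)
    p_neg = 1 - p_pos
    return -(xlogx(p_pos) + xlogx(p_neg))


if __name__ == "__main__":
    balanced = [("P", "pull"), ("N", "push"), ("P", "pull"), ("N", "push")]
    print(set_entropy(balanced))  # 1.0: two classes in equal proportion
```

A set whose examples all carry the same label has entropy zero, which is the case in which ID3 stops splitting.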
The predicate select_minimal_entropy accepts a list of triples of the form (attribute, partition-induced-by-that-attribute, resulting-entropy) and finds the attribute that gives the partition with the smallest entropy, together with that partition itself, given the structure “c=c(attr,partition,entropy)”, using the rules:
select_minimal_entropy([c(Attr,Partition,Entropy)|MorePartitions],BestAttr,BestPartition):-
    select_minimal_entropy_aux(MorePartitions,c(Attr,Partition,Entropy),BestAttr,BestPartition).
select_minimal_entropy_aux([],c(Attr,Partition,_),Attr,Partition).
select_minimal_entropy_aux([c(Attr1,Partition1,Entropy1)|MorePartitions],c(_,_,Entropy),BestAttr,BestPartition):-
    Entropy1
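The last rule of the listing above is cut off in this abridged text, so the exact comparison used by the authors is not shown. The Python sketch below illustrates the selection step under the assumption that a later candidate replaces the current best only when its entropy is strictly smaller, so that ties are resolved in favour of the earlier candidate. The triples mirror the c(Attr,Partition,Entropy) structure; the attribute names, partitions and entropy values in the usage lines are hypothetical.

```python
def select_minimal_entropy(candidates):
    """Return (attribute, partition) of the triple with the smallest entropy.

    candidates is a non-empty list of (attribute, partition, entropy) triples.
    The strict comparison is an assumption; the source listing is truncated.
    """
    best_attr, best_partition, best_entropy = candidates[0]
    for attr, partition, entropy in candidates[1:]:
        if entropy < best_entropy:
            best_attr, best_partition, best_entropy = attr, partition, entropy
    return best_attr, best_partition


if __name__ == "__main__":
    # Hypothetical candidates: attribute name, induced partition, entropy.
    triples = [
        ("verb", {"open": ["e1", "e2"], "put": ["e3"]}, 0.92),
        ("object", {"door": ["e1"], "book": ["e2", "e3"]}, 0.67),
    ]
    attr, partition = select_minimal_entropy(triples)
    print(attr)
```

In ID3 the selected attribute becomes the test at the current node of the decision tree, and the procedure is repeated on each block of the selected partition.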
References
[1] Kontos, J., Malagardi, I. and Trikkalidis, D. (1998). Natural Language Interface to an Agent. EURISCON ’98, Third European Robotics, Intelligent Systems & Control Conference, Athens. Published in Conference Proceedings “Advances in Intelligent Systems: Concepts, Tools and Applications” (Kluwer).
[2] Kontos, J. and Malagardi, I. (1998). Question Answering and Information Extraction from Texts. EURISCON ’98, Third European Robotics, Intelligent Systems & Control Conference, Athens. Published in Conference Proceedings “Advances in Intelligent Systems: Concepts, Tools and Applications” (Kluwer), ch. 11, pp. 121-130.
[3] Klingspor, V., Demiris, J. and Kaiser, M. (1997). Human-Robot Communication and Machine Learning. Applied Artificial Intelligence, 11, pp. 719-746.
[4] Longman Dictionary of Contemporary English (1978). The up-to-date learning dictionary. Editor-in-Chief Paul Procter. Longman Group Ltd., UK.
[5] Levin, B. (1992). English Verb Classes and Alternations: A Preliminary Investigation. The University of Chicago Press, Chicago and London.
[6] Kalita, J. K. and Lee, J. C. (1997). An Informal Analysis of Motion Verbs Based on Physical Primitives. Computational Intelligence, Vol. 13, No. 1, pp. 87-125.
[7] Kontos, J. and Malagardi, I. (1999). A Learning Natural Language Interface to an Agent. Proceedings of Workshops of Machine Learning ACCAI 99, Chania, Crete, Hellas.
[8] Malagardi, I. (2001). The Acquisition of World Knowledge and Lexical Combination. HERCMA 2001, 5th Hellenic European Research on Computer Mathematics & its Applications Conference, Athens, Hellas.