Tuesday, October 26, 2010

Computer Reads And Understands Text, A.I. component

One of the things I have spent some time thinking about is the way computers communicate with humans. Throughout the history of modern computers, it has been a dream to be able to program a computer in a natural human language.

In the past, I have known of some systems that use XML to simulate a computer that understands the grammar and syntax of the English language, and is therefore able to read, understand, and even reply to basic questions when it knows the answer.

There have also been competitions in which a computer is programmed in this way, and people then try to guess whether they are paired with a human on the other side or are communicating with a fully automated system.


In considering how to do this, I thought of making a "compiler" that treats the English language as a programming language of sorts.

We begin by parsing a sentence, breaking it down into individual words. There will actually be a class definition called "word", whose data member is a string equal to that word, along with functions and operators that help determine how words interact with one another.
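As a rough illustration of this first step, here is a minimal sketch, written in Python for brevity rather than PHP. The names `Word`, `same_as`, and `parse_sentence` are my own placeholders, and the single comparison helper only hints at the "functions and operators" a real version would need.

```python
import re


class Word:
    """One word of a sentence; `text` is its single data member."""

    def __init__(self, text):
        self.text = text

    def __repr__(self):
        return f"Word({self.text!r})"

    def same_as(self, other):
        """Hypothetical helper: case-insensitive comparison of two words."""
        return self.text.lower() == other.text.lower()


def parse_sentence(sentence):
    """Break a sentence into Word objects, dropping punctuation."""
    # Letters and apostrophes only, so "it's" stays a single word.
    return [Word(token) for token in re.findall(r"[A-Za-z']+", sentence)]


words = parse_sentence("The cat sat on the mat.")
print(words)
print(words[0].same_as(words[4]))
```

Splitting on letters and apostrophes is an assumption that only suits simple English sentences; numbers, hyphenated words, and abbreviations would need more care.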

In order to make a learning engine, we might need a system that combines the best of both dynamic and hard-coded aspects, so the system begins with the most common words already having their definitions. For example, we would have a list of prepositions with code modeling how they modify their objects, and further lists for adverbs, adjectives, conjunctions, pronouns, and so on.
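One way to picture this mix of hard-coded and learned vocabulary is the sketch below, again in Python purely for illustration. The word lists are a tiny made-up sample, and `BUILT_IN`, `Lexicon`, `category_of`, and `learn` are hypothetical names, not part of any existing system.

```python
# Hard-coded starting vocabulary (a tiny, illustrative sample).
BUILT_IN = {
    "preposition": {"in", "on", "under", "with"},
    "conjunction": {"and", "but", "or"},
    "pronoun": {"he", "she", "it", "they"},
    "adverb": {"quickly", "very"},
    "adjective": {"large", "red"},
}


class Lexicon:
    """Looks words up in the built-in lists first, then in learned entries."""

    def __init__(self):
        self.learned = {}  # the dynamic part: word -> category

    def category_of(self, word):
        key = word.lower()
        for category, members in BUILT_IN.items():
            if key in members:
                return category
        return self.learned.get(key, "unknown")

    def learn(self, word, category):
        """Record a category for a word the built-in lists do not cover."""
        self.learned[word.lower()] = category


lexicon = Lexicon()
print(lexicon.category_of("under"))
print(lexicon.category_of("derivative"))
lexicon.learn("derivative", "noun")
print(lexicon.category_of("derivative"))
```

This only tags a word with one category; modeling how a preposition actually modifies its object would need real behavior attached to each entry, which the sketch does not attempt.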

Together, the functionality of these "word" objects would convey the meaning of a sentence in a form which the computer can "understand," both individually and in its context within a paragraph, and then the meaning of the paragraph within a chapter, section, or sub-section of a book.


The ultimate goal would be:

1) The ability of the software to identify the most important and relevant information in an article, and summarize the article in its own words, without further human input.

2) The ability of the software to read, understand, and combine factual inputs from multiple articles or books by one or more authors, and generate accurate, factual reports, both in its own words and using quoted references when needed.



This implies "teaching" the machine to a relevant skill level with respect to the content of the articles, and with respect to the language involved.


For example, if it is reading a math or science article and the derivative is mentioned, it needs to know from context that the text is talking about the mathematical construct used to calculate the rate of change, along with its various other applications and definitions. But it also needs to know that, in this context, the paragraph is merely mentioning the derivative, not telling the computer to solve for one. On the other hand, if the paragraph actually is telling the reader to solve for a particular derivative, the computer needs to be able to figure that out too.
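To make the mention-versus-instruction distinction concrete, here is a deliberately crude sketch in Python. It assumes an instruction is a sentence that opens with an imperative verb from a short hand-picked list; the list `IMPERATIVE_CUES` and the function `is_instruction` are hypothetical, and real context handling would need far more than this.

```python
import re

# Assumed cue verbs that often open an instruction in a math text.
IMPERATIVE_CUES = {"find", "solve", "compute", "calculate", "evaluate"}


def is_instruction(sentence, topic="derivative"):
    """Guess whether a sentence asks the reader to work out `topic`.

    Returns True only when the topic appears and the first word is one
    of the cue verbs; otherwise the sentence is treated as a mention.
    """
    words = re.findall(r"[a-z']+", sentence.lower())
    if not words or topic not in words:
        return False
    return words[0] in IMPERATIVE_CUES


print(is_instruction("Find the derivative of x squared."))
print(is_instruction("The derivative measures the rate of change."))
```

A rule this simple misses plenty of cases, such as "Please find the derivative" or "What is the derivative of x squared?", which is exactly why context has to be modeled rather than matched against a word list.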


The text is the name of the word, and the code would simulate the applications, function, and meaning of the word given its context.


I intend to do this with PHP due to the flexibility and power of PHP's parsing and data architectures.


Obviously, I am dealing not only with storing text, but with obtaining, storing, and understanding it, and then applying that knowledge to a general or a specific problem, such as helping humans find relevant articles, books, or journals on the internet, and quickly summarizing or even explaining them to the reader. It would be sort of like Google, except that it would not merely index or find connections and common phrases, but would actually understand and apply the knowledge in the articles, books, or journals it reads.

In theory, you could start it off the same way humans are trained from toddler upward, with story books, readers, history, and science texts of each successive grade level. A huge advantage for the machine would be permanent file access to multiple texts, dictionaries, and encyclopedias on all topics.


This is an extremely large undertaking, bordering on true A.I. in many respects, but I believe it is something that is doable, as it is a matter of pure logic and data storage.
