Word tagger, noun, adjective, and so on


Basically, what I need is something that can 'tag' words.


If you take a look at http://markwatson.com/commercial/fasttag/README.txt you should understand what I mean by that. Basically I want to do the same thing, but not in Java; I want to add it to an app of my own. Please don't tell me to just use that, because... well, it doesn't work for some reason.


The problem I am having is how to look up each word. I could load the dictionary into RAM, but I have no idea how big the dictionary will get. I could put all the words and their possible tags into a MySQL database, but that would mean many queries per sentence, and there will be a lot of sentences. I could... no, I'm just going to say there's no way I am scanning through a flat file for each word.


How would you guys do it? If I have missed a possible option please feel free to share.


For folks wondering what I want this for: it's for a fairly basic AI, to get it to learn how to construct proper sentences based on what other people say, and then eventually construct sentences of its own.

Use a combination of a hashtable and a database lookup: look in the hashtable first, and if the word isn't there, look it up in the database and keep the result in the hashtable. To stop the hashtable getting too big, either set a maximum size (i.e. number of elements) or give entries a time to live, or both. For example, whenever you retrieve a value, set its "last used" time to "now". When you hit the maximum size (or when you do a garbage collection pass to expire items), loop through the elements and remove the oldest one (or all the ones older than the expiry).
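To make the hashtable-plus-database idea a bit more concrete, here is a rough sketch in Python (picked purely for illustration). Everything named here is made up for the example: the `TagCache` class, the `word_tags` table, and the sample rows. It uses the standard-library `sqlite3` module as a stand-in for MySQL so it runs without a server, and an `OrderedDict` kept in "last used" order, so removing the oldest entry doesn't need a full loop over the table.

```python
import sqlite3
import time
from collections import OrderedDict


class TagCache:
    """Bounded in-memory cache in front of a word -> tags database table.

    Hypothetical helper: assumes a table word_tags(word TEXT, tag TEXT).
    """

    def __init__(self, conn, max_size=10000, ttl_seconds=3600.0,
                 clock=time.monotonic):
        self.conn = conn
        self.max_size = max_size
        self.ttl = ttl_seconds
        self.clock = clock
        # word -> (tags, last_used); least recently used entries come first.
        self._items = OrderedDict()

    def lookup(self, word):
        now = self.clock()
        entry = self._items.get(word)
        if entry is not None and now - entry[1] <= self.ttl:
            tags = entry[0]  # cache hit, still fresh
        else:
            # Miss (or stale entry): fall back to the database.
            rows = self.conn.execute(
                "SELECT tag FROM word_tags WHERE word = ? ORDER BY tag",
                (word,),
            ).fetchall()
            # Unknown words are cached too, as an empty tuple.
            tags = tuple(row[0] for row in rows)
        # Record the "last used" time and mark the word as most recent.
        self._items[word] = (tags, now)
        self._items.move_to_end(word)
        # Enforce the maximum size by dropping the least recently used.
        while len(self._items) > self.max_size:
            self._items.popitem(last=False)
        return tags

    def expire(self):
        """Drop every entry whose last use is older than the time to live."""
        now = self.clock()
        while self._items:
            word = next(iter(self._items))
            if now - self._items[word][1] <= self.ttl:
                break  # entries are ordered, so the rest are fresh too
            self._items.popitem(last=False)


def build_demo_db():
    """Create an in-memory database with a few made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE word_tags (word TEXT, tag TEXT)")
    conn.executemany(
        "INSERT INTO word_tags VALUES (?, ?)",
        [("dog", "NN"), ("run", "NN"), ("run", "VB"), ("quick", "JJ")],
    )
    return conn


if __name__ == "__main__":
    cache = TagCache(build_demo_db(), max_size=2)
    for w in "quick dog run".split():
        print(w, cache.lookup(w))
```

With a real MySQL table you would swap the connection and query for whatever driver you use; the caching part stays the same. If you don't need the time to live, the standard library's `functools.lru_cache` decorator already gives you the maximum-size behaviour on its own.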


What language are you writing this in?

I like the way you think


As I only know Python, I'll be using that. So many people dislike it :(


Thanks, also

