machine learning - Lexical-level similarity word clustering tool


Is there an open-source toolkit that compares lexical-level similarities among words and groups similar words together? For example, "blue jean", "blue jeans", and "blue jea" (misspelled) should be grouped together. I don't need semantic similarity here.

Try the Natural Language Toolkit (NLTK): http://nltk.org/

Here's a rather abstract treatment of the Brown clustering algorithm: http://www.cs.columbia.edu/~cs4705/lectures/brown.pdf

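The standard similarity metric between words is the Levenshtein (edit) distance. The linked article covers the Damerau-Levenshtein variant, which also counts a transposition of two adjacent characters as a single edit: http://en.wikipedia.org/wiki/damerau%e2%80%93levenshtein_distance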
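
To make the edit-distance idea concrete, here is a minimal sketch in plain Python. It is only an illustration under some assumptions: the function names `levenshtein` and `cluster_words`, the threshold `max_distance=2`, and the greedy "compare against the first member of each group" strategy are all my own choices, not something taken from a particular toolkit. It uses plain Levenshtein distance (no transpositions).

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn a into b."""
    # Keep b as the shorter string so the row buffers stay small.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def cluster_words(words, max_distance=2):
    """Greedy single-pass grouping (hypothetical helper): each word joins
    the first group whose first member is within max_distance edits,
    otherwise it starts a new group."""
    clusters = []
    for word in words:
        for cluster in clusters:
            if levenshtein(word, cluster[0]) <= max_distance:
                cluster.append(word)
                break
        else:
            clusters.append([word])
    return clusters


groups = cluster_words(
    ["blue jean", "blue jeans", "blue jea", "red shirt", "red shirts"]
)
print(groups)
```

With these inputs the three "blue jean" variants should land in one group and the two "red shirt" variants in another. The result depends on input order and on the threshold, so for real data you would probably want a proper clustering method (for example hierarchical clustering over the pairwise distance matrix) rather than this greedy pass.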

