Commit Graph

63 Commits

Author SHA1 Message Date
maxirmx
fe9d2e2c4e Fixing encode issue #2 2015-10-21 15:36:21 +03:00
maxirmx
e4a1726f77 Fixing encoding issue
UTF-8
2015-10-21 14:16:37 +03:00
Matthew Honnibal
5332c0b697 * Add support for punctuation lemmatization, to handle unicode characters. This should help in addressing Issue #130 2015-10-09 18:54:40 +11:00
Matthew Honnibal
24ed3fc25c * Check file existance before opening in lemmatizer 2015-09-13 10:45:21 +10:00
Matthew Honnibal
631c843ed1 * Don't look for index.adv in le,matizer 2015-09-12 06:03:44 +02:00
Matthew Honnibal
7c660c5efc * Use dict.get in lemmatizer 2015-09-10 14:51:39 +02:00
Matthew Honnibal
64d71f8893 * Fix lemmatizer 2015-09-08 15:38:03 +02:00
Matthew Honnibal
f0a7c99554 * Relax rule-requirement in lemmatizer 2015-08-27 10:26:19 +02:00
Matthew Honnibal
0af139e183 * Tagger training now working. Still need to test load/save of model. Morphology still broken. 2015-08-27 09:16:11 +02:00
Matthew Honnibal
c5a27d1821 * Move lemmatizer to spacy 2015-08-25 15:47:08 +02:00
Matthew Honnibal
e1c1a4b868 * Tmp 2014-12-21 05:36:29 +11:00
Matthew Honnibal
99bbbb6feb * Work on morphological processing 2014-12-08 21:12:15 +11:00
Matthew Honnibal
7b68f911cf * Add WordNet lemmatizer 2014-12-08 01:39:13 +11:00