12 Commits

| Author | SHA1 | Message | Date |
| --- | --- | --- | --- |
| ines | 417d45f5d0 | Add lemmatizer data as variable on language data. Don't create lookup lemmatizer within Language class and just pass in the data so it can be set on Token creation | 2017-10-11 02:24:58 +02:00 |
| ines | 0c2343d73a | Tidy up language data | 2017-10-11 02:22:49 +02:00 |
| Ines Montani | 112c5787eb | Merge pull request #1101 from oroszgy/hu_tokenizer_fix: More robust Hungarian tokenizer. | 2017-06-04 22:37:51 +02:00 |
| ines | 4c643d74c5 | Add norm exceptions to other Language classes | 2017-06-03 22:29:21 +02:00 |
| Gyorgy Orosz | f0c3b09242 | More robust Hungarian tokenizer. | 2017-05-31 22:28:40 +02:00 |
| Gyorgy Orosz | 8c0b4b850e | Fixed emoji handling for Hungarian | 2017-05-30 21:34:46 +02:00 |
| ines | 924e8506de | Move Defaults subclass to module scope (necessary for pickling) | 2017-05-20 19:02:27 +02:00 |
| ines | 9f0fd5963f | Reorganise Hungarian punctuation rules | 2017-05-09 00:01:59 +02:00 |
| ines | a91278cb32 | Rename _URL_PATTERN to URL_PATTERN | 2017-05-09 00:00:00 +02:00 |
| ines | 73b577cb01 | Fix relative imports | 2017-05-08 22:29:04 +02:00 |
| ines | ae99990f63 | Fix formatting | 2017-05-08 22:23:48 +02:00 |
| ines | f46ffe3e89 | Move language data to /lang module | 2017-05-08 20:00:40 +02:00 |