| Author | Commit | Message | Date |
|---|---|---|---|
| ines | 417d45f5d0 | Add lemmatizer data as variable on language data. Don't create lookup lemmatizer within Language class and just pass in the data so it can be set on Token creation | 2017-10-11 02:24:58 +02:00 |
| ines | 0c2343d73a | Tidy up language data | 2017-10-11 02:22:49 +02:00 |
| Ines Montani | 112c5787eb | Merge pull request #1101 from oroszgy/hu_tokenizer_fix. More robust Hungarian tokenizer. | 2017-06-04 22:37:51 +02:00 |
| ines | 4c643d74c5 | Add norm exceptions to other Language classes | 2017-06-03 22:29:21 +02:00 |
| Gyorgy Orosz | f0c3b09242 | More robust Hungarian tokenizer. | 2017-05-31 22:28:40 +02:00 |
| Gyorgy Orosz | 8c0b4b850e | Fixed emoji handling for Hungarian | 2017-05-30 21:34:46 +02:00 |
| ines | 924e8506de | Move Defaults subclass to module scope (necessary for pickling) | 2017-05-20 19:02:27 +02:00 |
| ines | 9f0fd5963f | Reorganise Hungarian punctuation rules | 2017-05-09 00:01:59 +02:00 |
| ines | a91278cb32 | Rename _URL_PATTERN to URL_PATTERN | 2017-05-09 00:00:00 +02:00 |
| ines | 73b577cb01 | Fix relative imports | 2017-05-08 22:29:04 +02:00 |
| ines | ae99990f63 | Fix formatting | 2017-05-08 22:23:48 +02:00 |
| ines | f46ffe3e89 | Move language data to /lang module | 2017-05-08 20:00:40 +02:00 |