Ines Montani
|
de997c1a33
|
Merge pull request #842 from magnusburton/master
Added regular verb rules for Swedish
|
2017-02-17 11:18:20 +01:00 |
|
Magnus Burton
|
41fcfd06b8
|
Added regular verb rules for Swedish
|
2017-02-17 10:04:04 +01:00 |
|
ines
|
aa92d4e9b5
|
Fix unicode regex for Python 2 (see #834)
|
2017-02-16 23:49:54 +01:00 |
|
ines
|
44de3c7642
|
Reformat test and use text_file fixture
|
2017-02-16 23:49:19 +01:00 |
|
ines
|
3dd22e9c88
|
Mark vectors test as xfail (temporary)
|
2017-02-16 23:28:51 +01:00 |
|
ines
|
85d249d451
|
Revert "Revert "Merge pull request #836 from raphael0202/load_vectors (closes #834)""
This reverts commit ea05f78660.
|
2017-02-16 23:26:25 +01:00 |
|
ines
|
ea05f78660
|
Revert "Merge pull request #836 from raphael0202/load_vectors (closes #834)"
This reverts commit 7d8c9eee7f, reversing
changes made to f6b69babcc.
|
2017-02-16 15:27:12 +01:00 |
|
Raphaël Bournhonesque
|
06a71d22df
|
Fix test failure by using unicode literals
|
2017-02-16 14:48:00 +01:00 |
|
Raphaël Bournhonesque
|
3ba109622c
|
Add regression test with non ' ' space character as token
|
2017-02-16 12:23:27 +01:00 |
|
Raphaël Bournhonesque
|
e17dc2db75
|
Remove useless import
|
2017-02-16 12:10:24 +01:00 |
|
Raphaël Bournhonesque
|
3fd2742649
|
load_vectors should accept arbitrary space characters as word tokens
Fix bug #834
|
2017-02-16 12:08:30 +01:00 |
|
ines
|
f08e180a47
|
Make groups non-capturing
Prevents hitting the 100 named groups limit in Python
|
2017-02-10 13:35:02 +01:00 |
|
ines
|
fa3b8512da
|
Use consistent imports and exports
Bundle everything in language_data to keep it consistent with other
languages and make TOKENIZER_EXCEPTIONS importable from there.
|
2017-02-10 13:34:09 +01:00 |
|
ines
|
21f09d10d7
|
Revert "Revert "Merge pull request #818 from raphael0202/tokenizer_exceptions""
This reverts commit f02a2f9322.
|
2017-02-10 13:17:05 +01:00 |
|
ines
|
f02a2f9322
|
Revert "Merge pull request #818 from raphael0202/tokenizer_exceptions"
This reverts commit b95afdf39c, reversing
changes made to b0ccf32378.
|
2017-02-09 17:07:21 +01:00 |
|
Raphaël Bournhonesque
|
309da78bf0
|
Merge branch 'master' into tokenizer_exceptions
|
2017-02-09 16:32:12 +01:00 |
|
Raphaël Bournhonesque
|
4ce0bbc6b6
|
Update unit tests
|
2017-02-09 16:30:43 +01:00 |
|
Raphaël Bournhonesque
|
5d706ab95d
|
Merge tokenizer exceptions from PR #802
|
2017-02-09 16:30:28 +01:00 |
|
ines
|
654fe447b1
|
Add Swedish tokenizer tests (see #807)
|
2017-02-05 11:47:07 +01:00 |
|
ines
|
6715615d55
|
Add missing EXC variable and combine tokenizer exceptions
|
2017-02-05 11:42:52 +01:00 |
|
Ines Montani
|
30a52d576b
|
Merge pull request #807 from magnusburton/master
Added swedish lemma rules and more verb contractions
|
2017-02-05 11:34:19 +01:00 |
|
Magnus Burton
|
19c0ce745a
|
Added swedish lemma rules
|
2017-02-04 17:53:32 +01:00 |
|
Michael Wallin
|
d25556bf80
|
[issue 805] Fix issue
|
2017-02-04 16:22:21 +02:00 |
|
Michael Wallin
|
35100c8bdd
|
[issue 805] Add regression test and the required fixture
|
2017-02-04 16:21:34 +02:00 |
|
ines
|
0ab353b0ca
|
Add line breaks to Finnish stop words for better readability
|
2017-02-04 13:40:25 +01:00 |
|
Michael Wallin
|
1a1952afa5
|
[finnish] Add initial tests for tokenizer
|
2017-02-04 13:54:10 +02:00 |
|
Michael Wallin
|
f9bb25d1cf
|
[finnish] Reformat and correct stop words
|
2017-02-04 13:54:10 +02:00 |
|
Michael Wallin
|
73f66ec570
|
Add preliminary support for Finnish
|
2017-02-04 13:54:10 +02:00 |
|
Ines Montani
|
65d6202107
|
Merge pull request #802 from Tpt/fr-tokenizer
Adds more French tokenizer exceptions
|
2017-02-03 10:52:20 +01:00 |
|
Tpt
|
75a74857bb
|
Adds more French tokenizer exceptions
|
2017-02-03 13:45:18 +04:00 |
|
Ines Montani
|
afc6365388
|
Update regression test for #801 to match current expected behaviour
|
2017-02-02 16:23:05 +01:00 |
|
Ines Montani
|
012f4820cb
|
Keep infixes of punctuation + hyphens as one token (see #801)
|
2017-02-02 16:22:40 +01:00 |
|
Ines Montani
|
1219a5f513
|
Add = to tokenizer prefixes
|
2017-02-02 16:21:11 +01:00 |
|
Ines Montani
|
ff04748eb6
|
Add missing emoticon
|
2017-02-02 16:21:00 +01:00 |
|
Ines Montani
|
13a4ab37e0
|
Add regression test for #801
|
2017-02-02 15:33:52 +01:00 |
|
Raphaël Bournhonesque
|
85f951ca99
|
Add tokenizer exceptions for French
|
2017-02-02 08:36:16 +01:00 |
|
Matvey Ezhov
|
32a22291bc
|
Small Doc.count_by documentation update
Current example doesn't work
|
2017-01-31 19:18:45 +03:00 |
|
Ines Montani
|
e4875834fe
|
Fix formatting
|
2017-01-31 15:19:33 +01:00 |
|
Ines Montani
|
c304834e45
|
Add missing import
|
2017-01-31 15:18:30 +01:00 |
|
Ines Montani
|
e6465b9ca3
|
Parametrize test cases and mark as xfail
|
2017-01-31 15:14:42 +01:00 |
|
latkins
|
e4c84321a5
|
Added regression test for Issue #792.
|
2017-01-31 13:47:42 +00:00 |
|
Matthew Honnibal
|
6c665b81df
|
Fix redundant == TAG in from_array conditional
|
2017-01-31 00:46:21 +11:00 |
|
Ines Montani
|
19501f3340
|
Add regression test for #775
|
2017-01-25 13:16:52 +01:00 |
|
Ines Montani
|
209c37bbcf
|
Exclude "shell" and "Shell" from English tokenizer exceptions (resolves #775)
|
2017-01-25 13:15:02 +01:00 |
|
Raphaël Bournhonesque
|
1be9c0e724
|
Add fr tokenization unit tests
|
2017-01-24 10:57:37 +01:00 |
|
Raphaël Bournhonesque
|
1faaf698ca
|
Add infixes and abbreviation exceptions (fr)
|
2017-01-24 10:57:37 +01:00 |
|
Raphaël Bournhonesque
|
cf8474401b
|
Remove unused import statement
|
2017-01-24 10:57:37 +01:00 |
|
Raphaël Bournhonesque
|
902f136f18
|
Add support for elision in French
|
2017-01-24 10:57:37 +01:00 |
|
Ines Montani
|
55c9c62abc
|
Use relative import
|
2017-01-23 21:27:49 +01:00 |
|
Ines Montani
|
0967eb07be
|
Add regression test for #768
|
2017-01-23 21:25:46 +01:00 |
|