Commit Graph

4341 Commits

Author SHA1 Message Date
Matthew Honnibal
26446aa728 Avoid loading all French exceptions on import
Move exceptions loading behind a get_tokenizer_exceptions() function
for French, instead of loading into the top-level namespace. This
cuts import times from 0.6s to 0.2s, at the expense of making the
French data a little different from the others (there's no top-level
TOKENIZER_EXCEPTIONS variable). The current solution feels somewhat
unsatisfying.
2017-02-25 11:55:00 +01:00
ines
7c1260e98c Add regression test 2017-02-24 18:22:49 +01:00
ines
0e2e331b58 Convert exceptions to Python list 2017-02-24 18:22:40 +01:00
ines
51eb190ef4 Remove print statements from test 2017-02-24 17:41:12 +01:00
ines
056b2466e3 Add book and tutorial 2017-02-24 17:39:27 +01:00
ines
52aebfa06f Fix path in gitignore 2017-02-24 17:39:02 +01:00
ines
67991b6e5f Add more test cases to #775 regression test to cover #847 2017-02-18 14:10:44 +01:00
ines
30ce2a6793 Exclude "shed" and "Shed" from tokenizer exceptions (see #847) 2017-02-18 14:10:44 +01:00
Ines Montani
9c04d97e22 Update CONTRIBUTING.md 2017-02-18 12:47:41 +01:00
Ines Montani
a3a3796ecd Update CONTRIBUTING.md 2017-02-18 12:42:35 +01:00
Ines Montani
936de72ffc Update CONTRIBUTING.md 2017-02-18 12:42:11 +01:00
Matthew Honnibal
f028f8ad28 Remove unfinished examples 2017-02-18 11:04:41 +01:00
Matthew Honnibal
c031c677cc Remove unused model_dir option
As noted in #845, the `model_dir` argument was not being used. I've removed it for now, although it would be good to have this option restored and working.
2017-02-18 10:38:22 +01:00
Ines Montani
724e51ed47 Update CONTRIBUTING.md 2017-02-17 14:07:48 +01:00
Ines Montani
de997c1a33 Merge pull request #842 from magnusburton/master
Added regular verb rules for Swedish
2017-02-17 11:18:20 +01:00
Magnus Burton
41fcfd06b8 Added regular verb rules for Swedish 2017-02-17 10:04:04 +01:00
ines
aa92d4e9b5 Fix unicode regex for Python 2 (see #834) 2017-02-16 23:49:54 +01:00
ines
44de3c7642 Reformat test and use text_file fixture 2017-02-16 23:49:19 +01:00
Ines Montani
49a102aff3 Merge pull request #841 from jondoughty/patch-1
Updated Token class documentation
2017-02-16 23:47:51 +01:00
Ines Montani
e7d89001bc Merge pull request #839 from nycmonkey/patch-1
Fix typo in IOB integer to letter map
2017-02-16 23:47:30 +01:00
ines
3dd22e9c88 Mark vectors test as xfail (temporary) 2017-02-16 23:28:51 +01:00
ines
85d249d451 Revert "Revert "Merge pull request #836 from raphael0202/load_vectors (closes #834)""
This reverts commit ea05f78660.
2017-02-16 23:26:25 +01:00
Matthew Honnibal
2f82d68430 Disable sdist setting for now while investigating server problem. 2017-02-16 23:12:22 +01:00
Matthew Honnibal
49cf28e4c6 Fix Travis.yml 2017-02-16 23:04:41 +01:00
Jon Doughty
12a8757343 Update token.jade 2017-02-16 10:55:33 -08:00
nycmonkey
8946a2a496 Fix typo in IOB integer to letter map
ent_iob value for an ent.iob_ value of 'B' should be 3, not B
2017-02-16 13:49:57 -05:00
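Note on 8946a2a496: a small sketch of an integer-to-letter IOB map. Only the pairing of 3 with 'B' is stated in the commit message; the other entries and the names IOB_CODES, IOB_LETTERS and iob_letter are assumptions for illustration and should be checked against the Token documentation.

```python
# Integer code -> IOB letter. Only 3 -> "B" is confirmed by the commit message;
# the remaining entries are assumed for illustration.
IOB_CODES = {0: "", 1: "I", 2: "O", 3: "B"}

# Reverse lookup: letter -> integer code.
IOB_LETTERS = {letter: code for code, letter in IOB_CODES.items()}


def iob_letter(code):
    """Return the IOB letter for an integer code."""
    return IOB_CODES[code]
```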
Matthew Honnibal
c744ce4b6d Fix bad change to cythonize.py script, re subprocess call 2017-02-16 19:01:25 +01:00
ines
ea05f78660 Revert "Merge pull request #836 from raphael0202/load_vectors (closes #834)"
This reverts commit 7d8c9eee7f, reversing
changes made to f6b69babcc.
2017-02-16 15:27:12 +01:00
Matthew Honnibal
0836cbe064 Pass shell to cythonize.py. See Issue #791 2017-02-17 01:06:06 +11:00
Matthew Honnibal
071d11cb35 Pass environment to Cythonize script. Closes #791 2017-02-17 01:04:16 +11:00
Ines Montani
7d8c9eee7f Merge pull request #836 from raphael0202/load_vectors (closes #834)
load_vectors should accept arbitrary space characters as word tokens
2017-02-16 14:52:40 +01:00
Raphaël Bournhonesque
06a71d22df Fix test failure by using unicode literals 2017-02-16 14:48:00 +01:00
Raphaël Bournhonesque
3ba109622c Add regression test with non ' ' space character as token 2017-02-16 12:23:27 +01:00
ines
f6b69babcc Fix years in footer 2017-02-16 12:14:35 +01:00
Ines Montani
4e673bfeea Merge pull request #833 from vaulttech/master (resolves #832)
Fixes example 3 of entity recognition (see issue #832)
2017-02-16 12:13:48 +01:00
Raphaël Bournhonesque
e17dc2db75 Remove useless import 2017-02-16 12:10:24 +01:00
Raphaël Bournhonesque
3fd2742649 load_vectors should accept arbitrary space characters as word tokens
Fix bug #834
2017-02-16 12:08:30 +01:00
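Note on 3fd2742649: a sketch of why a space-like character can get lost as a word token. It assumes a text format with one word followed by space-separated floats per line; the function parse_vector_line is hypothetical and is not the project's load_vectors code.

```python
def parse_vector_line(line):
    # Split on the literal ' ' separator only. A bare str.split() would treat
    # other whitespace characters (e.g. a no-break space) as separators and
    # silently drop a word that consists of such a character.
    pieces = line.rstrip("\n").split(" ")
    word = pieces[0]
    vector = [float(value) for value in pieces[1:]]
    return word, vector


# A line whose "word" is a no-break space (U+00A0).
example_line = "\u00a0 0.1 0.2\n"
example_word, example_vector = parse_vector_line(example_line)
```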
John Gamboa
e31894b800 Fixes example 3 of entity recognition (see issue #832) 2017-02-16 11:19:53 +01:00
Ines Montani
813989940e Merge pull request #821 from knub/patch-1
Fix error in pipeline loading documentation
2017-02-10 17:24:44 +01:00
ines
f08e180a47 Make groups non-capturing
Prevents hitting the 100 named groups limit in Python
2017-02-10 13:35:02 +01:00
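Note on f08e180a47: a sketch of the difference between capturing and non-capturing groups when many alternatives are joined into one pattern. The helper join_alternatives and the sample patterns are hypothetical. The group limit the message mentions applied to older Python versions, so this only shows the group counts rather than reproducing the error.

```python
import re


def join_alternatives(parts, capturing=False):
    # "(...)" creates a numbered group per alternative; "(?:...)" groups
    # without capturing, so it does not count towards any group limit.
    template = "(%s)" if capturing else "(?:%s)"
    return "|".join(template % part for part in parts)


parts = [r"\d+h", r"\d+km", r"\d+kg"]  # hypothetical sub-patterns

capturing_re = re.compile(join_alternatives(parts, capturing=True))
non_capturing_re = re.compile(join_alternatives(parts))

# Many alternatives still produce zero groups when none of them capture.
many_re = re.compile(join_alternatives([r"w%d" % i for i in range(150)]))
```

Both compiled patterns match the same strings; they differ only in how many groups the regex engine has to track.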
ines
fa3b8512da Use consistent imports and exports
Bundle everything in language_data to keep it consistent with other
languages and make TOKENIZER_EXCEPTIONS importable from there.
2017-02-10 13:34:09 +01:00
ines
21f09d10d7 Revert "Revert "Merge pull request #818 from raphael0202/tokenizer_exceptions""
This reverts commit f02a2f9322.
2017-02-10 13:17:05 +01:00
Stefan Bunk
2bf19d4735 Fix error in pipeline loading documentation
The cell for the `vocab` parameter is not displayed, making it seem as if the explanation belongs to the previous param.
2017-02-10 12:06:55 +01:00
ines
f02a2f9322 Revert "Merge pull request #818 from raphael0202/tokenizer_exceptions"
This reverts commit b95afdf39c, reversing
changes made to b0ccf32378.
2017-02-09 17:07:21 +01:00
Ines Montani
b95afdf39c Merge pull request #818 from raphael0202/tokenizer_exceptions
Add tokenizer exceptions for French
2017-02-09 16:41:21 +01:00
Raphaël Bournhonesque
309da78bf0 Merge branch 'master' into tokenizer_exceptions 2017-02-09 16:32:12 +01:00
Raphaël Bournhonesque
4ce0bbc6b6 Update unit tests 2017-02-09 16:30:43 +01:00
Raphaël Bournhonesque
5d706ab95d Merge tokenizer exceptions from PR #802 2017-02-09 16:30:28 +01:00
Ines Montani
b0ccf32378 Update CONTRIBUTING.md 2017-02-09 16:27:31 +01:00
ines
1b8719bf9a Adjust formatting and increment version 2017-02-08 21:33:22 +01:00