
635 Commits

Author SHA1 Message Date
Matthew Honnibal
70a43858e1 Fix flakey test 2017-04-24 00:06:30 +02:00
Matthew Honnibal
3973af2d15 Make training test less flakey 2017-04-23 22:59:34 +02:00
ines
42305bc519 Remove unnecessary test 2017-04-23 21:21:41 +02:00
ines
012ea594d1 Add file for misc tests 2017-04-23 21:06:51 +02:00
ines
83f66947dc Rename test_download to test_cli 2017-04-23 21:06:50 +02:00
Matthew Honnibal
874a3cbb07 Add test for Issue #955 2017-04-23 17:57:01 +02:00
Matthew Honnibal
5d8af40445 Add test for Issue #999 2017-04-23 17:06:30 +02:00
Matthew Honnibal
040751ad17 Remove xfail on Test #910 2017-04-23 16:28:55 +02:00
Ben Eyal
e90e8a3f10 Enable test 2017-04-20 02:25:24 +03:00
ines
2bd89e7ade Tidy up Hebrew tests and test for punctuation (see #995) 2017-04-19 19:28:03 +02:00
ines
13d30b6c01 xfail lemmatizer test that's causing problems (see #546) 2017-04-16 21:18:39 +02:00
ines
0084466a66 Remove unused utf8open util and replace os.path with ensure_path 2017-04-16 20:37:45 +02:00
Matthew Honnibal
1dca7eeb03 Add unicode declaration on new regression test 2017-04-07 18:09:23 +02:00
ines
887827fc6a Merge branch 'develop' 2017-04-07 17:36:23 +02:00
ines
444dd511c5 Fix xpassing URL test case 2017-04-07 17:36:05 +02:00
ines
bf0f15e762 Add / to tokenizer infixes (resolves #891) 2017-04-07 17:30:44 +02:00
ines
00b9011a49 Fix whitespace 2017-04-07 17:29:59 +02:00
Matthew Honnibal
0513c43bf0 Merge branch 'master' of https://github.com/explosion/spaCy 2017-04-07 17:07:10 +02:00
Matthew Honnibal
cc36c308f4 Fix noun_chunk rules around coordination
Closes #693.
2017-04-07 17:06:40 +02:00
Matthew Honnibal
ab846256cf Merge pull request #966 from recognai/master
Prepare Spanish language for training models, including configuration, rich-UD tag map and tests
2017-04-07 16:12:29 +02:00
Matthew Honnibal
83dca920d4 Rename test #913 -> #957, comment
Make test for #957 reference correct bug. Add comment.

Previous commit closes #957.
2017-04-07 15:54:25 +02:00
Matthew Honnibal
5887383fc0 Add test for Issue #913: Hang from bad regex 2017-04-07 15:47:27 +02:00
oeg
c693d40791 feature(model): Add support for creating the Spanish model, including rich tagset, configuration, and basic tests 2017-04-06 18:48:45 +02:00
Matthew Honnibal
cfff4e0f61 Improve test 2017-03-31 13:59:32 +02:00
Matthew Honnibal
e854f28304 Add test for Issue #758
Issue #758 occurs when no actions are available for a single token
doc after merging.
2017-03-31 13:26:25 +02:00
Matthew Honnibal
0fefdfcbda Merge pull request #935 from ericzhao28/master
Add option to use label=ent_type in doc.merge arguments (Bug fix for issue #862)
2017-03-30 02:51:24 +02:00
Eric Zhao
aafdf6ffb8 Add option to use label karg to determine ent_type in doc.merge 2017-03-28 23:35:03 -07:00
Matthew Honnibal
b94286de30 Fix regression test 2017-03-25 22:35:07 +01:00
Matthew Honnibal
4f400fa486 Prevent lemmatization of base nouns
Update lemmatizer's base-form check, for change in morphology class.
Closes #903.
2017-03-25 21:51:12 +01:00
Matthew Honnibal
4454c1b23f Block lemmatization of base-form adjectives
Fixes check that an adjective is a base form (as opposed to a
comparative or superlative), so that it's not lemmatized.
e.g. inner -!> inn. Closes #912.
2017-03-25 21:29:57 +01:00
Ines Montani
97cb4d5e3c Merge branch 'master' into master 2017-03-25 10:03:47 +01:00
Iddo Berger
da135bd823 add hebrew tokenizer 2017-03-24 18:27:44 +03:00
Matthew Honnibal
f40fbc3710 Add test for Issue #910: Resuming entity training 2017-03-23 23:38:57 +01:00
ines
f830213c4c Remove compatibility check test
Will only cause problems when incrementing version and not updating
table. Also depends on external URL, which is bad.
2017-03-20 13:20:26 +01:00
Ines Montani
b6ee241e26 Fix print statements 2017-03-20 11:46:37 +01:00
ines
fe0ff00fe1 Fix spacing 2017-03-19 11:55:37 +01:00
ines
5712da6095 Add regression test for #891 2017-03-19 11:48:01 +01:00
ines
aefb898e37 Add title-case version of morph rules (resolves #686) 2017-03-18 17:27:11 +01:00
ines
64ec17abc1 Pass xpassing tests and add xfails for failures 2017-03-18 17:20:46 +01:00
ines
d0b85faf69 Pass regression test for #401 (resolves #401)
Fixed in new English models.
2017-03-18 17:06:49 +01:00
ines
be9daefbdd Remove actual model downloading from tests 2017-03-18 17:01:10 +01:00
Matthew Honnibal
de0e6385b4 Merge branch 'master' of https://github.com/explosion/spaCy 2017-03-18 16:17:28 +01:00
Matthew Honnibal
fe442cac53 Fix #717: Set correct lemma for contracted verbs 2017-03-18 16:16:10 +01:00
ines
ad934a9abd Add regression test for #693 2017-03-18 16:12:30 +01:00
ines
f57c616830 Add regression test for #704 and test new model (resolves #704)
(using new English model)
2017-03-18 16:04:14 +01:00
Matthew Honnibal
413138de79 Fix #719: Lemmatizer can no longer output empty string 2017-03-18 16:02:06 +01:00
ines
ab1451f997 Don't mark compatibility test as slow 2017-03-18 15:17:39 +01:00
ines
ec3e810662 Add directory cli and set up command line interface 2017-03-18 15:14:48 +01:00
Matthew Honnibal
6420f86f02 Merge changes to __init__.py 2017-03-17 19:51:45 +01:00
ines
0e533ad0cc Mark compatibility table test as slow (temporary)
Prevent Travis from running the test until models repo is published
2017-03-17 13:11:36 +01:00
Matthew Honnibal
a630726b13 Fix typo in tests 2017-03-16 20:50:36 -05:00
Matthew Honnibal
f98b30583f Fix tests 2017-03-16 19:48:00 -05:00
Matthew Honnibal
db51abf685 Fix tests 2017-03-16 18:53:47 -05:00
Matthew Honnibal
fea9fe08af Merge pull request #866 from juanmirocks/master
Fix lemmatization of OOV words
2017-03-16 23:37:36 +01:00
Matthew Honnibal
28bb546939 Merge pull request #883 from ericzhao28/master
Add `lower_` and `upper_` properties to `Span` class
2017-03-16 23:35:47 +01:00
Matthew Honnibal
8843b84bd1 Merge remote-tracking branch 'origin/develop-downloads' 2017-03-16 12:00:42 -05:00
ines
4cfc8ffbd2 Reformat pickle tests 2017-03-15 17:39:54 +01:00
ines
2a0fcf1354 Add tests for new download module 2017-03-15 17:39:43 +01:00
Matthew Honnibal
4cab8ac136 Update morph exceptions test 2017-03-15 09:31:34 -05:00
ines
42ba740dde Revert "Merge branch 'debug'"
This reverts commit 89b79d1178, reversing
changes made to 02bdf490a1.
2017-03-13 20:11:52 +01:00
ines
4c5f51e49e Update regression test 2017-03-13 15:16:11 +01:00
ines
02bdf490a1 Remove regression test to see if it caused pytest Travis error 2017-03-13 13:00:22 +01:00
ines
17018750ac Add regression test for #717 2017-03-13 12:58:22 +01:00
ines
2883ebfca2 Remove print statement 2017-03-13 12:30:42 +01:00
ines
98c13d8aa9 Add regression test for #401 2017-03-13 12:28:41 +01:00
ines
444d665f9d Add regression test for #686 2017-03-13 12:23:35 +01:00
ines
46b17e5b51 Add regression test for #719 2017-03-13 12:17:35 +01:00
ines
c8ae682ff9 Add regression test for #636 2017-03-13 12:08:31 +01:00
ines
337f9601f2 Add missing unicode declaration 2017-03-13 12:08:19 +01:00
ines
d70386ec6e Update docstring in #886 regression test 2017-03-13 12:00:38 +01:00
ines
51ba3ef0a8 Add regression test for #886 2017-03-13 11:44:58 +01:00
ines
1da29a7146 Use new Lemmatizer data and remove file import
Since there's currently only an English lemmatizer, the global
Lemmatizer imports from spacy.en. This is not ideal and still needs to be
fixed.
2017-03-12 13:58:22 +01:00
ines
c89e30d1a3 Add test for English time exceptions ("1a.m." etc.) 2017-03-12 13:58:22 +01:00
ines
66c1f194f9 Use consistent unicode declarations 2017-03-12 13:07:28 +01:00
Em
9c809efc25 Removed mapStr 2017-03-11 16:23:26 -08:00
Matthew Honnibal
ea2592879f Merge branch 'master' of https://github.com/explosion/spaCy 2017-03-11 11:13:37 -06:00
Em
426d17167f Added string manipulation for spans 2017-03-10 16:50:02 -08:00
ines
10e29189ac Adjust URL testcases and xfail problems (instead of comment) 2017-03-10 14:22:50 +01:00
Matthew Honnibal
ea53647362 Merge branch 'develop' 2017-03-10 02:49:39 -06:00
Dan Rapp
123d3f2d38 Fix error in test case parameterization 2017-03-09 12:18:21 -07:00
Dan Rapp
b9307dfcd7 Merge branch 'master' into rappdw/tokenizer_exceptions_url_fix 2017-03-09 11:42:14 -07:00
Dan Rapp
3b1df3808d Issue #840 - URL pattern too broad 2017-03-09 11:39:39 -07:00
Matthew Honnibal
5b0b968d13 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-03-08 15:03:10 +01:00
Matthew Honnibal
0ac3d27689 Fix handling of trailing whitespace
Fix off-by-one error that meant trailing spaces were being dropped.
Closes #792
2017-03-08 15:01:40 +01:00
ines
c2e3e651b8 Re-add regression test for #859 2017-03-08 14:36:09 +01:00
Matthew Honnibal
16670d3251 Xfail the vocab pickling for now 2017-03-07 21:43:28 +01:00
Matthew Honnibal
a89c3500f6 Fixes to hacky vocab pickling 2017-03-07 20:58:55 +01:00
Matthew Honnibal
3edb8ae207 Whitespace 2017-03-07 17:16:26 +01:00
Matthew Honnibal
5de7e712b7 Add support for pickling StringStore. 2017-03-07 17:15:18 +01:00
Matthew Honnibal
4e75e74247 Update regression test for variable-length pattern problem in the matcher. 2017-03-07 16:08:32 +01:00
Matthew Honnibal
6d67213b80 Add test for 850: Matcher fails on zero-or-more. 2017-03-07 15:55:28 +01:00
Aniruddha Adhikary
696215a3fb add tests for Bengali 2017-03-05 11:25:12 +06:00
ines
8dff040032 Revert "Add regression test for #859"
This reverts commit c4f16c66d1.
2017-03-01 21:56:20 +01:00
Juan Miguel Cejuela
a8cfde46d3 #781 Fix test — colocalizes is lemmatized to colocaliz and colicalize 2017-03-01 21:43:08 +01:00
Juan Miguel Cejuela
a471114eb2 #781 add regression test, failing previous bug fix 2017-03-01 21:30:51 +01:00
ines
c4f16c66d1 Add regression test for #859 2017-03-01 16:07:27 +01:00
Matthew Honnibal
34bcc8706d Merge branch 'french-tokenizer-exceptions' 2017-02-27 11:21:21 +01:00
Matthew Honnibal
0aaa546435 Fix test after updating the French tokenizer stuff 2017-02-27 11:20:47 +01:00
ines
376c5813a7 Remove print statements from test 2017-02-24 18:26:32 +01:00
ines
7c1260e98c Add regression test 2017-02-24 18:22:49 +01:00
ines
51eb190ef4 Remove print statements from test 2017-02-24 17:41:12 +01:00
Matthew Honnibal
db5ada3995 Merge branch 'master' of https://github.com/explosion/spaCy 2017-02-24 14:28:12 +01:00
Matthew Honnibal
8f94897d07 Add 1 operator to matcher, and make sure open patterns are closed at end of document. Closes Issue #766 2017-02-24 14:27:02 +01:00
ines
67991b6e5f Add more test cases to #775 regression test to cover #847 2017-02-18 14:10:44 +01:00
ines
44de3c7642 Reformat test and use text_file fixture 2017-02-16 23:49:19 +01:00
ines
3dd22e9c88 Mark vectors test as xfail (temporary) 2017-02-16 23:28:51 +01:00
ines
85d249d451 Revert "Revert "Merge pull request #836 from raphael0202/load_vectors (closes #834)""
This reverts commit ea05f78660.
2017-02-16 23:26:25 +01:00
ines
ea05f78660 Revert "Merge pull request #836 from raphael0202/load_vectors (closes #834)"
This reverts commit 7d8c9eee7f, reversing
changes made to f6b69babcc.
2017-02-16 15:27:12 +01:00
Raphaël Bournhonesque
06a71d22df Fix test failure by using unicode literals 2017-02-16 14:48:00 +01:00
Raphaël Bournhonesque
3ba109622c Add regression test with non ' ' space character as token 2017-02-16 12:23:27 +01:00
ines
21f09d10d7 Revert "Revert "Merge pull request #818 from raphael0202/tokenizer_exceptions""
This reverts commit f02a2f9322.
2017-02-10 13:17:05 +01:00
ines
f02a2f9322 Revert "Merge pull request #818 from raphael0202/tokenizer_exceptions"
This reverts commit b95afdf39c, reversing
changes made to b0ccf32378.
2017-02-09 17:07:21 +01:00
Raphaël Bournhonesque
309da78bf0 Merge branch 'master' into tokenizer_exceptions 2017-02-09 16:32:12 +01:00
Raphaël Bournhonesque
4ce0bbc6b6 Update unit tests 2017-02-09 16:30:43 +01:00
ines
654fe447b1 Add Swedish tokenizer tests (see #807) 2017-02-05 11:47:07 +01:00
Michael Wallin
35100c8bdd [issue 805] Add regression test and the required fixture 2017-02-04 16:21:34 +02:00
Michael Wallin
1a1952afa5 [finnish] Add initial tests for tokenizer 2017-02-04 13:54:10 +02:00
Ines Montani
afc6365388 Update regression test for #801 to match current expected behaviour 2017-02-02 16:23:05 +01:00
Ines Montani
13a4ab37e0 Add regression test for #801 2017-02-02 15:33:52 +01:00
Raphaël Bournhonesque
85f951ca99 Add tokenizer exceptions for French 2017-02-02 08:36:16 +01:00
Ines Montani
e4875834fe Fix formatting 2017-01-31 15:19:33 +01:00
Ines Montani
c304834e45 Add missing import 2017-01-31 15:18:30 +01:00
Ines Montani
e6465b9ca3 Parametrize test cases and mark as xfail 2017-01-31 15:14:42 +01:00
latkins
e4c84321a5 Added regression test for Issue #792. 2017-01-31 13:47:42 +00:00
Ines Montani
19501f3340 Add regression test for #775 2017-01-25 13:16:52 +01:00
Raphaël Bournhonesque
1be9c0e724 Add fr tokenization unit tests 2017-01-24 10:57:37 +01:00
Ines Montani
0967eb07be Add regression test for #768 2017-01-23 21:25:46 +01:00
Ines Montani
5f6f48e734 Add regression test for #759 2017-01-20 15:11:48 +01:00
Ines Montani
d704cfa60d Fix typo 2017-01-16 21:30:33 +01:00
Matthew Honnibal
2c60d0cb1e Test #743: Tokens unhashable. 2017-01-16 13:27:26 +01:00
Ines Montani
50878ef598 Exclude "were" and "Were" from tokenizer exceptions and add regression test (resolves #744) 2017-01-16 13:10:38 +01:00
Ines Montani
e053c7693b Fix formatting 2017-01-16 13:09:52 +01:00
Ines Montani
116c675c3c Merge pull request #742 from oroszgy/hu_tokenizer_fix
Improved Hungarian tokenizer
2017-01-14 23:52:44 +01:00
Gyorgy Orosz
92345b6a41 Further numeric test. 2017-01-14 22:44:19 +01:00
Gyorgy Orosz
b4df202bfa Better error handling 2017-01-14 22:24:58 +01:00
Gyorgy Orosz
b03a46792c Better error handling 2017-01-14 22:09:29 +01:00
Ines Montani
332ce2d758 Update README.md 2017-01-14 21:12:11 +01:00
Gyorgy Orosz
9505c6a72b Passing all old tests. 2017-01-14 20:39:21 +01:00
Gyorgy Orosz
63037e79af Fixed hyphen handling in the Hungarian tokenizer. 2017-01-14 16:30:11 +01:00
Gyorgy Orosz
f77c0284d6 Maintaining compatibility with other spacy tokenizers. 2017-01-14 16:19:15 +01:00
Gyorgy Orosz
1be5da1ac6 Fixed Hungarian tokenizer for numbers 2017-01-14 15:51:59 +01:00
Ines Montani
a89e269a5a Fix test formatting and consistency 2017-01-14 13:41:19 +01:00
Ines Montani
3424e3a7e5 Update README.md 2017-01-13 15:54:54 +01:00
Ines Montani
49186b34a1 Mark lemmatizer tests as models since they use installed data 2017-01-13 15:12:07 +01:00
Ines Montani
138deb80a1 Modernise vector tests, use add_vecs_to_vocab and don't depend on models 2017-01-13 15:12:07 +01:00
Ines Montani
96f0caa28a Fix test name for consistency 2017-01-13 15:12:07 +01:00
Ines Montani
dc2bb1259f Add util function to add vectors to vocab 2017-01-13 15:12:07 +01:00
Ines Montani
db9b25663d Reformat add_docs_equal and add docstring 2017-01-13 15:12:07 +01:00
Ines Montani
62ce0a0073 Add README.md to tests to explain organisation and conventions 2017-01-13 15:11:18 +01:00
Ines Montani
38d60f6b90 Modernise serializer I/O tests and don't depend on models where possible 2017-01-13 02:24:56 +01:00