Commit Graph

319 Commits

Author SHA1 Message Date
Roman Domrachev
505c6a2f2f Completely clean up tokenizer cache
Tokenizer cache can have keys other than strings

This modification may slow down the tokenizer and needs to be measured
2017-11-15 17:55:48 +03:00
Roman Domrachev
3e21680814 Use safer method to get string without hit 2017-11-14 22:58:46 +03:00
Roman Domrachev
4e378dc4a4 Remove all obsolete code and test only initial problem 2017-11-14 20:45:04 +03:00
Roman
47ce2347b0 Create test that fails when actual cleanup caused 2017-11-14 20:28:13 +03:00
Roman Domrachev
3d247d2bb8 Get back previous testcase 2017-11-14 18:01:37 +03:00
Roman Domrachev
a2745b0e84 StringStore now actually cleaned
Do not lose docs in ref tracking
2017-11-14 17:45:50 +03:00
Roman Domrachev
ee60a52ee7 Fix test imports and last batch cleanup 2017-11-11 11:32:16 +03:00
Roman Domrachev
3c600adf23 Try to fix StringStore clean up (see #1506) 2017-11-11 03:11:27 +03:00
ines
ee97fd3cb4 Add regression test for #1547 2017-11-11 00:14:03 +01:00
ines
2df27db671 Add unicode declaration 2017-11-11 00:13:56 +01:00
Matthew Honnibal
a5ea0fdf5a Fix #1518: vocab.vectors.resize() didn't work 2017-11-08 22:18:37 +01:00
Matthew Honnibal
2f7e9f390d Make test less flakey 2017-11-06 17:34:50 +01:00
Matthew Honnibal
407b08017e Make test less flakey 2017-11-06 17:31:40 +01:00
Matthew Honnibal
102f797933 Fix lemma ordering in test 2017-11-06 17:02:17 +01:00
ines
5e7d98f72a Remove test for #1491 2017-11-03 22:10:57 +01:00
ines
718f1c50fb Add regression test for #1491 2017-11-03 21:11:20 +01:00
ines
eef930c73e Assert instead of print 2017-11-03 18:50:57 +01:00
ines
f0986df94b Add test for #1488 (passes on v2.0.0a18?) 2017-11-03 14:44:36 +01:00
Matthew Honnibal
094512fd47 Fix model-mark on regression test. 2017-10-25 14:44:00 +02:00
Matthew Honnibal
908809d488 Update tests 2017-10-24 17:05:15 +02:00
Matthew Honnibal
63f0bde749 Add test for #1250: Tokenizer cache clobbered special-case attrs 2017-10-24 16:07:18 +02:00
Matthew Honnibal
4bea65a1a8 Fix Issue #1450: Off-by-1 in * and ? matches
Patterns that end in variable-length operators e.g. * and ? now end on
the correct token. Previously, they were off by 1: the next token was
pulled into the match, even if that's where the pattern failed.
2017-10-24 14:26:27 +02:00
Matthew Honnibal
391d5ef0d1 Normalize imports in regression test 2017-10-24 14:25:49 +02:00
Matthew Honnibal
b66b8f028b Fix #1375 -- out-of-bounds on token.nbor() 2017-10-24 12:10:39 +02:00
Matthew Honnibal
a68d89a4f3 Add failing test for bug #1375 -- no out-of-bounds error for token.nbor() 2017-10-24 12:05:25 +02:00
Matthew Honnibal
490ad3eaf0 Check that empty strings are handled. Closes #1242 2017-10-21 00:52:14 +02:00
Matthew Honnibal
d8391b1c4d Fix #1434: Matcher failed on ending ? if no token 2017-10-20 16:49:36 +02:00
Matthew Honnibal
f111b228e0 Fix re-parsing of previously parsed text
If a Doc object had been previously parsed, it was possible for
invalid parses to be added. There were two problems:

1) The parse was only being partially erased
2) The RightArc action was able to create a 1-cycle.

This patch fixes both errors, and avoids resetting the parse if one is
present. In theory this might allow a better parse to be predicted by
running the parser twice.

Closes #1253.
2017-10-20 16:27:36 +02:00
ines
3516aa0cea Port over changes from #1389 2017-10-14 13:32:55 +02:00
ines
15fe0fd82d Fix tests 2017-10-11 13:27:18 +02:00
Matthew Honnibal
c6cd81f192 Wrap try/except around model saving 2017-10-05 08:14:24 -05:00
Matthew Honnibal
fd4baff475 Update tests 2017-10-05 08:12:27 -05:00
Matthew Honnibal
40edb65ee7 Make test work for Python 2.7 2017-10-04 16:36:50 +02:00
Matthew Honnibal
db05d4d582 Add test for #1380. Passes without fix? 2017-10-04 14:56:31 +02:00
Matthew Honnibal
456bb8a74c Unxfail and close #1305 2017-09-06 19:14:17 +02:00
Matthew Honnibal
99e44fbdbb Update regression test 2017-09-06 19:13:51 +02:00
Matthew Honnibal
497a9308a8 Xfail new lemmatizer test 2017-09-06 18:41:22 +02:00
Matthew Honnibal
5384fff5ce Add test for 1305: Incorrect lemmatization of VBZ for English 2017-09-06 18:40:18 +02:00
Matthew Honnibal
d55d6e1cfa Fix comparison of Token from different docs. Closes #1257 2017-08-19 16:39:32 +02:00
ines
51d7414e94 Make sure sents are a list 2017-06-05 12:30:13 +02:00
ines
a0f4592f0a Update tests 2017-06-05 02:26:13 +02:00
ines
3e105bcd36 Update tests 2017-06-05 02:09:27 +02:00
Matthew Honnibal
bb98d45a63 Fix tests 2017-06-04 16:00:44 -05:00
Matthew Honnibal
55d0621532 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-06-04 15:53:25 -05:00
Matthew Honnibal
5b9f116aca Update tests 2017-06-04 15:53:17 -05:00
ines
8a29308d0b Remove unused imports 2017-06-04 22:39:29 +02:00
ines
96867a24ae Fix typo 2017-06-04 22:36:40 +02:00
ines
20a7003c0d Update model fixtures and reorganise tests 2017-05-29 22:14:31 +02:00
Matthew Honnibal
fe11564b8e Finish stringstore change. Also xfail vectors tests 2017-05-28 15:10:22 +02:00
ines
fb0ff0272f xfail neural parser tests for now and remove test for deprecated method 2017-05-23 12:40:37 +02:00
Matthew Honnibal
5418bcf5d7 Resolve conflict on test 2017-05-23 04:37:16 -05:00
ines
e6acd3bbf2 Fix matcher tests and matcher docs 2017-05-23 11:36:02 +02:00
Matthew Honnibal
3959d778ac Revert "Revert "WIP on improving parser efficiency""
This reverts commit 532afef4a8.
2017-05-23 03:06:53 -05:00
Matthew Honnibal
532afef4a8 Revert "WIP on improving parser efficiency"
This reverts commit bdaac7ab44.
2017-05-23 03:05:25 -05:00
Matthew Honnibal
bdaac7ab44 WIP on improving parser efficiency 2017-05-23 02:59:31 -05:00
ines
b3c7ee0148 Fix tests and use the new Matcher API 2017-05-22 13:54:20 +02:00
Matthew Honnibal
8cf097ca88 Redesign training to integrate NN components
* Obsolete .parser, .entity etc names in favour of .pipeline
* Components no longer create models on initialization
* Models created by loading method (from_disk(), from_bytes() etc), or
    .begin_training()
* Add .predict(), .set_annotations() methods in components
* Pass state through pipeline, to allow components to share information
    more flexibly.
2017-05-16 16:17:30 +02:00
ines
3c0f85de8e Remove imports in /lang/__init__.py 2017-05-08 23:58:07 +02:00
ines
be5541bd16 Fix import and tokenizer exceptions 2017-05-08 16:20:14 +02:00
Matthew Honnibal
24c4c51f13 Try to make test999 less flakey 2017-04-26 18:42:06 +02:00
Matthew Honnibal
c4be9c36fe Fix unicode header in tests 2017-04-24 10:09:01 +02:00
Matthew Honnibal
65f10b53e5 Fix test 2017-04-24 00:25:55 +02:00
Matthew Honnibal
70a43858e1 Fix flakey test 2017-04-24 00:06:30 +02:00
Matthew Honnibal
3973af2d15 Make training test less flakey 2017-04-23 22:59:34 +02:00
Matthew Honnibal
874a3cbb07 Add test for Issue #955 2017-04-23 17:57:01 +02:00
Matthew Honnibal
5d8af40445 Add test for Issue #999 2017-04-23 17:06:30 +02:00
Matthew Honnibal
040751ad17 Remove xfail on Test #910 2017-04-23 16:28:55 +02:00
Matthew Honnibal
1dca7eeb03 Add unicode declaration on new regression test 2017-04-07 18:09:23 +02:00
ines
887827fc6a Merge branch 'develop' 2017-04-07 17:36:23 +02:00
ines
bf0f15e762 Add / to tokenizer infixes (resolves #891) 2017-04-07 17:30:44 +02:00
ines
00b9011a49 Fix whitespace 2017-04-07 17:29:59 +02:00
Matthew Honnibal
cc36c308f4 Fix noun_chunk rules around coordination
Closes #693.
2017-04-07 17:06:40 +02:00
Matthew Honnibal
83dca920d4 Rename test #913 -> #957, comment
Make test for #957 reference correct bug. Add comment.

Previous commit closes #957.
2017-04-07 15:54:25 +02:00
Matthew Honnibal
5887383fc0 Add test for Issue #913: Hang from bad regex 2017-04-07 15:47:27 +02:00
Matthew Honnibal
cfff4e0f61 Improve test 2017-03-31 13:59:32 +02:00
Matthew Honnibal
e854f28304 Add test for Issue #758
Issue #758 occurs when no actions are available for a single token
doc after merging.
2017-03-31 13:26:25 +02:00
Matthew Honnibal
b94286de30 Fix regression test 2017-03-25 22:35:07 +01:00
Matthew Honnibal
4f400fa486 Prevent lemmatization of base nouns
Update lemmatizer's base-form check, for change in morphology class.
Closes #903.
2017-03-25 21:51:12 +01:00
Matthew Honnibal
4454c1b23f Block lemmatization of base-form adjectives
Fixes check that an adjective is a base form (as opposed to a
comparative or superlative), so that it's not lemmatized.
e.g. inner -!> inn. Closes #912.
2017-03-25 21:29:57 +01:00
Matthew Honnibal
f40fbc3710 Add test for Issue #910: Resuming entity training 2017-03-23 23:38:57 +01:00
ines
fe0ff00fe1 Fix spacing 2017-03-19 11:55:37 +01:00
ines
5712da6095 Add regression test for #891 2017-03-19 11:48:01 +01:00
ines
aefb898e37 Add title-case version of morph rules (resolves #686) 2017-03-18 17:27:11 +01:00
ines
d0b85faf69 Pass regression test for #401 (resolves #401)
Fixed in new English models.
2017-03-18 17:06:49 +01:00
Matthew Honnibal
de0e6385b4 Merge branch 'master' of https://github.com/explosion/spaCy 2017-03-18 16:17:28 +01:00
Matthew Honnibal
fe442cac53 Fix #717: Set correct lemma for contracted verbs 2017-03-18 16:16:10 +01:00
ines
ad934a9abd Add regression test for #693 2017-03-18 16:12:30 +01:00
ines
f57c616830 Add regression test for #704 and test new model (resolves #704)
(using new English model)
2017-03-18 16:04:14 +01:00
Matthew Honnibal
413138de79 Fix #719: Lemmatizer can no longer output empty string 2017-03-18 16:02:06 +01:00
Matthew Honnibal
db51abf685 Fix tests 2017-03-16 18:53:47 -05:00
Matthew Honnibal
fea9fe08af Merge pull request #866 from juanmirocks/master
Fix lemmatization of OOV words
2017-03-16 23:37:36 +01:00
ines
42ba740dde Revert "Merge branch 'debug'"
This reverts commit 89b79d1178, reversing
changes made to 02bdf490a1.
2017-03-13 20:11:52 +01:00
ines
4c5f51e49e Update regression test 2017-03-13 15:16:11 +01:00
ines
02bdf490a1 Remove regression test to see if it caused pytest Travis error 2017-03-13 13:00:22 +01:00
ines
17018750ac Add regression test for #717 2017-03-13 12:58:22 +01:00
ines
2883ebfca2 Remove print statement 2017-03-13 12:30:42 +01:00
ines
98c13d8aa9 Add regression test for #401 2017-03-13 12:28:41 +01:00
ines
444d665f9d Add regression test for #686 2017-03-13 12:23:35 +01:00
ines
46b17e5b51 Add regression test for #719 2017-03-13 12:17:35 +01:00
ines
c8ae682ff9 Add regression test for #636 2017-03-13 12:08:31 +01:00
ines
337f9601f2 Add missing unicode declaration 2017-03-13 12:08:19 +01:00
ines
d70386ec6e Update docstring in #886 regression test 2017-03-13 12:00:38 +01:00
ines
51ba3ef0a8 Add regression test for #886 2017-03-13 11:44:58 +01:00
ines
66c1f194f9 Use consistent unicode declarations 2017-03-12 13:07:28 +01:00
Matthew Honnibal
5b0b968d13 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-03-08 15:03:10 +01:00
Matthew Honnibal
0ac3d27689 Fix handling of trailing whitespace
Fix off-by-one error that meant trailing spaces were being dropped.
Closes #792
2017-03-08 15:01:40 +01:00
ines
c2e3e651b8 Re-add regression test for #859 2017-03-08 14:36:09 +01:00
Matthew Honnibal
3edb8ae207 Whitespace 2017-03-07 17:16:26 +01:00
Matthew Honnibal
4e75e74247 Update regression test for variable-length pattern problem in the matcher. 2017-03-07 16:08:32 +01:00
Matthew Honnibal
6d67213b80 Add test for 850: Matcher fails on zero-or-more. 2017-03-07 15:55:28 +01:00
ines
8dff040032 Revert "Add regression test for #859"
This reverts commit c4f16c66d1.
2017-03-01 21:56:20 +01:00
Juan Miguel Cejuela
a8cfde46d3 #781 Fix test: colocalizes is lemmatized to colocaliz and colocalize 2017-03-01 21:43:08 +01:00
Juan Miguel Cejuela
a471114eb2 #781 add regression test, failing previous bug fix 2017-03-01 21:30:51 +01:00
ines
c4f16c66d1 Add regression test for #859 2017-03-01 16:07:27 +01:00
Matthew Honnibal
0aaa546435 Fix test after updating the French tokenizer stuff 2017-02-27 11:20:47 +01:00
ines
7c1260e98c Add regression test 2017-02-24 18:22:49 +01:00
ines
67991b6e5f Add more test cases to #775 regression test to cover #847 2017-02-18 14:10:44 +01:00
ines
44de3c7642 Reformat test and use text_file fixture 2017-02-16 23:49:19 +01:00
ines
3dd22e9c88 Mark vectors test as xfail (temporary) 2017-02-16 23:28:51 +01:00
ines
85d249d451 Revert "Revert "Merge pull request #836 from raphael0202/load_vectors (closes #834)""
This reverts commit ea05f78660.
2017-02-16 23:26:25 +01:00
ines
ea05f78660 Revert "Merge pull request #836 from raphael0202/load_vectors (closes #834)"
This reverts commit 7d8c9eee7f, reversing
changes made to f6b69babcc.
2017-02-16 15:27:12 +01:00
Raphaël Bournhonesque
06a71d22df Fix test failure by using unicode literals 2017-02-16 14:48:00 +01:00
Raphaël Bournhonesque
3ba109622c Add regression test with non ' ' space character as token 2017-02-16 12:23:27 +01:00
Michael Wallin
35100c8bdd [issue 805] Add regression test and the required fixture 2017-02-04 16:21:34 +02:00
Ines Montani
afc6365388 Update regression test for #801 to match current expected behaviour 2017-02-02 16:23:05 +01:00
Ines Montani
13a4ab37e0 Add regression test for #801 2017-02-02 15:33:52 +01:00
Ines Montani
e4875834fe Fix formatting 2017-01-31 15:19:33 +01:00
Ines Montani
c304834e45 Add missing import 2017-01-31 15:18:30 +01:00
Ines Montani
e6465b9ca3 Parametrize test cases and mark as xfail 2017-01-31 15:14:42 +01:00
latkins
e4c84321a5 Added regression test for Issue #792. 2017-01-31 13:47:42 +00:00
Ines Montani
19501f3340 Add regression test for #775 2017-01-25 13:16:52 +01:00
Ines Montani
0967eb07be Add regression test for #768 2017-01-23 21:25:46 +01:00
Ines Montani
5f6f48e734 Add regression test for #759 2017-01-20 15:11:48 +01:00
Matthew Honnibal
2c60d0cb1e Test #743: Tokens unhashable. 2017-01-16 13:27:26 +01:00
Ines Montani
50878ef598 Exclude "were" and "Were" from tokenizer exceptions and add regression test (resolves #744) 2017-01-16 13:10:38 +01:00
Ines Montani
e053c7693b Fix formatting 2017-01-16 13:09:52 +01:00
Ines Montani
e9e99a5670 Add regression test for #740 2017-01-12 22:57:38 +01:00
Ines Montani
6935d55409 Fix formatting 2017-01-12 22:56:20 +01:00
Ines Montani
9b4bea1df9 Tidy up and rename regression tests and remove unnecessary imports 2017-01-12 22:00:37 +01:00
Ines Montani
27482ebed8 Move matcher tests for #188 and #242 to regression tests
Modernise tests and remove unnecessary imports
2017-01-12 17:33:57 +01:00
Ines Montani
0a4dc632bd Update test to not create redundant Doc object 2017-01-12 17:33:18 +01:00
Ines Montani
51ef75f629 Fix regression test for #615 and remove unnecessary imports 2017-01-12 16:51:12 +01:00
Ines Montani
c3d4516fc2 Move test for #361 to regression tests 2017-01-12 16:51:12 +01:00
Ines Montani
359f73a96b Move test for #54 to regression tests 2017-01-12 12:25:51 +01:00
Ines Montani
c5914c6fe5 Fix and pass regression test for #736 2017-01-12 11:48:56 +01:00
Ines Montani
ec7739b76e Add regression test for #736 2017-01-12 11:12:44 +01:00
Ines Montani
c9671329dc Move test for #309 to regression tests 2017-01-11 23:52:13 +01:00
Ines Montani
3e6e1f0251 Tidy up regression tests 2017-01-10 19:24:10 +01:00
Ines Montani
c6e5a5349d Move regression test for #360 into own file 2017-01-04 00:49:31 +01:00
Ines Montani
59059fed27 Move regression test for #351 to own file 2017-01-04 00:47:11 +01:00
Matthew Honnibal
bdcecb3c96 Add import in regression test 2016-12-18 16:51:31 +01:00
Ines Montani
77cf2fb0f6 Remove unnecessary argument in test 2016-12-18 14:06:27 +01:00
Ines Montani
121c310566 Remove trailing whitespace 2016-12-18 14:06:27 +01:00
Matthew Honnibal
0595cc0635 Change test595 to mock data, instead of requiring model. 2016-12-18 13:28:51 +01:00
Matthew Honnibal
e01c1875ee Work on test for #615 2016-11-23 23:48:41 +01:00
Matthew Honnibal
e86f440ca6 Fix test for issue 617 2016-11-10 22:48:10 +01:00
Matthew Honnibal
a2c7de8329 spacy/tests/regression/test_issue617.py
Test Issue #617
2016-11-10 22:46:23 +01:00
Matthew Honnibal
3ea15b257f Fix test for 605 2016-11-06 11:59:26 +01:00
Matthew Honnibal
efe7790439 Test #590: Order dependence in Matcher rules. 2016-11-06 11:21:36 +01:00
Matthew Honnibal
75805397dd Test Issue #605 2016-11-06 10:42:32 +01:00
Matthew Honnibal
4a8a2b6001 Test #595 -- Bug in lemmatization of base forms. 2016-11-04 00:27:32 +01:00
Matthew Honnibal
72b9bd57ec Test Issue #588: Matcher accepts invalid, empty patterns. 2016-11-03 00:09:35 +01:00
Matthew Honnibal
3d6c79e595 Test Issue #599: .is_tagged and .is_parsed attributes not reflected after deserialization for empty documents. 2016-11-02 23:40:11 +01:00
Matthew Honnibal
125c910a8d Test Issue #600 2016-11-02 23:24:13 +01:00
Matthew Honnibal
d8db648ebf Add __init__.py file for regression tests 2016-11-01 13:45:06 +01:00
Matthew Honnibal
6977a2b8cd Add test for Issue #589 2016-11-01 12:33:36 +01:00
Matthew Honnibal
7e5f63a595 Improve test slightly 2016-10-28 17:41:16 +02:00
Matthew Honnibal
782e4814f4 Test Issue #587: Matcher segfaults on particular input 2016-10-28 16:38:32 +02:00
Matthew Honnibal
afea6505f3 Test Issue 429: No valid actions for NER after matcher adds a new entity label. 2016-10-27 18:01:34 +02:00