Ines Montani
139428c20f
Set unique vector names in tests
2019-09-16 15:16:54 +02:00
Ines Montani
655b434553
Merge branch 'master' into develop
2019-09-12 11:39:18 +02:00
tamuhey
71909cdf22
Fix iss4278 (#4279)
...
* fix: len(tuple) == 2
* (#4278) add fail test
* add contributor's agreement
2019-09-12 10:44:49 +02:00
Ines Montani
e82a8d0d7a
Merge branch 'master' into develop
2019-09-11 11:52:38 +02:00
Ines Montani
8f9f48b04c
Add GreekLemmatizer.lookup (resolves #4272)
2019-09-11 11:44:40 +02:00
Ines Montani
6279d74c65
Tidy up and auto-format
2019-09-11 11:38:22 +02:00
Matthew Honnibal
7b858ba606
Update from master
2019-09-10 20:14:08 +02:00
adrianeboyd
3780e2ff50
Flush tokenizer cache when necessary (#4258)
...
Flush tokenizer cache when affixes, token_match, or special cases are
modified.
Fixes #4238, same issue as in #1250.
2019-09-08 20:52:46 +02:00
Matthew Honnibal
1a65c5b7af
Update develop from master
2019-09-08 18:21:41 +02:00
Adriane Boyd
0f28418446
Add regression test for #1061 back to test suite
2019-09-04 20:42:24 +02:00
Ines Montani
419ae59c79
Make flaky test test_issue_1971_4 more explicit
2019-08-31 14:08:05 +02:00
svlandeg
7bec0ebbcb
failing unit test for Issue 4190
2019-08-28 14:16:34 +02:00
Matthew Honnibal
22250cf6b7
Make regression test less sensitive to tag-map stuff
2019-08-25 21:54:26 +02:00
Matthew Honnibal
bb911e5f4e
Fix #3830: 'subtok' label being added even if learn_tokens=False (#4188)
...
* Prevent subtok label if not learning tokens
The parser introduces the subtok label to mark tokens that should be
merged during post-processing. Previously this happened even if we did
not have the --learn-tokens flag set. This patch passes the config
through to the parser, to prevent the problem.
* Make merge_subtokens a parser post-process if learn_subtokens
* Fix train script
* Add test for 3830: subtok problem
* Fix handling of non-subtok in parser training
2019-08-23 17:54:00 +02:00
Sofie Van Landeghem
c417c380e3
Matcher ID fixes (#4179)
...
* allow phrasematcher to link one match to multiple original patterns
* small fix for defining ent_id in the matcher (anti-ghost prevention)
* cleanup
* formatting
2019-08-22 17:17:07 +02:00
Sofie Van Landeghem
de272f8b82
adding double match for optional operator at the end (#4166)
2019-08-21 22:46:56 +02:00
Sofie Van Landeghem
01c5980187
Serialize POS attribute when doc.is_tagged (#4092)
...
* fix and unit test for issue 3959
* additional unit test for manifestation of the same (resolved) bug
2019-08-21 21:59:30 +02:00
Sofie Van Landeghem
7539a4f3a8
use states[q] in while retry loop (#4162)
2019-08-21 21:58:04 +02:00
Ines Montani
f580302673
Tidy up and auto-format
2019-08-20 17:36:34 +02:00
Ines Montani
364aaf5bc2
Simplify test
2019-08-20 16:41:58 +02:00
Sofie Van Landeghem
68ee0384fd
Unit test for Issue 3879 (#4153)
...
* failing unit test for Issue #3879
* mark test as failing
2019-08-20 16:40:25 +02:00
Ines Montani
86cd7f0efd
Add regression test for #4120
2019-08-20 16:33:09 +02:00
Ines Montani
009280fbc5
Tidy up and auto-format
2019-08-18 15:09:16 +02:00
AJ Rader
2f3648700c
Correction of default lemmatizer lookup in English (Issue # 4104) (#4110)
...
* pytest file for issue4104 established
* edited default lookup english lemmatizer for spun; fixes issue 4102
* eliminated parameterization and sorted dictionary dependency in issue 4104 test
* added contributor agreement
2019-08-15 11:39:10 +02:00
Sofie Van Landeghem
963ea5e8d0
Update lemma and vector information after splitting a token (#4097)
...
* fixing vector and lemma attributes after retokenizer.split
* fixing unit test with mockup tensor
* xp instead of numpy
2019-08-08 15:09:44 +02:00
Sofie Van Landeghem
ad09b0d6f3
fetch norm from lex if necessary for matching (#4080)
2019-08-05 23:51:04 +02:00
adrianeboyd
925a852bb6
Improve NER per type scoring (#4052)
...
* Improve NER per type scoring
* include all gold labels in per type scoring, not only when recall > 0
* improve efficiency of per type scoring
* Create Scorer tests, initially with NER tests
* move regression test #3968 (per type NER scoring) to Scorer tests
* add new test for per type NER scoring with imperfect P/R/F and per
type P/R/F including a case where R == 0.0
2019-08-01 17:15:36 +02:00
Sofie Van Landeghem
f7d950de6d
ensure the lang of vocab and nlp stay consistent (#4057)
...
* ensure the language of vocab and nlp stay consistent across serialization
* equality with =
2019-08-01 17:13:01 +02:00
Sofie Van Landeghem
7de3b129ab
Resolve edge case when calling textcat.predict with empty doc (#4035)
...
* resolve edge case where no doc has tokens when calling textcat.predict
* more explicit value test
2019-07-30 14:58:01 +02:00
Sofie Van Landeghem
ba02957c80
Fix dependency copy for as_doc (#3969)
...
* failing unit test for issue 3962
* attempt to fix Issue #3962
* create artificial unit test example
* using length instead of self.length
* sp
* reformat with black
* find better ancestor within span and use generic 'dep'
* attach to span.root if there is no appropriate ancestor
* comment span text
* clean up ancestor code
* reconstruct dep tree to keep same number of sentences
2019-07-23 18:28:54 +02:00
Ines Montani
a32b033b8c
Add regression test for #4002
...
Test that the PhraseMatcher can match on overwritten NORM attributes.
2019-07-22 14:18:24 +02:00
Falak Asad
ff1e73e35c
Bugfix/issue 3968 (#3982)
...
* Fix for issue-3968
* Added contributor agreement
* Made suggested changes
2019-07-18 00:20:32 +02:00
Ines Montani
073013f129
Auto-format [ci skip]
2019-07-17 12:34:13 +02:00
Ines Montani
62ff128888
Add regression test for #3951
2019-07-16 14:00:00 +02:00
Ines Montani
7f551050b1
Add regression test for #3972
2019-07-16 13:07:35 +02:00
Sofie Van Landeghem
ed774cb953
Fixing ngram bug (#3953)
...
* minimal failing example for Issue #3661
* referenced Issue #3661 instead of Issue #3611
* cleanup
2019-07-12 10:01:35 +02:00
Ines Montani
673c864a06
Fix doc.count_by functionality (#3950)
...
Fix doc.count_by functionality
2019-07-11 13:44:00 +02:00
Ines Montani
2426f4d44c
Fix default punctuation rules for splitting Hindi text (#3948)
...
Fix default punctuation rules for splitting Hindi text
Co-authored-by: yash <patadiayash@gmail.com>
Co-authored-by: Ines Montani <ines@ines.io>
2019-07-11 13:36:28 +02:00
svlandeg
349107daa3
cleanup
2019-07-11 13:09:22 +02:00
Matthew Honnibal
b40b4c2c31
💫 Fix issue #3839: Incorrect entity IDs from Matcher with operators (#3949)
...
* Add regression test for issue #3541
* Add comment on bugfix
* Remove incorrect test
* Un-xfail test
2019-07-11 12:55:11 +02:00
Ines Montani
197cfd7ebc
Merge branch 'master' into pr/3948
2019-07-11 12:18:31 +02:00
Ines Montani
0b8406a05c
Tidy up and auto-format
2019-07-11 12:02:25 +02:00
yash
ae2d52e323
Add default encoding utf-8 for test file
2019-07-11 15:26:27 +05:30
yash
d5311b3c42
Add test file for issue (#3625) and spacy contributor agreement
2019-07-11 14:53:14 +05:30
svlandeg
e080412385
tracked the bug down to PreshCounter.inc - still unclear what goes wrong
2019-07-11 01:53:06 +02:00
svlandeg
a89fecce97
failing unit test for issue #3869
2019-07-11 00:43:55 +02:00
Matthew Honnibal
465456edb9
Un-xfail test #3880
2019-07-10 14:01:17 +02:00
Matthew Honnibal
87f7ec34d5
Add test for #3880
2019-07-10 13:53:55 +02:00
Ines Montani
82045aac8a
Merge regression tests
2019-07-10 12:49:18 +02:00
Ines Montani
570ab1f481
Fix handling of old entity ruler files
...
Old entity ruler files are expected as an `entity_ruler.jsonl` file in the top-level model directory, i.e. the path passed to from_disk by default (model path plus component name), but with the suffix ".jsonl".
2019-07-10 12:14:12 +02:00