Sofie Van Landeghem
ba02957c80
Fix dependency copy for as_doc (#3969)
* failing unit test for issue 3962
* attempt to fix Issue #3962
* create artificial unit test example
* using length instead of self.length
* sp
* reformat with black
* find better ancestor within span and use generic 'dep'
* attach to span.root if there is no appropriate ancestor
* comment span text
* clean up ancestor code
* reconstruct dep tree to keep same number of sentences
2019-07-23 18:28:54 +02:00
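The bullets above describe how arcs whose head falls outside the span are re-attached when a span is copied into its own doc. Below is a minimal pure-Python sketch of that strategy. It assumes heads are stored as absolute token indices and that a root token points to itself. `reattach_heads` is a hypothetical helper, not spaCy's `Span.as_doc` code. The sketch follows the intermediate bullets: use the nearest in-span ancestor, otherwise the span root, and give the arc the generic `dep` label. It does not model the final step of preserving the sentence count.

```python
def reattach_heads(heads, deps, start, end):
    """Map dependency arcs of tokens start..end-1 onto a standalone span.

    Hypothetical sketch. heads[i] is the absolute index of token i's head;
    a root token points to itself. Returns (new_heads, new_deps) with head
    indices relative to `start`.
    """
    def in_span(i):
        return start <= i < end

    def depth(i):
        # Number of arcs between token i and the root of its tree.
        d = 0
        while heads[i] != i:
            i = heads[i]
            d += 1
        return d

    # Span root: the shallowest token; min() keeps the leftmost on ties.
    root = min(range(start, end), key=depth)

    new_heads, new_deps = [], []
    for i in range(start, end):
        head, dep = heads[i], deps[i]
        if i == root:
            head = i  # the span root becomes a root of the new tree
        elif not in_span(head):
            # Walk up the original tree until we re-enter the span
            # or reach a tree root outside it.
            j = head
            while not in_span(j) and heads[j] != j:
                j = heads[j]
            head = j if in_span(j) else root  # fall back to the span root
            dep = "dep"  # generic label for a re-attached arc
        new_heads.append(head - start)
        new_deps.append(dep)
    return new_heads, new_deps
```

Because every re-attached token points either to one of its original ancestors or to the shallowest token in the span, the result is still a tree.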
Ines Montani
a32b033b8c
Add regression test for #4002
Test that the PhraseMatcher can match on overwritten NORM attributes.
2019-07-22 14:18:24 +02:00
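spaCy's `PhraseMatcher` can match on an attribute other than the verbatim text through its `attr` argument (e.g. `attr="NORM"`). The regression test checks that a token's NORM value is used even after it has been overwritten. The pure-Python toy below only illustrates that expected behavior. Tokens are plain dicts and `match_phrases` is a hypothetical helper, not spaCy's implementation.

```python
def match_phrases(tokens, patterns, attr="ORTH"):
    """Toy phrase matcher over one token attribute (hypothetical helper).

    tokens: list of dicts with "ORTH" and "NORM" keys.
    patterns: dict mapping a label to a list of attribute values.
    Returns (label, start, end) tuples sorted by position.
    """
    # Read the *current* attribute values, so overwritten norms are honored.
    values = [tok[attr] for tok in tokens]
    matches = []
    for label, pattern in patterns.items():
        n = len(pattern)
        for start in range(len(values) - n + 1):
            if values[start:start + n] == pattern:
                matches.append((label, start, start + n))
    return sorted(matches, key=lambda m: (m[1], m[2]))


# "Cheeeese" has its NORM overwritten to "cheese".
tokens = [
    {"ORTH": "I", "NORM": "i"},
    {"ORTH": "like", "NORM": "like"},
    {"ORTH": "Cheeeese", "NORM": "cheese"},
]
```

Matching `{"FOOD": ["cheese"]}` with `attr="NORM"` finds the third token, while the same pattern with `attr="ORTH"` finds nothing.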
BreakBB
3e370cf2ba
Add 'Prof.' to English tokenizer_exceptions
2019-07-19 10:00:45 +02:00
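Tokenizer exceptions keep abbreviations such as "Prof." as single tokens instead of splitting off the trailing period. A deliberately tiny whitespace tokenizer shows the effect. It is not spaCy's `Tokenizer`, and `EXCEPTIONS` and `toy_tokenize` are hypothetical names.

```python
# Hypothetical, heavily simplified exception table; spaCy's real English
# tokenizer exceptions are much larger and applied by its Tokenizer.
EXCEPTIONS = frozenset({"Prof.", "Dr.", "Mr."})


def toy_tokenize(text, exceptions=EXCEPTIONS):
    tokens = []
    for chunk in text.split():
        if chunk in exceptions:
            tokens.append(chunk)  # keep the abbreviation and its period together
        elif len(chunk) > 1 and chunk.endswith("."):
            tokens.extend([chunk[:-1], "."])  # split off a trailing period
        else:
            tokens.append(chunk)
    return tokens
```

With the exception, "Prof. Smith left." yields `["Prof.", "Smith", "left", "."]`; with an empty exception set, "Prof" and "." become separate tokens.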
Falak Asad
ff1e73e35c
Bugfix/issue 3968 (#3982)
* Fix for issue-3968
* Added contributor agreement
* Made suggested changes
2019-07-18 00:20:32 +02:00
Ines Montani
73565c6d9d
Rename function arguments
2019-07-17 14:29:52 +02:00
Matthew Honnibal
394e4d8058
Add docstring for spacy.gold.align
2019-07-17 13:59:17 +02:00
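`spacy.gold.align` compares two tokenizations of the same text. As a much simpler illustration, and not the algorithm spaCy uses, the sketch below aligns tokens one-to-one when they cover the same characters. It assumes tokens contain no internal whitespace. `char_spans` and `simple_align` are hypothetical names.

```python
def char_spans(tokens):
    """(start, end) offsets of each token in the concatenation of all tokens."""
    spans, offset = [], 0
    for tok in tokens:
        spans.append((offset, offset + len(tok)))
        offset += len(tok)
    return spans


def simple_align(tokens_a, tokens_b):
    """Return (a2b, b2a): index of the token with an identical span, else -1."""
    spans_a, spans_b = char_spans(tokens_a), char_spans(tokens_b)
    index_a = {span: i for i, span in enumerate(spans_a)}
    index_b = {span: j for j, span in enumerate(spans_b)}
    a2b = [index_b.get(span, -1) for span in spans_a]
    b2a = [index_a.get(span, -1) for span in spans_b]
    return a2b, b2a
```

For `["I", "can't"]` versus `["I", "ca", "n't"]`, only "I" aligns one-to-one. The split contraction is left at -1 here, whereas a full aligner would also record many-to-one mappings.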
Ines Montani
073013f129
Auto-format [ci skip]
2019-07-17 12:34:13 +02:00
Ines Montani
62ff128888
Add regression test for #3951
2019-07-16 14:00:00 +02:00
Ines Montani
7f551050b1
Add regression test for #3972
2019-07-16 13:07:35 +02:00
Ines Montani
c0e29f7029
Merge pull request #3957 from sorenlind/danish-tokenizer-slash
Make Danish tokenizer split on forward slash
2019-07-12 18:19:22 +02:00
Matthew Honnibal
ef666656b3
Fix attrs alignment
2019-07-12 17:59:47 +02:00
Matthew Honnibal
c345c042b0
Fix symbol alignment
2019-07-12 17:48:38 +02:00
Ines Montani
7281026879
Increment version [ci skip]
2019-07-12 17:40:00 +02:00
Søren Lind Kristiansen
26aee70d95
Make Danish tokenizer split on forward slash
2019-07-12 15:20:42 +02:00
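Splitting on a forward slash is typically handled by an infix rule. The sketch below uses Python's `re` module. It assumes a hypothetical pattern that splits only when there are non-space characters on both sides of the slash, so the actual Danish punctuation rules may differ.

```python
import re

# Hypothetical infix pattern: a "/" with non-space characters on both sides.
# The capture group keeps the slash as its own token in re.split's output.
SLASH_INFIX = re.compile(r"(?<=\S)(/)(?=\S)")


def split_slash(word):
    return [part for part in SLASH_INFIX.split(word) if part]
```

`split_slash("ja/nej")` gives `["ja", "/", "nej"]`, while a bare `"/"` is left intact because nothing precedes or follows it.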
Matthew Honnibal
3bc4d618f9
Set version to v2.1.5
2019-07-12 13:26:12 +02:00
Sofie Van Landeghem
ed774cb953
Fixing ngram bug (#3953)
* minimal failing example for Issue #3661
* referenced Issue #3661 instead of Issue #3611
* cleanup
2019-07-12 10:01:35 +02:00
Matthew Honnibal
09dc01a426
Fix #3853, and add warning
2019-07-11 14:46:47 +02:00
Matthew Honnibal
7369949d2e
Add warning for #3853
2019-07-11 14:46:47 +02:00
Ines Montani
673c864a06
Fix doc.count_by functionality (#3950)
2019-07-11 13:44:00 +02:00
Ines Montani
2426f4d44c
Fix default punctuation rules for splitting Hindi text (#3948)
Co-authored-by: yash <patadiayash@gmail.com>
Co-authored-by: Ines Montani <ines@ines.io>
2019-07-11 13:36:28 +02:00
svlandeg
349107daa3
cleanup
2019-07-11 13:09:22 +02:00
svlandeg
0f0f07318a
counter instead of preshcounter
2019-07-11 13:05:53 +02:00
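The switch from `PreshCounter` to a plain counter can be illustrated with a sketch built on `collections.Counter`. The name mirrors `Doc.count_by`, but the signature is a hypothetical simplification: it takes a list of attribute values, while the real method takes an attribute ID and reads it from each token. This is not spaCy's code.

```python
from collections import Counter


def count_by(values, exclude=None):
    """Count how often each attribute value occurs (hypothetical sketch).

    values: one hashable attribute value per token.
    exclude: optional predicate; keys for which it returns True are dropped.
    """
    counts = Counter(values)
    if exclude is not None:
        for key in list(counts):  # copy keys so we can delete while iterating
            if exclude(key):
                del counts[key]
    return dict(counts)
```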
Matthew Honnibal
b40b4c2c31
💫 Fix issue #3839: Incorrect entity IDs from Matcher with operators (#3949)
* Add regression test for issue #3541
* Add comment on bugfix
* Remove incorrect test
* Un-xfail test
2019-07-11 12:55:11 +02:00
Matthew Honnibal
e19f4ee719
Add warning message re Issue #3853
2019-07-11 12:50:38 +02:00
Ines Montani
197cfd7ebc
Merge branch 'master' into pr/3948
2019-07-11 12:18:31 +02:00
Ines Montani
d166756607
Fix test
2019-07-11 12:16:43 +02:00
Ines Montani
0b8406a05c
Tidy up and auto-format
2019-07-11 12:02:25 +02:00
yash
6751af3e78
Merge branch 'master' of https://github.com/yash1994/spaCy
2019-07-11 15:26:57 +05:30
yash
ae2d52e323
Add default encoding utf-8 for test file
2019-07-11 15:26:27 +05:30
Ines Montani
33ca0a036a
Merge branch 'master' into pr/3948
2019-07-11 11:55:54 +02:00
Matthew Honnibal
0491a8e7c8
Reformat
2019-07-11 11:49:36 +02:00
Matthew Honnibal
bd3c3f342b
Fix _serialize
2019-07-11 11:48:55 +02:00
yash
815f8d13dd
Fix default punctuation rules for hindi text ( #3625 explosion)
2019-07-11 15:00:51 +05:30
yash
d5311b3c42
Add test file for issue (#3625) and spacy contributor agreement
2019-07-11 14:53:14 +05:30
svlandeg
e080412385
tracked the bug down to PreshCounter.inc - still unclear what goes wrong
2019-07-11 01:53:06 +02:00
svlandeg
a89fecce97
failing unit test for issue #3869
2019-07-11 00:43:55 +02:00
Matthew Honnibal
a388888074
Merge branch 'master' of https://github.com/explosion/spaCy
2019-07-10 22:54:17 +02:00
Matthew Honnibal
c6cb782758
Set version to 2.1.5.dev0
2019-07-10 22:54:09 +02:00
Sofie Van Landeghem
c4c21cb428
more friendly textcat errors (#3946)
* more friendly textcat errors with require_model and require_labels
* update thinc version with recent bugfix
2019-07-10 19:39:38 +02:00
Matthew Honnibal
b94c5443d9
Rename Binder->DocBox, and improve it.
2019-07-10 19:37:20 +02:00
Matthew Honnibal
3d18600c05
Return True from doc.is_... when no ambiguity
* Make doc.is_sentenced return True if len(doc) < 2.
* Make doc.is_nered return True if len(doc) == 0, for consistency.
Closes #3934
2019-07-10 19:21:42 +02:00
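The two rules can be sketched with plain lists standing in for a document's tokens. The encodings are assumptions: a sentence-start flag is 1 (starts a sentence), -1 (does not start one) or 0 (unset), and an entity IOB code of 0 means unset. These functions are illustrations, not spaCy's `Doc` properties.

```python
def is_sentenced(sent_starts, is_parsed=False):
    """sent_starts: per-token flag, 1 = starts a sentence, -1 = does not, 0 = unset."""
    if len(sent_starts) < 2:
        return True  # no boundary to decide, so nothing is ambiguous
    if is_parsed:
        return True  # a parse implies sentence boundaries
    # The first token always starts a sentence; look for any explicit flag after it.
    return any(flag != 0 for flag in sent_starts[1:])


def is_nered(ent_iobs):
    """ent_iobs: per-token IOB code, 0 = no entity annotation set."""
    if len(ent_iobs) == 0:
        return True  # empty doc: consistent with the rule above
    return any(iob != 0 for iob in ent_iobs)
```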
Matthew Honnibal
465456edb9
Un-xfail test #3880
2019-07-10 14:01:17 +02:00
Matthew Honnibal
87f7ec34d5
Add test for #3880
2019-07-10 13:53:55 +02:00
Ines Montani
4e04080b76
Only compare sorted patterns in test
Try to work around flaky tests on Python 3.5
2019-07-10 13:00:52 +02:00
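Before Python 3.6, dictionary iteration order was arbitrary, so patterns that pass through dicts can come back in a different order. Below is a minimal order-insensitive comparison. It assumes patterns are JSON-serializable dicts, and `normalized` is a hypothetical helper. Dicts cannot be sorted directly in Python 3, which is why a canonical JSON string serves as the sort key.

```python
import json


def normalized(patterns):
    """Sort pattern dicts by a canonical JSON encoding so list order doesn't matter."""
    return sorted(patterns, key=lambda p: json.dumps(p, sort_keys=True))
```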
Ines Montani
82045aac8a
Merge regression tests
2019-07-10 12:49:18 +02:00
Ines Montani
40cd03fc35
Improve EntityRuler serialization
2019-07-10 12:25:45 +02:00
Ines Montani
570ab1f481
Fix handling of old entity ruler files
Expected an `entity_ruler.jsonl` file in the top-level model directory, i.e. the default path passed to `from_disk` (model path plus component name) with the suffix ".jsonl" added.
2019-07-10 12:14:12 +02:00
Ines Montani
874d914a44
Tidy up test
2019-07-10 12:13:23 +02:00
Ines Montani
ea2050079b
Auto-format
2019-07-10 12:03:05 +02:00
Ines Montani
6ba5ddbd5f
Merge pull request #3864 from svlandeg/feature/nel-wiki
Entity linking using Wikipedia & Wikidata
2019-07-10 11:25:41 +02:00