Commit Graph

11140 Commits

Author SHA1 Message Date
Matthew Honnibal
4632c597e7 Fix Pipe base class 2019-08-01 17:29:01 +02:00
Ines Montani
8718ca8b1f Fix init_model if there's no vocab (closes #4048) (#4049) 2019-08-01 17:26:09 +02:00
adrianeboyd
925a852bb6 Improve NER per type scoring (#4052)
* Improve NER per type scoring

* include all gold labels in per type scoring, not only when recall > 0
* improve efficiency of per type scoring

* Create Scorer tests, initially with NER tests

* move regression test #3968 (per type NER scoring) to Scorer tests

* add new test for per type NER scoring with imperfect P/R/F and per
type P/R/F including a case where R == 0.0
2019-08-01 17:15:36 +02:00
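The per-type scoring described in the commit above can be illustrated with a minimal sketch. This is not the Scorer implementation from the commit; the function name `per_type_prf` and the `(start, end, label)` span representation are assumptions made for illustration. It shows why a label that appears only in the gold data should still get an entry, with recall 0.0.

```python
def per_type_prf(gold_spans, pred_spans):
    """Compute precision/recall/F per entity label.

    gold_spans and pred_spans are sets of (start, end, label) tuples.
    Hypothetical helper for illustration only, not spaCy's Scorer API.
    """
    # Collect every label seen in gold or predictions, so a label with
    # zero true positives is still reported instead of silently dropped.
    labels = {span[2] for span in gold_spans} | {span[2] for span in pred_spans}
    scores = {}
    for label in sorted(labels):
        gold = {span for span in gold_spans if span[2] == label}
        pred = {span for span in pred_spans if span[2] == label}
        tp = len(gold & pred)
        # Guard each division so empty sets yield 0.0 rather than an error.
        p = tp / len(pred) if pred else 0.0
        r = tp / len(gold) if gold else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        scores[label] = {"p": p, "r": r, "f": f}
    return scores
```

With a gold set containing a PERSON and an ORG span and a prediction set containing only the PERSON span, the result would include an ORG entry whose recall is 0.0.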
Sofie Van Landeghem
f7d950de6d ensure the lang of vocab and nlp stay consistent (#4057)
* ensure the language of vocab and nlp stay consistent across serialization

* equality with =
2019-08-01 17:13:01 +02:00
Björn Böing
a83c0add2e Add links to tokenizer API docs to refer to relevant information. (#4064)
* Add links to tokenizer API docs to refer to relevant information.

* Add suggested changes

Co-Authored-By: Ines Montani <ines@ines.io>
2019-08-01 14:28:38 +02:00
Ejar
2cdf7d39e7 Corrected imported function (#4062)
The example showed an incorrect import
2019-08-01 12:43:36 +02:00
Mohammed Daudali
23ec07debd Correct typo for AllenAI url on homepage (#4050)
* Typo fix for AllenAI url

Changed incorrect home page url for AllenAI from appenai.org to allenai.org

* Sign contributor agreement

* Change date format
2019-07-31 00:16:33 +02:00
Sofie Van Landeghem
7de3b129ab Resolve edge case when calling textcat.predict with empty doc (#4035)
* resolve edge case where no doc has tokens when calling textcat.predict

* more explicit value test
2019-07-30 14:58:01 +02:00
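The edge case in the commit above (a batch in which no doc has any tokens) can be sketched as an early return. This is a rough illustration under assumptions, not the actual `textcat.predict` code: `predict_scores`, `n_labels`, and `score_fn` are hypothetical names, and docs are modelled as plain lists of tokens.

```python
def predict_scores(docs, n_labels, score_fn):
    """Return one list of label scores per doc.

    docs is a list of token lists; score_fn stands in for the model call.
    All names here are hypothetical and chosen for illustration.
    """
    # If no doc in the batch has any tokens, skip the model entirely and
    # return all-zero scores with the expected shape.
    if not any(len(doc) > 0 for doc in docs):
        return [[0.0] * n_labels for _ in docs]
    return [score_fn(doc) for doc in docs]
```

The explicit `len(doc) > 0` comparison mirrors the "more explicit value test" bullet: it states the condition directly instead of relying on the truthiness of the doc object.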
Ines Montani
fcd2f7f656 Fix version introducing Span.ents (closes #4045) [ci skip] 2019-07-30 10:32:33 +02:00
Matthew Honnibal
89c92c65fb Update version 2019-07-28 17:56:38 +02:00
Matthew Honnibal
06eb428ed1 Make pipe base class a bit less presumptuous 2019-07-28 17:56:11 +02:00
Matthew Honnibal
16b5144095 Don't raise NotImplemented in Pipe.update 2019-07-28 17:54:11 +02:00
Ines Montani
fc69da0acb 💫 Support simple training format in nlp.evaluate and add tests (#4033)
* Support simple training format in nlp.evaluate and add tests

* Update docs [ci skip]
2019-07-27 17:30:18 +02:00
Ines Montani
a3723f439c Fix formatting [ci skip] 2019-07-27 16:35:42 +02:00
Ines Montani
d5bce35fb1 Fix bug in Span.similarity when called via hook 2019-07-27 15:33:27 +02:00
Ines Montani
109b5e1798 Fix bug in Token.similarity when called via hook 2019-07-27 15:26:01 +02:00
Ines Montani
e000b5ed82 Also support "requirements" in model.json 2019-07-27 13:34:57 +02:00
Ines Montani
307ffe472d Support custom language factory setting in meta.json (#4031) 2019-07-27 13:17:43 +02:00
Ines Montani
b7cd58c736 Tidy up and auto-format [ci skip] 2019-07-27 12:19:35 +02:00
Bae Yong-Ju
05fbf5d976 Fix error when Korean text contains regexp special characters. (#4022) 2019-07-25 17:53:33 +02:00
Ines Montani
bd39e5e630 Add "Processing text" section [ci skip] 2019-07-25 17:38:03 +02:00
Ines Montani
a5e3d2f318 Improve section on disabling pipes [ci skip] 2019-07-25 14:25:34 +02:00
Ines Montani
02e444ec7c Add section on special tokenizer component [ci skip] 2019-07-25 14:25:03 +02:00
Ines Montani
1fa6d6ba55 Improve consistency of docs examples [ci skip] 2019-07-25 14:24:56 +02:00
adrianeboyd
784a5f4284 Update GoldParse attributes in API docs (#4023)
* add `words`
* update name of entity list to `ner`

I think it might be a bit more consistent to have `ner` named `entities`
or `ents` (and `ents` is actually set somewhere to `None`, which is a
bit confusing), but it looks like renaming it would be a non-trivial
decision.
2019-07-25 12:14:02 +02:00
Matthew Honnibal
73e095923f 💫 Improve error message when model.from_bytes() dies (#4014)
* Improve error message when model.from_bytes() dies

When Thinc's model.from_bytes() is called with a mismatched model, often
we get a particularly ungraceful error,

e.g. "AttributeError: FunctionLayer has no attribute G"

This is because we're trying to load the parameters for something like
a LayerNorm layer, and the model architecture has some other layer there
instead. This is obviously terrible, especially since the error *type*
is wrong.

I've changed it to raise a ValueError. The error message is still
probably a bit terse, but it's hard to be sure exactly what's gone
wrong.

* Update spacy/pipeline/pipes.pyx

* Update spacy/pipeline/pipes.pyx

* Update spacy/pipeline/pipes.pyx

* Update spacy/syntax/nn_parser.pyx

* Update spacy/syntax/nn_parser.pyx

* Update spacy/pipeline/pipes.pyx

Co-Authored-By: Matthew Honnibal <honnibal+gh@gmail.com>

* Update spacy/pipeline/pipes.pyx

Co-Authored-By: Matthew Honnibal <honnibal+gh@gmail.com>


Co-authored-by: Ines Montani <ines@ines.io>
2019-07-24 11:27:34 +02:00
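The change described in the commit above, turning a misleading `AttributeError` into a `ValueError`, follows a common Python pattern that can be sketched as below. The wrapper name `load_params` and the message text are assumptions for illustration; the real change lives in `spacy/pipeline/pipes.pyx` and `spacy/syntax/nn_parser.pyx` and uses spaCy's own error codes.

```python
def load_params(model, data):
    """Load serialized parameters, reporting mismatches as ValueError.

    model is any object with a from_bytes(data) method. Hypothetical
    wrapper for illustration only.
    """
    try:
        model.from_bytes(data)
    except AttributeError as err:
        # An AttributeError here usually means the serialized parameters
        # belong to a different architecture than the current model.
        raise ValueError(
            "Could not load model parameters: the saved data does not "
            "appear to match the current model architecture."
        ) from err
    return model
```

Using `raise ... from err` keeps the original exception available as `__cause__`, so the low-level detail is still visible in the traceback.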
Ines Montani
87fcf3141c Merge pull request #4003 from svlandeg/feature/nel-fixes
API changes for Entity linking functionality
2019-07-23 23:17:07 +02:00
Paul O'Leary McCann
c8949ce88a Remove old comment (#4012)
Norwegian used to borrow from French but that doesn't appear to have
been true for a while now, so the comment that was here is no longer
relevant.
2019-07-23 23:10:06 +02:00
Sofie Van Landeghem
ba02957c80 Fix dependency copy for as_doc (#3969)
* failing unit test for issue 3962

* attempt to fix Issue #3962

* create artificial unit test example

* using length instead of self.length

* sp

* reformat with black

* find better ancestor within span and use generic 'dep'

* attach to span.root if there is no appropriate ancestor

* comment span text

* clean up ancestor code

* reconstruct dep tree to keep same number of sentences
2019-07-23 18:28:54 +02:00
svlandeg
4e7ec1ed31 return fix 2019-07-23 14:23:58 +02:00
svlandeg
a037206f0a use pathlib instead 2019-07-23 12:17:19 +02:00
svlandeg
400ff342cf replace assert's with custom error messages 2019-07-23 11:52:48 +02:00
svlandeg
cd6c263fe4 format offsets 2019-07-23 11:31:29 +02:00
svlandeg
20389e4553 format and bugfix 2019-07-22 15:08:17 +02:00
svlandeg
3e140534d9 format 2019-07-22 15:04:57 +02:00
svlandeg
b1911f7105 Errors.E146 for IO error when FP is null 2019-07-22 14:56:13 +02:00
svlandeg
5d544f89ba Errors.E145 for IO errors when reading KB 2019-07-22 14:36:07 +02:00
Ines Montani
a32b033b8c Add regression test for #4002
Test that the PhraseMatcher can match on overwritten NORM attributes.
2019-07-22 14:18:24 +02:00
svlandeg
ad65171837 Merge remote-tracking branch 'upstream/master' into feature/nel-fixes 2019-07-22 13:41:28 +02:00
svlandeg
76184374e2 test corner cases 2019-07-22 13:39:32 +02:00
svlandeg
9f8c1e71a2 fix for Issue #4000 2019-07-22 13:34:12 +02:00
Ines Montani
0be6c7c06c Merge pull request #4001 from adrianeboyd/docs/german-tiger
Update annotation docs for German
2019-07-22 12:07:49 +02:00
Adriane Boyd
6c5044ed2a Update annotation docs for German
- minor formatting fixes
- remove STTS tags not used in Tiger
- update list of dependency relations to match tiger2dep
2019-07-22 11:59:03 +02:00
adrianeboyd
d2c474cbb7 Fix initial example in EntityRuler API docs (#3999) 2019-07-22 11:18:55 +02:00
svlandeg
dae8a21282 rename entity frequency 2019-07-19 17:40:28 +02:00
svlandeg
f75d1299a7 formatting 2019-07-19 14:52:45 +02:00
svlandeg
41fb5204ba output tensors as part of predict 2019-07-19 14:47:36 +02:00
Ines Montani
1167c303a0 Fix typos [ci skip] 2019-07-19 13:08:18 +02:00
svlandeg
21176517a7 have gold.links correspond exactly to doc.ents 2019-07-19 12:36:15 +02:00
Ines Montani
36062fba93 Merge pull request #3992 from BreakBB/docs
Minor fixes to the docs
2019-07-19 11:51:18 +02:00