10520 Commits

Author SHA1 Message Date
Bae Yong-Ju
05fbf5d976 Fix error when Korean text contains regexp special characters. (#4022) 2019-07-25 17:53:33 +02:00
Ines Montani
bd39e5e630 Add "Processing text" section [ci skip] 2019-07-25 17:38:03 +02:00
Ines Montani
a5e3d2f318 Improve section on disabling pipes [ci skip] 2019-07-25 14:25:34 +02:00
Ines Montani
02e444ec7c Add section on special tokenizer component [ci skip] 2019-07-25 14:25:03 +02:00
Ines Montani
1fa6d6ba55 Improve consistency of docs examples [ci skip] 2019-07-25 14:24:56 +02:00
adrianeboyd
784a5f4284 Update GoldParse attributes in API docs (#4023)
* add `words`
* update name of entity list to `ner`

I think it might be a bit more consistent to have `ner` named `entities`
or `ents` (and `ents` is actually set somewhere to `None`, which is a
bit confusing), but it looks like renaming it would be a non-trivial
decision.
2019-07-25 12:14:02 +02:00
Matthew Honnibal
73e095923f 💫 Improve error message when model.from_bytes() dies (#4014)
* Improve error message when model.from_bytes() dies

When Thinc's model.from_bytes() is called with a mismatched model, often
we get a particularly ungraceful error,

e.g. "AttributeError: FunctionLayer has no attribute G"

This is because we're trying to load the parameters for something like
a LayerNorm layer, and the model architecture has some other layer there
instead. This is obviously terrible, especially since the error *type*
is wrong.

I've changed it to raise a ValueError. The error message is still
probably a bit terse, but it's hard to be sure exactly what's gone
wrong.

* Update spacy/pipeline/pipes.pyx

* Update spacy/pipeline/pipes.pyx

* Update spacy/pipeline/pipes.pyx

* Update spacy/syntax/nn_parser.pyx

* Update spacy/syntax/nn_parser.pyx

* Update spacy/pipeline/pipes.pyx

Co-Authored-By: Matthew Honnibal <honnibal+gh@gmail.com>

* Update spacy/pipeline/pipes.pyx

Co-Authored-By: Matthew Honnibal <honnibal+gh@gmail.com>


Co-authored-by: Ines Montani <ines@ines.io>
2019-07-24 11:27:34 +02:00
Ines Montani
87fcf3141c Merge pull request #4003 from svlandeg/feature/nel-fixes
API changes for Entity linking functionality
2019-07-23 23:17:07 +02:00
Paul O'Leary McCann
c8949ce88a Remove old comment (#4012)
Norwegian used to borrow from French but that doesn't appear to have
been true for a while now, so the comment that was here is no longer
relevant.
2019-07-23 23:10:06 +02:00
Sofie Van Landeghem
ba02957c80 Fix dependency copy for as_doc (#3969)
* failing unit test for issue 3962

* attempt to fix Issue #3962

* create artificial unit test example

* using length instead of self.length

* sp

* reformat with black

* find better ancestor within span and use generic 'dep'

* attach to span.root if there is no appropriate ancestor

* comment span text

* clean up ancestor code

* reconstruct dep tree to keep same number of sentences
2019-07-23 18:28:54 +02:00
svlandeg
4e7ec1ed31 return fix 2019-07-23 14:23:58 +02:00
svlandeg
a037206f0a use pathlib instead 2019-07-23 12:17:19 +02:00
svlandeg
400ff342cf replace assert's with custom error messages 2019-07-23 11:52:48 +02:00
svlandeg
cd6c263fe4 format offsets 2019-07-23 11:31:29 +02:00
svlandeg
20389e4553 format and bugfix 2019-07-22 15:08:17 +02:00
svlandeg
3e140534d9 format 2019-07-22 15:04:57 +02:00
svlandeg
b1911f7105 Errors.E146 for IO error when FP is null 2019-07-22 14:56:13 +02:00
svlandeg
5d544f89ba Errors.E145 for IO errors when reading KB 2019-07-22 14:36:07 +02:00
Ines Montani
a32b033b8c Add regression test for #4002
Test that the PhraseMatcher can match on overwritten NORM attributes.
2019-07-22 14:18:24 +02:00
svlandeg
ad65171837 Merge remote-tracking branch 'upstream/master' into feature/nel-fixes 2019-07-22 13:41:28 +02:00
svlandeg
76184374e2 test corner cases 2019-07-22 13:39:32 +02:00
svlandeg
9f8c1e71a2 fix for Issue #4000 2019-07-22 13:34:12 +02:00
Ines Montani
0be6c7c06c Merge pull request #4001 from adrianeboyd/docs/german-tiger
Update annotation docs for German
2019-07-22 12:07:49 +02:00
Adriane Boyd
6c5044ed2a Update annotation docs for German
- minor formatting fixes
- remove STTS tags not used in Tiger
- update list of dependency relations to match tiger2dep
2019-07-22 11:59:03 +02:00
adrianeboyd
d2c474cbb7 Fix initial example in EntityRuler API docs (#3999) 2019-07-22 11:18:55 +02:00
svlandeg
dae8a21282 rename entity frequency 2019-07-19 17:40:28 +02:00
svlandeg
f75d1299a7 formatting 2019-07-19 14:52:45 +02:00
svlandeg
41fb5204ba output tensors as part of predict 2019-07-19 14:47:36 +02:00
Ines Montani
1167c303a0 Fix typos [ci skip] 2019-07-19 13:08:18 +02:00
svlandeg
21176517a7 have gold.links correspond exactly to doc.ents 2019-07-19 12:36:15 +02:00
Ines Montani
36062fba93
Merge pull request #3992 from BreakBB/docs
Minor fixes to the docs
2019-07-19 11:51:18 +02:00
BreakBB
6d9a7c0749 Add '--silent' argument to bash example of CLI Info 2019-07-19 10:00:45 +02:00
BreakBB
3e370cf2ba Add 'Prof.' to Englisch tokenizer_exceptions 2019-07-19 10:00:45 +02:00
BreakBB
c8ba0f690d Fix --force parameter of CLI package 2019-07-19 10:00:45 +02:00
svlandeg
e1213eaf6a use original gold object in get_loss function 2019-07-18 13:35:10 +02:00
svlandeg
ec55d2fccd filter training data beforehand (+black formatting) 2019-07-18 10:22:24 +02:00
Falak Asad
ff1e73e35c Bugfix/issue 3968 (#3982)
* Fix for issue-3968

* Added contributor agreement

* Made suggested changes
2019-07-18 00:20:32 +02:00
svlandeg
d833d4c358 fixes in kb and gold 2019-07-17 17:18:26 +02:00
Ines Montani
a0acb1b3cd Also add infobox to API docs [ci skip] 2019-07-17 16:26:41 +02:00
Ines Montani
c3ead02ea5 Adjust wording [ci skip] 2019-07-17 16:06:25 +02:00
Ines Montani
57d7076a72 💫 Document spacy.gold.align (#3980)
💫 Document spacy.gold.align

Co-authored-by: Ines Montani <ines@ines.io>
2019-07-17 15:34:35 +02:00
Ines Montani
1d5ff3e455 Add infobox 2019-07-17 15:29:36 +02:00
Ines Montani
114cb18892 Improve wording 2019-07-17 15:27:53 +02:00
Ines Montani
7522beef9e Add "Things to try" prompts 2019-07-17 15:25:02 +02:00
Ines Montani
9f02e3c027 Adjust example
Not actually supported in this alignment interpretation
2019-07-17 15:13:50 +02:00
Ines Montani
1ea472468a Add usage docs for aligning tokenization 2019-07-17 15:08:33 +02:00
Ines Montani
f97a555445 Add API documentation 2019-07-17 14:30:04 +02:00
Ines Montani
73565c6d9d Rename function arguments 2019-07-17 14:29:52 +02:00
Matthew Honnibal
394e4d8058 Add docstring for spacy.gold.align 2019-07-17 13:59:17 +02:00
Ines Montani
fe0e1873a3 Update README.md [ci skip] 2019-07-17 12:34:31 +02:00