Commit Graph

1653 Commits

Author SHA1 Message Date
Matthew Honnibal
7de997c0a5 Update test 2020-06-13 23:11:45 +02:00
Matthew Honnibal
3eb8f3867e Update test 2020-06-13 23:05:16 +02:00
svlandeg
face0de74f fix MORPH conversion + enable unit test 2020-06-12 16:29:09 +02:00
svlandeg
880dccf93e entities on doc_annotation, parse links and check their offsets against the entities. unit test works 2020-06-12 15:47:20 +02:00
svlandeg
3aed177a35 fix ENT_IOB conversion and enable unit test 2020-06-12 11:30:24 +02:00
svlandeg
6a67a11682 adding tests for new example class (some still failing - WIP) 2020-06-11 17:43:40 +02:00
Matthew Honnibal
488727aee0 Start updating test 2020-06-09 23:58:28 +02:00
Matthew Honnibal
ccd332a9fc Update test stubs 2020-06-09 15:49:04 +02:00
Matthew Honnibal
f1189dc205 Draft tests for new Example class 2020-06-09 15:43:08 +02:00
Matthew Honnibal
c833ebe1ad Start tests for new example class 2020-06-09 15:29:05 +02:00
Matthew Honnibal
d9289712ba * Make GoldCorpus return dict, not Example
* Make Example require a Doc object (previously optional)

Clarify methods in GoldCorpus

WIP refactor Example

Refactor Example.split_sents

Fix test

Fix augment

Update test

Update test

Fix import

Update test_scorer

Update Example
2020-06-09 01:01:59 +02:00
Matthew Honnibal
084271c9e9
Remove GoldParse from public API
* Move get_parses_from_example to spacy.syntax

* Get GoldParse out of Example

* Avoid expecting GoldParse input in parser

* Add Alignment to spacy.gold.align

* Update Example object

* Add comment

* Update pipeline

* Fix imports

* Simplify gold_io

* WIP on GoldCorpus

* Update test

* Xfail some gold tests

* Remove ignore_misaligned option from GoldCorpus

* Fix Example constructor

* Update test

* Fix usage of Example

* Add deprecated_get_gold method on Example

* Patch scorer

* Fix test

* Fix test

* Update tests

* Xfail a test

* Fix passing of make_projective

* Pass make_projective by default

* Hack data format in Example.from_dict

* Update tests

* Fix example.from_dict

* Update morphologizer

* Fix entity linker

* Add get_field to TokenAnnotation

* Fix Example.get_aligned

* Update test

* Fix alignment

* Fix corpus

* Fix GoldCorpus

* Handle misaligned

* Format

* Fix missing import
2020-06-08 22:09:57 +02:00
Ines Montani
d93cbeb14f
Add warning for loose version constraints (#5536)
* Add warning for loose version constraints

* Update wording [ci skip]

* Tweak error message

Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>
2020-06-05 12:42:15 +02:00
Matthew Honnibal
8411d4f4e6
Merge pull request #5543 from svlandeg/feature/pretrain-config
pretrain from config
2020-06-04 19:07:12 +02:00
Ines Montani
810fce3bb1 Merge branch 'develop' into master-tmp 2020-06-03 14:36:59 +02:00
svlandeg
eac12cbb77 make dropout in embed layers configurable 2020-06-03 11:50:16 +02:00
svlandeg
6504b7f161 Merge remote-tracking branch 'upstream/develop' into feature/pretrain-config 2020-06-03 08:30:16 +02:00
svlandeg
c5ac382f0a fix name clash 2020-06-02 22:24:57 +02:00
svlandeg
2bf5111ecf additional test with discard_oversize=False 2020-06-02 22:09:37 +02:00
svlandeg
aa6271b16c extending algorithm to deal better with edge cases 2020-06-02 22:05:08 +02:00
svlandeg
6208d322d3 slightly more challenging unit test 2020-06-02 19:47:30 +02:00
svlandeg
6651fafd5c using overflow buffer for examples within the tolerance margin 2020-06-02 19:43:39 +02:00
svlandeg
85b0597ed5 add test for minibatch util 2020-06-02 18:26:21 +02:00
svlandeg
e0f9f448f1 remove Tensorizer 2020-06-01 23:38:48 +02:00
Ines Montani
b7aff6020c Make functions more general purpose and update docstrings and tests 2020-05-30 15:18:53 +02:00
Ines Montani
e47e5a4b10 Use more sophisticated version parsing logic 2020-05-30 15:01:58 +02:00
Ines Montani
387c7aba15 Update test 2020-05-24 14:55:16 +02:00
Ines Montani
4465cad6c5 Rename spacy.analysis to spacy.pipe_analysis 2020-05-22 17:42:06 +02:00
Ines Montani
25d6ed3fb8
Merge pull request #5489 from explosion/feature/connected-components 2020-05-22 17:40:11 +02:00
Ines Montani
841c05b47b
Merge pull request #5490 from explosion/fix/remove-jsonschema 2020-05-22 17:39:54 +02:00
Ines Montani
569a65b60e Auto-format 2020-05-22 16:55:42 +02:00
Ines Montani
d844528c5f Add test for is_compatible_model 2020-05-22 16:55:15 +02:00
Ines Montani
12b7be1d98 Remove jsonschema from dependencies 2020-05-22 16:49:26 +02:00
Matthew Honnibal
f7f6df7275 Move to spacy.analysis 2020-05-22 16:43:18 +02:00
Matthew Honnibal
78d79d94ce Guess set_annotations=True in nlp.update
During `nlp.update`, components can be passed a boolean set_annotations
to indicate whether they should assign annotations to the `Doc`. This
needs to be called if downstream components expect to use the
annotations during training, e.g. if we wanted to use tagger features in
the parser.

Components can specify their assignments and requirements, so we can
figure out which components have these inter-dependencies. After
figuring this out, we can guess whether to pass set_annotations=True.

We could also call set_annotations=True always, or even just have this
as the only behaviour. The downside of this is that it would require the
`Doc` objects to be created afresh to avoid problematic modifications.
One approach would be to make a fresh copy of the `Doc` objects within
`nlp.update()`, so that we can write to the objects without any
problems. If we do that, we can drop this logic and also drop the
`set_annotations` mechanism. I would be fine with that approach,
although it runs the risk of introducing some performance overhead, and
we'll have to take care to copy all extension attributes etc.
2020-05-22 15:55:45 +02:00
Ines Montani
581bda9f98 Update senter test and auto-format 2020-05-21 20:17:14 +02:00
Adriane Boyd
132b2a6898 Merge remote-tracking branch 'upstream/master-tmp' into HEAD 2020-05-21 19:50:30 +02:00
Adriane Boyd
17ee9ab53a Fix _SP/POS=SPACE in strings serialization tests 2020-05-21 19:49:08 +02:00
Ines Montani
245f91df78 Fix merge issues 2020-05-21 19:42:13 +02:00
Ines Montani
631e20d0c6 Fix test and schemas 2020-05-21 19:01:02 +02:00
Ines Montani
24f72c669c Merge branch 'develop' into master-tmp 2020-05-21 18:39:06 +02:00
Ines Montani
c6ec19c844 Add missing declaration 2020-05-21 17:30:05 +02:00
Ines Montani
bd6353715a Merge branch 'master' into fix/travis-tests 2020-05-21 14:23:04 +02:00
Ines Montani
d8f3190c0a Tidy up and auto-format 2020-05-21 14:14:01 +02:00
Ines Montani
56de520afd Try to fix tests on Travis (2.7) 2020-05-21 14:04:57 +02:00
Adriane Boyd
4b229bfc22 Improve handling of NER in CoNLL-U MISC 2020-05-20 18:48:51 +02:00
Sofie Van Landeghem
7f5715a081
Various fixes to NEL functionality, Example class etc (#5460)
* setting KB in the EL constructor, similar to how the model is passed on

* removing wikipedia example files - moved to projects

* throw an error when nlp.update is called with 2 positional arguments

* rewriting the config logic in create pipe to accomodate for other objects (e.g. KB) in the config

* update config files with new parameters

* avoid training pipeline components that don't have a model (like sentencizer)

* various small fixes + UX improvements

* small fixes

* set thinc to 8.0.0a9 everywhere

* remove outdated comment
2020-05-20 11:41:12 +02:00
adrianeboyd
40e65d6f63
Fix most_similar for vectors with unused rows (#5348)
* Fix most_similar for vectors with unused rows

Address issues related to the unused rows in the vector table and
`most_similar`:

* Update `most_similar()` to search only through rows that are in use
according to `key2row`.

* Raise an error when `most_similar(n=n)` is larger than the number of
vectors in the table.

* Set and restore `_unset` correctly when vectors are added or
deserialized so that new vectors are added in the correct row.

* Set data and keys to the same length in `Vocab.prune_vectors()` to
avoid spurious entries in `key2row`.

* Fix regression test using `most_similar`

Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>
2020-05-19 16:41:26 +02:00
Sofie Van Landeghem
f00de445dd
default models defined in component decorator (#5452)
* move defaults to pipeline and use in component decorator

* black formatting

* relative import
2020-05-19 16:20:03 +02:00
adrianeboyd
70da1fd2d6
Add warning for misaligned character offset spans (#5007)
* Add warning for misaligned character offset spans

* Resolve conflict

* Filter warnings in example scripts

Filter warnings in example scripts to show warnings once, in particular
warnings about misaligned entities.

Co-authored-by: Ines Montani <ines@ines.io>
2020-05-19 16:01:18 +02:00