Commit Graph

519 Commits

Author SHA1 Message Date
Matthew Honnibal
031673dc35 Update test 2020-06-22 16:08:01 +02:00
svlandeg
bf819ba302 Merge remote-tracking branch 'upstream/develop' into whatif/arrow
# Conflicts:
#	spacy/cli/train.py
#	spacy/gold.pyx
#	spacy/ml/models/multi_task.py
#	spacy/ml/models/simple_ner.py
#	spacy/ml/models/textcat.py
#	spacy/ml/models/tok2vec.py
#	spacy/pipeline/pipes.pyx
#	spacy/pipeline/simple_ner.py
#	spacy/scorer.py
#	spacy/tests/parser/test_add_label.py
#	spacy/tests/parser/test_nn_beam.py
#	spacy/tests/pipeline/test_morphologizer.py
#	spacy/tests/test_scorer.py
#	spacy/tests/test_util.py
#	spacy/util.py
2020-06-22 15:15:20 +02:00
svlandeg
5e71919322 avoid writing temp dir in json2docs, fixing 4402 test 2020-06-22 14:27:35 +02:00
Matthew Honnibal
6a75992af6 Format 2020-06-22 01:11:43 +02:00
Matthew Honnibal
75a5f2d499 Remove GoldCorpus
Update imports

Update after removing GoldCorpus

Fix module name of corpus

Fix import
2020-06-22 00:54:38 +02:00
Matthew Honnibal
50d4b21743 Xfail some tests
Skip tests that cause crashes

Skip test causing segfault
2020-06-22 00:54:38 +02:00
Ines Montani
40bb918a4c Remove unicode declarations and tidy up 2020-06-21 22:34:10 +02:00
svlandeg
12dc8ab208 remove redundant code from master in EntityLinker 2020-06-20 23:07:42 +02:00
Ines Montani
63c22969f4 Update test_issue5230.py 2020-06-20 16:17:48 +02:00
Ines Montani
296b5d633b Remove references to Python 2 / is_python2 2020-06-20 16:11:13 +02:00
Ines Montani
52728d8fa3 Merge branch 'develop' into master-tmp 2020-06-20 15:52:00 +02:00
Ines Montani
8283df80e9 Tidy up and auto-format 2020-06-20 14:15:04 +02:00
svlandeg
01f9ae774c small fixes 2020-06-18 14:01:19 +02:00
svlandeg
0c6f1f3891 fix BiluoPushDown parsing entities 2020-06-18 13:00:03 +02:00
svlandeg
1a151b10d6 correct silly typo 2020-06-17 14:48:14 +02:00
Matthew Honnibal
706e652820 Merge from develop 2020-06-14 17:35:01 +02:00
Sofie Van Landeghem
c0f4a1e43b
train is from-config by default (#5575)
* verbose and tag_map options

* adding init_tok2vec option and only changing the tok2vec that is specified

* adding omit_extra_lookups and verifying textcat config

* wip

* pretrain bugfix

* add replace and resume options

* train_textcat fix

* raw text functionality

* improve UX when KeyError or when input data can't be parsed

* avoid unnecessary access to goldparse in TextCat pipe

* save performance information in nlp.meta

* add noise_level to config

* move nn_parser's defaults to config file

* multitask in config - doesn't work yet

* scorer offering both F and AUC options, need to be specified in config

* add textcat verification code from old train script

* small fixes to config files

* clean up

* set default config for ner/parser to allow create_pipe to work as before

* two more test fixes

* small fixes

* cleanup

* fix NER pickling + additional unit test

* create_pipe as before
2020-06-12 02:02:07 +02:00
Matthew Honnibal
d9289712ba * Make GoldCorpus return dict, not Example
* Make Example require a Doc object (previously optional)

Clarify methods in GoldCorpus

WIP refactor Example

Refactor Example.split_sents

Fix test

Fix augment

Update test

Update test

Fix import

Update test_scorer

Update Example
2020-06-09 01:01:59 +02:00
Ines Montani
c685ee734a Fix compat for v2.x branch 2020-05-22 14:22:36 +02:00
Matthew Honnibal
93c4d13588
Merge pull request #5264 from lfiedler/issue-5230
Fix ResourceWarnings during unittest
2020-05-22 00:31:07 +02:00
Ines Montani
245f91df78 Fix merge issues 2020-05-21 19:42:13 +02:00
Ines Montani
24f72c669c Merge branch 'develop' into master-tmp 2020-05-21 18:39:06 +02:00
svlandeg
36a94c409a failing test to reproduce overlapping spans problem 2020-05-20 23:06:03 +02:00
adrianeboyd
40e65d6f63
Fix most_similar for vectors with unused rows (#5348)
* Fix most_similar for vectors with unused rows

Address issues related to the unused rows in the vector table and
`most_similar`:

* Update `most_similar()` to search only through rows that are in use
according to `key2row`.

* Raise an error when `most_similar(n=n)` is larger than the number of
vectors in the table.

* Set and restore `_unset` correctly when vectors are added or
deserialized so that new vectors are added in the correct row.

* Set data and keys to the same length in `Vocab.prune_vectors()` to
avoid spurious entries in `key2row`.

* Fix regression test using `most_similar`

Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>
2020-05-19 16:41:26 +02:00
Sofie Van Landeghem
f00de445dd
default models defined in component decorator (#5452)
* move defaults to pipeline and use in component decorator

* black formatting

* relative import
2020-05-19 16:20:03 +02:00
Sofie Van Landeghem
0d94737857
Feature toggle_pipes (#5378)
* make disable_pipes deprecated in favour of the new toggle_pipes

* rewrite disable_pipes statements

* update documentation

* remove bin/wiki_entity_linking folder

* one more fix

* remove deprecated link to documentation

* few more doc fixes

* add note about name change to the docs

* restore original disable_pipes

* small fixes

* fix typo

* fix error number to W096

* rename to select_pipes

* also make changes to the documentation

Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>
2020-05-18 22:27:10 +02:00
Matthew Honnibal
333b1a308b
Adapt parser and NER for transformers (#5449)
* Draft layer for BILUO actions

* Fixes to biluo layer

* WIP on BILUO layer

* Add tests for BILUO layer

* Format

* Fix transitions

* Update test

* Link in the simple_ner

* Update BILUO tagger

* Update __init__

* Import simple_ner

* Update test

* Import

* Add files

* Add config

* Fix label passing for BILUO and tagger

* Fix label handling for simple_ner component

* Update simple NER test

* Update config

* Hack train script

* Update BILUO layer

* Fix SimpleNER component

* Update train_from_config

* Add biluo_to_iob helper

* Add IOB layer

* Add IOBTagger model

* Update biluo layer

* Update SimpleNER tagger

* Update BILUO

* Read random seed in train-from-config

* Update use of normal_init

* Fix normalization of gradient in SimpleNER

* Update IOBTagger

* Remove print

* Tweak masking in BILUO

* Add dropout in SimpleNER

* Update thinc

* Tidy up simple_ner

* Fix biluo model

* Unhack train-from-config

* Update setup.cfg and requirements

* Add tb_framework.py for parser model

* Try to avoid memory leak in BILUO

* Move ParserModel into spacy.ml, avoid need for subclass.

* Use updated parser model

* Remove incorrect call to model.initialize in PrecomputableAffine

* Update parser model

* Avoid divide by zero in tagger

* Add extra dropout layer in tagger

* Refine minibatch_by_words function to avoid oom

* Fix parser model after refactor

* Try to avoid div-by-zero in SimpleNER

* Fix infinite loop in minibatch_by_words

* Use SequenceCategoricalCrossentropy in Tagger

* Fix parser model when hidden layer

* Remove extra dropout from tagger

* Add extra nan check in tagger

* Fix thinc version

* Update tests and imports

* Fix test

* Update test

* Update tests

* Fix tests

* Fix test

Co-authored-by: Ines Montani <ines@ines.io>
2020-05-18 22:23:33 +02:00
Sofie Van Landeghem
cfdaf99b80
Fix passing of component configuration (#5374)
* add kwargs to to_disk methods in docs - otherwise crashes on 'exclude' argument

* add fix and test for Issue 5137
2020-04-29 12:56:17 +02:00
Sofie Van Landeghem
f67343295d
Update NEL examples and documentation (#5370)
* simplify creation of KB by skipping dim reduction

* small fixes to train EL example script

* add KB creation and NEL training example scripts to example section

* update descriptions of example scripts in the documentation

* moving wiki_entity_linking folder from bin to projects

* remove test for wiki NEL functionality that is being moved
2020-04-29 12:53:53 +02:00
adrianeboyd
f8ac5b9f56
bugfix in span similarity (#5155) (#5358)
* bugfix in span similarity

* also rewrite doc.pyx for clarity

* formatting

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2020-04-27 16:51:27 +02:00
Jakob Jul Elben
663333c3b2
Fixes #5413 (#5315)
* Fix 5314

* Add contributor

* Resolve requested changes

Co-authored-by: Jakob Jul Elben <jakob@datamaga.com>
2020-04-16 13:29:02 +02:00
Leander Fiedler
d60e2d3ebf issue5230 added unit test for dumping and loading knowledgebase 2020-04-12 09:08:41 +02:00
Leander Fiedler
d2bb649227 issue5230 filter warnings in addition to filterwarnings to prevent deprecation warnings in python35(win) setup to pop up 2020-04-10 23:21:13 +02:00
Leander Fiedler
ca2a7a44db issue5230 store string values of warnings to remotely debug failing python35(win) setup 2020-04-10 22:26:55 +02:00
Leander Fiedler
88ca40a15d issue5230 raise warnings as errors to remotely debug failing python35(win) setup 2020-04-10 21:45:53 +02:00
Leander Fiedler
a7bdfe42e1 issue5230 added print statement to warnings filter to remotely debug failing python35(win) setup 2020-04-10 21:14:33 +02:00
Leander Fiedler
8c1d0d628f issue5230 writer now checks instance of loc parameter before trying to operate on it 2020-04-10 20:35:52 +02:00
lfiedler
e1e25c7e30 issue5230: added unittest test case for completion 2020-04-06 21:36:02 +02:00
Leander Fiedler
cde96f6c64 issue5230: optimized unit test a bit 2020-04-06 20:51:12 +02:00
Leander Fiedler
71cc903d65 issue5230: replaced open statements on path objects so that serialization still works and files are closed 2020-04-06 20:30:41 +02:00
Leander Fiedler
273ed452bb issue5230: added unicode declaration at top of the file 2020-04-06 19:22:32 +02:00
Leander Fiedler
1cd975d4a5 issue5230: fixed resource warnings in language 2020-04-06 18:54:32 +02:00
Leander Fiedler
493c77462a issue5230: test cases
covering known sources of resource warnings
2020-04-06 18:46:51 +02:00
adrianeboyd
ce0e538068
Check whether doc is instantiated in Example.get_gold_parses() (#5167)
* Check whether doc is instantiated

When creating docs to pair with gold parses, modify test to check
whether a doc is unset rather than whether it contains tokens.

* Restore test of evaluate on an empty doc

* Set a minimal gold.orig for the scorer

Without a minimal gold.orig the scorer can't evaluate empty docs. This
is the v3 equivalent of #4925.
2020-03-29 13:57:00 +02:00
Sofie Van Landeghem
d6d95674c1
bugfix in span similarity (#5155)
* bugfix in span similarity

* also rewrite doc.pyx for clarity

* formatting
2020-03-29 13:56:07 +02:00
Sofie Van Landeghem
9b412516e7
Fixing pickling of the parser (#5218)
* fix __reduce__ for pickling parser

* setting the move object as 'state' during pickling

* unskip test_issue4725 - works again
2020-03-27 19:35:26 +01:00
Ines Montani
92b9b631ef xfail -> skip 2020-03-27 10:51:32 +01:00
Ines Montani
ee4bb0e3b6 Fix import 2020-03-26 21:44:18 +01:00
Ines Montani
4fe2299586 xfail hanging test 2020-03-26 20:58:13 +01:00
Ines Montani
f12a46472c Remove unicode declarations 2020-03-26 15:18:32 +01:00