Commit Graph

14033 Commits

Author SHA1 Message Date
Thomas Bird
cbb8c66da3 prevent the root logger from inialising 2020-12-15 19:50:34 +00:00
Adriane Boyd
1ddf2f39c7
Switch converters to generator functions (#6547)
* Switch converters to generator functions

To reduce the memory usage when converting large corpora, refactor the
convert methods to be generator functions.

* Update tests
2020-12-15 16:47:16 +08:00
Adriane Boyd
20e18cc246 Fix alignment and vector checks in debug data
* Update token alignment check to use Example alignment
* Update missing vector check further related to changes in v3
2020-12-15 09:43:14 +01:00
Raf Guns
ec876c9713 Merge branch 'master' of https://github.com/explosion/spaCy into cite-zenodo 2020-12-14 22:03:58 +01:00
Raf Guns
db2a34d610 Update CITATION to Zenodo 2020-12-14 22:01:24 +01:00
Raf Guns
a90ca0e1fb Add contributor agreement 2020-12-14 22:01:14 +01:00
Matthew Honnibal
8656a08777
Add beam_parser and beam_ner components for v3 (#6369)
* Get basic beam tests working

* Get basic beam tests working

* Compile _beam_utils

* Remove prints

* Test beam density

* Beam parser seems to train

* Draft beam NER

* Upd beam

* Add hypothesis as dev dependency

* Implement missing is-gold-parse method

* Implement early update

* Fix state hashing

* Fix test

* Fix test

* Default to non-beam in parser constructor

* Improve oracle for beam

* Start refactoring beam

* Update test

* Refactor beam

* Update nn

* Refactor beam and weight by cost

* Update ner beam settings

* Update test

* Add __init__.pxd

* Upd test

* Fix test

* Upd test

* Fix test

* Remove ring buffer history from StateC

* WIP change arc-eager transitions

* Add state tests

* Support ternary sent start values

* Fix arc eager

* Fix NER

* Pass oracle cut size for beam

* Fix ner test

* Fix beam

* Improve StateC.clone

* Improve StateClass.borrow

* Work directly with StateC, not StateClass

* Remove print statements

* Fix state copy

* Improve state class

* Refactor parser oracles

* Fix arc eager oracle

* Fix arc eager oracle

* Use a vector to implement the stack

* Refactor state data structure

* Fix alignment of sent start

* Add get_aligned_sent_starts method

* Add test for ae oracle when bad sentence starts

* Fix sentence segment handling

* Avoid Reduce that inserts illegal sentence

* Update preset SBD test

* Fix test

* Remove prints

* Fix sent starts in Example

* Improve python API of StateClass

* Tweak comments and debug output of arc eager

* Upd test

* Fix state test

* Fix state test
2020-12-13 09:08:32 +08:00
Ines Montani
85ca8c2bdd Merge branch 'master' into develop 2020-12-11 13:44:41 +11:00
Ines Montani
1d4b1dea25 Update contributing guide and issue template [ci skip] 2020-12-11 13:39:26 +11:00
Ines Montani
37c5d7e826
Merge pull request #6542 from adrianeboyd/chore/prepare-v2.3.5
Set version to v2.3.5
2020-12-11 10:33:18 +11:00
Ines Montani
fb43a30a71
Merge pull request #6545 from svlandeg/feature/discussions [ci skip] 2020-12-11 10:20:35 +11:00
Ines Montani
76cfd89dea Update site.json 2020-12-11 10:19:42 +11:00
Ines Montani
c9b67b02f8 Update issue templates 2020-12-11 10:05:47 +11:00
Ines Montani
43a69eecb7 Update site.json 2020-12-11 10:05:21 +11:00
Ines Montani
73896fcbc8 Update README.md 2020-12-11 10:05:19 +11:00
Ines Montani
25186fa431
Merge pull request #6543 from adrianeboyd/docs/install-v2
Docs and extras updates for v2.3.5
2020-12-11 09:53:53 +11:00
svlandeg
4afcd9567e refer to GH discussions 2020-12-10 20:56:12 +01:00
svlandeg
d156b423ae remove gitter and reddit links 2020-12-10 20:41:02 +01:00
svlandeg
5afa567767 replace gitter with discussions in 101 2020-12-10 20:17:36 +01:00
svlandeg
ae1ccf2b04 update link to discussion forum 2020-12-10 20:02:49 +01:00
svlandeg
52cdb12d26 add GH discussions to readme 2020-12-10 19:58:43 +01:00
Adriane Boyd
27bb75e2a0 Docs and extras updates for v2.3.5
* Update install instructions for updated packages

* Add `cuda110` and `cuda111` extras, remove upper `cupy` pins (only
compatible with `thinc>=7.4.4`)
2020-12-10 15:34:34 +01:00
Ines Montani
513c4e332a
Include custom code via spacy package command (#6531) 2020-12-10 20:36:46 +08:00
Adriane Boyd
7b277661f6 Set version to v2.3.5 2020-12-10 13:32:10 +01:00
Ines Montani
2a6043fabb
Merge pull request #6530 from explosion/feature/init-config-cpu-gpu 2020-12-10 09:38:46 +11:00
Ines Montani
dfe148935e
Merge pull request #6532 from adrianeboyd/feature/nlp-batch-size-setting 2020-12-10 09:01:58 +11:00
Ines Montani
9d32e839d3 Merge branch 'develop' into feature/init-config-cpu-gpu 2020-12-10 08:50:53 +11:00
Adriane Boyd
972820e2b3 Add batch_size to data formats docs 2020-12-09 12:44:04 +01:00
Adriane Boyd
80ac8af1bf Format 2020-12-09 12:44:01 +01:00
Adriane Boyd
795b5bd049
Update website/docs/api/language.md
Co-authored-by: Ines Montani <ines@ines.io>
2020-12-09 12:23:32 +01:00
Adriane Boyd
6ee6e41234 Update docstring for Language.evaluate 2020-12-09 10:21:39 +01:00
Adriane Boyd
fa8fa474a3 Add nlp.batch_size setting
Add a default `batch_size` setting for `Language.pipe` and
`Language.evaluate` as `nlp.batch_size`.
2020-12-09 09:13:26 +01:00
Ines Montani
e09588e6ca Update README.md [ci skip] 2020-12-09 13:10:49 +11:00
Ines Montani
f2571b5ec4
Merge pull request #6444 from adrianeboyd/chore/update-develop-from-master 2020-12-09 13:09:58 +11:00
Ines Montani
d1a0e2f116 Don't build 3.9 for now 2020-12-09 12:10:48 +11:00
Ines Montani
90171f2031
Merge pull request #6528 from svlandeg/feature/pipe_fill_config 2020-12-09 12:01:22 +11:00
Ines Montani
dfaef27f90
Merge pull request #6503 from adrianeboyd/feature/lemmatizer-rule-warning-pos
Warn on empty POS for the rule-based lemmatizer
2020-12-09 11:34:16 +11:00
Ines Montani
271923eaea Fix retokenizer 2020-12-09 11:29:55 +11:00
Ines Montani
b85bd63eca Fix test 2020-12-09 11:24:01 +11:00
Ines Montani
febf71af28 Fix test 2020-12-09 11:23:07 +11:00
Ines Montani
04b3068747 Revert landing [ci skip] 2020-12-09 11:20:45 +11:00
Ines Montani
1da1568110 Remove tag map 2020-12-09 11:13:49 +11:00
Ines Montani
34449b66fd Update matcher.md 2020-12-09 11:09:45 +11:00
Ines Montani
1980203229 Merge branch 'master' into pr/6444 2020-12-09 11:09:40 +11:00
Ines Montani
05a2812ae0 Merge branch 'develop' into pr/6444 2020-12-09 11:04:03 +11:00
Ines Montani
758ad6c3cd Make CPU the default for init config 2020-12-09 11:00:51 +11:00
Ines Montani
5d605d539d Remove output_file from init_config helper 2020-12-09 10:57:55 +11:00
Sofie Van Landeghem
cfc72c2995
Bugfix multi-label textcat reproducibility (#6481)
* add test for multi-label textcat reproducibility

* remove positive_label

* fix lengths dtype

* fix comments

* remove comment that we should not have forgotten :-)
2020-12-09 06:29:15 +08:00
Sofie Van Landeghem
de108ed3e8
Add specific error when StaticVectors can't read the vectors data (#6450) 2020-12-09 06:16:07 +08:00
Koichi Yasuoka
0afb54ac93
JapaneseTokenizer.pipe added (#6515)
* JapaneseTokenizer.pipe added

For [spacymoji](https://spacy.io/universe/project/spacymoji)  with `Japanese()`.

* DummyTokenizer.pipe added instead
2020-12-08 20:02:23 +01:00