Commit Graph

2748 Commits

Author SHA1 Message Date
Ines Montani
d5ef245bb1
Merge pull request #6822 from jganseman/master [ci skip] 2021-01-27 13:04:30 +11:00
Ines Montani
5d79d1af50
Merge pull request #6796 from svlandeg/docs/benchmarks [ci skip] 2021-01-27 13:01:23 +11:00
Ines Montani
1ed7029d47 Update website for v3 launch 2021-01-27 12:39:47 +11:00
Adriane Boyd
c447aa2b98 Update --code arg in evaluate CLI docs 2021-01-26 15:30:46 +01:00
jganseman
907bce7a78
Merge pull request #1 from jganseman/patch-1
Patch 1
2021-01-26 11:12:30 +01:00
jganseman
8bc57ec372
also update is_oov in lexeme docs 2021-01-26 11:09:16 +01:00
jganseman
1f2b0ec168
proposing a more concise explanation for is_oov
2021-01-26 10:53:39 +01:00
Matthew Honnibal
f049df1715
Revert "Set annotations in update" (#6810)
* Revert "Set annotations in update (#6767)"

This reverts commit e680efc7cc.

* Fix version

* Update spacy/pipeline/entity_linker.py

* Update spacy/pipeline/entity_linker.py

* Update spacy/pipeline/tagger.pyx

* Update spacy/pipeline/tok2vec.py

* Update spacy/pipeline/tok2vec.py

* Update spacy/pipeline/transition_parser.pyx

* Update spacy/pipeline/transition_parser.pyx

* Update website/docs/api/multilabel_textcategorizer.md

* Update website/docs/api/tok2vec.md

* Update website/docs/usage/layers-architectures.md

* Update website/docs/usage/layers-architectures.md

* Update website/docs/api/transformer.md

* Update website/docs/api/textcategorizer.md

* Update website/docs/api/tagger.md

* Update spacy/pipeline/entity_linker.py

* Update website/docs/api/sentencerecognizer.md

* Update website/docs/api/pipe.md

* Update website/docs/api/morphologizer.md

* Update website/docs/api/entityrecognizer.md

* Update spacy/pipeline/entity_linker.py

* Update spacy/pipeline/multitask.pyx

* Update spacy/pipeline/tagger.pyx

* Update spacy/pipeline/tagger.pyx

* Update spacy/pipeline/textcat.py

* Update spacy/pipeline/textcat.py

* Update spacy/pipeline/textcat.py

* Update spacy/pipeline/tok2vec.py

* Update spacy/pipeline/trainable_pipe.pyx

* Update spacy/pipeline/trainable_pipe.pyx

* Update spacy/pipeline/transition_parser.pyx

* Update spacy/pipeline/transition_parser.pyx

* Update website/docs/api/entitylinker.md

* Update website/docs/api/dependencyparser.md

* Update spacy/pipeline/trainable_pipe.pyx
2021-01-25 22:18:45 +08:00
Adriane Boyd
61c9f8bf24
Remove transformers model max length section (#6807) 2021-01-25 19:59:34 +08:00
muratjumashev
7d0154a36e Added language meta data 2021-01-25 00:42:19 +06:00
svlandeg
56064faed9 update caption 2021-01-23 00:57:00 +01:00
svlandeg
d7c0f40a96 update comment 2021-01-22 18:55:18 +01:00
svlandeg
a071279bc7 add speed comparison to docs 2021-01-22 18:46:35 +01:00
svlandeg
b132cb3036 update accuracies for new a1 models 2021-01-21 20:24:05 +01:00
Adriane Boyd
d0236136a2
Fix default config init in Transformer API docs (#6781) 2021-01-21 23:18:03 +08:00
Sofie Van Landeghem
e680efc7cc
Set annotations in update (#6767)
* bump to 3.0.0rc4

* do set_annotations in component update calls

* update docs and remove set_annotations flag

* fix EL test
2021-01-20 11:49:25 +11:00
Sofie Van Landeghem
57640aa838
warn when frozen components break listener pattern (#6766)
* warn when frozen components break listener pattern

* few notes in the documentation

* update arg name

* formatting

* cleanup

* specify listeners return type
2021-01-20 11:12:35 +11:00
Ines Montani
4a1029a9b6 Add infobox [ci skip] 2021-01-19 19:18:39 +11:00
Adriane Boyd
7cd5c9e098 Add xx_sent_ud_sm model to website 2021-01-19 09:02:35 +01:00
Ines Montani
76e25afcd7
Merge pull request #6757 from adrianeboyd/docs/mk-ru-langs [ci skip]
Update languages for website
2021-01-19 11:10:48 +11:00
Ines Montani
f50502dad7 Update docs [ci skip] 2021-01-19 00:22:47 +11:00
Adriane Boyd
e8f6400923 Update languages for website
* Add Macedonian
* Add Russian dependencies
* Switch Chinese dependency to spacy-pkuseg
2021-01-18 14:09:34 +01:00
Ines Montani
2ae8dfbb93 Fix website [ci skip] 2021-01-18 22:31:32 +11:00
Ines Montani
09cacbb7ee Fix website [ci skip] 2021-01-18 11:37:04 +11:00
Sofie Van Landeghem
fed8f48965
raise NotImplementedError when noun_chunks iterator is not implemented (#6711)
* raise NotImplementedError when noun_chunks iterator is not implemented

* bring back, fix and document span.noun_chunks

* formatting

Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>
2021-01-17 19:56:05 +08:00
Adriane Boyd
bf0cdae8d4
Add token_splitter component (#6726)
* Add long_token_splitter component

Add a `long_token_splitter` component for use with transformer
pipelines. This component splits up long tokens like URLs into smaller
tokens. This is particularly relevant for pretrained pipelines with
`strided_spans`, since the user can't change the length of the span
`window` and may not wish to preprocess the input texts.

The `long_token_splitter` splits tokens that are at least
`long_token_length` characters long into smaller tokens of `split_length`
size.

Notes:

* Since this is intended for use as the first component in a pipeline,
the token splitter does not try to preserve any token annotation.
* API docs to come when the API is stable.
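
The splitting rule described above can be sketched in plain Python. This is an illustration only, not the component's implementation: the function name `split_long_token` is hypothetical, it works on strings rather than `Token` objects, and measuring length in characters is an assumption.

```python
from typing import List


def split_long_token(text: str, long_token_length: int, split_length: int) -> List[str]:
    """Split `text` into pieces of `split_length` characters.

    Texts shorter than `long_token_length` are returned unchanged.
    Hypothetical helper for illustration; no annotation is preserved.
    """
    if len(text) < long_token_length:
        return [text]
    # Slice the text into consecutive chunks; the last chunk may be shorter.
    return [text[i:i + split_length] for i in range(0, len(text), split_length)]


pieces = split_long_token("abcdefghij", long_token_length=10, split_length=4)
```

Here a 10-character string meets the threshold and is cut into chunks of at most 4 characters, while anything shorter passes through as a single piece.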

* Adjust API, add test

* Fix name in factory
2021-01-17 19:54:41 +08:00
Adriane Boyd
9328dd5625
Handle unset token.morph in Morphologizer (#6704)
* Handle unset token.morph in Morphologizer

Handle unset `token.morph` in `Morphologizer.initialize` and
`Morphologizer.get_loss`. If both `token.morph` and `token.pos` are
unset, treat the annotation as missing rather than empty.
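
A minimal sketch of the missing-versus-empty distinction, assuming unset values are represented as `None`. The function `morph_annotation` and the dictionary layout are hypothetical and do not mirror the `Morphologizer` internals.

```python
from typing import Dict, Optional


def morph_annotation(morph: Optional[str], pos: Optional[str]) -> Optional[Dict[str, str]]:
    """Return the gold annotation for a token, or None if it is missing.

    Assumption: `None` stands for an unset attribute, "" for an empty one.
    """
    if morph is None and pos is None:
        # Nothing was annotated: skip this token instead of treating it
        # as a token with empty morphology.
        return None
    # At least one attribute is set, so the unset one counts as empty.
    return {"morph": morph or "", "pos": pos or ""}
```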

* Add token.has_morph()
2021-01-15 17:20:10 +01:00
Adriane Boyd
0c936004d1 Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master-rc3 2021-01-14 11:49:58 +01:00
Matthew Honnibal
f277bfdf0f
Add SpanGroup and Graph container types to represent arbitrary annotations (#6696)
* Draft out initial Spans data structure

* Initial span group commit

* Basic span group support on Doc

* Basic test for span group

* Compile span_group.pyx

* Draft addition of SpanGroup to DocBin

* Add deserialization for SpanGroup

* Add tests for serializing SpanGroup

* Fix serialization of SpanGroup

* Add EdgeC and GraphC structs

* Add draft Graph data structure

* Compile graph

* More work on Graph

* Update GraphC

* Upd graph

* Fix walk functions

* Let Graph take nodes and edges on construction

* Fix walking and getting

* Add graph tests

* Fix import

* Add module with the SpanGroups dict thingy

* Update test

* Rename 'span_groups' attribute

* Try to fix c++11 compilation

* Fix test

* Update DocBin

* Try to fix compilation

* Try to fix graph

* Improve SpanGroup docstrings

* Add doc.spans to documentation

* Fix serialization

* Tidy up and add docs

* Update docs [ci skip]

* Add SpanGroup.has_overlap

* WIP updated Graph API

* Start testing new Graph API

* Update Graph tests

* Update Graph

* Add docstring

Co-authored-by: Ines Montani <ines@ines.io>
2021-01-14 17:30:41 +11:00
Ines Montani
29c3ca7e34 Fix SVG integration [ci skip] 2021-01-14 13:33:41 +11:00
Antonio Miras
b4bd8f347a
spaCy Universe: New project; SpacyDotNet (#6702)
* Universe: SpacyDotNet a .NET Core spaCy wrapper

* Signed contributor agreement

Co-authored-by: Antonio Miras <antonio@amiras.net>
2021-01-13 12:47:30 +11:00
Adriane Boyd
a45d89f09a Add initialize.before_init and after_init callbacks
Add `initialize.before_init` and `initialize.after_init` callbacks to
the config. The `initialize.before_init` callback is a place to
implement one-time tokenizer customizations that are then saved with the
model.
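
The order in which such callbacks would run can be sketched as follows. This is a plain-Python illustration under stated assumptions: `Pipeline` and `initialize` are hypothetical stand-ins, not spaCy's `Language` class or its config machinery.

```python
from typing import Callable, List, Optional


class Pipeline:
    """Stand-in for a pipeline object (hypothetical)."""

    def __init__(self) -> None:
        self.events: List[str] = []


def initialize(
    nlp: Pipeline,
    before_init: Optional[Callable[[Pipeline], None]] = None,
    after_init: Optional[Callable[[Pipeline], None]] = None,
) -> Pipeline:
    # Run the optional hook before any initialization work, e.g. to apply
    # one-time tokenizer customizations that are then saved with the model.
    if before_init is not None:
        before_init(nlp)
    nlp.events.append("initialized")  # placeholder for the real initialization
    if after_init is not None:
        after_init(nlp)
    return nlp


nlp = initialize(
    Pipeline(),
    before_init=lambda nlp: nlp.events.append("before_init"),
    after_init=lambda nlp: nlp.events.append("after_init"),
)
```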
2021-01-12 13:07:44 +01:00
Sofie Van Landeghem
a612a5ba3f
fix small typos (#6698) 2021-01-08 09:39:47 +01:00
Sofie Van Landeghem
75d9019343
Fix types of Tok2Vec encoding architectures (#6442)
* fix TorchBiLSTMEncoder documentation

* ensure the types of the encoding Tok2vec layers are correct

* update references from v1 to v2 for the new architectures
2021-01-07 16:39:27 +11:00
Sofie Van Landeghem
82ae95267a
Docs for pretrain architectures (#6605)
* document pretraining architectures

* formatting

* bit more info

* small fixes
2021-01-06 16:12:30 +11:00
Sofie Van Landeghem
afc5714d32
multi-label textcat component (#6474)
* multi-label textcat component

* formatting

* fix comment

* cleanup

* fix from #6481

* random edit to push the tests

* add explicit error when textcat is called with multi-label gold data

* fix error nr

* small fix
2021-01-06 13:07:14 +11:00
Ines Montani
6f83abb971
Merge pull request #6647 from svlandeg/feature/init_config_overwrite 2021-01-05 14:59:04 +11:00
Ines Montani
3614472e29
Merge pull request #6646 from svlandeg/feature/cli-docs [ci skip] 2021-01-05 13:52:49 +11:00
Ines Montani
9c078a5885
Update formatting for consistency [ci skip] 2021-01-05 13:52:28 +11:00
Ines Montani
a9e845426f Use --force for consistency and add docs 2021-01-05 13:49:59 +11:00
svlandeg
d5ff0fecf8 add docs 2020-12-30 14:01:13 +01:00
svlandeg
2fa23b0304 fix capitalization for link 2020-12-29 15:01:22 +01:00
svlandeg
43cc6aea93 remove non-existing link 2020-12-29 14:59:39 +01:00
svlandeg
543073bf9d add pretrain example 2020-12-29 14:51:23 +01:00
svlandeg
1d0ef98873 move example 2020-12-29 14:46:03 +01:00
svlandeg
20113b8063 add train CLI example 2020-12-29 14:44:56 +01:00
Sofie Van Landeghem
87562e470d
fix backticks in docs (#6635) 2020-12-27 22:12:37 +01:00
Sofie Van Landeghem
8df5b7f513
fix documentation of 'path' in tokenizer.to_disk (#6634) 2020-12-27 22:01:06 +01:00
Sofie Van Landeghem
282a3b49ea
Fix parser resizing when there is no upper layer (#6460)
* allow resizing of the parser model even when upper=False

* update from spacy.TransitionBasedParser.v1 to v2

* bugfix
2020-12-18 18:56:57 +08:00
Gareth Sparks
efc229c3f4
Doc.char_span arg: alignment_mode (#6591)
Currently labeled "mode", actually "alignment_mode"
2020-12-18 09:54:56 +01:00
Jeno Pizarro
a6fe35a0f9
Update universe.json 2020-12-15 21:53:20 -05:00
Jeno Pizarro
343a44abe9 Merge branch 'master' of https://github.com/explosion/spaCy 2020-12-15 21:49:46 -05:00
Ines Montani
85ca8c2bdd Merge branch 'master' into develop 2020-12-11 13:44:41 +11:00
Ines Montani
fb43a30a71
Merge pull request #6545 from svlandeg/feature/discussions [ci skip] 2020-12-11 10:20:35 +11:00
Ines Montani
76cfd89dea Update site.json 2020-12-11 10:19:42 +11:00
Ines Montani
43a69eecb7 Update site.json 2020-12-11 10:05:21 +11:00
svlandeg
d156b423ae remove gitter and reddit links 2020-12-10 20:41:02 +01:00
svlandeg
5afa567767 replace gitter with discussions in 101 2020-12-10 20:17:36 +01:00
svlandeg
ae1ccf2b04 update link to discussion forum 2020-12-10 20:02:49 +01:00
Adriane Boyd
27bb75e2a0 Docs and extras updates for v2.3.5
* Update install instructions for updated packages

* Add `cuda110` and `cuda111` extras, remove upper `cupy` pins (only
compatible with `thinc>=7.4.4`)
2020-12-10 15:34:34 +01:00
Ines Montani
513c4e332a
Include custom code via spacy package command (#6531) 2020-12-10 20:36:46 +08:00
Ines Montani
2a6043fabb
Merge pull request #6530 from explosion/feature/init-config-cpu-gpu 2020-12-10 09:38:46 +11:00
Ines Montani
9d32e839d3 Merge branch 'develop' into feature/init-config-cpu-gpu 2020-12-10 08:50:53 +11:00
Adriane Boyd
972820e2b3 Add batch_size to data formats docs 2020-12-09 12:44:04 +01:00
Adriane Boyd
80ac8af1bf Format 2020-12-09 12:44:01 +01:00
Adriane Boyd
795b5bd049
Update website/docs/api/language.md
Co-authored-by: Ines Montani <ines@ines.io>
2020-12-09 12:23:32 +01:00
Adriane Boyd
fa8fa474a3 Add nlp.batch_size setting
Add a default `batch_size` setting for `Language.pipe` and
`Language.evaluate` as `nlp.batch_size`.
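
The fallback to a pipeline-level default can be sketched like this. The `Pipeline` class below is a hypothetical stand-in that only groups texts into batches; it is not the `Language` implementation.

```python
from typing import Iterable, List, Optional


class Pipeline:
    """Stand-in for a pipeline with a default batch size (hypothetical)."""

    def __init__(self, batch_size: int) -> None:
        self.batch_size = batch_size

    def pipe(self, texts: Iterable[str], batch_size: Optional[int] = None) -> List[List[str]]:
        # Fall back to the pipeline-level default when no size is passed.
        if batch_size is None:
            batch_size = self.batch_size
        texts = list(texts)
        return [texts[i:i + batch_size] for i in range(0, len(texts), batch_size)]


nlp = Pipeline(batch_size=2)
default_batches = nlp.pipe(["a", "b", "c"])
explicit_batches = nlp.pipe(["a", "b", "c"], batch_size=3)
```

An explicit `batch_size` argument still takes precedence over the default in this sketch.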
2020-12-09 09:13:26 +01:00
Ines Montani
04b3068747 Revert landing [ci skip] 2020-12-09 11:20:45 +11:00
Ines Montani
34449b66fd Update matcher.md 2020-12-09 11:09:45 +11:00
Ines Montani
1980203229 Merge branch 'master' into pr/6444 2020-12-09 11:09:40 +11:00
Ines Montani
05a2812ae0 Merge branch 'develop' into pr/6444 2020-12-09 11:04:03 +11:00
Ines Montani
758ad6c3cd Make CPU the default for init config 2020-12-09 11:00:51 +11:00
Ines Montani
8921364579
Merge pull request #6521 from explosion/feature/config-stdin
Allow reading config from stdin in spacy train
2020-12-08 22:07:43 +11:00
Ines Montani
94a5a9814f Update argument handling and documentation 2020-12-08 20:41:18 +11:00
Adriane Boyd
5ceac425ee Remove non-working --use-chars from train CLI
Remove the non-working `--use-chars` option from the train CLI. The
implementation of the option across component types and the CLI settings
could be fixed, but the `CharacterEmbed` model does not work on GPU in
v2 so it's better to remove it.
2020-12-08 08:30:00 +01:00
Ines Montani
ef59ce783b Adjust install instructions [ci skip] 2020-12-08 18:06:50 +11:00
Sofie Van Landeghem
2c27093c5f
require_cpu functionality (#6336)
* add require_cpu from Thinc 8.0.0rc2

* add docs

* fix test if cupy is not installed
2020-12-08 14:42:40 +08:00
Ines Montani
d8e01ca931
Merge pull request #6391 from adrianeboyd/docs/install-guide 2020-12-08 07:42:16 +01:00
Ines Montani
ee2ec52f48
Merge pull request #6409 from svlandeg/feature/trf-docs 2020-12-08 06:32:10 +01:00
Ines Montani
c2b196c2c1
Merge pull request #6419 from svlandeg/feature/rel-docs 2020-12-08 06:30:41 +01:00
Ines Montani
82e88f0e3b
Merge pull request #6379 from svlandeg/fix/labels-constructor 2020-12-08 06:29:56 +01:00
Adriane Boyd
1442d2f213
Improve simple training example in v3 migration (#6438)
* Create the examples once
* Use the examples in the initialization
* Provide the batch size
* Fix `begin_training` migration example
2020-11-30 09:39:45 +08:00
Adriane Boyd
03ae77e603
Add SPACY as a Matcher attribute (#6463) 2020-11-30 09:34:50 +08:00
Ines Montani
d21d2c2e59 Don't multiply accuracy by 100 2020-11-27 15:15:51 +08:00
Adriane Boyd
724831b066 Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master
* Update Macedonian for v3
* Update Turkish for v3
2020-11-25 11:49:34 +01:00
Jacob Bortell
fe9009911a Update rule-based-matching.md (#6421)
* Update rule-based-matching.md

Clarified case-sensitivity of dictionary-referencing attributes (POS/TAG/DEP/etc).

Clarified "Type" column header to "Value Type"

* Update rule-based-matching.md

Improved clarity of wording
2020-11-24 16:20:19 +01:00
Adriane Boyd
6f133877aa Update source install instructions
* Don't recommend an editable install in the default source
instructions.
* Use `pip install --no-build-isolation` for editable installs.
* Remove reference to `virtualenv`.
2020-11-24 14:44:13 +01:00
Yusuke Mori
e3ac90b035
Avoid a SyntaxError in self-attentive-parser (#6428)
* Avoid a SyntaxError in self-attentive-parser

Fix a usage of quotation marks in the example of spaCy Universe self-attentive-parser

* Create forest1988.md

Fill in the spaCy contributor agreement
2020-11-22 21:59:37 +01:00
svlandeg
218abaa69a typo 2020-11-20 22:36:49 +01:00
svlandeg
e861e928df more small corrections 2020-11-20 22:29:58 +01:00
svlandeg
5ac0867427 final fixes 2020-11-20 22:18:53 +01:00
svlandeg
331ec83493 edits and updates to implementing REL component docs 2020-11-20 21:41:52 +01:00
svlandeg
4a3e611abc small fixes and formatting 2020-11-20 15:55:05 +01:00
svlandeg
124f49feb6 update REL model code 2020-11-20 15:25:20 +01:00
svlandeg
636be3c791 Merge remote-tracking branch 'upstream/develop' into feature/trf-docs 2020-11-19 14:15:35 +01:00
Sofie Van Landeghem
165993d8e5
fix typo in transformer docs (#6404) 2020-11-19 14:11:38 +01:00
M. Revuelta Espinosa
51232ffb9e
Update universe.json (include PatternOmatic) (#6399)
Request to include PatternOmatic in spaCy Universe

Adds @revuel to contributors
2020-11-19 13:15:50 +01:00
Adriane Boyd
3cf6479467 Fix JSON in #6395 2020-11-17 15:25:41 +01:00
Sam Edwardes
78913a4f95
Added spaCyTextBlob to universe.json (#6395) 2020-11-17 14:38:34 +01:00
Adriane Boyd
96726ec1f6
Fix DocBin init in training example (#6396) 2020-11-17 14:36:44 +01:00
Adriane Boyd
ed32fa80cd Update source install instructions
* Use `pip install` instead of `python setup.py install`
* For developers recommend:
  * `python setup.py build_ext --inplace -j N`
  * `python setup.py develop`
2020-11-16 10:13:51 +01:00
svlandeg
99d0412b6e add link to REL project 2020-11-15 18:35:56 +01:00
svlandeg
73fc1ed963 remove labels from morphologizer constructor 2020-11-11 21:48:50 +01:00
svlandeg
fcd79e0655 remove set_morphology from docs 2020-11-11 21:32:34 +01:00
Ines Montani
3ca5c7082d Use pip install . in quickstart [ci skip] 2020-11-10 17:27:49 +08:00
Ines Montani
de6453940e
Merge pull request #6305 from svlandeg/feature/score-docs [ci skip] 2020-11-10 02:52:11 +01:00
Ines Montani
4d337eedf2
Merge pull request #6322 from medspacy/master 2020-11-10 02:47:29 +01:00
Ines Montani
d7950c5ada
Merge pull request #6297 from adrianeboyd/docs/nightly-conda-install [ci skip] 2020-11-10 02:45:52 +01:00
Ines Montani
448bfbdc30 Remove conda from nightly install widget [ci skip] 2020-11-10 09:44:52 +08:00
svlandeg
789fb3d124 add docs for upstream argument of TransformerListener 2020-11-09 21:42:58 +01:00
Ines Montani
363ac73c72 Update docs [ci skip] 2020-11-09 12:43:26 +08:00
Adriane Boyd
8644ee3e3f
Update TIGER link and tag description (#6344) 2020-11-05 09:33:00 +01:00
Sofie Van Landeghem
8ef056cf98
fix embed_size in Entity Linker architecture (#6343) 2020-11-04 22:20:13 +01:00
Ines Montani
019a1dd5e8 Fix v3 overview [ci skip] 2020-11-03 18:10:06 +01:00
Adriane Boyd
a4b32b9552
Handle missing reference values in scorer (#6286)
* Handle missing reference values in scorer

Handle missing values in reference doc during scoring where it is
possible to detect an unset state for the attribute. If no reference
docs contain annotation, `None` is returned instead of a score. `spacy
evaluate` displays `-` for missing scores and the missing scores are
saved as `None`/`null` in the metrics.
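
The behaviour described above can be sketched as follows, assuming missing reference values are represented as `None`. The helpers `token_accuracy` and `format_score` are hypothetical and simplified; they are not the scorer's actual functions.

```python
from typing import List, Optional


def token_accuracy(gold: List[Optional[str]], pred: List[str]) -> Optional[float]:
    """Accuracy over tokens that have a reference value.

    Returns None instead of a score if no reference value is set.
    """
    scored = [(g, p) for g, p in zip(gold, pred) if g is not None]
    if not scored:
        return None
    return sum(g == p for g, p in scored) / len(scored)


def format_score(score: Optional[float]) -> str:
    # Missing scores are shown as "-" rather than as a number.
    return "-" if score is None else f"{score:.2f}"
```

With this shape, a component evaluated on data without the relevant annotation reports no score at all rather than a misleading zero.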

Attributes without unset states:

* `token.head`: relies on `token.dep` to recognize unset values
* `doc.cats`: unable to handle missing annotation

Additional changes:

* add optional `has_annotation` check to `score_spans` to replace
`doc.sents` hack
* update `score_token_attr_per_feat` to handle missing and empty morph
representations
* fix bug in `Doc.has_annotation` for normalization of `IS_SENT_START`
vs. `SENT_START`

* Fix import

* Update return types
2020-11-03 15:47:18 +01:00
Alec Chapman
204c7c8a00 fix thumbnail link to be github raw url 2020-11-01 07:53:48 -07:00
Alec Chapman
73d22d96ff add medspacy to universe and fix example w/ cov-bsv 2020-10-29 07:53:56 -06:00
Adriane Boyd
8cc5ed6771 Add Macedonian to website languages 2020-10-29 08:49:56 +01:00
Adriane Boyd
dc816bba9d
Fix node name typo in dependency matcher example (#6311) 2020-10-28 16:32:46 +01:00
Adriane Boyd
4dd86306e9
Add Nepali to supported languages on website (#6315) 2020-10-28 16:32:07 +01:00
svlandeg
77688b0072 fix config 2020-10-26 11:14:34 +01:00
svlandeg
5878ff6bcd cleanup 2020-10-26 11:13:02 +01:00
svlandeg
e95d9caa87 small edits 2020-10-26 11:09:25 +01:00
svlandeg
a664994a81 adding score method to explanation of new component 2020-10-26 10:52:47 +01:00
Adriane Boyd
253480353c Remove zh from quickstart extras 2020-10-23 11:39:25 +02:00
Adriane Boyd
af26886fff Fix formatting 2020-10-23 11:38:14 +02:00
Adriane Boyd
c0b76f4c19 Add install step to "Compile from source" 2020-10-23 11:36:36 +02:00
Adriane Boyd
8fe7ede667 Add install step to source install quickstart 2020-10-23 11:34:43 +02:00
Adriane Boyd
4299a7f654 Setup / install / quickstart updates
* Add `cuda110` to setup.cfg and quickstart dropdown
* Switch to `pip` for pip-only packages in conda quickstart instructions
* Update zh pkuseg install message with version range and conda
* Remove `zh` from `extras_require` because the default doesn't require
additional packages
2020-10-23 11:27:54 +02:00
Kunal Sharma
01aec7a313
Adding MindMeld to Universe JSON (#6275)
* Adding Mindmeld to Universe JSON

Mindmeld is a conversational AI platform for deep-domain voice interfaces and chatbots. https://www.mindmeld.com/

* Signing contribution agreement.

Co-authored-by: kunshar2 <kunshar2@cisco.com>
2020-10-21 18:42:11 +02:00
Ines Montani
6523f2daac
Merge pull request #6273 from adrianeboyd/bugfix/detailed-scores-in-evaluate2 2020-10-20 10:03:09 +02:00
Adriane Boyd
fbe65b257b Convert accuracy numbers on website models page 2020-10-19 18:55:55 +02:00
Ines Montani
b6b1c1e23c
Merge pull request #6271 from walterhenry/develop-proof [ci skip] 2020-10-19 16:31:43 +02:00
walterhenry
db24dc5614 Proofread remarks
I think these may be the last remarks for the nightly docs. Only two minor things actually.
2020-10-19 11:11:32 +02:00
Sofie Van Landeghem
75a202ce65
TextCat updates and fixes (#6263)
* small fix in example imports

* throw error when train_corpus or dev_corpus is not a string

* small fix in custom logger example

* limit macro_auc to labels with 2 annotations

* fix typo

* also create parents of output_dir if need be

* update documentation of textcat scores

* refactor TextCatEnsemble

* fix tests for new AUC definition

* bump to 3.0.0a42

* update docs

* rename to spacy.TextCatEnsemble.v2

* spacy.TextCatEnsemble.v1 in legacy

* cleanup

* small fix

* update to 3.0.0rc2

* fix import that got lost in merge

* cursed IDE

* fix two typos
2020-10-18 14:50:41 +02:00
Ines Montani
e2f3c4e12d Fix robots [ci skip] 2020-10-16 17:44:13 +02:00
Adriane Boyd
e896803792 Add and update website license links 2020-10-16 17:01:52 +02:00
Ines Montani
c655742b8b Remove docs references to starters for now (see #6262) [ci skip] 2020-10-16 15:46:34 +02:00
Ines Montani
3851300e80 Update landing [ci skip] 2020-10-16 11:46:33 +02:00
Ines Montani
c968d1560f Fix docs example [ci skip] 2020-10-16 11:33:20 +02:00
Ines Montani
ba1e004049 Fix typo [ci skip] 2020-10-15 23:39:04 +02:00
Ines Montani
32dc4f4796 Sort models sidebar alphabetically [ci skip] 2020-10-15 22:47:16 +02:00
Ines Montani
20f80587d6
Merge pull request #6257 from walterhenry/develop-proof
A few tiny typo fixes to push through with release of nightly
2020-10-15 18:17:30 +02:00
walterhenry
75b7f86383 Three small typos
Some little typos since v3.0 is out.
2020-10-15 18:06:37 +02:00
Ines Montani
09dbbe75d7 Update docs [ci skip] 2020-10-15 17:27:24 +02:00
Ines Montani
7f05ccc170 Update docs [ci skip] 2020-10-15 12:35:30 +02:00
Ines Montani
4fa869e6f7 Update docs [ci skip] 2020-10-15 11:16:06 +02:00
Ines Montani
178760855f Merge branch 'develop' into master-tmp 2020-10-15 09:06:03 +02:00
Ines Montani
abeafcbc08 Update docs [ci skip] 2020-10-15 08:58:30 +02:00
Ines Montani
050aa1e0e2 Update languages.json [ci skip] 2020-10-14 20:51:50 +02:00
Ines Montani
a966c271f7 Update models docs [ci skip] 2020-10-14 20:50:23 +02:00
Ines Montani
a2d4aaee70
Apply suggestions from code review 2020-10-14 19:51:36 +02:00
Ines Montani
d94e241fce Merge branch 'develop' into pr/6253 2020-10-14 16:55:46 +02:00
Ines Montani
cb47f25cda
Merge pull request #6252 from svlandeg/fix/docs 2020-10-14 16:43:12 +02:00
walterhenry
6af585dba5 New batch of proofs
Just tiny fixes to the docs as a proofreader
2020-10-14 16:37:57 +02:00
svlandeg
478a14a619 fix few typos 2020-10-14 15:01:19 +02:00
Ines Montani
1aa8e8f2af Update docs [ci skip] 2020-10-14 14:58:45 +02:00
Ines Montani
4d99d2b94a Update docs [ci skip] 2020-10-13 11:38:52 +02:00
svlandeg
40276fd3be update NEL docs after latest refactor 2020-10-12 11:41:27 +02:00
svlandeg
08cb085f6c Merge remote-tracking branch 'upstream/develop' into fix/various 2020-10-09 17:01:27 +02:00
Ines Montani
97ff090e49 Fix docs example [ci skip] 2020-10-09 16:03:57 +02:00
Ines Montani
9fb3244672
Merge pull request #6231 from adrianeboyd/feature/include-static-vectors 2020-10-09 15:54:52 +02:00
Adriane Boyd
2dd79454af Update docs 2020-10-09 14:42:07 +02:00
svlandeg
853edace37 fix MultiHashEmbed example in documentation 2020-10-09 14:11:06 +02:00
Ines Montani
e50dc2c1c9 Update docs [ci skip] 2020-10-09 12:04:52 +02:00
Ines Montani
7c52def5da
Merge pull request #6227 from adrianeboyd/chore/update-3.0.0a36-from-master 2020-10-09 10:49:20 +02:00
Ines Montani
329b61ee7b Update docs [ci skip] 2020-10-09 10:36:06 +02:00
Šarūnas Navickas
287ba94a2f Website (Universe): An entry for rita-dsl (#6138)
* Create zaibacu.md

* Add RITA-DSL entry

* Update agreement

* Fix formatting
2020-10-09 10:14:40 +02:00
delzac
668507be1b Reflect in usage docs that the IS_SENT_START attribute exists (#6114)
* Reflect in usage docs that the IS_SENT_START attribute exists

* Create delzac.md
2020-10-09 10:14:40 +02:00
Sofie Van Landeghem
d093d6343b
TrainablePipe (#6213)
* rename Pipe to TrainablePipe

* split functionality between Pipe and TrainablePipe

* remove unnecessary methods from certain components

* cleanup

* hasattr(component, "pipe") should be sufficient again

* remove serialization and vocab/cfg from Pipe

* unify _ensure_examples and validate_examples

* small fixes

* hasattr checks for self.cfg and self.vocab

* make is_resizable and is_trainable properties

* serialize strings.json instead of vocab

* fix KB IO + tests

* fix typos

* more typos

* _added_strings as a set

* few more tests specifically for _added_strings field

* bump to 3.0.0a36
2020-10-08 21:33:49 +02:00
Ines Montani
5ebd1fc2cf Update docs [ci skip] 2020-10-08 16:23:12 +02:00
Ines Montani
741796e500 Update docs [ci skip] 2020-10-08 14:31:34 +02:00
Ines Montani
d1602e1ece Update docs [ci skip] 2020-10-08 11:56:50 +02:00
Ines Montani
064575d79d
Merge pull request #6216 from svlandeg/feature/nel-initialize 2020-10-08 11:14:12 +02:00
Ines Montani
43e59bb22a Update docs and install extras [ci skip] 2020-10-08 10:58:50 +02:00
svlandeg
eaf5c265cb set_kb method for entity_linker 2020-10-08 10:34:01 +02:00
svlandeg
bcaad28eda fix typos 2020-10-07 13:05:37 +02:00
delzac
15ea401b39
Reflect in usage docs that the IS_SENT_START attribute exists (#6114)
* Reflect in usage docs that the IS_SENT_START attribute exists

* Create delzac.md
2020-10-06 15:11:01 +02:00
Ines Montani
ce14520789 Update docs [ci skip] 2020-10-06 14:35:17 +02:00
Ines Montani
2a17566da3 Update docs [ci skip] 2020-10-06 14:15:08 +02:00
Ines Montani
967377287a
Merge pull request #6210 from adrianeboyd/docs/various-v3-3 [ci skip] 2020-10-06 11:28:45 +02:00
Adriane Boyd
aa9c9f3bf0 Update Chinese usage for spacy-pkuseg 2020-10-06 11:21:17 +02:00
Šarūnas Navickas
047fb9f8b8
Website (Universe): An entry for rita-dsl (#6138)
* Create zaibacu.md

* Add RITA-DSL entry

* Update agreement

* Fix formatting
2020-10-06 11:19:36 +02:00
Ines Montani
2fd7122074 Update docs [ci skip] 2020-10-06 10:31:48 +02:00
Ines Montani
568e12215d
Merge pull request #6206 from svlandeg/fix/patterns-init 2020-10-06 10:27:23 +02:00
Ines Montani
2e961817cb Update docs [ci skip] 2020-10-06 10:23:01 +02:00
svlandeg
9b4cf7b0b6 update output of debug config command 2020-10-06 09:47:23 +02:00
svlandeg
fd0f60e2bc updates to data format for training and pretraining 2020-10-06 09:28:53 +02:00
svlandeg
ff9ac39c88 read entity_ruler patterns with srsly.read_jsonl.v1 2020-10-05 22:50:14 +02:00
Ines Montani
1a554bdcb1 Update docs and docstring [ci skip] 2020-10-05 21:55:27 +02:00
Ines Montani
181039bd17
Merge pull request #6205 from explosion/feature/embed-features 2020-10-05 21:49:10 +02:00
Ines Montani
5ba418b08c Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-10-05 21:44:01 +02:00
Ines Montani
8a39d5414e Update quickstart [ci skip] 2020-10-05 21:43:51 +02:00
Ines Montani
9ca283a899 Merge branch 'develop' into feature/project-spacy-version 2020-10-05 21:06:07 +02:00
Ines Montani
9aa07ad001 Update quickstarts [ci skip] 2020-10-05 21:05:41 +02:00
Ines Montani
706b7f6973 Update docs 2020-10-05 20:51:22 +02:00
Matthew Honnibal
919790cb47 Upd MultiHashEmbed docs 2020-10-05 20:28:21 +02:00
svlandeg
193e0d5a98 add docs for entity_ruler.initialize 2020-10-05 18:04:08 +02:00
svlandeg
65abd77779 add finish_update to Pipe 2020-10-05 16:23:33 +02:00
Ines Montani
e3acad6264 Update docs [ci skip] 2020-10-05 13:06:20 +02:00
Ines Montani
0f64556c04
Merge pull request #6197 from svlandeg/feature/pipe-docs [ci skip] 2020-10-05 11:55:40 +02:00
svlandeg
9a6c9b133b various small fixes 2020-10-05 01:05:37 +02:00
svlandeg
52b660e9dc initialize and update explanation 2020-10-05 00:39:36 +02:00
Ines Montani
3c36a57e84
Update data augmenters (#6196)
* Draft lower-case augmenter

* Make warning a debug log

* Update lowercase augmenter, docs and tests

Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>
2020-10-04 17:46:29 +02:00
svlandeg
b0463fbf75 set_annotations explanation 2020-10-04 14:56:48 +02:00
Ines Montani
43d7652635
Merge pull request #6192 from explosion/feature/init-attr-ruler 2020-10-04 14:46:37 +02:00
Ines Montani
9b3a934361 Update docs [ci skip] 2020-10-04 14:14:55 +02:00
svlandeg
9f40d963fd highlight the two steps: the model and the pipeline component 2020-10-04 14:11:53 +02:00
Ines Montani
11347f34da Tidy up, tests and docs 2020-10-04 13:54:05 +02:00
svlandeg
452b8309f9 slight rewrite to hide some thinc implementation details 2020-10-04 13:26:46 +02:00
svlandeg
08ad349a18 tok2vec layer 2020-10-04 00:08:02 +02:00
svlandeg
2c4b2ee5e9 REL intro and get_candidates function 2020-10-03 23:27:05 +02:00
Ines Montani
989c59918c Update docs [ci skip] 2020-10-03 18:53:39 +02:00
Ines Montani
7c4ab7e82c Fix Lemmatizer.get_lookups_config 2020-10-03 17:16:10 +02:00
Ines Montani
dd542ec6a4
Fix label initialization of textcat component (#6190) 2020-10-03 17:07:38 +02:00
Ines Montani
3b8f352eda Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-10-03 16:08:27 +02:00
Ines Montani
35d695a031 Update docs 2020-10-03 16:08:24 +02:00
Matthew Honnibal
db419f6b2f
Improve control of training progress and logging (#6184)
* Make logging and progress easier to control

* Update docs

* Cleanup errors

* Fix ConfigValidationError

* Pass stdout/stderr, not wasabi.Printer

* Fix type

* Upd logging example

* Fix logger example

* Fix type
2020-10-03 14:57:46 +02:00
Ines Montani
5fb776556a Update docs [ci skip] 2020-10-03 14:47:02 +02:00
Ines Montani
5413358ba1
Merge pull request #6188 from svlandeg/feature/small-fixes 2020-10-03 11:44:24 +02:00
Ines Montani
eb9b3ff9c5 Update install docs and quickstarts [ci skip] 2020-10-03 11:35:42 +02:00
svlandeg
02247cccaf Merge remote-tracking branch 'upstream/develop' into feature/small-fixes 2020-10-02 20:48:11 +02:00
Sofie Van Landeghem
09dcb75076
small UX fix for DocBin (#6167)
* add informative warning when messing up store_user_data DocBin flags

* cleanup test

* rename to patterns_path
2020-10-02 15:43:32 +02:00
Ines Montani
f0b30aedad
Make lemmatizers use initialize logic (#6182)
* Make lemmatizer use initialize logic and tidy up

* Fix typo

* Raise for uninitialized tables
2020-10-02 15:42:36 +02:00
Ines Montani
df06f7a792 Update docs [ci skip] 2020-10-02 13:24:33 +02:00
Ines Montani
d2aa662ab2
Merge pull request #6179 from adrianeboyd/feature/token-morph-refactor-2 [ci skip] 2020-10-02 12:10:27 +02:00
Ines Montani
0f11c2150d Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-10-02 11:38:05 +02:00
Ines Montani
32cdc1c4f4 Update docs [ci skip] 2020-10-02 11:38:03 +02:00
Ines Montani
6d8df081bd
Merge pull request #6180 from adrianeboyd/docs/minor-v3-2 [ci skip] 2020-10-02 11:37:25 +02:00
Adriane Boyd
351f352cdc Update Japanese docs and pin for sudachipy 2020-10-02 10:12:44 +02:00
Adriane Boyd
7670df04dd Update Chinese usage docs 2020-10-02 10:09:03 +02:00
Adriane Boyd
3908fff899 Remove tag map sidebar 2020-10-02 09:07:55 +02:00
Adriane Boyd
fd09e6b140 Update docs for Token.morph / Token.set_morph 2020-10-02 09:05:15 +02:00
Ines Montani
01c1538c72 Integrate file readers 2020-10-02 01:36:06 +02:00
Ines Montani
6b94cee468 Fix docs [ci skip] 2020-10-02 01:11:19 +02:00
Ines Montani
50162b8726 Try to work around Sharp build issue [ci skip] 2020-10-01 22:27:45 +02:00
Ines Montani
b6b73a3ca8 Update docs [ci skip] 2020-10-01 17:45:29 +02:00
Ines Montani
f2627157c8 Update docs [ci skip] 2020-10-01 17:38:17 +02:00
svlandeg
1328c9fd14 consistently use --code instead of --code-path 2020-10-01 16:59:22 +02:00
Sofie Van Landeghem
a22215f427
Add FeatureExtractor from Thinc (#6170)
* move featureextractor from Thinc

* Update website/docs/api/architectures.md

Co-authored-by: Ines Montani <ines@ines.io>

* Update website/docs/api/architectures.md

Co-authored-by: Ines Montani <ines@ines.io>

Co-authored-by: Ines Montani <ines@ines.io>
2020-10-01 16:22:48 +02:00
Ines Montani
0a8a124a6e Update docs [ci skip] 2020-10-01 12:15:53 +02:00
Ines Montani
a103ab5f1a Update augmenter lookups and docs 2020-09-30 23:03:47 +02:00
Ines Montani
115481aca7 Update docs [ci skip] 2020-09-30 15:16:00 +02:00
walterhenry
1c65b3b2c0 Proofreading
A few more small things in Usage.
2020-09-30 11:33:40 +02:00
Ines Montani
469f0e539c Fix docs [ci skip] 2020-09-30 10:24:06 +02:00
Ines Montani
9bb958fd0a Fix debug data [ci skip] 2020-09-29 23:07:11 +02:00
Ines Montani
604be54a5c Support --code in evaluate CLI [ci skip] 2020-09-29 21:20:56 +02:00
Ines Montani
d3c63b7965 Merge branch 'develop' into feature/prepare 2020-09-29 20:53:05 +02:00
Ines Montani
361f91e286
Merge pull request #6135 from walterhenry/develop-proof 2020-09-29 20:49:06 +02:00
Ines Montani
b486389eec
Update website/docs/api/doc.md 2020-09-29 20:48:43 +02:00
Ines Montani
d7469283c5 Update docs [ci skip] 2020-09-29 16:59:21 +02:00
Sofie Van Landeghem
6a04e5adea
encoding UTF8 (#6161) 2020-09-29 14:49:55 +02:00
walterhenry
1d80b3dc1b Proofreading
Finished with the API docs and started on the Usage, but Embedding & Transformers
2020-09-29 12:39:10 +02:00
walterhenry
c1c841940c Merge branch 'develop-proof' of https://github.com/walterhenry/spaCy into develop-proof 2020-09-29 11:47:43 +02:00
svlandeg
64d90039a1 encoding UTF8 2020-09-29 10:54:42 +02:00
Ines Montani
ff9a63bfbd begin_training -> initialize 2020-09-28 21:35:09 +02:00
walterhenry
3360825e00 Proofreading
Another round of proofreading. All the API docs have been read through and I've grazed the Usage docs.
2020-09-28 16:50:15 +02:00
Matthew Honnibal
a976da168c
Support data augmentation in Corpus (#6155)
* Support data augmentation in Corpus

* Note initial docs for data augmentation

* Add augmenter to quickstart

* Fix flake8

* Format

* Fix test

* Update spacy/tests/training/test_training.py

* Improve data augmentation arguments

* Update templates

* Move randomization out into caller

* Refactor

* Update spacy/training/augment.py

* Update spacy/tests/training/test_training.py

* Fix augment

* Fix test
2020-09-28 03:03:27 +02:00
Ines Montani
f29d5b9b89 Update docs [ci skip] 2020-09-27 18:39:38 +02:00
Ines Montani
e06ff8b71d Update docs [ci skip] 2020-09-26 13:18:08 +02:00
Sofie Van Landeghem
009ba14aaf
Fix pretraining in train script (#6143)
* update pretraining API in train CLI

* bump thinc to 8.0.0a35

* bump to 3.0.0a26

* doc fixes

* small doc fix
2020-09-25 15:47:10 +02:00
Ines Montani
02a1b6ab83 Update links [ci skip] 2020-09-25 13:21:43 +02:00
Ines Montani
2cfe9340a1 Link model components [ci skip] 2020-09-25 13:21:20 +02:00
Ines Montani
c7956a4047 Update models.js [ci skip] 2020-09-25 09:25:46 +02:00
Ines Montani
27c5795ea5 Fix version check in models directory [ci skip] 2020-09-25 09:23:29 +02:00
Ines Montani
2aa4d65734 Update docs [ci skip] 2020-09-24 20:41:09 +02:00
Adriane Boyd
3c062b3911
Add MORPH handling to Matcher (#6107)
* Add MORPH handling to Matcher

* Add `MORPH` to `Matcher` schema
* Rename `_SetMemberPredicate` to `_SetPredicate`
* Add `ISSUBSET` and `ISSUPERSET` operators to `_SetPredicate`
  * Add special handling for normalization and conversion of morph
    values into sets
  * For other attrs, `ISSUBSET` acts like `IN` and `ISSUPERSET` only
    matches for 0 or 1 values

* Update test

* Rename to IS_SUBSET and IS_SUPERSET
2020-09-24 16:55:09 +02:00
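The set semantics described in the commit above can be sketched in plain Python. This is only an illustration under stated assumptions, not spaCy's implementation: the helper names `morph_to_set`, `is_subset` and `is_superset` are hypothetical, and the sketch assumes morph values are written as `Feat=Val` pairs joined by `|`.

```python
from typing import FrozenSet, Iterable


def morph_to_set(morph: str) -> FrozenSet[str]:
    """Hypothetical helper: split a 'Feat=Val|Feat=Val' string into a set."""
    return frozenset(part for part in morph.split("|") if part)


def is_subset(token_morph: str, pattern_values: Iterable[str]) -> bool:
    """True if every feature on the token appears in the pattern's list."""
    return morph_to_set(token_morph) <= frozenset(pattern_values)


def is_superset(token_morph: str, pattern_values: Iterable[str]) -> bool:
    """True if the token carries every feature in the pattern's list."""
    return morph_to_set(token_morph) >= frozenset(pattern_values)


# A token with two features, compared against two different pattern lists.
token = "Case=Nom|Number=Sing"
print(is_subset(token, ["Case=Nom", "Number=Sing", "Gender=Fem"]))
print(is_superset(token, ["Number=Sing"]))
```

Converting the morph string to a set first is what makes the comparison independent of feature order.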
Sofie Van Landeghem
c7eedd3534
updates to NEL functionality (#6132)
* NEL: read sentences and ents from reference

* fiddling with sent_start annotations

* add KB serialization test

* KB write additional file with strings.json

* score_links function to calculate NEL P/R/F

* formatting

* documentation
2020-09-24 16:53:59 +02:00
Ines Montani
6bc5058d13 Update models directory [ci skip] 2020-09-24 14:53:34 +02:00
Ines Montani
58dde293ce
Merge pull request #6089 from adrianeboyd/feature/doc-ents-v3-2 2020-09-24 14:44:42 +02:00
Ines Montani
74e1f192b4
Merge pull request #6134 from explosion/feature/training_before_to_disk 2020-09-24 14:44:11 +02:00
Ines Montani
3b58a8be2b Update docs 2020-09-24 14:32:42 +02:00
Ines Montani
88e54caa12 accuracy -> performance 2020-09-24 14:32:35 +02:00
Ines Montani
b92c8aae78 Merge branch 'develop' into pr/6135 2020-09-24 13:44:56 +02:00
Ines Montani
6836b66433 Update docs and resolve todos [ci skip] 2020-09-24 13:41:25 +02:00
walterhenry
3dd5f409ec Proofreading
Proofread some API docs
2020-09-24 13:15:28 +02:00
Adriane Boyd
1c63f02f99 Add API docs 2020-09-24 12:51:16 +02:00
Ines Montani
138c8d45db Update docs 2020-09-24 12:43:39 +02:00
Ines Montani
d7ab6a2ffe Update docs [ci skip] 2020-09-24 12:37:21 +02:00
Ines Montani
ae51f580c1 Fix handling of score_weights 2020-09-24 10:27:33 +02:00
Ines Montani
e2ffe51fb5 Update docs [ci skip] 2020-09-24 10:13:41 +02:00
Ines Montani
02008e9a55 Update docs [ci skip] 2020-09-23 22:02:31 +02:00
Ines Montani
c8bda92243 Update benchmarks [ci skip] 2020-09-23 20:05:02 +02:00
svlandeg
35dbc63578 Merge remote-tracking branch 'upstream/develop' into fix/nr_features
# Conflicts:
#	spacy/ml/models/parser.py
#	spacy/tests/serialize/test_serialize_config.py
#	website/docs/api/architectures.md
2020-09-23 17:01:13 +02:00
svlandeg
dd2292793f 'parser' instead of 'deps' for state_type 2020-09-23 16:53:49 +02:00
Ines Montani
50a4425cda Adjust docs 2020-09-23 16:03:32 +02:00
Ines Montani
e4e7f5b00d Update docs [ci skip] 2020-09-23 15:44:40 +02:00
svlandeg
6c85fab316 state_type and extra_state_tokens instead of nr_feature_tokens 2020-09-23 13:35:09 +02:00
Ines Montani
a9da33c4d9 Fix infobox with ID [ci skip] 2020-09-23 13:00:56 +02:00
Ines Montani
02b69dd0d5 Update models directory [ci skip] 2020-09-23 12:56:54 +02:00
Ines Montani
6ca06cb62c Update docs and formatting [ci skip] 2020-09-23 10:14:27 +02:00
Ines Montani
60a317520a
Merge pull request #6109 from svlandeg/feature/2rename 2020-09-23 09:47:12 +02:00
Ines Montani
566d048753 Fix project repo link [ci skip] 2020-09-23 09:43:51 +02:00
Ines Montani
930b116f00 Update docs [ci skip] 2020-09-23 09:35:21 +02:00
Ines Montani
d8f661c910 Update docs [ci skip] 2020-09-23 09:30:26 +02:00
svlandeg
b556a10808 rename converts in_to_out 2020-09-22 11:50:19 +02:00
Ines Montani
f9af7d365c Update docs [ci skip] 2020-09-22 09:45:41 +02:00
Ines Montani
49e80dbcac
Merge pull request #6103 from explosion/chore/tidy-up-tests-docs-get-doc 2020-09-22 09:45:04 +02:00
Adriane Boyd
e05d6d358d Update API sidebar MorphAnalysis link 2020-09-22 09:36:37 +02:00
Adriane Boyd
844db6ff12 Update architecture overview 2020-09-22 09:31:47 +02:00
Adriane Boyd
fc9c78da25 Add MorphAnalysis to API sidebar 2020-09-22 09:23:47 +02:00
Adriane Boyd
5fbb8dfcbc Merge remote-tracking branch 'upstream/develop' into docs/various-v3-2 2020-09-22 09:22:58 +02:00
Ines Montani
67fbcb3da5 Tidy up tests and docs 2020-09-21 20:43:54 +02:00
Ines Montani
a5f6ab4943
Merge pull request #6098 from adrianeboyd/feature/doc-init 2020-09-21 18:35:20 +02:00
Adriane Boyd
f212303729 Add sent_starts to Doc.__init__
Add sent_starts to `Doc.__init__`. Officially specify `is_sent_start`
values but also convert to and accept `sent_start` internally.
2020-09-21 17:59:09 +02:00
Adriane Boyd
6aa91c7ca0 Make user_data keyword-only 2020-09-21 16:00:06 +02:00
Ines Montani
e548654aca Update docs [ci skip] 2020-09-21 14:46:55 +02:00
Adriane Boyd
9b8d0b7f90 Alphabetize API sidebars 2020-09-21 13:46:21 +02:00
Adriane Boyd
bc02e86494 Extend Doc.__init__ with additional annotation
Mostly copying from `spacy.tests.util.get_doc`, add additional kwargs to
`Doc.__init__` to initialize the most common doc/token values.
2020-09-21 13:36:24 +02:00
Ines Montani
9d32cac736 Update docs [ci skip] 2020-09-21 10:55:36 +02:00
Adriane Boyd
cc71ec901f Fix typo in saving and loading usage docs 2020-09-21 09:08:55 +02:00
Adriane Boyd
3aa57ce6c9 Update alignment mode in Doc.char_span docs 2020-09-21 09:07:20 +02:00
Ines Montani
b9d2b29684 Update docs [ci skip] 2020-09-20 17:49:09 +02:00
Ines Montani
012b3a7096 Update docs [ci skip] 2020-09-20 17:44:58 +02:00
Ines Montani
744f259b9c Update landing [ci skip] 2020-09-20 16:37:23 +02:00
Ines Montani
554c9a2497 Update docs [ci skip] 2020-09-20 12:30:53 +02:00
Sofie Van Landeghem
39872de1f6
Introducing the gpu_allocator (#6091)
* rename 'use_pytorch_for_gpu_memory' to 'gpu_allocator'

* --code instead of --code-path

* update documentation

* avoid querying the "system" section directly

* add explanation of gpu_allocator to TF/PyTorch section in docs

* fix typo

* fix typo 2

* use set_gpu_allocator from thinc 8.0.0a34

* default null instead of empty string
2020-09-19 01:17:02 +02:00
Ines Montani
0406200a1e Update docs [ci skip] 2020-09-18 15:13:13 +02:00
Ines Montani
a127fa475e
Merge pull request #6078 from svlandeg/fix/corpus 2020-09-18 14:44:21 +02:00
Ines Montani
d32ce121be Fix docs [ci skip] 2020-09-18 13:41:12 +02:00
Ines Montani
a0b4389a38 Update docs [ci skip] 2020-09-17 19:24:48 +02:00
Matthew Honnibal
6efb7688a6 Draft pretrain usage 2020-09-17 18:17:03 +02:00
Ines Montani
1bb8b4f824 Merge branch 'master' into develop 2020-09-17 17:46:20 +02:00
Ines Montani
6bd0d25fb9
Merge pull request #6085 from explosion/docs/static-vectors-intro [ci skip] 2020-09-17 17:14:45 +02:00
Ines Montani
a2c8cda26f Update docs [ci skip] 2020-09-17 17:12:51 +02:00
Ines Montani
2e3ce9f42f Merge branch 'feature/init-config-pretrain' of https://github.com/svlandeg/spaCy into pr/6084 2020-09-17 16:58:49 +02:00
Ines Montani
3d8e010655 Change order 2020-09-17 16:58:46 +02:00
Ines Montani
c4b414b282
Update website/docs/api/cli.md 2020-09-17 16:58:09 +02:00
Sofie Van Landeghem
e5ceec5df0
Update website/docs/api/cli.md
Co-authored-by: Ines Montani <ines@ines.io>
2020-09-17 16:56:20 +02:00
Sofie Van Landeghem
127ce0c574
Update website/docs/api/cli.md
Co-authored-by: Ines Montani <ines@ines.io>
2020-09-17 16:55:53 +02:00
Matthew Honnibal
ec751068f3 Draft text for static vectors intro 2020-09-17 16:42:53 +02:00
svlandeg
5fade4feb7 fix cli abbrev 2020-09-17 16:15:20 +02:00
svlandeg
ddfc1fc146 add pretraining option to init config 2020-09-17 16:05:40 +02:00
svlandeg
c8c84f1ccd Merge remote-tracking branch 'upstream/develop' into fix/corpus 2020-09-17 15:43:04 +02:00
svlandeg
130ffa5fbf fix typos in docs 2020-09-17 14:59:41 +02:00
Ines Montani
c8fa2247e3 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-09-17 12:34:15 +02:00
Ines Montani
6761028c6f Update docs [ci skip] 2020-09-17 12:34:11 +02:00
svlandeg
0c35885751 generalize corpora, dot notation for dev and train corpus 2020-09-17 11:38:59 +02:00
svlandeg
8cedb2f380 Merge branch 'fix/corpus' of https://github.com/svlandeg/spaCy into fix/corpus 2020-09-17 09:27:55 +02:00
svlandeg
781fae678b Merge remote-tracking branch 'upstream/develop' into fix/corpus 2020-09-17 09:24:36 +02:00
Sofie Van Landeghem
21dcf92964
Update website/docs/api/data-formats.md
Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>
2020-09-17 09:21:36 +02:00
Adriane Boyd
7e4cd7575c
Refactor Docs.is_ flags (#6044)
* Refactor Docs.is_ flags

* Add derived `Doc.has_annotation` method

  * `Doc.has_annotation(attr)` returns `True` for partial annotation

  * `Doc.has_annotation(attr, require_complete=True)` returns `True` for
    complete annotation

* Add deprecation warnings to `is_tagged`, `is_parsed`, `is_sentenced`
and `is_nered`

* Add `Doc._get_array_attrs()`, which returns a full list of `Doc` attrs
for use with `Doc.to_array`, `Doc.to_bytes` and `Doc.from_docs`. The
list is the `DocBin` attributes list plus `SPACY` and `LENGTH`.

Notes on `Doc.has_annotation`:

* `HEAD` is converted to `DEP` because heads don't have an unset state

* Accept `IS_SENT_START` as a synonym of `SENT_START`

Additional changes:

* Add `NORM`, `ENT_ID` and `SENT_START` to default attributes for
`DocBin`

* In `Doc.from_array()` the presence of `DEP` causes `HEAD` to override
`SENT_START`

* In `Doc.from_array()` using `attrs` other than
`Doc._get_array_attrs()` (i.e., a user's custom list rather than our
default internal list) with both `HEAD` and `SENT_START` shows a warning
that `HEAD` will override `SENT_START`

* `set_children_from_heads` does not require dependency labels to set
sentence boundaries and sets `sent_start` for all non-sentence starts to
`-1`

* Fix call to set_children_from_heads

Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>
2020-09-17 00:14:01 +02:00
svlandeg
55f8d5478e fix example output 2020-09-15 22:09:30 +02:00
svlandeg
51fa929f47 rewrite train_corpus to corpus.train in config 2020-09-15 21:58:04 +02:00
Ines Montani
2214d1bb7b
Merge pull request #6067 from explosion/feature/spacy-blank-from-config 2020-09-15 14:18:33 +02:00
Ines Montani
b7faa38960 Update docs [ci skip] 2020-09-15 12:44:03 +02:00
Ines Montani
0edd695bf6 Update docs 2020-09-15 11:41:49 +02:00
Ines Montani
99549a5ace Fix consistency and update docs 2020-09-15 11:37:37 +02:00
Ines Montani
154752f9c2 Update docs and consistency [ci skip] 2020-09-15 00:32:49 +02:00
Sofie Van Landeghem
3216a33149
positive_label config for textcat (#6062)
* hook up positive_label in textcat

* unit tests

* documentation

* formatting

* tests

* fix typo

* move verify_config to after begin_training

* revert accidental commit
2020-09-14 17:08:00 +02:00