Commit Graph

953 Commits

Author SHA1 Message Date
Ines Montani
af9d984407
Merge pull request #8405 from svlandeg/fix/whitespace_tokenizer [ci skip] 2021-06-30 20:52:59 +10:00
themrmax
d96c422cfc
Fix broken link
change /api/registry to /api/top-level#registry
2021-06-22 15:34:06 -07:00
Nick Sorros
31504f5982
Switch model and data path in prodigy project.yml recipe (#8467) 2021-06-22 09:41:45 +02:00
Ines Montani
02d2fdb123 Add link anchor [ci skip] 2021-06-20 11:29:19 +10:00
svlandeg
bb9d2f1546 extend example to ensure the text is preserved 2021-06-16 23:56:35 +02:00
Sofie Van Landeghem
e796aab4b3
Resizable textcat (#7862)
* implement textcat resizing for TextCatCNN

* resizing textcat in-place

* simplify code

* ensure predictions for old textcat labels remain the same after resizing (WIP)

* fix for softmax

* store softmax as attr

* fix ensemble weight copy and cleanup

* restructure slightly

* adjust documentation, update tests and quickstart templates to use latest versions

* extend unit test slightly

* revert unnecessary edits

* fix typo

* ensemble architecture won't be resizable for now

* use resizable layer (WIP)

* revert using resizable layer

* resizable container while avoiding shape inference trouble

* cleanup

* ensure model continues training after resizing

* use fill_b parameter

* use fill_defaults

* resize_layer callback

* format

* bump thinc to 8.0.4

* bump spacy-legacy to 3.0.6
2021-06-16 11:45:00 +02:00
svlandeg
29d83dec0c adjust whitespace tokenizer to avoid sep in split() 2021-06-16 10:58:45 +02:00
Adriane Boyd
5646fcbe46 Merge remote-tracking branch 'upstream/develop' into chore/develop-into-master-v3.1 2021-06-15 15:05:17 +02:00
Sofie Van Landeghem
0fd0d949c4
fix 's typo's across code base (#8384) 2021-06-15 10:57:08 +02:00
Adriane Boyd
6baab565eb
Minor updates to quickstart settings/instructions (#7965)
* Minor updates to quickstart settings/instructions

* set default value of textcat exclusive to `false` until the default
checkbox behavior is updated
* add the `morphologizer` to the list of components
* add a note that v3.0.6+ is required

* Switch to warning above quickstart

* Undo changes to textcat default in quickstart

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-05-17 16:55:22 +02:00
Paul O'Leary McCann
66bfabd839
Fix pretraining objectives fragment (#8005)
* Fix pretraining objectives fragment

The fragment here is reused from a heading higher up, so you couldn't
link to this section.

* Fix section link to new fragment
2021-05-06 08:27:36 +02:00
Adriane Boyd
95c0833656
Add training option to set annotations on update (#7767)
* Add training option to set annotations on update

Add a `[training]` option called `set_annotations_on_update` to specify
a list of components for which the predicted annotations should be set
on `example.predicted` immediately after that component has been
updated. The predicted annotations can be accessed by later components
in the pipeline during the processing of the batch in the same `update`
call.

* Rename to annotates / annotating_components

* Add test for `annotating_components` when training from config

* Add documentation
2021-04-26 16:53:53 +02:00
Adriane Boyd
d2bdaa7823
Replace negative rows with 0 in StaticVectors (#7674)
* Replace negative rows with 0 in StaticVectors

Replace negative row indices with 0-vectors in `StaticVectors`.

* Increase versions related to StaticVectors

* Increase versions of all architectures and layers related to
`StaticVectors`
* Improve efficiency of 0-vector operations

Parallel `spacy-legacy` PR: https://github.com/explosion/spacy-legacy/pull/5

* Update config defaults to new versions

* Update docs
2021-04-22 18:04:15 +10:00
Shantam Raj
6017fcf693
Default code for Setting Entity annotations on the website errors (#7738)
* the default example for "Setting entity annotations" errors on Binder

* updating contributor info

* using a new variable to store original entities
2021-04-21 09:16:32 +02:00
langdonholmes
df541c6b5e
Update processing-pipelines.md to mention method for doc metadata (#7480)
* Update processing-pipelines.md

Under "things to try," inform users they can save metadata when using nlp.pipe(foobar, as_tuples=True)

Link to a new example on the attributes page detailing the following:

> ```
> data = [
>   ("Some text to process", {"meta": "foo"}),
>   ("And more text...", {"meta": "bar"})
> ]
> 
> for doc, context in nlp.pipe(data, as_tuples=True):
>     # Let's assume you have a "meta" extension registered on the Doc
>     doc._.meta = context["meta"]
> ```

from https://stackoverflow.com/questions/57058798/make-spacy-nlp-pipe-process-tuples-of-text-and-additional-information-to-add-as

* Updating the attributes section

Update the attributes section with example of how extensions can be used to store metadata.

* Update processing-pipelines.md

* Update processing-pipelines.md

Made as_tuples example executable and relocated to the end of the "Processing Text" section.

* Update processing-pipelines.md

* Update processing-pipelines.md

Removed extra line

* Reformat and rephrase

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2021-04-19 11:58:12 +02:00
Adriane Boyd
0e7f94b247
Update Tokenizer.explain with special matches (#7749)
* Update Tokenizer.explain with special matches

Update `Tokenizer.explain` and the pseudo-code in the docs to include
the processing of special cases that contain affixes or whitespace.

* Handle optional settings in explain

* Add test for special matches in explain

Add test for `Tokenizer.explain` for special cases containing affixes.
2021-04-19 19:08:20 +10:00
Bram Vanroy
ed561cf428
Terminology: deprecated vs obsolete (#7621)
* Terminology: deprecated vs obsolete

Typically, deprecated is used for functionality that is bound to become unavailable but that can still be used. Obsolete is used for features that have been removed. In E941, I think what is meant is "obsolete" since loading a model by a shortcut simply does not work anymore (and throws an error). This is different from downloading a model with a shortcut, which is deprecated but still works.

In light of this, perhaps all other error codes should be checked as well.

* clarify that the link command is removed and not just deprecated

Co-authored-by: svlandeg <sofie.vanlandeghem@gmail.com>
2021-04-12 14:37:00 +02:00
Adriane Boyd
673e2bc4c0
Add usage docs for streamed train corpora (#7693) 2021-04-09 16:15:38 +02:00
Ayush Chaurasia
3c2ce41dd8
W&B integration: Optional support for dataset and model checkpoint logging and versioning (#7429)
* Add optional artifacts logging

* Update docs

* Update spacy/training/loggers.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update spacy/training/loggers.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update spacy/training/loggers.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Bump WandbLogger Version

* Add documentation of v1 to legacy docs

* bump spacy-legacy to 3.0.2 (to be released)

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: svlandeg <sofie.vanlandeghem@gmail.com>
2021-04-01 19:36:23 +02:00
Santiago Castro
af07fc3bc1
Add support for CUDA 11.2 (#7583)
* Add support for CUDA 11.2

* Update the docs

* Format

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2021-03-30 09:47:33 +02:00
Álvaro Abella Bascarán
5b4dde38a3
fix fn name: tokenizer.infixes_finditer -> tokenizer.infix_finditer (#7606) 2021-03-30 09:45:49 +02:00
Adriane Boyd
0d2b723e8d Update entity setting section 2021-03-20 11:38:55 +01:00
Adriane Boyd
6a9a467766
Update website/docs/usage/processing-pipelines.md
Co-authored-by: Ines Montani <ines@ines.io>
2021-03-19 08:12:49 +01:00
Adriane Boyd
40e5d3a980 Update saving/loading example 2021-03-18 16:56:10 +01:00
Adriane Boyd
0fb1881f36 Reformat processing pipelines 2021-03-18 13:31:42 +01:00
Adriane Boyd
acc58719da Update custom similarity hooks example 2021-03-18 13:31:42 +01:00
Adriane Boyd
c9e1a9ac17 Add multiprocessing section 2021-03-18 13:31:42 +01:00
Adriane Boyd
9a254d3995 Include all en_core_web_sm components in examples 2021-03-18 13:31:42 +01:00
bsweileh
61472e7cb3
Update _training.md - Fix broken link on backpropagation (#7431)
* Update _training.md

Fix broken link on backpropagation

* Add agreement

add spacy contributor agreement
2021-03-15 09:21:35 +01:00
Adriane Boyd
d746ea6278
Add warning about GPU selection in Jupyter notebooks (#7075)
* Initial warning

* Update check

* Redo edit

* Move jupyter warning to helper method

* Add link with details to warnings
2021-03-09 15:35:21 +01:00
Sofie Van Landeghem
932887b950
textcat scoring fix and multi_label docs (#6974)
* add multi-label textcat to menu

* add infobox on textcat API

* add info to v3 migration guide

* small edits

* further fixes in doc strings

* add infobox to textcat architectures

* add textcat_multilabel to overview of built-in components

* spelling

* fix unrelated warn msg

* Add textcat_multilabel to quickstart [ci skip]

* remove separate documentation page for multilabel_textcategorizer

* small edits

* positive label clarification

* avoid duplicating information in self.cfg and fix textcat.score

* fix multilabel textcat too

* revert threshold to storage in cfg

* revert threshold stuff for multi-textcat

Co-authored-by: Ines Montani <ines@ines.io>
2021-03-09 23:04:22 +11:00
Ines Montani
dfb23a419e Merge branch 'spacy.io' [ci skip] 2021-03-06 17:38:54 +11:00
graue70
7d085d5b1c
Fix typo in docs 2021-03-05 18:30:09 +01:00
svlandeg
d900c55061 consistently use registry as callable 2021-03-02 17:56:28 +01:00
svlandeg
08fd901a1b kb.get_candidates renamed to get_alias_candidates 2021-02-25 20:09:36 +01:00
Ines Montani
24cecbb3f4
Merge pull request #7126 from adrianeboyd/docs/gpu-id-opt [ci skip]
Add tip about --gpu-id to training quickstart
2021-02-24 22:34:17 +11:00
Tocic
b1996a51a1
fix typo in models.md (#7157) 2021-02-22 09:00:38 +01:00
Adriane Boyd
7198be0f4b Add tip about --gpu-id to training quickstart 2021-02-19 14:07:51 +01:00
Sofie Van Landeghem
709c9e75af
span.ent only returns first sentence (#7084)
* return first sentence when span contains sentence boundary

* docs fix

* small fixes

* cleanup
2021-02-19 23:02:38 +11:00
palandlom
9b82586699
var batch is useless (#7111)
It seems that nlp.update(examples) should be nlp.update(batch)
2021-02-18 09:44:22 +01:00
Ines Montani
fc4fb6eb3a Make v2.x docs more prominent [ci skip] 2021-02-17 23:42:27 +11:00
Ines Montani
c08b3f294c Support env vars and CLI overrides for project.yml 2021-02-10 13:45:27 +11:00
svlandeg
9a7f33c916 final 3.0 benchmark numbers 2021-02-09 21:28:33 +01:00
svlandeg
bb7482bef8 fix link 2021-02-08 18:39:59 +01:00
Ines Montani
433835d9b0
Merge pull request #6889 from adrianeboyd/docs/source-install-dup [ci skip] 2021-02-05 13:35:16 +11:00
Ines Montani
2cdfcd2d19 Update naming [ci skip] 2021-02-03 12:48:31 +11:00
Adriane Boyd
37a68a06ab Update to recommend editable installs for source installs 2021-02-02 16:51:27 +01:00
Adriane Boyd
3a3e4daf60 Update install instructions
* Remove duplicate section about compiling from source
2021-02-02 14:44:15 +01:00
Pengcheng YIN
6fdc33203a
Fix a typo 2021-02-01 17:26:28 -05:00
Ines Montani
a59f3fcf5d Make wheel the default format and update docs [ci skip] 2021-02-01 23:18:43 +11:00