Commit Graph

988 Commits

Author SHA1 Message Date
Paul O'Leary McCann
f89e1c34c9
Minor typo fix in docs 2021-09-11 14:22:05 +09:00
Sofie Van Landeghem
8895e3c9ad
matcher doc corrections (#9115)
* update error message to current UX

* clarify uppercase effect

* fix docstring
2021-09-02 09:26:33 +02:00
Robyn Speer
d60b748e3c
Fix surprises when asking for the root of a git repo (#9074)
* Fix surprises when asking for the root of a git repo

In the case of the first asset I wanted to get from git, the data I
wanted was the entire repository. I tried leaving "path" blank, which
gave a less-than-helpful error, and then I tried `path: "/"`, which
started copying my entire filesystem into the project. The path I should
have used was "".

I've made two changes to make this smoother for others:

- The 'path' within a git clone defaults to ""
- If the path points outside of the tmpdir that the git clone goes
into, we fail with an error

Signed-off-by: Elia Robyn Speer <elia@explosion.ai>

* use a descriptive error instead of a default

plus some minor fixes from PR review

Signed-off-by: Elia Robyn Speer <elia@explosion.ai>

* check for None values in assets

Signed-off-by: Elia Robyn Speer <elia@explosion.ai>

Co-authored-by: Elia Robyn Speer <elia@explosion.ai>
2021-09-01 22:52:08 +02:00
Sofie Van Landeghem
4d39430b82
Document use-case of freezing tok2vec (#8992)
* update error msg

* add sentence to docs

* expand note on frozen components
2021-08-26 09:50:35 +02:00
Paul O'Leary McCann
9391998c77
Add notes on preparing training data to docs (#8964)
* Add training data section

Not entirely sure this is in the right location on the page - maybe it
should be after quickstart?

* Add pointer from binary format to training data section

* Minor cleanup

* Add to ToC, fix filename

* Update website/docs/usage/training.md

Co-authored-by: Ines Montani <ines@ines.io>

* Update website/docs/usage/training.md

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update website/docs/usage/training.md

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Move the training data section further down the page

* Update website/docs/usage/training.md

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update website/docs/usage/training.md

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Run prettier

Co-authored-by: Ines Montani <ines@ines.io>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-08-16 17:37:21 +02:00
Ines Montani
4f769ff913 Update Prodigy project template for v1.11 [ci skip] 2021-08-12 13:46:20 +10:00
Adriane Boyd
175847f92c
Support list values and INTERSECTS in Matcher (#8784)
* Support list values and IS_INTERSECT in Matcher

* Support list values as token attributes for set operators, not just as
pattern values.

* Add `IS_INTERSECT` operator.

* Fix incorrect `ISSUBSET` and `ISSUPERSET` in schema and docs.

* Rename IS_INTERSECT to INTERSECTS
2021-08-02 19:39:26 +02:00
Ines Montani
30f20496d5
Merge pull request #8840 from polm/docs/evaluate-speed [ci skip] 2021-07-30 09:10:15 +10:00
Ines Montani
65d163fab5
Adjust formatting [ci skip] 2021-07-30 09:10:04 +10:00
Paul O'Leary McCann
a60cb13910 Update speed entry in metrics table 2021-07-29 16:35:19 +09:00
Paul O'Leary McCann
8867e60fbb
Update website/docs/usage/v3.md
Co-authored-by: Ines Montani <ines@ines.io>
2021-07-29 14:56:56 +09:00
Paul O'Leary McCann
76ac95923a Add note to migration guide about lexeme tables (fix #7290)
This just adds the resolution from #6388 to the docs.
2021-07-27 19:19:25 +09:00
Paul O'Leary McCann
67ecdcc3ac
Update subset/superset docs (#8795)
* Update subset/superset docs

* Update website/docs/usage/rule-based-matching.md

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2021-07-27 12:08:46 +02:00
Ines Montani
50000d37e4
Avoid double parentheses [ci skip] 2021-07-10 10:52:01 +10:00
Calum Sieppert
e2d53aa1a6
Typo fixes 2021-07-09 10:25:56 -06:00
Calum Sieppert
889c187bc2
Typo fixes 2021-07-07 16:53:04 -06:00
Adriane Boyd
6db647dfe0 Update v3.1 usage docs 2021-07-07 08:43:33 +02:00
Ines Montani
04a9ade40f
Merge pull request #8466 from explosion/docs/new-in-v3-1 [ci skip] 2021-07-06 22:20:24 +10:00
Sofie Van Landeghem
b9f59118bf
Fix silent evaluation (#8581)
* fix silentness

* sneak in docs typo fix

* pass silent boolean instead
2021-07-06 14:16:19 +02:00
Ines Montani
5bb7fe4b41 Update with HF hub integration [ci skip] 2021-07-06 19:30:59 +10:00
Cass
7d13fc799b
Fix a command typo in models.md
"dowmload" -> "download"
2021-07-05 18:44:18 -07:00
Ines Montani
8423864b50
Add docs notes on installing models from Python and in Jupyter [ci skip] (#8597) 2021-07-05 13:49:20 +02:00
Ines Montani
af9d984407
Merge pull request #8405 from svlandeg/fix/whitespace_tokenizer [ci skip] 2021-06-30 20:52:59 +10:00
Adriane Boyd
41292a1b84 Add note about updating with fill-config 2021-06-29 10:45:36 +02:00
Ines Montani
4544412442
Update wording [ci skip]
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-06-25 13:52:48 +10:00
Ines Montani
0d2e2b59bc Update intro [ci skip] 2021-06-24 22:53:20 +10:00
Ines Montani
68721af628 Formatting and preliminary intro [ci skip] 2021-06-24 20:32:23 +10:00
Adriane Boyd
92dc6b409e Notes on source with vectors 2021-06-24 10:34:07 +02:00
Adriane Boyd
35425d7e26 Add details for Catalan and Danish 2021-06-24 10:10:33 +02:00
Ines Montani
5daf450f51 Update upgrading notes [ci skip] 2021-06-24 18:06:28 +10:00
Ines Montani
528746129d Merge branch 'master' into docs/new-in-v3-1 2021-06-24 13:11:37 +10:00
Ines Montani
3e058dee62 Update features [ci skip] 2021-06-24 12:36:04 +10:00
Ines Montani
a1e4aca267 Fix sentence [ci skip] 2021-06-24 11:40:36 +10:00
Ines Montani
ca0d904faa Update details [ci skip] 2021-06-23 13:05:56 +10:00
themrmax
d96c422cfc
Fix broken link
change /api/registry to /api/top-level#registry
2021-06-22 15:34:06 -07:00
Ines Montani
e9b68d4f4c Update details and add example [ci skip] 2021-06-22 17:51:03 +10:00
Nick Sorros
31504f5982
Switch model and data path in prodigy project.yml recipe (#8467) 2021-06-22 09:41:45 +02:00
Ines Montani
bc93c34f54 Add "New in v3.1" guide 2021-06-22 15:23:18 +10:00
Ines Montani
02d2fdb123 Add link anchor [ci skip] 2021-06-20 11:29:19 +10:00
svlandeg
bb9d2f1546 extend example to ensure the text is preserved 2021-06-16 23:56:35 +02:00
Sofie Van Landeghem
e796aab4b3
Resizable textcat (#7862)
* implement textcat resizing for TextCatCNN

* resizing textcat in-place

* simplify code

* ensure predictions for old textcat labels remain the same after resizing (WIP)

* fix for softmax

* store softmax as attr

* fix ensemble weight copy and cleanup

* restructure slightly

* adjust documentation, update tests and quickstart templates to use latest versions

* extend unit test slightly

* revert unnecessary edits

* fix typo

* ensemble architecture won't be resizable for now

* use resizable layer (WIP)

* revert using resizable layer

* resizable container while avoid shape inference trouble

* cleanup

* ensure model continues training after resizing

* use fill_b parameter

* use fill_defaults

* resize_layer callback

* format

* bump thinc to 8.0.4

* bump spacy-legacy to 3.0.6
2021-06-16 11:45:00 +02:00
svlandeg
29d83dec0c adjust whitespace tokenizer to avoid sep in split() 2021-06-16 10:58:45 +02:00
Adriane Boyd
5646fcbe46 Merge remote-tracking branch 'upstream/develop' into chore/develop-into-master-v3.1 2021-06-15 15:05:17 +02:00
Sofie Van Landeghem
0fd0d949c4
fix 's typo's across code base (#8384) 2021-06-15 10:57:08 +02:00
Adriane Boyd
6baab565eb
Minor updates to quickstart settings/instructions (#7965)
* Minor updates to quickstart settings/instructions

* set default value of textcat exclusive to `false` until the default
checkbox behavior is updated
* add the `morphologizer` to the list of components
* add a note that v3.0.6+ is required

* Switch to warning above quickstart

* Undo changes to textcat default in quickstart

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-05-17 16:55:22 +02:00
Paul O'Leary McCann
66bfabd839
Fix pretraining objectives fragment (#8005)
* Fix pretraining objectives fragment

The fragment here is reused from a heading higher up, so you couldn't
link to this section.

* Fix section link to new fragment
2021-05-06 08:27:36 +02:00
Adriane Boyd
95c0833656
Add training option to set annotations on update (#7767)
* Add training option to set annotations on update

Add a `[training]` option called `set_annotations_on_update` to specify
a list of components for which the predicted annotations should be set
on `example.predicted` immediately after that component has been
updated. The predicted annotations can be accessed by later components
in the pipeline during the processing of the batch in the same `update`
call.

* Rename to annotates / annotating_components

* Add test for `annotating_components` when training from config

* Add documentation
2021-04-26 16:53:53 +02:00
Adriane Boyd
d2bdaa7823
Replace negative rows with 0 in StaticVectors (#7674)
* Replace negative rows with 0 in StaticVectors

Replace negative row indices with 0-vectors in `StaticVectors`.

* Increase versions related to StaticVectors

* Increase versions of all architctures and layers related to
`StaticVectors`
* Improve efficiency of 0-vector operations

Parallel `spacy-legacy` PR: https://github.com/explosion/spacy-legacy/pull/5

* Update config defaults to new versions

* Update docs
2021-04-22 18:04:15 +10:00
Shantam Raj
6017fcf693
Default code for Setting Entity annotations on the website errors (#7738)
* the default example for "Setting entity annotations" errors on Binder

* updating contributer info

* using a new variable to store original entities
2021-04-21 09:16:32 +02:00
langdonholmes
df541c6b5e
Update processing-pipelines.md to mention method for doc metadata (#7480)
* Update processing-pipelines.md

Under "things to try," inform users they can save metadata when using nlp.pipe(foobar, as_tuples=True)

Link to a new example on the attributes page detailing the following:

> ```
> data = [
>   ("Some text to process", {"meta": "foo"}),
>   ("And more text...", {"meta": "bar"})
> ]
> 
> for doc, context in nlp.pipe(data, as_tuples=True):
>     # Let's assume you have a "meta" extension registered on the Doc
>     doc._.meta = context["meta"]
> ```

from https://stackoverflow.com/questions/57058798/make-spacy-nlp-pipe-process-tuples-of-text-and-additional-information-to-add-as

* Updating the attributes section

Update the attributes section with example of how extensions can be used to store metadata.

* Update processing-pipelines.md

* Update processing-pipelines.md

Made as_tuples example executable and relocated to the end of the "Processing Text" section.

* Update processing-pipelines.md

* Update processing-pipelines.md

Removed extra line

* Reformat and rephrase

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2021-04-19 11:58:12 +02:00