Commit Graph

2755 Commits

Author SHA1 Message Date
Sofie Van Landeghem
4d39430b82
Document use-case of freezing tok2vec (#8992)
* update error msg

* add sentence to docs

* expand note on frozen components
2021-08-26 09:50:35 +02:00
Sofie Van Landeghem
94fb840443
fix docs for Span constructor arguments (#9023) 2021-08-25 16:06:22 +02:00
Sofie Van Landeghem
de025beb5f
Warn and document spangroup.doc weakref (#8980)
* test for error after Doc has been garbage collected

* warn about using a SpanGroup when the Doc has been garbage collected

* add warning to the docs

* rephrase slightly

* raise error instead of warning

* update

* move warning to doc property
2021-08-20 11:06:19 +02:00
Paul O'Leary McCann
37fe847af4 Fix type annotation in docs 2021-08-20 15:34:22 +09:00
Ines Montani
f2b61b77a5 Fix universe.json [ci skip] 2021-08-20 11:26:29 +10:00
Baltazar
71e65fe943 added spacy api v3 docker 2021-08-19 21:29:25 +02:00
Paul O'Leary McCann
9391998c77
Add notes on preparing training data to docs (#8964)
* Add training data section

Not entirely sure this is in the right location on the page - maybe it
should be after quickstart?

* Add pointer from binary format to training data section

* Minor cleanup

* Add to ToC, fix filename

* Update website/docs/usage/training.md

Co-authored-by: Ines Montani <ines@ines.io>

* Update website/docs/usage/training.md

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update website/docs/usage/training.md

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Move the training data section further down the page

* Update website/docs/usage/training.md

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update website/docs/usage/training.md

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Run prettier

Co-authored-by: Ines Montani <ines@ines.io>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-08-16 17:37:21 +02:00
Lasse
839ea0f987 change tags formatting to match 2021-08-13 14:40:08 +02:00
Lasse
70ab596f61 Merge branch 'master' of https://github.com/HLasse/spaCy 2021-08-13 14:35:21 +02:00
Lasse
195e4e48c3 add textdescriptives to universe 2021-08-13 14:35:18 +02:00
Adriane Boyd
b278f31ee6
Document scorers in registry and components from #8766 (#8929)
* Document scorers in registry and components from #8766

* Update spacy/pipeline/lemmatizer.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update website/docs/api/dependencyparser.md

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Reformat

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-08-12 12:50:03 +02:00
Ines Montani
4f769ff913 Update Prodigy project template for v1.11 [ci skip] 2021-08-12 13:46:20 +10:00
Paul O'Leary McCann
e227d24d43
Allow passing in array vars for speedup (#8882)
* Allow passing in array vars for speedup

This fixes #8845. Not sure about the docstring changes here...

* Update docs

Types maybe need more detail? Maybe not?

* Run prettier on docs

* Update spacy/tokens/span.pyx

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-08-10 15:13:53 +02:00
Paul O'Leary McCann
6029cfc391
Add scores to output in spancat (#8855)
* Add scores to output in spancat

This exposes the scores as an attribute on the SpanGroup. Includes a
basic test.

* Add basic doc note

* Vectorize score calcs

* Add "annotation format" section

* Update website/docs/api/spancategorizer.md

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Clean up doc section

* Ran prettier on docs

* Get arrays off the gpu before iterating over them

* Remove int() calls

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2021-08-10 13:47:49 +02:00
Duygu Altinok
380b2817cf
updated unv json for new book 2021-08-09 12:39:22 +02:00
Paul O'Leary McCann
cac298471f
Fix #8902 (bad link in docs)
typo fix
2021-08-08 22:04:00 +09:00
Adriane Boyd
175847f92c
Support list values and INTERSECTS in Matcher (#8784)
* Support list values and IS_INTERSECT in Matcher

* Support list values as token attributes for set operators, not just as
pattern values.

* Add `IS_INTERSECT` operator.

* Fix incorrect `ISSUBSET` and `ISSUPERSET` in schema and docs.

* Rename IS_INTERSECT to INTERSECTS
2021-08-02 19:39:26 +02:00
Ines Montani
30f20496d5
Merge pull request #8840 from polm/docs/evaluate-speed [ci skip] 2021-07-30 09:10:15 +10:00
Ines Montani
65d163fab5
Adjust formatting [ci skip] 2021-07-30 09:10:04 +10:00
Ines Montani
3a701d3645
Merge pull request #8841 from adrianeboyd/docs/ent-id-sep [ci skip]
Fix formatting of ent_id_sep in EntityRuler API docs
2021-07-30 09:09:25 +10:00
thomashacker
02258916c8 Fix example config typo for transformer architecture 2021-07-29 11:19:40 +02:00
Adriane Boyd
15b12f3e35 Fix formatting of ent_id_sep in EntityRuler API docs 2021-07-29 10:10:12 +02:00
Paul O'Leary McCann
a60cb13910 Update speed entry in metrics table 2021-07-29 16:35:19 +09:00
Paul O'Leary McCann
e125313a50 Revert "Add note about SPEED in output"
This reverts commit c92d268176.
2021-07-29 16:34:08 +09:00
Ines Montani
0a1e299d30
Merge pull request #8814 from polm/docs/migrate-lexeme-tables [ci skip] 2021-07-29 17:18:02 +10:00
Paul O'Leary McCann
c92d268176 Add note about SPEED in output
In #8823 it was pointed out that the `SPEED` value wasn't documented
anywhere.
2021-07-29 15:03:07 +09:00
Paul O'Leary McCann
8867e60fbb
Update website/docs/usage/v3.md
Co-authored-by: Ines Montani <ines@ines.io>
2021-07-29 14:56:56 +09:00
Adriane Boyd
8547514aa4
Remove labels from textcat component config example (#8815) 2021-07-27 13:14:38 +02:00
Paul O'Leary McCann
76ac95923a Add note to migration guide about lexeme tables (fix #7290)
This just adds the resolution from #6388 to the docs.
2021-07-27 19:19:25 +09:00
Paul O'Leary McCann
67ecdcc3ac
Update subset/superset docs (#8795)
* Update subset/superset docs

* Update website/docs/usage/rule-based-matching.md

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2021-07-27 12:08:46 +02:00
Ines Montani
134cb06af3
Merge pull request #8808 from kevinlu1248/master [ci skip]
Changed a CLI command in data-formats.md due to erroneous information
2021-07-27 12:15:16 +10:00
Kevin Lu
4a8e9e4e4e
Update data-formats.md 2021-07-25 22:58:53 -07:00
Ledenel
413f745c68 fix broken example in spaCy universe Chatterbot 2021-07-25 15:53:32 +00:00
Paul O'Leary McCann
d717593eb7
Merge pull request #8754 from KennethEnevoldsen/patch-1
[minor] removed outdated spacy version for spacymoji
2021-07-18 19:17:33 +09:00
Kenneth Enevoldsen
5d6aed0773
fixed GitHub link and thumbnail
Sorry, I seem to have misunderstood that the GitHub reference shouldn't be a link.
2021-07-18 10:22:00 +02:00
Ines Montani
313f55e560 Fix JSON [ci skip] 2021-07-18 13:21:33 +10:00
Ines Montani
51e5903d6f
Merge pull request #8702 from KennethEnevoldsen/master [ci skip] 2021-07-18 13:18:42 +10:00
Kenneth Enevoldsen
8546948fba
removed outdated spacy version for spacymoji
From the documentation of spacymoji (and the requirements.txt) it seems like it is not only for version 2.
2021-07-17 15:19:43 +02:00
Kenneth Enevoldsen
a0e0ccdb46
Update website/meta/universe.json
Co-authored-by: Ines Montani <ines@ines.io>
2021-07-17 07:14:46 +02:00
Mario Šaško
1ba2e8a646
Add TakeLab/spacy-udpipe to Universe (#8698)
* Add TakeLab/spacy-udpipe to universe

* Add SCA

* Sign SCA
2021-07-16 11:15:52 +02:00
Adriane Boyd
f5acc48111
Remove TrainablePipe as base class for Lemmatizer in API docs (#8725) 2021-07-15 16:41:36 +02:00
Sofie Van Landeghem
77859beb99
spacy.ngram_range_suggester.v1 (#8699) 2021-07-15 10:01:22 +02:00
Ines Montani
2a8eeed5da
Merge pull request #8703 from thomashacker/update/spacy-stanza [ci skip]
Update spacy-stanza universe.json
2021-07-13 19:03:42 +10:00
thomashacker
aafb89df78 Update universe.json code_example 2021-07-13 10:22:49 +02:00
Kenneth Enevoldsen
94ce904e10
added missing comma 2021-07-13 09:59:34 +02:00
Kenneth Enevoldsen
a81fcc81b0
added dacy to universe 2021-07-13 09:54:08 +02:00
Ines Montani
50000d37e4
Avoid double parentheses [ci skip] 2021-07-10 10:52:01 +10:00
Calum Sieppert
e2d53aa1a6
Typo fixes 2021-07-09 10:25:56 -06:00
Adriane Boyd
1ee5bee29d
Add Macedonian models to website (#8637) 2021-07-08 09:32:14 +02:00
Paul O'Leary McCann
1d9209d43a
Merge pull request #8547 from mylibrar/update-universe
Add forte to universe.json
2021-07-08 14:59:49 +09:00
Ines Montani
39c8f7949e Add code preview for textcat_multilabel [ci skip] 2021-07-08 13:33:25 +10:00
Calum Sieppert
889c187bc2
Typo fixes 2021-07-07 16:53:04 -06:00
Adriane Boyd
6db647dfe0 Update v3.1 usage docs 2021-07-07 08:43:33 +02:00
Sofie Van Landeghem
64fac754fe
add spacy prefix to ngram_suggester.v1 (#8623) 2021-07-07 08:09:30 +02:00
Sofie Van Landeghem
e7d747e3ee
TransitionBasedParser.v1 to legacy (#8586)
* TransitionBasedParser.v1 to legacy

* register sublayers

* bump spacy-legacy to 3.0.7
2021-07-06 15:26:45 +02:00
Ines Montani
04a9ade40f
Merge pull request #8466 from explosion/docs/new-in-v3-1 [ci skip] 2021-07-06 22:20:24 +10:00
Sofie Van Landeghem
b9f59118bf
Fix silent evaluation (#8581)
* fix silentness

* sneak in docs typo fix

* pass silent boolean instead
2021-07-06 14:16:19 +02:00
Adriane Boyd
29906884c5
Raise an error for textcat with <2 labels (#8584)
* Raise an error for textcat with <2 labels

Raise an error if initializing a `textcat` component without at least
two labels.

* Add similar note to docs

* Update positive_label description in API docs
2021-07-06 12:35:22 +02:00
Ines Montani
5bb7fe4b41 Update with HF hub integration [ci skip] 2021-07-06 19:30:59 +10:00
Cass
7d13fc799b
Fix a command typo in models.md
"dowmload" -> "download"
2021-07-05 18:44:18 -07:00
Ines Montani
8423864b50
Add docs notes on installing models from Python and in Jupyter [ci skip] (#8597) 2021-07-05 13:49:20 +02:00
Yoichiro Hasebe
596e04cbb4
Github repo info fixed for ruby-spacy 2021-07-04 18:55:17 +09:00
Yoichiro Hasebe
2bdfa42107
Update universe.json 2021-07-04 08:44:39 +09:00
Suqi Sun
3901507df8 Update pip 2021-06-30 16:44:43 -04:00
Suqi Sun
61c868ed75 Update pip and code example 2021-06-30 14:49:51 -04:00
Ines Montani
af9d984407
Merge pull request #8405 from svlandeg/fix/whitespace_tokenizer [ci skip] 2021-06-30 20:52:59 +10:00
Suqi Sun
4331c40b78 Add forte to universe.json 2021-06-29 16:17:22 -04:00
Adriane Boyd
41292a1b84 Add note about updating with fill-config 2021-06-29 10:45:36 +02:00
Nick Sorros
bb781ae7f7
Remove extra parenthesis from the example for spacy-streamlit (#8527) 2021-06-28 14:03:31 +02:00
Adriane Boyd
4d1ef8f695 Tidy up docs 2021-06-28 12:08:15 +02:00
Ines Montani
93572dc12a
Merge pull request #8505 from bryant1410/patch-2 [ci skip]
Fix double slash in model release web page
2021-06-28 12:51:06 +10:00
Kevin
1a3e7cc5ef Updated PyATE syntax to fit spaCy V3 2021-06-26 17:52:41 -07:00
Santiago Castro
2e71944e1e
Fix double slash in model release web page 2021-06-25 19:19:10 -07:00
Ines Montani
4544412442
Update wording [ci skip]
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-06-25 13:52:48 +10:00
Ines Montani
0d2e2b59bc Update intro [ci skip] 2021-06-24 22:53:20 +10:00
Matthew Honnibal
f9946154d9
Add SpanCategorizer component (#6747)
* Draft spancat model

* Add spancat model

* Add test for extract_spans

* Add extract_spans layer

* Upd extract_spans

* Add spancat model

* Add test for spancat model

* Upd spancat model

* Update spancat component

* Upd spancat

* Update spancat model

* Add quick spancat test

* Import SpanCategorizer

* Fix SpanCategorizer component

* Import SpanGroup

* Fix span extraction

* Fix import

* Fix import

* Upd model

* Update spancat models

* Add scoring, update defaults

* Update and add docs

* Fix type

* Update spacy/ml/extract_spans.py

* Auto-format and fix import

* Fix comment

* Fix type

* Fix type

* Update website/docs/api/spancategorizer.md

* Fix comment

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Better defense

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Fix labels list

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update spacy/ml/extract_spans.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update spacy/pipeline/spancat.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Set annotations during update

* Set annotations in spancat

* fix imports in test

* Update spacy/pipeline/spancat.py

* replace MaxoutLogistic with LinearLogistic

* fix config

* various small fixes

* remove set_annotations parameter in update

* use our beloved tupley format with recent support for doc.spans

* bugfix to allow renaming the default span_key (scores weren't showing up)

* use different key in docs example

* change defaults to better-working parameters from project (WIP)

* register spacy.extract_spans.v1 for legacy purposes

* Upd dev version so can build wheel

* layers instead of architectures for smaller building blocks

* Update website/docs/api/spancategorizer.md

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update website/docs/api/spancategorizer.md

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Include additional scores from overrides in combined score weights

* Parameterize spans key in scoring

Parameterize the `SpanCategorizer` `spans_key` for scoring purposes so
that it's possible to evaluate multiple `spancat` components in the same
pipeline.

* Use the (intentionally very short) default spans key `sc` in the
  `SpanCategorizer`
* Adjust the default score weights to include the default key
* Adjust the scorer to use `spans_{spans_key}` as the prefix for the
  returned score
* Revert addition of `attr_name` argument to `score_spans` and adjust
  the key in the `getter` instead.

Note that for `spancat` components with a custom `span_key`, the score
weights currently need to be modified manually in
`[training.score_weights]` for them to be available during training. To
suppress the default score weights `spans_sc_p/r/f` during training, set
them to `null` in `[training.score_weights]`.

* Update website/docs/api/scorer.md

* Fix scorer for spans key containing underscore

* Increment version

* Add Spans to Evaluate CLI (#8439)

* Add Spans to Evaluate CLI

* Change to spans_key

* Add spans per_type output

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Fix spancat GPU issues (#8455)

* Fix GPU issues

* Require thinc >=8.0.6

* Switch to glorot_uniform_init

* Fix and test ngram suggester

* Include final ngram in doc for all sizes
* Fix ngrams for docs of the same length as ngram size
* Handle batches of docs that result in no ngrams
* Add tests

Co-authored-by: Ines Montani <ines@ines.io>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: svlandeg <sofie.vanlandeghem@gmail.com>
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
Co-authored-by: Nirant <NirantK@users.noreply.github.com>
2021-06-24 12:35:27 +02:00
Ines Montani
68721af628 Formatting and preliminary intro [ci skip] 2021-06-24 20:32:23 +10:00
Adriane Boyd
92dc6b409e Notes on source with vectors 2021-06-24 10:34:07 +02:00
Adriane Boyd
35425d7e26 Add details for Catalan and Danish 2021-06-24 10:10:33 +02:00
Ines Montani
5daf450f51 Update upgrading notes [ci skip] 2021-06-24 18:06:28 +10:00
Ines Montani
528746129d Merge branch 'master' into docs/new-in-v3-1 2021-06-24 13:11:37 +10:00
Ines Montani
a8e8d02ba7
Merge pull request #8465 from explosion/feature/spacy-package-readme 2021-06-24 13:11:08 +10:00
Ines Montani
3e058dee62 Update features [ci skip] 2021-06-24 12:36:04 +10:00
Ines Montani
40f13c3f0c Add docs [ci skip] 2021-06-24 11:57:15 +10:00
Ines Montani
a1e4aca267 Fix sentence [ci skip] 2021-06-24 11:40:36 +10:00
Ines Montani
ca0d904faa Update details [ci skip] 2021-06-23 13:05:56 +10:00
themrmax
d96c422cfc
Fix broken link
change /api/registry to /api/top-level#registry
2021-06-22 15:34:06 -07:00
Ines Montani
e9b68d4f4c Update details and add example [ci skip] 2021-06-22 17:51:03 +10:00
Nick Sorros
31504f5982
Switch model and data path in prodigy project.yml recipe (#8467) 2021-06-22 09:41:45 +02:00
Ines Montani
bc93c34f54 Add "New in v3.1" guide 2021-06-22 15:23:18 +10:00
Adriane Boyd
e39d1bd4ab
Various docs updates for v3.1 (#8406)
* Update for Catalan/Italian lemmatizer changes

* Add warning about relevance of section
2021-06-21 09:33:50 +02:00
Ines Montani
02d2fdb123 Add link anchor [ci skip] 2021-06-20 11:29:19 +10:00
Matthew Honnibal
6f5e308d17
Support negative examples in partial NER annotations (#8106)
* Support a cfg field in transition system

* Make NER 'has gold' check use right alignment for span

* Pass 'negative_samples_key' property into NER transition system

* Add field for negative samples to NER transition system

* Check neg_key in NER has_gold

* Support negative examples in NER oracle

* Test for negative examples in NER

* Fix name of config variable in NER

* Remove vestiges of old-style partial annotation

* Remove obsolete tests

* Add comment noting lack of support for negative samples in parser

* Additions to "neg examples" PR (#8201)

* add custom error and test for deprecated format

* add test for unlearning an entity

* add break also for Begin's cost

* add negative_samples_key property on Parser

* rename

* extend docs & fix some older docs issues

* add subclass constructors, clean up tests, fix docs

* add flaky test with ValueError if gold parse was not found

* remove ValueError if n_gold == 0

* fix docstring

* Hack in environment variables to try out training

* Remove hack

* Remove NER hack, and support 'negative O' samples

* Fix O oracle

* Fix transition parser

* Remove 'not O' from oracle

* Fix NER oracle

* check for spans in both gold.ents and gold.spans and raise if so, to prevent memory access violation

* use set instead of list in consistency check

Co-authored-by: svlandeg <sofie.vanlandeghem@gmail.com>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-06-17 17:33:00 +10:00
svlandeg
bb9d2f1546 extend example to ensure the text is preserved 2021-06-16 23:56:35 +02:00
Sofie Van Landeghem
e796aab4b3
Resizable textcat (#7862)
* implement textcat resizing for TextCatCNN

* resizing textcat in-place

* simplify code

* ensure predictions for old textcat labels remain the same after resizing (WIP)

* fix for softmax

* store softmax as attr

* fix ensemble weight copy and cleanup

* restructure slightly

* adjust documentation, update tests and quickstart templates to use latest versions

* extend unit test slightly

* revert unnecessary edits

* fix typo

* ensemble architecture won't be resizable for now

* use resizable layer (WIP)

* revert using resizable layer

* resizable container while avoid shape inference trouble

* cleanup

* ensure model continues training after resizing

* use fill_b parameter

* use fill_defaults

* resize_layer callback

* format

* bump thinc to 8.0.4

* bump spacy-legacy to 3.0.6
2021-06-16 11:45:00 +02:00
svlandeg
29d83dec0c adjust whitespace tokenizer to avoid sep in split() 2021-06-16 10:58:45 +02:00
Adriane Boyd
5646fcbe46 Merge remote-tracking branch 'upstream/develop' into chore/develop-into-master-v3.1 2021-06-15 15:05:17 +02:00
Sofie Van Landeghem
0fd0d949c4
fix 's typo's across code base (#8384) 2021-06-15 10:57:08 +02:00
Adriane Boyd
507422149f
Various docs updates for v3.0 (#8353)
* Update cats score names in Scorer API docs

* Refer to performance in meta

* Update package naming/versions, lemmatizer details

* Minor formatting fixes

* Provide more explanation for cats_score_desc

* Provide language-specific lemmatizer defaults in API docs

Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>
2021-06-14 12:19:36 +02:00
Adriane Boyd
63d748f80e
Add Catalan and Danish trf to website models (#8378) 2021-06-14 09:50:13 +02:00