Commit Graph

998 Commits

Author SHA1 Message Date
Adriane Boyd
d98d525bc8 Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master-v3.1-3 2021-10-14 09:41:46 +02:00
Paul O'Leary McCann
b53e39455e
Fix UD POS docs links (fix #9013) (#9407)
* Fix UD POS docs links (fix #9013)

The previous link seems to have been for UD v1.

* Fix link
2021-10-11 11:51:19 +02:00
Paul O'Leary McCann
1ee6541ab0
Moving Japanese tokenizer extra info to Token.morph (#8977)
* Use morph for extra Japanese tokenizer info

Previously Japanese tokenizer info that didn't correspond to Token
fields was put in user data. Since spaCy core should avoid touching user
data, this moves most information to the Token.morph attribute. It also
adds the normalized form, which wasn't exposed before.

The subtokens, which are a list of full tokens, are still added to user
data, except with the default tokenizer granularity. With the default
tokenizer settings the subtokens are all None, so in this case the user
data is simply not set.

* Update tests

Also adds a new test for norm data.

* Update docs

* Add Japanese morphologizer factory

Set the default to `extend=True` so that the morphologizer does not
clobber the values set by the tokenizer.

* Use the norm_ field for normalized forms

Before this commit, normalized forms were put in the "norm" field in the
morph attributes. I am not sure why I did that instead of using the
token norm; I think I just forgot about it.

* Skip test if sudachipy is not installed

* Fix import

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2021-10-01 19:19:26 +02:00
Paul O'Leary McCann
6e833b617a
Updating Troubleshooting Docs (#9329)
* Add link to Discussions FAQ

* Remove old FAQ entries

I think these are no longer relevant.

- no-cache-dir: affected pip versions are *very* old now
- narrow unicode: not an issue from py3.3+
- utf-8 osx: upstream bug closed in 2019

Some of the other issues may also no longer come up often.
2021-10-01 12:28:22 +02:00
Elia Robyn Lake (Robyn Speer)
5b0b0ca809
Move WandB loggers into spacy-loggers (#9223)
* factor out the WandB logger into spacy-loggers

Signed-off-by: Elia Robyn Speer <gh@arborelia.net>

* depend on spacy-loggers so they are available

Signed-off-by: Elia Robyn Speer <gh@arborelia.net>

* remove docs of spacy.WandbLogger.v2 (moved to spacy-loggers)

Signed-off-by: Elia Robyn Speer <elia@explosion.ai>

* Version number suggestions from code review

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* update references to WandbLogger

Signed-off-by: Elia Robyn Speer <elia@explosion.ai>

* make order of deps more consistent

Signed-off-by: Elia Robyn Speer <elia@explosion.ai>

Co-authored-by: Elia Robyn Speer <elia@explosion.ai>
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2021-09-29 11:12:50 +02:00
Ines Montani
6bb0324b81 Adjust kb_id visualizer templating and docs 2021-09-23 11:59:02 +02:00
Ines Montani
beb4a8c524
Merge pull request #9199 from shigapov/master (resolves #9129) 2021-09-23 19:41:53 +10:00
Paul O'Leary McCann
1d57d78758 Make docs consistent (fix #9126) 2021-09-16 15:54:12 +09:00
Renat Shigapov
d5cc009faf
Merge branch 'explosion:master' into master 2021-09-13 08:43:48 +02:00
Renat Shigapov
e61d93f8c3
add NEL-visualisation to manual-usage 2021-09-13 08:38:58 +02:00
Paul O'Leary McCann
f89e1c34c9
Minor typo fix in docs 2021-09-11 14:22:05 +09:00
Sofie Van Landeghem
8895e3c9ad
matcher doc corrections (#9115)
* update error message to current UX

* clarify uppercase effect

* fix docstring
2021-09-02 09:26:33 +02:00
Robyn Speer
d60b748e3c
Fix surprises when asking for the root of a git repo (#9074)
* Fix surprises when asking for the root of a git repo

In the case of the first asset I wanted to get from git, the data I
wanted was the entire repository. I tried leaving "path" blank, which
gave a less-than-helpful error, and then I tried `path: "/"`, which
started copying my entire filesystem into the project. The path I should
have used was "".

I've made two changes to make this smoother for others:

- The 'path' within a git clone defaults to ""
- If the path points outside of the tmpdir that the git clone goes
into, we fail with an error

Signed-off-by: Elia Robyn Speer <elia@explosion.ai>

* use a descriptive error instead of a default

plus some minor fixes from PR review

Signed-off-by: Elia Robyn Speer <elia@explosion.ai>

* check for None values in assets

Signed-off-by: Elia Robyn Speer <elia@explosion.ai>

Co-authored-by: Elia Robyn Speer <elia@explosion.ai>
2021-09-01 22:52:08 +02:00
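The containment check described in #9074 can be sketched in plain Python. This is not spaCy's implementation. `resolve_asset_path` is a hypothetical helper that assumes the clone lives in a temporary directory. It shows why `path: "/"` escaped the clone, because joining an absolute path discards the base directory, and how an empty path maps to the repository root.

```python
from pathlib import Path
from typing import Optional


def resolve_asset_path(clone_dir: str, path: Optional[str]) -> Path:
    """Hypothetical helper: resolve `path` inside `clone_dir` or fail loudly."""
    if path is None:
        # Descriptive error instead of silently picking a default.
        raise ValueError("Git asset is missing 'path'; use \"\" for the whole repo")
    root = Path(clone_dir).resolve()
    # Path(root) / "" is just root, while Path(root) / "/" is "/":
    # an absolute path replaces the base directory entirely.
    target = (root / path).resolve()
    if target != root and root not in target.parents:
        raise ValueError(f"Asset path {path!r} points outside the git clone at {root}")
    return target
```

Comparing against `target.parents` rather than string prefixes avoids false matches such as `/tmp/clone2` being treated as inside `/tmp/clone`.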
Sofie Van Landeghem
4d39430b82
Document use-case of freezing tok2vec (#8992)
* update error msg

* add sentence to docs

* expand note on frozen components
2021-08-26 09:50:35 +02:00
Paul O'Leary McCann
9391998c77
Add notes on preparing training data to docs (#8964)
* Add training data section

Not entirely sure this is in the right location on the page - maybe it
should be after quickstart?

* Add pointer from binary format to training data section

* Minor cleanup

* Add to ToC, fix filename

* Update website/docs/usage/training.md

Co-authored-by: Ines Montani <ines@ines.io>

* Update website/docs/usage/training.md

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update website/docs/usage/training.md

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Move the training data section further down the page

* Update website/docs/usage/training.md

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update website/docs/usage/training.md

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Run prettier

Co-authored-by: Ines Montani <ines@ines.io>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-08-16 17:37:21 +02:00
Ines Montani
4f769ff913 Update Prodigy project template for v1.11 [ci skip] 2021-08-12 13:46:20 +10:00
Adriane Boyd
175847f92c
Support list values and INTERSECTS in Matcher (#8784)
* Support list values and IS_INTERSECT in Matcher

* Support list values as token attributes for set operators, not just as
pattern values.

* Add `IS_INTERSECT` operator.

* Fix incorrect `ISSUBSET` and `ISSUPERSET` in schema and docs.

* Rename IS_INTERSECT to INTERSECTS
2021-08-02 19:39:26 +02:00
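The set operators touched by #8784 compare a token value against the pattern's list of values. The snippet below is a plain-Python model of those semantics, not spaCy's Matcher code. `set_op_matches` is a hypothetical name. It assumes a single string value behaves like a one-element set and that a list-valued attribute (for example MORPH features such as `"Number=Sing"`) is compared as a set.

```python
from typing import Iterable, Union


def _as_set(value: Union[str, Iterable[str]]) -> set:
    # A single value acts as a one-element set; list values are used as-is.
    return {value} if isinstance(value, str) else set(value)


def set_op_matches(token_value: Union[str, Iterable[str]], op: str,
                   pattern_values: Iterable[str]) -> bool:
    """Hypothetical model of IS_SUBSET / IS_SUPERSET / INTERSECTS."""
    token_set = _as_set(token_value)
    pattern_set = set(pattern_values)
    if op == "IS_SUBSET":
        return token_set <= pattern_set      # every token value is allowed
    if op == "IS_SUPERSET":
        return token_set >= pattern_set      # token has all pattern values
    if op == "INTERSECTS":
        return bool(token_set & pattern_set)  # at least one value shared
    raise ValueError(f"Unsupported operator: {op}")
```

In a Matcher pattern the same idea is written as a dict on the attribute, e.g. `{"MORPH": {"IS_SUPERSET": ["Number=Sing", "Gender=Neut"]}}`.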
Ines Montani
30f20496d5
Merge pull request #8840 from polm/docs/evaluate-speed [ci skip] 2021-07-30 09:10:15 +10:00
Ines Montani
65d163fab5
Adjust formatting [ci skip] 2021-07-30 09:10:04 +10:00
Paul O'Leary McCann
a60cb13910 Update speed entry in metrics table 2021-07-29 16:35:19 +09:00
Paul O'Leary McCann
8867e60fbb
Update website/docs/usage/v3.md
Co-authored-by: Ines Montani <ines@ines.io>
2021-07-29 14:56:56 +09:00
Paul O'Leary McCann
76ac95923a Add note to migration guide about lexeme tables (fix #7290)
This just adds the resolution from #6388 to the docs.
2021-07-27 19:19:25 +09:00
Paul O'Leary McCann
67ecdcc3ac
Update subset/superset docs (#8795)
* Update subset/superset docs

* Update website/docs/usage/rule-based-matching.md

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2021-07-27 12:08:46 +02:00
Ines Montani
50000d37e4
Avoid double parentheses [ci skip] 2021-07-10 10:52:01 +10:00
Calum Sieppert
e2d53aa1a6
Typo fixes 2021-07-09 10:25:56 -06:00
Calum Sieppert
889c187bc2
Typo fixes 2021-07-07 16:53:04 -06:00
Adriane Boyd
6db647dfe0 Update v3.1 usage docs 2021-07-07 08:43:33 +02:00
Ines Montani
04a9ade40f
Merge pull request #8466 from explosion/docs/new-in-v3-1 [ci skip] 2021-07-06 22:20:24 +10:00
Sofie Van Landeghem
b9f59118bf
Fix silent evaluation (#8581)
* fix silentness

* sneak in docs typo fix

* pass silent boolean instead
2021-07-06 14:16:19 +02:00
Ines Montani
5bb7fe4b41 Update with HF hub integration [ci skip] 2021-07-06 19:30:59 +10:00
Cass
7d13fc799b
Fix a command typo in models.md
"dowmload" -> "download"
2021-07-05 18:44:18 -07:00
Ines Montani
8423864b50
Add docs notes on installing models from Python and in Jupyter [ci skip] (#8597) 2021-07-05 13:49:20 +02:00
Ines Montani
af9d984407
Merge pull request #8405 from svlandeg/fix/whitespace_tokenizer [ci skip] 2021-06-30 20:52:59 +10:00
Adriane Boyd
41292a1b84 Add note about updating with fill-config 2021-06-29 10:45:36 +02:00
Ines Montani
4544412442
Update wording [ci skip]
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-06-25 13:52:48 +10:00
Ines Montani
0d2e2b59bc Update intro [ci skip] 2021-06-24 22:53:20 +10:00
Ines Montani
68721af628 Formatting and preliminary intro [ci skip] 2021-06-24 20:32:23 +10:00
Adriane Boyd
92dc6b409e Notes on source with vectors 2021-06-24 10:34:07 +02:00
Adriane Boyd
35425d7e26 Add details for Catalan and Danish 2021-06-24 10:10:33 +02:00
Ines Montani
5daf450f51 Update upgrading notes [ci skip] 2021-06-24 18:06:28 +10:00
Ines Montani
528746129d Merge branch 'master' into docs/new-in-v3-1 2021-06-24 13:11:37 +10:00
Ines Montani
3e058dee62 Update features [ci skip] 2021-06-24 12:36:04 +10:00
Ines Montani
a1e4aca267 Fix sentence [ci skip] 2021-06-24 11:40:36 +10:00
Ines Montani
ca0d904faa Update details [ci skip] 2021-06-23 13:05:56 +10:00
themrmax
d96c422cfc
Fix broken link
change /api/registry to /api/top-level#registry
2021-06-22 15:34:06 -07:00
Ines Montani
e9b68d4f4c Update details and add example [ci skip] 2021-06-22 17:51:03 +10:00
Nick Sorros
31504f5982
Switch model and data path in prodigy project.yml recipe (#8467) 2021-06-22 09:41:45 +02:00
Ines Montani
bc93c34f54 Add "New in v3.1" guide 2021-06-22 15:23:18 +10:00
Ines Montani
02d2fdb123 Add link anchor [ci skip] 2021-06-20 11:29:19 +10:00
svlandeg
bb9d2f1546 extend example to ensure the text is preserved 2021-06-16 23:56:35 +02:00