Commit Graph

3064 Commits

Author SHA1 Message Date
Adriane Boyd
724831b066 Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master
* Update Macedonian for v3
* Update Turkish for v3
2020-11-25 11:49:34 +01:00
Jacob Bortell
fe9009911a Update rule-based-matching.md (#6421)
* Update rule-based-matching.md

Clarified case sensitivity of dictionary-referencing attributes (POS/TAG/DEP/etc).

Clarified "Type" column header to "Value Type"

* Update rule-based-matching.md

Improved clarity of wording
2020-11-24 16:20:19 +01:00
Adriane Boyd
6f133877aa Update source install instructions
* Don't recommend an editable install in the default source
instructions.
* Use `pip install --no-build-isolation` for editable installs.
* Remove reference to `virtualenv`.
2020-11-24 14:44:13 +01:00
Yusuke Mori
e3ac90b035
Avoid a SyntaxError in self-attentive-parser (#6428)
* Avoid a SyntaxError in self-attentive-parser

Fix a usage of quotation marks in the example of spaCy Universe self-attentive-parser

* Create forest1988.md

Fill in the spaCy contributor agreement
2020-11-22 21:59:37 +01:00
svlandeg
218abaa69a typo 2020-11-20 22:36:49 +01:00
svlandeg
e861e928df more small corrections 2020-11-20 22:29:58 +01:00
svlandeg
5ac0867427 final fixes 2020-11-20 22:18:53 +01:00
svlandeg
331ec83493 edits and updates to implementing REL component docs 2020-11-20 21:41:52 +01:00
svlandeg
4a3e611abc small fixes and formatting 2020-11-20 15:55:05 +01:00
svlandeg
124f49feb6 update REL model code 2020-11-20 15:25:20 +01:00
svlandeg
636be3c791 Merge remote-tracking branch 'upstream/develop' into feature/trf-docs 2020-11-19 14:15:35 +01:00
Sofie Van Landeghem
165993d8e5
fix typo in transformer docs (#6404) 2020-11-19 14:11:38 +01:00
M. Revuelta Espinosa
51232ffb9e
Update universe.json (include PatternOmatic) (#6399)
Request to include PatternOmatic in spaCy Universe

Adds @revuel to contributors
2020-11-19 13:15:50 +01:00
Adriane Boyd
3cf6479467 Fix JSON in #6395 2020-11-17 15:25:41 +01:00
Sam Edwardes
78913a4f95
Added spaCyTextBlob to universe.json (#6395) 2020-11-17 14:38:34 +01:00
Adriane Boyd
96726ec1f6
Fix DocBin init in training example (#6396) 2020-11-17 14:36:44 +01:00
Adriane Boyd
ed32fa80cd Update source install instructions
* Use `pip install` instead of `python setup.py install`
* For developers recommend:
  * `python setup.py build_ext --inplace -j N`
  * `python setup.py develop`
2020-11-16 10:13:51 +01:00
svlandeg
99d0412b6e add link to REL project 2020-11-15 18:35:56 +01:00
svlandeg
73fc1ed963 remove labels from morphologizer constructor 2020-11-11 21:48:50 +01:00
svlandeg
fcd79e0655 remove set_morphology from docs 2020-11-11 21:32:34 +01:00
Ines Montani
3ca5c7082d Use pip install . in quickstart [ci skip] 2020-11-10 17:27:49 +08:00
Ines Montani
de6453940e
Merge pull request #6305 from svlandeg/feature/score-docs [ci skip] 2020-11-10 02:52:11 +01:00
Ines Montani
4d337eedf2
Merge pull request #6322 from medspacy/master 2020-11-10 02:47:29 +01:00
Ines Montani
d7950c5ada
Merge pull request #6297 from adrianeboyd/docs/nightly-conda-install [ci skip] 2020-11-10 02:45:52 +01:00
Ines Montani
448bfbdc30 Remove conda from nightly install widget [ci skip] 2020-11-10 09:44:52 +08:00
svlandeg
789fb3d124 add docs for upstream argument of TransformerListener 2020-11-09 21:42:58 +01:00
Ines Montani
363ac73c72 Update docs [ci skip] 2020-11-09 12:43:26 +08:00
Adriane Boyd
8644ee3e3f
Update TIGER link and tag description (#6344) 2020-11-05 09:33:00 +01:00
Sofie Van Landeghem
8ef056cf98
fix embed_size in Entity Linker architecture (#6343) 2020-11-04 22:20:13 +01:00
Ines Montani
019a1dd5e8 Fix v3 overview [ci skip] 2020-11-03 18:10:06 +01:00
Adriane Boyd
a4b32b9552
Handle missing reference values in scorer (#6286)
* Handle missing reference values in scorer

Handle missing values in reference doc during scoring where it is
possible to detect an unset state for the attribute. If no reference
docs contain annotation, `None` is returned instead of a score. `spacy
evaluate` displays `-` for missing scores and the missing scores are
saved as `None`/`null` in the metrics.

Attributes without unset states:

* `token.head`: relies on `token.dep` to recognize unset values
* `doc.cats`: unable to handle missing annotation

Additional changes:

* add optional `has_annotation` check to `score_spans` to replace
`doc.sents` hack
* update `score_token_attr_per_feat` to handle missing and empty morph
representations
* fix bug in `Doc.has_annotation` for normalization of `IS_SENT_START`
vs. `SENT_START`

* Fix import

* Update return types
2020-11-03 15:47:18 +01:00
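The missing-score behaviour described in a4b32b9552 can be sketched roughly as follows. This is a toy model written for illustration, not spaCy's scorer: `accuracy_or_none` and `display` are hypothetical names, and unset reference values are assumed to be represented as `None`.

```python
import json
from typing import Optional, Sequence


def accuracy_or_none(
    gold: Sequence[Optional[str]], pred: Sequence[str]
) -> Optional[float]:
    """Toy accuracy that skips tokens whose reference value is unset.

    Hypothetical helper: returns None (not 0.0) when no reference
    annotation exists at all, mirroring the commit description.
    """
    scored = [(g, p) for g, p in zip(gold, pred) if g is not None]
    if not scored:
        return None  # nothing to score against
    correct = sum(1 for g, p in scored if g == p)
    return correct / len(scored)


def display(score: Optional[float]) -> str:
    """Render a missing score as '-' (assumed display convention)."""
    return "-" if score is None else f"{score:.2f}"


# Missing scores serialize to null when saved as JSON metrics.
metrics = {"tag_acc": accuracy_or_none([None, None], ["NN", "VB"])}
metrics_json = json.dumps(metrics)
```

Returning `None` rather than `0.0` keeps "nothing to evaluate" distinguishable from "everything was wrong" in the saved metrics.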
Alec Chapman
204c7c8a00 fix thumbnail link to be github raw url 2020-11-01 07:53:48 -07:00
Alec Chapman
73d22d96ff add medspacy to universe and fix example w/ cov-bsv 2020-10-29 07:53:56 -06:00
Adriane Boyd
8cc5ed6771 Add Macedonian to website languages 2020-10-29 08:49:56 +01:00
Adriane Boyd
dc816bba9d
Fix node name typo in dependency matcher example (#6311) 2020-10-28 16:32:46 +01:00
Adriane Boyd
4dd86306e9
Add Nepali to supported languages on website (#6315) 2020-10-28 16:32:07 +01:00
svlandeg
77688b0072 fix config 2020-10-26 11:14:34 +01:00
svlandeg
5878ff6bcd cleanup 2020-10-26 11:13:02 +01:00
svlandeg
e95d9caa87 small edits 2020-10-26 11:09:25 +01:00
svlandeg
a664994a81 adding score method to explanation of new component 2020-10-26 10:52:47 +01:00
Adriane Boyd
253480353c Remove zh from quickstart extras 2020-10-23 11:39:25 +02:00
Adriane Boyd
af26886fff Fix formatting 2020-10-23 11:38:14 +02:00
Adriane Boyd
c0b76f4c19 Add install step to "Compile from source" 2020-10-23 11:36:36 +02:00
Adriane Boyd
8fe7ede667 Add install step to source install quickstart 2020-10-23 11:34:43 +02:00
Adriane Boyd
4299a7f654 Setup / install / quickstart updates
* Add `cuda110` to setup.cfg and quickstart dropdown
* Switch to `pip` for pip-only packages in conda quickstart instructions
* Update zh pkuseg install message with version range and conda
* Remove `zh` from `extras_require` because the default doesn't require
additional packages
2020-10-23 11:27:54 +02:00
Kunal Sharma
01aec7a313
Adding MindMeld to Universe JSON (#6275)
* Adding Mindmeld to Universe JSON

Mindmeld is a conversational AI platform for deep-domain voice interfaces and chatbots. https://www.mindmeld.com/

* Signing contribution agreement.

Co-authored-by: kunshar2 <kunshar2@cisco.com>
2020-10-21 18:42:11 +02:00
Ines Montani
6523f2daac
Merge pull request #6273 from adrianeboyd/bugfix/detailed-scores-in-evaluate2 2020-10-20 10:03:09 +02:00
Adriane Boyd
fbe65b257b Convert accuracy numbers on website models page 2020-10-19 18:55:55 +02:00
Ines Montani
b6b1c1e23c
Merge pull request #6271 from walterhenry/develop-proof [ci skip] 2020-10-19 16:31:43 +02:00
walterhenry
db24dc5614 Proofread remarks
I think these may be the last remarks for the nightly docs. Only two minor things actually.
2020-10-19 11:11:32 +02:00
Sofie Van Landeghem
75a202ce65
TextCat updates and fixes (#6263)
* small fix in example imports

* throw error when train_corpus or dev_corpus is not a string

* small fix in custom logger example

* limit macro_auc to labels with 2 annotations

* fix typo

* also create parents of output_dir if need be

* update documentation of textcat scores

* refactor TextCatEnsemble

* fix tests for new AUC definition

* bump to 3.0.0a42

* update docs

* rename to spacy.TextCatEnsemble.v2

* spacy.TextCatEnsemble.v1 in legacy

* cleanup

* small fix

* update to 3.0.0rc2

* fix import that got lost in merge

* cursed IDE

* fix two typos
2020-10-18 14:50:41 +02:00
Ines Montani
e2f3c4e12d Fix robots [ci skip] 2020-10-16 17:44:13 +02:00
Adriane Boyd
e896803792 Add and update website license links 2020-10-16 17:01:52 +02:00
Ines Montani
c655742b8b Remove docs references to starters for now (see #6262) [ci skip] 2020-10-16 15:46:34 +02:00
Ines Montani
3851300e80 Update landing [ci skip] 2020-10-16 11:46:33 +02:00
Ines Montani
c968d1560f Fix docs example [ci skip] 2020-10-16 11:33:20 +02:00
Ines Montani
ba1e004049 Fix typo [ci skip] 2020-10-15 23:39:04 +02:00
Ines Montani
32dc4f4796 Sort models sidebar alphabetically [ci skip] 2020-10-15 22:47:16 +02:00
Ines Montani
20f80587d6
Merge pull request #6257 from walterhenry/develop-proof
A few tiny typo fixes to push through with release of nightly
2020-10-15 18:17:30 +02:00
walterhenry
75b7f86383 Three small typos
Some little typos since v3.0 is out.
2020-10-15 18:06:37 +02:00
Ines Montani
09dbbe75d7 Update docs [ci skip] 2020-10-15 17:27:24 +02:00
Ines Montani
7f05ccc170 Update docs [ci skip] 2020-10-15 12:35:30 +02:00
Ines Montani
4fa869e6f7 Update docs [ci skip] 2020-10-15 11:16:06 +02:00
Ines Montani
178760855f Merge branch 'develop' into master-tmp 2020-10-15 09:06:03 +02:00
Ines Montani
abeafcbc08 Update docs [ci skip] 2020-10-15 08:58:30 +02:00
Ines Montani
050aa1e0e2 Update languages.json [ci skip] 2020-10-14 20:51:50 +02:00
Ines Montani
a966c271f7 Update models docs [ci skip] 2020-10-14 20:50:23 +02:00
Ines Montani
a2d4aaee70
Apply suggestions from code review 2020-10-14 19:51:36 +02:00
Ines Montani
d94e241fce Merge branch 'develop' into pr/6253 2020-10-14 16:55:46 +02:00
Ines Montani
cb47f25cda
Merge pull request #6252 from svlandeg/fix/docs 2020-10-14 16:43:12 +02:00
walterhenry
6af585dba5 New batch of proofs
Just tiny fixes to the docs as a proofreader
2020-10-14 16:37:57 +02:00
svlandeg
478a14a619 fix few typos 2020-10-14 15:01:19 +02:00
Ines Montani
1aa8e8f2af Update docs [ci skip] 2020-10-14 14:58:45 +02:00
Ines Montani
4d99d2b94a Update docs [ci skip] 2020-10-13 11:38:52 +02:00
svlandeg
40276fd3be update NEL docs after latest refactor 2020-10-12 11:41:27 +02:00
svlandeg
08cb085f6c Merge remote-tracking branch 'upstream/develop' into fix/various 2020-10-09 17:01:27 +02:00
Ines Montani
97ff090e49 Fix docs example [ci skip] 2020-10-09 16:03:57 +02:00
Ines Montani
9fb3244672
Merge pull request #6231 from adrianeboyd/feature/include-static-vectors 2020-10-09 15:54:52 +02:00
Adriane Boyd
2dd79454af Update docs 2020-10-09 14:42:07 +02:00
svlandeg
853edace37 fix MultiHashEmbed example in documentation 2020-10-09 14:11:06 +02:00
Ines Montani
e50dc2c1c9 Update docs [ci skip] 2020-10-09 12:04:52 +02:00
Ines Montani
7c52def5da
Merge pull request #6227 from adrianeboyd/chore/update-3.0.0a36-from-master 2020-10-09 10:49:20 +02:00
Ines Montani
329b61ee7b Update docs [ci skip] 2020-10-09 10:36:06 +02:00
Šarūnas Navickas
287ba94a2f Website (Universe): An entry for rita-dsl (#6138)
* Create zaibacu.md

* Add RITA-DSL entry

* Update agreement

* Fix formatting
2020-10-09 10:14:40 +02:00
delzac
668507be1b Reflect on usage doc that IS_SENT_START attribute exists (#6114)
* Reflect on usage doc that IS_SENT_START attribute exists

* Create delzac.md
2020-10-09 10:14:40 +02:00
Sofie Van Landeghem
d093d6343b
TrainablePipe (#6213)
* rename Pipe to TrainablePipe

* split functionality between Pipe and TrainablePipe

* remove unnecessary methods from certain components

* cleanup

* hasattr(component, "pipe") should be sufficient again

* remove serialization and vocab/cfg from Pipe

* unify _ensure_examples and validate_examples

* small fixes

* hasattr checks for self.cfg and self.vocab

* make is_resizable and is_trainable properties

* serialize strings.json instead of vocab

* fix KB IO + tests

* fix typos

* more typos

* _added_strings as a set

* few more tests specifically for _added_strings field

* bump to 3.0.0a36
2020-10-08 21:33:49 +02:00
Ines Montani
5ebd1fc2cf Update docs [ci skip] 2020-10-08 16:23:12 +02:00
Ines Montani
741796e500 Update docs [ci skip] 2020-10-08 14:31:34 +02:00
Ines Montani
d1602e1ece Update docs [ci skip] 2020-10-08 11:56:50 +02:00
Ines Montani
064575d79d
Merge pull request #6216 from svlandeg/feature/nel-initialize 2020-10-08 11:14:12 +02:00
Ines Montani
43e59bb22a Update docs and install extras [ci skip] 2020-10-08 10:58:50 +02:00
svlandeg
eaf5c265cb set_kb method for entity_linker 2020-10-08 10:34:01 +02:00
svlandeg
bcaad28eda fix typos 2020-10-07 13:05:37 +02:00
delzac
15ea401b39
Reflect on usage doc that IS_SENT_START attribute exists (#6114)
* Reflect on usage doc that IS_SENT_START attribute exists

* Create delzac.md
2020-10-06 15:11:01 +02:00
Ines Montani
ce14520789 Update docs [ci skip] 2020-10-06 14:35:17 +02:00
Ines Montani
2a17566da3 Update docs [ci skip] 2020-10-06 14:15:08 +02:00
Ines Montani
967377287a
Merge pull request #6210 from adrianeboyd/docs/various-v3-3 [ci skip] 2020-10-06 11:28:45 +02:00
Adriane Boyd
aa9c9f3bf0 Update Chinese usage for spacy-pkuseg 2020-10-06 11:21:17 +02:00
Šarūnas Navickas
047fb9f8b8
Website (Universe): An entry for rita-dsl (#6138)
* Create zaibacu.md

* Add RITA-DSL entry

* Update agreement

* Fix formatting
2020-10-06 11:19:36 +02:00
Ines Montani
2fd7122074 Update docs [ci skip] 2020-10-06 10:31:48 +02:00
Ines Montani
568e12215d
Merge pull request #6206 from svlandeg/fix/patterns-init 2020-10-06 10:27:23 +02:00
Ines Montani
2e961817cb Update docs [ci skip] 2020-10-06 10:23:01 +02:00
svlandeg
9b4cf7b0b6 update output of debug config command 2020-10-06 09:47:23 +02:00
svlandeg
fd0f60e2bc updates to data format for training and pretraining 2020-10-06 09:28:53 +02:00
svlandeg
ff9ac39c88 read entity_ruler patterns with srsly.read_jsonl.v1 2020-10-05 22:50:14 +02:00
Ines Montani
1a554bdcb1 Update docs and docstring [ci skip] 2020-10-05 21:55:27 +02:00
Ines Montani
181039bd17
Merge pull request #6205 from explosion/feature/embed-features 2020-10-05 21:49:10 +02:00
Ines Montani
5ba418b08c Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-10-05 21:44:01 +02:00
Ines Montani
8a39d5414e Update quickstart [ci skip] 2020-10-05 21:43:51 +02:00
Ines Montani
9ca283a899 Merge branch 'develop' into feature/project-spacy-version 2020-10-05 21:06:07 +02:00
Ines Montani
9aa07ad001 Update quickstarts [ci skip] 2020-10-05 21:05:41 +02:00
Ines Montani
706b7f6973 Update docs 2020-10-05 20:51:22 +02:00
Matthew Honnibal
919790cb47 Upd MultiHashEmbed docs 2020-10-05 20:28:21 +02:00
svlandeg
193e0d5a98 add docs for entity_ruler.initialize 2020-10-05 18:04:08 +02:00
svlandeg
65abd77779 add finish_update to Pipe 2020-10-05 16:23:33 +02:00
Ines Montani
e3acad6264 Update docs [ci skip] 2020-10-05 13:06:20 +02:00
Ines Montani
0f64556c04
Merge pull request #6197 from svlandeg/feature/pipe-docs [ci skip] 2020-10-05 11:55:40 +02:00
svlandeg
9a6c9b133b various small fixes 2020-10-05 01:05:37 +02:00
svlandeg
52b660e9dc initialize and update explanation 2020-10-05 00:39:36 +02:00
Ines Montani
3c36a57e84
Update data augmenters (#6196)
* Draft lower-case augmenter

* Make warning a debug log

* Update lowercase augmenter, docs and tests

Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>
2020-10-04 17:46:29 +02:00
svlandeg
b0463fbf75 set_annotations explanation 2020-10-04 14:56:48 +02:00
Ines Montani
43d7652635
Merge pull request #6192 from explosion/feature/init-attr-ruler 2020-10-04 14:46:37 +02:00
Ines Montani
9b3a934361 Update docs [ci skip] 2020-10-04 14:14:55 +02:00
svlandeg
9f40d963fd highlight the two steps: the model and the pipeline component 2020-10-04 14:11:53 +02:00
Ines Montani
11347f34da Tidy up, tests and docs 2020-10-04 13:54:05 +02:00
svlandeg
452b8309f9 slight rewrite to hide some thinc implementation details 2020-10-04 13:26:46 +02:00
svlandeg
08ad349a18 tok2vec layer 2020-10-04 00:08:02 +02:00
svlandeg
2c4b2ee5e9 REL intro and get_candidates function 2020-10-03 23:27:05 +02:00
Ines Montani
989c59918c Update docs [ci skip] 2020-10-03 18:53:39 +02:00
Ines Montani
7c4ab7e82c Fix Lemmatizer.get_lookups_config 2020-10-03 17:16:10 +02:00
Ines Montani
dd542ec6a4
Fix label initialization of textcat component (#6190) 2020-10-03 17:07:38 +02:00
Ines Montani
3b8f352eda Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-10-03 16:08:27 +02:00
Ines Montani
35d695a031 Update docs 2020-10-03 16:08:24 +02:00
Matthew Honnibal
db419f6b2f
Improve control of training progress and logging (#6184)
* Make logging and progress easier to control

* Update docs

* Cleanup errors

* Fix ConfigValidationError

* Pass stdout/stderr, not wasabi.Printer

* Fix type

* Upd logging example

* Fix logger example

* Fix type
2020-10-03 14:57:46 +02:00
Ines Montani
5fb776556a Update docs [ci skip] 2020-10-03 14:47:02 +02:00
Ines Montani
5413358ba1
Merge pull request #6188 from svlandeg/feature/small-fixes 2020-10-03 11:44:24 +02:00
Ines Montani
eb9b3ff9c5 Update install docs and quickstarts [ci skip] 2020-10-03 11:35:42 +02:00
svlandeg
02247cccaf Merge remote-tracking branch 'upstream/develop' into feature/small-fixes 2020-10-02 20:48:11 +02:00
Sofie Van Landeghem
09dcb75076
small UX fix for DocBin (#6167)
* add informative warning when messing up store_user_data DocBin flags

* add informative warning when messing up store_user_data DocBin flags

* cleanup test

* rename to patterns_path
2020-10-02 15:43:32 +02:00
Ines Montani
f0b30aedad
Make lemmatizers use initialize logic (#6182)
* Make lemmatizer use initialize logic and tidy up

* Fix typo

* Raise for uninitialized tables
2020-10-02 15:42:36 +02:00
Ines Montani
df06f7a792 Update docs [ci skip] 2020-10-02 13:24:33 +02:00
Ines Montani
d2aa662ab2
Merge pull request #6179 from adrianeboyd/feature/token-morph-refactor-2 [ci skip] 2020-10-02 12:10:27 +02:00
Ines Montani
0f11c2150d Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-10-02 11:38:05 +02:00
Ines Montani
32cdc1c4f4 Update docs [ci skip] 2020-10-02 11:38:03 +02:00
Ines Montani
6d8df081bd
Merge pull request #6180 from adrianeboyd/docs/minor-v3-2 [ci skip] 2020-10-02 11:37:25 +02:00
Adriane Boyd
351f352cdc Update Japanese docs and pin for sudachipy 2020-10-02 10:12:44 +02:00
Adriane Boyd
7670df04dd Update Chinese usage docs 2020-10-02 10:09:03 +02:00
Adriane Boyd
3908fff899 Remove tag map sidebar 2020-10-02 09:07:55 +02:00
Adriane Boyd
fd09e6b140 Update docs for Token.morph / Token.set_morph 2020-10-02 09:05:15 +02:00
Ines Montani
01c1538c72 Integrate file readers 2020-10-02 01:36:06 +02:00
Ines Montani
6b94cee468 Fix docs [ci skip] 2020-10-02 01:11:19 +02:00
Ines Montani
50162b8726 Try to work around Sharp build issue [ci skip] 2020-10-01 22:27:45 +02:00
Ines Montani
b6b73a3ca8 Update docs [ci skip] 2020-10-01 17:45:29 +02:00
Ines Montani
f2627157c8 Update docs [ci skip] 2020-10-01 17:38:17 +02:00
svlandeg
1328c9fd14 consistently use --code instead of --code-path 2020-10-01 16:59:22 +02:00
Sofie Van Landeghem
a22215f427
Add FeatureExtractor from Thinc (#6170)
* move featureextractor from Thinc

* Update website/docs/api/architectures.md

Co-authored-by: Ines Montani <ines@ines.io>

* Update website/docs/api/architectures.md

Co-authored-by: Ines Montani <ines@ines.io>

Co-authored-by: Ines Montani <ines@ines.io>
2020-10-01 16:22:48 +02:00
Ines Montani
0a8a124a6e Update docs [ci skip] 2020-10-01 12:15:53 +02:00
Ines Montani
a103ab5f1a Update augmenter lookups and docs 2020-09-30 23:03:47 +02:00
Ines Montani
115481aca7 Update docs [ci skip] 2020-09-30 15:16:00 +02:00
walterhenry
1c65b3b2c0 Proofreading
A few more small things in Usage.
2020-09-30 11:33:40 +02:00
Ines Montani
469f0e539c Fix docs [ci skip] 2020-09-30 10:24:06 +02:00
Ines Montani
9bb958fd0a Fix debug data [ci skip] 2020-09-29 23:07:11 +02:00
Ines Montani
604be54a5c Support --code in evaluate CLI [ci skip] 2020-09-29 21:20:56 +02:00
Ines Montani
d3c63b7965 Merge branch 'develop' into feature/prepare 2020-09-29 20:53:05 +02:00
Ines Montani
361f91e286
Merge pull request #6135 from walterhenry/develop-proof 2020-09-29 20:49:06 +02:00
Ines Montani
b486389eec
Update website/docs/api/doc.md 2020-09-29 20:48:43 +02:00
Ines Montani
d7469283c5 Update docs [ci skip] 2020-09-29 16:59:21 +02:00
Sofie Van Landeghem
6a04e5adea
encoding UTF8 (#6161) 2020-09-29 14:49:55 +02:00
walterhenry
1d80b3dc1b Proofreading
Finished with the API docs and started on the Usage, but Embedding & Transformers
2020-09-29 12:39:10 +02:00
walterhenry
c1c841940c Merge branch 'develop-proof' of https://github.com/walterhenry/spaCy into develop-proof 2020-09-29 11:47:43 +02:00
svlandeg
64d90039a1 encoding UTF8 2020-09-29 10:54:42 +02:00
Ines Montani
ff9a63bfbd begin_training -> initialize 2020-09-28 21:35:09 +02:00
walterhenry
3360825e00 Proofreading
Another round of proofreading. All the API docs have been read through and I've grazed the Usage docs.
2020-09-28 16:50:15 +02:00
Matthew Honnibal
a976da168c
Support data augmentation in Corpus (#6155)
* Support data augmentation in Corpus

* Note initial docs for data augmentation

* Add augmenter to quickstart

* Fix flake8

* Format

* Fix test

* Update spacy/tests/training/test_training.py

* Improve data augmentation arguments

* Update templates

* Move randomization out into caller

* Refactor

* Update spacy/training/augment.py

* Update spacy/tests/training/test_training.py

* Fix augment

* Fix test
2020-09-28 03:03:27 +02:00
Ines Montani
f29d5b9b89 Update docs [ci skip] 2020-09-27 18:39:38 +02:00
Ines Montani
e06ff8b71d Update docs [ci skip] 2020-09-26 13:18:08 +02:00
Sofie Van Landeghem
009ba14aaf
Fix pretraining in train script (#6143)
* update pretraining API in train CLI

* bump thinc to 8.0.0a35

* bump to 3.0.0a26

* doc fixes

* small doc fix
2020-09-25 15:47:10 +02:00
Ines Montani
02a1b6ab83 Update links [ci skip] 2020-09-25 13:21:43 +02:00
Ines Montani
2cfe9340a1 Link model components [ci skip] 2020-09-25 13:21:20 +02:00
Ines Montani
c7956a4047 Update models.js [ci skip] 2020-09-25 09:25:46 +02:00
Ines Montani
27c5795ea5 Fix version check in models directory [ci skip] 2020-09-25 09:23:29 +02:00
Ines Montani
2aa4d65734 Update docs [ci skip] 2020-09-24 20:41:09 +02:00
Adriane Boyd
3c062b3911
Add MORPH handling to Matcher (#6107)
* Add MORPH handling to Matcher

* Add `MORPH` to `Matcher` schema
* Rename `_SetMemberPredicate` to `_SetPredicate`
* Add `ISSUBSET` and `ISSUPERSET` operators to `_SetPredicate`
  * Add special handling for normalization and conversion of morph
    values into sets
  * For other attrs, `ISSUBSET` acts like `IN` and `ISSUPERSET` only
    matches for 0 or 1 values

* Update test

* Rename to IS_SUBSET and IS_SUPERSET
2020-09-24 16:55:09 +02:00
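The set semantics behind `IS_SUBSET` and `IS_SUPERSET` in 3c062b3911 can be illustrated with plain Python sets. This is a minimal sketch, not the `Matcher` implementation: the helper names are hypothetical, and morphological values are assumed to be `Feat=Val` strings joined by `|`.

```python
from typing import FrozenSet, Iterable


def morph_to_set(morph: str) -> FrozenSet[str]:
    """Split 'Feat1=Val1|Feat2=Val2' into a set of feature strings.

    Assumed normalization step; an empty string gives an empty set.
    """
    return frozenset(part for part in morph.split("|") if part)


def is_subset(token_morph: str, pattern_values: Iterable[str]) -> bool:
    """True if every feature on the token appears in the pattern list."""
    return morph_to_set(token_morph) <= frozenset(pattern_values)


def is_superset(token_morph: str, pattern_values: Iterable[str]) -> bool:
    """True if the token carries every feature in the pattern list."""
    return morph_to_set(token_morph) >= frozenset(pattern_values)
```

For a single-valued attribute the token side is a one-element set, which is why the commit notes that the subset test then behaves like `IN` and the superset test can only match pattern lists of zero or one value.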
Sofie Van Landeghem
c7eedd3534
updates to NEL functionality (#6132)
* NEL: read sentences and ents from reference

* fiddling with sent_start annotations

* add KB serialization test

* KB write additional file with strings.json

* score_links function to calculate NEL P/R/F

* formatting

* documentation
2020-09-24 16:53:59 +02:00
Ines Montani
6bc5058d13 Update models directory [ci skip] 2020-09-24 14:53:34 +02:00
Ines Montani
58dde293ce
Merge pull request #6089 from adrianeboyd/feature/doc-ents-v3-2 2020-09-24 14:44:42 +02:00
Ines Montani
74e1f192b4
Merge pull request #6134 from explosion/feature/training_before_to_disk 2020-09-24 14:44:11 +02:00
Ines Montani
3b58a8be2b Update docs 2020-09-24 14:32:42 +02:00
Ines Montani
88e54caa12 accuracy -> performance 2020-09-24 14:32:35 +02:00
Ines Montani
b92c8aae78 Merge branch 'develop' into pr/6135 2020-09-24 13:44:56 +02:00
Ines Montani
6836b66433 Update docs and resolve todos [ci skip] 2020-09-24 13:41:25 +02:00
walterhenry
3dd5f409ec Proofreading
Proofread some API docs
2020-09-24 13:15:28 +02:00
Adriane Boyd
1c63f02f99 Add API docs 2020-09-24 12:51:16 +02:00
Ines Montani
138c8d45db Update docs 2020-09-24 12:43:39 +02:00
Ines Montani
d7ab6a2ffe Update docs [ci skip] 2020-09-24 12:37:21 +02:00
Ines Montani
ae51f580c1 Fix handling of score_weights 2020-09-24 10:27:33 +02:00
Ines Montani
e2ffe51fb5 Update docs [ci skip] 2020-09-24 10:13:41 +02:00
Ines Montani
02008e9a55 Update docs [ci skip] 2020-09-23 22:02:31 +02:00
Ines Montani
c8bda92243 Update benchmarks [ci skip] 2020-09-23 20:05:02 +02:00
svlandeg
35dbc63578 Merge remote-tracking branch 'upstream/develop' into fix/nr_features
# Conflicts:
#	spacy/ml/models/parser.py
#	spacy/tests/serialize/test_serialize_config.py
#	website/docs/api/architectures.md
2020-09-23 17:01:13 +02:00
svlandeg
dd2292793f 'parser' instead of 'deps' for state_type 2020-09-23 16:53:49 +02:00
Ines Montani
50a4425cda Adjust docs 2020-09-23 16:03:32 +02:00
Ines Montani
e4e7f5b00d Update docs [ci skip] 2020-09-23 15:44:40 +02:00
svlandeg
6c85fab316 state_type and extra_state_tokens instead of nr_feature_tokens 2020-09-23 13:35:09 +02:00
Ines Montani
a9da33c4d9 Fix infobox with ID [ci skip] 2020-09-23 13:00:56 +02:00
Ines Montani
02b69dd0d5 Update models directory [ci skip] 2020-09-23 12:56:54 +02:00
Ines Montani
6ca06cb62c Update docs and formatting [ci skip] 2020-09-23 10:14:27 +02:00
Ines Montani
60a317520a
Merge pull request #6109 from svlandeg/feature/2rename 2020-09-23 09:47:12 +02:00
Ines Montani
566d048753 Fix project repo link [ci skip] 2020-09-23 09:43:51 +02:00
Ines Montani
930b116f00 Update docs [ci skip] 2020-09-23 09:35:21 +02:00
Ines Montani
d8f661c910 Update docs [ci skip] 2020-09-23 09:30:26 +02:00
svlandeg
b556a10808 rename converts in_to_out 2020-09-22 11:50:19 +02:00
Ines Montani
f9af7d365c Update docs [ci skip] 2020-09-22 09:45:41 +02:00
Ines Montani
49e80dbcac
Merge pull request #6103 from explosion/chore/tidy-up-tests-docs-get-doc 2020-09-22 09:45:04 +02:00
Adriane Boyd
e05d6d358d Update API sidebar MorphAnalysis link 2020-09-22 09:36:37 +02:00
Adriane Boyd
844db6ff12 Update architecture overview 2020-09-22 09:31:47 +02:00
Adriane Boyd
fc9c78da25 Add MorphAnalysis to API sidebar 2020-09-22 09:23:47 +02:00
Adriane Boyd
5fbb8dfcbc Merge remote-tracking branch 'upstream/develop' into docs/various-v3-2 2020-09-22 09:22:58 +02:00
Ines Montani
67fbcb3da5 Tidy up tests and docs 2020-09-21 20:43:54 +02:00
Ines Montani
a5f6ab4943
Merge pull request #6098 from adrianeboyd/feature/doc-init 2020-09-21 18:35:20 +02:00
Adriane Boyd
f212303729 Add sent_starts to Doc.__init__
Add sent_starts to `Doc.__init__`. Officially specify `is_sent_start`
values but also convert to and accept `sent_start` internally.
2020-09-21 17:59:09 +02:00
Adriane Boyd
6aa91c7ca0 Make user_data keyword-only 2020-09-21 16:00:06 +02:00
Ines Montani
e548654aca Update docs [ci skip] 2020-09-21 14:46:55 +02:00
Adriane Boyd
9b8d0b7f90 Alphabetize API sidebars 2020-09-21 13:46:21 +02:00
Adriane Boyd
bc02e86494 Extend Doc.__init__ with additional annotation
Mostly copying from `spacy.tests.util.get_doc`, add additional kwargs to
`Doc.__init__` to initialize the most common doc/token values.
2020-09-21 13:36:24 +02:00
Ines Montani
9d32cac736 Update docs [ci skip] 2020-09-21 10:55:36 +02:00
Adriane Boyd
cc71ec901f Fix typo in saving and loading usage docs 2020-09-21 09:08:55 +02:00
Adriane Boyd
3aa57ce6c9 Update alignment mode in Doc.char_span docs 2020-09-21 09:07:20 +02:00
Ines Montani
b9d2b29684 Update docs [ci skip] 2020-09-20 17:49:09 +02:00
Ines Montani
012b3a7096 Update docs [ci skip] 2020-09-20 17:44:58 +02:00
Ines Montani
744f259b9c Update landing [ci skip] 2020-09-20 16:37:23 +02:00
Ines Montani
554c9a2497 Update docs [ci skip] 2020-09-20 12:30:53 +02:00
Sofie Van Landeghem
39872de1f6
Introducing the gpu_allocator (#6091)
* rename 'use_pytorch_for_gpu_memory' to 'gpu_allocator'

* --code instead of --code-path

* update documentation

* avoid querying the "system" section directly

* add explanation of gpu_allocator to TF/PyTorch section in docs

* fix typo

* fix typo 2

* use set_gpu_allocator from thinc 8.0.0a34

* default null instead of empty string
2020-09-19 01:17:02 +02:00
Ines Montani
0406200a1e Update docs [ci skip] 2020-09-18 15:13:13 +02:00
Ines Montani
a127fa475e
Merge pull request #6078 from svlandeg/fix/corpus 2020-09-18 14:44:21 +02:00
Ines Montani
d32ce121be Fix docs [ci skip] 2020-09-18 13:41:12 +02:00
Ines Montani
a0b4389a38 Update docs [ci skip] 2020-09-17 19:24:48 +02:00
Matthew Honnibal
6efb7688a6 Draft pretrain usage 2020-09-17 18:17:03 +02:00
Ines Montani
1bb8b4f824 Merge branch 'master' into develop 2020-09-17 17:46:20 +02:00
Ines Montani
6bd0d25fb9
Merge pull request #6085 from explosion/docs/static-vectors-intro [ci skip] 2020-09-17 17:14:45 +02:00
Ines Montani
a2c8cda26f Update docs [ci skip] 2020-09-17 17:12:51 +02:00
Ines Montani
2e3ce9f42f Merge branch 'feature/init-config-pretrain' of https://github.com/svlandeg/spaCy into pr/6084 2020-09-17 16:58:49 +02:00
Ines Montani
3d8e010655 Change order 2020-09-17 16:58:46 +02:00
Ines Montani
c4b414b282
Update website/docs/api/cli.md 2020-09-17 16:58:09 +02:00
Sofie Van Landeghem
e5ceec5df0
Update website/docs/api/cli.md
Co-authored-by: Ines Montani <ines@ines.io>
2020-09-17 16:56:20 +02:00
Sofie Van Landeghem
127ce0c574
Update website/docs/api/cli.md
Co-authored-by: Ines Montani <ines@ines.io>
2020-09-17 16:55:53 +02:00
Matthew Honnibal
ec751068f3 Draft text for static vectors intro 2020-09-17 16:42:53 +02:00
svlandeg
5fade4feb7 fix cli abbrev 2020-09-17 16:15:20 +02:00
svlandeg
ddfc1fc146 add pretraining option to init config 2020-09-17 16:05:40 +02:00
svlandeg
c8c84f1ccd Merge remote-tracking branch 'upstream/develop' into fix/corpus 2020-09-17 15:43:04 +02:00
svlandeg
130ffa5fbf fix typos in docs 2020-09-17 14:59:41 +02:00
Ines Montani
c8fa2247e3 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-09-17 12:34:15 +02:00
Ines Montani
6761028c6f Update docs [ci skip] 2020-09-17 12:34:11 +02:00
svlandeg
0c35885751 generalize corpora, dot notation for dev and train corpus 2020-09-17 11:38:59 +02:00
svlandeg
8cedb2f380 Merge branch 'fix/corpus' of https://github.com/svlandeg/spaCy into fix/corpus 2020-09-17 09:27:55 +02:00
svlandeg
781fae678b Merge remote-tracking branch 'upstream/develop' into fix/corpus 2020-09-17 09:24:36 +02:00
Sofie Van Landeghem
21dcf92964
Update website/docs/api/data-formats.md
Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>
2020-09-17 09:21:36 +02:00
Adriane Boyd
7e4cd7575c
Refactor Docs.is_ flags (#6044)
* Refactor Docs.is_ flags

* Add derived `Doc.has_annotation` method

  * `Doc.has_annotation(attr)` returns `True` for partial annotation

  * `Doc.has_annotation(attr, require_complete=True)` returns `True` for
    complete annotation

* Add deprecation warnings to `is_tagged`, `is_parsed`, `is_sentenced`
and `is_nered`

* Add `Doc._get_array_attrs()`, which returns a full list of `Doc` attrs
for use with `Doc.to_array`, `Doc.to_bytes` and `Doc.from_docs`. The
list is the `DocBin` attributes list plus `SPACY` and `LENGTH`.

Notes on `Doc.has_annotation`:

* `HEAD` is converted to `DEP` because heads don't have an unset state

* Accept `IS_SENT_START` as a synonym of `SENT_START`

Additional changes:

* Add `NORM`, `ENT_ID` and `SENT_START` to default attributes for
`DocBin`

* In `Doc.from_array()` the presence of `DEP` causes `HEAD` to override
`SENT_START`

* In `Doc.from_array()` using `attrs` other than
`Doc._get_array_attrs()` (i.e., a user's custom list rather than our
default internal list) with both `HEAD` and `SENT_START` shows a warning
that `HEAD` will override `SENT_START`

* `set_children_from_heads` does not require dependency labels to set
sentence boundaries and sets `sent_start` for all non-sentence starts to
`-1`

* Fix call to set_children_from_heads

Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>
2020-09-17 00:14:01 +02:00
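The difference between partial and complete annotation in 7e4cd7575c can be modelled in a few lines. This is a toy stand-in that operates on a plain list of per-token values, with `None` assumed to mean "unset"; it is not the `Doc.has_annotation` implementation, and it does not cover the `HEAD`/`DEP` or `IS_SENT_START` special cases listed in the commit.

```python
from typing import Optional, Sequence


def has_annotation(
    values: Sequence[Optional[str]], require_complete: bool = False
) -> bool:
    """Toy check over per-token values (hypothetical, non-empty input).

    Default: True if at least one token is annotated (partial).
    require_complete=True: True only if every token is annotated.
    """
    is_set = [value is not None for value in values]
    return all(is_set) if require_complete else any(is_set)


tags_partial = ["NN", None, "VB"]
tags_full = ["NN", "DT", "VB"]
tags_none = [None, None, None]
```

With these inputs, `has_annotation(tags_partial)` is true while `has_annotation(tags_partial, require_complete=True)` is false, which is the distinction the deprecated `is_tagged`-style flags could not express.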
svlandeg
55f8d5478e fix example output 2020-09-15 22:09:30 +02:00
svlandeg
51fa929f47 rewrite train_corpus to corpus.train in config 2020-09-15 21:58:04 +02:00
Ines Montani
2214d1bb7b
Merge pull request #6067 from explosion/feature/spacy-blank-from-config 2020-09-15 14:18:33 +02:00
Ines Montani
b7faa38960 Update docs [ci skip] 2020-09-15 12:44:03 +02:00
Ines Montani
0edd695bf6 Update docs 2020-09-15 11:41:49 +02:00
Ines Montani
99549a5ace Fix consistency and update docs 2020-09-15 11:37:37 +02:00
Ines Montani
154752f9c2 Update docs and consistency [ci skip] 2020-09-15 00:32:49 +02:00
Sofie Van Landeghem
3216a33149
positive_label config for textcat (#6062)
* hook up positive_label in textcat

* unit tests

* documentation

* formatting

* tests

* fix typo

* move verify_config to after begin_training

* revert accidental commit
2020-09-14 17:08:00 +02:00
Ines Montani
b854e0bef9 Update styleguide [ci skip] 2020-09-14 11:25:57 +02:00
Ines Montani
9afb1d9965
Merge pull request #6063 from svlandeg/feature/doc_cleanup [ci skip] 2020-09-14 10:35:43 +02:00
Ines Montani
35156429c4 Update docs [ci skip] 2020-09-14 10:34:50 +02:00
Ines Montani
081413f210 Update docs [ci skip] 2020-09-13 23:46:51 +02:00
Ines Montani
85e5910102 Update docs [ci skip] 2020-09-13 23:09:19 +02:00
Ines Montani
5ebb2a2ac8 Update docs [ci skip] 2020-09-13 22:36:20 +02:00
Ines Montani
47acb45850 Update docs [ci skip] 2020-09-13 22:30:33 +02:00
Ines Montani
2e3d067a7b Update docs [ci skip] 2020-09-13 19:29:06 +02:00
Ines Montani
99b26fe492 Update docs [ci skip] 2020-09-13 17:59:38 +02:00
Sofie Van Landeghem
744df9814a
define threshold for scoring textcat in TextCat config (#6055)
* define threshold for scoring textcat in TextCat config

* fix unit test and documentation
2020-09-13 14:15:52 +02:00
Ines Montani
1316071086 Update docs [ci skip] 2020-09-13 11:31:50 +02:00
Ines Montani
24e138b8ac Update docs [ci skip] 2020-09-12 17:55:02 +02:00
Ines Montani
475a310b13 Update docs [ci skip] 2020-09-12 17:45:19 +02:00
Ines Montani
368ecf705a Update docs [ci skip] 2020-09-12 17:40:50 +02:00
svlandeg
c4f324d5f1 doc fixes 2020-09-12 17:38:54 +02:00
Ines Montani
8b0dabe987 Update docs [ci skip] 2020-09-12 17:05:10 +02:00
Ines Montani
0b2e07215d Support overwriting name on spacy package 2020-09-11 11:38:28 +02:00
Ines Montani
4fec8c39a3 Update project teaser [ci skip] 2020-09-10 13:23:03 +02:00
Ines Montani
9f08ea80b4
Merge pull request #6047 from svlandeg/feature/doc-fixes
Fix branch for spacy clone + UX
2020-09-10 13:05:41 +02:00
Ines Montani
763e302dcc Update project widgets and examples [ci skip] 2020-09-10 13:04:16 +02:00
svlandeg
97d99f7efa Merge remote-tracking branch 'upstream/develop' into feature/doc-fixes 2020-09-10 11:51:34 +02:00
Ines Montani
908f3a4494 Update default projects repo [ci skip] 2020-09-10 11:42:14 +02:00
Ines Montani
15bc3a37b4 Add --branch to project clone 2020-09-10 11:08:15 +02:00
Ines Montani
b7afd09d27 Update formatting [ci skip] 2020-09-10 11:07:09 +02:00
svlandeg
9073d99fc9 fix link to shape inference section 2020-09-10 10:22:59 +02:00
Ines Montani
1955aaaa20
Merge pull request #6045 from svlandeg/feature/more-layers-docs [ci skip] 2020-09-09 21:46:40 +02:00
Ines Montani
2e567a47c2 Update docs and formatting 2020-09-09 21:26:10 +02:00
svlandeg
aa27e3f1f2 PyTorch spelling 2020-09-09 16:27:21 +02:00
svlandeg
c89e07927e document individual component API pages 2020-09-09 16:18:38 +02:00
Sofie Van Landeghem
cb66ea7400
Remove simple_ner code (#6041)
* remove simple_ner code

* remove unused _biluo and _iob files
2020-09-09 16:11:27 +02:00
svlandeg
a8aa9a8068 document Pipe API details, crossreferences etc 2020-09-09 15:56:27 +02:00
svlandeg
9a7c6cc61a references to usage page on layers and architectures 2020-09-09 14:47:32 +02:00
svlandeg
e80898092b Merge branch 'feature/more-layers-docs' of https://github.com/svlandeg/spaCy into feature/more-layers-docs 2020-09-09 14:44:28 +02:00
svlandeg
4c080b3a98 details on Thinc shape inference 2020-09-09 13:57:05 +02:00
svlandeg
39aa740777 Merge remote-tracking branch 'upstream/develop' into feature/more-layers-docs 2020-09-09 11:59:34 +02:00
svlandeg
e39242c4e6 formatting 2020-09-09 11:25:35 +02:00
Ines Montani
24053d83ec Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-09-09 11:20:14 +02:00
Ines Montani
406aed78ee Update docs [ci skip] 2020-09-09 11:20:07 +02:00
Sofie Van Landeghem
8e7557656f
Renaming gold & annotation_setter (#6042)
* version bump to 3.0.0a16

* rename "gold" folder to "training"

* rename 'annotation_setter' to 'set_extra_annotations'

* formatting
2020-09-09 10:31:03 +02:00
Marek Grzenkowicz
a26f864ed3
Clarify how to choose pretrained weights files (closes #6027) [ci skip] (#6039) 2020-09-08 21:13:50 +02:00
svlandeg
a16afb79e3 add section on Thinc implementation details 2020-09-08 20:43:09 +02:00
svlandeg
1c476b4b41 how to register and use custom function 2020-09-08 20:22:20 +02:00
svlandeg
b35a26ea5d example wrapped Torch model and chaining with Thinc 2020-09-08 18:32:58 +02:00
svlandeg
bd8f9b188b small fixes 2020-09-08 17:24:36 +02:00
Ines Montani
d98ae9d918 Update docs [ci skip] 2020-09-08 10:33:48 +02:00
Ines Montani
bb62e3c8fc Fix dropdown [ci skip] 2020-09-06 23:43:50 +02:00
Ines Montani
c443c82722 Update docs [ci skip] 2020-09-05 13:41:10 +02:00
Ines Montani
b3e338d65e Update docs [ci skip] 2020-09-04 20:58:36 +02:00
Ines Montani
157caf4dfa WIP: update docs [ci skip] 2020-09-04 16:30:31 +02:00
Ines Montani
f174c7b1f3 Merge branch 'develop' into pr/6018 2020-09-04 15:54:49 +02:00
Ines Montani
f06eed800e
Merge pull request #6029 from explosion/master-tmp 2020-09-04 15:11:55 +02:00
Ines Montani
33d9c64977 Fix outbound link and update package lock [ci skip] 2020-09-04 14:44:38 +02:00
Ines Montani
f9550b4493 Fix components in meta.json and website [ci skip] 2020-09-04 14:42:12 +02:00
Ines Montani
c28f73ddfd Update package-lock.json 2020-09-04 14:41:55 +02:00
Ines Montani
ba6cf9821f Replace docs analytics [ci skip] 2020-09-04 14:28:28 +02:00
Ines Montani
8651022774 Fix outbound link [ci skip] 2020-09-04 14:27:46 +02:00
Ines Montani
afdf14c717 Remove Google Analytics [ci skip] 2020-09-04 14:21:41 +02:00
Ines Montani
864a697e63 Merge branch 'develop' into master-tmp 2020-09-04 13:15:36 +02:00
Adriane Boyd
b927893309
Merge branch 'develop' into feature/dependency-matcher-v3 2020-09-04 13:03:30 +02:00
Ines Montani
2189046869
Merge pull request #6024 from explosion/chore/registry-renaming 2020-09-04 10:54:10 +02:00
Brad Jascob
2160aafec6
Updates spaCy Universe for amrlib (#6020)
* Updates spaCy Universe for amrlib

* Updates to doc based on feedback
2020-09-04 10:03:35 +02:00
Ines Montani
4daf138136 Fix alphabetic ordering [ci skip] 2020-09-03 23:01:50 +02:00
Ines Montani
b1eb98b15c Remove todos [ci skip] 2020-09-03 17:43:58 +02:00
Ines Montani
23b7d9cfa3 Prefix span getters 2020-09-03 17:37:06 +02:00
Ines Montani
5afe6447cd registry.assets -> registry.misc 2020-09-03 17:31:14 +02:00
Ines Montani
c063e55eb7 Add prefix to batchers 2020-09-03 17:30:41 +02:00
Ines Montani
804f120361 Don't use registered function version in title 2020-09-03 17:29:47 +02:00
Ines Montani
c53b1433b9 Adjust more arguments [ci skip] 2020-09-03 17:12:24 +02:00
Ines Montani
121809dd1e Fix anchor [ci skip] 2020-09-03 16:49:56 +02:00
Ines Montani
25a595dc10 Fix typos and wording [ci skip] 2020-09-03 16:37:45 +02:00
Ines Montani
b5a0657fd6 "model" terminology consistency in docs 2020-09-03 13:13:03 +02:00
Ines Montani
b02ad8045b Update docs [ci skip] 2020-09-03 10:10:13 +02:00
Ines Montani
1815c613c9 Update docs [ci skip] 2020-09-03 10:07:45 +02:00
Adriane Boyd
960d9cfadc Officially support DependencyMatcher
Add official support for the `DependencyMatcher`. Redesign the pattern
specification. Fix and extend operator implementations. Update API docs
and add usage docs.

Patterns
--------

Refactor pattern structure to:

```
{
  "LEFT_ID": str,
  "REL_OP": str,
  "RIGHT_ID": str,
  "RIGHT_ATTRS": dict,
}
```

The first node contains only `RIGHT_ID` and `RIGHT_ATTRS` and all
subsequent nodes contain all four keys.
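
As a minimal sketch of the key rule above, the following plain-Python check walks a pattern and reports nodes with the wrong key set. `check_pattern` is a hypothetical helper, not part of spaCy. The extra check that `LEFT_ID` names an earlier node is an assumption of this sketch, as are the attribute values in the example pattern.

```python
FIRST_KEYS = {"RIGHT_ID", "RIGHT_ATTRS"}
LATER_KEYS = {"LEFT_ID", "REL_OP", "RIGHT_ID", "RIGHT_ATTRS"}


def check_pattern(pattern):
    """Return a list of error strings; an empty list means the shape is fine."""
    errors = []
    known_ids = set()
    for i, node in enumerate(pattern):
        # The first node has two keys, every later node has all four.
        expected = FIRST_KEYS if i == 0 else LATER_KEYS
        if set(node) != expected:
            errors.append(
                f"node {i}: expected keys {sorted(expected)}, got {sorted(node)}"
            )
        # Assumption of this sketch: LEFT_ID must name an earlier RIGHT_ID.
        if i > 0 and node.get("LEFT_ID") not in known_ids:
            errors.append(f"node {i}: LEFT_ID does not name an earlier node")
        if "RIGHT_ID" in node:
            known_ids.add(node["RIGHT_ID"])
    return errors


pattern = [
    {"RIGHT_ID": "verb", "RIGHT_ATTRS": {"POS": "VERB"}},
    {
        "LEFT_ID": "verb",
        "REL_OP": ">",
        "RIGHT_ID": "subject",
        "RIGHT_ATTRS": {"DEP": "nsubj"},
    },
]
print(check_pattern(pattern))
```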

New operators
-------------

Because of the way patterns are constructed from left to right, it's
helpful to have `follows` operators along with `precedes` operators. Add
operators for simple precedes / follows alongside immediate precedes /
follows.

* `.*`: precedes
* `;`: immediately follows
* `;*`: follows
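
The linear-order operators can be read as comparisons between token positions. The sketch below is illustrative only and not the matcher's code: `holds` is a hypothetical helper, `a` and `b` are the token indices of the left and right node, and only positions are compared.

```python
# Each operator maps to a comparison of the left index `a` and right index `b`.
LINEAR_OPS = {
    ".": lambda a, b: a == b - 1,   # a immediately precedes b
    ".*": lambda a, b: a < b,       # a precedes b
    ";": lambda a, b: a == b + 1,   # a immediately follows b
    ";*": lambda a, b: a > b,       # a follows b
}


def holds(a, op, b):
    """Return True if token index `a` stands in relation `op` to token index `b`."""
    return LINEAR_OPS[op](a, b)


print(holds(2, ".", 3), holds(2, ".*", 4), holds(3, ";", 2), holds(5, ";*", 2))
```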

Operator fixes
--------------

* `<` and `<<` do not include the node itself
* Fix reversed order for all operators involving linear precedence (`.`,
  all sibling operators)
* Linear precedence operators do not match nodes outside the same parse

Additional fixes
----------------

* Use v3 Matcher API
* Support `get` and `remove`
* Support pickling
2020-09-02 17:45:29 +02:00
svlandeg
19298de352 small fix 2020-09-02 17:43:11 +02:00
svlandeg
bbaea530f6 sublayers paragraph 2020-09-02 17:36:22 +02:00
svlandeg
1be7ff02a6 swapping section 2020-09-02 15:26:07 +02:00
svlandeg
57e432ba2a editor tip as Accordion instead of Infobox 2020-09-02 14:26:57 +02:00
svlandeg
d19ec6c67b small rewrites in types paragraph 2020-09-02 14:25:18 +02:00
svlandeg
821b2d4e63 update examples 2020-09-02 14:15:50 +02:00
svlandeg
e29a33449d rewrite intro, simple Model example 2020-09-02 13:41:18 +02:00
svlandeg
422df9c2e2 Merge remote-tracking branch 'upstream/develop' into feature/docs-layers
# Conflicts:
#	website/docs/usage/layers-architectures.md
2020-09-02 13:17:11 +02:00
Ines Montani
70238543c8 Update layers/arch docs structure [ci skip] 2020-09-02 13:04:35 +02:00
svlandeg
6fd7f140ec custom-architectures section 2020-09-02 11:14:06 +02:00
svlandeg
3d9ae9286f small fixes 2020-09-02 10:46:38 +02:00
Ines Montani
690bd77669 Add todos [ci skip] 2020-09-01 14:04:36 +02:00
Ines Montani
70b226f69d Support ignore marker in project document [ci skip] 2020-09-01 12:49:04 +02:00
Ines Montani
9af82f3f11
Merge pull request #6003 from explosion/feature/matcher-as-spans 2020-08-31 17:50:56 +02:00
Sofie Van Landeghem
3ac620f09d
fix config example [ci skip] 2020-08-31 17:40:04 +02:00
Ines Montani
3929431af1 Update docs [ci skip] 2020-08-31 17:06:33 +02:00
Ines Montani
add9de5487 Deprecate (Phrase)Matcher.pipe 2020-08-31 17:01:24 +02:00
svlandeg
2c3b64a567 console logging example 2020-08-31 16:56:13 +02:00
Ines Montani
bca6bf8dda Update docs [ci skip] 2020-08-31 16:39:53 +02:00
Ines Montani
db9f8896f5 Add docs [ci skip] 2020-08-31 16:10:41 +02:00
svlandeg
fe6c08218e fixes 2020-08-31 14:51:49 +02:00
svlandeg
0e0abb0378 fix 2020-08-31 14:50:29 +02:00
svlandeg
56ba691ecd small fixes 2020-08-31 14:46:00 +02:00
svlandeg
e47ea88aeb revert annotations refactor 2020-08-31 14:40:55 +02:00
svlandeg
13ee742fb4 example of custom logger 2020-08-31 14:24:41 +02:00
svlandeg
2c90a06fee some more information about the loggers 2020-08-31 13:43:17 +02:00
svlandeg
c18eb63483 Merge remote-tracking branch 'upstream/develop' into feature/vectors-docs
# Conflicts:
#	website/docs/usage/embeddings-transformers.md
2020-08-31 13:21:36 +02:00
Juan Gutiérrez
9002bea29f
Update suffixes example (#5989)
* Update suffixes example

The current example will throw `TypeError: can only concatenate list (not "tuple") to list`

* Signing Contributor Agreement
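
The `TypeError` quoted above is ordinary Python behavior when a tuple is added to a list. A minimal reproduction follows, using a stand-in list in place of the real default suffixes; the names and regex strings here are placeholders, not values taken from spaCy.

```python
# Stand-in for a list of default suffix patterns (placeholder values).
default_suffixes = [r"\)", r"\."]

# Adding a tuple to a list fails with the message quoted in the commit.
try:
    default_suffixes + (r"-+$",)
except TypeError as err:
    message = str(err)

# Using a list on the right-hand side concatenates as intended.
suffixes = default_suffixes + [r"-+$"]

print(message)
print(suffixes)
```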
2020-08-31 12:44:56 +02:00
Sofie Van Landeghem
ec14744ee4
Rename Transformer listener (#6001)
* rename to spacy-transformers.TransformerListener

* add some more tok2vec tests

* use select_pipes

* fix docs - annotation setter was not changed in the end
2020-08-31 12:41:39 +02:00
Adriane Boyd
216efaf5f5 Restrict tokenizer exceptions to ORTH and NORM 2020-08-31 09:55:01 +02:00
Ines Montani
9b86312bab Update docs [ci skip] 2020-08-29 18:43:19 +02:00
Adriane Boyd
870774f475
Merge branch 'develop' into docs/morph-usage-v3 2020-08-29 16:00:50 +02:00
Ines Montani
45f46a5c85
Merge pull request #5993 from explosion/feature/disabled-components 2020-08-29 15:58:41 +02:00
Adriane Boyd
f9ed31a757 Update usage docs for lemmatization and morphology 2020-08-29 15:56:50 +02:00
Ines Montani
bc0730be3f Update docs [ci skip] 2020-08-29 12:53:14 +02:00
Ines Montani
450bf806b0
Merge pull request #5991 from adrianeboyd/docs/sent-usage-v3
Update sentence segmentation usage docs
2020-08-29 12:40:06 +02:00
Ines Montani
66d76f5126 Update docs 2020-08-29 12:36:05 +02:00
svlandeg
9f00a20ce4 proofreading and custom examples 2020-08-28 21:50:42 +02:00
svlandeg
5230529de2 add loggers registry & logger docs sections 2020-08-28 21:44:04 +02:00
Adriane Boyd
48df50533d Update sentence segmentation usage docs
Update sentence segmentation usage docs to incorporate `senter`.
2020-08-28 10:58:16 +02:00
svlandeg
72a87095d9 add loggers registry 2020-08-27 20:26:28 +02:00
svlandeg
aa9e0c9c39 small fix 2020-08-27 19:56:52 +02:00
svlandeg
8cde6ccb7d Merge remote-tracking branch 'upstream/develop' into feature/vectors-docs 2020-08-27 19:56:09 +02:00
svlandeg
556e975a30 various fixes 2020-08-27 19:24:44 +02:00
Ines Montani
ff4175e839 Add more info to debug config 2020-08-27 18:17:58 +02:00
svlandeg
329e490560 small import fixes 2020-08-27 14:50:43 +02:00
svlandeg
28e4ba7270 fix references to TransformerListener 2020-08-27 14:33:28 +02:00
svlandeg
4d37ac3f33 configure_custom_sent_spans example 2020-08-27 14:14:16 +02:00
svlandeg
c68169f83f fix link 2020-08-27 10:19:43 +02:00
svlandeg
acc794c975 example of writing to other custom attribute 2020-08-27 10:10:10 +02:00
svlandeg
559b65f2e0 adjust references to null_annotation_setter to trfdata_setter 2020-08-27 09:43:32 +02:00
Ines Montani
696f167478 Add diff example to docs [ci skip] 2020-08-26 15:57:54 +02:00
Adriane Boyd
90d88729e0
Add AttributeRuler.score (#5963)
* Add AttributeRuler.score

Add scoring for TAG / POS / MORPH / LEMMA if these are present in the
assigned token attributes.

Add default score weights (that don't really make a lot of sense) so
that the scores are in the default config in some form.

* Update docs
2020-08-26 15:39:30 +02:00
svlandeg
ec069627fe rename to TransformerListener 2020-08-26 13:31:01 +02:00
Ines Montani
627617a079 Tidy up and add docs [ci skip] 2020-08-26 13:24:55 +02:00
svlandeg
15902c5aa2 fix link 2020-08-26 11:51:57 +02:00
svlandeg
feb86d5206 clarify default 2020-08-26 11:21:30 +02:00
Ines Montani
f31c4462ca Update docs [ci skip] 2020-08-25 13:27:59 +02:00
Ines Montani
8ac5ef1284 Update docs 2020-08-25 11:54:37 +02:00
Matthew Honnibal
8038b87f04
Various small tweaks to project CLI (#5965)
* Fix up/download of http and local paths

* Support git_sparse_checkout for assets

* Fix scorer

* Handle already-present directories for git assets

* Improve convert command

* Fix support for existing files in git assets

* Support branches in git sparse checkout

* Format

* Fix git assets

* Document git block in assets

* Fix test

* Fix test

* Revert "Fix test"

This reverts commit cf3097260f.

* Revert "Fix test"

This reverts commit 964d636e27.

* Don't multiply p/r/f by 100

* Display scores * 100 during training
2020-08-25 00:30:52 +02:00
Ines Montani
967d69ec50 Fix website deployment [ci skip] 2020-08-24 14:28:24 +02:00
Ines Montani
26405710e0 Add icon credit [ci skip] 2020-08-24 10:28:15 +02:00
Matthew Honnibal
e559867605
Allow spacy project to push and pull to/from remote storage (#5949)
* Add utils for working with remote storage

* WIP add remote_cache for project

* WIP add push and pull commands

* Use pathy in remote_cache

* Update util

* Update remote_cache

* Update util

* Update project assets

* Update pull script

* Update push script

* Fix type annotation in util

* Work on remote storage

* Remove site and env hash

* Fix imports

* Fix type annotation

* Require pathy

* Require pathy

* Fix import

* Add a util to handle project variable substitution

* Import push and pull commands

* Fix pull command

* Fix push command

* Fix tarfile in remote_storage

* Improve printing

* Fiddle with status messages

* Set version to v3.0.0a9

* Draft docs for spacy project remote storages

* Update docs [ci skip]

* Use Thinc config to simplify and unify template variables

* Auto-format

* Don't import Pathy globally for now

Causes slow and annoying Google Cloud warning

* Tidy up test

* Tidy up and update tests

* Update to latest Thinc

* Update docs

* variables -> vars

* Update docs [ci skip]

* Update docs [ci skip]

Co-authored-by: Ines Montani <ines@ines.io>
2020-08-23 18:32:09 +02:00
Ines Montani
f27aecac14 Update formatting [ci skip] 2020-08-23 11:57:56 +02:00
Ines Montani
98a9e063b6 Update docs [ci skip] 2020-08-22 17:15:05 +02:00
Matthew Honnibal
8dfc4cbfe7 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-08-22 17:12:09 +02:00
Matthew Honnibal
048de64d4c Suggest edits 2020-08-22 17:11:28 +02:00
Ines Montani
adcf790b96 Update docs [ci skip] 2020-08-22 17:04:16 +02:00
Ines Montani
37ebff6997 Update docs [ci skip] 2020-08-22 16:47:03 +02:00
Matthew Honnibal
8685229891 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-08-22 16:06:59 +02:00
Matthew Honnibal
d97695d09d Update embeddings-transformers.md 2020-08-22 15:41:35 +02:00
Ines Montani
c7c9b0451f Update docs [ci skip] 2020-08-22 13:52:52 +02:00
Ines Montani
9740f1712b Re-add font with lowercase name [ci skip] 2020-08-22 12:27:02 +02:00
Ines Montani
ff5ba14d06 Remove font [ci skip] 2020-08-22 12:26:50 +02:00
Ines Montani
71aeae89c5
Merge pull request #5948 from svlandeg/feature/docs-docs-docs [ci skip] 2020-08-22 12:18:47 +02:00
Ines Montani
27f81109d6 Update docs [ci skip] 2020-08-21 20:02:18 +02:00
Ines Montani
f102164a1f Update docs [ci skip] 2020-08-21 19:34:06 +02:00
svlandeg
1b7cfa7347 Merge remote-tracking branch 'upstream/develop' into feature/docs-docs-docs 2020-08-21 18:36:18 +02:00
svlandeg
942adf0f4d comma 2020-08-21 18:36:02 +02:00
svlandeg
262552010d context manager with space (for consistency) 2020-08-21 18:34:02 +02:00
svlandeg
da48c6a2a2 several small updates 2020-08-21 18:25:26 +02:00
svlandeg
ad2332d4b7 alphabetize registries 2020-08-21 18:10:31 +02:00
svlandeg
dc98f69b57 alphabetize registries 2020-08-21 18:10:21 +02:00
svlandeg
c6659e37d8 small fixes 2020-08-21 18:02:20 +02:00
svlandeg
518a1f97f3 remove outdated TODOs 2020-08-21 17:55:15 +02:00
svlandeg
e92bd6e1c1 alphabetize training lists 2020-08-21 17:42:19 +02:00
Ines Montani
2cc4640385 Update docs [ci skip] 2020-08-21 16:21:55 +02:00
Ines Montani
74cb6d39d0 Update docs [ci skip] 2020-08-21 16:11:38 +02:00
Matthew Honnibal
f5bcc10268 Update architectures 2020-08-21 15:34:54 +02:00
Matthew Honnibal
7ed8f4504b Update API docs for architectures 2020-08-21 15:22:19 +02:00
svlandeg
dcc21e44cb delete empty file 2020-08-21 15:17:20 +02:00
svlandeg
3060e4ae65 Merge remote-tracking branch 'upstream/develop' into feature/docs-docs-docs
# Conflicts:
#	website/src/widgets/quickstart-training-generator.js
2020-08-21 15:16:30 +02:00
svlandeg
cc926267f8 small fixes 2020-08-21 15:05:40 +02:00
Ines Montani
aa6a7cd6e7 Update docs and consistency [ci skip] 2020-08-21 13:49:18 +02:00
Ines Montani
52bd3a8b48 Update docs [ci skip] 2020-08-21 13:22:59 +02:00
Ines Montani
e60442d83a Adjust label casing in displaCy NER visualizer (resolves #4866)
- Accept any case for label names in ents and colors option, even if actual predicted label uses different casing
- Don't text-transform: uppercase visually, if it's important to users that the label is represented as-is in the UI
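
One way to picture the case-insensitive handling described above is to compare labels in a normalized form while leaving the displayed label untouched. This is a plain-Python sketch, not the displaCy renderer: `pick_color` and `filter_ents` are hypothetical helpers, and the entity dicts and default color are placeholders.

```python
def pick_color(label, colors, default="#ddd"):
    """Look up a color for `label`, ignoring case on both sides."""
    by_upper = {name.upper(): value for name, value in colors.items()}
    return by_upper.get(label.upper(), default)


def filter_ents(ents, wanted):
    """Keep entities whose label matches any wanted label, ignoring case."""
    wanted_upper = {name.upper() for name in wanted}
    # The label itself is returned unchanged, so it can be shown as-is.
    return [ent for ent in ents if ent["label"].upper() in wanted_upper]


ents = [{"text": "Berlin", "label": "Gpe"}, {"text": "Ines", "label": "person"}]
print(filter_ents(ents, ["PERSON"]))
print(pick_color("Gpe", {"GPE": "#7aecec"}))
```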
2020-08-21 11:51:31 +02:00
Ines Montani
04e4d59235 Update docs [ci skip] 2020-08-20 16:17:25 +02:00
Ines Montani
7f2e4244df
Merge pull request #5941 from svlandeg/feature/update-more-docs 2020-08-20 11:21:24 +02:00
Ines Montani
6ad59d59fe Merge branch 'develop' of https://github.com/explosion/spaCy into develop [ci skip] 2020-08-20 11:20:58 +02:00
Ines Montani
fb51b55eb9 Add comment [ci skip] 2020-08-20 11:20:43 +02:00
Sofie Van Landeghem
410b54e10e
Update website/docs/api/data-formats.md
Co-authored-by: Ines Montani <ines@ines.io>
2020-08-20 11:15:34 +02:00
svlandeg
ae719b354f fix typos 2020-08-20 10:20:40 +02:00
svlandeg
f728c00cbb Merge remote-tracking branch 'upstream/develop' into feature/update-more-docs
# Conflicts:
#	website/docs/api/data-formats.md
2020-08-20 10:02:13 +02:00
svlandeg
229033831a add explanation of raw_text 2020-08-20 10:00:45 +02:00
Ines Montani
2253d26b82 Update vectors and similarity docs [ci skip] 2020-08-19 21:18:26 +02:00
Ines Montani
ea6640ea72
Merge pull request #5939 from explosion/feature/thinc-v8.0.0a28
Update Thinc and config variables
2020-08-19 21:14:36 +02:00
Ines Montani
15e6feed01 Update docs [ci skip] 2020-08-19 20:37:54 +02:00
svlandeg
09f3cfc985 add version 2020-08-19 19:58:45 +02:00
svlandeg
7d9f00bdbf waltzing schedule 2020-08-19 19:53:00 +02:00
Ines Montani
3dd390b1a1 Update Thinc and config variables 2020-08-19 19:46:12 +02:00
svlandeg
85b39639e1 small fix 2020-08-19 19:17:36 +02:00
svlandeg
d8f6abdc23 add linking TODO back in 2020-08-19 18:00:35 +02:00
svlandeg
169b5bcda0 Merge remote-tracking branch 'upstream/develop' into feature/update-docs
# Conflicts:
#	website/docs/usage/training.md
2020-08-19 17:58:25 +02:00
svlandeg
7119295a8a badgers intro 2020-08-19 17:53:22 +02:00
svlandeg
4906a2ae6c custom functions intro 2020-08-19 17:32:35 +02:00
svlandeg
7a2e6a96f5 fix typo 2020-08-19 16:54:16 +02:00
svlandeg
648499157a rename "custom models" to "custom functions" 2020-08-19 16:53:51 +02:00
Ines Montani
63921161c8 Update docs [ci skip] 2020-08-19 16:04:21 +02:00
svlandeg
d3a8321172 fix typos 2020-08-19 15:12:12 +02:00
svlandeg
60fedb8518 fix 2 more API lines 2020-08-19 14:55:32 +02:00
svlandeg
2dfd919585 add kb_loader and get_candidates back to EL API 2020-08-19 14:52:49 +02:00
Ines Montani
e2f2ef3a5a Update init config and recommendations
- As much as I dislike YAML, it seemed like a better format here because it allows us to add comments if we want to explain the different recommendations
- Don't include the generated JS in the repo by default and build it on the fly when running or deploying the site. This ensures it's always up to date.
- Simplify jinja_to_js script and use fewer dependencies
2020-08-19 13:33:15 +02:00
Ines Montani
225f8866a1 Fix consistency 2020-08-19 12:47:57 +02:00
Ines Montani
9c25656ccc Update docs [ci skip] 2020-08-19 12:14:41 +02:00
Ines Montani
2285e59765
Merge pull request #5933 from svlandeg/feature/more-v3-docs [ci skip] 2020-08-19 11:29:02 +02:00
Ines Montani
13291e97ba Update docs [ci skip] 2020-08-19 00:28:37 +02:00
svlandeg
6ed67d495a format 2020-08-18 19:43:20 +02:00
svlandeg
f9fe5eb323 clean up example 2020-08-18 19:35:23 +02:00
svlandeg
a8acedd4ba example of custom reader and batcher 2020-08-18 19:15:16 +02:00
svlandeg
0d55b6ebb4 formatting 2020-08-18 18:55:56 +02:00
svlandeg
abba639565 Merge remote-tracking branch 'upstream/develop' into feature/more-v3-docs 2020-08-18 18:55:12 +02:00
Sofie Van Landeghem
358cbb21e3
Define candidate generator in EL config (#5876)
* candidate generator as separate part of EL config

* update comment

* ent instead of str as input for candidate generation

* Span instead of str: correct type indication

* fix types

* unit test to create new candidate generator

* fix replace_pipe argument passing

* move error message, general cleanup

* add vocab back to KB constructor

* provide KB as callable from Vocab arg

* rename to kb_loader, fix KB serialization as part of the EL pipe

* fix typo

* reformatting

* cleanup

* fix comment

* fix wrongly duplicated code from merge conflict

* rename dump to to_disk

* from_disk instead of load_bulk

* update test after recent removal of set_morphology in tagger

* remove old doc
2020-08-18 16:10:36 +02:00
Ines Montani
82f0e20318 Update docs and consistency [ci skip] 2020-08-18 14:39:40 +02:00
Matthew Honnibal
b72bd1767f Remove todo 2020-08-18 13:52:22 +02:00
Matthew Honnibal
574fd53289 Add precision/recall description 2020-08-18 13:51:08 +02:00
Matthew Honnibal
96a9c65f97 Add model architectures intro 2020-08-18 13:50:55 +02:00
svlandeg
705e1cb06c typo in link 2020-08-18 12:04:05 +02:00
svlandeg
f7b76d2d83 Merge remote-tracking branch 'upstream/develop' into feature/more-v3-docs 2020-08-18 11:57:52 +02:00
svlandeg
8dcda351ec typos and quick note on default values 2020-08-18 10:23:27 +02:00
Ines Montani
ef6cf3b276 Update docs [ci skip] 2020-08-18 01:29:34 +02:00
Ines Montani
1c3bcfb488 Update docs and util consistency 2020-08-18 01:22:59 +02:00
Ines Montani
728fec0194 Update docs [ci skip] 2020-08-18 00:49:19 +02:00
Ines Montani
9299166c75
Merge pull request #5925 from explosion/docs/vectors [ci skip]
Update the 'vectors' docs page
2020-08-17 21:45:09 +02:00
Ines Montani
990c6b4c32 Update docs and CLI [ci skip] 2020-08-17 21:38:20 +02:00
svlandeg
4fe4bab1c9 typo fixes 2020-08-17 17:10:15 +02:00
svlandeg
da80c18660 merge develop into branch 2020-08-17 16:57:18 +02:00
Ines Montani
3ae5e02f4f Update docs, types and API consistency 2020-08-17 16:45:24 +02:00
Matthew Honnibal
052d82aa4e Suggest vectors changes 2020-08-17 15:32:30 +02:00
svlandeg
961e818be6 p/r definitions 2020-08-17 15:02:39 +02:00
svlandeg
6b6f7f3e73 fix windows compat 2020-08-17 14:48:58 +02:00
svlandeg
319692aa53 fix typos 2020-08-17 14:05:48 +02:00
Matthew Honnibal
61dfdd9fbd Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-08-16 20:30:01 +02:00
Matthew Honnibal
be07567ac6 Update transformers page 2020-08-16 20:29:50 +02:00
Matthew Honnibal
8e5f99ee25 Update transformer docs intro. Also write system requirements 2020-08-16 20:13:24 +02:00
Ines Montani
2ac4b0ef3e Finish Transformer docs [ci skip] 2020-08-16 15:56:32 +02:00
Ines Montani
6ae83bde0c Fix CLI consistency [ci skip] 2020-08-16 15:46:29 +02:00
Ines Montani
a570c304df Update quickstart, template and docs 2020-08-15 14:50:29 +02:00
Ines Montani
8736bfc052 Add comment about auto-generated file [ci skip] 2020-08-13 23:27:25 +02:00
Ines Montani
88b0a96801 Update for new Thinc and adjust config 2020-08-13 17:38:30 +02:00
Matthew Honnibal
965805f372 Add draft transformer template 2020-08-13 15:21:42 +02:00
Matthew Honnibal
efcf15bddf Fix quickstart cpu template 2020-08-13 15:21:26 +02:00
Ines Montani
7d526d0d40 Update docs and quickstart widget [ci skip] 2020-08-13 01:17:40 +02:00
Ines Montani
950832f087
Tidy up pipes (#5906)
* Tidy up pipes

* Fix init, defaults and raise custom errors

* Update docs

* Update docs [ci skip]

* Apply suggestions from code review

Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>

* Tidy up error handling and validation, fix consistency

* Simplify get_examples check

* Remove unused import [ci skip]

Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>
2020-08-11 23:29:31 +02:00
Ines Montani
b7ec06e331 Update docs [ci skip] 2020-08-11 20:57:23 +02:00
Ines Montani
10f42e3a39 Update docs [ci skip] 2020-08-11 00:09:49 +02:00
Ines Montani
2778d04377 Update docs [ci skip] 2020-08-10 23:41:09 +02:00
Ines Montani
adf2b1c8a9 Update graphic [ci skip] 2020-08-10 17:20:04 +02:00
Ines Montani
023ba7ae26 Update docs 2020-08-10 17:13:11 +02:00
Ines Montani
c099f6eece Add Token.lex 2020-08-10 16:43:52 +02:00
Ines Montani
64f2f84098 Update docstrings and docs [ci skip] 2020-08-10 13:45:22 +02:00
Ines Montani
12052bd8f6 Update docs [ci skip] 2020-08-10 01:20:10 +02:00
Ines Montani
0832cdd443 Fix formatting [ci skip] 2020-08-10 00:46:32 +02:00
Ines Montani
d611cbef43 Update docs [ci skip] 2020-08-10 00:42:26 +02:00
Ines Montani
c044460823 Update docs [ci skip] 2020-08-10 00:01:38 +02:00
Ines Montani
05dcab10aa Fix typo 2020-08-09 22:34:03 +02:00
Ines Montani
d5c78c7a34 Update docs and fix consistency 2020-08-09 22:31:52 +02:00
Ines Montani
a15c5fb191 Update docstrings and docs 2020-08-09 16:10:48 +02:00
Ines Montani
8d2baa153d Update tokenizer docs and add test 2020-08-09 15:24:01 +02:00
Ines Montani
46bc513a4e Update docs [ci skip] 2020-08-07 20:14:31 +02:00
Ines Montani
fe29ceec9e Merge branch 'develop' into docs/model-docstrings 2020-08-07 18:42:01 +02:00
Ines Montani
470b6f8073 Update docs 2020-08-07 18:41:15 +02:00
Ines Montani
3901b088ff Update graphics and 101 [ci skip] 2020-08-07 17:14:13 +02:00
Ines Montani
5e1421e5a6 Update docs [ci skip] 2020-08-07 16:23:12 +02:00
Ines Montani
b7e34c1451 Update docs [ci skip] 2020-08-07 16:13:13 +02:00
Ines Montani
6f3649923c
Merge pull request #5893 from explosion/feature/validate-arg 2020-08-07 15:47:20 +02:00
Ines Montani
e829d3bf14 Update docs [ci skip] 2020-08-07 15:46:20 +02:00
Adriane Boyd
e962784531
Add Lemmatizer and simplify related components (#5848)
* Add Lemmatizer and simplify related components

* Add `Lemmatizer` pipe with `lookup` and `rule` modes using the
`Lookups` tables.
* Reduce `Tagger` to a simple tagger that sets `Token.tag` (no pos or lemma)
* Reduce `Morphology` to only keep track of morph tags (no tag map, lemmatizer,
or morph rules)
* Remove lemmatizer from `Vocab`
* Adjust many many tests

Differences:

* No default lookup lemmas
* No special treatment of TAG in `from_array` and similar required
* Easier to modify labels in a `Tagger`
* No extra strings added from morphology / tag map

* Fix test

* Initial fix for Lemmatizer config/serialization

* Adjust init test to be more generic

* Adjust init test to force empty Lookups

* Add simple cache to rule-based lemmatizer

* Convert language-specific lemmatizers

Convert language-specific lemmatizers to component lemmatizers. Remove
previous lemmatizer class.

* Fix French and Polish lemmatizers

* Remove outdated UPOS conversions

* Update Russian lemmatizer init in tests

* Add minimal init/run tests for custom lemmatizers

* Add option to overwrite existing lemmas

* Update mode setting, lookup loading, and caching

* Make `mode` an immutable property
* Only enforce strict `load_lookups` for known supported modes
* Move caching into individual `_lemmatize` methods

* Implement strict when lang is not found in lookups

* Fix tables/lookups in make_lemmatizer

* Reallow provided lookups and allow for stricter checks

* Add lookups asset to all Lemmatizer pipe tests

* Rename lookups in lemmatizer init test

* Clean up merge

* Refactor lookup table loading

* Add helper from `load_lemmatizer_lookups` that loads required and
optional lookups tables based on settings provided by a config.

Additional slight refactor of lookups:

* Add `Lookups.set_table` to set a table from a provided `Table`
* Reorder class definitions to be able to specify type as `Table`

* Move registry assets into test methods

* Refactor lookups tables config

Use class methods within `Lemmatizer` to provide the config for
particular modes and to load the lookups from a config.

* Add pipe and score to lemmatizer

* Simplify Tagger.score

* Add missing import

* Clean up imports and auto-format

* Remove unused kwarg

* Tidy up and auto-format

* Update docstrings for Lemmatizer

Update docstrings for Lemmatizer.

Additionally modify `is_base_form` API to take `Token` instead of
individual features.

* Update docstrings

* Remove tag map values from Tagger.add_label

* Update API docs

* Fix relative link in Lemmatizer API docs
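
To illustrate the `lookup` mode and the overwrite option listed in this commit, here is a small plain-Python sketch. It is not the `Lemmatizer` component: tokens are modeled as dicts, `lemmatize_tokens` is a hypothetical helper, and the table is a toy stand-in for a lookups table. Falling back to the token text when a form is not in the table is an assumption of this sketch.

```python
def lemmatize_tokens(tokens, table, overwrite=False):
    """Fill in token["lemma"] from a lookup table.

    Existing lemmas are kept unless `overwrite` is True. Forms missing from
    the table fall back to the token text (assumption of this sketch).
    """
    for token in tokens:
        if overwrite or not token.get("lemma"):
            token["lemma"] = table.get(token["text"], token["text"])
    return tokens


table = {"mice": "mouse", "ran": "run"}  # toy lookup table
tokens = [
    {"text": "mice", "lemma": ""},
    {"text": "ran", "lemma": "ran"},   # already has a lemma
    {"text": "home", "lemma": ""},     # not in the table
]
print([t["lemma"] for t in lemmatize_tokens(tokens, table)])
print([t["lemma"] for t in lemmatize_tokens(tokens, table, overwrite=True)])
```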
2020-08-07 15:27:13 +02:00
Adriane Boyd
4aecccf153 Update API docs for AttributeRuler.__init__ 2020-08-07 15:17:25 +02:00
Ines Montani
a8404c3517 validation -> validate 2020-08-07 14:43:47 +02:00
Ines Montani
1d01d89b79 Update CLI docs and evaluate command [ci skip] 2020-08-07 14:40:58 +02:00
Ines Montani
ef2c67cca5
Add DocBin to/from_disk methods and update docs (#5892)
* Add DocBin to/from_disk methods and update docs

* Use DocBin.from_disk in Corpus
2020-08-07 14:30:59 +02:00
Ines Montani
4ca08c6d5d
Merge pull request #5891 from adrianeboyd/docs/attribute-ruler-api
Add AttributeRuler API docs
2020-08-07 13:55:12 +02:00
Adriane Boyd
b8d0c23857 Add AttributeRuler API docs
With additional minor updates to AttributeRuler docstrings.
2020-08-07 12:43:23 +02:00
svlandeg
824f4b2107 casing consistent 2020-08-06 23:20:13 +02:00
svlandeg
b17db0e994 Merge remote-tracking branch 'upstream/develop' into feature/el-docs
# Conflicts:
#	website/docs/usage/training.md
2020-08-06 19:48:52 +02:00
svlandeg
49ddeb99ea add textcat architectures documentation 2020-08-06 19:44:47 +02:00
Ines Montani
e5995904d6 Update docs 2020-08-06 19:30:43 +02:00
svlandeg
e8fd0c1f1e EL architectures documentation 2020-08-06 17:41:26 +02:00
svlandeg
f396f091dc update EL API 2020-08-06 16:40:48 +02:00
svlandeg
81d0b1c390 update EL pipe arguments 2020-08-06 16:22:50 +02:00
svlandeg
0b4d1e1bc4 'debug data' instead of 'debug-data' 2020-08-06 15:47:31 +02:00
svlandeg
881e3f8fd0 add docbin explanation and example 2020-08-06 15:29:44 +02:00
Ines Montani
5d417d3b19 WIP: Update docs [ci skip] 2020-08-06 13:10:15 +02:00
Ines Montani
4d34efa697 Tidy up docs components [ci skip] 2020-08-06 01:22:49 +02:00
Ines Montani
30f316c688 Fix server-side rendering [ci skip] 2020-08-06 00:51:55 +02:00
Ines Montani
06e80d95cd
Sync develop with nightly docs state (#5883)
Co-authored-by: svlandeg <sofie.vanlandeghem@gmail.com>
2020-08-06 00:28:14 +02:00
Ines Montani
5cc0d89fad
Simplify config overrides in CLI and deserialization (#5880) 2020-08-05 23:35:09 +02:00
Ines Montani
50311a4d37 Update docs [ci skip] 2020-08-05 20:29:53 +02:00
Ines Montani
2a4d56e730 Update docs 2020-08-05 15:01:00 +02:00
Ines Montani
cdec46493f Update docs 2020-08-05 15:00:54 +02:00
Bram Vanroy
9e45d064bb
Update universe details spacy_conll (#5871) 2020-08-05 14:34:12 +02:00
Adriane Boyd
c62fd878a3
Allow Doc.char_span to snap to token boundaries (#5849)
* Allow Doc.char_span to snap to token boundaries

Add a `mode` option to allow `Doc.char_span` to snap to token
boundaries. The `mode` options:

* `strict`: character offsets must match token boundaries (default, same as
before)
* `inside`: all tokens completely within the character span
* `outside`: all tokens at least partially covered by the character span

Add a new helper function `token_by_char` that returns the token
corresponding to a character position in the text. Update
`token_by_start` and `token_by_end` to use `token_by_char` for more
efficient searching.

* Remove unused import

* Rename mode to alignment_mode

Rename `mode` to `alignment_mode` with the options
`strict`/`contract`/`expand`. Any unrecognized modes are silently
converted to `strict`.
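
The three alignment modes can be illustrated with a small pure-Python sketch. This is not spaCy's implementation: the helper name `snap_char_span` and the representation of tokens as `(start_char, end_char)` pairs are assumptions made for illustration only.

```python
from typing import List, Optional, Tuple


def snap_char_span(
    token_offsets: List[Tuple[int, int]],
    start: int,
    end: int,
    alignment_mode: str = "strict",
) -> Optional[Tuple[int, int]]:
    """Map a character span to a (first_token, end_token) index pair.

    Hypothetical helper: token_offsets holds (start_char, end_char) per token.
    Returns None if no token span can be produced.
    """
    # Unrecognized modes fall back to "strict", as the commit message describes.
    if alignment_mode not in ("strict", "contract", "expand"):
        alignment_mode = "strict"
    if alignment_mode == "strict":
        starts = {s: i for i, (s, _) in enumerate(token_offsets)}
        ends = {e: i for i, (_, e) in enumerate(token_offsets)}
        if start not in starts or end not in ends:
            return None
        first, last = starts[start], ends[end]
        return (first, last + 1) if first <= last else None
    if alignment_mode == "contract":
        # Keep only tokens that lie completely inside the character span.
        covered = [i for i, (s, e) in enumerate(token_offsets) if s >= start and e <= end]
    else:
        # "expand": keep every token that overlaps the character span at all.
        covered = [i for i, (s, e) in enumerate(token_offsets) if e > start and s < end]
    if not covered:
        return None
    return covered[0], covered[-1] + 1


# Token offsets for the text "New York City"
tokens = [(0, 3), (4, 8), (9, 13)]
print(snap_char_span(tokens, 0, 10))              # strict: 10 is not a token end
print(snap_char_span(tokens, 0, 10, "contract"))  # tokens fully inside 0..10
print(snap_char_span(tokens, 0, 10, "expand"))    # tokens touched by 0..10
```

With these offsets, `strict` yields no span, `contract` keeps the first two tokens and `expand` keeps all three.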
2020-08-04 13:36:32 +02:00
Ines Montani
4c055f0aa7
Add init CLI and init config (#5854)
* Add init CLI and init config draft

* Improve config validation

* Auto-format

* Don't export anything in debug config

* Update docs
2020-08-02 15:18:30 +02:00
Ines Montani
e393ebd78b
Merge pull request #5851 from explosion/feature/better-pipe-analysis 2020-08-01 14:20:27 +02:00
Ines Montani
b40f44419b Simplify pipe analysis
- remove unused code
- don't print by default
- integrate attrs info into analysis output
2020-08-01 13:40:06 +02:00
Ines Montani
93144bde97 Update code block style [ci skip] 2020-07-31 18:55:55 +02:00
Ines Montani
98c6a85c8b Update docs [ci skip] 2020-07-31 18:55:38 +02:00
Ines Montani
e9e8fa2466 Update docs and types 2020-07-31 17:02:54 +02:00
Ines Montani
6365837ca9
Merge pull request #5833 from explosion/feature/scorer-adjustments 2020-07-31 14:00:39 +02:00
Ines Montani
5a221f79c2 Revert "Remove keyword-only from Scorer API docs" [ci skip]
This reverts commit 7a6ac47dc1.
2020-07-31 14:00:21 +02:00
Ines Montani
160f1a5f94 Update docs [ci skip] 2020-07-31 13:26:39 +02:00
Adriane Boyd
9b509aa87f Move Language.evaluate scorer config to new arg
Move `Language.evaluate` scorer config from `component_cfg` to separate
argument `scorer_cfg`.
2020-07-31 11:05:16 +02:00
Adriane Boyd
9d79916792 Merge branch 'develop' into feature/scorer-adjustments 2020-07-31 10:48:14 +02:00
Ines Montani
3449c45fd9 Update docs [ci skip] 2020-07-29 19:48:26 +02:00
Ines Montani
9c80cb673d Update docs [ci skip] 2020-07-29 19:41:34 +02:00
Ines Montani
9f69afdd1e Update docs [ci skip] 2020-07-29 19:09:44 +02:00
Ines Montani
7a21775cd0
Merge pull request #5834 from explosion/feature/vectors 2020-07-29 18:49:26 +02:00
Ines Montani
6a5c853edb Fix docs [ci skip] 2020-07-29 18:45:12 +02:00
Ines Montani
158d8c1e48 Update docs [ci skip] 2020-07-29 18:44:10 +02:00
Matthew Honnibal
f7adc9d3b7 Start rewriting vectors docs 2020-07-29 17:10:06 +02:00
Ines Montani
b0f57a0cac Update docs and consistency 2020-07-29 15:14:07 +02:00
Ines Montani
e0ffe36e79 Update docstrings, docs and types 2020-07-29 11:36:42 +02:00
Adriane Boyd
7a6ac47dc1 Remove keyword-only from Scorer API docs 2020-07-29 10:40:30 +02:00
Ines Montani
ac24adec73 Small adjustments to Scorer and docs 2020-07-28 21:39:42 +02:00
Ines Montani
256b24b720 Update arch docs WIP [ci skip] 2020-07-28 20:33:52 +02:00
Ines Montani
ae4d8a6ffd Update docstrings, docs and pipe consistency 2020-07-28 13:37:31 +02:00
Ines Montani
0094cb0d04 Remove scores list from config and document 2020-07-28 11:22:24 +02:00
Ines Montani
9b704c3db3
Merge pull request #5819 from explosion/feature/component-scores 2020-07-28 10:40:56 +02:00
Ines Montani
2f83848b1f Fix title [ci skip] 2020-07-27 18:25:38 +02:00
Ines Montani
894e20c466 Merge branch 'develop' into feature/component-scores 2020-07-27 18:14:39 +02:00
Ines Montani
d8b519c23c API docs, docstrings and argument consistency 2020-07-27 18:11:45 +02:00
Ines Montani
10b84e1e27 Add flag to toggle sdist creation on package [ci skip] 2020-07-27 16:52:23 +02:00
Adriane Boyd
fdf09cb231 Update Scorer API docs for score_cats 2020-07-27 15:34:42 +02:00
Adriane Boyd
2880d8a555
Normalize spelling for spaCy (#5822) 2020-07-27 10:09:33 +02:00
Martino Mensio
2f6b8132ef
Sentence transformers added to spaCy universe (#5814)
* fix details for spacy-universal-sentence-encoder

* added sentence-transformers
2020-07-27 09:44:33 +02:00
Nipun Sadvilkar
a66ad89fcb
✏️ typo in pysbd code example (#5821) 2020-07-27 09:43:39 +02:00
Ines Montani
7dd53d0964 Fix typo [ci skip] 2020-07-27 00:34:00 +02:00
Ines Montani
7adbaf9a5b Update docs [ci skip] 2020-07-27 00:29:45 +02:00
Matthew Honnibal
fb5dbe30b5 Trim training 101 2020-07-26 13:43:22 +02:00
Matthew Honnibal
e6a7deb7cc Edits to the training 101 section 2020-07-26 13:42:08 +02:00
Ines Montani
c288dba8e7 Update docs [ci skip] 2020-07-25 18:51:12 +02:00
Ines Montani
eb9acae34d
Merge pull request #5791 from adrianeboyd/docs/morphology 2020-07-25 15:10:21 +02:00
Li Zhe
a69eb445dc
fix the wrong hash url in adding-languages.md file (#5810)
* fix the wrong hash url in adding-languages.md file

change the #101 url hash path to #language-data

* filled in the spaCy Contributor Agreement 

filled in the spaCy Contributor Agreement
2020-07-25 13:13:38 +02:00
Adriane Boyd
2bcceb80c4
Refactor the Scorer to improve flexibility (#5731)
* Refactor the Scorer to improve flexibility

Refactor the `Scorer` to improve flexibility for arbitrary pipeline
components.

* Individual pipeline components provide their own `evaluate` methods
that score a list of `Example`s and return a dictionary of scores
* `Scorer` is initialized either:
  * with a provided pipeline containing components to be scored
  * with a default pipeline containing the built-in statistical
    components (senter, tagger, morphologizer, parser, ner)
* `Scorer.score` evaluates a list of `Example`s and returns a dictionary
of scores referring to the scores provided by the components in the
pipeline

Significant differences:

* `tags_acc` is renamed to `tag_acc` to be consistent with `token_acc`
and the new `morph_acc`, `pos_acc`, and `lemma_acc`
* Scoring is no longer cumulative: `Scorer.score` scores a list of
examples rather than a single example and does not retain any state
about previously scored examples
* PRF values in the returned scores are no longer multiplied by 100

* Add kwargs to Morphologizer.evaluate

* Create generalized scoring methods in Scorer

* Generalized static scoring methods are added to `Scorer`
  * Methods require an attribute (either on Token or Doc) that is
used to key the returned scores

Naming differences:

* `uas`, `las`, and `las_per_type` in the scores dict are renamed to
`dep_uas`, `dep_las`, and `dep_las_per_type`

Scoring differences:

* `Doc.sents` is now scored as spans rather than on sentence-initial
token positions so that `Doc.sents` and `Doc.ents` can be scored with
the same method (this lowers scores since a single incorrect sentence
start results in two incorrect spans)
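
A minimal sketch of why span-based scoring is stricter than scoring sentence-initial positions. The helper `spans_from_starts` and the toy data are hypothetical and not part of the `Scorer` API.

```python
def spans_from_starts(starts, n_tokens):
    """Turn sentence-initial token indices into (start, end) token spans."""
    bounds = sorted(starts) + [n_tokens]
    return {(bounds[i], bounds[i + 1]) for i in range(len(bounds) - 1)}


# A 12-token document with three sentences; one predicted start is off by one.
gold_starts = {0, 5, 9}
pred_starts = {0, 4, 9}
n_tokens = 12

correct_starts = len(gold_starts & pred_starts)
gold_spans = spans_from_starts(gold_starts, n_tokens)
pred_spans = spans_from_starts(pred_starts, n_tokens)
correct_spans = len(gold_spans & pred_spans)

print(correct_starts, "of", len(gold_starts), "starts correct")  # 2 of 3 starts correct
print(correct_spans, "of", len(gold_spans), "spans correct")     # 1 of 3 spans correct
```

The misplaced start at token 4 invalidates both the sentence that ends there and the sentence that begins there, so only one of the three spans matches.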

* Simplify / extend hasattr check for eval method

* Add hasattr check to tokenizer scoring
* Simplify to hasattr check for component scoring

* Reset Example alignment if docs are set

Reset the Example alignment if either doc is set in case the
tokenization has changed.

* Add PRF tokenization scoring for tokens as spans

Add PRF scores for tokens as character spans. The scores are:

* token_acc: # correct tokens / # gold tokens
* token_p/r/f: PRF for (token.idx, token.idx + len(token))
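
The span-based token scores can be sketched in plain Python as follows. The helpers `char_spans` and `prf` are illustrative assumptions, not the `Scorer.score_tokenization` implementation; values are returned in the range 0 to 1, in line with the note above that PRF values are no longer multiplied by 100.

```python
def char_spans(tokens, text):
    """Locate each token in the text and return (idx, idx + len(token)) pairs."""
    spans, pos = [], 0
    for tok in tokens:
        idx = text.index(tok, pos)
        spans.append((idx, idx + len(tok)))
        pos = idx + len(tok)
    return spans


def prf(gold_spans, pred_spans):
    """Precision, recall and F-score over exact character-span matches."""
    gold, pred = set(gold_spans), set(pred_spans)
    tp = len(gold & pred)
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return {"token_p": p, "token_r": r, "token_f": f}


text = "can't stop"
gold = char_spans(["ca", "n't", "stop"], text)  # reference tokenization
pred = char_spans(["can't", "stop"], text)      # predicted tokenization
scores = prf(gold, pred)
print(scores)
```

Only the span for "stop" matches in both tokenizations, so precision is 1/2 and recall is 1/3.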

* Add docstring to Scorer.score_tokenization

* Rename component.evaluate() to component.score()

* Update Scorer API docs

* Update scoring for positive_label in textcat

* Fix TextCategorizer.score kwargs

* Update Language.evaluate docs

* Update score names in default config
2020-07-25 12:53:02 +02:00
Adriane Boyd
41525901ef Move MorphAnalysis to Other section 2020-07-23 08:58:22 +02:00
Adriane Boyd
8f44584bef Update MorphAnalysis.get and related examples 2020-07-23 08:51:31 +02:00
Adriane Boyd
941b9e33f7 Add Token.morph_ 2020-07-22 17:59:45 +02:00
Ines Montani
be476e495e
Merge pull request #5787 from adrianeboyd/docs/morphologizer
Initial draft of Morphologizer API docs
2020-07-22 17:16:57 +02:00
Adriane Boyd
d3385f4be2 Add Morphology and MorphAnalysis to overview 2020-07-21 13:06:22 +02:00
Adriane Boyd
fcd3a4abe3 Add morph to Token API docs 2020-07-21 13:05:58 +02:00
Adriane Boyd
14df00ae98 Add Morphology and MorphAnalysis API docs
Add initial draft of `Morphology` and `MorphAnalysis` API docs.
2020-07-21 10:33:46 +02:00
Ines Montani
644074b954 Merge branch 'develop' into master-tmp 2020-07-20 14:58:04 +02:00
Adriane Boyd
986f7e4d69 Initial draft of Morphologizer API docs 2020-07-20 12:53:02 +02:00
Alec Chapman
a8978ca285
Add VA COVID-19 NLP project to spaCy Universe (#5777)
* Update universe.json

Add cov-bsv to "resources"

* Update universe.json

* add contributor agreement
2020-07-19 13:35:31 +02:00
Adriane Boyd
39ebcd9ec9
Refactor Chinese tokenizer configuration (#5736)
* Refactor Chinese tokenizer configuration

Refactor `ChineseTokenizer` configuration so that it uses a single
`segmenter` setting to choose between character segmentation, jieba, and
pkuseg.

* replace `use_jieba`, `use_pkuseg`, `require_pkuseg` with the setting
`segmenter` with the supported values: `char`, `jieba`, `pkuseg`
* make the default segmenter plain character segmentation `char` (no
additional libraries required)

* Fix Chinese serialization test to use char default

* Warn if attempting to customize other segmenter

Add a warning if `Chinese.pkuseg_update_user_dict` is called when
another segmenter is selected.
2020-07-19 13:34:37 +02:00
Adriane Boyd
cd5af72c9a
Update pkuseg version (#5774)
* Update pkuseg version in Chinese tokenizer warnings
* Update pkuseg version in `Makefile`
* Remove warning about python3.8 wheels in docs
2020-07-19 11:09:49 +02:00
Ines Montani
68fade8f76 Add Plausible [ci skip] 2020-07-19 00:02:29 +02:00
Ines Montani
6f4e4aceb3 Add Plausible [ci skip] 2020-07-18 23:50:29 +02:00
Ines Montani
872938ec76
Merge pull request #5747 from explosion/feature/refactor-config-args 2020-07-14 00:00:22 +02:00
Ines Montani
5f6f4ff594 Remove object subclassing 2020-07-12 14:03:23 +02:00
Ines Montani
c96535e338 Update command docstrings and docs 2020-07-12 13:53:49 +02:00
Ines Montani
3f948b9c74 Update docs 2020-07-12 12:32:28 +02:00
Ines Montani
11bbc82c24 Update cli.md [ci skip] 2020-07-10 23:37:52 +02:00
Ines Montani
9455b060d2 Update cli.md 2020-07-10 22:57:22 +02:00
Ines Montani
7b5717cac3 Merge branch 'develop' into feature/refactor-config-args 2020-07-10 22:50:07 +02:00
Ines Montani
e6a6587a9a Update projects.md [ci skip] 2020-07-10 22:41:27 +02:00
Ines Montani
f2cd982e7b Update training.md 2020-07-10 22:34:27 +02:00
Ines Montani
52e9b5b472 Fix formatting 2020-07-09 23:25:58 +02:00
Ines Montani
28cdae898a Update projects.md 2020-07-09 22:35:54 +02:00
Ines Montani
7bcf9f7cfb Document new features 2020-07-09 21:10:36 +02:00
Ines Montani
ea01831f6a Update projects docs etc. 2020-07-09 19:43:25 +02:00
Ines Montani
175d34d8f9 Update sidebar menu 2020-07-09 11:44:09 +02:00
Ines Montani
9ee5b71412 Update cli.md 2020-07-09 11:44:00 +02:00
Ines Montani
9ae4040183 Update API docs 2020-07-08 13:34:35 +02:00
svlandeg
c94279ac1b remove tensors, fix predict, get_loss and set_annotations 2020-07-08 13:11:54 +02:00
svlandeg
90b100c39f remove component.Model, update constructor, losses is return value of update 2020-07-08 12:14:30 +02:00
gandersen101
893133873d Fix quote issue in spaczz universe.json 2020-07-07 19:16:28 -05:00
Ines Montani
109849bd31 Fix and update universe.json [ci skip] 2020-07-07 21:12:28 +02:00
gandersen101
9097549227
Adding spaczz package to universe.json (#5717)
* Adding spaczz package to universe.json

* Adding contributor agreement.
2020-07-07 20:55:24 +02:00
Jonathan Besomi
546f3d10d4
Add texthero to universe.json (#5716)
* Add texthero to universe.json

* Add spaCy contributor Agreement
2020-07-07 20:54:22 +02:00
Ines Montani
2298e129e6 Update example and training docs 2020-07-07 20:30:12 +02:00
svlandeg
2b60e894cb fix component constructors, update, begin_training, reference to GoldParse 2020-07-07 19:17:19 +02:00
svlandeg
14a796e3f9 add Example API with examples of Example usage 2020-07-07 14:46:41 +02:00
Ines Montani
bb3ee38cf9 Update WIP 2020-07-06 22:22:37 +02:00
Ines Montani
44da24ddd0 Update doc.md 2020-07-06 18:17:00 +02:00
Ines Montani
44790c1c32 Update docs and add keyword-only tag 2020-07-06 18:14:57 +02:00
Ines Montani
a35236e5f0 Update v3 docs WIP [ci skip] 2020-07-06 15:57:44 +02:00
Ines Montani
63247cbe87 Update v3 docs [ci skip] 2020-07-05 16:11:16 +02:00
Matthew Honnibal
3e78e82a83
Experimental character-based pretraining (#5700)
* Use cosine loss in Cloze multitask

* Fix char_embed for gpu

* Call resume_training for base model in train CLI

* Fix bilstm_depth default in pretrain command

* Implement character-based pretraining objective

* Use chars loss in ClozeMultitask

* Add method to decode predicted characters

* Fix number characters

* Rescale gradients for mlm

* Fix char embed+vectors in ml

* Fix pipes

* Fix pretrain args

* Move get_characters_loss

* Fix import

* Fix import

* Mention characters loss option in pretrain

* Remove broken 'self attention' option in pretrain

* Revert "Remove broken 'self attention' option in pretrain"

This reverts commit 56b820f6af.

* Document 'characters' objective of pretrain
2020-07-05 15:48:39 +02:00
Ines Montani
dc8c9d912f Update docs [ci skip] 2020-07-04 16:47:24 +02:00
Ines Montani
4498dfe99d Update docs 2020-07-04 16:25:30 +02:00
Ines Montani
1e0d54edd1 Update docs 2020-07-04 14:23:10 +02:00
Ines Montani
fe224dc2dd Merge branch 'develop' into nightly.spacy.io 2020-07-03 16:48:27 +02:00
Ines Montani
06f1ecb308 Update v3 docs 2020-07-03 16:48:21 +02:00
Ines Montani
cdf9ee1716 Add stub for Example API docs [ci skip] 2020-07-03 15:46:10 +02:00
Ines Montani
fa8e097c04 Update convert docs [ci skip] 2020-07-03 15:42:04 +02:00
Jan Jessewitsch
e4dcac4a4b
Merging multiple docs into one (#5032)
* Add static method to Doc to allow merging of multiple docs.

* Add error description for the error that occurs if docs with different
vocabs (from different languages) are merged in Doc.from_docs().

* Add test for Doc.from_docs() implementation.

* Fix using numpy's concatenate in Doc.from_docs.

* Replace typing's type annotations in from_docs.

* Simply remove type annotations in from_docs.

* Add documentation for Doc.from_docs to api.

* Simplify from_docs, its test and the api doc for codebase consistency.

* Fix merging of Doc objects that end with whitespaces (Achieved by simply not setting the SPACY attribute on whitespace tokens). Remove two unnecessary imports of attributes.

* Add merging of user data from Doc objects in from_docs. Add user data test case to corresponding test. Add applicable warning messages.

* Fix incorrect setting of tokens idx by using concatenated spaces (again). Add test case to corresponding test.

* Add MORPH to attrs

* Update warnings calls

* Remove out-dated error from merge

* Rename space_delimiter to ensure_whitespace
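
A rough sketch of the merging idea, assuming each doc is modelled as a list of `(word, trailing_space)` pairs. The function `merge_docs` is hypothetical and only mimics the `ensure_whitespace` behaviour and the recomputed token offsets described above; it is not the `Doc.from_docs` implementation.

```python
def merge_docs(docs, ensure_whitespace=True):
    """Concatenate toy docs and recompute character offsets for every token."""
    words, spaces = [], []
    for i, doc in enumerate(docs):
        for j, (word, space) in enumerate(doc):
            ends_doc = j == len(doc) - 1
            is_last_doc = i == len(docs) - 1
            # Insert a space between docs unless the final token is whitespace.
            if ensure_whitespace and ends_doc and not is_last_doc and not word.isspace():
                space = True
            words.append(word)
            spaces.append(space)
    # Offsets are derived from the concatenated words and spaces.
    idx, pos = [], 0
    for word, space in zip(words, spaces):
        idx.append(pos)
        pos += len(word) + (1 if space else 0)
    text = "".join(w + (" " if s else "") for w, s in zip(words, spaces))
    return text, idx


docs = [
    [("Hello", True), ("world", False)],
    [("Bye", False), ("!", False)],
]
print(merge_docs(docs))                           # ('Hello world Bye!', [0, 6, 12, 15])
print(merge_docs(docs, ensure_whitespace=False))  # ('Hello worldBye!', [0, 6, 11, 14])
```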

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2020-07-03 11:32:42 +02:00
Adriane Boyd
a723fa02a1
DocBin: add version number, missing attributes and strings (#5685)
* Add version number to DocBin

Add a version number to DocBin for future use.

* Add POS to all attributes in DocBin

* Add morph string to strings in DocBin

* Update DocBin API

* Add string for ENT_KB_ID in DocBin
2020-07-02 17:41:50 +02:00
Ines Montani
b5268955d7 Update matcher usage examples [ci skip] 2020-07-02 15:39:45 +02:00
Ines Montani
a4cfe9fc33 Remove inline notes on v2 changes [ci skip] 2020-07-01 22:29:22 +02:00
Ines Montani
3dff412f58 Merge branch 'nightly.spacy.io' into develop [ci skip] 2020-07-01 21:33:47 +02:00
Ines Montani
fe4cfd0632 Start updating website for v3 [ci skip] 2020-07-01 21:26:39 +02:00
Ines Montani
38f226bda8 Update images [ci skip] 2020-07-01 15:33:54 +02:00
Ines Montani
6e28760316 Fix 404 [ci skip] 2020-07-01 15:02:55 +02:00
Ines Montani
7037512e55 Handle robots.txt for nightly/special deploys [ci skip] 2020-07-01 14:50:58 +02:00
Ines Montani
997f6eeca7 Adjust nightly site url [ci skip] 2020-07-01 14:42:59 +02:00
Ines Montani
e1eb48e932 Add nightly social image [ci skip] 2020-07-01 14:41:13 +02:00
Ines Montani
5d02f71653 Add nightly favicon and Binder [ci skip] 2020-07-01 14:33:33 +02:00
Ines Montani
dc6d9c2fac Auto-infer nightly state from branch 2020-07-01 14:05:11 +02:00
Ines Montani
02334aeafc Make alert more prominent 2020-07-01 13:25:13 +02:00
Ines Montani
a0204e7d9a Revert change for now 2020-07-01 13:15:34 +02:00
Ines Montani
53ffee91b4 Experiment with hiding "new" tags 2020-07-01 13:11:00 +02:00
Ines Montani
f9a56a6993 Update site to support nightly mode 2020-07-01 13:03:04 +02:00
Ines Montani
5e24b8d481 Set to nightly 2020-07-01 13:02:30 +02:00
Ines Montani
26df4efa94 Add new in v3.0 2020-07-01 13:02:17 +02:00
Ines Montani
18a900abc2 Fix markup 2020-07-01 13:02:07 +02:00
Ines Montani
414dc7ace1 Merge branch 'spacy.io' into spacy.io-develop 2020-07-01 11:47:47 +02:00
Álvaro Abella Bascarán
7111b9de2e Fix in docs: pipe(docs) instead of pipe(texts) (#5680)
Very minor fix in docs, specifically in this part:

```
matcher = PhraseMatcher(nlp.vocab)
for doc in matcher.pipe(texts, batch_size=50):
    pass
```

`texts` suggests the input is an iterable of strings. I replaced it with `docs`.
2020-06-30 20:01:12 +02:00
Álvaro Abella Bascarán
ff0dbe5c64
Fix in docs: pipe(docs) instead of pipe(texts) (#5680)
Very minor fix in docs, specifically in this part:

```
matcher = PhraseMatcher(nlp.vocab)
for doc in matcher.pipe(texts, batch_size=50):
    pass
```

`texts` suggests the input is an iterable of strings. I replaced it with `docs`.
2020-06-30 20:00:50 +02:00
Matthias Hertel
305221f3e5 Website: fixed the token span in the text about the rule-based matching example (#5669)
* fixed token span in pattern matcher example

* contributor agreement
2020-06-30 19:58:55 +02:00
Matthias Hertel
8b0f749606
Website: fixed the token span in the text about the rule-based matching example (#5669)
* fixed token span in pattern matcher example

* contributor agreement
2020-06-30 19:58:23 +02:00
Adriane Boyd
d777d9cc38 Extend v2.3 migration guide (#5653)
* Extend preloaded vocab section

* Add section on tag maps
2020-06-26 14:13:01 +02:00
Adriane Boyd
c4d0209472
Extend v2.3 migration guide (#5653)
* Extend preloaded vocab section

* Add section on tag maps
2020-06-26 14:12:29 +02:00
Adriane Boyd
a2660bd9c6 Fix backslashes in warnings config diff (#5640)
Fix backslashes in warnings config diff in v2.3 migration section.
2020-06-24 10:26:57 +02:00
Adriane Boyd
fd4287c178
Fix backslashes in warnings config diff (#5640)
Fix backslashes in warnings config diff in v2.3 migration section.
2020-06-24 10:26:12 +02:00
Adriane Boyd
4f73ced914 Extend what's new in v2.3 with vocab / is_oov (#5635) 2020-06-23 16:50:43 +02:00
Adriane Boyd
7ce451c211
Extend what's new in v2.3 with vocab / is_oov (#5635) 2020-06-23 16:48:59 +02:00
Adriane Boyd
fcdecefacf Add warnings example in v2.3 migration guide (#5627) 2020-06-22 14:38:06 +02:00
Adriane Boyd
bc1cb30b21
Add warnings example in v2.3 migration guide (#5627) 2020-06-22 14:37:24 +02:00
Ines Montani
52728d8fa3 Merge branch 'develop' into master-tmp 2020-06-20 15:52:00 +02:00
Adriane Boyd
66889de166 Warning for sudachipy 0.4.5 (#5611) 2020-06-19 13:45:23 +02:00
Adriane Boyd
931d80de72
Warning for sudachipy 0.4.5 (#5611) 2020-06-19 12:43:41 +02:00
Ines Montani
959bc616dd Merge branch 'master' into spacy.io 2020-06-16 22:50:11 +02:00
Ines Montani
6d712f3e06
Merge pull request #5599 from adrianeboyd/docs/v2.3.0-minor 2020-06-16 13:49:25 -07:00
Adriane Boyd
02369f91d3 Fix spacy convert argument 2020-06-16 20:41:17 +02:00
Adriane Boyd
f0fd77648f Change example title to Dr.
Change example title to Dr. so the current model does exclude the title
in the initial example.
2020-06-16 20:36:21 +02:00
Adriane Boyd
a6abdfbc3c Fix numpy.zeros() dtype for Doc.from_array 2020-06-16 20:35:45 +02:00
Adriane Boyd
9aff317ca7 Update POS in tagging example 2020-06-16 20:26:57 +02:00
Adriane Boyd
457babfa0c Update alignment example for new gold.align 2020-06-16 20:22:03 +02:00
Ines Montani
19b9ea0436 Fix languages.json 2020-06-16 18:34:11 +02:00
Ines Montani
ed240458f6 Try and upgrade gatsby 2020-06-16 18:28:24 +02:00
Ines Montani
41003a5117 Update Binder version [ci skip] 2020-06-16 17:41:23 +02:00
Ines Montani
fd89f44c0c Update Binder URL [ci skip] 2020-06-16 17:34:26 +02:00
Ines Montani
44af53bdd9 Add pkuseg warnings and auto-format [ci skip] 2020-06-16 17:13:35 +02:00
Ines Montani
a9e5b840ee Fix typos and auto-format [ci skip] 2020-06-16 16:38:45 +02:00
Ines Montani
e9d3e177f0 Merge branch 'master' into v2.3.x 2020-06-16 16:31:38 +02:00
Ines Montani
bb54f54369 Fix model accuracy table [ci skip] 2020-06-16 16:10:12 +02:00
Adriane Boyd
d5110ffbf2
Documentation updates for v2.3.0 (#5593)
* Update website models for v2.3.0

* Add docs for Chinese word segmentation

* Tighten up Chinese docs section

* Merge branch 'master' into docs/v2.3.0 [ci skip]

* Merge branch 'master' into docs/v2.3.0 [ci skip]

* Auto-format and update version

* Update matcher.md

* Update languages and sorting

* Typo in landing page

* Infobox about token_match behavior

* Add meta and basic docs for Japanese

* POS -> TAG in models table

* Add info about lookups for normalization

* Updates to API docs for v2.3

* Update adding norm exceptions for adding languages

* Add --omit-extra-lookups to CLI API docs

* Add initial draft of "What's New in v2.3"

* Add new in v2.3 tags to Chinese and Japanese sections

* Add tokenizer to migration section

* Add new in v2.3 flags to init-model

* Typo

* More what's new in v2.3

Co-authored-by: Ines Montani <ines@ines.io>
2020-06-16 15:37:35 +02:00
Sofie Van Landeghem
c0f4a1e43b
train is from-config by default (#5575)
* verbose and tag_map options

* adding init_tok2vec option and only changing the tok2vec that is specified

* adding omit_extra_lookups and verifying textcat config

* wip

* pretrain bugfix

* add replace and resume options

* train_textcat fix

* raw text functionality

* improve UX when KeyError or when input data can't be parsed

* avoid unnecessary access to goldparse in TextCat pipe

* save performance information in nlp.meta

* add noise_level to config

* move nn_parser's defaults to config file

* multitask in config - doesn't work yet

* scorer offering both F and AUC options, need to be specified in config

* add textcat verification code from old train script

* small fixes to config files

* clean up

* set default config for ner/parser to allow create_pipe to work as before

* two more test fixes

* small fixes

* cleanup

* fix NER pickling + additional unit test

* create_pipe as before
2020-06-12 02:02:07 +02:00
Martino Mensio
de00f967ce
adding spacy-universal-sentence-encoder (#5534)
* adding spacy-universal-sentence-encoder

* update affiliation

* updated code example
2020-06-08 20:26:30 +02:00
Sofie Van Landeghem
4d1ba6feb4
add tag variant for 2.3 (#5542) 2020-06-04 19:16:33 +02:00
Ines Montani
810fce3bb1 Merge branch 'develop' into master-tmp 2020-06-03 14:36:59 +02:00
svlandeg
5f0a91cf37 fix conv-depth parameter 2020-05-29 09:56:29 +02:00
Rajat
8b8efa1b42
update spacy universe with my project (#5497)
* added contextualSpellCheck in spacy universe meta

* removed extra formatting by code

* updated with permanent links

* run json linter used by spacy

* filled SCA

* updated the description
2020-05-25 11:30:23 +02:00
Ines Montani
262d306eaa unicode -> str consistency 2020-05-24 17:23:00 +02:00
Ines Montani
5d3806e059 unicode -> str consistency 2020-05-24 17:20:58 +02:00
Sofie Van Landeghem
ae1c179f3a
Remove the nested quote 2020-05-23 17:58:19 +02:00
Jannis
aa53ce6996
Documentation Typo Fix (#5492)
* Fix typo

Change 'realize' to 'realise'

* Add contributor agreement
2020-05-22 19:50:26 +02:00
Matthew Honnibal
f6078d866a
Merge pull request #5121 from adrianeboyd/bugfix/revert-token-match
Revert token_match priority changes from #4374 and extend token match options
2020-05-22 14:42:51 +02:00
Ines Montani
65c7e82de2 Auto-format and remove 2.3 feature [ci skip] 2020-05-22 13:50:30 +02:00
Adriane Boyd
e4a1b5dab1 Rename to url_match
Rename to `url_match` and update docs.
2020-05-22 12:41:03 +02:00
Adriane Boyd
730fa493a4 Merge remote-tracking branch 'upstream/master' into bugfix/revert-token-match 2020-05-22 12:18:00 +02:00
Ines Montani
ee027de032 Update universe and display of videos [ci skip] 2020-05-21 21:54:23 +02:00
Ines Montani
53da6bd672 Add course to landing [ci skip] 2020-05-21 20:45:33 +02:00
Ines Montani
24f72c669c Merge branch 'develop' into master-tmp 2020-05-21 18:39:06 +02:00
Kevin Lu
c7c4cd5fe1
Changed pyate code example in universe.json 2020-05-20 09:11:32 -07:00
Kevin Lu
0a5b140235
Update universe.json 2020-05-19 20:12:21 -07:00
Sofie Van Landeghem
0d94737857
Feature toggle_pipes (#5378)
* make disable_pipes deprecated in favour of the new toggle_pipes

* rewrite disable_pipes statements

* update documentation

* remove bin/wiki_entity_linking folder

* one more fix

* remove deprecated link to documentation

* few more doc fixes

* add note about name change to the docs

* restore original disable_pipes

* small fixes

* fix typo

* fix error number to W096

* rename to select_pipes

* also make changes to the documentation

Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>
2020-05-18 22:27:10 +02:00
Ines Montani
f333c2a011
Merge pull request #5386 from svlandeg/fix/nel-docs 2020-05-10 12:00:09 +02:00
Travis Hoppe
d4cc18b746
Added author information for NLPre (#5414)
* Add author links for NLPre and update category

* Add contributor statement
2020-05-08 11:28:54 +02:00
adrianeboyd
4a15b559ba
Clarify Token.pos as UPOS (#5419) 2020-05-08 10:36:25 +02:00
adrianeboyd
a2345618f1
Fix Token API docs from #5375 (#5418) 2020-05-08 10:25:02 +02:00
Adriane Boyd
565e0eef73 Add tokenizer option for token match with affixes
To fix the slow tokenizer URL (#4374) and allow `token_match` to take
priority over prefixes and suffixes by default, introduce a new
tokenizer option for a token match pattern that's applied after prefixes
and suffixes but before infixes.
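
The ordering described here can be sketched with a simplified splitter. This is an illustration under assumptions, not spaCy's tokenizer: the function `split_chunk` and the toy regular expressions are hypothetical, and special cases are left out.

```python
import re

# Toy patterns, chosen only for this example.
PREFIX_RE = re.compile(r'^[("]')
SUFFIX_RE = re.compile(r'[)",.]$')
INFIX_RE = re.compile(r"[-/:]")
URL_MATCH = re.compile(r"^https?://\S+$").match


def split_chunk(chunk, token_match=None, url_match=None):
    """Split one whitespace-delimited chunk into tokens."""
    prefixes, suffixes = [], []
    while chunk:
        # token_match takes priority over prefixes and suffixes.
        if token_match and token_match(chunk):
            break
        m = PREFIX_RE.match(chunk)
        if m:
            prefixes.append(m.group())
            chunk = chunk[m.end():]
            continue
        m = SUFFIX_RE.search(chunk)
        if m:
            suffixes.append(m.group())
            chunk = chunk[:m.start()]
            continue
        break
    middle = []
    if chunk:
        # url_match is tried after affixes are stripped but before infixes.
        if (token_match and token_match(chunk)) or (url_match and url_match(chunk)):
            middle = [chunk]
        else:
            pos = 0
            for m in INFIX_RE.finditer(chunk):
                if m.start() > pos:
                    middle.append(chunk[pos:m.start()])
                middle.append(m.group())
                pos = m.end()
            if pos < len(chunk):
                middle.append(chunk[pos:])
    return prefixes + middle + suffixes[::-1]


print(split_chunk("(https://spacy.io/usage),", url_match=URL_MATCH))
# ['(', 'https://spacy.io/usage', ')', ',']
print(split_chunk("(https://spacy.io/usage),"))
# ['(', 'https', ':', '/', '/', 'spacy.io', '/', 'usage', ')', ',']
```

Because the URL pattern is only consulted once the surrounding punctuation has been removed, it does not need to anticipate trailing brackets or commas.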
2020-05-05 10:35:33 +02:00
Adriane Boyd
792c8af8cf Merge remote-tracking branch 'upstream/master' into bugfix/revert-token-match 2020-05-05 09:25:57 +02:00
svlandeg
ebaed7dcfa Few more updates to the EL documentation 2020-04-30 10:17:06 +02:00
adrianeboyd
bdff76dede
Various updates/additions to CLI scripts (#5362)
* `debug-data`: determine coverage of provided vectors

* `evaluate`: support `blank:lg` model to make it possible to just evaluate
tokenization

* `init-model`: add option to truncate vectors to N most frequent vectors
from word2vec file

* `train`:

  * if training on GPU, only run evaluation/timing on CPU in the first
    iteration

  * if training is aborted, exit with a non-0 exit status
2020-04-29 12:56:46 +02:00
Sofie Van Landeghem
cfdaf99b80
Fix passing of component configuration (#5374)
* add kwargs to to_disk methods in docs - otherwise crashes on 'exclude' argument

* add fix and test for Issue 5137
2020-04-29 12:56:17 +02:00
Ines Montani
63885c1836 Remove u string and auto-format [ci skip] 2020-04-29 12:54:57 +02:00
Sofie Van Landeghem
f67343295d
Update NEL examples and documentation (#5370)
* simplify creation of KB by skipping dim reduction

* small fixes to train EL example script

* add KB creation and NEL training example scripts to example section

* update descriptions of example scripts in the documentation

* moving wiki_entity_linking folder from bin to projects

* remove test for wiki NEL functionality that is being moved
2020-04-29 12:53:53 +02:00
adrianeboyd
a6e521cd79
Add is_sent_end token property (#5375)
Reconstruction of the original PR #4697 by @MiniLau.

Removes unused `SENT_END` symbol and `IS_SENT_END` from `Matcher` schema
because the Matcher is only going to be able to support `IS_SENT_START`.
2020-04-29 12:53:16 +02:00
Ines Montani
a77754120d
Merge pull request #5177 from nlptechbook/patch-5 2020-04-29 12:52:21 +02:00
Ines Montani
1cbb272a6b
Update website/meta/universe.json 2020-04-29 12:51:44 +02:00
Ines Montani
732629b0dd
Update website/meta/universe.json 2020-04-29 12:51:37 +02:00
adrianeboyd
90ce34db42
Add cuda101 and cuda102 options to setup (#5377)
* Add cuda101 and cuda102 options to setup

* Update cudaNNN options in docs
2020-04-29 12:51:12 +02:00
Louis Guitton
a27c4014f5
Add mlflow to spaCy universe (#5352)
* Add mlflow to universe

* Use mlflow black logo
2020-04-29 10:18:03 +02:00
adrianeboyd
792aa7b6ab
Remove references to textcat spans (#5360)
Remove references to unimplemented `TextCategorizer` span labels in
`GoldParse` and `Doc`.
2020-04-27 18:01:12 +02:00
adrianeboyd
90c754024f
Update nlp.vectors to nlp.vocab.vectors (#5357) 2020-04-27 10:53:05 +02:00
Mike
481574cbc8
[minor doc change] embedding vis. link is broken in website/docs/usage/examples.md (#5325)
* The embedding vis. link is broken

The first link seems to be reasonable for now unless someone has an updated embedding vis they want to share?

* contributor agreement

* Update Mlawrence95.md

* Update website/docs/usage/examples.md

Co-Authored-By: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2020-04-21 20:35:12 +02:00
Ines Montani
b919844fce
Tidy up and fix alignment of landing cards (#5317) 2020-04-20 20:33:13 +02:00
laszabine
fb73d4943a
Amend documentation to Language.evaluate (#5319)
* Specified usage of arguments to Language.evaluate

* Created contributor agreement
2020-04-16 20:00:18 +02:00
Sébastien Harinck
688a328668 docs(website): fix issue on example in spacy-lookup 2020-04-15 16:47:29 +02:00
Thomas Thiebaud
1eef60c658
Add spacy_fastlang to universe (#5271)
* Add spacy_fastlang to universe

* Sign SCA
2020-04-15 13:50:46 +02:00
Sofie Van Landeghem
a3965ec13d
tag-map-path since 2.2.4 instead of 2.2.3 (#5289) 2020-04-14 14:53:47 +02:00
Marek Grzenkowicz
6a8a52650f
[Closes #5292] Fix typo in option name "--n-save_every" (#5293)
* Sign contributor agreement for chopeen

* Fix typo in option name and close #5292
2020-04-11 23:35:01 +02:00
Sofie Van Landeghem
7ad0fcf01d
fix json (#5267) 2020-04-08 12:58:09 +02:00
vincent d warmerdam
f329d5663a
add "whatlies" to spaCy universe (#5252)
* Add "whatlies"

We're releasing it on our side officially on the 16th of April. If possible, let's announce around the same time :)

* sign contributor thing

* Added fancy gif

as the image

* Update universe.json

Spelling error and spaCy clarification.
2020-04-06 11:29:30 +02:00
nlptechbook
ddf3c2430d
Update universe.json 2020-04-03 12:10:03 -04:00
Sofie Van Landeghem
1137420840
Small doc fixes (#5250)
* fix link

* torchtext instead of tochtext
2020-04-03 13:01:43 +02:00
Nikhil Saldanha
d1ddfa1cb7 update docs for EntityRecognizer.predict
return type was wrongly written as a tuple, changed to syntax.StateClass
2020-03-28 18:13:02 +01:00
Sofie Van Landeghem
9b412516e7
Fixing pickling of the parser (#5218)
* fix __reduce__ for pickling parser

* setting the move object as 'state' during pickling

* unskip test_issue4725 - works again
2020-03-27 19:35:26 +01:00