Commit Graph

14413 Commits

Author SHA1 Message Date
Sofie Van Landeghem
59c2069eb1
Legacy docs (#7601)
* document legacy Tok2Vec architectures

* add TextCatEnsemble.v1 legacy documentation

* Separate legacy section in side bar
2021-03-30 12:43:14 +02:00
Adriane Boyd
348d1829c7
Preserve user data for DependencyMatcher on spans (#7528)
* Preserve user data for DependencyMatcher on spans

* Clean underscore in test

* Modify test to use extensions stored in user data
2021-03-30 12:26:22 +02:00
m0canu1
921feee092
Added more exceptions for the Italian language from https://forum.wordr… (#7246)
* Added more exceptions for the Italian language from https://forum.wordreference.com/threads/le-abbreviazioni-nella-lingua-italiana-abbreviations-in-italian.2464189/

* Remove unnecessary exception

Co-authored-by: Alexandru Mocanu <alexandru.mocanu@augeos.it>
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2021-03-30 10:23:32 +02:00
Adriane Boyd
27a48f2802
Fix/update extension copying in Span.as_doc and Doc.from_docs (#7574)
* Adjust custom extension data when copying user data in `Span.as_doc()`
* Restrict `Doc.from_docs()` to adjusting offsets for custom extension
data
  * Update test to use extension
  * (Duplicate bug fix for character offset from #7497)
2021-03-30 09:49:12 +02:00
Santiago Castro
af07fc3bc1
Add support for CUDA 11.2 (#7583)
* Add support for CUDA 11.2

* Update the docs

* Format

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2021-03-30 09:47:33 +02:00
Álvaro Abella Bascarán
5b4dde38a3
fix fn name: tokenizer.infixes_finditer -> tokenizer.infix_finditer (#7606) 2021-03-30 09:45:49 +02:00
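For context on the rename above: the tokenizer attribute is `infix_finditer` (singular), and it is typically set to the `finditer` method of a compiled regular expression. The sketch below uses only the standard library to show what such a callable returns. The pattern is a hypothetical example, not spaCy's default infix rules, and the assignment to a real tokenizer appears only in a comment.

```python
import re

# Hypothetical infix pattern for illustration: treat hyphens and tildes as infixes.
infix_re = re.compile(r"[-~]")

# The attribute name is `infix_finditer`; it holds a callable that behaves
# like a compiled pattern's `finditer` method.
infix_finditer = infix_re.finditer

# With a loaded pipeline `nlp` (assumed, not created here), this would be
# assigned as:
#     nlp.tokenizer.infix_finditer = infix_re.finditer
spans = [match.span() for match in infix_finditer("mother-in-law")]
print(spans)
```

Each match marks a position where a token could be split, so a misspelled attribute name would leave the tokenizer's real infix handling unchanged.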
Adriane Boyd
3ae8661085
Fix tensor retokenization for non-numpy ops (#7527)
Implement manual `append` and `delete` for non-numpy ops.
2021-03-29 22:34:48 +11:00
Adriane Boyd
139f655f34
Merge doc.spans in Doc.from_docs() (#7497)
Merge data from `doc.spans` in `Doc.from_docs()`.

* Fix internal character offset set when merging empty docs (only
affects tokens and spans in `user_data` if an empty doc is in the list
of docs)
2021-03-29 22:34:01 +11:00
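To illustrate the kind of character-offset bookkeeping the commit above describes, here is a simplified sketch in plain Python. It is not spaCy's implementation: the function name, the `(text, spans)` input shape, and the single-space separator are all assumptions made for the example. The point it shows is that an empty doc contributes no text and so must not shift the offsets of the docs that follow it.

```python
def merge_texts_with_spans(docs):
    """Concatenate texts and shift character spans into the merged text.

    `docs` is a list of (text, [(start_char, end_char), ...]) pairs.
    This input shape is hypothetical and chosen only for illustration.
    """
    merged_text = ""
    merged_spans = []
    for text, spans in docs:
        if not text:
            # An empty doc adds no characters, so the running offset
            # must stay where it is.
            continue
        if merged_text:
            merged_text += " "  # assumed separator between docs
        offset = len(merged_text)
        merged_spans.extend((start + offset, end + offset) for start, end in spans)
        merged_text += text
    return merged_text, merged_spans


docs = [
    ("Hello world", [(6, 11)]),
    ("", []),
    ("Good morning", [(0, 4)]),
]
merged_text, merged_spans = merge_texts_with_spans(docs)
covered = [merged_text[start:end] for start, end in merged_spans]
print(merged_text, merged_spans, covered)
```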
Adriane Boyd
d59f968d08
Keep sent starts without parse in retokenization (#7424)
In the retokenizer, only reset sent starts (with
`set_children_from_head`) if the doc is parsed. If there is no parse,
merged tokens have the unset `token.is_sent_start == None` by default after
retokenization.
2021-03-29 22:32:00 +11:00
Paul O'Leary McCann
faed54d659
Merge pull request #7537 from polm/docs/patience-negative
Remove mention of -1 for early stopping (fix #7535)
2021-03-26 21:11:53 +09:00
Paul O'Leary McCann
cdab341a75 Remove mention of -1 for early stopping (fix #7535)
Maybe this used to work differently, but currently a negative patience
just causes immediate termination.
2021-03-23 11:50:35 +09:00
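The behaviour described in the commit above can be reproduced with a minimal early-stopping loop. This is a sketch, assuming the stopping check has the form `patience and (step - best_step) >= patience`; the function name and score lists are hypothetical and the real training loop may differ in detail.

```python
def step_at_stop(scores, patience):
    """Return the step at which training would stop, or None if it never stops early."""
    best_score = float("-inf")
    best_step = 0
    for step, score in enumerate(scores):
        if score > best_score:
            best_score, best_step = score, step
        # A patience of 0 is falsy and disables early stopping in this sketch.
        # A negative patience is truthy, and `step - best_step` is never
        # negative, so the condition holds on the very first step.
        if patience and (step - best_step) >= patience:
            return step
    return None


scores = [0.1, 0.5, 0.4, 0.3, 0.6]
print(step_at_stop(scores, 2))   # stops after two steps without improvement
print(step_at_stop(scores, 0))   # early stopping disabled
print(step_at_stop(scores, -1))  # terminates immediately
```

Under this assumption, 0 is the value that disables early stopping, which is consistent with removing the mention of -1 from the docs.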
Ines Montani
4bd3d01aaf
Merge pull request #7471 from polm/fix/listener-warnings 2021-03-22 12:45:02 +01:00
Ines Montani
d545ab4ca4
Merge pull request #7495 from adrianeboyd/bugfix/norm-ux
Update lexeme_norm checks
2021-03-22 12:44:52 +01:00
Ines Montani
be55f43163
Merge pull request #7473 from adrianeboyd/docs/v3-pipeline-deps-order 2021-03-22 12:43:07 +01:00
Ines Montani
3ee2fcfba0
Merge pull request #7483 from adrianeboyd/docs/various-v3-4 [ci skip] 2021-03-22 12:37:06 +01:00
Ines Montani
88e5a0dc16
Merge pull request #7504 from polm/fix/lexeme-docs [ci skip]
Fix mismatched backtick in Lexeme docs
2021-03-22 12:36:44 +01:00
Ines Montani
66ebd5c69e
Merge pull request #7491 from adrianeboyd/bugfix/corpus-depr-props
Update deprecated doc.is_sentenced in Corpus
2021-03-21 02:17:24 +01:00
Ines Montani
e3c3dbdb15
Merge pull request #7492 from adrianeboyd/bugfix/ux-matcher-attributes
Update matcher errors and docs
2021-03-21 02:17:13 +01:00
Adriane Boyd
0d2b723e8d Update entity setting section 2021-03-20 11:38:55 +01:00
Paul O'Leary McCann
e39c0dcf33 Fix mismatched backtick in Lexeme docs 2021-03-20 18:40:00 +09:00
Adriane Boyd
39153ef90f Update lexeme_norm checks
* Add util method for check
* Add new languages to list with lexeme norm tables
* Add check to all relevant components
* Add config details to warning message

Note that we're not actually inspecting the model config to see if
`NORM` is used as an attribute, so it may warn in cases where it's not
relevant.
2021-03-19 10:59:27 +01:00
Adriane Boyd
c771ec22f0 Update matcher errors and docs
* Mention `tagger+attribute_ruler` in `POS`/`MORPH` error messages for
`Matcher` and `PhraseMatcher`
* Document `Matcher.__call__(allow_missing=)`
2021-03-19 10:11:18 +01:00
Adriane Boyd
48b90c8e1c Update deprecated doc.is_sentenced in Corpus 2021-03-19 09:43:52 +01:00
Adriane Boyd
6a9a467766
Update website/docs/usage/processing-pipelines.md
Co-authored-by: Ines Montani <ines@ines.io>
2021-03-19 08:12:49 +01:00
Ines Montani
34e13c1161
Merge pull request #7472 from erre-quadro/universe/spikex
Add SpikeX to spaCy universe
2021-03-19 02:08:36 +01:00
Ines Montani
4f9aaa2366
Merge pull request #7451 from adrianeboyd/chore/add-py.typed
Add py.typed
2021-03-19 02:08:16 +01:00
Ines Montani
66b900a76d
Merge pull request #7440 from adrianeboyd/bugfix/ru-pymorph2-lookup-lemmatize
Rename and update Russian pymorphy2 lookup lemmatize
2021-03-19 01:54:08 +01:00
Ines Montani
2c6fa8c890
Merge pull request #7489 from adrianeboyd/bugfix/callbacks-entry-points
Check for callbacks entry points
2021-03-19 01:53:53 +01:00
Ines Montani
b878bc74b9
Merge pull request #7488 from Findus23/no-is-not
replace "is not" with !=
2021-03-19 01:53:38 +01:00
Adriane Boyd
0ad9e16ec3 Check for callbacks entry points 2021-03-18 21:18:25 +01:00
Lukas Winkler
3c362ac520
replace "is not" with != 2021-03-18 21:09:11 +01:00
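The commit above does not show the changed line, so the following is only a general illustration of why `!=` is usually the right operator when comparing values in Python: `is not` compares object identity, while `!=` compares contents. The variable names are hypothetical.

```python
expected = [1, 2, 3]
actual = list(range(1, 4))  # equal contents, but a separately created object

values_differ = actual != expected       # compares contents
objects_differ = actual is not expected  # compares identity

print(values_differ, objects_differ)
```

Here the two lists hold the same values, so `!=` reports no difference, while `is not` reports that they are distinct objects. Identity checks remain appropriate for singletons such as `None`.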
Adriane Boyd
6354b642c5
Fix typo 2021-03-18 19:01:10 +01:00
Adriane Boyd
40e5d3a980 Update saving/loading example 2021-03-18 16:56:10 +01:00
Adriane Boyd
0fb1881f36 Reformat processing pipelines 2021-03-18 13:31:42 +01:00
Adriane Boyd
acc58719da Update custom similarity hooks example 2021-03-18 13:31:42 +01:00
Adriane Boyd
c9e1a9ac17 Add multiprocessing section 2021-03-18 13:31:42 +01:00
Adriane Boyd
9a254d3995 Include all en_core_web_sm components in examples 2021-03-18 13:31:42 +01:00
Adriane Boyd
83c1b919a7 Fix positional/option in CLI types 2021-03-18 13:31:42 +01:00
Adriane Boyd
9fd41d6742 Remove Language.pipe cleanup arg 2021-03-18 13:31:42 +01:00
Paul O'Leary McCann
40bc01e668 Proactively remove unused listeners
With this, the changes in initialize.py might be unnecessary.

Requires testing.
2021-03-17 22:41:41 +09:00
Adriane Boyd
5da323fd86
Minor edits 2021-03-17 12:59:05 +01:00
Adriane Boyd
a5ffe8dfed Add details about pretrained pipeline design 2021-03-17 11:31:26 +01:00
Paul O'Leary McCann
ef77c88638 Don't warn about components not in the pipeline
See here:

https://github.com/explosion/spaCy/discussions/7463

Still need to check if there are any side effects of listeners being
present but not in the pipeline, but this commit will silence the
warnings.
2021-03-17 14:56:04 +09:00
Paolo Arduin
00e59be966 Add SpikeX to spaCy universe 2021-03-16 18:22:03 +01:00
Adriane Boyd
02b5c8a1a2 Add py.typed 2021-03-16 09:48:31 +01:00
Adriane Boyd
3bcf74aca7 Rename and update ru pymorphy2 lookup lemmatize
* To allow default lookup lemmatization with a blank Russian model,
rename pymorphy2 lookup mode to `pymorphy2_lookup`

* Bug fix: update pymorphy2 lookup lemmatize to return list rather than
string
2021-03-15 11:11:06 +01:00
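The second bullet above concerns the return type of the lookup lemmatizer. A short sketch shows why a list matters to callers that index or iterate over the result. The function name and lookup table are hypothetical and do not use the pymorphy2 API.

```python
def lookup_lemmatize(word, table):
    """Return candidate lemmas as a list, even when there is only one."""
    return [table.get(word, word)]


table = {"кошки": "кошка"}  # hypothetical lookup table
lemmas = lookup_lemmatize("кошки", table)
first_lemma = lemmas[0]

# If the function returned a bare string instead, the same indexing
# would yield only the first character of the lemma.
first_char_of_string = "кошка"[0]

print(lemmas, first_lemma, first_char_of_string)
```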
bsweileh
61472e7cb3
Update _training.md - Fix broken link on backpropagation (#7431)
* Update _training.md

Fix broken link on backpropagation

* Add agreement

add spacy contributor agreement
2021-03-15 09:21:35 +01:00
Ines Montani
be44257cab
Merge pull request #7418 from adrianeboyd/docs/examples-readme
Add examples README
2021-03-13 04:28:07 +01:00
Ines Montani
c67d5a6eb0
Merge pull request #7394 from adrianeboyd/docs/ner-example-data-readme 2021-03-13 04:26:18 +01:00
Ines Montani
068b97a617
Merge pull request #7408 from adrianeboyd/bugfix/load-keyword-only 2021-03-13 04:25:50 +01:00