Ryn Daniels
057b8c64c0
Check for assets with size of 0 bytes (#10026)
* Check for assets with size of 0 bytes
* Update spacy/cli/project/assets.py
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2022-01-12 10:34:23 +01:00
Sofie Van Landeghem
5ba4171b19
Update LICENSE to include 2022 [ci skip]
2022-01-07 09:24:07 +01:00
Ines Montani
005e23a525
Merge pull request #9989 from explosion/docs/update-algolia-search-api [ci skip]
2022-01-05 14:14:42 +01:00
Ines Montani
a437ca6737
Update website to use new Algolia search API
2022-01-05 13:21:06 +01:00
Sofie Van Landeghem
067a44a417
Merge pull request #9987 from explosion/master
Update develop with commits from master
2022-01-05 11:49:50 +01:00
Lj Miranda
00e7bf5ffd
Add a few docs to the default_config.cfg (#9981)
* Clarify patience hyperparameter
The current value for patience doesn't seem to indicate that it's
pointing to the number of steps. It may be useful to specify that
explicitly.
Ref: https://github.com/explosion/spaCy/discussions/7450
Ref: https://github.com/explosion/spaCy/discussions/7465
* Update docs for max_steps
2022-01-05 09:16:40 +01:00
Duygu Altinok
55cf492218
Feat/debug data warn spread ents (#9960)
* added check for crossing boundaries
* formatted blacked
* Rephrasing slightly
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2022-01-04 18:22:10 +01:00
Sofie Van Landeghem
56dcb39fb7
Fix references to config file in the docs & UX (#9961)
* doc fixes around config file
* fix typo
* clarify default
2022-01-04 14:31:26 +01:00
Sofie Van Landeghem
029a48e340
fix type of lexeme.rank (#9979)
2022-01-04 13:15:25 +01:00
Sam Edwardes
6f65e2b544
Added spacypdfreader to universe.json (#9963)
2022-01-03 16:34:36 +09:00
Richard Hudson
cc21eac88a
Use \n rather than linesep for consistency with wasabi
2021-12-29 13:33:56 +01:00
Richard Hudson
85da92f041
Ignore Windows carriage return characters
2021-12-29 12:16:45 +01:00
Paul O'Leary McCann
f40e237c5a
Remove denomme from universe (#9952)
Package seems to have been deleted.
2021-12-29 11:41:29 +01:00
Richard Hudson
f7f9cc72e7
Fixed supports_ansi problem for Windows tests
2021-12-29 11:22:48 +01:00
Florian Cäsar
86e71e7b19
Fix Scorer.score_cats for missing labels (#9443)
* Fix Scorer.score_cats for missing labels
* Add test case for Scorer.score_cats missing labels
* semantic nitpick
* black formatting
* adjust test to give different results depending on multi_label setting
* fix loss function according to whether or not missing values are supported
* add note to docs
* small fixes
* make mypy happy
* Update spacy/pipeline/textcat.py
Co-authored-by: Florian Cäsar <florian.caesar@pm.me>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: svlandeg <svlandeg@github.com>
2021-12-29 11:04:39 +01:00
Richard Hudson
264ead3274
Removed incorrect automatically added import statement
2021-12-29 10:11:48 +01:00
Sofie Van Landeghem
b8106e0f95
Merge pull request #9951 from explosion/master
Update develop branch with master
2021-12-29 10:11:43 +01:00
Richard Hudson
8e55efcbd9
Check SUPPORTS_ANSI when rendering
2021-12-29 09:30:35 +01:00
Richard Hudson
08370604d3
Change order of imports
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-12-29 09:22:06 +01:00
Richard Hudson
678bc61086
Apply suggestions from code review
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-12-29 09:21:23 +01:00
Richard Hudson
e3e8495b41
Updated requirements.txt
2021-12-29 08:47:56 +01:00
Yoav Vollansky
9d63dfacfc
Update UNIVERSE.md (#9941)
typo
2021-12-27 13:46:04 +01:00
Peter Baumgartner
72abf9e102
MultiHashEmbed vector docs correction (#9918)
2021-12-27 11:18:08 +01:00
Richard Hudson
92943f8a23
Removed unused import
2021-12-23 17:47:56 +01:00
Richard Hudson
2cae470180
More type corrections
2021-12-23 17:35:47 +01:00
Richard Hudson
106fb53509
More type corrections
2021-12-23 17:24:28 +01:00
Richard Hudson
5c850b2ac3
Corrected types
2021-12-23 17:01:43 +01:00
Richard Hudson
e713aa0938
Add surrounding tokens functionality
2021-12-23 16:13:40 +01:00
Duygu Altinok
7ec1452f5f
added elided forms (#9878)
* added elided forms
* rearranged a bit
* rearranged a bit
* added stopword tests
* blacked tests file
2021-12-23 13:41:01 +01:00
Andrew Janco
3cfeb518ee
Handle "_" value for token pos in conllu data (#9903)
* change '_' to '' to allow Token.pos, when no value for token pos in conllu data
* Minor code style
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2021-12-21 15:46:33 +01:00
Adriane Boyd
837d241b68
Make floret murmurhash endian-neutral (#9735)
2021-12-20 17:11:31 +01:00
Adriane Boyd
1163073756
Remove outdated patterns MANIFEST.in (#9912)
2021-12-20 16:40:20 +01:00
Adriane Boyd
18e5638af0
Extend cupy to v10.x (#9911)
* Add extra for `cupy-cuda115`
2021-12-20 15:48:35 +01:00
Sofie Van Landeghem
7847839003
Merge pull request #9891 from explosion/master
Update develop with master
2021-12-17 14:01:27 +01:00
Daniël de Kok
93e9bf681f
Merge pull request #9873 from danieldk/temporarily-pin-mypy
Pin mypy to 0.910 until there is a compatible pydantic version
2021-12-16 10:28:31 +01:00
Daniël de Kok
b08f1ac17d
Pin mypy to 0.910 until there is a compatible pydantic version
2021-12-16 09:31:45 +01:00
Adriane Boyd
94fbd88521
Use dict.copy().items() instead of list(.items()) (#9868)
2021-12-16 09:17:33 +01:00
Edward
018827e9fd
Add healthsea to universe (#9838)
* Add healthsea to universe
* Update website/meta/universe.json
* Add thumbnail
* Update website/meta/universe.json
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-12-15 17:57:19 +01:00
antonpibm
ac45ae3779
Update Tokenizer documentation to reflect token_match and url_match signatures (#9859)
2021-12-15 09:34:33 +01:00
Ines Montani
ba0fa7a64e
Support Google Sheets embeds in docs (#9861)
2021-12-15 09:27:08 +01:00
Richard Hudson
ed788c5def
Add render_instances function
2021-12-08 19:24:32 +01:00
Richard Hudson
bd00611259
Add render_text
2021-12-08 17:47:29 +01:00
Richard Hudson
49f3fd39b9
Refactoring
2021-12-08 16:42:39 +01:00
Richard Hudson
183d535ef4
Add permitted values
2021-12-08 14:58:02 +01:00
Richard Hudson
9f7f234b0f
Added tabular view
2021-12-08 14:30:38 +01:00
Richard Hudson
e04950ef3c
Fixed problems with non-projective trees
2021-12-07 12:04:41 +01:00
Adriane Boyd
800737b416
Set version to v3.2.1 (#9823)
2021-12-07 10:51:45 +01:00
Haakon Meland Eriksen
251119455d
Remove NER words from stop words in Norwegian (#9820)
Default stop words in Norwegian bokmål (nb) in spaCy contain important entities, e.g. France, Germany, Russia, Sweden and USA, police district, important units of time, e.g. months and days of the week, and organisations.
Nobody expects their presence among the default stop words. There is a danger of users complying with the general recommendation of filtering out stop words, while being unaware of filtering out important entities from their data.
See explanation in https://github.com/explosion/spaCy/issues/3052#issuecomment-986756711 and comment https://github.com/explosion/spaCy/issues/3052#issuecomment-986951831
2021-12-07 09:45:10 +01:00
Adriane Boyd
51a3b60027
Document Tagger neg_prefix, fix typo (#9821)
2021-12-07 09:42:40 +01:00
Adriane Boyd
a0cdc2b007
Use Language.pipe in evaluate (#9800)
2021-12-06 20:39:15 +01:00