spaCy

mirror of https://github.com/explosion/spaCy.git synced 2026-01-11 03:01:25 +03:00

Author	SHA1	Message	Date
Adriane Boyd	f55b876326	Merge pull request #10387 from adrianeboyd/chore/v3.0.8 Set version to v3.0.8	2022-02-28 12:53:53 +01:00
Adriane Boyd	ebcc7d830f	Update slow readers test to use textcat_multilabel (#9300)	2022-02-28 11:22:06 +01:00
Adriane Boyd	694c318f4f	Address random results in slow readers tests (#9544) * Set random seed for dataset shuffling * Use more dev examples for non-zero scores	2022-02-28 11:19:43 +01:00
Ines Montani	308b1706a7	Allow conftest.py to run twice for build envs	2022-02-28 09:22:34 +01:00
Adriane Boyd	3420506954	Set version to v3.0.8	2022-02-28 09:02:03 +01:00
Adriane Boyd	f71de10405	Merge pull request #10346 from adrianeboyd/chore/v3.0-backport-10324 Fix Tok2Vec for empty batches (#10324)	2022-02-21 16:41:13 +01:00
Adriane Boyd	5caccbd19e	Switch to latest CI images (#9773)	2022-02-21 15:02:52 +01:00
Daniël de Kok	6a4a00c447	Pin mypy to 0.910 until there is a compatible pydantic version	2022-02-21 15:01:36 +01:00
Adriane Boyd	749631ad28	Fix Tok2Vec for empty batches (#10324) * Add test for tok2vec with vectors and empty docs * Add shortcut for empty batch in Tok2Vec.predict * Avoid types	2022-02-21 14:33:16 +01:00
Adriane Boyd	034ac0acf4	Merge pull request #8787 from adrianeboyd/chore/backport-v3.0.7 Backport bug fixes to v3.0.x	2021-07-21 16:53:50 +02:00
Adriane Boyd	02e18926c3	Revert "Backport bugfixes from v3.1.0 to v3.0 (#8739)" (#8786) This reverts commit `f94168a41e`.	2021-07-21 15:32:37 +02:00
Adriane Boyd	f94168a41e	Backport bugfixes from v3.1.0 to v3.0 (#8739) * Fix scoring normalization (#7629) * fix scoring normalization * score weights by total sum instead of per component * cleanup * more cleanup * Use a context manager when reading model (fix #7036) (#8244) * Fix other open calls without context managers (#8245) * Don't add duplicate patterns all the time in EntityRuler (fix #8216) (#8246) * Don't add duplicate patterns (fix #8216) * Refactor EntityRuler init This simplifies the EntityRuler init code. This is helpful as prep for allowing the EntityRuler to reset itself. * Make EntityRuler.clear reset matchers Includes a new test for this. * Tidy PhraseMatcher instantiation Since the attr can be None safely now, the guard if is no longer required here. Also renamed the `_validate` attr. Maybe it's not needed? * Fix NER test * Add test to make sure patterns aren't increasing * Move test to regression tests * Exclude generated .cpp files from package (#8271) * Fix non-deterministic deduplication in Greek lemmatizer (#8421) * Fix setting empty entities in Example.from_dict (#8426) * Filter W036 for entity ruler, etc. (#8424) * Preserve paths.vectors/initialize.vectors setting in quickstart template * Various fixes for spans in Docs.from_docs (#8487) * Fix spans offsets if a doc ends in a single space and no space is inserted * Also include spans key in merged doc for empty spans lists * Fix duplicate spacy package CLI opts (#8551) Use `-c` for `--code` and not additionally for `--create-meta`, in line with the docs. * Raise an error for textcat with <2 labels (#8584) * Raise an error for textcat with <2 labels Raise an error if initializing a `textcat` component without at least two labels. * Add similar note to docs * Update positive_label description in API docs * Add Macedonian models to website (#8637) * Fix Azerbaijani init, extend lang init tests (#8656) * Extend langs in initialize tests * Fix az init * Fix ru/uk lemmatizer mp with spawn (#8657) Use an instance variable instead of a class variable for the morphological analyzer so that multiprocessing with spawn is possible. * Use 0-vector for OOV lexemes (#8639) * Set version to v3.0.7 Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>	2021-07-19 09:20:40 +02:00
Adriane Boyd	0080454140	Set version to v3.0.7	2021-07-16 16:38:15 +02:00
Adriane Boyd	6db938959d	Use 0-vector for OOV lexemes (#8639)	2021-07-16 15:48:47 +02:00
Adriane Boyd	99a3f26d7f	Fix ru/uk lemmatizer mp with spawn (#8657) Use an instance variable instead of a class variable for the morphological analyzer so that multiprocessing with spawn is possible.	2021-07-16 15:48:47 +02:00
Adriane Boyd	c62566ffce	Fix Azerbaijani init, extend lang init tests (#8656) * Extend langs in initialize tests * Fix az init	2021-07-16 15:48:47 +02:00
Adriane Boyd	066718b1dc	Add Macedonian models to website (#8637)	2021-07-16 15:48:47 +02:00
Adriane Boyd	81e71a61f8	Raise an error for textcat with <2 labels (#8584) * Raise an error for textcat with <2 labels Raise an error if initializing a `textcat` component without at least two labels. * Add similar note to docs * Update positive_label description in API docs	2021-07-16 15:48:42 +02:00
Adriane Boyd	6aa3fede76	Fix duplicate spacy package CLI opts (#8551) Use `-c` for `--code` and not additionally for `--create-meta`, in line with the docs.	2021-07-16 15:48:19 +02:00
Adriane Boyd	71396273a5	Various fixes for spans in Docs.from_docs (#8487) * Fix spans offsets if a doc ends in a single space and no space is inserted * Also include spans key in merged doc for empty spans lists	2021-07-16 15:48:19 +02:00
Adriane Boyd	e51fff5432	Preserve paths.vectors/initialize.vectors setting in quickstart template	2021-07-16 15:48:19 +02:00
Adriane Boyd	c78eb28dfa	Filter W036 for entity ruler, etc. (#8424)	2021-07-16 15:48:19 +02:00
Adriane Boyd	e3f1d4a7d0	Fix setting empty entities in Example.from_dict (#8426)	2021-07-16 15:48:19 +02:00
Adriane Boyd	81515b4690	Fix non-deterministic deduplication in Greek lemmatizer (#8421)	2021-07-16 15:48:19 +02:00
Adriane Boyd	8b9355d758	Exclude generated .cpp files from package (#8271)	2021-07-16 15:47:55 +02:00
Paul O'Leary McCann	ad026dc5fd	Don't add duplicate patterns all the time in EntityRuler (fix #8216) (#8246) * Don't add duplicate patterns (fix #8216) * Refactor EntityRuler init This simplifies the EntityRuler init code. This is helpful as prep for allowing the EntityRuler to reset itself. * Make EntityRuler.clear reset matchers Includes a new test for this. * Tidy PhraseMatcher instantiation Since the attr can be None safely now, the guard if is no longer required here. Also renamed the `_validate` attr. Maybe it's not needed? * Fix NER test * Add test to make sure patterns aren't increasing * Move test to regression tests	2021-07-16 15:47:55 +02:00
Paul O'Leary McCann	1db18732e0	Fix other open calls without context managers (#8245)	2021-07-16 15:47:55 +02:00
Paul O'Leary McCann	a834b03216	Use a context manager when reading model (fix #7036) (#8244)	2021-07-16 15:47:55 +02:00
Sofie Van Landeghem	55e5f8ede3	Fix scoring normalization (#7629) * fix scoring normalization * score weights by total sum instead of per component * cleanup * more cleanup	2021-07-16 15:47:55 +02:00
Adriane Boyd	bb97e7bf8a	Update validate CLI to fix compat and ignore warnings (#8423)	2021-07-14 23:28:08 +02:00
Adriane Boyd	480a3bf3be	Make JsonlReader path optional (#8396) To avoid config errors during training when `[corpora.pretrain.path]` is `None` with the default `spacy.JsonlCorpus.v1` reader, make the reader path optional, similar to `spacy.Corpus.v1`.	2021-06-15 14:55:15 +02:00
Paul O'Leary McCann	94e1346f44	Change span lemmas to use original whitespace (fix #8368) (#8391) * Change span lemmas to use original whitespace (fix #8368) This is a redo of #8371 based off master. The test for this required some changes to existing tests. I don't think the changes were significant but I'd like someone to check them. * Remove mystery docstring This sentence was uncompleted for years, and now we will never know how it ends.	2021-06-15 13:24:54 +02:00
Paul O'Leary McCann	2c105cdbce	Raise error if deps not provided with heads (#8335) * Fill in deps if not provided with heads Before this change, if heads were passed without deps they would be silently ignored, which could be confusing. See #8334. * Use "dep" instead of a blank string This is the customary placeholder dep. It might be better to show an error here instead though. * Throw error on heads without deps * Add a test * Fix tests * Formatting * Fix all tests * Fix a test I missed * Revise error message * Clean up whitespace Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2021-06-15 13:23:32 +02:00
Sofie Van Landeghem	0fd0d949c4	fix 's typo's across code base (#8384)	2021-06-15 10:57:08 +02:00
Adriane Boyd	507422149f	Various docs updates for v3.0 (#8353) * Update cats score names in Scorer API docs * Refer to performance in meta * Update package naming/versions, lemmatizer details * Minor formatting fixes * Provide more explanation for cats_score_desc * Provide language-specific lemmatizer defaults in API docs Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>	2021-06-14 12:19:36 +02:00
Sofie Van Landeghem	8729307e67	register extract_ngrams layer (#8358) * register extract_ngrams layer * fix import * bump spacy-legacy to 3.0.6 * revert bump (wrong PR)	2021-06-14 10:30:30 +02:00
Ines Montani	3259faad42	Update YouTube embed [ci skip]	2021-06-14 10:21:01 +10:00
Ines Montani	7f0f674a1b	Fix universe.json and auto-format [ci skip]	2021-06-14 10:18:06 +10:00
Adriane Boyd	f4008bdb13	Restrict pymorphy2 requirement to pymorphy2 mode (#8299) For the Russian and Ukrainian lemmatizers, restrict the `pymorphy2` requirement to the mode `pymorphy2` so that lookup or other lemmatizer modes can be loaded without installing `pymorphy2`.	2021-06-11 10:19:22 +02:00
Francisco Aranda	0a1a4c665d	update spacy-wordnet code example (#8327) * update spacy-wordnet code example - include spaCy 2.x and 3.x init alternatives - upgrade recognai logo * fix escape chars	2021-06-10 21:53:11 +02:00
Adriane Boyd	6d2789452e	Restrict cython to <3.0 (#8337)	2021-06-10 11:03:30 +02:00
Adriane Boyd	d52ab13b5f	Update CI: update ubuntu image, add download test (#8298) * Update CI: update ubuntu image, add download test * Switch instances to `ubuntu-18.04` * Add model download test, currently only for one job with python 3.8 * Fix variable name * Set variables explicitly	2021-06-07 14:46:07 +02:00
graue70	f34dd0b98f	Fix typos in comments (#8279)	2021-06-07 10:43:54 +02:00
Jean-Hugues Roy	ff5cf3606c	Improvements to French stopwords list (#7941) * "y" etc. Many changes described in pull request * Update spacy/lang/fr/stop_words.py * Update spacy/lang/fr/stop_words.py Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>	2021-06-02 11:50:49 +02:00
Vito De Tullio	3672464e25	applying suggestion to avoid mypy errors (#8265) * applying suggestion to avoid mypy errors * sign contributor agreement	2021-06-02 19:25:30 +10:00
Adriane Boyd	4aa1a7d5a3	Remove unsupported attrs from attrs.IDS (#8132) The attributes `PROB`, `CLUSTER` and `SENT_END` are not supported by `Lexeme.get_struct_attr` so should not be included through `attrs.IDS` as supported attributes in `Doc.to_array` and other methods.	2021-06-02 19:16:57 +10:00
Paul O'Leary McCann	5aba213349	Fix skweak Github URL Github entry should not contain url, just user/repo	2021-05-31 18:00:43 +09:00
Kristian Boda	dc8d8d15d2	Add hmrb to spaCy Universe (#8129) * docs: add hmrb to spacy universe * docs: add sentence on spacy versions * docs: update description and images * misc: add spaCy Contributor Agreement	2021-05-31 18:40:48 +10:00
Dhruv Naik	283f64a98d	Fix bug from Entityruler: ent_ids returns None for phrases (#8169) * bugfix for explosion/spaCy#8168 * add test for explosion/spaCy#8168	2021-05-31 18:38:53 +10:00
Michael K	b0467d2972	Add project urls to package metadata (#7728) This adds the links to PyPI. To see that in action check out https://pypi.org/project/Django/ (source code: `b8c9e9fae1/setup.cfg (L27-L32)`)	2021-05-31 18:38:29 +10:00

14569 Commits