spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-08-05 12:50:20 +03:00

Author	SHA1	Message	Date
Adriane Boyd	4af02ac9e4	Set version to v3.0.9	2022-12-13 13:56:56 +01:00
Adriane Boyd	67c6ef2b2a	Increase tolerance for almost equal checks in textcat regression test	2022-12-13 13:56:56 +01:00
Adriane Boyd	c4af89f956	Clean up warnings in the test suite (#11331)	2022-12-12 17:27:00 +01:00
Adriane Boyd	d4acae856a	Update flake8 version in reqs and CI * Update some unneeded forward refs related to flake8 checks	2022-12-12 14:30:06 +01:00
Adriane Boyd	0f87720411	Rename test helper method with non-test_ name (#11701)	2022-12-12 14:02:50 +01:00
Adriane Boyd	c8009c2734	Cast to uint64 for all array-based doc representations (#11933) * Convert all individual values explicitly to uint64 for array-based doc representations * Temporarily test with latest numpy v1.24.0rc * Remove unnecessary conversion from attr_t * Reduce number of individual casts * Convert specifically from int32 to uint64 * Revert "Temporarily test with latest numpy v1.24.0rc" This reverts commit `eb0e3c5006`. * Also use int32 in tests	2022-12-12 14:02:50 +01:00
Paul O'Leary McCann	d4d4d69cb4	Config generation fails for GPU without transformers (#11899) If you don't have spacy-transformers installed, but try to use `init config` with the GPU flag, you'll get an error. The issue is that the `use_transformers` flag in the config is conflated with the GPU flag, and then there's an attempt to access transformers config info that may not exist. There may be a better way to do this, but this stops the error.	2022-12-12 14:02:50 +01:00
Paul O'Leary McCann	337ebda793	Add in errors used in the beam code that were removed at some point (#11935) I don't think there's any way to use the beam code at the moment, but as long as it's around the errors it refers to should also be present.	2022-12-12 14:02:50 +01:00
Adriane Boyd	5c975565dc	Add smart_open requirement, update deprecated options (#11864) * Switch from deprecated `ignore_ext` to `compression` * Add upload/download test for local files	2022-12-12 14:02:50 +01:00
Adriane Boyd	ebcc7d830f	Update slow readers test to use textcat_multilabel (#9300)	2022-02-28 11:22:06 +01:00
Adriane Boyd	694c318f4f	Address random results in slow readers tests (#9544) * Set random seed for dataset shuffling * Use more dev examples for non-zero scores	2022-02-28 11:19:43 +01:00
Ines Montani	308b1706a7	Allow conftest.py to run twice for build envs	2022-02-28 09:22:34 +01:00
Adriane Boyd	3420506954	Set version to v3.0.8	2022-02-28 09:02:03 +01:00
Adriane Boyd	749631ad28	Fix Tok2Vec for empty batches (#10324) * Add test for tok2vec with vectors and empty docs * Add shortcut for empty batch in Tok2Vec.predict * Avoid types	2022-02-21 14:33:16 +01:00
Adriane Boyd	0080454140	Set version to v3.0.7	2021-07-16 16:38:15 +02:00
Adriane Boyd	6db938959d	Use 0-vector for OOV lexemes (#8639)	2021-07-16 15:48:47 +02:00
Adriane Boyd	99a3f26d7f	Fix ru/uk lemmatizer mp with spawn (#8657) Use an instance variable instead of a class variable for the morphological analyzer so that multiprocessing with spawn is possible.	2021-07-16 15:48:47 +02:00
Adriane Boyd	c62566ffce	Fix Azerbaijani init, extend lang init tests (#8656) * Extend langs in initialize tests * Fix az init	2021-07-16 15:48:47 +02:00
Adriane Boyd	81e71a61f8	Raise an error for textcat with <2 labels (#8584) * Raise an error for textcat with <2 labels Raise an error if initializing a `textcat` component without at least two labels. * Add similar note to docs * Update positive_label description in API docs	2021-07-16 15:48:42 +02:00
Adriane Boyd	6aa3fede76	Fix duplicate spacy package CLI opts (#8551) Use `-c` for `--code` and not additionally for `--create-meta`, in line with the docs.	2021-07-16 15:48:19 +02:00
Adriane Boyd	71396273a5	Various fixes for spans in Docs.from_docs (#8487) * Fix spans offsets if a doc ends in a single space and no space is inserted * Also include spans key in merged doc for empty spans lists	2021-07-16 15:48:19 +02:00
Adriane Boyd	e51fff5432	Preserve paths.vectors/initialize.vectors setting in quickstart template	2021-07-16 15:48:19 +02:00
Adriane Boyd	c78eb28dfa	Filter W036 for entity ruler, etc. (#8424)	2021-07-16 15:48:19 +02:00
Adriane Boyd	e3f1d4a7d0	Fix setting empty entities in Example.from_dict (#8426)	2021-07-16 15:48:19 +02:00
Adriane Boyd	81515b4690	Fix non-deterministic deduplication in Greek lemmatizer (#8421)	2021-07-16 15:48:19 +02:00
Paul O'Leary McCann	ad026dc5fd	Don't add duplicate patterns all the time in EntityRuler (fix #8216) (#8246) * Don't add duplicate patterns (fix #8216) * Refactor EntityRuler init This simplifies the EntityRuler init code. This is helpful as prep for allowing the EntityRuler to reset itself. * Make EntityRuler.clear reset matchers Includes a new test for this. * Tidy PhraseMatcher instantiation Since the attr can be None safely now, the guard if is no longer required here. Also renamed the `_validate` attr. Maybe it's not needed? * Fix NER test * Add test to make sure patterns aren't increasing * Move test to regression tests	2021-07-16 15:47:55 +02:00
Paul O'Leary McCann	1db18732e0	Fix other open calls without context managers (#8245)	2021-07-16 15:47:55 +02:00
Paul O'Leary McCann	a834b03216	Use a context manager when reading model (fix #7036) (#8244)	2021-07-16 15:47:55 +02:00
Sofie Van Landeghem	55e5f8ede3	Fix scoring normalization (#7629) * fix scoring normalization * score weights by total sum instead of per component * cleanup * more cleanup	2021-07-16 15:47:55 +02:00
Adriane Boyd	bb97e7bf8a	Update validate CLI to fix compat and ignore warnings (#8423)	2021-07-14 23:28:08 +02:00
Adriane Boyd	480a3bf3be	Make JsonlReader path optional (#8396) To avoid config errors during training when `[corpora.pretrain.path]` is `None` with the default `spacy.JsonlCorpus.v1` reader, make the reader path optional, similar to `spacy.Corpus.v1`.	2021-06-15 14:55:15 +02:00
Paul O'Leary McCann	94e1346f44	Change span lemmas to use original whitespace (fix #8368) (#8391) * Change span lemmas to use original whitespace (fix #8368) This is a redo of #8371 based off master. The test for this required some changes to existing tests. I don't think the changes were significant but I'd like someone to check them. * Remove mystery docstring This sentence was uncompleted for years, and now we will never know how it ends.	2021-06-15 13:24:54 +02:00
Paul O'Leary McCann	2c105cdbce	Raise error if deps not provided with heads (#8335) * Fill in deps if not provided with heads Before this change, if heads were passed without deps they would be silently ignored, which could be confusing. See #8334. * Use "dep" instead of a blank string This is the customary placeholder dep. It might be better to show an error here instead though. * Throw error on heads without deps * Add a test * Fix tests * Formatting * Fix all tests * Fix a test I missed * Revise error message * Clean up whitespace Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2021-06-15 13:23:32 +02:00
Sofie Van Landeghem	0fd0d949c4	fix 's typo's across code base (#8384)	2021-06-15 10:57:08 +02:00
Sofie Van Landeghem	8729307e67	register extract_ngrams layer (#8358) * register extract_ngrams layer * fix import * bump spacy-legacy to 3.0.6 * revert bump (wrong PR)	2021-06-14 10:30:30 +02:00
Adriane Boyd	f4008bdb13	Restrict pymorphy2 requirement to pymorphy2 mode (#8299) For the Russian and Ukrainian lemmatizers, restrict the `pymorphy2` requirement to the mode `pymorphy2` so that lookup or other lemmatizer modes can be loaded without installing `pymorphy2`.	2021-06-11 10:19:22 +02:00
graue70	f34dd0b98f	Fix typos in comments (#8279)	2021-06-07 10:43:54 +02:00
Jean-Hugues Roy	ff5cf3606c	Improvements to French stopwords list (#7941) * "y" etc. Many changes described in pull request * Update spacy/lang/fr/stop_words.py * Update spacy/lang/fr/stop_words.py Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>	2021-06-02 11:50:49 +02:00
Vito De Tullio	3672464e25	applying suggestion to avoid mypy errors (#8265) * applying suggestion to avoid mypy errors * sign contributor agreement	2021-06-02 19:25:30 +10:00
Adriane Boyd	4aa1a7d5a3	Remove unsupported attrs from attrs.IDS (#8132) The attributes `PROB`, `CLUSTER` and `SENT_END` are not supported by `Lexeme.get_struct_attr` so should not be included through `attrs.IDS` as supported attributes in `Doc.to_array` and other methods.	2021-06-02 19:16:57 +10:00
Dhruv Naik	283f64a98d	Fix bug from Entityruler: ent_ids returns None for phrases (#8169) * bugfix for explosion/spaCy#8168 * add test for explosion/spaCy#8168	2021-05-31 18:38:53 +10:00
Narayan Acharya	6b79714080	Address missing config overrides post load of models (#8208)	2021-05-31 18:36:52 +10:00
Sofie Van Landeghem	fff662e41f	Ensemble textcat with listener (#8012) * add unit test for two listeners, with a textcat ensemble in the middle * return zero gradients instead of None in accumulate_gradient	2021-05-31 18:21:06 +10:00
Sofie Van Landeghem	ff91e6dac7	Show warning if entity_ruler runs without patterns (#7807) * Show warning if entity_ruler runs without patterns * Show warning if matcher runs without patterns * fix wording * unit test for warning once (WIP) * warn W036 only once * cleanup * create filter_warning helper	2021-05-31 18:20:27 +10:00
Paul O'Leary McCann	d1a221a374	Add all symbols in Unicode Currency Symbols block (#8212) * Add all symbols in Unicode Currency Symbols block In #8102 it came up that the rupee symbol was treated different from dollar / euro / yen symbols. This adds many symbols not already included. * Fix test * Fix training test	2021-05-31 18:03:40 +10:00
Ines Montani	5957ab74f7	Merge pull request #8112 from svlandeg/bugfix/replace-trf	2021-05-28 11:35:17 +10:00
Sofie Van Landeghem	3c58c0323f	fix docs (#8200)	2021-05-27 10:48:59 +02:00
Sofie Van Landeghem	290bd6ed39	ensure tolerance is properly passed on (#8158)	2021-05-27 18:10:28 +10:00
Sofie Van Landeghem	202943bc8c	KB & NEL to/from bytes (#8113) * unit test for pickling KB * add pickling test for NEL * KB to_bytes and from_bytes * NEL to_bytes and from_bytes * xfail pickle tests for now * fix docs * cleanup	2021-05-20 18:11:30 +10:00
Adriane Boyd	2c545c4c5b	Fix offsets in Span.get_lca_matrix (#8116) * Fix range in Span.get_lca_matrix Fix the adjusted token index / lca matrix index ranges for `_get_lca_matrix` for spans. * The range for `k` should correspond to the adjusted indices in `lca_matrix` with the `start` indexed at `0` * Update test for v3.x	2021-05-17 16:54:23 +02:00

8691 Commits