spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-07-13 01:32:32 +03:00

Author	SHA1	Message	Date
Sofie Van Landeghem	e796aab4b3	Resizable textcat (#7862 ) * implement textcat resizing for TextCatCNN * resizing textcat in-place * simplify code * ensure predictions for old textcat labels remain the same after resizing (WIP) * fix for softmax * store softmax as attr * fix ensemble weight copy and cleanup * restructure slightly * adjust documentation, update tests and quickstart templates to use latest versions * extend unit test slightly * revert unnecessary edits * fix typo * ensemble architecture won't be resizable for now * use resizable layer (WIP) * revert using resizable layer * resizable container while avoid shape inference trouble * cleanup * ensure model continues training after resizing * use fill_b parameter * use fill_defaults * resize_layer callback * format * bump thinc to 8.0.4 * bump spacy-legacy to 3.0.6	2021-06-16 11:45:00 +02:00
Adriane Boyd	5646fcbe46	Merge remote-tracking branch 'upstream/develop' into chore/develop-into-master-v3.1	2021-06-15 15:05:17 +02:00
graue70	f34dd0b98f	Fix typos in comments (#8279 )	2021-06-07 10:43:54 +02:00
Adriane Boyd	9dfd3c9484	Use warnings.warn instead of logger.warning	2021-06-04 17:44:08 +02:00
Sofie Van Landeghem	f0277bdeab	Show warning if entity_ruler runs without patterns (#7807 ) * Show warning if entity_ruler runs without patterns * Show warning if matcher runs without patterns * fix wording * unit test for warning once (WIP) * warn W036 only once * cleanup * create filter_warning helper	2021-06-04 17:37:38 +02:00
Paul O'Leary McCann	d959603d51	Don't add duplicate patterns all the time in EntityRuler (fix #8216 ) (#8246 ) * Don't add duplicate patterns (fix #8216) * Refactor EntityRuler init This simplifies the EntityRuler init code. This is helpful as prep for allowing the EntityRuler to reset itself. * Make EntityRuler.clear reset matchers Includes a new test for this. * Tidy PhraseMatcher instantiation Since the attr can be None safely now, the guard if is no longer required here. Also renamed the `_validate` attr. Maybe it's not needed? * Fix NER test * Add test to make sure patterns aren't increasing * Move test to regression tests	2021-06-03 09:05:26 +02:00
Paul O'Leary McCann	d54631f68b	Fix other open calls without context managers (#8245 )	2021-05-31 19:04:29 +10:00
Dhruv Naik	283f64a98d	Fix bug from Entityruler: ent_ids returns None for phrases (#8169 ) * bugfix for explosion/spaCy#8168 * add test for explosion/spaCy#8168	2021-05-31 18:38:53 +10:00
Sofie Van Landeghem	fff662e41f	Ensemble textcat with listener (#8012 ) * add unit test for two listeners, with a textcat ensemble in the middle * return zero gradients instead of None in accumulate_gradient	2021-05-31 18:21:06 +10:00
Sofie Van Landeghem	ff91e6dac7	Show warning if entity_ruler runs without patterns (#7807 ) * Show warning if entity_ruler runs without patterns * Show warning if matcher runs without patterns * fix wording * unit test for warning once (WIP) * warn W036 only once * cleanup * create filter_warning helper	2021-05-31 18:20:27 +10:00
Paul O'Leary McCann	04239e94c7	Use a context manager when reading model (fix #7036 ) (#8244 )	2021-05-31 17:36:17 +10:00
Sofie Van Landeghem	202943bc8c	KB & NEL to/from bytes (#8113 ) * unit test for pickling KB * add pickling test for NEL * KB to_bytes and from_bytes * NEL to_bytes and from_bytes * xfail pickle tests for now * fix docs * cleanup	2021-05-20 18:11:30 +10:00
Adriane Boyd	6788d90f61	Preserve existing ENT_KB_ID annotation in NER (#7988 ) * Preserve existing ENT_KB_ID annotation in NER Preserve `ent_kb_id` annotation on existing entity spans, which is not preserved by the transition system. * Simplify kb_id assignment * Simplify further	2021-05-06 18:49:55 +10:00
Adriane Boyd	d2bdaa7823	Replace negative rows with 0 in StaticVectors (#7674 ) * Replace negative rows with 0 in StaticVectors Replace negative row indices with 0-vectors in `StaticVectors`. * Increase versions related to StaticVectors * Increase versions of all architctures and layers related to `StaticVectors` * Improve efficiency of 0-vector operations Parallel `spacy-legacy` PR: https://github.com/explosion/spacy-legacy/pull/5 * Update config defaults to new versions * Update docs	2021-04-22 18:04:15 +10:00
Sofie Van Landeghem	27dbbb9903	Bugfix/nel crossing sentence (#7630 ) * ensure each entity gets a KB ID, even when it's not within a sentence * cleanup	2021-04-12 18:08:01 +10:00
Adriane Boyd	8008e2f75b	Use morph hash in lemmatizer cache key (#7690 ) Use the morph hash rather than the `MorphAnalysis` object in the cache key so that the `Lemmatizer` can be pickled.	2021-04-08 13:22:38 +02:00
Adriane Boyd	39153ef90f	Update lexeme_norm checks * Add util method for check * Add new languages to list with lexeme norm tables * Add check to all relevant components * Add config details to warning message Note that we're not actually inspecting the model config to see if `NORM` is used as an attribute, so it may warn in cases where it's not relevant.	2021-03-19 10:59:27 +01:00
Ines Montani	37fc495f5d	Merge pull request #7353 from jankrepl/fix_entity_rules_labels	2021-03-09 15:09:24 +01:00
Sofie Van Landeghem	932887b950	textcat scoring fix and multi_label docs (#6974 ) * add multi-label textcat to menu * add infobox on textcat API * add info to v3 migration guide * small edits * further fixes in doc strings * add infobox to textcat architectures * add textcat_multilabel to overview of built-in components * spelling * fix unrelated warn msg * Add textcat_multilabel to quickstart [ci skip] * remove separate documentation page for multilabel_textcategorizer * small edits * positive label clarification * avoid duplicating information in self.cfg and fix textcat.score * fix multilabel textcat too * revert threshold to storage in cfg * revert threshold stuff for multi-textcat Co-authored-by: Ines Montani <ines@ines.io>	2021-03-09 23:04:22 +11:00
Jan Krepl	f26b61e001	Make sure sorted	2021-03-09 10:49:53 +01:00
Adriane Boyd	10c930cc96	Re-refactor Sentencizer with Pipe API (#7176 ) Reapply the refactoring (#4721) so that `Sentencizer` uses the faster `predict` and `set_annotations` for both `__call__` and `pipe`.	2021-02-26 09:48:14 +01:00
Sofie Van Landeghem	b92f81d5da	fix NEL config and IO, and n_sents functionality (#7100 ) * fix NEL config and IO, and n_sents functionality * add docs * fix test	2021-02-22 14:49:52 +11:00
Sofie Van Landeghem	ba5a50f62b	NEL docs & UX (#7129 ) * EL set_kb docs fix * custom warning for set_kb mistake	2021-02-22 11:04:22 +11:00
Matthew Honnibal	0fb8d437c0	Fix sentence fragments bug (#7056 , #7035 ) (#7057 ) * Add test for #7035 * Update test for issue 7056 * Fix test * Fix transitions method used in testing * Fix state eol detection when rebuffer * Clean up redundant fix	2021-02-14 13:38:13 +11:00
Ines Montani	9ba715ed16	Tidy up and auto-format	2021-02-13 12:55:56 +11:00
svlandeg	aa3ad8825d	loop instead of any	2021-02-12 13:14:30 +01:00
svlandeg	a52d466bfc	any instead of all	2021-02-11 20:50:55 +01:00
Ines Montani	ad9ce3c8f6	Fix issue #6950 : allow pickling Tok2Vec with listeners	2021-02-11 11:37:39 +11:00
Sofie Van Landeghem	a323ef90df	ensure the loss value is cast as float (#6928 )	2021-02-07 07:51:56 +08:00
René Octavio Queiroz Dias	999ff03b19	fix: Fix textcat labels to expect a Optional[Iterable[str]] instead of Optional[Dict] (#6911 ) * docs: Add agreement * bug: Regression test Issue #6908 * fix: Changed from Dict to Iterable[str] Fix #6908 * Update test to use make_tempdir * fix: Fix WindowsPath error Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2021-02-04 23:37:13 +01:00
Sofie Van Landeghem	f638306598	remove link_components flag again (#6883 )	2021-02-02 10:08:40 +08:00
Ines Montani	d0c3775712	Replace links to nightly docs [ci skip]	2021-01-30 20:09:38 +11:00
Ines Montani	e6accb3a9e	Tidy up and auto-format	2021-01-30 12:52:33 +11:00
Ines Montani	2102082478	Make Tok2Vec.remove_listener return bool Whether listener was removed	2021-01-29 21:41:38 +11:00
Ines Montani	0f3e3eedc2	Add Tok2vec.remove_listener	2021-01-29 19:36:38 +11:00
Sofie Van Landeghem	837a4f53c2	Error handling in nlp.pipe (#6817 ) * add error handler for pipe methods * add unit tests * remove pipe method that are the same as their base class * have Language keep track of a default error handler * cleanup * formatting * small refactor * add documentation	2021-01-29 08:51:21 +08:00
Sofie Van Landeghem	6b68ad027b	Fix beam NER resizing (#6834 ) * move label check to sub methods * add tests	2021-01-27 23:39:14 +11:00
Matthew Honnibal	05050210f3	Dont add labels implicitly for parser	2021-01-27 13:04:47 +11:00
Matthew Honnibal	1d20e21f3e	Add labels implicitly for parser and ner	2021-01-27 12:54:47 +11:00
Matthew Honnibal	f049df1715	Revert "Set annotations in update" (#6810 ) * Revert "Set annotations in update (#6767)" This reverts commit `e680efc7cc`. * Fix version * Update spacy/pipeline/entity_linker.py * Update spacy/pipeline/entity_linker.py * Update spacy/pipeline/tagger.pyx * Update spacy/pipeline/tok2vec.py * Update spacy/pipeline/tok2vec.py * Update spacy/pipeline/transition_parser.pyx * Update spacy/pipeline/transition_parser.pyx * Update website/docs/api/multilabel_textcategorizer.md * Update website/docs/api/tok2vec.md * Update website/docs/usage/layers-architectures.md * Update website/docs/usage/layers-architectures.md * Update website/docs/api/transformer.md * Update website/docs/api/textcategorizer.md * Update website/docs/api/tagger.md * Update spacy/pipeline/entity_linker.py * Update website/docs/api/sentencerecognizer.md * Update website/docs/api/pipe.md * Update website/docs/api/morphologizer.md * Update website/docs/api/entityrecognizer.md * Update spacy/pipeline/entity_linker.py * Update spacy/pipeline/multitask.pyx * Update spacy/pipeline/tagger.pyx * Update spacy/pipeline/tagger.pyx * Update spacy/pipeline/textcat.py * Update spacy/pipeline/textcat.py * Update spacy/pipeline/textcat.py * Update spacy/pipeline/tok2vec.py * Update spacy/pipeline/trainable_pipe.pyx * Update spacy/pipeline/trainable_pipe.pyx * Update spacy/pipeline/transition_parser.pyx * Update spacy/pipeline/transition_parser.pyx * Update website/docs/api/entitylinker.md * Update website/docs/api/dependencyparser.md * Update spacy/pipeline/trainable_pipe.pyx	2021-01-25 22:18:45 +08:00
Sofie Van Landeghem	e680efc7cc	Set annotations in update (#6767 ) * bump to 3.0.0rc4 * do set_annotations in component update calls * update docs and remove set_annotations flag * fix EL test	2021-01-20 11:49:25 +11:00
Sofie Van Landeghem	57640aa838	warn when frozen components break listener pattern (#6766 ) * warn when frozen components break listener pattern * few notes in the documentation * update arg name * formatting * cleanup * specify listeners return type	2021-01-20 11:12:35 +11:00
Ines Montani	e697609fef	Update docstrings and types [ci skip]	2021-01-18 22:31:26 +11:00
Adriane Boyd	bf0cdae8d4	Add token_splitter component (#6726 ) * Add long_token_splitter component Add a `long_token_splitter` component for use with transformer pipelines. This component splits up long tokens like URLs into smaller tokens. This is particularly relevant for pretrained pipelines with `strided_spans`, since the user can't change the length of the span `window` and may not wish to preprocess the input texts. The `long_token_splitter` splits tokens that are at least `long_token_length` tokens long into smaller tokens of `split_length` size. Notes: * Since this is intended for use as the first component in a pipeline, the token splitter does not try to preserve any token annotation. * API docs to come when the API is stable. * Adjust API, add test * Fix name in factory	2021-01-17 19:54:41 +08:00
Adriane Boyd	43a752a2a0	Fix assertion in default get oracle sequence usage (#6738 ) Remove assertion for default debug value in `get_oracle_sequence_from_state`.	2021-01-16 16:07:39 +01:00
Matthew Honnibal	f0c696b4aa	Fix failed merge of #6694 patch	2021-01-16 13:44:11 +11:00
Adriane Boyd	9328dd5625	Handle unset token.morph in Morphologizer (#6704 ) * Handle unset token.morph in Morphologizer Handle unset `token.morph` in `Morphologizer.initialize` and `Morphologizer.get_loss`. If both `token.morph` and `token.pos` are unset, treat the annotation as missing rather than empty. * Add token.has_morph()	2021-01-15 17:20:10 +01:00
Matthew Honnibal	7b3f0c6f1b	Questionable fix for parser training bug with misaligned sentences (#6694 ) * Questionable fix for parser training bug with misaligned sentences * Fix Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2021-01-15 14:18:24 +01:00
Ines Montani	b0b743597c	Tidy up and auto-format	2021-01-15 11:57:36 +11:00
svlandeg	fec9b81aa2	Merge remote-tracking branch 'upstream/develop' into feature/missing-dep	2021-01-13 17:46:12 +01:00

1 2 3 4 5 ...

448 Commits