spaCy

mirror of https://github.com/explosion/spaCy.git synced 2026-03-05 12:21:27 +03:00

Matthew Honnibal f9946154d9 Add SpanCategorizer component (#6747)	2021-06-24 12:35:27 +02:00

* Draft spancat model
* Add spancat model
* Add test for extract_spans
* Add extract_spans layer
* Upd extract_spans
* Add spancat model
* Add test for spancat model
* Upd spancat model
* Update spancat component
* Upd spancat
* Update spancat model
* Add quick spancat test
* Import SpanCategorizer
* Fix SpanCategorizer component
* Import SpanGroup
* Fix span extraction
* Fix import
* Fix import
* Upd model
* Update spancat models
* Add scoring, update defaults
* Update and add docs
* Fix type
* Update spacy/ml/extract_spans.py
* Auto-format and fix import
* Fix comment
* Fix type
* Fix type
* Update website/docs/api/spancategorizer.md
* Fix comment
  Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Better defense
  Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Fix labels list
  Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Update spacy/ml/extract_spans.py
  Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Update spacy/pipeline/spancat.py
  Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Set annotations during update
* Set annotations in spancat
* fix imports in test
* Update spacy/pipeline/spancat.py
* replace MaxoutLogistic with LinearLogistic
* fix config
* various small fixes
* remove set_annotations parameter in update
* use our beloved tupley format with recent support for doc.spans
* bugfix to allow renaming the default span_key (scores weren't showing up)
* use different key in docs example
* change defaults to better-working parameters from project (WIP)
* register spacy.extract_spans.v1 for legacy purposes
* Upd dev version so can build wheel
* layers instead of architectures for smaller building blocks
* Update website/docs/api/spancategorizer.md
  Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Update website/docs/api/spancategorizer.md
  Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Include additional scores from overrides in combined score weights
* Parameterize spans key in scoring

  Parameterize the `SpanCategorizer` `spans_key` for scoring purposes so that it's possible to evaluate multiple `spancat` components in the same pipeline.

* Use the (intentionally very short) default spans key `sc` in the `SpanCategorizer`
* Adjust the default score weights to include the default key
* Adjust the scorer to use `spans_{spans_key}` as the prefix for the returned score
* Revert addition of `attr_name` argument to `score_spans` and adjust the key in the `getter` instead.

  Note that for `spancat` components with a custom `span_key`, the score weights currently need to be modified manually in `[training.score_weights]` for them to be available during training. To suppress the default score weights `spans_sc_p/r/f` during training, set them to `null` in `[training.score_weights]`.

* Update website/docs/api/scorer.md
* Fix scorer for spans key containing underscore
* Increment version
* Add Spans to Evaluate CLI (#8439)
* Add Spans to Evaluate CLI
* Change to spans_key
* Add spans per_type output
  Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Fix spancat GPU issues (#8455)
* Fix GPU issues
* Require thinc >=8.0.6
* Switch to glorot_uniform_init
* Fix and test ngram suggester
* Include final ngram in doc for all sizes
* Fix ngrams for docs of the same length as ngram size
* Handle batches of docs that result in no ngrams
* Add tests

Co-authored-by: Ines Montani <ines@ines.io>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: svlandeg <sofie.vanlandeghem@gmail.com>
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
Co-authored-by: Nirant <NirantK@users.noreply.github.com>
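
The commit message above says the scorer uses `spans_{spans_key}` as the prefix for returned scores and that the default weights are `spans_sc_p/r/f`. The following is a minimal sketch of that naming scheme in plain Python, not spaCy's own code: the helper names `span_score_keys` and `suppressed_weights` are hypothetical, and the `_p`/`_r`/`_f` suffix layout is assumed from the `spans_sc_p/r/f` shorthand in the message.

```python
from typing import Dict, List, Optional

# Suffixes for precision, recall and F-score, assumed from "spans_sc_p/r/f".
METRICS = ("p", "r", "f")


def span_score_keys(spans_key: str = "sc") -> List[str]:
    """Build the score names for one spans key (hypothetical helper)."""
    prefix = f"spans_{spans_key}"
    return [f"{prefix}_{metric}" for metric in METRICS]


def suppressed_weights(spans_key: str = "sc") -> Dict[str, Optional[float]]:
    """Map each score name to None, the Python counterpart of `null`
    in `[training.score_weights]` (hypothetical helper)."""
    return {key: None for key in span_score_keys(spans_key)}


if __name__ == "__main__":
    print(span_score_keys())          # names for the default key `sc`
    print(span_score_keys("my_key"))  # a key that itself contains an underscore
    print(suppressed_weights())
```

For a custom key, the names produced this way are the ones that would have to be added by hand to `[training.score_weights]`, as the commit message notes.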
cli	Add SpanCategorizer component (#6747)	2021-06-24 12:35:27 +02:00
displacy	Also exclude user hooks in displacy conversion (#7419)	2021-03-12 09:41:59 +01:00
lang	Fix non-deterministic deduplication in Greek lemmatizer (#8421)	2021-06-17 09:11:01 +02:00
matcher	Merge remote-tracking branch 'upstream/develop' into chore/develop-into-master-v3.1	2021-06-15 15:05:17 +02:00
ml	Add SpanCategorizer component (#6747)	2021-06-24 12:35:27 +02:00
pipeline	Add SpanCategorizer component (#6747)	2021-06-24 12:35:27 +02:00
tests	Add SpanCategorizer component (#6747)	2021-06-24 12:35:27 +02:00
tokens	Various fixes for spans in Docs.from_docs (#8487)	2021-06-23 15:51:35 +02:00
training	Fix setting empty entities in Example.from_dict (#8426)	2021-06-18 10:41:50 +02:00
__init__.pxd	* Seems to be working after refactor. Need to wire up more POS tag features, and wire up save/load of POS tags.	2014-10-24 02:23:42 +11:00
__init__.py	Show warning if entity_ruler runs without patterns (#7807)	2021-06-04 17:37:38 +02:00
__main__.py	Tidy up	2020-06-22 00:45:40 +02:00
about.py	Set version to v3.1.0 (#8452)	2021-06-21 10:41:40 +02:00
attrs.pxd	Merge branch 'develop' into master-tmp	2020-05-21 18:39:06 +02:00
attrs.pyx	Remove unsupported attrs from attrs.IDS (#8132)	2021-06-02 19:16:57 +10:00
compat.py	Use Literal type for nr_feature_tokens	2020-09-23 16:00:03 +02:00
default_config_pretraining.cfg	pretrain architectures (#6451)	2020-12-08 14:41:03 +08:00
default_config.cfg	Add training option to set annotations on update (#7767)	2021-04-26 16:53:53 +02:00
errors.py	Use minor version for compatibility check (#8403)	2021-06-21 09:39:22 +02:00
glossary.py	Add Chinese PTB tags to glossary (#7993)	2021-05-06 18:43:03 +10:00
kb.pxd	Replace cpdef variables with cdef (#7834)	2021-04-26 16:54:02 +02:00
kb.pyx	KB & NEL to/from bytes (#8113)	2021-05-20 18:11:30 +10:00
language.py	Don't use the same vocab for source models (#8388)	2021-06-21 09:33:33 +02:00
lexeme.pxd	Fix Lexeme.from_ptr	2020-08-10 16:43:37 +02:00
lexeme.pyx	fix 's typo's across code base (#8384)	2021-06-15 10:57:08 +02:00
lookups.py	Update load_lookups return type and docstring (#7907)	2021-04-27 09:13:39 +02:00
morphology.pxd	Clean up Morphology imports and definitions (#7441)	2021-04-26 16:54:23 +02:00
morphology.pyx	Clean up Morphology imports and definitions (#7441)	2021-04-26 16:54:23 +02:00
parts_of_speech.pxd	Add support for Universal Dependencies v2.0	2017-03-03 13:17:34 +01:00
parts_of_speech.pyx	Drop Python 2.7 and 3.5 (#4828)	2019-12-22 01:53:56 +01:00
pipe_analysis.py	Tidy up and auto-format	2020-09-29 21:39:28 +02:00
py.typed	Add py.typed	2021-03-16 09:48:31 +01:00
schemas.py	Add training option to set annotations on update (#7767)	2021-04-26 16:53:53 +02:00
scorer.py	Extend score_spans for overlapping & non-labeled spans (#7209)	2021-04-08 12:19:17 +02:00
strings.pxd	Remove 'cleanup' of strings (#6007)	2020-09-01 16:12:15 +02:00
strings.pyx	Make vocab update in get_docs deterministic (#7603)	2021-04-09 11:53:13 +02:00
structs.pxd	Add SpanGroup and Graph container types to represent arbitrary annotations (#6696)	2021-01-14 17:30:41 +11:00
symbols.pxd	introduce token.has_head and refer to MISSING_DEP_ (WIP)	2021-01-12 17:17:06 +01:00
symbols.pyx	introduce token.has_head and refer to MISSING_DEP_ (WIP)	2021-01-12 17:17:06 +01:00
tokenizer.pxd	Replace cpdef variables with cdef (#7834)	2021-04-26 16:54:02 +02:00
tokenizer.pyx	Fix tokenizer cache flushing (#7836)	2021-04-22 18:14:57 +10:00
typedefs.pxd	Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master	2020-11-25 11:49:34 +01:00
typedefs.pyx	Tidy up rest	2017-10-27 21:07:59 +02:00
util.py	Add SpanCategorizer component (#6747)	2021-06-24 12:35:27 +02:00
vectors.pyx	Fix vectors data on GPU (#7626)	2021-04-19 18:30:03 +10:00
vocab.pxd	Replace cpdef variables with cdef (#7834)	2021-04-26 16:54:02 +02:00
vocab.pyx	Skip vector ngram backoff if minn is not set (#7925)	2021-05-06 18:34:35 +10:00