spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-12-15 22:24:31 +03:00

Adriane Boyd bf0cdae8d4 Add token_splitter component (#6726)	2021-01-17 19:54:41 +08:00

* Add long_token_splitter component

  Add a `long_token_splitter` component for use with transformer pipelines. This component splits up long tokens like URLs into smaller tokens. This is particularly relevant for pretrained pipelines with `strided_spans`, since the user can't change the length of the span `window` and may not wish to preprocess the input texts.

  The `long_token_splitter` splits tokens that are at least `long_token_length` characters long into smaller tokens of `split_length` size.

  Notes:

  * Since this is intended for use as the first component in a pipeline, the token splitter does not try to preserve any token annotation.
  * API docs to come when the API is stable.

* Adjust API, add test
* Fix name in factory
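
The splitting rule described in the commit message can be sketched in plain Python. This is only an illustration of the idea, not spaCy's implementation: it works on strings rather than `Doc` tokens, the function name `split_long_tokens` is hypothetical, and the parameter names are borrowed from the commit message, whose API is described there as not yet stable. Token length is assumed to be measured in characters.

```python
from typing import List


def split_long_tokens(
    tokens: List[str], long_token_length: int, split_length: int
) -> List[str]:
    """Split every token of at least `long_token_length` characters
    into consecutive chunks of at most `split_length` characters.

    Hypothetical helper for illustration; it does not touch spaCy objects
    and, like the component described above, preserves no annotation.
    """
    if split_length <= 0:
        raise ValueError("split_length must be a positive integer")

    result: List[str] = []
    for token in tokens:
        if len(token) >= long_token_length:
            # Long token: cut it into fixed-size pieces, the last piece
            # may be shorter than split_length.
            for start in range(0, len(token), split_length):
                result.append(token[start:start + split_length])
        else:
            # Short token: keep it unchanged.
            result.append(token)
    return result


if __name__ == "__main__":
    words = ["see", "https://example.com/a", "now"]
    print(split_long_tokens(words, long_token_length=10, split_length=5))
```

Short tokens pass through untouched, so only outliers such as URLs are cut up, which keeps a fixed-size span window from being dominated by one very long token.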
..
__init__.py	Revert #4334	2019-09-29 17:32:12 +02:00
test_analysis.py	Simplify pipe analysis	2020-08-01 13:40:06 +02:00
test_attributeruler.py	Tidy up and auto-format	2021-01-05 13:41:53 +11:00
test_entity_linker.py	adding tests for trained models to ensure predict reproducibility	2020-10-13 21:07:13 +02:00
test_entity_ruler.py	Tidy up and auto-format	2021-01-05 13:41:53 +11:00
test_functions.py	Add token_splitter component (#6726)	2021-01-17 19:54:41 +08:00
test_initialize.py	Test with default value	2020-09-29 17:00:40 +02:00
test_lemmatizer.py	Use logger.warning instead of logger.warn (#6596)	2020-12-21 08:25:10 +08:00
test_models.py	call NumpyOps instead of get_current_ops()	2020-10-14 16:55:00 +02:00
test_morphologizer.py	Handle unset token.morph in Morphologizer (#6704)	2021-01-15 17:20:10 +01:00
test_pipe_factories.py	Tidy up and auto-format	2021-01-05 13:41:53 +11:00
test_pipe_methods.py	Fix typo in test	2020-10-09 18:00:21 +02:00
test_sentencizer.py	Refactor Docs.is_ flags (#6044)	2020-09-17 00:14:01 +02:00
test_senter.py	adding tests for trained models to ensure predict reproducibility	2020-10-13 21:07:13 +02:00
test_tagger.py	Sync missing and misaligned values in Tagger loss (#6689)	2021-01-10 11:30:37 +11:00
test_textcat.py	Fix test	2021-01-15 12:51:02 +11:00
test_tok2vec.py	Fix types of Tok2Vec encoding architectures (#6442)	2021-01-07 16:39:27 +11:00