spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-11-30 23:05:43 +03:00

Adriane Boyd bf0cdae8d4 Add token_splitter component (#6726)	2021-01-17 19:54:41 +08:00

* Add long_token_splitter component

  Add a `long_token_splitter` component for use with transformer pipelines. This component splits up long tokens like URLs into smaller tokens. This is particularly relevant for pretrained pipelines with `strided_spans`, since the user can't change the length of the span `window` and may not wish to preprocess the input texts.

  The `long_token_splitter` splits tokens that are at least `long_token_length` characters long into smaller tokens of `split_length` size.

  Notes:

  * Since this is intended for use as the first component in a pipeline, the token splitter does not try to preserve any token annotation.
  * API docs to come when the API is stable.

* Adjust API, add test

* Fix name in factory
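
The splitting rule described in the commit message can be sketched in plain Python. This is a minimal illustration, not the code added to `functions.py`: the helper names `split_long_token` and `split_tokens` are hypothetical, the default values are chosen only for the example, and the sketch assumes token length is counted in characters. It works on plain strings rather than spaCy `Doc` objects, so, like the component, it carries no token annotation over.

```python
from typing import Iterable, List


def split_long_token(
    text: str, long_token_length: int = 25, split_length: int = 10
) -> List[str]:
    """Split one token's text into chunks of at most `split_length` characters.

    Tokens shorter than `long_token_length` are returned unchanged.
    Both parameter defaults are illustrative assumptions.
    """
    if split_length < 1:
        raise ValueError("split_length must be a positive integer")
    if len(text) < long_token_length:
        return [text]
    # Slice the text into consecutive chunks; the last chunk may be shorter.
    return [text[i : i + split_length] for i in range(0, len(text), split_length)]


def split_tokens(
    tokens: Iterable[str], long_token_length: int = 25, split_length: int = 10
) -> List[str]:
    """Apply the splitting rule to every token and flatten the result."""
    result: List[str] = []
    for token in tokens:
        result.extend(split_long_token(token, long_token_length, split_length))
    return result


if __name__ == "__main__":
    # A short token is kept; a 25-character token becomes chunks of 10, 10 and 5.
    print(split_tokens(["see", "x" * 25]))
```

Running a step like this before a transformer with strided spans keeps any single token from taking up a disproportionate share of a span window, which is the motivation the commit message gives.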
_parser_internals	Fix assertion in default get oracle sequence usage (#6738)	2021-01-16 16:07:39 +01:00
__init__.py	multi-label textcat component (#6474)	2021-01-06 13:07:14 +11:00
attributeruler.py	Tidy up and auto-format	2021-01-05 13:41:53 +11:00
dep_parser.pyx	refer to _parser_internals.nonproj.DELIMITER	2021-01-07 18:58:13 +01:00
entity_linker.py	fix embed_size in Entity Linker architecture (#6343)	2020-11-04 22:20:13 +01:00
entityruler.py	Merge branch 'master' into pr/6444	2020-12-09 11:09:40 +11:00
functions.py	Add token_splitter component (#6726)	2021-01-17 19:54:41 +08:00
lemmatizer.py	Use logger.warning instead of logger.warn (#6596)	2020-12-21 08:25:10 +08:00
morphologizer.pyx	Handle unset token.morph in Morphologizer (#6704)	2021-01-15 17:20:10 +01:00
multitask.pyx	remove labels from constructor	2020-11-11 21:34:12 +01:00
ner.pyx	Getting scores out of beam_ner (#6575)	2021-01-06 12:02:32 +01:00
pipe.pxd	TrainablePipe (#6213)	2020-10-08 21:33:49 +02:00
pipe.pyx	TrainablePipe (#6213)	2020-10-08 21:33:49 +02:00
sentencizer.pyx	Handle missing reference values in scorer (#6286)	2020-11-03 15:47:18 +01:00
senter.pyx	Handle missing reference values in scorer (#6286)	2020-11-03 15:47:18 +01:00
tagger.pyx	Sync missing and misaligned values in Tagger loss (#6689)	2021-01-10 11:30:37 +11:00
textcat_multilabel.py	Tidy up and auto-format	2021-01-15 11:57:36 +11:00
textcat.py	Tidy up and auto-format	2021-01-15 11:57:36 +11:00
tok2vec.py	Revert added_strings change (#6236)	2020-10-10 18:55:07 +02:00
trainable_pipe.pxd	Revert added_strings change (#6236)	2020-10-10 18:55:07 +02:00
trainable_pipe.pyx	always return losses	2020-10-14 15:00:49 +02:00
transition_parser.pxd	TrainablePipe (#6213)	2020-10-08 21:33:49 +02:00
transition_parser.pyx	Add beam_parser and beam_ner components for v3 (#6369)	2020-12-13 09:08:32 +08:00