Mirror of https://github.com/explosion/spaCy.git (synced 2024-12-26 18:06:29 +03:00)
bf0cdae8d4
* Add long_token_splitter component

  Add a `long_token_splitter` component for use with transformer pipelines. This component splits up long tokens like URLs into smaller tokens. This is particularly relevant for pretrained pipelines with `strided_spans`, since the user can't change the length of the span `window` and may not wish to preprocess the input texts.

  The `long_token_splitter` splits tokens that are at least `long_token_length` characters long into smaller tokens of `split_length` size.

  Notes:

  * Since this is intended for use as the first component in a pipeline, the token splitter does not try to preserve any token annotation.
  * API docs to come when the API is stable.

* Adjust API, add test
* Fix name in factory

* _parser_internals
* __init__.py
* attributeruler.py
* dep_parser.pyx
* entity_linker.py
* entityruler.py
* functions.py
* lemmatizer.py
* morphologizer.pyx
* multitask.pyx
* ner.pyx
* pipe.pxd
* pipe.pyx
* sentencizer.pyx
* senter.pyx
* tagger.pyx
* textcat_multilabel.py
* textcat.py
* tok2vec.py
* trainable_pipe.pxd
* trainable_pipe.pyx
* transition_parser.pxd
* transition_parser.pyx
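
The splitting rule described in the commit message above can be sketched in plain Python. This is only an illustration of the idea and not the spaCy implementation: it works on strings rather than `Doc` objects, the helper names `split_long_token` and `split_long_tokens` are hypothetical, and the default values are assumptions chosen for the example. Only the parameter names `long_token_length` and `split_length` come from the commit message.

```python
from typing import Iterable, List


def split_long_token(
    text: str,
    long_token_length: int = 25,  # assumed default, not taken from the commit
    split_length: int = 10,       # assumed default, not taken from the commit
) -> List[str]:
    """Split one token's text into pieces of at most `split_length` characters.

    Tokens shorter than `long_token_length` characters are returned unchanged.
    """
    if split_length < 1:
        raise ValueError("split_length must be at least 1")
    if len(text) < long_token_length:
        return [text]
    # Consecutive slices of split_length characters; the last piece may be shorter.
    return [text[i:i + split_length] for i in range(0, len(text), split_length)]


def split_long_tokens(
    tokens: Iterable[str],
    long_token_length: int = 25,
    split_length: int = 10,
) -> List[str]:
    """Apply split_long_token to every token and flatten the result."""
    pieces: List[str] = []
    for token in tokens:
        pieces.extend(split_long_token(token, long_token_length, split_length))
    return pieces
```

For instance, `split_long_tokens(["see", "https://example.com/a/b"], long_token_length=10, split_length=8)` leaves `"see"` alone and cuts the URL into pieces of at most eight characters. Because the sketch only slices text, it carries no token annotation forward, which matches the note in the commit message that the splitter is meant to run first in the pipeline.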