spaCy/website/docs
Adriane Boyd bf0cdae8d4
Add token_splitter component (#6726)
* Add long_token_splitter component

Add a `long_token_splitter` component for use with transformer
pipelines. This component splits up long tokens like URLs into smaller
tokens. This is particularly relevant for pretrained pipelines with
`strided_spans`, since the user can't change the length of the span
`window` and may not wish to preprocess the input texts.

The `long_token_splitter` splits tokens that are at least
`long_token_length` characters long into smaller tokens of at most
`split_length` characters each.
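
The splitting rule described above can be illustrated with a minimal, library-free sketch. This is not the component's implementation: the function name `split_long_token` is hypothetical, the default values are chosen only for illustration, and the sketch assumes lengths are measured in characters and that the final piece may be shorter than `split_length`.

```python
from typing import List


def split_long_token(
    text: str,
    long_token_length: int = 25,  # illustrative threshold, not a confirmed default
    split_length: int = 10,       # illustrative piece size, not a confirmed default
) -> List[str]:
    """Split one token's text into fixed-size pieces if it is long enough.

    Hypothetical helper: operates on plain strings and keeps no annotation,
    mirroring the note that the splitter is meant to run first in a pipeline.
    """
    if split_length < 1:
        raise ValueError("split_length must be at least 1")
    # Tokens shorter than the threshold are left untouched.
    if len(text) < long_token_length:
        return [text]
    # Cut the text into consecutive pieces of split_length characters;
    # the last piece holds whatever remains.
    return [text[i:i + split_length] for i in range(0, len(text), split_length)]


if __name__ == "__main__":
    for token in ["short", "https://example.com/a/very/long/path"]:
        print(token, "->", split_long_token(token))
```

Concatenating the returned pieces always reproduces the original token text, so character offsets into the input remain recoverable after splitting.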

Notes:

* Since this is intended for use as the first component in a pipeline,
the token splitter does not try to preserve any token annotation.
* API docs to come when the API is stable.

* Adjust API, add test

* Fix name in factory
2021-01-17 19:54:41 +08:00
api Add token_splitter component (#6726) 2021-01-17 19:54:41 +08:00
images Update landing [ci skip] 2020-10-16 11:46:33 +02:00
models Update models docs [ci skip] 2020-10-14 20:50:23 +02:00
usage Add token_splitter component (#6726) 2021-01-17 19:54:41 +08:00
index.md 💫 Update website (#3285) 2019-02-17 19:31:19 +01:00
styleguide.md Update styleguide [ci skip] 2020-09-14 11:25:57 +02:00