spaCy/website/docs/usage
Adriane Boyd bf0cdae8d4
Add token_splitter component (#6726)
* Add long_token_splitter component

Add a `long_token_splitter` component for use with transformer
pipelines. This component splits up long tokens like URLs into smaller
tokens. This is particularly relevant for pretrained pipelines with
`strided_spans`, since the user can't change the length of the span
`window` and may not wish to preprocess the input texts.

The `long_token_splitter` splits tokens that are at least
`long_token_length` characters long into smaller tokens of
`split_length` characters each.
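
The splitting rule described above can be sketched in plain Python. This
is only an illustration, not the component's implementation: the helper
names `split_long_token` and `split_tokens` are hypothetical, the numeric
values are arbitrary, lengths are assumed to be measured in characters,
and the final piece is assumed to be allowed to be shorter than
`split_length`.

```python
from typing import List


def split_long_token(text: str, long_token_length: int, split_length: int) -> List[str]:
    """Split one token's text into fixed-size pieces if it is long enough.

    Hypothetical helper: tokens shorter than `long_token_length`
    characters are returned unchanged; longer ones are cut into
    consecutive pieces of at most `split_length` characters.
    """
    if len(text) < long_token_length:
        return [text]
    return [text[i : i + split_length] for i in range(0, len(text), split_length)]


def split_tokens(tokens: List[str], long_token_length: int, split_length: int) -> List[str]:
    """Apply the splitting rule to every token in a list of token texts."""
    pieces: List[str] = []
    for token in tokens:
        pieces.extend(split_long_token(token, long_token_length, split_length))
    return pieces


if __name__ == "__main__":
    # Illustrative thresholds only; not the component's defaults.
    tokens = ["See", "https://example.com/a/b", "now"]
    print(split_tokens(tokens, long_token_length=10, split_length=5))
```

Short tokens pass through untouched, so only outliers such as URLs are
broken up, and concatenating the pieces gives back the original text.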

Notes:

* Since this is intended for use as the first component in a pipeline,
the token splitter does not try to preserve any token annotation.
* API docs to come when the API is stable.

* Adjust API, add test

* Fix name in factory
2021-01-17 19:54:41 +08:00
101 Add SpanGroup and Graph container types to represent arbitrary annotations (#6696) 2021-01-14 17:30:41 +11:00
_benchmarks-models.md Update docs [ci skip] 2020-10-15 17:27:24 +02:00
embeddings-transformers.md Add token_splitter component (#6726) 2021-01-17 19:54:41 +08:00
facts-figures.md Update docs [ci skip] 2020-10-15 17:27:24 +02:00
index.md Merge branch 'master' into develop 2020-12-11 13:44:41 +11:00
layers-architectures.md fix small typos (#6698) 2021-01-08 09:39:47 +01:00
linguistic-features.md Update docs [ci skip] 2020-11-09 12:43:26 +08:00
models.md Update docs and install extras [ci skip] 2020-10-08 10:58:50 +02:00
processing-pipelines.md Update docs [ci skip] 2020-11-09 12:43:26 +08:00
projects.md Apply suggestions from code review 2020-10-14 19:51:36 +02:00
rule-based-matching.md Merge branch 'master' into pr/6444 2020-12-09 11:09:40 +11:00
saving-loading.md Include custom code via spacy package command (#6531) 2020-12-10 20:36:46 +08:00
spacy-101.md Merge branch 'master' into develop 2020-12-11 13:44:41 +11:00
training.md Add initialize.before_init and after_init callbacks 2021-01-12 13:07:44 +01:00
v2-1.md Remove docs references to starters for now (see #6262) [ci skip] 2020-10-16 15:46:34 +02:00
v2-2.md Update v3 docs [ci skip] 2020-07-05 16:11:16 +02:00
v2-3.md Extend v2.3 migration guide (#5653) 2020-06-26 14:13:01 +02:00
v2.md Update docs [ci skip] 2020-09-12 17:05:10 +02:00
v3.md Add SpanGroup and Graph container types to represent arbitrary annotations (#6696) 2021-01-14 17:30:41 +11:00
visualizers.md Proofread remarks 2020-10-19 11:11:32 +02:00