Mirror of https://github.com/explosion/spaCy.git (synced 2025-11-02 17:07:49 +03:00)

* Add long_token_splitter component

  Add a `long_token_splitter` component for use with transformer pipelines. This component splits up long tokens like URLs into smaller tokens. This is particularly relevant for pretrained pipelines with `strided_spans`, since the user can't change the length of the span `window` and may not wish to preprocess the input texts.

  The `long_token_splitter` splits tokens that are at least `long_token_length` characters long into smaller tokens of `split_length` size.

  Notes:

  * Since this is intended for use as the first component in a pipeline, the token splitter does not try to preserve any token annotation.
  * API docs to come when the API is stable.

* Adjust API, add test
* Fix name in factory

| Name | | |
|---|---|---|
| 101 | ||
| _benchmarks-models.md | ||
| embeddings-transformers.md | ||
| facts-figures.md | ||
| index.md | ||
| layers-architectures.md | ||
| linguistic-features.md | ||
| models.md | ||
| processing-pipelines.md | ||
| projects.md | ||
| rule-based-matching.md | ||
| saving-loading.md | ||
| spacy-101.md | ||
| training.md | ||
| v2-1.md | ||
| v2-2.md | ||
| v2-3.md | ||
| v2.md | ||
| v3.md | ||
| visualizers.md | ||
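
The splitting rule described in the commit message above can be illustrated with a minimal sketch. This is not spaCy's implementation: it works on plain strings rather than `Doc` objects, the function names `split_long_token` and `split_tokens` are hypothetical, the default values are placeholders, and it assumes token length is measured in characters. Only the parameter names `long_token_length` and `split_length` are taken from the commit message.

```python
from typing import List


def split_long_token(
    text: str,
    long_token_length: int = 25,  # assumed default, for illustration only
    split_length: int = 5,  # assumed default, for illustration only
) -> List[str]:
    """Split one token's text into chunks if it is long enough.

    Hypothetical helper: tokens shorter than `long_token_length`
    characters are returned unchanged; longer ones are cut into
    consecutive pieces of at most `split_length` characters.
    """
    if split_length <= 0:
        raise ValueError("split_length must be positive")
    if len(text) < long_token_length:
        return [text]
    return [text[i : i + split_length] for i in range(0, len(text), split_length)]


def split_tokens(
    tokens: List[str],
    long_token_length: int = 25,
    split_length: int = 5,
) -> List[str]:
    """Apply `split_long_token` to every token and flatten the result."""
    pieces: List[str] = []
    for token in tokens:
        pieces.extend(split_long_token(token, long_token_length, split_length))
    return pieces
```

Joining the returned pieces reproduces the original token text, so no characters are lost. As the commit notes state, the real component does not try to preserve token annotation, which is why it is meant to run first in the pipeline.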