From bc0730be3f7f483ab7c3ffb33031fa3c7984d2e3 Mon Sep 17 00:00:00 2001
From: Ines Montani
Date: Sat, 29 Aug 2020 12:53:14 +0200
Subject: [PATCH] Update docs [ci skip]

---
 website/docs/usage/embeddings-transformers.md | 12 ++---
 website/docs/usage/linguistic-features.md     | 45 +++++++++----------
 2 files changed, 27 insertions(+), 30 deletions(-)

diff --git a/website/docs/usage/embeddings-transformers.md b/website/docs/usage/embeddings-transformers.md
index 751cff6a5..75be71845 100644
--- a/website/docs/usage/embeddings-transformers.md
+++ b/website/docs/usage/embeddings-transformers.md
@@ -229,6 +229,8 @@ By default, the `Transformer` component sets the
 [`Doc._.trf_data`](/api/transformer#custom_attributes) extension attribute,
 which lets you access the transformers outputs at runtime.
+
+
 ```cli
 $ python -m spacy download en_core_trf_lg
 ```
@@ -368,10 +370,10 @@ To change any of the settings, you can edit the `config.cfg` and re-run the
 training. To change any of the functions, like the span getter, you can replace
 the name of the referenced function – e.g. `@span_getters = "sent_spans.v1"` to
 process sentences. You can also register your own functions using the
-`span_getters` registry. For instance, the following custom function returns
-`Span` objects following sentence boundaries, unless a sentence succeeds a
-certain amount of tokens, in which case subsentences of at most `max_length`
-tokens are returned.
+[`span_getters` registry](/api/top-level#registry). For instance, the following
+custom function returns [`Span`](/api/span) objects following sentence
+boundaries, unless a sentence exceeds a certain number of tokens, in which case
+subsentences of at most `max_length` tokens are returned.
 
 > #### config.cfg
 >
@@ -408,7 +410,7 @@ def configure_custom_sent_spans(max_length: int):
 To resolve the config during training, spaCy needs to know about your custom
 function. You can make it available via the `--code` argument that can point to
 a Python file. For more details on training with custom code, see the
-[training documentation](/usage/training#custom-code).
+[training documentation](/usage/training#custom-functions).
 
 ```cli
 python -m spacy train ./config.cfg --code ./code.py
 ```

diff --git a/website/docs/usage/linguistic-features.md b/website/docs/usage/linguistic-features.md
index fe57d65ce..a0e58c9d2 100644
--- a/website/docs/usage/linguistic-features.md
+++ b/website/docs/usage/linguistic-features.md
@@ -750,14 +750,6 @@ subclass.
 
 ---
 
-
-
 ### Adding special case tokenization rules {#special-cases}
 
 Most domains have at least some idiosyncrasies that require custom tokenization
@@ -1488,19 +1480,20 @@ for sent in doc.sents:
     print(sent.text)
 ```
 
-spaCy provides three alternatives for sentence segmentation:
+spaCy provides four alternatives for sentence segmentation:
 
-1. [Dependency parser](#sbd-parser): the statistical `parser` provides the most
-   accurate sentence boundaries based on full dependency parses.
-2. [Statistical sentence segmenter](#sbd-senter): the statistical `senter` is a
-   simpler and faster alternative to the parser that only sets sentence
-   boundaries.
-3. [Rule-based pipeline component](#sbd-component): the rule-based `sentencizer`
-   sets sentence boundaries using a customizable list of sentence-final
-   punctuation.
-
-You can also plug an entirely custom [rule-based function](#sbd-custom) into
-your [processing pipeline](/usage/processing-pipelines).
+1. [Dependency parser](#sbd-parser): the statistical
+   [`DependencyParser`](/api/dependencyparser) provides the most accurate
+   sentence boundaries based on full dependency parses.
+2. [Statistical sentence segmenter](#sbd-senter): the statistical
+   [`SentenceRecognizer`](/api/sentencerecognizer) is a simpler and faster
+   alternative to the parser that only sets sentence boundaries.
+3. [Rule-based pipeline component](#sbd-component): the rule-based
+   [`Sentencizer`](/api/sentencizer) sets sentence boundaries using a
+   customizable list of sentence-final punctuation.
+4. [Custom function](#sbd-custom): your own custom function added to the
+   processing pipeline can set sentence boundaries by writing to
+   `Token.is_sent_start`.
 
 ### Default: Using the dependency parse {#sbd-parser model="parser"}
@@ -1535,7 +1528,13 @@ smaller than the parser, its primary advantage is that it's easier to train
 custom models because it only requires annotated sentence boundaries rather
 than full dependency parses.
 
-
+
+> #### senter vs. parser
+>
+> The recall for the `senter` is typically slightly lower than for the parser,
+> which is better at predicting sentence boundaries when punctuation is not
+> present.
 
 ```python
 ### {executable="true"}
@@ -1547,10 +1546,6 @@ for sent in doc.sents:
     print(sent.text)
 ```
 
-The recall for the `senter` is typically slightly lower than for the parser,
-which is better at predicting sentence boundaries when punctuation is not
-present.
-
 ### Rule-based pipeline component {#sbd-component}
 
 The [`Sentencizer`](/api/sentencizer) component is a
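
For reviewers of this patch: the span getter described in the embeddings-transformers hunk (sentence spans, capped at `max_length` tokens) reduces to a small chunking loop. Below is a minimal pure-Python sketch of that splitting logic, with plain token lists standing in for spaCy `Span` objects; the function and variable names are illustrative, not from the patch:

```python
from typing import Iterator, List

def sent_chunks(sent: List[str], max_length: int) -> Iterator[List[str]]:
    # Yield subsentences of at most max_length tokens, in order.
    for start in range(0, len(sent), max_length):
        yield sent[start:start + max_length]

tokens = ["This", "sentence", "is", "longer", "than", "four", "tokens", "."]
print(list(sent_chunks(tokens, 4)))
# → [['This', 'sentence', 'is', 'longer'], ['than', 'four', 'tokens', '.']]
```

A real span getter would apply this per sentence of each `Doc` in the batch and return one list of `Span` objects per doc.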
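
The fourth segmentation option added in the linguistic-features hunk (a custom function that writes to `Token.is_sent_start`) can be sketched as follows, assuming spaCy v3's `@Language.component` API. The `"..."`-based boundary rule, the component name, and the blank English pipeline are illustrative choices, not part of the patch:

```python
import spacy
from spacy.language import Language

@Language.component("custom_sentencizer")
def custom_sentencizer(doc):
    # Illustrative rule: a new sentence starts after every "..." token.
    for i, token in enumerate(doc[:-1]):
        doc[i + 1].is_sent_start = token.text == "..."
    return doc

# A blank pipeline suffices here: the component sets the boundaries
# itself, so neither the parser nor the senter is needed.
nlp = spacy.blank("en")
nlp.add_pipe("custom_sentencizer")

doc = nlp("this is a sentence ... here is another ... and one more")
print([sent.text for sent in doc.sents])
```

Because the component assigns an explicit `True`/`False` to every token after the first, `doc.sents` works without any statistical model in the pipeline.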