diff --git a/website/docs/api/architectures.mdx b/website/docs/api/architectures.mdx index 4c5447f75..04b96d39d 100644 --- a/website/docs/api/architectures.mdx +++ b/website/docs/api/architectures.mdx @@ -390,7 +390,7 @@ in other components, see | | | -Mixed-precision support is currently an experimental feature. + Mixed-precision support is currently an experimental feature. @@ -467,7 +467,7 @@ one component. | **CREATES** | The model using the architecture. ~~Model[List[Doc], List[Floats2d]]~~ | -Mixed-precision support is currently an experimental feature. + Mixed-precision support is currently an experimental feature. diff --git a/website/docs/api/attributes.mdx b/website/docs/api/attributes.mdx index adacd3898..3142b741d 100644 --- a/website/docs/api/attributes.mdx +++ b/website/docs/api/attributes.mdx @@ -41,10 +41,9 @@ from string attribute names to internal attribute IDs is stored in The corresponding [`Token` object attributes](/api/token#attributes) can be accessed using the same names in lowercase, e.g. `token.orth` or `token.length`. -For attributes that represent string values, the internal integer ID is -accessed as `Token.attr`, e.g. `token.dep`, while the string value can be -retrieved by appending `_` as in `token.dep_`. - +For attributes that represent string values, the internal integer ID is accessed +as `Token.attr`, e.g. `token.dep`, while the string value can be retrieved by +appending `_` as in `token.dep_`. | Attribute | Description | | ------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------- | diff --git a/website/docs/api/cli.mdx b/website/docs/api/cli.mdx index fc2c46022..fc5b09dbc 100644 --- a/website/docs/api/cli.mdx +++ b/website/docs/api/cli.mdx @@ -474,8 +474,7 @@ report span characteristics such as the average span length and the span (or span boundary) distinctiveness. The distinctiveness measure shows how different the tokens are with respect to the rest of the corpus using the KL-divergence of the token distributions. To learn more, you can check out Papay et al.'s work on -[*Dissecting Span Identification Tasks with Performance Prediction* (EMNLP -2020)](https://aclanthology.org/2020.emnlp-main.396/). +[_Dissecting Span Identification Tasks with Performance Prediction_ (EMNLP 2020)](https://aclanthology.org/2020.emnlp-main.396/). diff --git a/website/docs/api/dependencymatcher.mdx b/website/docs/api/dependencymatcher.mdx index cae4221bf..0ed413340 100644 --- a/website/docs/api/dependencymatcher.mdx +++ b/website/docs/api/dependencymatcher.mdx @@ -87,7 +87,6 @@ come directly from | `A <++ B` | `B` is a right parent of `A`, i.e. `A` is a child of `B` and `A.i < B.i` _(not in Semgrex)_. | | `A <-- B` | `B` is a left parent of `A`, i.e. `A` is a child of `B` and `A.i > B.i` _(not in Semgrex)_. | - ## DependencyMatcher.\_\_init\_\_ {#init tag="method"} Create a `DependencyMatcher`. diff --git a/website/docs/api/entityruler.mdx b/website/docs/api/entityruler.mdx index c2ba33f01..909d4674b 100644 --- a/website/docs/api/entityruler.mdx +++ b/website/docs/api/entityruler.mdx @@ -99,9 +99,9 @@ be a token pattern (list) or a phrase pattern (string). For example: ## EntityRuler.initialize {#initialize tag="method" new="3"} Initialize the component with data and used before training to load in rules -from a [pattern file](/usage/rule-based-matching/#entityruler-files). This method -is typically called by [`Language.initialize`](/api/language#initialize) and -lets you customize arguments it receives via the +from a [pattern file](/usage/rule-based-matching/#entityruler-files). This +method is typically called by [`Language.initialize`](/api/language#initialize) +and lets you customize arguments it receives via the [`[initialize.components]`](/api/data-formats#config-initialize) block in the config. @@ -210,10 +210,10 @@ of dicts) or a phrase pattern (string). For more details, see the usage guide on | ---------- | ---------------------------------------------------------------- | | `patterns` | The patterns to add. ~~List[Dict[str, Union[str, List[dict]]]]~~ | - ## EntityRuler.remove {#remove tag="method" new="3.2.1"} -Remove a pattern by its ID from the entity ruler. A `ValueError` is raised if the ID does not exist. +Remove a pattern by its ID from the entity ruler. A `ValueError` is raised if +the ID does not exist. > #### Example > @@ -224,9 +224,9 @@ Remove a pattern by its ID from the entity ruler. A `ValueError` is raised if th > ruler.remove("apple") > ``` -| Name | Description | -| ---------- | ---------------------------------------------------------------- | -| `id` | The ID of the pattern rule. ~~str~~ | +| Name | Description | +| ---- | ----------------------------------- | +| `id` | The ID of the pattern rule. ~~str~~ | ## EntityRuler.to_disk {#to_disk tag="method"} diff --git a/website/docs/api/example.mdx b/website/docs/api/example.mdx index 63768d58f..f98a114a1 100644 --- a/website/docs/api/example.mdx +++ b/website/docs/api/example.mdx @@ -288,9 +288,9 @@ Calculate alignment tables between two tokenizations. ### Alignment attributes {#alignment-attributes"} -Alignment attributes are managed using `AlignmentArray`, which is a -simplified version of Thinc's [Ragged](https://thinc.ai/docs/api-types#ragged) -type that only supports the `data` and `length` attributes. +Alignment attributes are managed using `AlignmentArray`, which is a simplified +version of Thinc's [Ragged](https://thinc.ai/docs/api-types#ragged) type that +only supports the `data` and `length` attributes. | Name | Description | | ----- | ------------------------------------------------------------------------------------- | diff --git a/website/docs/api/index.mdx b/website/docs/api/index.mdx index a9dc408f6..2bd5c0ea6 100644 --- a/website/docs/api/index.mdx +++ b/website/docs/api/index.mdx @@ -3,6 +3,6 @@ title: Library Architecture next: /api/architectures --- -import Architecture101 from 'usage/101/\_architecture.md' +import Architecture101 from 'usage/101/_architecture.md' diff --git a/website/docs/api/kb.mdx b/website/docs/api/kb.mdx index b217a1678..b140bb6c1 100644 --- a/website/docs/api/kb.mdx +++ b/website/docs/api/kb.mdx @@ -106,7 +106,7 @@ to you. ## KnowledgeBase.get_alias_candidates {#get_alias_candidates tag="method"} -This method is _not_ available from spaCy 3.5 onwards. + This method is _not_ available from spaCy 3.5 onwards. From spaCy 3.5 on `KnowledgeBase` is an abstract class (with diff --git a/website/docs/api/matcher.mdx b/website/docs/api/matcher.mdx index 8cc446c6a..e16ea7333 100644 --- a/website/docs/api/matcher.mdx +++ b/website/docs/api/matcher.mdx @@ -64,7 +64,7 @@ matched: > ``` | OP | Description | -|---------|------------------------------------------------------------------------| +| ------- | ---------------------------------------------------------------------- | | `!` | Negate the pattern, by requiring it to match exactly 0 times. | | `?` | Make the pattern optional, by allowing it to match 0 or 1 times. | | `+` | Require the pattern to match 1 or more times. | diff --git a/website/docs/api/morphology.mdx b/website/docs/api/morphology.mdx index 20fcd1a40..565e520b5 100644 --- a/website/docs/api/morphology.mdx +++ b/website/docs/api/morphology.mdx @@ -105,11 +105,11 @@ representation. ## Attributes {#attributes} -| Name | Description | -| ------------- | ------------------------------------------------------------------------------------------------------------------------------ | -| `FEATURE_SEP` | The [FEATS](https://universaldependencies.org/format.html#morphological-annotation) feature separator. Default is `|`. ~~str~~ | -| `FIELD_SEP` | The [FEATS](https://universaldependencies.org/format.html#morphological-annotation) field separator. Default is `=`. ~~str~~ | -| `VALUE_SEP` | The [FEATS](https://universaldependencies.org/format.html#morphological-annotation) value separator. Default is `,`. ~~str~~ | +| Name | Description | +| ------------- | ---------------------------------------------------------------------------------------------------------------------------- | ---------- | +| `FEATURE_SEP` | The [FEATS](https://universaldependencies.org/format.html#morphological-annotation) feature separator. Default is ` | `. ~~str~~ | +| `FIELD_SEP` | The [FEATS](https://universaldependencies.org/format.html#morphological-annotation) field separator. Default is `=`. ~~str~~ | +| `VALUE_SEP` | The [FEATS](https://universaldependencies.org/format.html#morphological-annotation) value separator. Default is `,`. ~~str~~ | ## MorphAnalysis {#morphanalysis tag="class" source="spacy/tokens/morphanalysis.pyx"} diff --git a/website/docs/api/sentencizer.mdx b/website/docs/api/sentencizer.mdx index b75c7a2f1..f5017fbdb 100644 --- a/website/docs/api/sentencizer.mdx +++ b/website/docs/api/sentencizer.mdx @@ -38,7 +38,7 @@ how the component should be configured. You can override its settings via the > ``` | Setting | Description | -| ---------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------ | +| ---------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------ | ------ | | `punct_chars` | Optional custom list of punctuation characters that mark sentence ends. See below for defaults if not set. Defaults to `None`. ~~Optional[List[str]]~~ | `None` | | `overwrite` 3.2 | Whether existing annotation is overwritten. Defaults to `False`. ~~bool~~ | | `scorer` 3.2 | The scoring method. Defaults to [`Scorer.score_spans`](/api/scorer#score_spans) for the attribute `"sents"` ~~Optional[Callable]~~ | diff --git a/website/docs/images/displacy-dep-founded.html b/website/docs/images/displacy-dep-founded.html index e22984ee1..8e3c47522 100644 --- a/website/docs/images/displacy-dep-founded.html +++ b/website/docs/images/displacy-dep-founded.html @@ -1,58 +1,155 @@ - - - Smith - - - - - founded - - - - - a - - - - - healthcare - - - - - company - - - - - - - nsubj + + + Smith + - - - - - - det + + founded + - - - - - - compound + + a + - - - - - - dobj + + healthcare + - - + + + company + + + + + + + + nsubj + + + + + + + + + + det + + + + + + + + + + compound + + + + + + + + + + dobj + + + + diff --git a/website/docs/images/displacy-ent-custom.html b/website/docs/images/displacy-ent-custom.html index 709c6f631..b4e1bae3c 100644 --- a/website/docs/images/displacy-ent-custom.html +++ b/website/docs/images/displacy-ent-custom.html @@ -1,33 +1,81 @@
But + style=" + line-height: 2.5; + font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Helvetica, Arial, sans-serif, + 'Apple Color Emoji', 'Segoe UI Emoji', 'Segoe UI Symbol'; + font-size: 18px; + " +> + But Google ORGis starting from behind. The company made a late push into hardware, and Apple ORG’s Siri, available on iPhones, and Amazon ORG’s Alexa software, which runs on its Echo and Dot devices, have clear leads in consumer - adoption.
+ adoption. + diff --git a/website/docs/images/displacy-ent-snek.html b/website/docs/images/displacy-ent-snek.html index c8b416d8d..6604d9b78 100644 --- a/website/docs/images/displacy-ent-snek.html +++ b/website/docs/images/displacy-ent-snek.html @@ -1,24 +1,57 @@
🌱🌿 🐍 SNEK ____ 🌳🌲 ____ 👨‍🌾 HUMAN diff --git a/website/docs/images/displacy-ent1.html b/website/docs/images/displacy-ent1.html index 708df8093..9fde5cf88 100644 --- a/website/docs/images/displacy-ent1.html +++ b/website/docs/images/displacy-ent1.html @@ -1,36 +1,83 @@
Apple ORG is looking at buying U.K. GPE startup for $1 billion MONEY diff --git a/website/docs/images/displacy-ent2.html b/website/docs/images/displacy-ent2.html index 5e1833ca0..01ab5c2bf 100644 --- a/website/docs/images/displacy-ent2.html +++ b/website/docs/images/displacy-ent2.html @@ -1,37 +1,84 @@
When Sebastian Thrun PERSON started working on self-driving cars at Google ORG in 2007 DATE diff --git a/website/docs/images/displacy-long.html b/website/docs/images/displacy-long.html index 8938f6a56..e298610aa 100644 --- a/website/docs/images/displacy-long.html +++ b/website/docs/images/displacy-long.html @@ -5,7 +5,13 @@ class="displacy" width="1975" height="399.5" - style="max-width: none; height: 399.5px; color: #000000; background: #ffffff; font-family: Arial" + style=" + max-width: none; + height: 399.5px; + color: #000000; + background: #ffffff; + font-family: Arial; + " > Apple diff --git a/website/docs/images/displacy-long2.html b/website/docs/images/displacy-long2.html index abe18c42a..c428bd2cb 100644 --- a/website/docs/images/displacy-long2.html +++ b/website/docs/images/displacy-long2.html @@ -1,84 +1,212 @@ - - - Autonomous - ADJ - - - - cars - NOUN - - - - shift - VERB - - - - insurance - NOUN - - - - liability - NOUN - - - - toward - ADP - - - - manufacturers - NOUN - - - - - - amod + + + Autonomous + ADJ - - - - - - nsubj + + cars + NOUN - - - - - - compound + + shift + VERB - - - - - - dobj + + insurance + NOUN - - - - - - prep + + liability + NOUN - - - - - - pobj + + toward + ADP - - + + + manufacturers + NOUN + + + + + + + amod + + + + + + + + + + nsubj + + + + + + + + + + compound + + + + + + + + + + dobj + + + + + + + + + + prep + + + + + + + + + + pobj + + + + diff --git a/website/docs/images/displacy-span-custom.html b/website/docs/images/displacy-span-custom.html index 97dd3b140..10cb6dd2d 100644 --- a/website/docs/images/displacy-span-custom.html +++ b/website/docs/images/displacy-span-custom.html @@ -1,31 +1,84 @@ -
+
Welcome to the - + Bank + style=" + background: #ddd; + top: 40px; + height: 4px; + left: -1px; + width: calc(100% + 2px); + position: absolute; + " + > + style=" + background: #ddd; + top: 40px; + height: 4px; + border-top-left-radius: 3px; + border-bottom-left-radius: 3px; + left: -1px; + width: calc(100% + 2px); + position: absolute; + " + > + style=" + background: #ddd; + color: #000; + top: -0.5em; + padding: 2px 3px; + position: absolute; + font-size: 0.6em; + font-weight: bold; + line-height: 1; + border-radius: 3px; + " + > BANK - + of + style=" + background: #ddd; + top: 40px; + height: 4px; + left: -1px; + width: calc(100% + 2px); + position: absolute; + " + > - + China + style=" + background: #ddd; + top: 40px; + height: 4px; + left: -1px; + width: calc(100% + 2px); + position: absolute; + " + > . -
\ No newline at end of file +
diff --git a/website/docs/images/displacy-span.html b/website/docs/images/displacy-span.html index 9bbc6403c..cfee1dc7e 100644 --- a/website/docs/images/displacy-span.html +++ b/website/docs/images/displacy-span.html @@ -1,41 +1,123 @@ -
+
Welcome to the - + Bank + style=" + background: #7aecec; + top: 40px; + height: 4px; + left: -1px; + width: calc(100% + 2px); + position: absolute; + " + > + style=" + background: #7aecec; + top: 40px; + height: 4px; + border-top-left-radius: 3px; + border-bottom-left-radius: 3px; + left: -1px; + width: calc(100% + 2px); + position: absolute; + " + > + style=" + background: #7aecec; + color: #000; + top: -0.5em; + padding: 2px 3px; + position: absolute; + font-size: 0.6em; + font-weight: bold; + line-height: 1; + border-radius: 3px; + " + > ORG - + of + style=" + background: #7aecec; + top: 40px; + height: 4px; + left: -1px; + width: calc(100% + 2px); + position: absolute; + " + > - + China + style=" + background: #7aecec; + top: 40px; + height: 4px; + left: -1px; + width: calc(100% + 2px); + position: absolute; + " + > + style=" + background: #feca74; + top: 57px; + height: 4px; + left: -1px; + width: calc(100% + 2px); + position: absolute; + " + > + style=" + background: #feca74; + top: 57px; + height: 4px; + border-top-left-radius: 3px; + border-bottom-left-radius: 3px; + left: -1px; + width: calc(100% + 2px); + position: absolute; + " + > + style=" + background: #feca74; + color: #000; + top: -0.5em; + padding: 2px 3px; + position: absolute; + font-size: 0.6em; + font-weight: bold; + line-height: 1; + border-radius: 3px; + " + > GPE . -
\ No newline at end of file +
diff --git a/website/docs/models/index.mdx b/website/docs/models/index.mdx index 203555651..560c04675 100644 --- a/website/docs/models/index.mdx +++ b/website/docs/models/index.mdx @@ -189,8 +189,8 @@ than the rule-based `sentencizer`. #### Switch from trainable lemmatizer to default lemmatizer -Since v3.3, a number of pipelines use a trainable lemmatizer. You can check whether -the lemmatizer is trainable: +Since v3.3, a number of pipelines use a trainable lemmatizer. You can check +whether the lemmatizer is trainable: ```python nlp = spacy.load("de_core_web_sm") diff --git a/website/docs/styleguide.mdx b/website/docs/styleguide.mdx index 673ced892..7dd8997fb 100644 --- a/website/docs/styleguide.mdx +++ b/website/docs/styleguide.mdx @@ -67,8 +67,7 @@ import { Colors, Patterns } from 'widgets/styleguide' ## Typography {#typography} -import { H1, H2, H3, H4, H5, Label, InlineList } from -'components/typography' +import { H1, H2, H3, H4, H5, Label, InlineList } from 'components/typography' > #### Markdown > @@ -103,12 +102,12 @@ in the sidebar menu.
-

Headline 1

-

Headline 2

-

Headline 3

-

Headline 4

-
Headline 5
- +

Headline 1

+

Headline 2

+

Headline 3

+

Headline 4

+
Headline 5
+
--- @@ -184,8 +183,9 @@ installed. -method 2 tagger, -parser +method 2 + tagger, parser + @@ -202,13 +202,25 @@ Link buttons come in two variants, `primary` and `secondary` and two sizes, with an optional `large` size modifier. Since they're mostly used as enhanced links, the buttons are implemented as styled links instead of native button elements. - - + + + +
- - + + + + ## Components diff --git a/website/docs/usage/101/_named-entities.mdx b/website/docs/usage/101/_named-entities.mdx index 2abc45cbd..1778ab0a9 100644 --- a/website/docs/usage/101/_named-entities.mdx +++ b/website/docs/usage/101/_named-entities.mdx @@ -1,9 +1,9 @@ A named entity is a "real-world object" that's assigned a name – for example, a person, a country, a product or a book title. spaCy can **recognize various -types of named entities in a document, by asking the model for a -prediction**. Because models are statistical and strongly depend on the -examples they were trained on, this doesn't always work _perfectly_ and might -need some tuning later, depending on your use case. +types of named entities in a document, by asking the model for a prediction**. +Because models are statistical and strongly depend on the examples they were +trained on, this doesn't always work _perfectly_ and might need some tuning +later, depending on your use case. Named entities are available as the `ents` property of a `Doc`: @@ -32,7 +32,11 @@ for ent in doc.ents: Using spaCy's built-in [displaCy visualizer](/usage/visualizers), here's what our example sentence and its named entities look like: -import DisplaCyEntHtml from 'images/displacy-ent1.html'; import { Iframe } from -'components/embed' +import DisplaCyEntHtml from 'images/displacy-ent1.html' +import { Iframe } from 'components/embed' -