💫 Update website (#3285)
<!--- Provide a general summary of your changes in the title. -->
## Description
The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.
This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.
### Types of change
enhancement
## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-02-17 21:31:19 +03:00
---
title: Lemmatizer
tag: class
2020-08-07 16:27:13 +03:00
source: spacy/pipeline/lemmatizer.py
new: 3
teaser: 'Pipeline component for lemmatization'
api_string_name: lemmatizer
api_trainable: false
💫 Update website (#3285)
<!--- Provide a general summary of your changes in the title. -->
## Description
The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.
This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.
### Types of change
enhancement
## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-02-17 21:31:19 +03:00
---
2020-08-10 01:01:38 +03:00
Component for assigning base forms to tokens using rules based on part-of-speech
2022-03-28 12:13:50 +03:00
tags, or lookup tables. Different [`Language` ](/api/language ) subclasses can
implement their own lemmatizer components via
2020-08-10 01:01:38 +03:00
[language-specific factories ](/usage/processing-pipelines#factories-language ).
The default data used is provided by the
[`spacy-lookups-data` ](https://github.com/explosion/spacy-lookups-data )
extension package.
2022-03-28 12:13:50 +03:00
For a trainable lemmatizer, see [`EditTreeLemmatizer` ](/api/edittreelemmatizer ).
2020-08-10 01:01:38 +03:00
< Infobox variant = "warning" title = "New in v3.0" >
As of v3.0, the `Lemmatizer` is a **standalone pipeline component** that can be
added to your pipeline, and not a hidden part of the vocab that runs behind the
scenes. This makes it easier to customize how lemmas should be assigned in your
pipeline.
2020-08-29 16:56:50 +03:00
If the lemmatization mode is set to `"rule"` , which requires coarse-grained POS
(`Token.pos`) to be assigned, make sure a [`Tagger` ](/api/tagger ),
[`Morphologizer` ](/api/morphologizer ) or another component assigning POS is
available in the pipeline and runs _before_ the lemmatizer.
2020-08-10 01:01:38 +03:00
< / Infobox >
2021-09-01 13:09:39 +03:00
## Assigned Attributes {#assigned-attributes}
Lemmas generated by rules or predicted will be saved to `Token.lemma` .
| Location | Value |
| -------------- | ------------------------- |
| `Token.lemma` | The lemma (hash). ~~int~~ |
| `Token.lemma_` | The lemma. ~~str~~ |
2020-08-07 16:27:13 +03:00
## Config and implementation
2020-07-27 19:11:45 +03:00
2020-08-07 16:27:13 +03:00
The default config is defined by the pipeline component factory and describes
how the component should be configured. You can override its settings via the
`config` argument on [`nlp.add_pipe` ](/api/language#add_pipe ) or in your
2020-08-17 17:45:24 +03:00
[`config.cfg` for training ](/usage/training#config ). For examples of the lookups
2020-09-24 14:15:28 +03:00
data format used by the lookup and rule-based lemmatizers, see
2020-08-17 17:45:24 +03:00
[`spacy-lookups-data` ](https://github.com/explosion/spacy-lookups-data ).
💫 Update website (#3285)
<!--- Provide a general summary of your changes in the title. -->
## Description
The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.
This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.
### Types of change
enhancement
## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-02-17 21:31:19 +03:00
> #### Example
>
> ```python
2020-08-07 16:27:13 +03:00
> config = {"mode": "rule"}
> nlp.add_pipe("lemmatizer", config=config)
💫 Update website (#3285)
<!--- Provide a general summary of your changes in the title. -->
## Description
The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.
This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.
### Types of change
enhancement
## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-02-17 21:31:19 +03:00
> ```
2020-08-07 16:27:13 +03:00
2021-08-12 13:50:03 +03:00
| Setting | Description |
| -------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `mode` | The lemmatizer mode, e.g. `"lookup"` or `"rule"` . Defaults to `lookup` if no language-specific lemmatizer is available (see the following table). ~~str~~ |
| `overwrite` | Whether to overwrite existing lemmas. Defaults to `False` . ~~bool~~ |
| `model` | **Not yet implemented:** the model to use. ~~Model~~ |
| _keyword-only_ | |
| `scorer` | The scoring method. Defaults to [`Scorer.score_token_attr` ](/api/scorer#score_token_attr ) for the attribute `"lemma"` . ~~Optional[Callable]~~ |
2021-06-14 13:19:36 +03:00
Many languages specify a default lemmatizer mode other than `lookup` if a better
lemmatizer is available. The lemmatizer modes `rule` and `pos_lookup` require
[`token.pos` ](/api/token ) from a previous pipeline component (see example
pipeline configurations in the
[pretrained pipeline design details ](/models#design-cnn )) or rely on third-party
2022-08-22 12:27:14 +03:00
libraries (`pymorphy3`).
2021-06-14 13:19:36 +03:00
| Language | Default Mode |
| -------- | ------------ |
| `bn` | `rule` |
2021-06-21 10:33:50 +03:00
| `ca` | `pos_lookup` |
2021-06-14 13:19:36 +03:00
| `el` | `rule` |
| `en` | `rule` |
| `es` | `rule` |
| `fa` | `rule` |
| `fr` | `rule` |
2021-06-21 10:33:50 +03:00
| `it` | `pos_lookup` |
2021-06-14 13:19:36 +03:00
| `mk` | `rule` |
| `nb` | `rule` |
| `nl` | `rule` |
| `pl` | `pos_lookup` |
2022-08-22 12:27:14 +03:00
| `ru` | `pymorphy3` |
2021-06-14 13:19:36 +03:00
| `sv` | `rule` |
2022-08-22 12:27:14 +03:00
| `uk` | `pymorphy3` |
2020-08-07 16:27:13 +03:00
```python
2020-09-12 18:05:10 +03:00
%%GITHUB_SPACY/spacy/pipeline/lemmatizer.py
2020-08-07 16:27:13 +03:00
```
## Lemmatizer.\_\_init\_\_ {#init tag="method"}
> #### Example
2019-10-01 22:36:04 +03:00
>
2020-08-07 16:27:13 +03:00
> ```python
> # Construction via add_pipe with default model
> lemmatizer = nlp.add_pipe("lemmatizer")
>
> # Construction via add_pipe with custom settings
2021-02-08 23:24:57 +03:00
> config = {"mode": "rule", "overwrite": True}
2020-08-07 16:27:13 +03:00
> lemmatizer = nlp.add_pipe("lemmatizer", config=config)
> ```
2019-10-01 22:36:04 +03:00
2020-08-07 16:27:13 +03:00
Create a new pipeline instance. In your application, you would normally use a
shortcut for this and instantiate the component using its string name and
[`nlp.add_pipe` ](/api/language#add_pipe ).
2020-10-02 16:42:36 +03:00
| Name | Description |
| -------------- | --------------------------------------------------------------------------------------------------- |
| `vocab` | The shared vocabulary. ~~Vocab~~ |
| `model` | **Not yet implemented:** The model to use. ~~Model~~ |
| `name` | String name of the component instance. Used to add entries to the `losses` during training. ~~str~~ |
| _keyword-only_ | |
| mode | The lemmatizer mode, e.g. `"lookup"` or `"rule"` . Defaults to `"lookup"` . ~~str~~ |
2022-07-01 08:28:03 +03:00
| overwrite | Whether to overwrite existing lemmas. ~~bool~~ |
2019-10-01 22:36:04 +03:00
💫 Update website (#3285)
<!--- Provide a general summary of your changes in the title. -->
## Description
The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.
This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.
### Types of change
enhancement
## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-02-17 21:31:19 +03:00
## Lemmatizer.\_\_call\_\_ {#call tag="method"}
2020-08-07 16:27:13 +03:00
Apply the pipe to one document. The document is modified in place, and returned.
This usually happens under the hood when the `nlp` object is called on a text
and all pipeline components are applied to the `Doc` in order.
💫 Update website (#3285)
<!--- Provide a general summary of your changes in the title. -->
## Description
The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.
This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.
### Types of change
enhancement
## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-02-17 21:31:19 +03:00
> #### Example
>
> ```python
2020-08-07 16:27:13 +03:00
> doc = nlp("This is a sentence.")
> lemmatizer = nlp.add_pipe("lemmatizer")
> # This usually happens under the hood
> processed = lemmatizer(doc)
💫 Update website (#3285)
<!--- Provide a general summary of your changes in the title. -->
## Description
The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.
This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.
### Types of change
enhancement
## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-02-17 21:31:19 +03:00
> ```
2020-08-17 17:45:24 +03:00
| Name | Description |
| ----------- | -------------------------------- |
| `doc` | The document to process. ~~Doc~~ |
| **RETURNS** | The processed document. ~~Doc~~ |
💫 Update website (#3285)
<!--- Provide a general summary of your changes in the title. -->
## Description
The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.
This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.
### Types of change
enhancement
## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-02-17 21:31:19 +03:00
2020-08-07 16:27:13 +03:00
## Lemmatizer.pipe {#pipe tag="method"}
💫 Update website (#3285)
<!--- Provide a general summary of your changes in the title. -->
## Description
The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.
This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.
### Types of change
enhancement
## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-02-17 21:31:19 +03:00
2020-08-07 16:27:13 +03:00
Apply the pipe to a stream of documents. This usually happens under the hood
when the `nlp` object is called on a text and all pipeline components are
applied to the `Doc` in order.
💫 Update website (#3285)
<!--- Provide a general summary of your changes in the title. -->
## Description
The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.
This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.
### Types of change
enhancement
## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-02-17 21:31:19 +03:00
> #### Example
>
> ```python
2020-08-07 16:27:13 +03:00
> lemmatizer = nlp.add_pipe("lemmatizer")
> for doc in lemmatizer.pipe(docs, batch_size=50):
> pass
💫 Update website (#3285)
<!--- Provide a general summary of your changes in the title. -->
## Description
The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.
This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.
### Types of change
enhancement
## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-02-17 21:31:19 +03:00
> ```
2020-08-17 17:45:24 +03:00
| Name | Description |
| -------------- | ------------------------------------------------------------- |
| `stream` | A stream of documents. ~~Iterable[Doc]~~ |
| _keyword-only_ | |
| `batch_size` | The number of documents to buffer. Defaults to `128` . ~~int~~ |
| **YIELDS** | The processed documents in order. ~~Doc~~ |
2020-10-02 16:42:36 +03:00
## Lemmatizer.initialize {#initialize tag="method"}
Initialize the lemmatizer and load any data resources. This method is typically
called by [`Language.initialize` ](/api/language#initialize ) and lets you
customize arguments it receives via the
[`[initialize.components]`](/api/data-formats#config-initialize) block in the
config. The loading only happens during initialization, typically before
training. At runtime, all data is loaded from disk.
> #### Example
>
> ```python
> lemmatizer = nlp.add_pipe("lemmatizer")
> lemmatizer.initialize(lookups=lookups)
> ```
>
> ```ini
> ### config.cfg
> [initialize.components.lemmatizer]
>
> [initialize.components.lemmatizer.lookups]
> @misc = "load_my_lookups.v1"
> ```
| Name | Description |
| -------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `get_examples` | Function that returns gold-standard annotations in the form of [`Example` ](/api/example ) objects. Defaults to `None` . ~~Optional[Callable[[], Iterable[Example]]]~~ |
| _keyword-only_ | |
| `nlp` | The current `nlp` object. Defaults to `None` . ~~Optional[Language]~~ |
| `lookups` | The lookups object containing the tables such as `"lemma_rules"` , `"lemma_index"` , `"lemma_exc"` and `"lemma_lookup"` . If `None` , default tables are loaded from [`spacy-lookups-data` ](https://github.com/explosion/spacy-lookups-data ). Defaults to `None` . ~~Optional[Lookups]~~ |
2020-08-07 16:27:13 +03:00
## Lemmatizer.lookup_lemmatize {#lookup_lemmatize tag="method"}
Lemmatize a token using a lookup-based approach. If no lemma is found, the
2020-10-02 14:24:33 +03:00
original string is returned.
2020-08-07 16:27:13 +03:00
2020-08-17 17:45:24 +03:00
| Name | Description |
| ----------- | --------------------------------------------------- |
| `token` | The token to lemmatize. ~~Token~~ |
| **RETURNS** | A list containing one or more lemmas. ~~List[str]~~ |
2020-08-07 16:27:13 +03:00
## Lemmatizer.rule_lemmatize {#rule_lemmatize tag="method"}
Lemmatize a token using a rule-based approach. Typically relies on POS tags.
2020-08-17 17:45:24 +03:00
| Name | Description |
| ----------- | --------------------------------------------------- |
| `token` | The token to lemmatize. ~~Token~~ |
| **RETURNS** | A list containing one or more lemmas. ~~List[str]~~ |
💫 Update website (#3285)
<!--- Provide a general summary of your changes in the title. -->
## Description
The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.
This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.
### Types of change
enhancement
## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-02-17 21:31:19 +03:00
## Lemmatizer.is_base_form {#is_base_form tag="method"}
Check whether we're dealing with an uninflected paradigm, so we can avoid
lemmatization entirely.
2020-08-17 17:45:24 +03:00
| Name | Description |
| ----------- | ---------------------------------------------------------------------------------------------------------------- |
| `token` | The token to analyze. ~~Token~~ |
| **RETURNS** | Whether the token's attributes (e.g., part-of-speech tag, morphological features) describe a base form. ~~bool~~ |
2020-08-07 16:27:13 +03:00
## Lemmatizer.get_lookups_config {#get_lookups_config tag="classmethod"}
Returns the lookups configuration settings for a given mode for use in
2020-08-17 17:45:24 +03:00
[`Lemmatizer.load_lookups` ](/api/lemmatizer#load_lookups ).
2020-08-07 16:27:13 +03:00
2020-10-03 18:16:10 +03:00
| Name | Description |
| ----------- | -------------------------------------------------------------------------------------- |
| `mode` | The lemmatizer mode. ~~str~~ |
| **RETURNS** | The required table names and the optional table names. ~~Tuple[List[str], List[str]]~~ |
2020-08-07 16:27:13 +03:00
## Lemmatizer.to_disk {#to_disk tag="method"}
Serialize the pipe to disk.
> #### Example
>
> ```python
> lemmatizer = nlp.add_pipe("lemmatizer")
> lemmatizer.to_disk("/path/to/lemmatizer")
> ```
2020-08-17 17:45:24 +03:00
| Name | Description |
| -------------- | ------------------------------------------------------------------------------------------------------------------------------------------ |
| `path` | A path to a directory, which will be created if it doesn't exist. Paths may be either strings or `Path` -like objects. ~~Union[str, Path]~~ |
| _keyword-only_ | |
| `exclude` | String names of [serialization fields ](#serialization-fields ) to exclude. ~~Iterable[str]~~ |
2020-08-07 16:27:13 +03:00
## Lemmatizer.from_disk {#from_disk tag="method"}
Load the pipe from disk. Modifies the object in place and returns it.
💫 Update website (#3285)
<!--- Provide a general summary of your changes in the title. -->
## Description
The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.
This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.
### Types of change
enhancement
## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-02-17 21:31:19 +03:00
> #### Example
>
> ```python
2020-08-07 16:27:13 +03:00
> lemmatizer = nlp.add_pipe("lemmatizer")
> lemmatizer.from_disk("/path/to/lemmatizer")
💫 Update website (#3285)
<!--- Provide a general summary of your changes in the title. -->
## Description
The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.
This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.
### Types of change
enhancement
## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-02-17 21:31:19 +03:00
> ```
2020-08-17 17:45:24 +03:00
| Name | Description |
| -------------- | ----------------------------------------------------------------------------------------------- |
| `path` | A path to a directory. Paths may be either strings or `Path` -like objects. ~~Union[str, Path]~~ |
| _keyword-only_ | |
| `exclude` | String names of [serialization fields ](#serialization-fields ) to exclude. ~~Iterable[str]~~ |
| **RETURNS** | The modified `Lemmatizer` object. ~~Lemmatizer~~ |
2020-08-07 16:27:13 +03:00
## Lemmatizer.to_bytes {#to_bytes tag="method"}
> #### Example
>
> ```python
> lemmatizer = nlp.add_pipe("lemmatizer")
> lemmatizer_bytes = lemmatizer.to_bytes()
> ```
Serialize the pipe to a bytestring.
2020-08-17 17:45:24 +03:00
| Name | Description |
| -------------- | ------------------------------------------------------------------------------------------- |
| _keyword-only_ | |
| `exclude` | String names of [serialization fields ](#serialization-fields ) to exclude. ~~Iterable[str]~~ |
| **RETURNS** | The serialized form of the `Lemmatizer` object. ~~bytes~~ |
2020-08-07 16:27:13 +03:00
## Lemmatizer.from_bytes {#from_bytes tag="method"}
Load the pipe from a bytestring. Modifies the object in place and returns it.
> #### Example
>
> ```python
> lemmatizer_bytes = lemmatizer.to_bytes()
> lemmatizer = nlp.add_pipe("lemmatizer")
> lemmatizer.from_bytes(lemmatizer_bytes)
> ```
2020-08-17 17:45:24 +03:00
| Name | Description |
| -------------- | ------------------------------------------------------------------------------------------- |
| `bytes_data` | The data to load from. ~~bytes~~ |
| _keyword-only_ | |
| `exclude` | String names of [serialization fields ](#serialization-fields ) to exclude. ~~Iterable[str]~~ |
| **RETURNS** | The `Lemmatizer` object. ~~Lemmatizer~~ |
💫 Update website (#3285)
<!--- Provide a general summary of your changes in the title. -->
## Description
The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.
This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.
### Types of change
enhancement
## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-02-17 21:31:19 +03:00
## Attributes {#attributes}
2020-08-17 17:45:24 +03:00
| Name | Description |
| --------- | ------------------------------------------- |
| `vocab` | The shared [`Vocab` ](/api/vocab ). ~~Vocab~~ |
| `lookups` | The lookups object. ~~Lookups~~ |
| `mode` | The lemmatizer mode. ~~str~~ |
2020-08-07 16:27:13 +03:00
## Serialization fields {#serialization-fields}
During serialization, spaCy will export several data fields used to restore
different aspects of the object. If needed, you can exclude them from
serialization by passing in the string names via the `exclude` argument.
> #### Example
>
> ```python
> data = lemmatizer.to_disk("/path", exclude=["vocab"])
> ```
| Name | Description |
| --------- | ---------------------------------------------------- |
| `vocab` | The shared [`Vocab` ](/api/vocab ). |
| `lookups` | The lookups. You usually don't want to exclude this. |