mirror of https://github.com/explosion/spaCy.git, synced 2024-12-24 17:06:29 +03:00
Add aside on spaCy's custom pronoun lemma
parent d0c15730c4
commit 6a793251c8
@@ -50,6 +50,13 @@ p A "lemma" is the uninflected form of a word. In English, this means:
     +item #[strong Nouns]: The form like "dog", not "dogs"; like "child", not "children"
     +item #[strong Verbs]: The form like "write", not "writes", "writing", "wrote" or "written"
 
++aside("About spaCy's custom pronoun lemma")
+    | Unlike verbs and common nouns, there's no clear base form of a personal
+    | pronoun. Should the lemma of "me" be "I", or should we normalize person
+    | as well, giving "it" — or maybe "he"? spaCy's solution is to introduce a
+    | novel symbol, #[code.u-nowrap -PRON-], which is used as the lemma for
+    | all personal pronouns.
+
 p
     | The lemmatization data is taken from
     | #[+a("https://wordnet.princeton.edu") WordNet]. However, we also add a
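
Note: the rule described in the added aside can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not spaCy's lemmatizer: the names `toy_lemma`, `PERSONAL_PRONOUNS` and `LEMMA_EXCEPTIONS` are hypothetical, the pronoun set is deliberately small, and the exception table only stands in for the WordNet-derived data mentioned in the docs. The only detail taken from the commit itself is the `-PRON-` symbol.

```python
# Toy illustration only: this is not spaCy's lemmatizer or its data.
PRON_LEMMA = "-PRON-"  # the symbol named in the aside

# Hypothetical, deliberately small set of English personal pronouns.
# It is keyed on the surface form only and ignores part-of-speech tags.
PERSONAL_PRONOUNS = frozenset({
    "i", "me", "we", "us", "you",
    "he", "him", "she", "her", "it", "they", "them",
})

# Hypothetical exception table standing in for WordNet-style lemma data.
LEMMA_EXCEPTIONS = {
    "dogs": "dog",
    "children": "child",
    "writes": "write",
    "writing": "write",
    "wrote": "write",
    "written": "write",
}


def toy_lemma(word: str) -> str:
    """Return a toy lemma: "-PRON-" for personal pronouns, else a table lookup."""
    lowered = word.lower()
    if lowered in PERSONAL_PRONOUNS:
        # Every personal pronoun collapses to the same placeholder lemma,
        # so no single pronoun has to be chosen as the "base form".
        return PRON_LEMMA
    # Fall back to the exception table, then to the lowercased word itself.
    return LEMMA_EXCEPTIONS.get(lowered, lowered)


if __name__ == "__main__":
    for w in ["me", "I", "he", "children", "wrote"]:
        print(w, "->", toy_lemma(w))
```

The point of the sketch is the design choice the aside describes: a single placeholder sidesteps the question of whether "me" should map to "I", "it" or "he", at the cost of discarding person, number and case from the lemma.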