Add aside on spaCy's custom pronoun lemma

Ines Montani 2016-12-19 13:41:47 +01:00
parent d0c15730c4
commit 6a793251c8

@@ -50,6 +50,13 @@ p A "lemma" is the uninflected form of a word. In English, this means:
+item #[strong Nouns]: The form like "dog", not "dogs"; like "child", not "children"
+item #[strong Verbs]: The form like "write", not "writes", "writing", "wrote" or "written"
+aside("About spaCy's custom pronoun lemma")
| Unlike verbs and common nouns, there's no clear base form of a personal
| pronoun. Should the lemma of "me" be "I", or should we normalize person
| as well, giving "it" — or maybe "he"? spaCy's solution is to introduce a
| novel symbol, #[code.u-nowrap -PRON-], which is used as the lemma for
| all personal pronouns.
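
The behaviour described in the aside can be sketched with a toy lookup. This is not spaCy's lemmatizer: the pronoun set, the `TOY_LEMMAS` table and the `toy_lemma` function are all hypothetical names made up for illustration. Only the `-PRON-` symbol itself comes from the text above.

```python
# Toy illustration only: this is NOT spaCy's implementation.
# The symbol "-PRON-" is taken from the aside; everything else is assumed.
PRON_LEMMA = "-PRON-"

# Hypothetical, hand-picked subset of English personal pronouns.
PERSONAL_PRONOUNS = {
    "i", "me", "you", "he", "him", "she", "her",
    "it", "we", "us", "they", "them",
}

# Hypothetical lookup table standing in for real lemmatization data.
TOY_LEMMAS = {
    "dogs": "dog",
    "children": "child",
    "writes": "write",
    "writing": "write",
    "wrote": "write",
    "written": "write",
}


def toy_lemma(word):
    """Return a toy lemma: one shared symbol for every personal pronoun."""
    lower = word.lower()
    if lower in PERSONAL_PRONOUNS:
        # No person or case normalization is attempted; all pronouns collapse.
        return PRON_LEMMA
    # Fall back to the lowercased word when the table has no entry.
    return TOY_LEMMAS.get(lower, lower)


lemmas = [toy_lemma(w) for w in ["She", "wrote", "me"]]
print(lemmas)
```

Using a single placeholder sidesteps the question of whether "me" should map to "I", "it" or "he", at the cost of discarding person and case information in the lemma.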
p
| The lemmatization data is taken from
| #[+a("https://wordnet.princeton.edu") WordNet]. However, we also add a