* Minor edits to index.rst

commit c6b546848d (parent e09dc6eccd)
@@ -72,7 +72,7 @@ particularly egregious:
 >>> # Load the pipeline, and call it with some text.
 >>> nlp = spacy.en.English()
 >>> tokens = nlp("‘Give it back,’ he pleaded abjectly, ‘it’s mine.’",
-                 tag=True, parse=True)
+                 tag=True, parse=False)
 >>> output = ''
 >>> for tok in tokens:
 ... output += tok.string.upper() if tok.pos == ADVERB else tok.string
@@ -86,12 +86,12 @@ we only wanted to highlight "abjectly". While "back" is undoubtedly an adverb,
 we probably don't want to highlight it.
 
 There are lots of ways we might refine our logic, depending on just what words
-we want to flag. The simplest way to filter out adverbs like "back" and "not"
+we want to flag. The simplest way to exclude adverbs like "back" and "not"
 is by word frequency: these words are much more common than the prototypical
 manner adverbs that the style guides are worried about.
 
-The prob attribute of a Lexeme or Token object gives a log probability estimate
-of the word, based on smoothed counts from a 3bn word corpus:
+The :py:attr:`Lexeme.prob` and :py:attr:`Token.prob` attribute gives a
+log probability estimate of the word:
 
 >>> nlp.vocab[u'back'].prob
 -7.403977394104004
@@ -100,6 +100,11 @@ of the word, based on smoothed counts from a 3bn word corpus:
 >>> nlp.vocab[u'quietly'].prob
 -11.07155704498291
 
+(The probability estimate is based on counts from a 3 billion word corpus,
+smoothed using the Gale (2002) `Simple Good-Turing`_ method.)
+
+.. _`Simple Good-Turing`: http://www.d.umn.edu/~tpederse/Courses/CS8761-FALL02/Code/sgt-gale.pdf
+
 So we can easily exclude the N most frequent words in English from our adverb
 marker. Let's try N=1000 for now:
 
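
Note: the `is_adverb` helper called in the next hunk is defined in unchanged lines that this diff doesn't show. A minimal sketch of such a definition, assuming the 0.x API used above (an iterable `nlp.vocab` of lexemes exposing `prob`, and the same `ADVERB` constant as in the first hunk) and the N=1000 frequency cut-off described in the text; the exact definition in index.rst may differ:

    >>> # Collect the log probability of every lexeme; after sorting ascending,
    >>> # probs[-1000] is the threshold below which a word is "rare enough" to flag.
    >>> # (Assumes iterating nlp.vocab yields Lexeme objects, as suggested above.)
    >>> probs = [lex.prob for lex in nlp.vocab]
    >>> probs.sort()
    >>> # Flag a token only if it is an adverb and rarer than the 1,000 most frequent words.
    >>> is_adverb = lambda tok: tok.pos == ADVERB and tok.prob < probs[-1000]
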
@@ -114,8 +119,8 @@ marker. Let's try N=1000 for now:
 >>> print(''.join(tok.string.upper() if is_adverb(tok) else tok.string))
 ‘Give it back,’ he pleaded ABJECTLY, ‘it’s mine.’
 
-There are lots of ways we could refine the logic, depending on just what words we
-want to flag. Let's say we wanted to only flag adverbs that modified words
+There are lots of other ways we could refine the logic, depending on just what
+words we want to flag. Let's say we wanted to only flag adverbs that modified words
 similar to "pleaded". This is easy to do, as spaCy loads a vector-space
 representation for every word (by default, the vectors produced by
 `Levy and Goldberg (2014)`_. Naturally, the vector is provided as a numpy
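
A rough sketch of that vector-based refinement, under two assumptions that are not part of this commit: that tokens and lexemes expose their word vector as a numpy array named `repvec`, and that token 7 of the example sentence is "pleaded":

    >>> import numpy
    >>> # Cosine similarity between two word vectors.
    >>> cosine = lambda a, b: numpy.dot(a, b) / (numpy.linalg.norm(a) * numpy.linalg.norm(b))
    >>> pleaded = tokens[7]  # assumed to be the token for "pleaded"
    >>> # How close is "begged" to "pleaded" in the vector space?
    >>> # (repvec is assumed here as the name of the word-vector attribute.)
    >>> similarity = cosine(nlp.vocab[u'begged'].repvec, pleaded.repvec)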