mirror of https://github.com/explosion/spaCy.git
synced 2025-01-12 18:26:30 +03:00
Update 101 and usage docs
parent 49235017bf
commit b03fb2d7b0
@@ -2,7 +2,7 @@
 <style>
 .svg__pipeline__text { fill: #1a1e23; font: 20px "Source Sans Pro" }
 .svg__pipeline__text-small { fill: #1a1e23; font: bold 18px "Source Sans Pro" }
-.svg__pipeline__text-code { fill: #1a1e23; font: 600 16px "Source Code Pro" }
+.svg__pipeline__text-code { fill: #1a1e23; font: 600 16px "Source Code Pro" }
 </style>
 <rect width="601" height="127" x="159" y="21" fill="none" stroke="#09a3d5" stroke-width="3" rx="19.1" stroke-dasharray="3 6" ry="19.1"/>
 <path fill="#e1d5e7" stroke="#9673a6" stroke-width="2" d="M801 55h120v60H801z"/>
Before Width: | Height: | Size: 3.1 KiB After Width: | Height: | Size: 3.1 KiB |
@@ -89,4 +89,6 @@ p

 p
 | Even though both #[code Doc] objects contain the same words, the internal
-| integer IDs are very different.
+| integer IDs are very different. The same applies for all other strings,
+| like the annotation scheme. To avoid mismatched IDs, spaCy will always
+| export the vocab if you save a #[code Doc] or #[code nlp] object.
@@ -139,6 +139,8 @@ p
 new_doc = Doc(Vocab()).from_disk('/moby_dick.bin')

 +infobox
+| #[strong API:] #[+api("language") #[code Language]],
+| #[+api("doc") #[code Doc]]
 | #[strong Usage:] #[+a("/docs/usage/saving-loading") Saving and loading]

 +h(2, "rule-matcher") Match text with token rules
@@ -345,7 +345,7 @@ p
 | account and check the #[code subtree] for intensifiers like "very", to
 | increase the sentiment score. At some point, you might also want to train
 | a sentiment model. However, the approach described in this example is
-| very useful for #[strong bootstrapping rules to gather training data].
+| very useful for #[strong bootstrapping rules to collect training data].
 | It's also an incredibly fast way to gather first insights into your data
 | – with about 1 million tweets, you'd be looking at a processing time of
 | #[strong under 1 minute].