mirror of
https://github.com/explosion/spaCy.git
synced 2025-01-27 17:54:39 +03:00
Fix typos, wording and formatting
parent
eb5a8be9ad
commit
10d05c2b92
|
@ -5,7 +5,7 @@ p
|
||||||
| #[strong how similar they are]. Predicting similarity is useful for
|
| #[strong how similar they are]. Predicting similarity is useful for
|
||||||
| building recommendation systems or flagging duplicates. For example, you
|
| building recommendation systems or flagging duplicates. For example, you
|
||||||
| can suggest a user content that's similar to what they're currently
|
| can suggest a user content that's similar to what they're currently
|
||||||
| looking at, or label a support ticket as a duplicate, if it's very
|
| looking at, or label a support ticket as a duplicate if it's very
|
||||||
| similar to an already existing one.
|
| similar to an already existing one.
|
||||||
|
|
||||||
p
|
p
|
||||||
|
|
|
@ -144,7 +144,7 @@ p
|
||||||
+table(["Argument", "Type", "Description"])
|
+table(["Argument", "Type", "Description"])
|
||||||
+row
|
+row
|
||||||
+cell #[code vocab]
|
+cell #[code vocab]
|
||||||
+cell #[coce Vocab]
|
+cell #[code Vocab]
|
||||||
+cell
|
+cell
|
||||||
| Shared data between components, including strings, morphology,
|
| Shared data between components, including strings, morphology,
|
||||||
| vectors etc.
|
| vectors etc.
|
||||||
|
|
|
@ -65,7 +65,7 @@ p
|
||||||
| spaCy provides a variety of linguistic annotations to give you insights
|
| spaCy provides a variety of linguistic annotations to give you insights
|
||||||
| into a text's grammatical structure. This includes the word types,
|
| into a text's grammatical structure. This includes the word types,
|
||||||
| i.e. the parts of speech, and how the words are related to each other.
|
| i.e. the parts of speech, and how the words are related to each other.
|
||||||
| For example, if you're analysing text, it makes a #[em huge] difference
|
| For example, if you're analysing text, it makes a huge difference
|
||||||
| whether a noun is the subject of a sentence, or the object – or whether
|
| whether a noun is the subject of a sentence, or the object – or whether
|
||||||
| "google" is used as a verb, or refers to the website or company in a
|
| "google" is used as a verb, or refers to the website or company in a
|
||||||
| specific context.
|
| specific context.
|
||||||
|
@ -119,9 +119,11 @@ include _spacy-101/_named-entities
|
||||||
|
|
||||||
+infobox
|
+infobox
|
||||||
| To learn more about entity recognition in spaCy, how to
|
| To learn more about entity recognition in spaCy, how to
|
||||||
| #[strong add your own entities] to a document and how to train and update
|
| #[strong add your own entities] to a document and how to
|
||||||
| the entity predictions of a model, see the usage guide on
|
| #[strong train and update] the entity predictions of a model, see the
|
||||||
| #[+a("/docs/usage/entity-recognition") named entity recognition].
|
| usage guides on
|
||||||
|
| #[+a("/docs/usage/entity-recognition") named entity recognition] and
|
||||||
|
| #[+a("/docs/usage/training-ner") training the named entity recognizer].
|
||||||
|
|
||||||
+h(2, "vectors-similarity") Word vectors and similarity
|
+h(2, "vectors-similarity") Word vectors and similarity
|
||||||
+tag-model("vectors")
|
+tag-model("vectors")
|
||||||
|
|
|
@ -20,19 +20,18 @@ p
|
||||||
nlp = Language(pipeline=['my_factory', mycomponent])
|
nlp = Language(pipeline=['my_factory', mycomponent])
|
||||||
|
|
||||||
p
|
p
|
||||||
| It's now much easier to customise the pipeline with your own components.
|
| It's now much easier to #[strong customise the pipeline] with your own
|
||||||
| Components are functions that receive a #[code Doc] object, modify and
|
| components, functions that receive a #[code Doc] object, modify and
|
||||||
| return it. If your component is stateful, you'll want to create a new one
|
| return it. If your component is stateful, you can define and register a
|
||||||
| for each pipeline. You can do that by defining and registering a factory
|
| factory which receives the shared #[code Vocab] object and returns a
|
||||||
| which receives the shared #[code Vocab] object and returns a component.
|
| component. spaCy's default components can be added to your pipeline by
|
||||||
|
| using their string IDs. This way, you won't have to worry about finding
|
||||||
p
|
| and implementing them – simply add #[code "tagger"] to the pipeline,
|
||||||
| spaCy's default components – the vectorizer, tagger, parser and entity
|
|
||||||
| recognizer, can be added to your pipeline by using their string IDs.
|
|
||||||
| This way, you won't have to worry about finding and implementing them –
|
|
||||||
| to use the default tagger, simply add #[code "tagger"] to the pipeline,
|
|
||||||
| and spaCy will know what to do.
|
| and spaCy will know what to do.
|
||||||
|
|
||||||
|
+image
|
||||||
|
include ../../assets/img/docs/pipeline.svg
|
||||||
|
|
||||||
+infobox
|
+infobox
|
||||||
| #[strong API:] #[+api("language") #[code Language]]
|
| #[strong API:] #[+api("language") #[code Language]]
|
||||||
| #[strong Usage:] #[+a("/docs/usage/language-processing-pipeline") Processing text]
|
| #[strong Usage:] #[+a("/docs/usage/language-processing-pipeline") Processing text]
|
||||||
|
@ -96,11 +95,10 @@ p
|
||||||
| #[code Language] class, or load a model that initialises one. This allows
|
| #[code Language] class, or load a model that initialises one. This allows
|
||||||
| languages to contain more custom data, e.g. lemmatizer lookup tables, or
|
| languages to contain more custom data, e.g. lemmatizer lookup tables, or
|
||||||
| complex regular expressions. The language data has also been tidied up
|
| complex regular expressions. The language data has also been tidied up
|
||||||
| and simplified. It's now also possible to overwrite the functions that
|
| and simplified. spaCy now also supports simple lookup-based lemmatization.
|
||||||
| compute lexical attributes like #[code like_num], and supply
|
|
||||||
| language-specific syntax iterators, e.g. to determine noun chunks. spaCy
|
+image
|
||||||
| now also supports simple lookup-based lemmatization. The data is stored
|
include ../../assets/img/docs/language_data.svg
|
||||||
| in a dictionary mapping a string to its lemma.
|
|
||||||
|
|
||||||
+infobox
|
+infobox
|
||||||
| #[strong API:] #[+api("language") #[code Language]]
|
| #[strong API:] #[+api("language") #[code Language]]
|
||||||
|
@ -111,13 +109,10 @@ p
|
||||||
|
|
||||||
+aside-code("Example").
|
+aside-code("Example").
|
||||||
from spacy.matcher import Matcher
|
from spacy.matcher import Matcher
|
||||||
from spacy.attrs import LOWER, IS_PUNCT
|
|
||||||
matcher = Matcher(nlp.vocab)
|
matcher = Matcher(nlp.vocab)
|
||||||
matcher.add('HelloWorld', None,
|
matcher.add('HEARTS', None, [{'ORTH': '❤️', 'OP': '+'}])
|
||||||
[{LOWER: 'hello'}, {IS_PUNCT: True}, {LOWER: 'world'}],
|
|
||||||
[{LOWER: 'hello'}, {LOWER: 'world'}])
|
|
||||||
assert len(matcher) == 1
|
assert len(matcher) == 1
|
||||||
assert 'HelloWorld' in matcher
|
assert 'HEARTS' in matcher
|
||||||
|
|
||||||
p
|
p
|
||||||
| Patterns can now be added to the matcher by calling
|
| Patterns can now be added to the matcher by calling
|
||||||
|
@ -157,28 +152,8 @@ p
|
||||||
+cell #[+api("language#to_disk") #[code Language.to_disk]]
|
+cell #[+api("language#to_disk") #[code Language.to_disk]]
|
||||||
|
|
||||||
+row
|
+row
|
||||||
+cell #[code Tokenizer.load]
|
+cell #[code Language.create_make_doc]
|
||||||
+cell
|
+cell #[+api("language#attributes") #[code Language.tokenizer]]
|
||||||
| #[+api("tokenizer#from_disk") #[code Tokenizer.from_disk]]
|
|
||||||
| #[+api("tokenizer#from_bytes") #[code Tokenizer.from_bytes]]
|
|
||||||
|
|
||||||
+row
|
|
||||||
+cell #[code Tagger.load]
|
|
||||||
+cell
|
|
||||||
| #[+api("tagger#from_disk") #[code Tagger.from_disk]]
|
|
||||||
| #[+api("tagger#from_bytes") #[code Tagger.from_bytes]]
|
|
||||||
|
|
||||||
+row
|
|
||||||
+cell #[code DependencyParser.load]
|
|
||||||
+cell
|
|
||||||
| #[+api("dependencyparser#from_disk") #[code DependencyParser.from_disk]]
|
|
||||||
| #[+api("dependencyparser#from_bytes") #[code DependencyParser.from_bytes]]
|
|
||||||
|
|
||||||
+row
|
|
||||||
+cell #[code EntityRecognizer.load]
|
|
||||||
+cell
|
|
||||||
| #[+api("entityrecognizer#from_disk") #[code EntityRecognizer.from_disk]]
|
|
||||||
| #[+api("entityrecognizer#from_bytes") #[code EntityRecognizer.from_bytes]]
|
|
||||||
|
|
||||||
+row
|
+row
|
||||||
+cell
|
+cell
|
||||||
|
@ -212,6 +187,28 @@ p
|
||||||
| #[+api("stringstore#to_disk") #[code StringStore.to_disk]]
|
| #[+api("stringstore#to_disk") #[code StringStore.to_disk]]
|
||||||
| #[+api("stringstore#to_bytes") #[code StringStore.to_bytes]]
|
| #[+api("stringstore#to_bytes") #[code StringStore.to_bytes]]
|
||||||
|
|
||||||
|
+row
|
||||||
|
+cell #[code Tokenizer.load]
|
||||||
|
+cell -
|
||||||
|
|
||||||
|
+row
|
||||||
|
+cell #[code Tagger.load]
|
||||||
|
+cell
|
||||||
|
| #[+api("tagger#from_disk") #[code Tagger.from_disk]]
|
||||||
|
| #[+api("tagger#from_bytes") #[code Tagger.from_bytes]]
|
||||||
|
|
||||||
|
+row
|
||||||
|
+cell #[code DependencyParser.load]
|
||||||
|
+cell
|
||||||
|
| #[+api("dependencyparser#from_disk") #[code DependencyParser.from_disk]]
|
||||||
|
| #[+api("dependencyparser#from_bytes") #[code DependencyParser.from_bytes]]
|
||||||
|
|
||||||
|
+row
|
||||||
|
+cell #[code EntityRecognizer.load]
|
||||||
|
+cell
|
||||||
|
| #[+api("entityrecognizer#from_disk") #[code EntityRecognizer.from_disk]]
|
||||||
|
| #[+api("entityrecognizer#from_bytes") #[code EntityRecognizer.from_bytes]]
|
||||||
|
|
||||||
+row
|
+row
|
||||||
+cell #[code Matcher.load]
|
+cell #[code Matcher.load]
|
||||||
+cell -
|
+cell -
|
||||||
|
@ -232,7 +229,7 @@ p
|
||||||
|
|
||||||
+row
|
+row
|
||||||
+cell #[code Doc.read_bytes]
|
+cell #[code Doc.read_bytes]
|
||||||
+cell
|
+cell #[+api("binder") #[code Binder]]
|
||||||
|
|
||||||
+row
|
+row
|
||||||
+cell #[code Token.is_ancestor_of]
|
+cell #[code Token.is_ancestor_of]
|
||||||
|
|
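Note on the pipeline hunk above (`@ -20,19 +20,18 @@`): the reworded paragraph describes components as functions that receive a `Doc`, modify it and return it, and factories as callables that receive the shared `Vocab` and return a component. The following is a minimal plain-Python sketch of that pattern only. The `Vocab` and `Doc` classes here are hypothetical stand-ins rather than spaCy's own classes, and `word_counter`, `make_seen_tracker` and `run_pipeline` are illustrative names that do not appear in the commit.

```python
# Hypothetical stand-ins for illustration; these are NOT spaCy's classes.
class Vocab:
    def __init__(self):
        self.strings = set()


class Doc:
    def __init__(self, vocab, words):
        self.vocab = vocab
        self.words = list(words)
        self.user_data = {}


def word_counter(doc):
    """Stateless component: receives a Doc, modifies it and returns it."""
    doc.user_data["n_words"] = len(doc.words)
    return doc


def make_seen_tracker(vocab):
    """Factory: receives the shared vocab and returns a stateful component."""
    seen = set()  # state owned by this one component instance

    def seen_tracker(doc):
        # Record which words this component has not processed before.
        doc.user_data["new_words"] = [w for w in doc.words if w not in seen]
        seen.update(doc.words)
        vocab.strings.update(doc.words)  # also register them in the shared vocab
        return doc

    return seen_tracker


def run_pipeline(doc, pipeline):
    """Apply each component in order, passing the Doc along."""
    for component in pipeline:
        doc = component(doc)
    return doc


vocab = Vocab()
pipeline = [word_counter, make_seen_tracker(vocab)]
first = run_pipeline(Doc(vocab, ["hello", "world"]), pipeline)
second = run_pipeline(Doc(vocab, ["hello", "again"]), pipeline)
```

Calling the factory once per pipeline gives each pipeline its own `seen` set, which is the reason the paragraph suggests a factory for stateful components.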