mirror of https://github.com/explosion/spaCy.git, synced 2025-10-30 23:47:31 +03:00
Add "New in v3.1" guide
parent caba63b74f
commit bc93c34f54
				|  | @ -82,7 +82,7 @@ shortcut for this and instantiate the component using its string name and | |||
| | `moves`                       | A list of transition names. Inferred from the data if set to `None`, which is the default. ~~Optional[List[str]]~~                                                                                                                                  | | ||||
| | _keyword-only_                |                                                                                                                                                                                                                                                     | | ||||
| | `update_with_oracle_cut_size` | During training, cut long sequences into shorter segments by creating intermediate states based on the gold-standard history. The model is not very sensitive to this parameter, so you usually won't need to change it. Defaults to `100`. ~~int~~ | | ||||
| | `incorrect_spans_key`         | Identifies spans that are known to be incorrect entity annotations. The incorrect entity annotations can be stored in the span group, under this key. Defaults to `None`. ~~Optional[str]~~                                                         | | ||||
| | `incorrect_spans_key`         | Identifies spans that are known to be incorrect entity annotations. The incorrect entity annotations can be stored in the span group in [`Doc.spans`](/api/doc#spans), under this key. Defaults to `None`. ~~Optional[str]~~                        | | ||||
| 
 | ||||
| ## EntityRecognizer.\_\_call\_\_ {#call tag="method"} | ||||
| 
 | ||||
|  |  | |||
							
								
								
									
website/docs/usage/v3-1.md (114 lines, Normal file)
									
								
							
							
						
						
									
									
								
							|  | @ -0,0 +1,114 @@ | |||
| --- | ||||
| title: What's New in v3.1 | ||||
| teaser: New features and how to upgrade | ||||
| menu: | ||||
|   - ['New Features', 'features'] | ||||
|   - ['Upgrading Notes', 'upgrading'] | ||||
| --- | ||||
| 
 | ||||
| ## New Features {#features hidden="true"} | ||||
| 
 | ||||
| <!-- TODO: intro --> | ||||
| 
 | ||||
| ### Using predicted annotations during training {#predicted-annotations-training} | ||||
| 
 | ||||
| <!-- TODO: write --> | ||||
| 
 | ||||
| <Project id="pipelines/tagger_parser_predicted_annotations"> | ||||
| 
 | ||||
| This project shows how to use the `token.dep` attribute predicted by the parser | ||||
| as a feature for a subsequent tagger component in the pipeline. | ||||
| 
 | ||||
| </Project> | ||||
| 
 | ||||
| ### SpanCategorizer for predicting arbitrary and overlapping spans {#spancategorizer tag="experimental"} | ||||
| 
 | ||||
| A common task in applied NLP is extracting spans of text from documents, | ||||
| including longer phrases or nested expressions. Named entity recognition isn't | ||||
| the right tool for this problem, since an entity recognizer typically predicts | ||||
| single token-based tags that are very sensitive to boundaries. This is effective | ||||
| for proper nouns and self-contained expressions, but less useful for other types | ||||
| of phrases or overlapping spans. The new | ||||
| [`SpanCategorizer`](/api/spancategorizer) component and | ||||
| [SpanCategorizer](/api/architectures#spancategorizer) architecture let you label | ||||
| arbitrary and potentially overlapping spans of text. A span categorizer | ||||
| consists of two parts: a [suggester function](/api/spancategorizer#suggesters) | ||||
| that proposes candidate spans, which may or may not overlap, and a labeler model | ||||
| that predicts zero or more labels for each candidate. The predicted spans are | ||||
| available via the [`Doc.spans`](/api/doc#spans) container. | ||||
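
The idea of a suggester can be sketched in plain Python. This is only an illustration of the concept described above, not spaCy's API: `ngram_candidates` is a hypothetical helper that works on a list of strings, whereas spaCy's own suggesters operate on `Doc` objects. It proposes every n-gram of the requested sizes as a `(start, end)` token offset pair, so the candidates naturally overlap.

```python
from typing import List, Sequence, Tuple


def ngram_candidates(
    tokens: Sequence[str], sizes: Sequence[int]
) -> List[Tuple[int, int]]:
    """Propose every n-gram of the given sizes as (start, end) token offsets.

    Hypothetical helper for illustration only; it is not part of spaCy.
    """
    candidates = []
    for size in sizes:
        # Slide a window of `size` tokens over the text.
        for start in range(len(tokens) - size + 1):
            candidates.append((start, start + size))
    return candidates


tokens = ["Welcome", "to", "the", "Bank", "of", "China"]
spans = ngram_candidates(tokens, sizes=[1, 2, 3])

# Candidates may overlap: "Bank of China" (3, 6) contains "China" (5, 6).
for start, end in spans:
    print(start, end, " ".join(tokens[start:end]))
```

A labeler model would then assign zero or more labels to each of these candidates, which is what allows nested spans such as "Bank of China" and "China" to both receive a label.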
| 
 | ||||
| <!-- TODO: example, getting started (init config?), maybe project template --> | ||||
| 
 | ||||
| <Infobox title="Tip: Create data with Prodigy's new span annotation UI"> | ||||
| 
 | ||||
| <!-- TODO: screenshot --> | ||||
| 
 | ||||
| The upcoming version of our annotation tool [Prodigy](https://prodi.gy) | ||||
| (currently available as a [pre-release](https://support.prodi.gy/t/3861) for all | ||||
| users) features a [new workflow and UI](https://support.prodi.gy/t/3861) for | ||||
| annotating overlapping and nested spans. You can use it to create training data | ||||
| for spaCy's `SpanCategorizer` component. | ||||
| 
 | ||||
| </Infobox> | ||||
| 
 | ||||
| ### Update the entity recognizer with partial incorrect annotations {#negative-samples} | ||||
| 
 | ||||
| > #### config.cfg (excerpt) | ||||
| > | ||||
| > ```ini | ||||
| > [components.ner] | ||||
| > factory = "ner" | ||||
| > incorrect_spans_key = "incorrect_spans" | ||||
| > moves = null | ||||
| > update_with_oracle_cut_size = 100 | ||||
| > ``` | ||||
| 
 | ||||
| The [`EntityRecognizer`](/api/entityrecognizer) can now be updated with known | ||||
| incorrect annotations, which lets you take advantage of partial and sparse data. | ||||
| For example, you'll be able to use the information that certain spans of text | ||||
| are definitely **not** `PERSON` entities, without having to provide the | ||||
| complete gold-standard annotations for the given example. The incorrect span | ||||
| annotations can be added via [`Doc.spans`](/api/doc#spans) in the training | ||||
| data under the key defined as | ||||
| [`incorrect_spans_key`](/api/entityrecognizer#init) in the component config. | ||||
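
A minimal sketch of what such a training example might look like, assuming spaCy v3.1 or later is installed. The text, the `PERSON` label and the token offsets are made up for illustration; the key name is taken from the config excerpt in this section. The sketch only builds the example and does not run a training update.

```python
import spacy
from spacy.tokens import Span
from spacy.training import Example

# Must match `incorrect_spans_key` in the [components.ner] config block.
INCORRECT_SPANS_KEY = "incorrect_spans"

nlp = spacy.blank("en")
text = "I like London and Berlin"  # illustrative text, not from the docs

# Reference doc: carries the partial annotations used for the update.
reference = nlp.make_doc(text)
# "London" (token 2) is known NOT to be a PERSON; nothing else is annotated.
reference.spans[INCORRECT_SPANS_KEY] = [Span(reference, 2, 3, label="PERSON")]

# Pair the annotated reference with an unannotated doc of the same text.
example = Example(nlp.make_doc(text), reference)
```

No gold-standard entities are set on the reference doc here; the only information supplied is the span that should not be predicted as `PERSON`.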
| 
 | ||||
| <!-- TODO: more details and/or example project? --> | ||||
| 
 | ||||
| ### New pipeline packages for Catalan and Danish {#pipeline-packages} | ||||
| 
 | ||||
| <!-- TODO: intro and update with final numbers --> | ||||
| 
 | ||||
| | Package                                           | Language | Tagger | Parser |  NER | | ||||
| | ------------------------------------------------- | -------- | -----: | -----: | ---: | | ||||
| | [`ca_core_news_sm`](/models/ca#ca_core_news_sm)   | Catalan  |        |        |      | | ||||
| | [`ca_core_news_md`](/models/ca#ca_core_news_md)   | Catalan  |        |        |      | | ||||
| | [`ca_core_news_lg`](/models/ca#ca_core_news_lg)   | Catalan  |        |        |      | | ||||
| | [`ca_core_news_trf`](/models/ca#ca_core_news_trf) | Catalan  |        |        |      | | ||||
| | [`da_core_news_trf`](/models/da#da_core_news_trf) | Danish   |        |        |      | | ||||
| 
 | ||||
| ### Resizable text classification architectures {#resizable-textcat} | ||||
| 
 | ||||
| <!-- TODO: write --> | ||||
| 
 | ||||
| ### CLI command to assemble pipeline from config {#assemble} | ||||
| 
 | ||||
| The [`spacy assemble`](/api/cli#assemble) command lets you assemble a pipeline | ||||
| from a config file without additional training. It can be especially useful for | ||||
| creating a blank pipeline with a custom tokenizer, rule-based components or word | ||||
| vectors. | ||||
| 
 | ||||
| ```cli | ||||
| $ python -m spacy assemble config.cfg ./output | ||||
| ``` | ||||
| 
 | ||||
| ### Support for streaming large or infinite corpora {#streaming-corpora} | ||||
| 
 | ||||
| <!-- TODO: write --> | ||||
| 
 | ||||
| ### New lemmatizers for Catalan and Italian {#pos-lemmatizers} | ||||
| 
 | ||||
| <!-- TODO: write --> | ||||
| 
 | ||||
| ## Notes about upgrading from v3.0 {#upgrading} | ||||
| 
 | ||||
| <!-- TODO: this could just be a bullet-point list mentioning stuff like the spacy_version, vectors initialization etc. --> | ||||
|  | @ -9,7 +9,8 @@ | |||
|                     { "text": "Models & Languages", "url": "/usage/models" }, | ||||
|                     { "text": "Facts & Figures", "url": "/usage/facts-figures" }, | ||||
|                     { "text": "spaCy 101", "url": "/usage/spacy-101" }, | ||||
|                     { "text": "New in v3.0", "url": "/usage/v3" } | ||||
|                     { "text": "New in v3.0", "url": "/usage/v3" }, | ||||
|                     { "text": "New in v3.1", "url": "/usage/v3-1" } | ||||
|                 ] | ||||
|             }, | ||||
|             { | ||||
|  | @ -135,9 +136,7 @@ | |||
|             }, | ||||
|             { | ||||
|                 "label": "Legacy", | ||||
|                 "items": [ | ||||
|                     { "text": "Legacy functions", "url": "/api/legacy" } | ||||
|                 ] | ||||
|                 "items": [{ "text": "Legacy functions", "url": "/api/legacy" }] | ||||
|             } | ||||
|         ] | ||||
|     } | ||||
|  |  | |||
|  | @ -119,8 +119,8 @@ const AlertSpace = ({ nightly, legacy }) => { | |||
| } | ||||
| 
 | ||||
| const navAlert = ( | ||||
|     <Link to="/usage/v3" hidden> | ||||
|         <strong>💥 Out now:</strong> spaCy v3.0 | ||||
|     <Link to="/usage/v3-1" hidden> | ||||
|         <strong>💥 Out now:</strong> spaCy v3.1 | ||||
|     </Link> | ||||
| ) | ||||
| 
 | ||||
|  |  | |||