mirror of https://github.com/explosion/spaCy.git (synced 2025-10-26 05:31:15 +03:00)

	Add "New in v3.1" guide
parent caba63b74f
commit bc93c34f54

				|  | @ -82,7 +82,7 @@ shortcut for this and instantiate the component using its string name and | ||||||
| | `moves`                       | A list of transition names. Inferred from the data if set to `None`, which is the default. ~~Optional[List[str]]~~                                                                                                                                  | | | `moves`                       | A list of transition names. Inferred from the data if set to `None`, which is the default. ~~Optional[List[str]]~~                                                                                                                                  | | ||||||
| | _keyword-only_                |                                                                                                                                                                                                                                                     | | | _keyword-only_                |                                                                                                                                                                                                                                                     | | ||||||
| | `update_with_oracle_cut_size` | During training, cut long sequences into shorter segments by creating intermediate states based on the gold-standard history. The model is not very sensitive to this parameter, so you usually won't need to change it. Defaults to `100`. ~~int~~ | | | `update_with_oracle_cut_size` | During training, cut long sequences into shorter segments by creating intermediate states based on the gold-standard history. The model is not very sensitive to this parameter, so you usually won't need to change it. Defaults to `100`. ~~int~~ | | ||||||
| | `incorrect_spans_key`         | Identifies spans that are known to be incorrect entity annotations. The incorrect entity annotations can be stored in the span group, under this key. Defaults to `None`. ~~Optional[str]~~                                                         | | | `incorrect_spans_key`         | Identifies spans that are known to be incorrect entity annotations. The incorrect entity annotations can be stored in the span group in [`Doc.spans`](/api/doc#spans), under this key. Defaults to `None`. ~~Optional[str]~~                        | | ||||||
| 
 | 
 | ||||||
| ## EntityRecognizer.\_\_call\_\_ {#call tag="method"} | ## EntityRecognizer.\_\_call\_\_ {#call tag="method"} | ||||||
| 
 | 
 | ||||||
|  |  | ||||||
114  website/docs/usage/v3-1.md  Normal file
							|  | @ -0,0 +1,114 @@ | ||||||
|  | --- | ||||||
|  | title: What's New in v3.1 | ||||||
|  | teaser: New features and how to upgrade | ||||||
|  | menu: | ||||||
|  |   - ['New Features', 'features'] | ||||||
|  |   - ['Upgrading Notes', 'upgrading'] | ||||||
|  | --- | ||||||
|  | 
 | ||||||
|  | ## New Features {#features hidden="true"} | ||||||
|  | 
 | ||||||
|  | <!-- TODO: intro --> | ||||||
|  | 
 | ||||||
|  | ### Using predicted annotations during training {#predicted-annotations-training} | ||||||
|  | 
 | ||||||
|  | <!-- TODO: write --> | ||||||
|  | 
 | ||||||
|  | <Project id="pipelines/tagger_parser_predicted_annotations"> | ||||||
|  | 
 | ||||||
|  | This project shows how to use the `token.dep` attribute predicted by the parser | ||||||
|  | as a feature for a subsequent tagger component in the pipeline. | ||||||
|  | 
 | ||||||
|  | </Project> | ||||||
|  | 
 | ||||||
|  | ### SpanCategorizer for predicting arbitrary and overlapping spans {#spancategorizer tag="experimental"} | ||||||
|  | 
 | ||||||
|  | A common task in applied NLP is extracting spans of text from documents, | ||||||
|  | including longer phrases or nested expressions. Named entity recognition isn't | ||||||
|  | the right tool for this problem, since an entity recognizer typically predicts | ||||||
|  | single token-based tags that are very sensitive to boundaries. This is effective | ||||||
|  | for proper nouns and self-contained expressions, but less useful for other types | ||||||
|  | of phrases or overlapping spans. The new | ||||||
|  | [`SpanCategorizer`](/api/spancategorizer) component and | ||||||
|  | [SpanCategorizer](/api/architectures#spancategorizer) architecture let you label | ||||||
|  | arbitrary and potentially overlapping spans of text. A span categorizer | ||||||
|  | consists of two parts: a [suggester function](/api/spancategorizer#suggesters) | ||||||
|  | that proposes candidate spans, which may or may not overlap, and a labeler model | ||||||
|  | that predicts zero or more labels for each candidate. The predicted spans are | ||||||
|  | available via the [`Doc.spans`](/api/doc#spans) container. | ||||||
|  | 
 | ||||||
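
To make the suggester idea above concrete, here is a minimal pure-Python sketch of a function that proposes every n-gram of the given sizes as a candidate span. It is an illustration only, not spaCy's implementation: the name `ngram_candidates` and the plain `(start, end)` token-offset tuples are assumptions made for this example, and a real suggester works on `Doc` objects rather than lists of strings.

```python
from typing import List, Sequence, Tuple


def ngram_candidates(
    tokens: Sequence[str], sizes: Sequence[int]
) -> List[Tuple[int, int]]:
    """Propose (start, end) token offsets for every n-gram of the given sizes.

    Hypothetical helper for illustration; `end` is exclusive.
    """
    spans = []
    for size in sizes:
        # Slide a window of `size` tokens across the text. If the text is
        # shorter than the window, the range is empty and nothing is added.
        for start in range(len(tokens) - size + 1):
            spans.append((start, start + size))
    return spans


tokens = ["New", "York", "City"]
candidates = ngram_candidates(tokens, sizes=[1, 2])

# Candidates overlap freely: (0, 1) "New" sits inside (0, 2) "New York".
for start, end in candidates:
    print((start, end), " ".join(tokens[start:end]))
```

Because the candidates are allowed to overlap, a labeler that scores each one independently can assign labels to both a phrase and a phrase nested inside it, which a token-based tagging scheme cannot express.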
|  | <!-- TODO: example, getting started (init config?), maybe project template --> | ||||||
|  | 
 | ||||||
|  | <Infobox title="Tip: Create data with Prodigy's new span annotation UI"> | ||||||
|  | 
 | ||||||
|  | <!-- TODO: screenshot --> | ||||||
|  | 
 | ||||||
|  | The upcoming version of our annotation tool [Prodigy](https://prodi.gy) | ||||||
|  | (currently available as a [pre-release](https://support.prodi.gy/t/3861) for all | ||||||
|  | users) features a [new workflow and UI](https://support.prodi.gy/t/3861) for | ||||||
|  | annotating overlapping and nested spans. You can use it to create training data | ||||||
|  | for spaCy's `SpanCategorizer` component. | ||||||
|  | 
 | ||||||
|  | </Infobox> | ||||||
|  | 
 | ||||||
|  | ### Update the entity recognizer with partial incorrect annotations {#negative-samples} | ||||||
|  | 
 | ||||||
|  | > #### config.cfg (excerpt) | ||||||
|  | > | ||||||
|  | > ```ini | ||||||
|  | > [components.ner] | ||||||
|  | > factory = "ner" | ||||||
|  | > incorrect_spans_key = "incorrect_spans" | ||||||
|  | > moves = null | ||||||
|  | > update_with_oracle_cut_size = 100 | ||||||
|  | > ``` | ||||||
|  | 
 | ||||||
|  | The [`EntityRecognizer`](/api/entityrecognizer) can now be updated with known | ||||||
|  | incorrect annotations, which lets you take advantage of partial and sparse data. | ||||||
|  | For example, you'll be able to use the information that certain spans of text | ||||||
|  | are definitely **not** `PERSON` entities, without having to provide the | ||||||
|  | complete gold-standard annotations for the given example. The incorrect span | ||||||
|  | annotations can be added via the [`Doc.spans`](/api/doc#spans) in the training | ||||||
|  | data under the key defined as | ||||||
|  | [`incorrect_spans_key`](/api/entityrecognizer#init) in the component config. | ||||||
|  | 
 | ||||||
|  | <!-- TODO: more details and/or example project? --> | ||||||
|  | 
 | ||||||
|  | ### New pipeline packages for Catalan and Danish {#pipeline-packages} | ||||||
|  | 
 | ||||||
|  | <!-- TODO: intro and update with final numbers --> | ||||||
|  | 
 | ||||||
|  | | Package                                           | Language | Tagger | Parser |  NER | | ||||||
|  | | ------------------------------------------------- | -------- | -----: | -----: | ---: | | ||||||
|  | | [`ca_core_news_sm`](/models/ca#ca_core_news_sm)   | Catalan  |        |        |      | | ||||||
|  | | [`ca_core_news_md`](/models/ca#ca_core_news_md)   | Catalan  |        |        |      | | ||||||
|  | | [`ca_core_news_lg`](/models/ca#ca_core_news_lg)   | Catalan  |        |        |      | | ||||||
|  | | [`ca_core_news_trf`](/models/ca#ca_core_news_trf) | Catalan  |        |        |      | | ||||||
|  | | [`da_core_news_trf`](/models/da#da_core_news_trf) | Danish   |        |        |      | | ||||||
|  | 
 | ||||||
|  | ### Resizable text classification architectures {#resizable-textcat} | ||||||
|  | 
 | ||||||
|  | <!-- TODO: write --> | ||||||
|  | 
 | ||||||
|  | ### CLI command to assemble pipeline from config {#assemble} | ||||||
|  | 
 | ||||||
|  | The [`spacy assemble`](/api/cli#assemble) command lets you assemble a pipeline | ||||||
|  | from a config file without additional training. It can be especially useful for | ||||||
|  | creating a blank pipeline with a custom tokenizer, rule-based components or word | ||||||
|  | vectors. | ||||||
|  | 
 | ||||||
|  | ```cli | ||||||
|  | $ python -m spacy assemble config.cfg ./output | ||||||
|  | ``` | ||||||
|  | 
 | ||||||
|  | ### Support for streaming large or infinite corpora {#streaming-corpora} | ||||||
|  | 
 | ||||||
|  | <!-- TODO: write --> | ||||||
|  | 
 | ||||||
|  | ### New lemmatizers for Catalan and Italian {#pos-lemmatizers} | ||||||
|  | 
 | ||||||
|  | <!-- TODO: write --> | ||||||
|  | 
 | ||||||
|  | ## Notes about upgrading from v3.0 {#upgrading} | ||||||
|  | 
 | ||||||
|  | <!-- TODO: this could just be a bullet-point list mentioning stuff like the spacy_version, vectors initialization etc. --> | ||||||
|  | @ -9,7 +9,8 @@ | ||||||
|                     { "text": "Models & Languages", "url": "/usage/models" }, |                     { "text": "Models & Languages", "url": "/usage/models" }, | ||||||
|                     { "text": "Facts & Figures", "url": "/usage/facts-figures" }, |                     { "text": "Facts & Figures", "url": "/usage/facts-figures" }, | ||||||
|                     { "text": "spaCy 101", "url": "/usage/spacy-101" }, |                     { "text": "spaCy 101", "url": "/usage/spacy-101" }, | ||||||
|                     { "text": "New in v3.0", "url": "/usage/v3" } |                     { "text": "New in v3.0", "url": "/usage/v3" }, | ||||||
|  |                     { "text": "New in v3.1", "url": "/usage/v3-1" } | ||||||
|                 ] |                 ] | ||||||
|             }, |             }, | ||||||
|             { |             { | ||||||
|  | @ -135,9 +136,7 @@ | ||||||
|             }, |             }, | ||||||
|             { |             { | ||||||
|                 "label": "Legacy", |                 "label": "Legacy", | ||||||
|                 "items": [ |                 "items": [{ "text": "Legacy functions", "url": "/api/legacy" }] | ||||||
|                     { "text": "Legacy functions", "url": "/api/legacy" } |  | ||||||
|                 ] |  | ||||||
|             } |             } | ||||||
|         ] |         ] | ||||||
|     } |     } | ||||||
|  |  | ||||||
|  | @ -119,8 +119,8 @@ const AlertSpace = ({ nightly, legacy }) => { | ||||||
| } | } | ||||||
| 
 | 
 | ||||||
| const navAlert = ( | const navAlert = ( | ||||||
|     <Link to="/usage/v3" hidden> |     <Link to="/usage/v3-1" hidden> | ||||||
|         <strong>💥 Out now:</strong> spaCy v3.0 |         <strong>💥 Out now:</strong> spaCy v3.1 | ||||||
|     </Link> |     </Link> | ||||||
| ) | ) | ||||||
| 
 | 
 | ||||||
|  |  | ||||||