mirror of https://github.com/explosion/spaCy.git, synced 2025-11-04 01:48:04 +03:00
Add infobox

parent 114cb18892
commit 1d5ff3e455
@@ -1019,6 +1019,15 @@ above:
- The dictionary `b2a_multi` shows that there are no tokens in `spacy_tokens`
  that map to multiple tokens in `other_tokens`.

<Infobox title="Important note" variant="warning">

The current implementation of the alignment algorithm assumes that both
tokenizations add up to the same string. For example, you'll be able to align
`["I", "'", "m"]` and `["I", "'m"]`, which both add up to `"I'm"`, but not
`["I", "'m"]` and `["I", "am"]`.

</Infobox>

## Merging and splitting {#retokenization new="2.1"}

The [`Doc.retokenize`](/api/doc#retokenize) context manager lets you merge and
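
The constraint stated in the infobox, that both tokenizations must add up to the same string, can be checked up front before attempting an alignment. The following is a minimal sketch and is not part of this commit or of spaCy: `can_align` is a hypothetical helper that does a plain string comparison, and spaCy's own alignment code may normalize the tokens differently.

```python
def can_align(tokens_a, tokens_b):
    """Return True if both tokenizations concatenate to the same string.

    Hypothetical helper for illustration only; not a spaCy API.
    """
    return "".join(tokens_a) == "".join(tokens_b)


# Both lists join to "I'm", so they satisfy the stated assumption.
print(can_align(["I", "'", "m"], ["I", "'m"]))

# "I'm" and "Iam" differ, so these two tokenizations do not.
print(can_align(["I", "'m"], ["I", "am"]))
```

The first call prints `True` and the second prints `False`, matching the two examples given in the infobox.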