mirror of https://github.com/explosion/spaCy.git (synced 2025-11-04 01:48:04 +03:00)
	Add BILUO scheme to annotation docs
parent 99b631617d
commit 465a1dd710
@@ -71,6 +71,44 @@ include _annotation/_dep-labels

include _annotation/_named-entities

+h(3, "biluo") BILUO Scheme

p
    |  spaCy translates character offsets into the BILUO scheme, in order to
    |  decide the cost of each action given the current state of the entity
    |  recognizer. The costs are then used to calculate the gradient of the
    |  loss, to train the model.

+aside("Why BILUO, not IOB?")
    |  There are several coding schemes for encoding entity annotations as
    |  token tags. These coding schemes are equally expressive, but not
    |  necessarily equally learnable.
    |  #[+a("http://www.aclweb.org/anthology/W09-1119") Ratinov and Roth]
    |  showed that the minimal #[strong Begin], #[strong In], #[strong Out]
    |  scheme was more difficult to learn than the #[strong BILUO] scheme that
    |  we use, which explicitly marks boundary tokens.

+table([ "Tag", "Description" ])
    +row
        +cell #[code #[span.u-color-theme B] EGIN]
        +cell The first token of a multi-token entity.

    +row
        +cell #[code #[span.u-color-theme I] N]
        +cell An inner token of a multi-token entity.

    +row
        +cell #[code #[span.u-color-theme L] AST]
        +cell The final token of a multi-token entity.

    +row
        +cell #[code #[span.u-color-theme U] NIT]
        +cell A single-token entity.

    +row
        +cell #[code #[span.u-color-theme O] UT]
        +cell A non-entity token.

+h(2, "json-input") JSON input format for training

p
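
The translation from character offsets to BILUO tags that the new section describes can be sketched in plain Python. This is only an illustration of the tag definitions in the table, not spaCy's own implementation: the function name `offsets_to_biluo` is hypothetical, tokens are assumed to be whitespace-separated, and entity offsets are assumed to line up with token boundaries.

```python
import re


def offsets_to_biluo(text, entities):
    """Tag whitespace-separated tokens with BILUO labels.

    `entities` is a list of (start_char, end_char, label) tuples.
    Assumption: entity boundaries coincide with token boundaries.
    """
    # Character span (start, end) of every whitespace-separated token.
    tokens = [(m.start(), m.end()) for m in re.finditer(r"\S+", text)]
    tags = ["O"] * len(tokens)
    for start, end, label in entities:
        # Indices of the tokens that lie entirely inside the entity span.
        inside = [i for i, (s, e) in enumerate(tokens) if s >= start and e <= end]
        if not inside:
            continue
        if len(inside) == 1:
            # Single-token entity: UNIT.
            tags[inside[0]] = "U-" + label
        else:
            # Multi-token entity: BEGIN, zero or more IN, then LAST.
            tags[inside[0]] = "B-" + label
            for i in inside[1:-1]:
                tags[i] = "I-" + label
            tags[inside[-1]] = "L-" + label
    return tags


text = "I like London and New York City"
entities = [(7, 13, "LOC"), (18, 31, "LOC")]
print(offsets_to_biluo(text, entities))
# ['O', 'O', 'U-LOC', 'O', 'B-LOC', 'I-LOC', 'L-LOC']
```

Here "London" is a single-token entity and receives `U-LOC`, while "New York City" spans three tokens and receives `B-LOC`, `I-LOC`, `L-LOC`.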
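
The aside's claim that the schemes are equally expressive can be checked with a round trip between the two encodings. The sketch below is a hedged illustration: the helper names `biluo_to_iob` and `iob_to_biluo` are hypothetical, and the IOB side is assumed to follow the convention in which every entity starts with a `B-` tag.

```python
def biluo_to_iob(tags):
    """Collapse BILUO tags to IOB: B and U become B, I and L become I."""
    mapping = {"B": "B", "U": "B", "I": "I", "L": "I"}
    return [t if t == "O" else mapping[t[0]] + t[1:] for t in tags]


def iob_to_biluo(tags):
    """Recover BILUO tags from well-formed IOB tags by looking one tag ahead."""
    out = []
    for i, tag in enumerate(tags):
        if tag == "O":
            out.append(tag)
            continue
        prefix, label = tag[0], tag[2:]
        next_tag = tags[i + 1] if i + 1 < len(tags) else "O"
        # The entity continues only if the next tag is I- with the same label.
        continues = next_tag == "I-" + label
        if prefix == "B":
            out.append(("B-" if continues else "U-") + label)
        else:  # prefix == "I"
            out.append(("I-" if continues else "L-") + label)
    return out


biluo = ["O", "O", "U-LOC", "O", "B-LOC", "I-LOC", "L-LOC"]
iob = biluo_to_iob(biluo)
print(iob)
# ['O', 'O', 'B-LOC', 'O', 'B-LOC', 'I-LOC', 'I-LOC']
print(iob_to_biluo(iob) == biluo)
# True
```

Nothing is lost in the round trip, but in IOB the end of an entity is only visible from the following tag, whereas BILUO states it on the boundary token itself via `L` and `U`.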