Mirror of https://github.com/explosion/spaCy.git (synced 2025-10-31 07:57:35 +03:00)

* Updates/bugfixes for NER/IOB converters
* Converter formats `ner` and `iob` use autodetect to choose a converter if
  possible
* `iob2json` is reverted to handle sentence-per-line data like
  `word1|pos1|ent1 word2|pos2|ent2`
  * Fix bug in `merge_sentences()` so the second sentence in each batch isn't
    skipped
* `conll_ner2json` is made more general so it can handle more formats with
  whitespace-separated columns
  * Supports all formats where the first column is the token and the final
    column is the IOB tag; if present, the second column is the POS tag
  * As in CoNLL 2003 NER, blank lines separate sentences, `-DOCSTART- -X- O O`
    separates documents
  * Add option for segmenting sentences (new flag `-s`)
  * Parser-based sentence segmentation with a provided model, otherwise with
    sentencizer (new option `-b` to specify model)
  * Can group sentences into documents with `n_sents` as long as sentence
    segmentation is available
  * Only applies automatic segmentation when there are no existing delimiters
    in the data
* Provide info about the settings applied during conversion, with warnings and
  suggestions if settings conflict or might not be optimal
* Add tests for common formats
* Add '(default)' back to docs for -c auto
* Add document count back to output
* Revert changes to converter output message
* Use explicit tabs in convert CLI test data
* Adjust/add messages for n_sents=1 default
* Add sample NER data to training examples
* Update README
* Add links in docs to example NER data
* Define msg within converters
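
The column rules described above for `conll_ner2json` (first column is the token, last column is the IOB tag, optional second column is the POS tag, blank lines end sentences, `-DOCSTART-` lines end documents) can be illustrated with a minimal sketch. This is not the converter's actual code: the function name `read_conll_ner` and the `"-"` placeholder for a missing POS tag are assumptions made for illustration only.

```python
def read_conll_ner(text):
    """Parse whitespace-separated NER columns into docs -> sentences -> tokens.

    Hypothetical helper, not part of spaCy. Each token is a tuple of
    (word, pos, iob_tag); pos is "-" when the data has no POS column
    (the placeholder is an assumption of this sketch).
    """
    docs, sents, sent = [], [], []
    for line in text.splitlines():
        cols = line.split()
        is_docstart = bool(cols) and cols[0] == "-DOCSTART-"
        if not cols or is_docstart:
            # A blank line or a document marker closes the current sentence.
            if sent:
                sents.append(sent)
                sent = []
            # A document marker also closes the current document.
            if is_docstart and sents:
                docs.append(sents)
                sents = []
            continue
        word, tag = cols[0], cols[-1]
        pos = cols[1] if len(cols) > 2 else "-"
        sent.append((word, pos, tag))
    # Flush whatever is left at the end of the input.
    if sent:
        sents.append(sent)
    if sents:
        docs.append(sents)
    return docs


sample = (
    "-DOCSTART- -X- O O\n\n"
    "Google NNP B-NP B-ORG\nhired VBD B-VP O\n\n"
    "Thrun B-PERSON\n\n"
    "-DOCSTART- -X- O O\n\n"
    "Hello O\n"
)
docs = read_conll_ner(sample)
```

Here the four-column, and two-column rows are handled by the same rule, which is the point of treating only the first, second, and last columns as meaningful.
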
3 lines, 746 B, Plaintext

When|WRB|O Sebastian|NNP|B-PERSON Thrun|NNP|I-PERSON started|VBD|O working|VBG|O on|IN|O self|NN|O -|HYPH|O driving|VBG|O cars|NNS|O at|IN|O Google|NNP|B-ORG in|IN|O 2007|CD|B-DATE ,|,|O few|JJ|O people|NNS|O outside|RB|O of|IN|O the|DT|O company|NN|O took|VBD|O him|PRP|O seriously|RB|O .|.|O
“|''|O I|PRP|O can|MD|O tell|VB|O you|PRP|O very|RB|O senior|JJ|O CEOs|NNS|O of|IN|O major|JJ|O American|JJ|B-NORP car|NN|O companies|NNS|O would|MD|O shake|VB|O my|PRP$|O hand|NN|O and|CC|O turn|VB|O away|RB|O because|IN|O I|PRP|O was|VBD|O n’t|RB|O worth|JJ|O talking|VBG|O to|IN|O ,|,|O ”|''|O said|VBD|O Thrun|NNP|B-PERSON ,|,|O in|IN|O an|DT|O interview|NN|O with|IN|O Recode|NNP|B-ORG earlier|RBR|B-DATE this|DT|I-DATE week|NN|I-DATE .|.|O
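
To make the sentence-per-line layout above concrete, here is a minimal sketch that splits one line of `word|pos|ent` items into tokens and groups the IOB tags into entity spans. It is not how `iob2json` is implemented: the helper names `parse_iob_line` and `iob_to_spans` are hypothetical, and treating a stray `I-` tag as the start of a new entity is an assumption of this sketch.

```python
def parse_iob_line(line):
    """Split one sentence-per-line IOB line into (word, pos, tag) tuples."""
    tokens = []
    for item in line.split():
        # Split from the right so a token that is itself "|" still parses.
        word, pos, tag = item.rsplit("|", 2)
        tokens.append((word, pos, tag))
    return tokens


def iob_to_spans(tokens):
    """Group IOB tags into (start, end, label) spans with an exclusive end."""
    spans = []
    start = label = None
    for i, (_, _, tag) in enumerate(tokens):
        continues = tag.startswith("I-") and label == tag[2:]
        if label is not None and not continues:
            # The open entity ends before this token.
            spans.append((start, i, label))
            start = label = None
        if tag.startswith("B-") or (tag.startswith("I-") and label is None):
            # Assumption: an I- tag with no open entity starts a new one.
            start, label = i, tag[2:]
    if label is not None:
        spans.append((start, len(tokens), label))
    return spans


line = (
    "Sebastian|NNP|B-PERSON Thrun|NNP|I-PERSON started|VBD|O "
    "working|VBG|O at|IN|O Google|NNP|B-ORG"
)
tokens = parse_iob_line(line)
spans = iob_to_spans(tokens)
# spans == [(0, 2, "PERSON"), (5, 6, "ORG")]
```

Span indices are token offsets, so `tokens[0:2]` covers `Sebastian Thrun` and `tokens[5:6]` covers `Google`.
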