Mirror of https://github.com/explosion/spaCy.git (synced 2025-11-01 00:17:44 +03:00)

* Updates/bugfixes for NER/IOB converters
* Converter formats `ner` and `iob` use autodetect to choose a converter if
  possible
* `iob2json` is reverted to handle sentence-per-line data like
  `word1|pos1|ent1 word2|pos2|ent2`
  * Fix bug in `merge_sentences()` so the second sentence in each batch isn't
    skipped
* `conll_ner2json` is made more general so it can handle more formats with
  whitespace-separated columns
  * Supports all formats where the first column is the token and the final
    column is the IOB tag; if present, the second column is the POS tag
  * As in CoNLL 2003 NER, blank lines separate sentences, `-DOCSTART- -X- O O`
    separates documents
  * Add option for segmenting sentences (new flag `-s`)
  * Parser-based sentence segmentation with a provided model, otherwise with
    sentencizer (new option `-b` to specify model)
  * Can group sentences into documents with `n_sents` as long as sentence
    segmentation is available
  * Only applies automatic segmentation when there are no existing delimiters
    in the data
* Provide info about settings applied during conversion, with warnings and
  suggestions if settings conflict or might not be optimal.
* Add tests for common formats
* Add '(default)' back to docs for -c auto
* Add document count back to output
* Revert changes to converter output message
* Use explicit tabs in convert CLI test data
* Adjust/add messages for n_sents=1 default
* Add sample NER data to training examples
* Update README
* Add links in docs to example NER data
* Define msg within converters
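
The column layout described above for `conll_ner2json` (token first, IOB tag last, optional POS tag second, blank lines between sentences, `-DOCSTART-` lines between documents) can be sketched as a small standalone parser. This is an illustration only, not the converter's actual code: the function name `read_ner_columns` and the sample string are hypothetical.

```python
def read_ner_columns(text):
    """Parse whitespace-separated NER data into documents of sentences.

    Illustrative sketch; this is not spaCy's conll_ner2json implementation.
    Each token is returned as a (word, pos, iob) tuple, pos may be None.
    """
    docs, sents, tokens = [], [], []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("-DOCSTART-"):
            # Document delimiter: close the open sentence and document.
            if tokens:
                sents.append(tokens)
                tokens = []
            if sents:
                docs.append(sents)
                sents = []
        elif not line:
            # Blank line: sentence boundary.
            if tokens:
                sents.append(tokens)
                tokens = []
        else:
            cols = line.split()
            word, iob = cols[0], cols[-1]
            # The second column is read as POS only when 3+ columns exist.
            pos = cols[1] if len(cols) > 2 else None
            tokens.append((word, pos, iob))
    # Flush whatever is still open at the end of the input.
    if tokens:
        sents.append(tokens)
    if sents:
        docs.append(sents)
    return docs


# Hypothetical sample: one document, one sentence with POS, one without.
sample = "-DOCSTART- -X- O O\n\nGoogle\tNNP\tB-ORG\nhired\tVBD\tO\n\nThrun\tB-PERSON\n"
docs = read_ner_columns(sample)
```

Splitting on any whitespace rather than on tabs alone keeps the sketch tolerant of both tab- and space-separated files, at the cost of not supporting tokens that contain spaces.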
		
	
			
		
			
				
	
	
		
67 lines, 746 B, Plaintext

When	WRB	O
Sebastian	NNP	B-PERSON
Thrun	NNP	I-PERSON
started	VBD	O
working	VBG	O
on	IN	O
self	NN	O
-	HYPH	O
driving	VBG	O
cars	NNS	O
at	IN	O
Google	NNP	B-ORG
in	IN	O
2007	CD	B-DATE
,	,	O
few	JJ	O
people	NNS	O
outside	RB	O
of	IN	O
the	DT	O
company	NN	O
took	VBD	O
him	PRP	O
seriously	RB	O
.	.	O
“	''	O
I	PRP	O
can	MD	O
tell	VB	O
you	PRP	O
very	RB	O
senior	JJ	O
CEOs	NNS	O
of	IN	O
major	JJ	O
American	JJ	B-NORP
car	NN	O
companies	NNS	O
would	MD	O
shake	VB	O
my	PRP$	O
hand	NN	O
and	CC	O
turn	VB	O
away	RB	O
because	IN	O
I	PRP	O
was	VBD	O
n’t	RB	O
worth	JJ	O
talking	VBG	O
to	IN	O
,	,	O
”	''	O
said	VBD	O
Thrun	NNP	B-PERSON
,	,	O
in	IN	O
an	DT	O
interview	NN	O
with	IN	O
Recode	NNP	B-ORG
earlier	RBR	B-DATE
this	DT	I-DATE
week	NN	I-DATE
.	.	O
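
To show how the final column of this file encodes entities, here is a minimal sketch that groups IOB tags into `(start, end, label)` token spans. The helper name `iob_to_spans` is hypothetical and is not part of spaCy. It also makes one lenient assumption: an `I-` tag with no matching open entity starts a new entity instead of raising an error.

```python
def iob_to_spans(tags):
    """Return (start, end, label) spans, end exclusive, from IOB tags."""
    spans, start, label = [], None, None
    for i, tag in enumerate(tags):
        # Continue the open entity only on an I- tag with the same label.
        if tag.startswith("I-") and label == tag[2:]:
            continue
        # Any other tag closes the entity that is currently open.
        if label is not None:
            spans.append((start, i, label))
            start, label = None, None
        # B- (or a stray I-) opens a new entity; O opens nothing.
        if tag.startswith(("B-", "I-")):
            start, label = i, tag[2:]
    if label is not None:
        spans.append((start, len(tags), label))
    return spans


# First four rows of the file above, tab-separated.
rows = [
    "When\tWRB\tO",
    "Sebastian\tNNP\tB-PERSON",
    "Thrun\tNNP\tI-PERSON",
    "started\tVBD\tO",
]
tags = [row.split("\t")[-1] for row in rows]
spans = iob_to_spans(tags)  # token indices 1-2 form one PERSON entity
```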