mirror of https://github.com/explosion/spaCy.git, synced 2025-10-31 07:57:35 +03:00

	| * Updates/bugfixes for NER/IOB converters
* Converter formats `ner` and `iob` use autodetect to choose a converter if
  possible
* `iob2json` is reverted to handle sentence-per-line data like
  `word1|pos1|ent1 word2|pos2|ent2`
  * Fix bug in `merge_sentences()` so the second sentence in each batch isn't
    skipped
* `conll_ner2json` is made more general so it can handle more formats with
  whitespace-separated columns
  * Supports all formats where the first column is the token and the final
    column is the IOB tag; if present, the second column is the POS tag
  * As in CoNLL 2003 NER, blank lines separate sentences, `-DOCSTART- -X- O O`
    separates documents
  * Add option for segmenting sentences (new flag `-s`)
  * Parser-based sentence segmentation with a provided model, otherwise with
    sentencizer (new option `-b` to specify model)
  * Can group sentences into documents with `n_sents` as long as sentence
    segmentation is available
  * Only applies automatic segmentation when there are no existing delimiters
    in the data
* Provide info about settings applied during conversion, with warnings and
  suggestions if settings conflict or might not be optimal
* Add tests for common formats
* Add '(default)' back to docs for -c auto
* Add document count back to output
* Revert changes to converter output message
* Use explicit tabs in convert CLI test data
* Adjust/add messages for n_sents=1 default
* Add sample NER data to training examples
* Update README
* Add links in docs to example NER data
* Define msg within converters | ||
|---|---|---|
| annotation.md | ||
| cli.md | ||
| cython-classes.md | ||
| cython-structs.md | ||
| cython.md | ||
| dependencyparser.md | ||
| doc.md | ||
| entityrecognizer.md | ||
| entityruler.md | ||
| goldcorpus.md | ||
| goldparse.md | ||
| index.md | ||
| language.md | ||
| lemmatizer.md | ||
| lexeme.md | ||
| matcher.md | ||
| phrasematcher.md | ||
| pipeline-functions.md | ||
| scorer.md | ||
| sentencizer.md | ||
| span.md | ||
| stringstore.md | ||
| tagger.md | ||
| textcategorizer.md | ||
| token.md | ||
| tokenizer.md | ||
| top-level.md | ||
| vectors.md | ||
| vocab.md | ||
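
The two input layouts named in the commit message can be illustrated with a small stand-alone parser. This is only a sketch of the described data formats, not spaCy's converter code: the function names `parse_column_ner` and `parse_pipe_sentence` are hypothetical, and the sketch assumes whitespace-separated columns (token first, IOB tag last, POS tag second when there are more than two columns), blank lines between sentences, and a `-DOCSTART-` line between documents.

```python
from typing import List, Optional, Tuple

# One annotated token: (token, POS tag or None, IOB tag)
Row = Tuple[str, Optional[str], str]
Sentence = List[Row]
Document = List[Sentence]


def parse_column_ner(text: str) -> List[Document]:
    """Parse CoNLL-style column data (hypothetical helper, not spaCy's API)."""
    docs: List[Document] = []
    doc: Document = []
    sent: Sentence = []

    def close_sentence() -> None:
        nonlocal sent
        if sent:
            doc.append(sent)
            sent = []

    def close_document() -> None:
        nonlocal doc
        close_sentence()
        if doc:
            docs.append(doc)
            doc = []

    for line in text.splitlines():
        cols = line.split()
        if not cols:
            # A blank line ends the current sentence.
            close_sentence()
        elif cols[0] == "-DOCSTART-":
            # A -DOCSTART- line ends the current document.
            close_document()
        elif len(cols) < 2:
            raise ValueError(f"expected at least token and tag: {line!r}")
        else:
            # First column is the token, last is the IOB tag; the second
            # column is read as the POS tag only if it is not the last one.
            pos = cols[1] if len(cols) > 2 else None
            sent.append((cols[0], pos, cols[-1]))
    close_document()
    return docs


def parse_pipe_sentence(line: str) -> Sentence:
    """Parse one sentence-per-line record like `word1|pos1|ent1 word2|pos2|ent2`."""
    rows: Sentence = []
    for item in line.split():
        parts = item.split("|")
        if len(parts) != 3:
            raise ValueError(f"expected word|pos|ent, got {item!r}")
        token, pos, tag = parts
        rows.append((token, pos, tag))
    return rows
```

For example, `parse_pipe_sentence("London|NNP|B-GPE")` yields a single `("London", "NNP", "B-GPE")` row, and a CoNLL 2003 style file yields one list of sentences per `-DOCSTART-` section. For real conversions, use the `spacy convert` command described in `cli.md`.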