spaCy

mirror of https://github.com/explosion/spaCy.git synced 2026-03-05 12:21:27 +03:00

History

adrianeboyd 82159b5c19 Updates/bugfixes for NER/IOB converters (#4186)

* Converter formats `ner` and `iob` use autodetect to choose a converter if possible
* `iob2json` is reverted to handle sentence-per-line data like `word1|pos1|ent1 word2|pos2|ent2`
* Fix bug in `merge_sentences()` so the second sentence in each batch isn't skipped
* `conll_ner2json` is made more general so it can handle more formats with whitespace-separated columns
  * Supports all formats where the first column is the token and the final column is the IOB tag; if present, the second column is the POS tag
  * As in CoNLL 2003 NER, blank lines separate sentences and `-DOCSTART- -X- O O` separates documents
* Add option for segmenting sentences (new flag `-s`)
  * Parser-based sentence segmentation with a provided model, otherwise with the sentencizer (new option `-b` to specify the model)
  * Can group sentences into documents with `n_sents` as long as sentence segmentation is available
  * Only applies automatic segmentation when there are no existing delimiters in the data
* Provide info about settings applied during conversion, with warnings and suggestions if settings conflict or might not be optimal
* Add tests for common formats
* Add '(default)' back to docs for `-c auto`
* Add document count back to output
* Revert changes to converter output message
* Use explicit tabs in convert CLI test data
* Adjust/add messages for `n_sents=1` default
* Add sample NER data to training examples
* Update README
* Add links in docs to example NER data
* Define `msg` within converters

2019-08-29 12:04:01 +02:00
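The two input layouts described in this commit (sentence-per-line `word|pos|ent` records, and CoNLL 2003-style whitespace-separated columns) can be illustrated with a small standalone parser. This is a minimal sketch, not spaCy's converter code: `parse_iob_sentence` and `parse_conll_ner` are hypothetical helpers, and tokens are assumed to be `(word, pos, iob_tag)` tuples with an empty POS string when a CoNLL line has only two columns.

```python
from typing import List, Tuple

# Hypothetical token representation: (word, pos, iob_tag); pos is "" if absent.
Token = Tuple[str, str, str]


def parse_iob_sentence(line: str) -> List[Token]:
    """Parse one sentence-per-line record like 'word1|pos1|ent1 word2|pos2|ent2'.

    Hypothetical helper for illustration; only the three-field form is accepted.
    """
    tokens: List[Token] = []
    for item in line.split():
        parts = item.split("|")
        if len(parts) != 3:
            raise ValueError(f"expected word|pos|ent, got {item!r}")
        word, pos, ent = parts
        tokens.append((word, pos, ent))
    return tokens


def parse_conll_ner(text: str) -> List[List[List[Token]]]:
    """Split CoNLL 2003-style text into documents -> sentences -> tokens.

    Hypothetical helper. Column rules follow the commit message: the first
    column is the token, the last column is the IOB tag, and the second column
    (when there are more than two) is the POS tag. Blank lines end sentences;
    a '-DOCSTART-' line starts a new document.
    """
    docs: List[List[List[Token]]] = [[]]
    sent: List[Token] = []

    def flush_sentence() -> None:
        # Move the finished sentence (if any) into the current document.
        nonlocal sent
        if sent:
            docs[-1].append(sent)
            sent = []

    for raw in text.splitlines():
        cols = raw.split()
        if not cols:
            flush_sentence()
        elif cols[0] == "-DOCSTART-":
            flush_sentence()
            if docs[-1]:  # only open a new document if the current one has content
                docs.append([])
        else:
            pos = cols[1] if len(cols) > 2 else ""
            sent.append((cols[0], pos, cols[-1]))
    flush_sentence()
    return [doc for doc in docs if doc]
```

Unlike the converters described above, this sketch performs no automatic sentence segmentation (`-s`) or `n_sents` grouping; it only splits on the explicit delimiters already present in the data.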
annotation.md	Update annotation docs for German	2019-07-22 11:59:03 +02:00
cli.md	Updates/bugfixes for NER/IOB converters (#4186)	2019-08-29 12:04:01 +02:00
cython-classes.md	💫 Update website (#3285)	2019-02-17 19:31:19 +01:00
cython-structs.md	Improve Token.prob and Lexeme.prob docs (resolves #3701)	2019-05-11 15:23:41 +02:00
cython.md	💫 Update website (#3285)	2019-02-17 19:31:19 +01:00
dependencyparser.md	Fix DependencyParser.predict docs (resolves #3561)	2019-05-11 15:37:54 +02:00
doc.md	Update .tensor docs [ci skip]	2019-08-01 18:37:09 +02:00
entityrecognizer.md	💫 Make serialization methods consistent (#3385)	2019-03-10 19:16:45 +01:00
entityruler.md	Fix typo [ci skip]	2019-08-20 13:02:05 +02:00
goldcorpus.md	💫 Update website (#3285)	2019-02-17 19:31:19 +01:00
goldparse.md	Corrected imported function (#4062)	2019-08-01 12:43:36 +02:00
index.md	💫 Update website (#3285)	2019-02-17 19:31:19 +01:00
language.md	💫 Support simple training format in nlp.evaluate and add tests (#4033)	2019-07-27 17:30:18 +02:00
lemmatizer.md	💫 Update website (#3285 )	2019-02-17 19:31:19 +01:00
lexeme.md	Fix lex_id docs (closes #3743)	2019-05-16 23:15:58 +02:00
matcher.md	Remove n_threads	2019-02-17 22:25:42 +01:00
phrasematcher.md	Remove n_threads	2019-02-17 22:25:42 +01:00
pipeline-functions.md	DOC Fix pipeline functions examples (#4189)	2019-08-23 19:15:32 +02:00
scorer.md	Update Scorer.ents_per_type	2019-07-10 11:19:28 +02:00
sentencizer.md	💫 Add better and serializable sentencizer (#3471)	2019-03-23 15:45:02 +01:00
span.md	Document force flag on set_extension (closes #4148)	2019-08-19 19:22:07 +02:00
stringstore.md	💫 Make serialization methods consistent (#3385)	2019-03-10 19:16:45 +01:00
tagger.md	💫 Make serialization methods consistent (#3385)	2019-03-10 19:16:45 +01:00
textcategorizer.md	Fix formatting [ci skip]	2019-03-23 16:45:50 +01:00
token.md	Document force flag on set_extension (closes #4148)	2019-08-19 19:22:07 +02:00
tokenizer.md	Add links to tokenizer API docs to refer relevant information. (#4064)	2019-08-01 14:28:38 +02:00
top-level.md	Fix visualizer options linking for displaCy. (#4202)	2019-08-27 14:04:28 +02:00
vectors.md	Fix website docs for Vectors.from_glove (#3565)	2019-04-10 15:23:27 +02:00
vocab.md	Document new API [ci skip]	2019-03-11 15:23:53 +01:00