mirror of
https://github.com/explosion/spaCy.git
synced 2025-10-24 20:51:30 +03:00
* Updates/bugfixes for NER/IOB converters
* Converter formats `ner` and `iob` use autodetect to choose a converter if
possible
* `iob2json` is reverted to handle sentence-per-line data like
`word1|pos1|ent1 word2|pos2|ent2`
* Fix bug in `merge_sentences()` so the second sentence in each batch isn't
skipped
* `conll_ner2json` is made more general so it can handle more formats with
whitespace-separated columns
* Supports all formats where the first column is the token and the final
column is the IOB tag; if present, the second column is the POS tag
* As in CoNLL 2003 NER, blank lines separate sentences, `-DOCSTART- -X- O O`
separates documents
* Add option for segmenting sentences (new flag `-s`)
* Parser-based sentence segmentation with a provided model, otherwise with
sentencizer (new option `-b` to specify model)
* Can group sentences into documents with `n_sents` as long as sentence
segmentation is available
* Only applies automatic segmentation when there are no existing delimiters
in the data
* Provide info about settings applied during conversion, with warnings and
suggestions if settings conflict or might not be optimal.
* Add tests for common formats
* Add '(default)' back to docs for -c auto
* Add document count back to output
* Revert changes to converter output message
* Use explicit tabs in convert CLI test data
* Adjust/add messages for n_sents=1 default
* Add sample NER data to training examples
* Update README
* Add links in docs to example NER data
* Define msg within converters
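
The column rules listed above for token-per-line data (first column is the token, last column is the IOB tag, an optional second column is the POS tag, blank lines end a sentence, `-DOCSTART-` lines end a document) can be sketched as follows. This is a minimal illustration only, not spaCy's `conll_ner2json`; the function name `parse_token_per_line` and the returned structure are assumptions made for this example.

```python
def parse_token_per_line(text):
    """Parse whitespace-separated token-per-line NER data.

    Hypothetical helper, not part of spaCy. Returns a list of documents;
    each document is a list of sentences; each sentence is a list of
    (token, pos_or_None, iob_tag) tuples.
    """
    docs, sents, sent = [], [], []

    def end_sentence():
        nonlocal sent
        if sent:
            sents.append(sent)
            sent = []

    def end_document():
        nonlocal sents
        end_sentence()
        if sents:
            docs.append(sents)
            sents = []

    for line in text.splitlines():
        cols = line.split()
        if not cols:
            # A blank line closes the current sentence.
            end_sentence()
        elif cols[0] == "-DOCSTART-":
            # A -DOCSTART- line closes the current document.
            end_document()
        elif len(cols) < 2:
            raise ValueError(f"expected at least token and tag: {line!r}")
        else:
            # The POS tag is only present when there are 3+ columns.
            pos = cols[1] if len(cols) > 2 else None
            sent.append((cols[0], pos, cols[-1]))
    end_document()
    return docs


sample = "-DOCSTART- -X- O O\n\nGoogle NNP B-ORG\nin IN O\n\n2007 B-DATE\n"
docs = parse_token_per_line(sample)
print(len(docs), "document(s),", len(docs[0]), "sentence(s)")
```

Splitting on any whitespace means the same sketch accepts both tab-separated and space-separated columns.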
67 lines
746 B
Plaintext
When WRB O
Sebastian NNP B-PERSON
Thrun NNP I-PERSON
started VBD O
working VBG O
on IN O
self NN O
- HYPH O
driving VBG O
cars NNS O
at IN O
Google NNP B-ORG
in IN O
2007 CD B-DATE
, , O
few JJ O
people NNS O
outside RB O
of IN O
the DT O
company NN O
took VBD O
him PRP O
seriously RB O
. . O
“ '' O
I PRP O
can MD O
tell VB O
you PRP O
very RB O
senior JJ O
CEOs NNS O
of IN O
major JJ O
American JJ B-NORP
car NN O
companies NNS O
would MD O
shake VB O
my PRP$ O
hand NN O
and CC O
turn VB O
away RB O
because IN O
I PRP O
was VBD O
n’t RB O
worth JJ O
talking VBG O
to IN O
, , O
” '' O
said VBD O
Thrun NNP B-PERSON
, , O
in IN O
an DT O
interview NN O
with IN O
Recode NNP B-ORG
earlier RBR B-DATE
this DT I-DATE
week NN I-DATE
. . O
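
The last column of the sample data uses IOB tags: `B-` opens an entity, `I-` continues it, and `O` marks a token outside any entity. A rough sketch of how such tags could be grouped into entity spans is shown below. The helper name `iob_to_spans` is hypothetical and not part of spaCy, and treating a stray `I-` tag as the start of a new entity is a lenient choice made for this example.

```python
def iob_to_spans(tags):
    """Group IOB tags into (start, end, label) spans, with end exclusive.

    Hypothetical helper for illustration. An I- tag that does not continue
    an entity with the same label is treated as starting a new entity.
    """
    spans = []
    start, label = None, None
    for i, tag in enumerate(tags):
        # An I- tag with a matching label continues the open entity.
        if tag.startswith("I-") and label == tag[2:]:
            continue
        # Any other tag closes the open entity, if there is one.
        if label is not None:
            spans.append((start, i, label))
            start, label = None, None
        # B- (or a stray I-) opens a new entity.
        if tag.startswith(("B-", "I-")):
            start, label = i, tag[2:]
    if label is not None:
        spans.append((start, len(tags), label))
    return spans


# Final five rows of the sample: Recode, earlier, this, week, "."
tokens = ["Recode", "earlier", "this", "week", "."]
tags = ["B-ORG", "B-DATE", "I-DATE", "I-DATE", "O"]
for start, end, label in iob_to_spans(tags):
    print(label, " ".join(tokens[start:end]))
```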