spaCy/examples
adrianeboyd 82159b5c19 Updates/bugfixes for NER/IOB converters (#4186)
* Updates/bugfixes for NER/IOB converters

* Converter formats `ner` and `iob` use autodetect to choose a converter if
  possible

* `iob2json` is reverted to handle sentence-per-line data like
  `word1|pos1|ent1 word2|pos2|ent2`

  * Fix bug in `merge_sentences()` so the second sentence in each batch isn't
    skipped

* `conll_ner2json` is made more general so it can handle more formats with
  whitespace-separated columns

  * Supports all formats where the first column is the token and the final
    column is the IOB tag; if present, the second column is the POS tag

  * As in CoNLL 2003 NER, blank lines separate sentences, `-DOCSTART- -X- O O`
    separates documents

  * Add option for segmenting sentences (new flag `-s`)

  * Parser-based sentence segmentation with a provided model, otherwise with
    the sentencizer (new option `-b` to specify model)

  * Can group sentences into documents with `n_sents` as long as sentence
    segmentation is available

  * Only applies automatic segmentation when there are no existing delimiters
    in the data

* Provide info about settings applied during conversion with warnings and
  suggestions if settings conflict or might not be optimal.

* Add tests for common formats

* Add '(default)' back to docs for -c auto

* Add document count back to output

* Revert changes to converter output message

* Use explicit tabs in convert CLI test data

* Adjust/add messages for n_sents=1 default

* Add sample NER data to training examples

* Update README

* Add links in docs to example NER data

* Define msg within converters
2019-08-29 12:04:01 +02:00
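The two input layouts this commit describes can be illustrated with a short, standalone Python sketch. It is not spaCy's converter code. `parse_iob_line`, `parse_conll_ner`, and `group_sentences` are hypothetical names, and filling a missing POS column with `"-"` is an assumption. The sketch follows the rules listed above: the token is in the first column, the IOB tag is in the last, and the POS tag is in the second if present. Blank lines end sentences, `-DOCSTART-` lines end documents, and batching by `n_sents` keeps every sentence.

```python
from typing import List, Tuple

# (word, pos, iob_tag); hypothetical tuple layout chosen for this sketch
Token = Tuple[str, str, str]


def parse_iob_line(line: str) -> List[Token]:
    """Parse one sentence-per-line record: 'word1|pos1|ent1 word2|pos2|ent2'.

    Raises ValueError if an item has fewer than three '|'-separated fields.
    """
    tokens: List[Token] = []
    for item in line.split():
        # rsplit keeps any '|' inside the word itself in the first field
        word, pos, ent = item.rsplit("|", 2)
        tokens.append((word, pos, ent))
    return tokens


def parse_conll_ner(text: str) -> List[List[List[Token]]]:
    """Split whitespace-column NER data into documents -> sentences -> tokens.

    First column = token, last column = IOB tag, second column = POS tag
    when there are more than two columns. Blank lines end sentences and
    '-DOCSTART-' lines end documents.
    """
    docs: List[List[List[Token]]] = []
    sents: List[List[Token]] = []
    sent: List[Token] = []
    for line in text.splitlines():
        cols = line.split()
        if not cols or cols[0] == "-DOCSTART-":
            # Either boundary closes the current sentence
            if sent:
                sents.append(sent)
                sent = []
            # Only a -DOCSTART- line also closes the current document
            if cols and sents:
                docs.append(sents)
                sents = []
            continue
        word, tag = cols[0], cols[-1]
        pos = cols[1] if len(cols) > 2 else "-"  # placeholder when no POS column
        sent.append((word, pos, tag))
    # Flush whatever is left at the end of the input
    if sent:
        sents.append(sent)
    if sents:
        docs.append(sents)
    return docs


def group_sentences(sents: list, n_sents: int) -> list:
    """Batch sentences into documents of n_sents each, without skipping any."""
    if n_sents < 1:
        raise ValueError("n_sents must be >= 1")
    return [sents[i:i + n_sents] for i in range(0, len(sents), n_sents)]
```

`group_sentences` uses plain slicing, so consecutive batches neither overlap nor drop sentences, which is the behavior the `merge_sentences()` fix is described as restoring.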
information_extraction Update example and sign contributor agreement (#3916) 2019-07-08 10:27:20 +02:00
keras_parikh_entailment Merge branch 'master' into develop 2018-12-18 13:48:10 +01:00
notebooks 💫 Replace ujson, msgpack and dill/pickle/cloudpickle with srsly (#3003) 2018-12-03 01:28:22 +01:00
pipeline CLI scripts for entity linking (wikipedia & generic) (#4091) 2019-08-13 15:38:59 +02:00
training Updates/bugfixes for NER/IOB converters (#4186) 2019-08-29 12:04:01 +02:00
deep_learning_keras.py Tidy up references to n_threads and fix default 2019-03-15 16:24:26 +01:00
README.md Get docs ready for v2.0.0 2017-11-07 12:00:43 +01:00
vectors_fast_text.py Auto-format examples 2018-12-02 04:26:26 +01:00
vectors_tensorboard.py Auto-format examples 2018-12-02 04:26:26 +01:00

spaCy examples

The examples are Python scripts with well-behaved command line interfaces. For more detailed usage guides, see the documentation.

To see the available arguments, you can use the --help or -h flag:

$ python examples/training/train_ner.py --help
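A script gets `--help` and `-h` essentially for free once its arguments are declared with a parser. The following is a minimal sketch using the standard library's `argparse`; the arguments (`--model`, `--output-dir`, `--n-iter`) are hypothetical, and the actual example scripts may declare their arguments with a different library.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical arguments for a training script; argparse adds -h/--help automatically.
    parser = argparse.ArgumentParser(description="Train a named entity recognizer (sketch).")
    parser.add_argument("-m", "--model", default=None, help="Name or path of a model to start from")
    parser.add_argument("-o", "--output-dir", default=None, help="Directory to save the trained model to")
    parser.add_argument("-n", "--n-iter", type=int, default=100, help="Number of training iterations")
    return parser


def main(argv=None) -> argparse.Namespace:
    args = build_parser().parse_args(argv)
    # A real script would load data and train here; this sketch only echoes the settings.
    print(f"model={args.model} output_dir={args.output_dir} n_iter={args.n_iter}")
    return args


if __name__ == "__main__":
    main()
```

Running such a script with `-h` prints the generated usage line and option descriptions, then exits without training.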

While we try to keep the examples up to date, they are not currently exercised by the test suite, as some of them require significant data downloads or take a long time to train. If you find that an example is no longer running, please tell us! We know there's nothing worse than trying to figure out what you're doing wrong, only to discover that your code was never the problem.