spaCy/spacy
Matthew Honnibal 2a5a61683e Add function to get train format from Doc objects
Our JSON training format is annoying to work with, and we've wanted to
retire it for some time. In the meantime, we can at least add some
missing functions to make it easier to live with.

This patch adds a function that generates the JSON format from a list
of Doc objects, one per paragraph. This should be a convenient way to handle
a lot of data conversions: whatever format you have the source
information in, you can use it to set up a Doc object. This approach
should offer better future-proofing as well. Hopefully, we can steadily
rewrite code that is sensitive to the current data format, so that it
instead goes through this function. Then when we change the data format,
we won't have such a problem.
2018-08-14 13:13:10 +02:00
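
The idea in the commit message above (build annotated documents first, then derive the training JSON from them, one document per paragraph) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the function added in gold.pyx: `ToyToken`, `ToyDoc`, and `toy_docs_to_json` are hypothetical stand-ins written in plain Python, and the output field names (`paragraphs`, `raw`, `sentences`, `tokens`, `orth`, `head`, and so on) only approximate the shape of the JSON training format.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class ToyToken:
    """Hypothetical stand-in for an annotated token (not spaCy's Token)."""
    text: str
    tag: str
    head: int                 # absolute index of the head token in the doc
    dep: str
    sent_start: bool = False  # True if this token opens a new sentence


@dataclass
class ToyDoc:
    """Hypothetical stand-in for a Doc: raw text plus annotated tokens."""
    text: str
    tokens: List[ToyToken]


def toy_docs_to_json(doc_id: int, docs: List[ToyDoc]) -> Dict[str, Any]:
    """Convert a list of docs into one JSON-serializable training entry.

    Each doc becomes one paragraph. Heads are stored as offsets relative
    to the token's own position (an assumption made for this sketch).
    """
    paragraphs = []
    for doc in docs:
        sentences: List[Dict[str, Any]] = []
        for i, tok in enumerate(doc.tokens):
            # Open a new sentence at the first token or at a marked boundary.
            if i == 0 or tok.sent_start:
                sentences.append({"tokens": []})
            sentences[-1]["tokens"].append({
                "id": i,
                "orth": tok.text,
                "tag": tok.tag,
                "head": tok.head - i,
                "dep": tok.dep,
            })
        paragraphs.append({"raw": doc.text, "sentences": sentences})
    return {"id": doc_id, "paragraphs": paragraphs}


# Two toy documents, so the result contains two paragraphs.
example_docs = [
    ToyDoc("I like cats.", [
        ToyToken("I", "PRP", 1, "nsubj"),
        ToyToken("like", "VBP", 1, "ROOT"),
        ToyToken("cats", "NNS", 1, "dobj"),
        ToyToken(".", ".", 1, "punct"),
    ]),
    ToyDoc("Hi. Bye.", [
        ToyToken("Hi", "UH", 0, "ROOT"),
        ToyToken(".", ".", 0, "punct"),
        ToyToken("Bye", "UH", 2, "ROOT", sent_start=True),
        ToyToken(".", ".", 2, "punct"),
    ]),
]
example = toy_docs_to_json(0, example_docs)
print(json.dumps(example, indent=2))
```

The point of routing conversions through a single function like this is the one the commit message makes: source-specific code only has to produce document objects, so a later change to the on-disk format touches one place.
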
..
cli Update develop from master 2018-08-14 03:04:28 +02:00
data
displacy fix issue #2452 - displacy arrow direction is always forward (#2506) (closes #2452) 2018-07-04 14:12:08 +02:00
lang Update develop from master 2018-08-14 03:04:28 +02:00
syntax Make pipeline work on empty docs 2018-06-29 19:21:38 +02:00
tests Add function to get train format from Doc objects 2018-08-14 13:13:10 +02:00
tokens Update develop from master 2018-08-14 03:04:28 +02:00
__init__.pxd
__init__.py Try again to filter warnings 2018-08-10 00:42:54 +02:00
__main__.py Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop" 2018-03-27 19:23:02 +02:00
_align.pyx Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop" 2018-03-27 19:23:02 +02:00
_matcher2_notes.py Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop" 2018-03-27 19:23:02 +02:00
_ml.py Only warn about unnamed vectors if non-zero sized. 2018-05-19 18:51:55 +02:00
about.py Update develop from master 2018-08-14 03:04:28 +02:00
attrs.pxd Fix LANG symbol 2018-02-17 18:10:50 +01:00
attrs.pyx Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop" 2018-03-27 19:23:02 +02:00
compat.py 💫 Rule-based NER component (#2513) 2018-07-18 19:43:16 +02:00
errors.py Allow ignoring warnings and only overwrite if set explicitly 2018-07-20 22:50:19 +02:00
glossary.py
gold.pxd
gold.pyx Add function to get train format from Doc objects 2018-08-14 13:13:10 +02:00
language.py 💫 Rule-based NER component (#2513) 2018-07-18 19:43:16 +02:00
lemmatizer.py Fix lemmatization 2018-07-05 13:56:02 +02:00
lexeme.pxd
lexeme.pyx 💫 Add .similarity warnings for no vectors and option to exclude warnings (#2197) 2018-05-21 01:22:38 +02:00
matcher.pyx Fix compile error in matcher 2018-07-06 12:29:23 +02:00
morphology.pxd Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop" 2018-03-27 19:23:02 +02:00
morphology.pyx Fix lemmatization 2018-07-05 13:56:02 +02:00
parts_of_speech.pxd
parts_of_speech.pyx
pipeline.pxd
pipeline.pyx 💫 Rule-based NER component (#2513) 2018-07-18 19:43:16 +02:00
scorer.py Fix scoring if tokenization changes 2018-05-01 01:33:20 +02:00
strings.pxd
strings.pyx 💫 New system for error messages and warnings (#2163) 2018-04-03 15:50:31 +02:00
structs.pxd
symbols.pxd Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop" 2018-03-27 19:23:02 +02:00
symbols.pyx Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop" 2018-03-27 19:23:02 +02:00
tokenizer.pxd
tokenizer.pyx 💫 Small efficiency fixes to tokenizer (#2587) 2018-07-24 23:35:54 +02:00
typedefs.pxd
typedefs.pyx
util.py Merge branch 'master' into develop 2018-07-21 15:34:18 +02:00
vectors.pyx 💫 New system for error messages and warnings (#2163) 2018-04-03 15:50:31 +02:00
vocab.pxd 💫 Small efficiency fixes to tokenizer (#2587) 2018-07-24 23:35:54 +02:00
vocab.pyx 💫 Small efficiency fixes to tokenizer (#2587) 2018-07-24 23:35:54 +02:00