spaCy/spacy
Matthew Honnibal 2a5a61683e Add function to get train format from Doc objects
Our JSON training format is annoying to work with, and we've wanted to
retire it for some time. In the meantime, we can at least add some
missing functions to make it easier to live with.

This patch adds a function that generates the JSON format from a list
of Doc objects, one per paragraph. This should be a convenient way to handle
a lot of data conversions: whatever format you have the source
information in, you can use it to setup a Doc object. This approach
should offer better future-proofing as well. Hopefully, we can steadily
rewrite code that is sensitive to the current data-format, so that it
instead goes through this function. Then when we change the data format,
we won't have such a problem.
2018-08-14 13:13:10 +02:00
..
cli Update develop from master 2018-08-14 03:04:28 +02:00
data Make spacy/data a package 2017-03-18 20:04:22 +01:00
displacy fix issue #2452 - displacy arrow direction is always forward (#2506) (closes #2452) 2018-07-04 14:12:08 +02:00
lang Update develop from master 2018-08-14 03:04:28 +02:00
syntax Make pipeline work on empty docs 2018-06-29 19:21:38 +02:00
tests Add function to get train format from Doc objects 2018-08-14 13:13:10 +02:00
tokens Update develop from master 2018-08-14 03:04:28 +02:00
__init__.pxd * Seems to be working after refactor. Need to wire up more POS tag features, and wire up save/load of POS tags. 2014-10-24 02:23:42 +11:00
__init__.py Try again to filter warnings 2018-08-10 00:42:54 +02:00
__main__.py Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop" 2018-03-27 19:23:02 +02:00
_align.pyx Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop" 2018-03-27 19:23:02 +02:00
_matcher2_notes.py Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop" 2018-03-27 19:23:02 +02:00
_ml.py Only warn about unnamed vectors if non-zero sized. 2018-05-19 18:51:55 +02:00
about.py Update develop from master 2018-08-14 03:04:28 +02:00
attrs.pxd Fix LANG symbol 2018-02-17 18:10:50 +01:00
attrs.pyx Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop" 2018-03-27 19:23:02 +02:00
compat.py 💫 Rule-based NER component (#2513) 2018-07-18 19:43:16 +02:00
errors.py Allow ignoring warnings and only overwrite if set explicitly 2018-07-20 22:50:19 +02:00
glossary.py Fix typo in glossary (resolves #1964) 2018-02-10 11:58:41 +01:00
gold.pxd Add support for sent_start to GoldParse 2017-08-25 20:03:14 -05:00
gold.pyx Add function to get train format from Doc objects 2018-08-14 13:13:10 +02:00
language.py 💫 Rule-based NER component (#2513) 2018-07-18 19:43:16 +02:00
lemmatizer.py Fix lemmatization 2018-07-05 13:56:02 +02:00
lexeme.pxd WIP on stringstore change. 27 failures 2017-05-28 14:06:40 +02:00
lexeme.pyx 💫 Add .similarity warnings for no vectors and option to exclude warnings (#2197) 2018-05-21 01:22:38 +02:00
matcher.pyx Fix compile error in matcher 2018-07-06 12:29:23 +02:00
morphology.pxd Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop" 2018-03-27 19:23:02 +02:00
morphology.pyx Fix lemmatization 2018-07-05 13:56:02 +02:00
parts_of_speech.pxd Add support for Universal Dependencies v2.0 2017-03-03 13:17:34 +01:00
parts_of_speech.pyx Tidy up rest 2017-10-27 21:07:59 +02:00
pipeline.pxd Fix names of pipeline components 2017-10-26 12:38:23 +02:00
pipeline.pyx 💫 Rule-based NER component (#2513) 2018-07-18 19:43:16 +02:00
scorer.py Fix scoring if tokenization changes 2018-05-01 01:33:20 +02:00
strings.pxd Try to fix StringStore clean up (see #1506) 2017-11-11 03:11:27 +03:00
strings.pyx 💫 New system for error messages and warnings (#2163) 2018-04-03 15:50:31 +02:00
structs.pxd Make TokenC.sent_tart an int, to allow ternary value 2017-10-08 19:58:54 +02:00
symbols.pxd Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop" 2018-03-27 19:23:02 +02:00
symbols.pyx Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop" 2018-03-27 19:23:02 +02:00
tokenizer.pxd Disable tokenizer cache for special-cases. Fixes #1250 2017-10-24 16:08:05 +02:00
tokenizer.pyx 💫 Small efficiency fixes to tokenizer (#2587) 2018-07-24 23:35:54 +02:00
typedefs.pxd Work on changing StringStore to return hashes. 2017-05-28 12:36:27 +02:00
typedefs.pyx Tidy up rest 2017-10-27 21:07:59 +02:00
util.py Merge branch 'master' into develop 2018-07-21 15:34:18 +02:00
vectors.pyx 💫 New system for error messages and warnings (#2163) 2018-04-03 15:50:31 +02:00
vocab.pxd 💫 Small efficiency fixes to tokenizer (#2587) 2018-07-24 23:35:54 +02:00
vocab.pyx 💫 Small efficiency fixes to tokenizer (#2587) 2018-07-24 23:35:54 +02:00