Matthew Honnibal
c647a0d33e
Fix training counter for gold preprocessing
2017-06-03 14:33:39 -05:00
Matthew Honnibal
e62f46d39f
Clarify gold.pyx slightly
2017-06-03 13:28:52 -05:00
Matthew Honnibal
be4a640f0c
Fix arc eager label costs for uint64
2017-05-30 20:37:58 +02:00
Matthew Honnibal
84e66ca6d4
WIP on stringstore change. 27 failures
2017-05-28 14:06:40 +02:00
Matthew Honnibal
d06f235fc9
Fix conflict on convert.py
2017-05-26 11:33:29 -05:00
Matthew Honnibal
2e587c6417
Export iob_to_biluo utility
2017-05-26 11:32:55 -05:00
Matthew Honnibal
daac3e3573
Always shuffle gold data, and support length cap
2017-05-26 11:30:52 -05:00
Matthew Honnibal
3a6e59cc53
Add minibatch function in spacy.gold
2017-05-25 17:15:09 -05:00
Matthew Honnibal
3959d778ac
Revert "Revert "WIP on improving parser efficiency""
This reverts commit 532afef4a8.
2017-05-23 03:06:53 -05:00
Matthew Honnibal
532afef4a8
Revert "WIP on improving parser efficiency"
This reverts commit bdaac7ab44.
2017-05-23 03:05:25 -05:00
Matthew Honnibal
bdaac7ab44
WIP on improving parser efficiency
2017-05-23 02:59:31 -05:00
Matthew Honnibal
c9760b2104
Support sentence limits in GoldCorpus
2017-05-22 10:40:46 -05:00
ines
54f04a9fe0
Update API docs with changes in spacy.gold and spacy.language
2017-05-22 12:29:30 +02:00
Matthew Honnibal
2a5eb9f61e
Make nonproj methods top-level functions, instead of class methods
2017-05-22 04:51:08 -05:00
Matthew Honnibal
025d9bbc37
Fix handling of non-projective deps
2017-05-22 04:51:08 -05:00
Matthew Honnibal
f13d6c7359
Support gold preprocessing and single gold files
2017-05-22 04:51:08 -05:00
Matthew Honnibal
5db89053aa
Merge docstrings
2017-05-21 13:46:23 -05:00
Matthew Honnibal
432b3499b3
Fix memory leak
2017-05-21 13:38:46 -05:00
Matthew Honnibal
4803b3b69e
Add GoldCorpus class, to manage data streaming
2017-05-21 09:06:17 -05:00
ines
075f5ff87a
Update docstrings and API docs for GoldParse
2017-05-21 13:53:46 +02:00
Matthew Honnibal
fc8d3a112c
Add util.env_opt support: Can set hyper params through environment variables.
2017-05-18 04:36:53 -05:00
Matthew Honnibal
793430aa7a
Get spaCy train command working with neural network
* Integrate models into pipeline
* Add basic serialization (maybe incorrect)
* Fix pickle on vocab
2017-05-17 12:04:50 +02:00
Matthew Honnibal
89a4f262fc
Fix training methods
2017-04-16 13:00:37 -05:00
ines
e1efd589c3
Fix json imports and use ujson
2017-04-15 12:13:34 +02:00
ines
958b12dec8
Use pathlib instead of os.path
2017-04-15 12:13:00 +02:00
ines
d24589aa72
Clean up imports, unused code, whitespace, docstrings
2017-04-15 12:05:47 +02:00
ines
561f2a3eb4
Use consistent formatting for docstrings
2017-04-15 11:59:21 +02:00
Raphaël Bournhonesque
f332bf05be
Remove unused import statements
2017-03-21 21:08:54 +01:00
Matthew Honnibal
2611ac2a89
Fix scorer bug for NER, related to ambiguity between missing annotations and misaligned tokens
2017-03-16 09:38:28 -05:00
Matthew Honnibal
3d4e389d23
Whitespace
2017-03-15 09:29:42 -05:00
Matthew Honnibal
159e8c46e1
Merge old training fixes with newer state
2016-11-25 09:16:36 -06:00
Matthew Honnibal
cc7e607a8a
Fix gold.pyx for 1.0
2016-11-25 08:57:59 -06:00
Matthew Honnibal
b86f8af0c1
Fix doc strings
2016-11-01 12:25:36 +01:00
Matthew Honnibal
f5fe4f595b
Fix json loading, for Python 3.
2016-10-20 21:23:26 +02:00
Matthew Honnibal
52b48b415e
Fix GoldParse class
2016-10-16 11:41:36 +02:00
Matthew Honnibal
0317cea0ad
Fix GoldParse
2016-10-15 23:55:07 +02:00
Matthew Honnibal
a48aa15384
Improve the API for the GoldParse class.
2016-10-15 23:53:29 +02:00
Matthew Honnibal
e07fe92b27
Draft a refactored init for the GoldParse class
2016-10-15 22:09:52 +02:00
Matthew Honnibal
86ae665c78
Add function for entity->biluo transformation
2016-10-15 21:51:04 +02:00
Matthew Honnibal
645d99523a
Move merge_sents method into spacy.gold
2016-10-13 03:24:29 +02:00
Matthew Honnibal
ea23b64cc8
Refactor training, with new spacy.train module. Defaults still a little awkward.
2016-10-09 12:24:24 +02:00
Wolfgang Seeker
b6b96b233c
don't require read_json_file to expect particular annotations
2016-05-02 15:29:30 +02:00
Wolfgang Seeker
4d7f393fae
don't require json-files to have syntactic annotation
2016-04-22 16:32:27 +02:00
Henning Peters
6215272786
remove ujson as default non-dev dependency (still works as fallback if installed), because ujson doesn't ship wheels
2016-04-12 11:28:07 +02:00
Wolfgang Seeker
690c5acabf
adjust train.py to train both english and german models
2016-03-03 15:21:00 +01:00
Wolfgang Seeker
3448cb40a4
integrated pseudo-projective parsing into parser
- nonproj.pyx holds a class PseudoProjectivity which currently holds
all functionality to implement Nivre & Nilsson 2005's pseudo-projective
parsing using the HEAD decoration scheme
- changed lefts/rights in Token to account for possible non-projective
structures
2016-03-01 10:09:08 +01:00
Wolfgang Seeker
4b2297d5d4
add class PseudoProjective for pseudo-projective parsing
PseudoProjective() implements the algorithm from Nivre & Nilsson 2005
using their HEAD decoration scheme.
2016-02-24 11:26:25 +01:00
Wolfgang Seeker
8d531c958b
replace tests for non-projectivity
- add functions to find non-projective edges
- add test file for non-projectivity functions
2016-02-22 14:40:40 +01:00
Matthew Honnibal
83dccf0fd7
* Use io module instead of deprecated codecs module
2015-10-10 14:13:01 +11:00
alvations
8caedba42a
caught more codecs.open -> io.open
2015-09-30 20:20:09 +02:00
Matthew Honnibal
7606d9936f
* Python3 correction for GoldParse
2015-07-28 14:44:53 +02:00
Matthew Honnibal
f4809e562f
* Allow json to be used as a fallback if ujson is not available
2015-07-25 18:11:36 +02:00
Matthew Honnibal
2ae0b439b2
* Fix space check in gold.pyx
2015-07-14 00:10:27 +02:00
Matthew Honnibal
89a91ad726
* Add SPACE part-of-speech tag, and train tagger to assign it. Also train tagger not to make whitespace an entity
2015-07-09 13:30:41 +02:00
Matthew Honnibal
43ef5ddea5
* Ensure root label is spelled ROOT, for backwards compatibility
2015-06-23 04:14:03 +02:00
Matthew Honnibal
46fb24e9fd
* Add cycle-checking code in gold.pyx
2015-06-23 00:02:22 +02:00
Matthew Honnibal
b643cb3d5c
* Allow training documents to be filtered in gold.pyx
2015-06-12 02:42:08 +02:00
Matthew Honnibal
00a0dfcb59
* Avoid shipping the spacy.munge package
2015-06-08 00:54:13 +02:00
Matthew Honnibal
89b8775887
* Fix output from _min_edit_path when inputs match.
2015-06-06 05:58:53 +02:00
Matthew Honnibal
ae653b850a
* Remove unused import from gold.pyx
2015-06-03 06:07:15 +02:00
Matthew Honnibal
a513ec500f
* Have oracle functions take a struct instead of a Python object
2015-06-02 20:01:06 +02:00
Matthew Honnibal
87d6551d19
* Allow gold parse to cut non-projective arcs
2015-05-31 01:11:56 +02:00
Matthew Honnibal
9e39a206da
* Fix efficiency of JSON reading, by using ujson instead of stream
2015-05-30 17:54:52 +02:00
Matthew Honnibal
76300bbb1b
* Use updated JSON format, with sentences below paragraphs. Allows use of gold preprocessing flag.
2015-05-30 01:25:46 +02:00
Matthew Honnibal
b76bbbd12c
* Read json files recursively from a directory, instead of requiring a single .json file
2015-05-29 03:52:55 +02:00
Matthew Honnibal
7a2725bca4
* Read input json in a streaming way
2015-05-27 19:13:11 +02:00
Matthew Honnibal
6016ee83a6
* Fix reading of NER in gold.pyx
2015-05-27 03:17:50 +02:00
Matthew Honnibal
3593babd35
* Add functions for Levenshtein distance alignment
2015-05-24 21:50:48 +02:00
Matthew Honnibal
fc75210941
* Move spacy.syntax.conll to spacy.gold
2015-05-24 21:35:02 +02:00