Commit Graph

3117 Commits

Author SHA1 Message Date
Matthew Honnibal
4c9202249d Refactor training, to fix memory leak 2017-05-21 09:07:06 -05:00
Matthew Honnibal
4803b3b69e Add GoldCorpus class, to manage data streaming 2017-05-21 09:06:17 -05:00
Matthew Honnibal
180e5afede Fix tokvecs flattening in pipeline 2017-05-21 09:05:34 -05:00
Matthew Honnibal
0731971bfc Add itershuffle utility function. Maybe belongs in thinc 2017-05-21 09:05:05 -05:00
ines
2c5cfe8bbf Update docstrings and API docs for StringStore 2017-05-21 14:18:58 +02:00
ines
251346b59f Fix typos and formatting 2017-05-21 14:18:46 +02:00
ines
075f5ff87a Update docstrings and API docs for GoldParse 2017-05-21 13:53:46 +02:00
ines
99b631617d Reformat docstrings 2017-05-21 13:32:15 +02:00
ines
885e82c9b0 Update docstrings and remove deprecated load classmethod 2017-05-21 13:27:52 +02:00
ines
c5a653fa48 Update docstrings and API docs for Tokenizer 2017-05-21 13:18:14 +02:00
ines
f216422ac5 Remove deprecated load classmethod 2017-05-21 13:18:01 +02:00
ines
d82ae9a585 Change "function" to "callable" in docs 2017-05-21 13:17:40 +02:00
ines
3871157d84 Update spacy.util documentation 2017-05-21 01:12:09 +02:00
ines
0c6c65aa3c Improve messaging if model linking fails after download 2017-05-21 00:28:37 +02:00
Matthew Honnibal
3b7c108246 Pass tokvecs through as a list, instead of concatenated. Also fix padding 2017-05-20 13:23:32 -05:00
ines
924e8506de Move Defaults subclass to module scope (necessary for pickling) 2017-05-20 19:02:27 +02:00
Matthew Honnibal
d52b65aec2 Revert "Move to contiguous buffer for token_ids and d_vectors"
This reverts commit 3ff8c35a79.
2017-05-20 11:26:23 -05:00
ines
27de0834b2 Update docstrings and API docs for Lexeme 2017-05-20 15:13:42 +02:00
ines
7ed8a92ed1 Update docstrings and API docs for Token 2017-05-20 15:13:33 +02:00
ines
4ed6a36622 Update docstrings and API docs for Matcher 2017-05-20 14:43:10 +02:00
ines
39f36539f6 Update docstrings and API docs for Matcher 2017-05-20 14:32:34 +02:00
ines
c00ff257be Update docstrings and API docs for Matcher 2017-05-20 14:26:10 +02:00
ines
790435e51c Update docstrings 2017-05-20 14:05:07 +02:00
ines
f0cc642bb9 Update docstrings and API docs for Vocab 2017-05-20 14:00:41 +02:00
Matthew Honnibal
ce9234f593 Update Matcher API 2017-05-20 13:54:53 +02:00
Matthew Honnibal
b272890a8c Try to move parser to simpler PrecomputedAffine class. Currently broken -- maybe the previous change 2017-05-20 06:40:10 -05:00
ines
e39ad78267 Resolve model name properly in cli.info
Use util.resolve_model_path() to also allow package names and paths.
2017-05-20 12:24:40 +02:00
Matthew Honnibal
3ff8c35a79 Move to contiguous buffer for token_ids and d_vectors 2017-05-20 04:17:30 -05:00
Matthew Honnibal
8b04b0af9f Remove freqs from transition_system 2017-05-20 02:20:48 -05:00
Matthew Honnibal
61fe55efba Move EnglishDefaults class out of English 2017-05-20 02:18:19 -05:00
Matthew Honnibal
a1ba20e2b1 Fix over-run on parse_batch 2017-05-19 18:57:30 -05:00
ines
1d4d3d0ecd Add TODO 2017-05-20 01:38:04 +02:00
Matthew Honnibal
7ee1827af0 Disable data caching in parser 2017-05-19 18:17:11 -05:00
Matthew Honnibal
e84de028b5 Remove 'rebatch' op, and remove min-batch cap 2017-05-19 18:16:36 -05:00
Matthew Honnibal
3376d4d6e8 Update the train script, fixing GPU memory leak 2017-05-19 18:15:50 -05:00
Matthew Honnibal
836fe1d880 Update neural net tests 2017-05-19 18:11:29 -05:00
ines
fe5d8819ea Update Matcher docstrings and API docs 2017-05-19 21:47:06 +02:00
Matthew Honnibal
08766240c3 Add incomplete iob converter 2017-05-19 13:27:51 -05:00
Matthew Honnibal
c12ab47a56 Remove state argument in pipeline. Other changes 2017-05-19 13:26:36 -05:00
Matthew Honnibal
66ea9aebe7 Remove the state argument from Language 2017-05-19 13:25:42 -05:00
Matthew Honnibal
09a877886b WIP on iob converter 2017-05-19 13:24:39 -05:00
ines
a804045597 Use is_ancestor instead of deprecated is_ancestor_of 2017-05-19 20:23:40 +02:00
Matthew Honnibal
8d5e6d9f4f Rename no_ner arg to no_entities 2017-05-19 13:23:11 -05:00
ines
e9e62b01b0 Update docstrings and API docs for Token 2017-05-19 18:47:56 +02:00
ines
62ceec4fc6 Update docstrings and API docs for Span 2017-05-19 18:47:46 +02:00
ines
23f9a3ccc8 Update docstrings and API docs for Doc 2017-05-19 18:47:39 +02:00
ines
2c8c9dc0c9 Update docstrings and API docs for Language 2017-05-19 18:47:24 +02:00
ines
0791f0aae6 Update docstrings and API docs for Span class 2017-05-19 00:31:31 +02:00
ines
8455cb1327 Update docstring for Doc.__getitem__ 2017-05-19 00:30:51 +02:00
ines
0fc05e54e4 Document TokenVectorEncoder 2017-05-19 00:00:02 +02:00
ines
b687ad109d Update docstrings and API docs for Doc class 2017-05-18 23:59:44 +02:00
ines
d42bc16868 Update docstrings and API docs for Language class 2017-05-18 23:57:38 +02:00
ines
593361ee3c Update docstrings for Span class 2017-05-18 22:17:41 +02:00
ines
b87066ff10 Update docstrings and API docs for Doc class 2017-05-18 22:17:41 +02:00
Matthew Honnibal
238be0f16a Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-05-18 08:32:22 -05:00
Matthew Honnibal
c214c0decb Improve env_opt reporting 2017-05-18 08:32:03 -05:00
Matthew Honnibal
bbb59e371c Fix GPU evaluation 2017-05-18 08:31:15 -05:00
Matthew Honnibal
c2c825127a Fix use_params and pipe methods 2017-05-18 08:30:59 -05:00
Matthew Honnibal
ca70b08661 Fix GPU training and evaluation 2017-05-18 08:30:33 -05:00
ines
489d2fb4ba Add is_in_jupyter() helper for displaCy (see #1058) 2017-05-18 14:13:14 +02:00
ines
abf0188b0a Move cupy and CudaStream to compat 2017-05-18 14:12:45 +02:00
ines
33decd85b6 Reorganise and explicitly state what's importable 2017-05-18 14:12:31 +02:00
Matthew Honnibal
a438cef8c5 Fix significant bug in feature calculation -- off by 1 2017-05-18 06:21:32 -05:00
Matthew Honnibal
fc8d3a112c Add util.env_opt support: Can set hyper params through environment variables. 2017-05-18 04:36:53 -05:00
Matthew Honnibal
d2626fdb45 Fix name error in nn parser 2017-05-18 04:31:01 -05:00
Matthew Honnibal
b460533827 Bug fixes to pipeline 2017-05-18 04:29:51 -05:00
Matthew Honnibal
8815507f8e Move SpanishDefaults out of Language class, for pickle 2017-05-18 04:28:51 -05:00
Matthew Honnibal
2713041571 Fix GPU usage in Language 2017-05-18 04:25:19 -05:00
Matthew Honnibal
711ad5edc4 Cache features in doc2feats 2017-05-18 04:22:20 -05:00
Matthew Honnibal
39ea38c4b1 Add option to use gpu to spacy train 2017-05-18 04:21:49 -05:00
Matthew Honnibal
a1d8e420b5 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-05-17 08:00:04 -05:00
Matthew Honnibal
edfea3a513 Fix progress bar 2017-05-17 14:59:37 +02:00
Matthew Honnibal
0b7fd67408 Fix style check in displacy 2017-05-17 07:57:24 -05:00
Matthew Honnibal
55dab77de8 Add conversion rule for .conll 2017-05-17 13:13:48 +02:00
Matthew Honnibal
692bd2a186 Bug fix to tagger: wasnt backproping to token vectors 2017-05-17 13:13:14 +02:00
Matthew Honnibal
877f83807f Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-05-17 12:09:29 +02:00
Matthew Honnibal
793430aa7a Get spaCy train command working with neural network
* Integrate models into pipeline
* Add basic serialization (maybe incorrect)
* Fix pickle on vocab
2017-05-17 12:04:50 +02:00
Matthew Honnibal
3bf4a28d8d Use tag in CoNLL converter, not POS 2017-05-17 12:04:33 +02:00
ines
1a05078c79 Add language-specific syntax iterators to en and de 2017-05-17 12:04:03 +02:00
Matthew Honnibal
c9a5d5d24b Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-05-16 16:22:05 +02:00
Matthew Honnibal
8cf097ca88 Redesign training to integrate NN components
* Obsolete .parser, .entity etc names in favour of .pipeline
* Components no longer create models on initialization
* Models created by loading method (from_disk(), from_bytes() etc), or
    .begin_training()
* Add .predict(), .set_annotations() methods in components
* Pass state through pipeline, to allow components to share information
    more flexibly.
2017-05-16 16:17:30 +02:00
Matthew Honnibal
221b4c1ee8 Fix test for Python 3 2017-05-16 13:06:30 +02:00
Matthew Honnibal
5211645af3 Get data flowing through pipeline. Needs redesign 2017-05-16 11:21:59 +02:00
Matthew Honnibal
1d7c18e58a Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-05-15 21:53:47 +02:00
Matthew Honnibal
a9edb3aa1d Improve integration of NN parser, to support unified training API 2017-05-15 21:53:27 +02:00
ines
98354be150 Only get user_data if it exists on doc 2017-05-15 13:39:47 +02:00
ines
c33bdeb564 Use uppercase for entity types 2017-05-15 01:24:57 +02:00
ines
4aaa607b8d Add xmlns:xlink so SVGs are rendered properly as individual files 2017-05-14 19:54:13 +02:00
ines
9dd13cd76a Update docstrings 2017-05-14 19:30:47 +02:00
ines
a04550605a Add Jupyter notebook support (see #1058) 2017-05-14 18:39:01 +02:00
ines
c31792aaec Add displaCy visualisers (see #1058) 2017-05-14 17:50:23 +02:00
ines
b462076d80 Merge load_lang_class and get_lang_class 2017-05-14 01:31:10 +02:00
ines
36bebe7164 Update docstrings 2017-05-14 01:30:29 +02:00
Matthew Honnibal
4b9d69f428 Merge branch 'v2' into develop
* Move v2 parser into nn_parser.pyx
* New TokenVectorEncoder class in pipeline.pyx
* New spacy/_ml.py module

Currently the two parsers live side-by-side, until we figure out how to
organize them.
2017-05-14 01:10:23 +02:00
Matthew Honnibal
5cac951a16 Move new parser to nn_parser.pyx, and restore old parser, to make tests pass. 2017-05-14 00:55:01 +02:00
Matthew Honnibal
f8c02b4341 Remove cupy imports from parser, so it can work on CPU 2017-05-14 00:37:53 +02:00
Matthew Honnibal
613ba79e2e Fiddle with sizings for parser 2017-05-13 17:20:23 -05:00
Matthew Honnibal
e6d71e1778 Small fixes to parser 2017-05-13 17:19:04 -05:00
Matthew Honnibal
188c0f6949 Clean up unused import 2017-05-13 17:18:27 -05:00
Matthew Honnibal
f85c8464f7 Draft support of regression loss in parser 2017-05-13 17:17:27 -05:00