Commit Graph

4528 Commits

Author SHA1 Message Date
ines
3cef901834 Add tag map for French and Italian 2017-11-04 23:32:51 +01:00
Matthew Honnibal
cfb83c231c Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-11-04 23:08:19 +01:00
Matthew Honnibal
d185927998 Undo harmful pickling hacks on Language class 2017-11-04 23:07:03 +01:00
ines
6c15aafebd Fix formatting 2017-11-04 23:07:02 +01:00
Matthew Honnibal
3ca16ddbd4 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-11-04 00:25:02 +01:00
Matthew Honnibal
e4ec4be948 Fix parser test 2017-11-04 00:23:45 +01:00
Matthew Honnibal
98c29b7912 Add padding vector in parser, to make gradient more correct 2017-11-04 00:23:23 +01:00
ines
5e7d98f72a Remove test for #1491 2017-11-03 22:10:57 +01:00
ines
718f1c50fb Add regression test for #1491 2017-11-03 21:11:20 +01:00
Matthew Honnibal
144a93c2a5 Back-off to tensor for similarity if no vectors 2017-11-03 20:56:33 +01:00
Matthew Honnibal
1e9634691a Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-11-03 20:21:15 +01:00
Matthew Honnibal
13c8881d2f Expose parser's tok2vec model component 2017-11-03 20:20:59 +01:00
Matthew Honnibal
17c63906f9 Update tensorizer component 2017-11-03 20:20:26 +01:00
Matthew Honnibal
2bf21cbe29 Update model after optimising it instead of waiting 2017-11-03 20:20:01 +01:00
Matthew Honnibal
d6e831bf89 Fix lemmatizer tests 2017-11-03 19:46:34 +01:00
ines
eef930c73e Assert instead of print 2017-11-03 18:50:57 +01:00
ines
f0986df94b Add test for #1488 (passes on v2.0.0a18?) 2017-11-03 14:44:36 +01:00
Matthew Honnibal
711278b667 Make test less flakey 2017-11-03 14:36:08 +01:00
Matthew Honnibal
7fea845374 Remove print statement 2017-11-03 14:04:51 +01:00
Matthew Honnibal
0a534ae96a Fix test for backprop d_pad 2017-11-03 14:04:16 +01:00
Matthew Honnibal
33bd2428db Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-11-03 13:29:56 +01:00
Matthew Honnibal
6681058abd Fix tensor extending in tagger 2017-11-03 13:29:36 +01:00
Matthew Honnibal
bd2cbdfa85 Make Morphology not fail on unknown tags 2017-11-03 13:29:09 +01:00
Matthew Honnibal
c9b118a7e9 Set softmax attr in tagger model 2017-11-03 11:22:01 +01:00
Matthew Honnibal
a5b05f85f0 Set Doc.tensor attribute in parser 2017-11-03 11:21:00 +01:00
Matthew Honnibal
62ed58935a Add Doc.extend_tensor() method 2017-11-03 11:20:31 +01:00
Matthew Honnibal
d6fc39c8a6 Set Doc.tensor from Tagger 2017-11-03 11:20:05 +01:00
Matthew Honnibal
b3264aa5f0 Expose the softmax layer in the tagger model, to allow setting tensors 2017-11-03 11:19:51 +01:00
Matthew Honnibal
c2bbf076a4 Add document length cap for training 2017-11-03 01:54:54 +01:00
Matthew Honnibal
6771780d3f Fix backprop of padding variable 2017-11-03 01:54:34 +01:00
Matthew Honnibal
54a716f2ec Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-11-03 00:55:20 +01:00
Matthew Honnibal
260e6ee3fb Improve efficiency of backprop of padding variable 2017-11-03 00:49:11 +01:00
Matthew Honnibal
a22f96c3f1 Add test for backpropagating padding 2017-11-03 00:48:54 +01:00
ines
9baab241b4 Add skeleton language data for Turkish 2017-11-02 16:32:24 +01:00
ines
c6fea3e5f6 Add Romanian and Croatian skeletons (experimental)
Add language data templates to make it easier for others to contribute to the language support
2017-11-01 23:04:28 +01:00
ines
18c859500b Add missing imports 2017-11-01 23:02:51 +01:00
ines
819e30a26e Tidy up tokenizer exceptions 2017-11-01 23:02:45 +01:00
ines
3af281a334 Update test model name 2017-11-01 23:02:00 +01:00
Matthew Honnibal
b30dd36179 Allow Tagger.add_label() before training 2017-11-01 21:49:24 +01:00
Matthew Honnibal
eca41f0cf6 Fix filename conversion for conllu 2017-11-01 21:26:49 +01:00
Matthew Honnibal
e237472cdc Fix tag and filename conversion for conllu 2017-11-01 21:25:33 +01:00
Matthew Honnibal
b84d99b281 Revert tagger.add_label() changes, to fix model 2017-11-01 21:10:45 +01:00
Matthew Honnibal
f5855e539b Fix tagger model loading 2017-11-01 20:42:36 +01:00
Matthew Honnibal
624644adfe Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-11-01 20:26:41 +01:00
ines
5f661a1b3a Remove tensorizer from pre-set pipe_names 2017-11-01 19:48:33 +01:00
Matthew Honnibal
190522efd3 Fix tagger when some tags aren't in Morphology 2017-11-01 19:27:49 +01:00
Matthew Honnibal
e85e31cfbd Fix backprop of d_pad 2017-11-01 19:27:26 +01:00
Matthew Honnibal
759cc79185 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-11-01 19:00:19 +01:00
Matthew Honnibal
1ae40b50b4 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-11-01 17:07:02 +01:00
Matthew Honnibal
7ae1aacdb8 Fix add_label methods 2017-11-01 17:06:43 +01:00
ines
8c2260e18c Move span tests to /doc 2017-11-01 16:56:35 +01:00
Matthew Honnibal
2ef7b59eb0 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-11-01 16:51:41 +01:00
ines
1d1f91a041 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-11-01 16:49:44 +01:00
ines
9659391944 Update deprecated methods and add warnings 2017-11-01 16:49:42 +01:00
ines
260cb37224 Catch deprecation warning 2017-11-01 16:49:18 +01:00
ines
5914faafbb Fix .merge tests to not use deprecated API 2017-11-01 16:49:11 +01:00
ines
705a4e3e4a Fix formatting 2017-11-01 16:44:08 +01:00
Matthew Honnibal
d17a12c71d Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-11-01 16:38:26 +01:00
Matthew Honnibal
9f9439667b Don't create low-data text classifier if no vectors 2017-11-01 16:34:09 +01:00
Matthew Honnibal
e7a9174877 Add add_label methods to Tagger and TextCategorizer 2017-11-01 16:32:44 +01:00
ines
39e0586192 Add deprecated helper
Uses warning to show DeprecationWarning and custom stack trace
2017-11-01 16:32:36 +01:00
Matthew Honnibal
a7bf38bf31 Remove misleading comment on util.get_cuda_stream() 2017-11-01 13:57:25 +01:00
Matthew Honnibal
273e96b63f Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-11-01 13:27:35 +01:00
Matthew Honnibal
9e0ebee81c Add Token.is_sent_start property, so can deprecate Token.sent_start 2017-11-01 13:27:14 +01:00
Matthew Honnibal
7e7116cdf7 Fix Doc.to_array when only one string attr provided 2017-11-01 13:26:43 +01:00
Matthew Honnibal
301fb2bb60 Implement Span.n_lefts and Span.n_rights 2017-11-01 13:25:12 +01:00
Matthew Honnibal
c047498f87 Fix vectors test 2017-11-01 13:24:47 +01:00
ines
9a5e7c6fe2 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-11-01 13:14:45 +01:00
ines
bfe17b7df1 Fix begin_training if get_gold_tuples is None 2017-11-01 13:14:31 +01:00
ines
affd3404ab Remove old model command (now "vocab") 2017-11-01 13:14:03 +01:00
Matthew Honnibal
fdb4b8e456 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-11-01 02:07:17 +01:00
Matthew Honnibal
c48dd0e1d3 Fix vector pruning 2017-11-01 02:06:58 +01:00
ines
37e62ab0e2 Update vector meta in meta.json 2017-11-01 01:25:09 +01:00
ines
96b4aef0bf Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-11-01 01:10:53 +01:00
Matthew Honnibal
86eba61fae Fix token.vector when vectors are missing 2017-11-01 00:47:35 +01:00
ines
5683fd65ed Update docstrings 2017-11-01 00:42:39 +01:00
Matthew Honnibal
44bce8e53f Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-11-01 00:35:16 +01:00
Matthew Honnibal
c16310d156 Update vectors with find method 2017-11-01 00:34:55 +01:00
Ines Montani
d11659463b Merge pull request #1152 from jimregan/develop-irish
[WIP] attempt a port from #1147
2017-11-01 00:23:43 +01:00
ines
2ad2f09d12 Update docstrings and simplify most_similar 2017-11-01 00:18:08 +01:00
Jim O'Regan
08b0bfd153 merge 2017-10-31 22:55:59 +00:00
Jim O'Regan
00ecfa5417 Ó, not O 2017-10-31 22:54:42 +00:00
ines
ba2e6c8c6f Update docstrings and formatting 2017-10-31 23:23:34 +01:00
Matthew Honnibal
0de8d213a3 Merge pull request #1475 from explosion/feature/sm-vectors
Improve and simplify Vectors class
2017-10-31 22:59:50 +01:00
Ines Montani
25b1d6cd91 Fix syntax error 2017-10-31 22:36:03 +01:00
Matthew Honnibal
92dc127569 Fix test for Python 3 2017-10-31 22:21:55 +01:00
Jim O'Regan
fe4b10346a replace example sentence until I get around to adding a punctuation.py 2017-10-31 20:24:53 +00:00
Matthew Honnibal
c5799ecc7b Remove print statement 2017-10-31 21:12:33 +01:00
ines
7e424a1804 Don't copy exception dicts if not necessary and tidy up 2017-10-31 21:05:29 +01:00
Matthew Honnibal
c390f2d745 Make it easier to pass explicit no-pruning to vocab 2017-10-31 20:14:47 +01:00
Ines Montani
06c25a8882 Remove comma that caused list to wrap in tuple!
Also removed extra dict wrappings for performance (we used to have them in there, but they should only really exist if copying the dict is absolutely necessary)
2017-10-31 20:13:16 +01:00
Matthew Honnibal
d90a22afe6 Fix loading previous vectors models 2017-10-31 19:58:35 +01:00
Ines Montani
147448b65b Add missing symbols 2017-10-31 19:34:45 +01:00
Matthew Honnibal
997a61557a Add vectors.n_keys property 2017-10-31 19:30:52 +01:00
Matthew Honnibal
8075726838 Restore vector usage in models 2017-10-31 19:21:17 +01:00
Matthew Honnibal
3659a807b0 Remove vector pruning arg from train CLI 2017-10-31 19:21:05 +01:00
Ines Montani
9b0de9fb43 Fix import of symbols (now nested one level lower) 2017-10-31 19:17:58 +01:00
Matthew Honnibal
59203a2e8a Move vector pruning command into spacy vocab cli tool 2017-10-31 19:10:01 +01:00
Matthew Honnibal
77d8f5de9a Revise and simplify Vectors class 2017-10-31 18:25:08 +01:00
Jim O'Regan
d4a8160c36 change quotes 2017-10-31 15:15:44 +00:00
Jim O'Regan
34ca59691b no idea what is wrong here 2017-10-31 14:50:13 +00:00
Jim O'Regan
41dd29e48e merge 2017-10-31 14:07:45 +00:00
Matthew Honnibal
cb5217012f Fix vector remapping 2017-10-31 11:40:46 +01:00
Matthew Honnibal
9c11ee4a1c WIP on vectors fixes 2017-10-31 11:22:56 +01:00
Matthew Honnibal
ce876c551e Fix GPU usage 2017-10-31 02:33:34 +01:00
Matthew Honnibal
7698903617 Fix GPU usage 2017-10-31 02:33:16 +01:00
Matthew Honnibal
368fdb389a WIP on refactoring and fixing vectors 2017-10-31 02:00:26 +01:00
Matthew Honnibal
4e3006cec7 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-10-30 19:44:58 +01:00
Matthew Honnibal
4112a991ec Fix vector pruning 2017-10-30 19:44:40 +01:00
ines
ec657c1ddc Update vocab docs and document Vocab.prune_vectors 2017-10-30 19:35:41 +01:00
ines
803e41bc66 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-10-30 18:39:51 +01:00
ines
8e02294241 Add vectors to Language.meta 2017-10-30 18:39:48 +01:00
ines
abf8aa05d3 Populate --create-meta defaults from file if available
If meta.json is found in directory and user chooses to overwrite it, show existing data as defaults.
2017-10-30 18:39:38 +01:00
ines
ce98fa7934 Fix formatting 2017-10-30 18:38:55 +01:00
ines
98c35d2585 Fix spacy vocab command 2017-10-30 18:38:41 +01:00
Matthew Honnibal
e98451b5f7 Add -prune-vectors argument to spacy.cly.train 2017-10-30 18:00:10 +01:00
Matthew Honnibal
e026b29ea9 Add prune_vectors method to Vocab 2017-10-30 17:59:43 +01:00
Explosion Bot
d0cf12c8c7 Fix off-by-one error in vectors 2017-10-30 16:22:03 +01:00
Explosion Bot
05a1dd570e Fix vocab script 2017-10-30 16:19:22 +01:00
Explosion Bot
b46bdce8d2 Add missing import 2017-10-30 16:18:10 +01:00
Explosion Bot
2d2cc294b4 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-10-30 16:15:05 +01:00
Explosion Bot
0fc1209421 Wire up new vocab command 2017-10-30 16:14:50 +01:00
Explosion Bot
aa64031751 Fix clear_vectors() method on Vocab 2017-10-30 16:09:04 +01:00
Explosion Bot
7b56b2f04b Add Vocab.cfg attr, to hold stuff like oov probs 2017-10-30 16:08:50 +01:00
Explosion Bot
ab5d5ed880 Fix vectors.add() 2017-10-30 16:08:09 +01:00
Explosion Bot
41d0f1665a Fix add_attrs for cluster 2017-10-30 16:07:50 +01:00
ines
5453821a9f Update NER annotation scheme
Add note on training data sources and include coarse-grained Wikipedia scheme
2017-10-30 13:53:49 +01:00
Explosion Bot
5ede7cec9b Improve Lexeme.set_attrs method 2017-10-30 11:49:11 +01:00
Explosion Bot
72aea8f105 Update vectors.add() to allow setting keys to rows 2017-10-30 10:03:08 +01:00
Matthew Honnibal
c43cc5361d Merge pull request #1467 from explosion/feature/better-parser
💫 Bug fixes to parser model (requires retraining)
2017-10-29 02:05:22 +02:00
ines
6c2d8d3b2a Use shortcuts-nightly.json to resolve model shortcuts 2017-10-29 01:28:31 +02:00
Matthew Honnibal
a0c7dabb72 Fix bug in 8-token parser features 2017-10-28 23:01:35 +00:00
Matthew Honnibal
b713d10d97 Switch to 13 features in parser 2017-10-28 23:01:14 +00:00
Matthew Honnibal
3b91097321 Whitespace 2017-10-28 17:05:11 +00:00
Matthew Honnibal
6ef72864fa Improve initialization for hidden layers 2017-10-28 17:05:01 +00:00
Matthew Honnibal
5414e2f14b Use missing features in parser 2017-10-28 16:45:54 +00:00
Matthew Honnibal
df4803cc6d Add learned missing values for parser 2017-10-28 16:45:14 +00:00
Matthew Honnibal
64e4ff7c4b Merge 'tidy-up' changes into branch. Resolve conflicts 2017-10-28 13:16:06 +02:00
Explosion Bot
fb0c96f39a Fix optimizer loading 2017-10-28 11:58:16 +02:00
Explosion Bot
b22e42af7f Merge changes to parser and _ml 2017-10-28 11:52:10 +02:00
ines
d96e72f656 Tidy up rest 2017-10-27 21:07:59 +02:00
ines
a8e10f94e4 Tidy up Lexeme and update docs 2017-10-27 21:07:50 +02:00
ines
ba5e646219 Tidy up pipeline 2017-10-27 20:29:08 +02:00
ines
b4d226a3f1 Tidy up syntax 2017-10-27 19:45:57 +02:00
ines
5167a0cce2 Tidy up Vectors and docs 2017-10-27 19:45:19 +02:00
ines
7946464742 Remove spacy.tagger (now in pipeline) 2017-10-27 19:45:04 +02:00
ines
9c89e2cdef Remove unused syntax iterators (now in language data) 2017-10-27 18:09:53 +02:00
ines
d2df81d907 Fix not implemented Span getters 2017-10-27 18:09:28 +02:00
ines
544a407b93 Tidy up Doc, Token and Span and add missing docs 2017-10-27 17:07:26 +02:00
ines
a6135336f5 Tidy up gold 2017-10-27 17:02:55 +02:00
ines
6a0483b7aa Tidy up and document Doc, Token and Span 2017-10-27 15:41:45 +02:00
ines
1a559d4c95 Remove old, unused file 2017-10-27 15:34:35 +02:00
ines
91899d337b Tidy up language, lemmatizer and scorer 2017-10-27 14:40:14 +02:00
ines
778212efea Tidy up init and main 2017-10-27 14:39:51 +02:00
ines
e33b7e0b3c Tidy up parser and ML 2017-10-27 14:39:30 +02:00
ines
e3265998c0 Tidy up displaCy 2017-10-27 14:39:19 +02:00
ines
ea4a41c8fb Tidy up util and helpers 2017-10-27 14:39:09 +02:00
ines
d941fc3667 Tidy up CLI 2017-10-27 14:38:39 +02:00
Matthew Honnibal
531142a933 Merge remote-tracking branch 'origin/develop' into feature/better-parser 2017-10-27 12:34:48 +00:00
Matthew Honnibal
19a2b9bf27 Fix import of Optimizer 2017-10-27 12:33:42 +00:00
Matthew Honnibal
4d048e94d3 Add compat for thinc.neural.optimizers.Optimizer 2017-10-27 10:23:49 +00:00
Ines Montani
4033e70c71 Merge pull request #1461 from explosion/feature/disable-pipes
💫 Add Language.disable_pipes(), to temporarily edit pipeline and update code examples
2017-10-27 12:21:40 +02:00
Matthew Honnibal
75a637fa43 Remove redundant imports from _ml 2017-10-27 10:19:56 +00:00
Matthew Honnibal
c9987cf131 Avoid use of numpy.tensordot 2017-10-27 10:18:36 +00:00
Matthew Honnibal
f6fef30adc Remove dead code from spacy._ml 2017-10-27 10:16:41 +00:00
Matthew Honnibal
b9616419e1 Add try/except around bz2 import 2017-10-27 01:18:05 +00:00
Matthew Honnibal
783c0c8795 Remove unnecessary bz2 import 2017-10-27 01:17:54 +00:00
Matthew Honnibal
bb25bdcd92 Adjust call to scatter_add for the new version 2017-10-27 01:16:55 +00:00
Ines Montani
287a3ca256 Merge pull request #1466 from explosion/feature/rename-pipeline
💫 Clean up dead linear model code
2017-10-27 02:03:28 +02:00
ines
4eb5bd02e7 Update textcat pre-processing after to_array change 2017-10-27 00:32:12 +02:00
ines
2d6ec99884 Set 'model' as default model name to prevent meta.json errors 2017-10-26 16:12:23 +02:00
ines
9e372913e0 Remove old 'SP' condition in tag map 2017-10-26 16:11:57 +02:00
Matthew Honnibal
c52671420c Remove old cfile import 2017-10-26 13:28:19 +02:00
Matthew Honnibal
ea03f1ef64 Remove obsolete cfile code 2017-10-26 13:23:36 +02:00
Matthew Honnibal
90d1d9b230 Remove obsolete parser code 2017-10-26 13:22:45 +02:00
ines
6f78e29bed Add LAW entity label to glossary 2017-10-26 13:04:35 +02:00
ines
9bf78d5fb3 Update spacy.explain docs 2017-10-26 13:04:25 +02:00
Matthew Honnibal
33f8c58782 Remove obsolete parser.pyx 2017-10-26 12:42:05 +02:00
Matthew Honnibal
a8abc47811 Rename BaseThincComponent --> Pipe 2017-10-26 12:40:40 +02:00
Matthew Honnibal
b0f3ea2200 Fix names of pipeline components
NeuralDependencyParser --> DependencyParser
NeuralEntityRecognizer --> EntityRecognizer
TokenVectorEncoder     --> Tensorizer
NeuralLabeller         --> MultitaskObjective
2017-10-26 12:38:23 +02:00
Matthew Honnibal
b6b4f1aaf7 Merge pull request #1462 from explosion/feature/vector-meta-data
💫 Add vector meta data to model meta.json on train/package and show in docs
2017-10-26 11:39:41 +02:00
Matthew Honnibal
35977bdbb9 Update better-parser branch with develop 2017-10-26 00:55:53 +00:00
Ines Montani
090bd00369 Merge pull request #1464 from mayukh18/develop_bengali_pronouns
added the bengali pronouns for v2.0
2017-10-25 21:55:25 +02:00
mayukh18
1bc07758fa added few bengali pronouns 2017-10-25 22:24:40 +05:30
ines
de1e5f35d5 Merge branch 'develop' into feature/disable-pipes 2017-10-25 16:33:12 +02:00
ines
728b609bf9 Merge branch 'develop' into feature/vector-meta-data 2017-10-25 16:32:22 +02:00
ines
c0b55ebdac Fix PhraseMatcher.__contains__ and add more tests 2017-10-25 16:31:11 +02:00
ines
91beacf5e3 Fix Matcher.__contains__ 2017-10-25 16:19:38 +02:00
ines
11e3f19764 Fix vectors data added after training (see #1457) 2017-10-25 16:08:26 +02:00
ines
057954695b Read pipeline and vector data off model in --generate-meta 2017-10-25 16:03:26 +02:00
ines
273e638183 Add vector data to model meta after training (see #1457) 2017-10-25 16:03:05 +02:00
ines
18aae423fb Remove import of non-existing function 2017-10-25 15:54:10 +02:00
ines
5117a7d24d Fix whitespace 2017-10-25 15:54:02 +02:00
ines
657a4d91bc Merge branch 'develop' into feature/disable-pipes 2017-10-25 15:19:05 +02:00
ines
1a722dac31 Merge branch 'develop' into feature/disable-pipes 2017-10-25 15:18:18 +02:00
ines
6a00de4f77 Fix check of unexpected pipe names in restore() 2017-10-25 14:56:35 +02:00
ines
7f03932477 Return self on __enter__ 2017-10-25 14:56:16 +02:00
Matthew Honnibal
b5de768852 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-10-25 14:44:16 +02:00
Matthew Honnibal
094512fd47 Fix model-mark on regression test. 2017-10-25 14:44:00 +02:00
Matthew Honnibal
e70f80f29e Add Language.disable_pipes() 2017-10-25 13:46:41 +02:00