Matthew Honnibal
e777ea25bb
Merge pull request #1492 from uwol/develop
TextCategorizer return parameter fix
2017-11-05 14:13:04 +01:00
Matthew Honnibal
0d4bd6414e
Fix Italian tag map
2017-11-05 14:11:03 +01:00
ines
ef597622a6
Add Portuguese tag map
2017-11-05 13:58:34 +01:00
ines
793c62dfda
Add Dutch tag map
2017-11-05 13:48:07 +01:00
ines
f7485a09c8
Fix Italian tag map
2017-11-05 13:12:58 +01:00
uwol
a2162b8908
tensorizer return parameter fix
2017-11-05 12:25:10 +01:00
ines
0a27afbf86
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-11-04 23:32:52 +01:00
ines
3cef901834
Add tag map for French and Italian
2017-11-04 23:32:51 +01:00
Matthew Honnibal
cfb83c231c
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-11-04 23:08:19 +01:00
Matthew Honnibal
d185927998
Undo harmful pickling hacks on Language class
2017-11-04 23:07:03 +01:00
ines
6c15aafebd
Fix formatting
2017-11-04 23:07:02 +01:00
Matthew Honnibal
3ca16ddbd4
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-11-04 00:25:02 +01:00
Matthew Honnibal
e4ec4be948
Fix parser test
2017-11-04 00:23:45 +01:00
Matthew Honnibal
98c29b7912
Add padding vector in parser, to make gradient more correct
2017-11-04 00:23:23 +01:00
ines
5e7d98f72a
Remove test for #1491
2017-11-03 22:10:57 +01:00
ines
718f1c50fb
Add regression test for #1491
2017-11-03 21:11:20 +01:00
Matthew Honnibal
144a93c2a5
Back-off to tensor for similarity if no vectors
2017-11-03 20:56:33 +01:00
Matthew Honnibal
1e9634691a
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-11-03 20:21:15 +01:00
Matthew Honnibal
13c8881d2f
Expose parser's tok2vec model component
2017-11-03 20:20:59 +01:00
Matthew Honnibal
17c63906f9
Update tensorizer component
2017-11-03 20:20:26 +01:00
Matthew Honnibal
2bf21cbe29
Update model after optimising it instead of waiting
2017-11-03 20:20:01 +01:00
Matthew Honnibal
d6e831bf89
Fix lemmatizer tests
2017-11-03 19:46:34 +01:00
ines
eef930c73e
Assert instead of print
2017-11-03 18:50:57 +01:00
ines
f0986df94b
Add test for #1488 (passes on v2.0.0a18?)
2017-11-03 14:44:36 +01:00
Matthew Honnibal
711278b667
Make test less flaky
2017-11-03 14:36:08 +01:00
Matthew Honnibal
7fea845374
Remove print statement
2017-11-03 14:04:51 +01:00
Matthew Honnibal
0a534ae96a
Fix test for backprop d_pad
2017-11-03 14:04:16 +01:00
Matthew Honnibal
33bd2428db
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-11-03 13:29:56 +01:00
Matthew Honnibal
6681058abd
Fix tensor extending in tagger
2017-11-03 13:29:36 +01:00
Matthew Honnibal
bd2cbdfa85
Make Morphology not fail on unknown tags
2017-11-03 13:29:09 +01:00
Matthew Honnibal
c9b118a7e9
Set softmax attr in tagger model
2017-11-03 11:22:01 +01:00
Matthew Honnibal
a5b05f85f0
Set Doc.tensor attribute in parser
2017-11-03 11:21:00 +01:00
Matthew Honnibal
62ed58935a
Add Doc.extend_tensor() method
2017-11-03 11:20:31 +01:00
Matthew Honnibal
d6fc39c8a6
Set Doc.tensor from Tagger
2017-11-03 11:20:05 +01:00
Matthew Honnibal
b3264aa5f0
Expose the softmax layer in the tagger model, to allow setting tensors
2017-11-03 11:19:51 +01:00
Matthew Honnibal
c2bbf076a4
Add document length cap for training
2017-11-03 01:54:54 +01:00
Matthew Honnibal
6771780d3f
Fix backprop of padding variable
2017-11-03 01:54:34 +01:00
Matthew Honnibal
54a716f2ec
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-11-03 00:55:20 +01:00
Matthew Honnibal
260e6ee3fb
Improve efficiency of backprop of padding variable
2017-11-03 00:49:11 +01:00
Matthew Honnibal
a22f96c3f1
Add test for backpropagating padding
2017-11-03 00:48:54 +01:00
ines
9baab241b4
Add skeleton language data for Turkish
2017-11-02 16:32:24 +01:00
ines
c6fea3e5f6
Add Romanian and Croatian skeletons (experimental)
Add language data templates to make it easier for others to contribute to the language support
2017-11-01 23:04:28 +01:00
ines
18c859500b
Add missing imports
2017-11-01 23:02:51 +01:00
ines
819e30a26e
Tidy up tokenizer exceptions
2017-11-01 23:02:45 +01:00
ines
3af281a334
Update test model name
2017-11-01 23:02:00 +01:00
Matthew Honnibal
b30dd36179
Allow Tagger.add_label() before training
2017-11-01 21:49:24 +01:00
Matthew Honnibal
eca41f0cf6
Fix filename conversion for conllu
2017-11-01 21:26:49 +01:00
Matthew Honnibal
e237472cdc
Fix tag and filename conversion for conllu
2017-11-01 21:25:33 +01:00
Matthew Honnibal
b84d99b281
Revert tagger.add_label() changes, to fix model
2017-11-01 21:10:45 +01:00
Matthew Honnibal
f5855e539b
Fix tagger model loading
2017-11-01 20:42:36 +01:00
Matthew Honnibal
624644adfe
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-11-01 20:26:41 +01:00
ines
5f661a1b3a
Remove tensorizer from pre-set pipe_names
2017-11-01 19:48:33 +01:00
Matthew Honnibal
190522efd3
Fix tagger when some tags aren't in Morphology
2017-11-01 19:27:49 +01:00
Matthew Honnibal
e85e31cfbd
Fix backprop of d_pad
2017-11-01 19:27:26 +01:00
Matthew Honnibal
759cc79185
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-11-01 19:00:19 +01:00
Matthew Honnibal
1ae40b50b4
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-11-01 17:07:02 +01:00
Matthew Honnibal
7ae1aacdb8
Fix add_label methods
2017-11-01 17:06:43 +01:00
ines
8c2260e18c
Move span tests to /doc
2017-11-01 16:56:35 +01:00
Matthew Honnibal
2ef7b59eb0
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-11-01 16:51:41 +01:00
ines
1d1f91a041
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-11-01 16:49:44 +01:00
ines
9659391944
Update deprecated methods and add warnings
2017-11-01 16:49:42 +01:00
ines
260cb37224
Catch deprecation warning
2017-11-01 16:49:18 +01:00
ines
5914faafbb
Fix .merge tests to not use deprecated API
2017-11-01 16:49:11 +01:00
ines
705a4e3e4a
Fix formatting
2017-11-01 16:44:08 +01:00
Matthew Honnibal
d17a12c71d
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-11-01 16:38:26 +01:00
Matthew Honnibal
9f9439667b
Don't create low-data text classifier if no vectors
2017-11-01 16:34:09 +01:00
Matthew Honnibal
e7a9174877
Add add_label methods to Tagger and TextCategorizer
2017-11-01 16:32:44 +01:00
ines
39e0586192
Add deprecated helper
Uses warning to show DeprecationWarning and custom stack trace
2017-11-01 16:32:36 +01:00
Matthew Honnibal
a7bf38bf31
Remove misleading comment on util.get_cuda_stream()
2017-11-01 13:57:25 +01:00
Matthew Honnibal
273e96b63f
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-11-01 13:27:35 +01:00
Matthew Honnibal
9e0ebee81c
Add Token.is_sent_start property, so we can deprecate Token.sent_start
2017-11-01 13:27:14 +01:00
Matthew Honnibal
7e7116cdf7
Fix Doc.to_array when only one string attr provided
2017-11-01 13:26:43 +01:00
Matthew Honnibal
301fb2bb60
Implement Span.n_lefts and Span.n_rights
2017-11-01 13:25:12 +01:00
Matthew Honnibal
c047498f87
Fix vectors test
2017-11-01 13:24:47 +01:00
ines
9a5e7c6fe2
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-11-01 13:14:45 +01:00
ines
bfe17b7df1
Fix begin_training if get_gold_tuples is None
2017-11-01 13:14:31 +01:00
ines
affd3404ab
Remove old model command (now "vocab")
2017-11-01 13:14:03 +01:00
Matthew Honnibal
fdb4b8e456
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-11-01 02:07:17 +01:00
Matthew Honnibal
c48dd0e1d3
Fix vector pruning
2017-11-01 02:06:58 +01:00
ines
37e62ab0e2
Update vector meta in meta.json
2017-11-01 01:25:09 +01:00
ines
96b4aef0bf
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-11-01 01:10:53 +01:00
Matthew Honnibal
86eba61fae
Fix token.vector when vectors are missing
2017-11-01 00:47:35 +01:00
ines
5683fd65ed
Update docstrings
2017-11-01 00:42:39 +01:00
Matthew Honnibal
44bce8e53f
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-11-01 00:35:16 +01:00
Matthew Honnibal
c16310d156
Update vectors with find method
2017-11-01 00:34:55 +01:00
Ines Montani
d11659463b
Merge pull request #1152 from jimregan/develop-irish
[WIP] attempt a port from #1147
2017-11-01 00:23:43 +01:00
ines
2ad2f09d12
Update docstrings and simplify most_similar
2017-11-01 00:18:08 +01:00
Jim O'Regan
08b0bfd153
merge
2017-10-31 22:55:59 +00:00
Jim O'Regan
00ecfa5417
Ó, not O
2017-10-31 22:54:42 +00:00
ines
ba2e6c8c6f
Update docstrings and formatting
2017-10-31 23:23:34 +01:00
Matthew Honnibal
0de8d213a3
Merge pull request #1475 from explosion/feature/sm-vectors
Improve and simplify Vectors class
2017-10-31 22:59:50 +01:00
Ines Montani
25b1d6cd91
Fix syntax error
2017-10-31 22:36:03 +01:00
Matthew Honnibal
92dc127569
Fix test for Python 3
2017-10-31 22:21:55 +01:00
Jim O'Regan
fe4b10346a
replace example sentence until I get around to adding a punctuation.py
2017-10-31 20:24:53 +00:00
Matthew Honnibal
c5799ecc7b
Remove print statement
2017-10-31 21:12:33 +01:00
ines
7e424a1804
Don't copy exception dicts if not necessary and tidy up
2017-10-31 21:05:29 +01:00
Matthew Honnibal
c390f2d745
Make it easier to pass explicit no-pruning to vocab
2017-10-31 20:14:47 +01:00
Ines Montani
06c25a8882
Remove comma that caused list to wrap in tuple!
Also removed extra dict wrappings for performance (we used to have them in there, but they should only really exist if copying the dict is absolutely necessary)
2017-10-31 20:13:16 +01:00
Matthew Honnibal
d90a22afe6
Fix loading previous vectors models
2017-10-31 19:58:35 +01:00
Ines Montani
147448b65b
Add missing symbols
2017-10-31 19:34:45 +01:00
Matthew Honnibal
997a61557a
Add vectors.n_keys property
2017-10-31 19:30:52 +01:00
Matthew Honnibal
8075726838
Restore vector usage in models
2017-10-31 19:21:17 +01:00
Matthew Honnibal
3659a807b0
Remove vector pruning arg from train CLI
2017-10-31 19:21:05 +01:00
Ines Montani
9b0de9fb43
Fix import of symbols (now nested one level lower)
2017-10-31 19:17:58 +01:00
Matthew Honnibal
59203a2e8a
Move vector pruning command into spacy vocab cli tool
2017-10-31 19:10:01 +01:00
Matthew Honnibal
77d8f5de9a
Revise and simplify Vectors class
2017-10-31 18:25:08 +01:00
Jim O'Regan
d4a8160c36
change quotes
2017-10-31 15:15:44 +00:00
Jim O'Regan
34ca59691b
no idea what is wrong here
2017-10-31 14:50:13 +00:00
Jim O'Regan
41dd29e48e
merge
2017-10-31 14:07:45 +00:00
Matthew Honnibal
cb5217012f
Fix vector remapping
2017-10-31 11:40:46 +01:00
Matthew Honnibal
9c11ee4a1c
WIP on vectors fixes
2017-10-31 11:22:56 +01:00
Matthew Honnibal
ce876c551e
Fix GPU usage
2017-10-31 02:33:34 +01:00
Matthew Honnibal
7698903617
Fix GPU usage
2017-10-31 02:33:16 +01:00
Matthew Honnibal
368fdb389a
WIP on refactoring and fixing vectors
2017-10-31 02:00:26 +01:00
Matthew Honnibal
4e3006cec7
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-10-30 19:44:58 +01:00
Matthew Honnibal
4112a991ec
Fix vector pruning
2017-10-30 19:44:40 +01:00
ines
ec657c1ddc
Update vocab docs and document Vocab.prune_vectors
2017-10-30 19:35:41 +01:00
ines
803e41bc66
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-10-30 18:39:51 +01:00
ines
8e02294241
Add vectors to Language.meta
2017-10-30 18:39:48 +01:00
ines
abf8aa05d3
Populate --create-meta defaults from file if available
If meta.json is found in directory and user chooses to overwrite it, show existing data as defaults.
2017-10-30 18:39:38 +01:00
ines
ce98fa7934
Fix formatting
2017-10-30 18:38:55 +01:00
ines
98c35d2585
Fix spacy vocab command
2017-10-30 18:38:41 +01:00
Matthew Honnibal
e98451b5f7
Add -prune-vectors argument to spacy.cli.train
2017-10-30 18:00:10 +01:00
Matthew Honnibal
e026b29ea9
Add prune_vectors method to Vocab
2017-10-30 17:59:43 +01:00
Explosion Bot
d0cf12c8c7
Fix off-by-one error in vectors
2017-10-30 16:22:03 +01:00
Explosion Bot
05a1dd570e
Fix vocab script
2017-10-30 16:19:22 +01:00
Explosion Bot
b46bdce8d2
Add missing import
2017-10-30 16:18:10 +01:00
Explosion Bot
2d2cc294b4
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-10-30 16:15:05 +01:00
Explosion Bot
0fc1209421
Wire up new vocab command
2017-10-30 16:14:50 +01:00
Explosion Bot
aa64031751
Fix clear_vectors() method on Vocab
2017-10-30 16:09:04 +01:00
Explosion Bot
7b56b2f04b
Add Vocab.cfg attr, to hold stuff like oov probs
2017-10-30 16:08:50 +01:00
Explosion Bot
ab5d5ed880
Fix vectors.add()
2017-10-30 16:08:09 +01:00
Explosion Bot
41d0f1665a
Fix add_attrs for cluster
2017-10-30 16:07:50 +01:00
ines
5453821a9f
Update NER annotation scheme
Add note on training data sources and include coarse-grained Wikipedia scheme
2017-10-30 13:53:49 +01:00
Explosion Bot
5ede7cec9b
Improve Lexeme.set_attrs method
2017-10-30 11:49:11 +01:00
Explosion Bot
72aea8f105
Update vectors.add() to allow setting keys to rows
2017-10-30 10:03:08 +01:00
Matthew Honnibal
c43cc5361d
Merge pull request #1467 from explosion/feature/better-parser
💫 Bug fixes to parser model (requires retraining)
2017-10-29 02:05:22 +02:00
ines
6c2d8d3b2a
Use shortcuts-nightly.json to resolve model shortcuts
2017-10-29 01:28:31 +02:00
Matthew Honnibal
a0c7dabb72
Fix bug in 8-token parser features
2017-10-28 23:01:35 +00:00
Matthew Honnibal
b713d10d97
Switch to 13 features in parser
2017-10-28 23:01:14 +00:00
Matthew Honnibal
3b91097321
Whitespace
2017-10-28 17:05:11 +00:00
Matthew Honnibal
6ef72864fa
Improve initialization for hidden layers
2017-10-28 17:05:01 +00:00
Matthew Honnibal
5414e2f14b
Use missing features in parser
2017-10-28 16:45:54 +00:00
Matthew Honnibal
df4803cc6d
Add learned missing values for parser
2017-10-28 16:45:14 +00:00
Matthew Honnibal
64e4ff7c4b
Merge 'tidy-up' changes into branch. Resolve conflicts
2017-10-28 13:16:06 +02:00
Explosion Bot
fb0c96f39a
Fix optimizer loading
2017-10-28 11:58:16 +02:00
Explosion Bot
b22e42af7f
Merge changes to parser and _ml
2017-10-28 11:52:10 +02:00
ines
d96e72f656
Tidy up rest
2017-10-27 21:07:59 +02:00
ines
a8e10f94e4
Tidy up Lexeme and update docs
2017-10-27 21:07:50 +02:00
ines
ba5e646219
Tidy up pipeline
2017-10-27 20:29:08 +02:00
ines
b4d226a3f1
Tidy up syntax
2017-10-27 19:45:57 +02:00
ines
5167a0cce2
Tidy up Vectors and docs
2017-10-27 19:45:19 +02:00
ines
7946464742
Remove spacy.tagger (now in pipeline)
2017-10-27 19:45:04 +02:00
ines
9c89e2cdef
Remove unused syntax iterators (now in language data)
2017-10-27 18:09:53 +02:00
ines
d2df81d907
Fix not implemented Span getters
2017-10-27 18:09:28 +02:00
ines
544a407b93
Tidy up Doc, Token and Span and add missing docs
2017-10-27 17:07:26 +02:00
ines
a6135336f5
Tidy up gold
2017-10-27 17:02:55 +02:00
ines
6a0483b7aa
Tidy up and document Doc, Token and Span
2017-10-27 15:41:45 +02:00
ines
1a559d4c95
Remove old, unused file
2017-10-27 15:34:35 +02:00
ines
91899d337b
Tidy up language, lemmatizer and scorer
2017-10-27 14:40:14 +02:00
ines
778212efea
Tidy up init and main
2017-10-27 14:39:51 +02:00
ines
e33b7e0b3c
Tidy up parser and ML
2017-10-27 14:39:30 +02:00
ines
e3265998c0
Tidy up displaCy
2017-10-27 14:39:19 +02:00
ines
ea4a41c8fb
Tidy up util and helpers
2017-10-27 14:39:09 +02:00
ines
d941fc3667
Tidy up CLI
2017-10-27 14:38:39 +02:00
Matthew Honnibal
531142a933
Merge remote-tracking branch 'origin/develop' into feature/better-parser
2017-10-27 12:34:48 +00:00
Matthew Honnibal
19a2b9bf27
Fix import of Optimizer
2017-10-27 12:33:42 +00:00
Matthew Honnibal
4d048e94d3
Add compat for thinc.neural.optimizers.Optimizer
2017-10-27 10:23:49 +00:00
Ines Montani
4033e70c71
Merge pull request #1461 from explosion/feature/disable-pipes
💫 Add Language.disable_pipes(), to temporarily edit pipeline and update code examples
2017-10-27 12:21:40 +02:00
Matthew Honnibal
75a637fa43
Remove redundant imports from _ml
2017-10-27 10:19:56 +00:00
Matthew Honnibal
c9987cf131
Avoid use of numpy.tensordot
2017-10-27 10:18:36 +00:00
Matthew Honnibal
f6fef30adc
Remove dead code from spacy._ml
2017-10-27 10:16:41 +00:00
Matthew Honnibal
b9616419e1
Add try/except around bz2 import
2017-10-27 01:18:05 +00:00
Matthew Honnibal
783c0c8795
Remove unnecessary bz2 import
2017-10-27 01:17:54 +00:00
Matthew Honnibal
bb25bdcd92
Adjust call to scatter_add for the new version
2017-10-27 01:16:55 +00:00
Ines Montani
287a3ca256
Merge pull request #1466 from explosion/feature/rename-pipeline
💫 Clean up dead linear model code
2017-10-27 02:03:28 +02:00
ines
4eb5bd02e7
Update textcat pre-processing after to_array change
2017-10-27 00:32:12 +02:00
ines
2d6ec99884
Set 'model' as default model name to prevent meta.json errors
2017-10-26 16:12:23 +02:00
ines
9e372913e0
Remove old 'SP' condition in tag map
2017-10-26 16:11:57 +02:00
Matthew Honnibal
c52671420c
Remove old cfile import
2017-10-26 13:28:19 +02:00
Matthew Honnibal
ea03f1ef64
Remove obsolete cfile code
2017-10-26 13:23:36 +02:00
Matthew Honnibal
90d1d9b230
Remove obsolete parser code
2017-10-26 13:22:45 +02:00
ines
6f78e29bed
Add LAW entity label to glossary
2017-10-26 13:04:35 +02:00
ines
9bf78d5fb3
Update spacy.explain docs
2017-10-26 13:04:25 +02:00
Matthew Honnibal
33f8c58782
Remove obsolete parser.pyx
2017-10-26 12:42:05 +02:00
Matthew Honnibal
a8abc47811
Rename BaseThincComponent --> Pipe
2017-10-26 12:40:40 +02:00
Matthew Honnibal
b0f3ea2200
Fix names of pipeline components
NeuralDependencyParser --> DependencyParser
NeuralEntityRecognizer --> EntityRecognizer
TokenVectorEncoder --> Tensorizer
NeuralLabeller --> MultitaskObjective
2017-10-26 12:38:23 +02:00
Matthew Honnibal
b6b4f1aaf7
Merge pull request #1462 from explosion/feature/vector-meta-data
💫 Add vector meta data to model meta.json on train/package and show in docs
2017-10-26 11:39:41 +02:00
Matthew Honnibal
35977bdbb9
Update better-parser branch with develop
2017-10-26 00:55:53 +00:00
Ines Montani
090bd00369
Merge pull request #1464 from mayukh18/develop_bengali_pronouns
added the Bengali pronouns for v2.0
2017-10-25 21:55:25 +02:00
mayukh18
1bc07758fa
added a few Bengali pronouns
2017-10-25 22:24:40 +05:30
ines
de1e5f35d5
Merge branch 'develop' into feature/disable-pipes
2017-10-25 16:33:12 +02:00
ines
728b609bf9
Merge branch 'develop' into feature/vector-meta-data
2017-10-25 16:32:22 +02:00
ines
c0b55ebdac
Fix PhraseMatcher.__contains__ and add more tests
2017-10-25 16:31:11 +02:00
ines
91beacf5e3
Fix Matcher.__contains__
2017-10-25 16:19:38 +02:00
ines
11e3f19764
Fix vectors data added after training (see #1457)
2017-10-25 16:08:26 +02:00
ines
057954695b
Read pipeline and vector data off model in --generate-meta
2017-10-25 16:03:26 +02:00
ines
273e638183
Add vector data to model meta after training (see #1457)
2017-10-25 16:03:05 +02:00
ines
18aae423fb
Remove import of non-existing function
2017-10-25 15:54:10 +02:00
ines
5117a7d24d
Fix whitespace
2017-10-25 15:54:02 +02:00