Matthw Honnibal
73b1f651d4
Rescale gradients for mlm
2019-10-24 17:35:37 +02:00
Matthw Honnibal
7d81d17ce5
Restore LayerNorm for mish
2019-10-24 17:35:10 +02:00
Matthw Honnibal
134056684f
Fix number characters
2019-10-24 17:34:16 +02:00
Matthw Honnibal
754a36d981
Add bos_eos function for tok2vec
2019-10-24 17:33:47 +02:00
Matthw Honnibal
9ca109597d
Fix parser model for depth 1
2019-10-23 05:14:00 +02:00
Matthw Honnibal
9e32d8271c
Avoid failing if cupy random can't be set
2019-10-23 05:13:28 +02:00
Matthw Honnibal
f8bf5b7fe5
Fix gpu_id arg in pretrain
2019-10-23 04:41:40 +02:00
Matthw Honnibal
95648dcdd7
Pass parser settings better
2019-10-23 04:41:20 +02:00
Matthw Honnibal
8892ce98aa
Improve settings in _ml
2019-10-23 04:40:10 +02:00
Matthw Honnibal
6c8785a238
Add option for GPU ID to pretrain
2019-10-22 22:44:24 +02:00
Matthw Honnibal
1dce86c555
Pass settings better from parser
2019-10-22 03:26:43 +02:00
Matthw Honnibal
ab7f85dfa2
Use Mish layer if pieces==1 in CNN
2019-10-22 03:26:27 +02:00
Matthw Honnibal
7ef3bcdc1c
Support cnn_maxout_pieces arg in pretrain
2019-10-22 03:25:30 +02:00
Matthw Honnibal
5a272d9029
Add method to decode predicted characters
2019-10-21 03:56:15 +02:00
Matthw Honnibal
f2808f78a7
Fix parser_maxout_pieces for depth=0
2019-10-21 01:25:03 +02:00
Matthw Honnibal
b4e0040d10
Pass Tok2Vec settings a bit better
2019-10-21 01:11:47 +02:00
Matthw Honnibal
fef50277d7
Support parser depth=0
2019-10-21 01:11:30 +02:00
Matthw Honnibal
eba89f08bd
Use chars loss in ClozeMultitask
2019-10-20 17:47:15 +02:00
Matthw Honnibal
77af446d04
Move characters_loss function, add window option
2019-10-20 17:47:00 +02:00
Matthw Honnibal
5a601ef46a
Add cnn_window option to pretrain
2019-10-20 17:46:34 +02:00
Matthw Honnibal
3a67aa857e
Clarify parser model CPU/GPU code
The previous version worked with previous thinc, but only
because some thinc ops happened to have gpu/cpu compatible
implementations. It's better to call the right Ops instance.
2019-10-20 17:15:17 +02:00
Matthw Honnibal
ee56c6a4e1
Implement character-based pretraining objective
2019-10-19 11:42:38 +02:00
Matthw Honnibal
36de9bf72a
Add more spacy pretrain options
2019-10-18 17:24:13 +02:00
Matthw Honnibal
f3e2aaea1e
Fix GPU selection in spacy train
2019-10-18 17:23:55 +02:00
Matthw Honnibal
49c0adc706
Add character-based bilstm tok2vec
2019-10-18 17:23:37 +02:00
Matthw Honnibal
727ede6599
Make character copy non-blocking
2019-10-18 17:23:19 +02:00
Matthw Honnibal
4da1c1c211
Try to make cuda call non-blocking
2019-10-18 17:22:16 +02:00
Matthw Honnibal
b2e8f37965
Make cuda streams non-blocking by default
2019-10-18 17:21:57 +02:00
Matthw Honnibal
ca0759b325
Pass config better in nn_parser
2019-10-17 21:10:56 +02:00
Matthw Honnibal
e737750a02
Fix bilstm_depth default in pretrain command
2019-10-17 21:10:08 +02:00
Matthw Honnibal
6aa1c53b1b
Call resume_training for base model in train CLI
2019-10-17 21:09:41 +02:00
Matthw Honnibal
3f26c50a4d
Refactor some of tok2vec
2019-10-17 17:58:00 +02:00
Matthw Honnibal
e63f28079a
Try 3 NER features
2019-10-07 16:51:03 +02:00
Matthw Honnibal
2d55ccdd27
Support option of three NER features
2019-10-07 16:50:44 +02:00
Matthw Honnibal
c8857181f8
Fix get labels for textcat
2019-10-07 16:50:15 +02:00
Matthw Honnibal
a6a2ff217f
Fix char_embed for gpu
2019-10-07 16:49:32 +02:00
Matthw Honnibal
f4040a98f0
Fix passing of cats in gold.pyx
2019-10-07 16:49:00 +02:00
Matthw Honnibal
a132da1558
Fix gold-preproc training mode
2019-10-07 02:07:03 +02:00
Matthw Honnibal
63ff233ba2
Enable GPU in pytorch in use_gpu function
2019-10-06 19:24:21 +02:00
Matthw Honnibal
9dbaea1ab4
Use cosine loss in Cloze multitask
2019-10-06 19:23:46 +02:00
Matthw Honnibal
157d3d769b
Support bilstm_depth arg in spacy pretrain
2019-10-06 19:22:26 +02:00
Matthw Honnibal
615ebe584f
Add option to ignore zero vectors in get_cossim_loss
2019-10-06 19:20:54 +02:00
adrianeboyd
cbc2cee2c8
Improve URL_PATTERN and handling in tokenizer ( #4374 )
* Move prefix and suffix detection for URL_PATTERN
Move prefix and suffix detection for `URL_PATTERN` into the tokenizer.
Remove associated lookahead and lookbehind from `URL_PATTERN`.
Fix tokenization for Hungarian given new modified handling of prefixes
and suffixes.
* Match a wider range of URI schemes
2019-10-05 13:00:09 +02:00
Ines Montani
fec9433044
Make PhraseMatcher.vocab consistent with Matcher.vocab ( closes #4373 )
2019-10-04 12:18:41 +02:00
Matthew Honnibal
37ef874d8b
Set version to v2.2.1
2019-10-03 14:50:39 +02:00
Sofie Van Landeghem
4e7259c6cf
Bugfix initializing DocBin with attributes ( #4368 )
* docbin init fix + documentation fix + unit tests
* newline
* try with zlib instead of gzip (python 2 incompatibilities)
2019-10-03 14:48:45 +02:00
Ben Taylor
1db79a33cb
most_similar() return the k most similar vectors ( #4364 )
* most_similar return n-most similar vectors
* updated most_similar comment
* add bintay contributor agreement
* sign bintay contributor agreement
* fix most_similar documentation typo
* fixed error in prune_vectors
* updated prune_vectors test
2019-10-03 14:09:44 +02:00
Matthew Honnibal
2eb31012e7
Set version to v2.2.0
2019-10-02 14:40:06 +02:00
Matthew Honnibal
796072e560
Set version to v2.2.0.dev19
2019-10-02 12:51:29 +02:00
Sofie Van Landeghem
9d3ce7cba2
Ensure training doesn't crash with empty batches ( #4360 )
* unit test for previously resolved unflatten issue
* prevent a batch of empty docs from causing problems
2019-10-02 12:50:47 +02:00