10936 Commits

Author SHA1 Message Date
Matthw Honnibal
f2808f78a7 Fix parser_maxout_pieces for depth=0 2019-10-21 01:25:03 +02:00
Matthw Honnibal
b4e0040d10 Pass Tok2Vec settings a bit better 2019-10-21 01:11:47 +02:00
Matthw Honnibal
fef50277d7 Support parser depth=0 2019-10-21 01:11:30 +02:00
Matthw Honnibal
eba89f08bd Use chars loss in ClozeMultitask 2019-10-20 17:47:15 +02:00
Matthw Honnibal
77af446d04 Move characters_loss function, add window option 2019-10-20 17:47:00 +02:00
Matthw Honnibal
5a601ef46a Add cnn_window option to pretrain 2019-10-20 17:46:34 +02:00
Matthw Honnibal
3a67aa857e Clarify parser model CPU/GPU code
The previous version worked with previous thinc, but only
because some thinc ops happened to have gpu/cpu compatible
implementations. It's better to call the right Ops instance.
2019-10-20 17:15:17 +02:00
Matthw Honnibal
ee56c6a4e1 Implement character-based pretraining objective 2019-10-19 11:42:38 +02:00
Matthw Honnibal
36de9bf72a Add more spacy pretrain options 2019-10-18 17:24:13 +02:00
Matthw Honnibal
f3e2aaea1e Fix GPU selection in spacy train 2019-10-18 17:23:55 +02:00
Matthw Honnibal
49c0adc706 Add character-based bilstm tok2vec 2019-10-18 17:23:37 +02:00
Matthw Honnibal
727ede6599 Make character copy non-blocking 2019-10-18 17:23:19 +02:00
Matthw Honnibal
4da1c1c211 Try to make cuda call non-blocking 2019-10-18 17:22:16 +02:00
Matthw Honnibal
b2e8f37965 Make cuda streams non-blocking by default 2019-10-18 17:21:57 +02:00
Matthw Honnibal
ca0759b325 Pass config better in nn_parser 2019-10-17 21:10:56 +02:00
Matthw Honnibal
e737750a02 Fix bilstm_depth default in pretrain command 2019-10-17 21:10:08 +02:00
Matthw Honnibal
6aa1c53b1b Call resume_training for base model in train CLI 2019-10-17 21:09:41 +02:00
Matthw Honnibal
3f26c50a4d Refactor some of tok2vec 2019-10-17 17:58:00 +02:00
Matthw Honnibal
e63f28079a Try 3 NER features 2019-10-07 16:51:03 +02:00
Matthw Honnibal
2d55ccdd27 Support option of three NER features 2019-10-07 16:50:44 +02:00
Matthw Honnibal
c8857181f8 Fix get labels for textcat 2019-10-07 16:50:15 +02:00
Matthw Honnibal
a6a2ff217f Fix char_embed for gpu 2019-10-07 16:49:32 +02:00
Matthw Honnibal
f4040a98f0 Fix passing of cats in gold.pyx 2019-10-07 16:49:00 +02:00
Matthw Honnibal
a132da1558 Fix gold-preproc training mode 2019-10-07 02:07:03 +02:00
Matthw Honnibal
63ff233ba2 Enable GPU in pytorch in use_gpu function 2019-10-06 19:24:21 +02:00
Matthw Honnibal
9dbaea1ab4 Use cosine loss in Cloze multitask 2019-10-06 19:23:46 +02:00
Matthw Honnibal
157d3d769b Support bilstm_depth arg in spacy pretrain 2019-10-06 19:22:26 +02:00
Matthw Honnibal
615ebe584f Add option to ignore zero vectors in get_cossim_loss 2019-10-06 19:20:54 +02:00
adrianeboyd
cbc2cee2c8 Improve URL_PATTERN and handling in tokenizer (#4374)
* Move prefix and suffix detection for URL_PATTERN

Move prefix and suffix detection for `URL_PATTERN` into the tokenizer.
Remove associated lookahead and lookbehind from `URL_PATTERN`.

Fix tokenization for Hungarian given new modified handling of prefixes
and suffixes.

* Match a wider range of URI schemes
2019-10-05 13:00:09 +02:00
Ines Montani
e65dffd80b Clarify serialization of extension attributes (closes #4377) [ci skip] 2019-10-05 11:58:00 +02:00
Ines Montani
fec9433044 Make PhraseMatcher.vocab consistent with Matcher.vocab (closes #4373) 2019-10-04 12:18:41 +02:00
Ines Montani
e7ddc6f662 Add conda install for lookups [ci skip] 2019-10-03 17:52:53 +02:00
Matthew Honnibal
37ef874d8b Set version to v2.2.1 2019-10-03 14:50:39 +02:00
Sofie Van Landeghem
4e7259c6cf Bugfix initializing DocBin with attributes (#4368)
* docbin init fix + documentation fix + unit tests

* newline

* try with zlib instead of gzip (python 2 incompatibilities)
2019-10-03 14:48:45 +02:00
Ines Montani
ce1d441de5 Add docs for Vectors.most_similar [ci skip] 2019-10-03 14:29:47 +02:00
Ben Taylor
1db79a33cb most_similar() returns the k most similar vectors (#4364)
* most_similar returns the n most similar vectors

* updated most_similar comment

* add bintay contributor agreement

* sign bintay contributor agreement

* fix most_similar documentation typo

* fixed error in prune_vectors

* updated prune_vectors test
2019-10-03 14:09:44 +02:00
Ines Montani
4159936720 Update README.md [ci skip] 2019-10-02 19:15:22 +02:00
Ines Montani
e4782feae9 Update README.md [ci skip] 2019-10-02 18:49:55 +02:00
Ines Montani
80cf385f65 Update v2-2.md [ci skip] 2019-10-02 16:58:21 +02:00
Ines Montani
f8e606c303 Update README.md [ci skip] 2019-10-02 16:47:10 +02:00
Ines Montani
12a941d841 Update binder version [ci skip] 2019-10-02 16:47:01 +02:00
Matthew Honnibal
2eb31012e7 Set version to v2.2.0 2019-10-02 14:40:06 +02:00
Matthew Honnibal
796072e560 Set version to v2.2.0.dev19 2019-10-02 12:51:29 +02:00
Sofie Van Landeghem
9d3ce7cba2 Ensure training doesn't crash with empty batches (#4360)
* unit test for previously resolved unflatten issue

* prevent batches of empty docs from causing problems
2019-10-02 12:50:47 +02:00
Ines Montani
52b5912dbf Tidy up [ci skip] 2019-10-02 12:05:59 +02:00
adrianeboyd
d82241218a Make the default NER labels less model-specific [ci skip] (#4361) 2019-10-02 12:05:17 +02:00
adrianeboyd
dda86118bd Update Ukrainian lemmatizer with new lookups (#4359)
* Update Ukrainian lemmatizer with new lookups

* Add missing import


Co-authored-by: Ines Montani <ines@ines.io>
2019-10-02 12:04:06 +02:00
Ines Montani
b6670bf0c2 Use consistent spelling 2019-10-02 10:37:39 +02:00
Ines Montani
208629615d Auto-format 2019-10-02 10:37:04 +02:00
Ines Montani
867e93aae2 Add Streamlit example [ci skip] 2019-10-02 01:21:20 +02:00