
8513 Commits

Author SHA1 Message Date
Matthew Honnibal
cf5fcf0546 Update serialization test 2018-03-28 20:12:53 +02:00
Matthew Honnibal
4555e3e251 Don't assume pretrained_vectors cfg set in build_tagger 2018-03-28 20:12:45 +02:00
Matthew Honnibal
f8dd905a24 Warn and fallback if vectors have no name 2018-03-28 18:24:53 +02:00
Matthew Honnibal
fd9e259414 Add test for #1660 2018-03-28 18:22:51 +02:00
Matthew Honnibal
bc4afa9881 Remove print statement 2018-03-28 17:48:37 +02:00
Matthew Honnibal
79dc241caa Set pretrained_vectors in parser cfg 2018-03-28 17:35:07 +02:00
Matthew Honnibal
17c3e7efa2 Add message noting vectors 2018-03-28 16:33:43 +02:00
Matthew Honnibal
9bf6e93b3e Set pretrained_vectors in begin_training 2018-03-28 16:32:41 +02:00
Matthew Honnibal
95a9615221 Fix loading of multiple pre-trained vectors
This patch addresses #1660, which was caused by keying all pre-trained
vectors with the same ID when telling Thinc how to refer to them. This
meant that if multiple models were loaded that had pre-trained vectors,
errors or incorrect behaviour resulted.

The vectors class now includes a .name attribute, which defaults to:
{nlp.meta['lang']}_{nlp.meta['name']}.vectors
The vectors name is set in the cfg of the pipeline components under the
key pretrained_vectors. This replaces the previous cfg key
pretrained_dims.

In order to make existing models compatible with this change, we check
for the pretrained_dims key when loading models in from_disk and
from_bytes, and add the cfg key pretrained_vectors if we find it.
2018-03-28 16:02:59 +02:00
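The naming and backward-compatibility scheme described in this commit can be sketched as follows. This is a minimal illustration, not spaCy's implementation: the helper names `default_vectors_name` and `upgrade_component_cfg` are hypothetical, and the model `meta` and component `cfg` are modelled as plain dicts.

```python
# Hypothetical sketch: helper names are illustrative, not spaCy's actual code.

def default_vectors_name(meta):
    """Build the default vectors name "{lang}_{name}.vectors" from model meta."""
    return "{}_{}.vectors".format(meta["lang"], meta["name"])


def upgrade_component_cfg(cfg, vectors_name):
    """Add pretrained_vectors to an older cfg that only has pretrained_dims.

    Returns a new dict; the input cfg is left unchanged.
    """
    cfg = dict(cfg)
    if "pretrained_dims" in cfg and "pretrained_vectors" not in cfg:
        cfg["pretrained_vectors"] = vectors_name
    return cfg


meta = {"lang": "en", "name": "core_web_lg"}
old_cfg = {"pretrained_dims": 300}  # cfg as saved by an older model
new_cfg = upgrade_component_cfg(old_cfg, default_vectors_name(meta))
print(new_cfg)
```

Because the name is derived from both the language and the model name, two loaded models with pre-trained vectors get distinct keys instead of sharing a single ID.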
Matthew Honnibal
070b6c6495 Remove dependency on ftfy 2018-03-28 12:07:02 +02:00
ines
6d2c85f428 Drop six and related hacks as a dependency 2018-03-28 10:45:25 +02:00
ines
9e83513004 Add position of invalid token to error message 2018-03-27 23:56:59 +02:00
ines
11c4735ccf Fix issue in Italian lemmatizer data (resolves #2050) 2018-03-27 23:55:22 +02:00
ines
693971dd8f Improve error message if token text is empty string (see #2101) 2018-03-27 22:25:40 +02:00
ines
0c829e6605 Fix whitespace 2018-03-27 22:20:59 +02:00
Ines Montani
e0ae390607 Update CONTRIBUTING.md 2018-03-27 13:47:00 +02:00
Matthew Honnibal
d4680e4d83 Merge branch 'master' of https://github.com/explosion/spaCy 2018-03-27 13:36:37 +02:00
Matthew Honnibal
63a267b34d Fix #2073: Token.set_extension not working 2018-03-27 13:36:20 +02:00
Ines Montani
284bbb1dd1 Merge pull request #2146 from justindujardin/tensorboard-standalone-example
Add example using TensorBoard standalone projector
2018-03-27 13:23:32 +02:00
Justin DuJardin
4eeb178856 Add example using TensorBoard standalone projector
- The TensorBoard standalone projector expects a different set of files than the TensorFlow plugin does.
2018-03-25 21:50:13 -07:00
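As a rough illustration of the kind of files a standalone projector works with, the sketch below writes word vectors as tab-separated text. It assumes the standalone Embedding Projector loads a TSV file of vectors (one row per item) plus a one-column metadata file of labels with no header row; the function name `write_projector_tsv` and the file names are hypothetical and not taken from the example script.

```python
import os

# Hypothetical sketch: assumes the standalone projector accepts a TSV of
# vectors plus a single-column TSV of labels (no header row for one column).
def write_projector_tsv(vectors, out_dir):
    """Write vectors.tsv (one vector per row) and metadata.tsv (one word per row)."""
    os.makedirs(out_dir, exist_ok=True)
    vec_path = os.path.join(out_dir, "vectors.tsv")
    meta_path = os.path.join(out_dir, "metadata.tsv")
    with open(vec_path, "w", encoding="utf8") as vec_file, \
            open(meta_path, "w", encoding="utf8") as meta_file:
        for word, vector in vectors.items():
            # Row i of vectors.tsv lines up with row i of metadata.tsv.
            vec_file.write("\t".join(str(x) for x in vector) + "\n")
            meta_file.write(word + "\n")
    return vec_path, meta_path


vec_path, meta_path = write_projector_tsv(
    {"apple": [0.1, 0.2], "pear": [0.3, 0.4]}, "projector_output"
)
```

Keeping labels in a separate file, row-aligned with the vectors, lets the same vectors file be reused with different metadata.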
Ines Montani
68226109f4 Merge pull request #2142 from jimregan/polish-more-tokens
more exceptions
2018-03-24 19:06:44 +01:00
Matthew Honnibal
d566e673bf Set version to v2.0.10 2018-03-24 18:09:03 +01:00
Matthew Honnibal
0d3bf0d4eb Merge branch 'master' of https://github.com/explosion/spaCy 2018-03-24 17:31:49 +01:00
dejanmarich
ccd1c04c63 Update stop_words.py
Added more words
2018-03-24 17:31:24 +01:00
ines
f1446b0257 Port over Turkish changes 2018-03-24 17:31:07 +01:00
DuyguA
cd604878a4 quick typo fix 2018-03-24 17:26:35 +01:00
Matthew Honnibal
406548b976 Support .gz and .tar.gz files in spacy init-model 2018-03-24 17:18:32 +01:00
ines
6173c4aaa6 Port over contributor agreements 2018-03-24 17:17:37 +01:00
ines
4ec2809eb5 Port over TensorBoard example 2018-03-24 17:15:48 +01:00
ines
5ecc60cf3b Add book to resources [ci skip] 2018-03-24 17:12:56 +01:00
ines
53680642af Port over docs changes [ci skip] 2018-03-24 17:12:48 +01:00
Matthew Honnibal
74cc6bb06a Merge branch 'master' into hotfix/v2.0.9 2018-03-24 17:08:13 +01:00
Matthew Honnibal
11fc69d6ef Merge remote-tracking branch 'origin' 2018-03-24 17:07:50 +01:00
Matthew Honnibal
48f3606a8a Merge branch 'master' into hotfix/v2.0.9 2018-03-24 17:06:50 +01:00
Matthew Honnibal
d4cad89407 Merge branch 'develop' 2018-03-24 17:05:18 +01:00
Jim O'Regan
efe037e8be more exceptions 2018-03-24 00:05:27 +00:00
Ines Montani
a218579be7 Merge pull request #2141 from ottosulin/fin_examples
Finnish examples
2018-03-23 22:57:28 +01:00
Ines Montani
719037cf20 Update formatting and add missing commas 2018-03-23 22:18:20 +01:00
Ines Montani
2b68361501 Merge pull request #2140 from ottosulin/ottosulin_contributor [ci skip]
My contributor agreement
2018-03-23 22:14:34 +01:00
Otto Sulin
266efc2018 Added Finnish examples 2018-03-23 22:58:52 +02:00
Ines Montani
cd97a44894 Merge pull request #2137 from justindujardin/tensorboard-example
Add example for visualizing word vectors with TensorBoard Projector
2018-03-23 21:47:13 +01:00
Otto Sulin
82acb8f399 My contributor agreement 2018-03-23 22:46:58 +02:00
Otto Sulin
1940e54602 Added Finnish numbers 2018-03-23 22:33:08 +02:00
Otto Sulin
4ec3f19e2b fixed stop words -> to-do lex_attrs.py 2018-03-23 22:18:17 +02:00
Justin DuJardin
c7ff8ee66c Add contributor agreement 2018-03-23 13:11:56 -07:00
Justin DuJardin
eef9430f07 Add example for visualizing word vectors with TensorBoard Projector
Use:

```bash
python vectors_tensorboard.py en_core_web_lg ./output_folder spaCy_large
```
2018-03-23 12:49:01 -07:00
Matthew Honnibal
85717f570c Merge branch 'master' of https://github.com/explosion/spaCy 2018-03-23 20:30:42 +01:00
Matthew Honnibal
8902754f0b Fix vector loading for ud_train 2018-03-23 20:30:00 +01:00
Ines Montani
782ec6f4f2 Merge pull request #2131 from calumcalder/fix-displacy-docs-typo
Fix typo in documentation for displacy Visualizer
2018-03-23 13:03:00 +01:00
Xiaoquan Kong
a71b99d7ff bugfix for global-variable-change-in-runtime related issue (#2135)
* Bugfix: settings changed at runtime in spacy/cli/ud_train.py were polluting the whole package

* Add contributor agreement of howl-anderson
2018-03-23 11:36:38 +01:00