238 Commits

Author SHA1 Message Date
Matthew Honnibal
2c37e0ccf6
💫 Use Blis for matrix multiplications (#2966)
Our epic matrix multiplication odyssey is drawing to a close...

I've now finally got the Blis linear algebra routines in a self-contained Python package, with wheels for Windows, Linux and OSX. The only missing platform at the moment is Windows Python 2.7. The result is at https://github.com/explosion/cython-blis

Thinc v7.0.0 will make the change to Blis. I've put a Thinc v7.0.0.dev0 up on PyPI so that we can test these changes with the CI, and even get them out to spacy-nightly, before Thinc v7.0.0 is released. This PR also updates the other dependencies to be in line with the current versions master is using. I've also resolved the msgpack deprecation problems, and gotten spaCy and Thinc up to date with the latest Cython.

The point of switching to Blis is to have control of how our matrix multiplications are executed across platforms. When we were using numpy for this, a different library would be used on pip and conda, OSX would use Accelerate, etc. This would open up different bugs and performance problems, especially when multi-threading was introduced.

With the change to Blis, we now strictly single-thread the matrix multiplications. This will make it much easier to use multiprocessing to parallelise the runtime, since we won't have nested parallelism problems to deal with.
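
As a rough illustration of why single-threaded matrix multiplication simplifies process-level parallelism, here is a minimal sketch using only the standard library. It is not spaCy or Thinc code: `matmul` is a pure-Python stand-in for a single-threaded Blis call, and `WEIGHTS`, `process_batch` and `run_parallel` are hypothetical names introduced for this example.

```python
from multiprocessing import Pool

# Hypothetical weight matrix, standing in for a model's parameters.
WEIGHTS = [[1.0, 0.0], [0.0, 2.0]]


def matmul(a, b):
    """Multiply two matrices given as lists of rows, on a single thread.

    This is a stand-in for a single-threaded Blis routine, not the real call.
    """
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def process_batch(batch):
    # The matrix product inside each worker never spawns threads of its own,
    # so the process pool is the only level of parallelism.
    return matmul(batch, WEIGHTS)


def run_parallel(batches, n_workers=2):
    # One worker process per core is safe here because the workers
    # do not compete with a threaded linear algebra backend.
    with Pool(processes=n_workers) as pool:
        return pool.map(process_batch, batches)


if __name__ == "__main__":
    batches = [[[1.0, 2.0]], [[3.0, 4.0]]]
    print(run_parallel(batches))  # [[[1.0, 4.0]], [[3.0, 8.0]]]
```

If the backend were multi-threaded, each of the `n_workers` processes could start its own thread pool and oversubscribe the machine, which is the nested parallelism problem described above.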

* Use blis

* Use -2 arg to Cython

* Update dependencies

* Fix requirements

* Update setup dependencies

* Fix requirement typo

* Fix msgpack errors

* Remove Python27 test from Appveyor, until Blis works there

* Auto-format setup.py

* Fix murmurhash version
2018-11-27 00:44:04 +01:00
Matthew Honnibal
c89fd19f66 Hack broken pipe error for Python2 2018-11-16 02:22:05 +01:00
Matthew Honnibal
09a0227656 Temporarily add a script to load reddit 2018-11-15 23:18:35 +00:00
Matthew Honnibal
48ed1ca29d Add branch option to push-tag script 2018-08-15 03:16:43 +02:00
Matthew Honnibal
f0024e3b13 Add script to push a tag 2018-07-21 15:10:54 +02:00
ines
5025d709e0 Remove old, outdated files in /bin 2017-10-27 19:44:38 +02:00
ines
d208bcef96 Add entry point-style auto alias for "spacy"
Simplest way to run commands as spacy xxx instead of python -m spacy
xxx, while avoiding environment conflicts
2017-08-14 12:18:39 +02:00
Matthew Honnibal
5dffb85184 Don't use gpu 2017-05-08 08:39:59 -05:00
Matthew Honnibal
bef89ef23d Mergery 2017-05-08 08:29:36 -05:00
Matthew Honnibal
245372973d Don't use tagger to predict tags 2017-05-08 07:55:34 -05:00
Matthew Honnibal
7a33f1e2b7 Add dep to supertag. 2017-05-08 07:50:01 -05:00
Matthew Honnibal
66252f3e71 Change vector width 2017-05-08 14:47:11 +02:00
Matthew Honnibal
2e2268a442 Precomputable hidden now working 2017-05-08 11:36:37 +02:00
Matthew Honnibal
10682d35ab Get pre-computed version working 2017-05-08 00:38:35 +02:00
Matthew Honnibal
6782eedf9b Tmp GPU code 2017-05-07 11:04:24 -05:00
Matthew Honnibal
e420e5a809 Tmp 2017-05-07 07:31:09 -05:00
Matthew Honnibal
f99f5b75dc working residual net 2017-05-07 03:57:26 +02:00
Matthew Honnibal
bdf2dba9fb WIP on refactor, with hidden pre-computing 2017-05-07 02:02:43 +02:00
Matthew Honnibal
b439e04f8d Learning smoothly 2017-05-06 20:38:12 +02:00
Matthew Honnibal
04ae1c01f1 Learns things 2017-05-06 18:21:02 +02:00
Matthew Honnibal
bcf4cd0a5f Learns things 2017-05-06 17:37:36 +02:00
Matthew Honnibal
8e48b58cd6 Gradients look correct 2017-05-06 16:47:15 +02:00
ines
8bc05c2ba9 Delete old training scripts (resolves #911) 2017-03-23 11:07:59 +01:00
Raphaël Bournhonesque
08346dba1a Use specific language class instead of base Language class 2017-03-21 23:18:54 +01:00
Raphaël Bournhonesque
7568cd6bf8 Split CONLLX file using tabs and not default split separators 2017-03-21 23:00:13 +01:00
Matthew Honnibal
ef6bd08e6c Update train_ud for Universal Dependencies 2 2017-03-16 17:08:15 -05:00
Matthew Honnibal
a155482fda Improve printing in train_ud script 2017-03-11 11:11:05 -06:00
Matthew Honnibal
35124b144a Add L1 penalty option to parser 2017-03-09 18:44:53 -06:00
Matthew Honnibal
04a51dab62 Print active parser features during training 2017-03-08 01:37:19 +01:00
Matthew Honnibal
c744ce4b6d Fix bad change to cythonize.py script, re subprocess call 2017-02-16 19:01:25 +01:00
Matthew Honnibal
071d11cb35 Pass environment to Cythonize script. Closes #791 2017-02-17 01:04:16 +11:00
Matthew Honnibal
4ff92184f1 Improve train_ud script 2017-01-09 09:53:46 -06:00
Matthew Honnibal
c1ef07788c Update train_ud.py
Create deps folder if it doesn't exist.
2017-01-09 10:55:44 +11:00
Matthew Honnibal
46e98ec029 Move init_model.py script from repo. These meta-tools should live elsewhere 2016-12-18 14:03:40 +01:00
dafnevk
cdf5dcc40a fixed bug in init_model so that it runs for dutch 2016-12-13 14:33:44 +01:00
Matthew Honnibal
c7889492f9 Fix model saving error for Python 3 2016-11-25 18:04:30 -06:00
Matthew Honnibal
22189e60db Use unicode literals in train_ud 2016-11-25 17:45:45 -06:00
Matthew Honnibal
da5f0cce36 Fix train_ud script, which trains models from the Universal Dependencies format. 2016-11-25 11:19:33 -06:00
Matthew Honnibal
314bc8d34f Fix train script for 1.0 2016-11-25 08:57:37 -06:00
Matthew Honnibal
bd1bfcca61 Update train.py 2016-10-13 03:23:48 +02:00
Matthew Honnibal
ea23b64cc8 Refactor training, with new spacy.train module. Defaults still a little awkward. 2016-10-09 12:24:24 +02:00
Matthew Honnibal
53fbd3dd1c Fix train.py for v1.0.0-rc1 2016-10-05 01:11:46 +02:00
Matthew Honnibal
ae202e7a60 Fix init_model.py 2016-09-25 15:58:51 +02:00
Matthew Honnibal
af847e07fc Fix usage of pathlib for Python3 -- turning paths to strings. 2016-09-24 21:05:27 +02:00
Matthew Honnibal
d310dc73ef Fix bin/init_model.py after refactoring 2016-09-24 20:38:18 +02:00
Matthew Honnibal
8036368d96 * Fix model saving 2016-05-23 12:01:46 +00:00
Matthew Honnibal
35214053fd * Work around get_lex_attr bug introduced during German parsing 2016-05-23 10:53:00 +00:00
Wolfgang Seeker
dae6bc05eb define German dummy lemmatizer until morphology is done 2016-05-02 16:04:53 +02:00
Matthew Honnibal
8569dbc2d0 * Add initial stuff for Chinese parsing 2016-04-24 18:44:24 +02:00
Wolfgang Seeker
f9150ccf2a rename vectors.tgz to vectors.bz2 because it's not compressed with gzip but bzip 2016-04-08 13:38:07 +02:00