spaCy

mirror of https://github.com/explosion/spaCy.git synced 2024-11-11 04:08:09 +03:00

Author	SHA1	Message	Date
Ines Montani	7ba3a5d95c	💫 Make serialization methods consistent (#3385) * Make serialization methods consistent exclude keyword argument instead of random named keyword arguments and deprecation handling * Update docs and add section on serialization fields	2019-03-10 19:16:45 +01:00
Ines Montani	296446a1c8	Tidy up and improve docs and docstrings (#3370) ## Description * tidy up and adjust Cython code to code style * improve docstrings and make calling `help()` nicer * add URLs to new docs pages to docstrings wherever possible, mostly to user-facing objects * fix various typos and inconsistencies in docs ### Types of change enhancement, docs ## Checklist - [x] I have submitted the spaCy Contributor Agreement. - [x] I ran the tests, and all new and existing tests passed. - [x] My changes don't require a change to the documentation, or if they do, I've added all required information.	2019-03-08 11:42:26 +01:00
Matthew Honnibal	449b889454	Fix KeyError in Vectors.most_similar. Fixes #2648	2018-12-10 16:19:18 +01:00
Matthew Honnibal	90aec6d2f6	Fix vectors for reserved words. Closes #2871	2018-12-10 16:09:49 +01:00
Ines Montani	f37863093a	💫 Replace ujson, msgpack and dill/pickle/cloudpickle with srsly (#3003) Remove hacks and wrappers, keep code in sync across our libraries and move spaCy a few steps closer to only depending on packages with binary wheels 🎉 See here: https://github.com/explosion/srsly Serialization is hard, especially across Python versions and multiple platforms. After dealing with many subtle bugs over the years (encodings, locales, large files) our libraries like spaCy and Prodigy have steadily grown a number of utility functions to wrap the multiple serialization formats we need to support (especially json, msgpack and pickle). These wrapping functions ended up duplicated across our codebases, so we wanted to put them in one place. At the same time, we noticed that having a lot of small dependencies was making maintenance harder, and making installation slower. To solve this, we've made srsly standalone, by including the component packages directly within it. This way we can provide all the serialization utilities we need in a single binary wheel. srsly currently includes forks of the following packages: ujson, msgpack, msgpack-numpy, cloudpickle * WIP: replace json/ujson with srsly * Replace ujson in examples Use regular json instead of srsly to make code easier to read and follow * Update requirements * Fix imports * Fix typos * Replace msgpack with srsly * Fix warning	2018-12-03 01:28:22 +01:00
Ines Montani	3141e04822	💫 New system for error messages and warnings (#2163) * Add spacy.errors module * Update deprecation and user warnings * Replace errors and asserts with new error message system * Remove redundant asserts * Fix whitespace * Add messages for print/util.prints statements * Fix typo * Fix typos * Move CLI messages to spacy.cli._messages * Add decorator to display error code with message An implementation like this is nice because it only modifies the string when it's retrieved from the containing class, so we don't have to worry about manipulating tracebacks etc. * Remove unused link in spacy.about * Update errors for invalid pipeline components * Improve error for unknown factories * Add displaCy warnings * Update formatting consistency * Move error message to spacy.errors * Update errors and check if doc returned by component is None	2018-04-03 15:50:31 +02:00
Suraj Rajan	1cdbb7c97c	[2032] - Changed python set to cpp stl set (#2170) Changed python set to cpp stl set #2032 ## Description Changed python set to cpp stl set. CPP stl set works better due to the logarithmic run time of its methods. Finding minimum in the cpp set is done in constant time as opposed to the worst case linear runtime of python set. Operations such as find, count, insert, delete are also done in either constant or logarithmic time, thus making cpp set a better option to manage vectors. Reference: http://www.cplusplus.com/reference/set/set/ ### Types of change Enhancement for `Vectors` for faster initialising of word vectors (fasttext)	2018-03-31 13:28:25 +02:00
Ines Montani	a609a1ca29	Merge pull request #2152 from explosion/feature/tidy-up-dependencies 💫 Tidy up dependencies	2018-03-29 14:35:09 +02:00
Matthew Honnibal	8308bbc617	Get msgpack and msgpack_numpy via Thinc, to avoid potential version conflicts	2018-03-29 00:14:55 +02:00
Matthew Honnibal	95a9615221	Fix loading of multiple pre-trained vectors This patch addresses #1660, which was caused by keying all pre-trained vectors with the same ID when telling Thinc how to refer to them. This meant that if multiple models were loaded that had pre-trained vectors, errors or incorrect behaviour resulted. The vectors class now includes a .name attribute, which defaults to: {nlp.meta['lang']_nlp.meta['name']}.vectors The vectors name is set in the cfg of the pipeline components under the key pretrained_vectors. This replaces the previous cfg key pretrained_dims. In order to make existing models compatible with this change, we check for the pretrained_dims key when loading models in from_disk and from_bytes, and add the cfg key pretrained_vectors if we find it.	2018-03-28 16:02:59 +02:00
Matthew Honnibal	8cefc58abc	Fix Vectors pickling	2018-03-14 16:59:37 +01:00
Claudiu-Vlad Ursache	e28de12cbd	Ensure files opened in `from_disk` are closed Fixes [issue 1706](https://github.com/explosion/spaCy/issues/1706).	2018-02-13 20:49:43 +01:00
Matthew Honnibal	29897ed1b3	Allow vector loading to work on 1d data files. Fixes #1831	2018-01-22 19:18:26 +01:00
Matthew Honnibal	1a1cca6052	Fix vectors.resize() on Py3. Closes #1539	2018-01-14 14:48:51 +01:00
Matthew Honnibal	36b47e3fa6	Fix (and test) vector pickling	2017-12-07 09:53:30 +01:00
Matthew Honnibal	b712de774e	Fix vectors pickling	2017-12-05 12:45:24 +01:00
Matthew Honnibal	a5ea0fdf5a	Fix #1518: vocab.vectors.resize() didn't work	2017-11-08 22:18:37 +01:00
Matthew Honnibal	225cc249c9	Pass string path to numpy, to fix #1479	2017-11-05 14:42:46 +01:00
Matthew Honnibal	fdb4b8e456	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-11-01 02:07:17 +01:00
Matthew Honnibal	c48dd0e1d3	Fix vector pruning	2017-11-01 02:06:58 +01:00
ines	5683fd65ed	Update docstrings	2017-11-01 00:42:39 +01:00
Matthew Honnibal	c16310d156	Update vectors with find method	2017-11-01 00:34:55 +01:00
ines	2ad2f09d12	Update docstrings and simplify most_similar	2017-11-01 00:18:08 +01:00
ines	ba2e6c8c6f	Update docstrings and formatting	2017-10-31 23:23:34 +01:00
Matthew Honnibal	d90a22afe6	Fix loading previous vectors models	2017-10-31 19:58:35 +01:00
Matthew Honnibal	997a61557a	Add vectors.n_keys property	2017-10-31 19:30:52 +01:00
Matthew Honnibal	77d8f5de9a	Revise and simplify Vectors class	2017-10-31 18:25:08 +01:00
Matthew Honnibal	9c11ee4a1c	WIP on vectors fixes	2017-10-31 11:22:56 +01:00
Matthew Honnibal	368fdb389a	WIP on refactoring and fixing vectors	2017-10-31 02:00:26 +01:00
Matthew Honnibal	4112a991ec	Fix vector pruning	2017-10-30 19:44:40 +01:00
Explosion Bot	d0cf12c8c7	Fix off-by-one error in vectors	2017-10-30 16:22:03 +01:00
Explosion Bot	ab5d5ed880	Fix vectors.add()	2017-10-30 16:08:09 +01:00
Explosion Bot	72aea8f105	Update vectors.add() to allow setting keys to rows	2017-10-30 10:03:08 +01:00
ines	5167a0cce2	Tidy up Vectors and docs	2017-10-27 19:45:19 +02:00
Matthew Honnibal	cfae54c507	Make change to Vectors.__init__	2017-10-20 14:19:04 +02:00
Matthew Honnibal	92ac9316b5	Fix initialization of vectors, to address serialization problem	2017-10-20 13:59:24 +02:00
Matthew Honnibal	df488274b1	Fix deserialization of vectors	2017-10-16 20:55:00 +02:00
Matthew Honnibal	d90cc917fa	Merge vectors.pyx doc strings	2017-10-01 17:05:54 -05:00
Matthew Honnibal	b2a8b9be77	Fix inconsistency of Vectors class API	2017-10-01 17:00:34 -05:00
Matthew Honnibal	97c409b602	Add docstrings for spacy.vectors	2017-10-01 22:10:33 +02:00
Matthew Honnibal	4f38a67a89	Make width default to 0 in vectors.pyx	2017-09-17 12:29:14 -05:00
Matthew Honnibal	e0a2aa9289	Support having word vectors data on GPU	2017-09-16 12:45:09 -05:00
Matthew Honnibal	7742a6d559	Add GloVe vectors reader	2017-09-01 16:39:22 +02:00
Matthew Honnibal	b8e1603cc4	Fix load fail for missing vectors	2017-08-19 22:07:00 +02:00
Matthew Honnibal	6a94648373	Fix serialization	2017-08-19 21:27:35 +02:00
Matthew Honnibal	1157294434	Improve vector handling	2017-08-19 20:35:33 +02:00
Matthew Honnibal	93fb8b64e9	Fix vector loading	2017-08-19 19:52:25 +02:00
Matthew Honnibal	3d049af563	Improve vectors to/from disk	2017-08-19 18:42:11 +02:00
Matthew Honnibal	19c495f451	Fix vectors deserialization	2017-08-19 04:33:03 +02:00
Matthew Honnibal	ed4fb991dc	Work on vectors loading	2017-08-18 20:45:48 +02:00

53 Commits