spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-07-11 08:42:28 +03:00

Author	SHA1	Message	Date
Ines Montani	323fc26880	Tidy up and format remaining files	2018-11-30 17:43:08 +01:00
Ines Montani	eddeb36c96	💫 Tidy up and auto-format .py files (#2983) ## Description - [x] Use [`black`](https://github.com/ambv/black) to auto-format all `.py` files. - [x] Update flake8 config to exclude very large files (lemmatization tables etc.) - [x] Update code to be compatible with flake8 rules - [x] Fix various small bugs, inconsistencies and messy stuff in the language data - [x] Update docs to explain new code style (`black`, `flake8`, when to use `# fmt: off` and `# fmt: on` and what `# noqa` means) Once #2932 is merged, which auto-formats and tidies up the CLI, we'll be able to run `flake8 spacy` and actually get meaningful results. At the moment, the code style and linting aren't applied automatically, but I'm hoping that the new [GitHub Actions](https://github.com/features/actions) will let us auto-format pull requests and post comments with relevant linting information. ### Types of change enhancement, code style ## Checklist - [x] I have submitted the spaCy Contributor Agreement. - [x] I ran the tests, and all new and existing tests passed. - [x] My changes don't require a change to the documentation, or if they do, I've added all required information.	2018-11-30 17:03:03 +01:00
Matthew Honnibal	ef0820827a	Update hyper-parameters after NER random search (#2972) These experiments were completed a few weeks ago, but I didn't make the PR, pending model release. Token vector width: 128->96; Hidden width: 128->64; Embed size: 5000->2000; Dropout: 0.2->0.1; Updated optimizer defaults (unclear how important?). This should improve speed, model size and load time, while keeping similar or slightly better accuracy. The tl;dr is we prefer to prevent over-fitting by reducing model size, rather than using more dropout.	2018-11-27 18:49:52 +01:00
Matthew Honnibal	2527ba68e5	Fix tensorizer	2018-11-02 23:29:54 +00:00
Matthew Honnibal	99a6011580	Avoid adding empty layer in model, to keep models backwards compatible	2018-09-14 22:51:58 +02:00
Matthew Honnibal	afeddfff26	Fix PyTorch BiLSTM	2018-09-13 22:54:34 +00:00
Matthew Honnibal	45032fe9e1	Support option of BiLSTM in Tok2Vec (requires pytorch)	2018-09-13 19:28:35 +02:00
Matthew Honnibal	4d2d7d5866	Fix new feature flags	2018-08-27 02:12:39 +02:00
Matthew Honnibal	8051136d70	Support subword_features and conv_depth params in Tok2Vec	2018-08-27 01:50:48 +02:00
Matthew Honnibal	401213fb1f	Only warn about unnamed vectors if non-zero sized.	2018-05-19 18:51:55 +02:00
Matthew Honnibal	2338e8c7fc	Update develop from master	2018-05-02 01:36:12 +00:00
Matthew Honnibal	548bdff943	Update default Adam settings	2018-05-01 15:18:20 +02:00
Matthew Honnibal	2c4a6d66fa	Merge master into develop. Big merge, many conflicts -- need to review	2018-04-29 14:49:26 +02:00
Ines Montani	3141e04822	💫 New system for error messages and warnings (#2163) * Add spacy.errors module * Update deprecation and user warnings * Replace errors and asserts with new error message system * Remove redundant asserts * Fix whitespace * Add messages for print/util.prints statements * Fix typo * Fix typos * Move CLI messages to spacy.cli._messages * Add decorator to display error code with message An implementation like this is nice because it only modifies the string when it's retrieved from the containing class – so we don't have to worry about manipulating tracebacks etc. * Remove unused link in spacy.about * Update errors for invalid pipeline components * Improve error for unknown factories * Add displaCy warnings * Update formatting consistency * Move error message to spacy.errors * Update errors and check if doc returned by component is None	2018-04-03 15:50:31 +02:00
Matthew Honnibal	4555e3e251	Don't assume pretrained_vectors cfg set in build_tagger	2018-03-28 20:12:45 +02:00
Matthew Honnibal	f8dd905a24	Warn and fallback if vectors have no name	2018-03-28 18:24:53 +02:00
Matthew Honnibal	95a9615221	Fix loading of multiple pre-trained vectors. This patch addresses #1660, which was caused by keying all pre-trained vectors with the same ID when telling Thinc how to refer to them. This meant that if multiple models were loaded that had pre-trained vectors, errors or incorrect behaviour resulted. The vectors class now includes a `.name` attribute, which defaults to: `{nlp.meta['lang']_nlp.meta['name']}.vectors`. The vectors name is set in the cfg of the pipeline components under the key `pretrained_vectors`. This replaces the previous cfg key `pretrained_dims`. In order to make existing models compatible with this change, we check for the `pretrained_dims` key when loading models in `from_disk` and `from_bytes`, and add the cfg key `pretrained_vectors` if we find it.	2018-03-28 16:02:59 +02:00
Matthew Honnibal	1f7229f40f	Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop" This reverts commit `c9ba3d3c2d`, reversing changes made to `92c26a35d4`.	2018-03-27 19:23:02 +02:00
Matthew Honnibal	6e641f46d4	Create a preprocess function that gets bigrams	2017-11-12 00:43:41 +01:00
Matthew Honnibal	d5537e5516	Work on Windows test failure	2017-11-08 13:25:18 +01:00
Matthew Honnibal	1d5599cd28	Fix dtype	2017-11-08 12:18:32 +01:00
Matthew Honnibal	a8b592783b	Make a dtype more specific, to fix a Windows build	2017-11-08 11:24:35 +01:00
Matthew Honnibal	13336a6197	Fix Adam import	2017-11-06 14:25:37 +01:00
Matthew Honnibal	2eb11d60f2	Add function create_default_optimizer to spacy._ml	2017-11-06 14:11:59 +01:00
Matthew Honnibal	33bd2428db	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-11-03 13:29:56 +01:00
Matthew Honnibal	c9b118a7e9	Set softmax attr in tagger model	2017-11-03 11:22:01 +01:00
Matthew Honnibal	b3264aa5f0	Expose the softmax layer in the tagger model, to allow setting tensors	2017-11-03 11:19:51 +01:00
Matthew Honnibal	6771780d3f	Fix backprop of padding variable	2017-11-03 01:54:34 +01:00
Matthew Honnibal	260e6ee3fb	Improve efficiency of backprop of padding variable	2017-11-03 00:49:11 +01:00
Matthew Honnibal	e85e31cfbd	Fix backprop of d_pad	2017-11-01 19:27:26 +01:00
Matthew Honnibal	d17a12c71d	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-11-01 16:38:26 +01:00
Matthew Honnibal	9f9439667b	Don't create low-data text classifier if no vectors	2017-11-01 16:34:09 +01:00
Matthew Honnibal	8075726838	Restore vector usage in models	2017-10-31 19:21:17 +01:00
Matthew Honnibal	cb5217012f	Fix vector remapping	2017-10-31 11:40:46 +01:00
Matthew Honnibal	ce876c551e	Fix GPU usage	2017-10-31 02:33:34 +01:00
Matthew Honnibal	368fdb389a	WIP on refactoring and fixing vectors	2017-10-31 02:00:26 +01:00
Matthew Honnibal	3b91097321	Whitespace	2017-10-28 17:05:11 +00:00
Matthew Honnibal	6ef72864fa	Improve initialization for hidden layers	2017-10-28 17:05:01 +00:00
Matthew Honnibal	df4803cc6d	Add learned missing values for parser	2017-10-28 16:45:14 +00:00
Matthew Honnibal	64e4ff7c4b	Merge 'tidy-up' changes into branch. Resolve conflicts	2017-10-28 13:16:06 +02:00
Explosion Bot	b22e42af7f	Merge changes to parser and _ml	2017-10-28 11:52:10 +02:00
ines	d96e72f656	Tidy up rest	2017-10-27 21:07:59 +02:00
ines	e33b7e0b3c	Tidy up parser and ML	2017-10-27 14:39:30 +02:00
Matthew Honnibal	531142a933	Merge remote-tracking branch 'origin/develop' into feature/better-parser	2017-10-27 12:34:48 +00:00
Matthew Honnibal	c9987cf131	Avoid use of numpy.tensordot	2017-10-27 10:18:36 +00:00
Matthew Honnibal	f6fef30adc	Remove dead code from spacy._ml	2017-10-27 10:16:41 +00:00
ines	4eb5bd02e7	Update textcat pre-processing after to_array change	2017-10-27 00:32:12 +02:00
Matthew Honnibal	35977bdbb9	Update better-parser branch with develop	2017-10-26 00:55:53 +00:00
Matthew Honnibal	075e8118ea	Update from develop	2017-10-25 12:45:21 +02:00
ines	0b1dcbac14	Remove unused function	2017-10-25 12:08:46 +02:00

178 Commits