spaCy

mirror of https://github.com/explosion/spaCy.git synced 2026-01-19 06:49:14 +03:00

Author	SHA1	Message	Date
Ines Montani	1e5b917d75	Fix formatting [ci skip]	2019-03-23 16:45:50 +01:00
Matthew Honnibal	6c783f8045	Bug fixes and options for TextCategorizer (#3472) * Fix code for bag-of-words feature extraction The _ml.py module had a redundant copy of a function to extract unigram bag-of-words features, except one had a bug that set values to 0. Another function allowed extraction of bigram features. Replace all three with a new function that supports arbitrary ngram sizes and also allows control of which attribute is used (e.g. ORTH, LOWER, etc). * Support 'bow' architecture for TextCategorizer This allows efficient ngram bag-of-words models, which are better when the classifier needs to run quickly, especially when the texts are long. Pass architecture="bow" to use it. The extra arguments ngram_size and attr are also available, e.g. ngram_size=2 means unigram and bigram features will be extracted. * Fix size limits in train_textcat example * Explain architectures better in docs	2019-03-23 16:44:44 +01:00
Ines Montani	06bf130890	💫 Add better and serializable sentencizer (#3471) * Add better serializable sentencizer component * Replace default factory * Add tests * Tidy up * Pass test * Update docs	2019-03-23 15:45:02 +01:00
Ines Montani	dcd6e06c47	Improve landing example [ci skip]	2019-03-22 19:02:15 +01:00
Ines Montani	a841324034	Update landing example [ci skip]	2019-03-22 18:50:00 +01:00
Ines Montani	b532386a60	Fix typo [ci skip]	2019-03-22 18:36:17 +01:00
Ines Montani	d8533f0149	Update Binder [ci skip]	2019-03-22 18:16:46 +01:00
Christos Aridas	9cee3f702a	Add missing space in landing page (#3462) [ci skip]	2019-03-22 15:17:35 +01:00
Ines Montani	5073ce63fd	Merge branch 'spacy.io' [ci skip]	2019-03-22 15:17:11 +01:00
Ines Montani	0712efc6b3	Update version requirements [ci skip]	2019-03-21 10:23:54 +01:00
Ines Montani	764359c952	Merge branch 'master' into spacy.io	2019-03-20 17:24:28 +01:00
Ines Montani	dac8f8ff99	Update Span.__init__ docs (see #3445) [ci skip]	2019-03-20 17:24:17 +01:00
Ines Montani	f7b5ff7907	Move netlify.toml to root	2019-03-19 14:40:14 +01:00
Ines Montani	c6ee030721	Fix docsearch	2019-03-19 14:38:49 +01:00
Ines Montani	0155083e01	Update netlify.toml	2019-03-19 14:07:00 +01:00
Ines Montani	d4eed4a84f	Add note on unicode build to troubleshooting guide (see #3421) [ci skip]	2019-03-19 10:27:02 +01:00
Ines Montani	42d4b818e4	Redirect Netlify URL	2019-03-19 10:17:56 +01:00
Ines Montani	1ee97bc282	Add page title fallback, just in case	2019-03-18 18:58:55 +01:00
Ines Montani	728ae7651b	Fix universe page titles if no separate title is set	2019-03-18 18:58:46 +01:00
Ines Montani	a20d3772fd	Fix responsive landing	2019-03-18 16:24:52 +01:00
Ines Montani	08284f3a11	💫 v2.1.0 launch updates (only merge on launch!) (#3414) * Update README.md * Use production docsearch [ci skip] * Add option to exclude pages from search	2019-03-18 16:07:26 +01:00
Ines Montani	a611b32fbf	Update model docs [ci skip]	2019-03-17 11:48:18 +01:00
Matthew Honnibal	62afa64a8d	Expose batch size and length caps on CLI for pretrain (#3417) Add and document CLI options for batch size, max doc length, min doc length for `spacy pretrain`. Also improve CLI output. Closes #3216	2019-03-16 21:38:45 +01:00
Ines Montani	2c5dd4d602	Update Vectors.find docs [ci skip]	2019-03-16 17:10:57 +01:00
Ines Montani	fa0f501165	Use dev DocSearch index	2019-03-15 14:48:38 +01:00
Ines Montani	8af7d01382	Fix general-purpose IDs	2019-03-15 14:48:26 +01:00
Ines Montani	cbcba699dd	Fix missing ids	2019-03-14 17:56:53 +01:00
Ines Montani	cffe63ea24	Fix :target padding for ids	2019-03-14 17:41:02 +01:00
Ines Montani	51b7b88acf	Generate active sidebar heading (h0) at compile time	2019-03-14 17:20:51 +01:00
Ines Montani	4ab1871a75	Add search-exclude classes	2019-03-14 16:51:29 +01:00
Ines Montani	59bbf85986	Add id to body	2019-03-14 16:51:18 +01:00
Ines Montani	6e07750dd8	Fix class name	2019-03-14 11:52:31 +01:00
Ines Montani	a0813b93e0	Server-side render is-active for crawler	2019-03-14 11:46:27 +01:00
Ines Montani	39ace04b55	Fix active style	2019-03-14 11:46:13 +01:00
Ines Montani	4cfe4aa224	Fix small issues in the docs [ci skip]	2019-03-12 22:57:15 +01:00
Ines Montani	ba7eb2d131	Update section [ci skip]	2019-03-12 16:18:34 +01:00
Ines Montani	cecc31b765	Don't auto-slugify accordion links [ci skip]	2019-03-12 15:30:49 +01:00
Ines Montani	d842d5698e	Tidy up website and add eslint config [ci skip]	2019-03-12 15:21:58 +01:00
Ines Montani	72fb324d95	Add vector training script to bin [ci skip]	2019-03-12 12:07:56 +01:00
Ines Montani	3abf0e6b9f	Replace dev-resources links with real examples	2019-03-12 12:07:40 +01:00
Ines Montani	59c0620487	Auto-format	2019-03-12 12:07:11 +01:00
Ines Montani	1664d1fa62	Update universe [ci skip]	2019-03-12 11:13:03 +01:00
Ines Montani	cdd418b93e	Auto-format [ci skip]	2019-03-11 17:10:50 +01:00
Matthew Honnibal	b0b990e405	Fix token.conjuncts (closes #795) (#3392) * Implement conjuncts method * Add span.conjuncts property * Un-xfail token.conjuncts tests * Update docs for token.conjuncts and span.conjuncts * Fix merge error in token.conjuncts	2019-03-11 17:05:45 +01:00
Ines Montani	25cb764e64	Document new API [ci skip]	2019-03-11 15:23:53 +01:00
Ines Montani	ebcf2bb1c3	Add Doc.lang and Doc.lang_	2019-03-11 14:21:40 +01:00
Ines Montani	7c05ca01e8	💫 Support mutable default values for extension attributes (#3389) * Support mutable default values in extensions * Update documentation	2019-03-11 12:50:44 +01:00
Matthew Honnibal	98acf5ffe4	💫 Allow passing of config parameters to specific pipeline components (#3386) * Add component_cfg kwarg to begin_training * Document component_cfg arg to begin_training * Update docs and auto-format * Support component_cfg across Language * Format * Update docs and docstrings [ci skip] * Fix begin_training	2019-03-10 23:36:47 +01:00
Ines Montani	8dbf1e9037	Also fix #3387 on develop	2019-03-10 23:36:28 +01:00
Ines Montani	7ba3a5d95c	💫 Make serialization methods consistent (#3385) * Make serialization methods consistent: use an exclude keyword argument instead of randomly named keyword arguments, and add deprecation handling * Update docs and add section on serialization fields	2019-03-10 19:16:45 +01:00

1264 Commits