Commit Graph

1263 Commits

Author SHA1 Message Date
Matthew Honnibal
6c783f8045 Bug fixes and options for TextCategorizer (#3472)
* Fix code for bag-of-words feature extraction

The _ml.py module had a redundant copy of a function to extract unigram
bag-of-words features, except one had a bug that set values to 0.
Another function allowed extraction of bigram features. Replace all three
with a new function that supports arbitrary ngram sizes and also allows
control of which attribute is used (e.g. ORTH, LOWER, etc).

* Support 'bow' architecture for TextCategorizer

This allows efficient ngram bag-of-words models, which are better when
the classifier needs to run quickly, especially when the texts are long.
Pass architecture="bow" to use it. The extra arguments ngram_size and
attr are also available, e.g. ngram_size=2 means unigram and bigram
features will be extracted.

* Fix size limits in train_textcat example

* Explain architectures better in docs
2019-03-23 16:44:44 +01:00
Ines Montani
06bf130890 💫 Add better and serializable sentencizer (#3471)
* Add better serializable sentencizer component

* Replace default factory

* Add tests

* Tidy up

* Pass test

* Update docs
2019-03-23 15:45:02 +01:00
Ines Montani
dcd6e06c47 Improve landing example [ci skip] 2019-03-22 19:02:15 +01:00
Ines Montani
a841324034 Update landing example [ci skip] 2019-03-22 18:50:00 +01:00
Ines Montani
b532386a60 Fix typo [ci skip] 2019-03-22 18:36:17 +01:00
Ines Montani
d8533f0149 Update Binder [ci skip] 2019-03-22 18:16:46 +01:00
Christos Aridas
9cee3f702a Add missing space in landing page (#3462) [ci skip] 2019-03-22 15:17:35 +01:00
Ines Montani
5073ce63fd Merge branch 'spacy.io' [ci skip] 2019-03-22 15:17:11 +01:00
Ines Montani
0712efc6b3 Update version requirements [ci skip] 2019-03-21 10:23:54 +01:00
Ines Montani
764359c952 Merge branch 'master' into spacy.io 2019-03-20 17:24:28 +01:00
Ines Montani
dac8f8ff99 Update Span.__init__ docs (see #3445) [ci skip] 2019-03-20 17:24:17 +01:00
Ines Montani
f7b5ff7907 Move netlify.toml to root 2019-03-19 14:40:14 +01:00
Ines Montani
c6ee030721 Fix docsearch 2019-03-19 14:38:49 +01:00
Ines Montani
0155083e01 Update netlify.toml 2019-03-19 14:07:00 +01:00
Ines Montani
d4eed4a84f Add note on unicode build to troubleshooting guide (see #3421) [ci skip] 2019-03-19 10:27:02 +01:00
Ines Montani
42d4b818e4 Redirect Netlify URL 2019-03-19 10:17:56 +01:00
Ines Montani
1ee97bc282 Add page title fallback, just in case 2019-03-18 18:58:55 +01:00
Ines Montani
728ae7651b Fix universe page titles if no separate title is set 2019-03-18 18:58:46 +01:00
Ines Montani
a20d3772fd FIx responsive landing 2019-03-18 16:24:52 +01:00
Ines Montani
08284f3a11
💫 v2.1.0 launch updates (only merge on launch!) (#3414)
* Update README.md

* Use production docsearch [ci skip]

* Add option to exclude pages from search
2019-03-18 16:07:26 +01:00
Ines Montani
a611b32fbf Update model docs [ci skip] 2019-03-17 11:48:18 +01:00
Matthew Honnibal
62afa64a8d Expose batch size and length caps on CLI for pretrain (#3417)
Add and document CLI options for batch size, max doc length, min doc length for `spacy pretrain`.

Also improve CLI output.

Closes #3216 

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-03-16 21:38:45 +01:00
Ines Montani
2c5dd4d602 Update Vectors.find docs [ci skip] 2019-03-16 17:10:57 +01:00
Ines Montani
fa0f501165 Use dev DocSearch index 2019-03-15 14:48:38 +01:00
Ines Montani
8af7d01382 Fix general-purpose IDs 2019-03-15 14:48:26 +01:00
Ines Montani
cbcba699dd Fix missing ids 2019-03-14 17:56:53 +01:00
Ines Montani
cffe63ea24 Fix :target padding for ids 2019-03-14 17:41:02 +01:00
Ines Montani
51b7b88acf Generate active sidebar heading (h0) at compile time 2019-03-14 17:20:51 +01:00
Ines Montani
4ab1871a75 Add search-exclude classes 2019-03-14 16:51:29 +01:00
Ines Montani
59bbf85986 Add id to body 2019-03-14 16:51:18 +01:00
Ines Montani
6e07750dd8 Fix class name 2019-03-14 11:52:31 +01:00
Ines Montani
a0813b93e0 Server-side render is-active for crawler 2019-03-14 11:46:27 +01:00
Ines Montani
39ace04b55 Fix active style 2019-03-14 11:46:13 +01:00
Ines Montani
4cfe4aa224 Fix small issues in the docs [ci skip] 2019-03-12 22:57:15 +01:00
Ines Montani
ba7eb2d131 Update section [ci skip] 2019-03-12 16:18:34 +01:00
Ines Montani
cecc31b765 Don't auto-slugify accordion links [ci skip] 2019-03-12 15:30:49 +01:00
Ines Montani
d842d5698e Tidy up website and add eslint config [ci skip] 2019-03-12 15:21:58 +01:00
Ines Montani
72fb324d95 Add vector training script to bin [ci skip] 2019-03-12 12:07:56 +01:00
Ines Montani
3abf0e6b9f Replace dev-resources links with real examples 2019-03-12 12:07:40 +01:00
Ines Montani
59c0620487 Auto-format 2019-03-12 12:07:11 +01:00
Ines Montani
1664d1fa62 Update universe [ci skip] 2019-03-12 11:13:03 +01:00
Ines Montani
cdd418b93e Auto-format [ci skip] 2019-03-11 17:10:50 +01:00
Matthew Honnibal
b0b990e405 Fix token.conjuncts (closes #795) (#3392)
* Implement conjuncts method

* Add span.conjuncts property

* Un-xfail token.conjuncts tests

* Update docs for token.conjuncts and span.conjuncts

* Fix merge error in token.conjuncts
2019-03-11 17:05:45 +01:00
Ines Montani
25cb764e64 Document new API [ci skip] 2019-03-11 15:23:53 +01:00
Ines Montani
ebcf2bb1c3 Add Doc.lang and Doc.lang_ 2019-03-11 14:21:40 +01:00
Ines Montani
7c05ca01e8 💫 Support mutable default values for extension attributes (#3389)
* Support mutable default values in extensions

* Update documentation
2019-03-11 12:50:44 +01:00
Matthew Honnibal
98acf5ffe4 💫 Allow passing of config parameters to specific pipeline components (#3386)
* Add component_cfg kwarg to begin_training

* Document component_cfg arg to begin_training

* Update docs and auto-format

* Support component_cfg across Language

* Format

* Update docs and docstrings [ci skip]

* Fix begin_training
2019-03-10 23:36:47 +01:00
Ines Montani
8dbf1e9037 Also fix #3387 on develop 2019-03-10 23:36:28 +01:00
Ines Montani
7ba3a5d95c 💫 Make serialization methods consistent (#3385)
* Make serialization methods consistent

exclude keyword argument instead of random named keyword arguments and deprecation handling

* Update docs and add section on serialization fields
2019-03-10 19:16:45 +01:00
Ines Montani
9a8f169e5c Update v2-1.md 2019-03-10 18:58:51 +01:00