Commit Graph

9 Commits

Author SHA1 Message Date
Ines Montani
1e5b917d75 Fix formatting [ci skip] 2019-03-23 16:45:50 +01:00
Matthew Honnibal
6c783f8045 Bug fixes and options for TextCategorizer (#3472)
* Fix code for bag-of-words feature extraction

The _ml.py module had a redundant copy of a function to extract unigram
bag-of-words features, except one had a bug that set values to 0.
Another function allowed extraction of bigram features. Replace all three
with a new function that supports arbitrary ngram sizes and also allows
control of which attribute is used (e.g. ORTH, LOWER, etc).

* Support 'bow' architecture for TextCategorizer

This allows efficient ngram bag-of-words models, which are better when
the classifier needs to run quickly, especially when the texts are long.
Pass architecture="bow" to use it. The extra arguments ngram_size and
attr are also available, e.g. ngram_size=2 means unigram and bigram
features will be extracted.

* Fix size limits in train_textcat example

* Explain architectures better in docs
2019-03-23 16:44:44 +01:00
Ines Montani
7ba3a5d95c 💫 Make serialization methods consistent (#3385)
* Make serialization methods consistent

exclude keyword argument instead of random named keyword arguments and deprecation handling

* Update docs and add section on serialization fields
2019-03-10 19:16:45 +01:00
Ines Montani
296446a1c8
Tidy up and improve docs and docstrings (#3370)
<!--- Provide a general summary of your changes in the title. -->

## Description
* tidy up and adjust Cython code to code style
* improve docstrings and make calling `help()` nicer
* add URLs to new docs pages to docstrings wherever possible, mostly to user-facing objects
* fix various typos and inconsistencies in docs

### Types of change
enhancement, docs

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-03-08 11:42:26 +01:00
Ines Montani
d0b3af9222 Fix remaining inaccuracies in API docs (closes #2329) 2019-02-24 22:21:25 +01:00
Ines Montani
46ec5cdccc Update TextCategorizer docs 2019-02-24 13:11:57 +01:00
Ines Montani
c03cb1cc63 Improve built-in component API docs 2019-02-24 13:11:49 +01:00
Ines Montani
04b4df0ec9 Remove n_threads 2019-02-17 22:25:42 +01:00
Ines Montani
e597110d31
💫 Update website (#3285)
<!--- Provide a general summary of your changes in the title. -->

## Description

The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.

This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.


### Types of change
enhancement

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-02-17 19:31:19 +01:00