Commit Graph

287 Commits

Author SHA1 Message Date
Matthew Honnibal
7ed9124a45 Fix Python2 error on example 2018-11-14 19:35:17 +01:00
Matthew Honnibal
09aa616182 Make pretraining script work without GPU 2018-11-04 17:09:52 +01:00
Matthew Honnibal
bc8cda818c Improve pretrain textcat example 2018-11-04 00:17:09 +00:00
Matthew Honnibal
3e7a96f99d Improve pretrain textcat example 2018-11-03 17:44:12 +00:00
Matthew Honnibal
c87c50af62 Rename new example 2018-11-03 13:09:46 +00:00
Matthew Honnibal
8e8ccc0f92 Work on pretraining script 2018-11-03 12:53:25 +00:00
Matthew Honnibal
0127f10ba3 Improve train tensorizer script 2018-11-03 10:54:20 +00:00
Matthew Honnibal
baf7feae68 Add tensorizer training example 2018-11-02 23:30:06 +00:00
Matthew Honnibal
5a4aeb96b7 Add example showing a fix-up rule for space entities 2018-10-28 16:06:00 +01:00
Ines Montani
4cd9ec0f00 💫 Update training examples and use minibatching (#2830)

## Description
Update the training examples in `/examples/training` to show usage of spaCy's `minibatch` and `compounding` helpers ([see here](https://spacy.io/usage/training#tips-batch-size) for details). The lack of batching in the examples has caused some confusion in the past, especially for beginners who would copy-paste the examples, update them with large training sets and experience slow and unsatisfying results.

### Types of change
enhancements

## Checklist
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2018-10-10 01:40:29 +02:00
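Commit 4cd9ec0f00 refers to spaCy's `minibatch` and `compounding` helpers. As a rough illustration of the idea only, the sketch below re-implements both in plain Python. The names `compounding_sketch` and `minibatch_sketch` are hypothetical stand-ins rather than spaCy's own code, and the sketch assumes a growing series (`compound >= 1`).

```python
from itertools import islice, repeat


def compounding_sketch(start, stop, compound):
    """Yield start, start*compound, start*compound**2, ... capped at stop.

    Hypothetical stand-in for a compounding batch-size schedule;
    assumes compound >= 1 so the series grows towards `stop`.
    """
    value = start
    while True:
        yield min(value, stop)
        value *= compound


def minibatch_sketch(items, size):
    """Split `items` into lists; `size` is an int or an iterator of sizes."""
    sizes = repeat(size) if isinstance(size, int) else iter(size)
    items = iter(items)
    while True:
        batch = list(islice(items, int(next(sizes))))
        if not batch:
            return
        yield batch


# Toy "training set" of 20 items, batched with sizes growing from 4 towards 32.
train_data = list(range(20))
batches = list(minibatch_sketch(train_data, compounding_sketch(4.0, 32.0, 1.5)))
batch_sizes = [len(batch) for batch in batches]
print(batch_sizes)
```

Starting with small batches and letting them grow means early updates happen often, while later epochs process more examples per update. A training loop would typically call the model's update step once per batch instead of once per example.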
John Stewart
9faea3ff10 Update Keras Example for (Parikh et al, 2016) implementation (#2803)
* bug fixes in keras example

* created contributor agreement

* baseline for Parikh model

* initial version of parikh 2016 implemented

* tested asymmetric models

* fixed grievous error in normalization

* use standard SNLI test file

* begin to rework parikh example

* initial version of running example

* start to document the new version

* Update Decompositional Attention.ipynb

* fixed calls to similarity

* updated the README

* import sys package duh

* simplified indexing on mapping word to IDs

* stupid python indent error

* added code from https://github.com/tensorflow/tensorflow/issues/3388 for tf bug workaround
2018-10-01 10:28:45 +02:00
John Stewart
2d15859d2a Fixed spaCy+Keras example (#2763)
* bug fixes in keras example

* created contributor agreement
2018-09-15 13:06:39 +02:00
Matthew Honnibal
4336397ecb Update develop from master 2018-08-14 03:04:28 +02:00
Matthew Honnibal
f762d52b24 Add example for Issue #2627 2018-08-05 13:33:52 +02:00
ines
4339f64128 Merge branch 'master' into develop 2018-07-19 16:15:03 +02:00
ines
d489ffb78b Fix formatting [ci skip] 2018-07-19 13:22:25 +02:00
himkt
57311d5d47 replace janome with mecab in the documentation and the test (#2415)
* Add links to Reddit data (see #2401)

* replace janome with mecab in the documentation and the test

* add the assignment
2018-06-11 00:33:13 +02:00
Ines Montani
3f2e3cbd27 Add links to Reddit data (see #2401) 2018-05-31 16:22:43 +02:00
Matthew Honnibal
546dd99cdf Merge master into develop -- mostly Arabic and website 2018-05-15 18:14:28 +02:00
Matt Upson
9a1d3b63fb Add missing default to .set_extension (#2297)
Failing to set a default, method, or getter results in a ValueError:

ValueError: [E083] Error setting extension: only one of `default`, `method`, or `getter` (plus optional `setter`) is allowed. Got: 0
2018-05-04 18:47:01 +02:00
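The error quoted in commit 9a1d3b63fb states that exactly one of `default`, `method`, or `getter` must be supplied. The sketch below is a hypothetical plain-Python check that mirrors that rule. The function name `check_extension_args` and the `_UNSET` sentinel are illustrative assumptions, not spaCy's implementation.

```python
_UNSET = object()  # sentinel so that default=None still counts as "set"


def check_extension_args(default=_UNSET, method=None, getter=None, setter=None):
    """Return 1 if exactly one of default/method/getter is given, else raise.

    Hypothetical re-statement of the rule in the E083 message; `setter`
    is accepted but does not count towards the total.
    """
    nr_defined = sum([
        default is not _UNSET,
        method is not None,
        getter is not None,
    ])
    if nr_defined != 1:
        raise ValueError(
            "[E083] Error setting extension: only one of `default`, `method`, "
            "or `getter` (plus optional `setter`) is allowed. Got: %d" % nr_defined
        )
    return nr_defined


# Passing nothing reproduces the "Got: 0" case from the commit message.
try:
    check_extension_args()
except ValueError as err:
    message = str(err)

# Supplying a default (even None) satisfies the rule.
ok = check_extension_args(default=None)
```

Under this reading, the fix in the example amounts to passing an explicit default when registering the extension.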
Matthew Honnibal
2c4a6d66fa Merge master into develop. Big merge, many conflicts -- need to review 2018-04-29 14:49:26 +02:00
Ines Montani
49cee4af92 💫 Interactive code examples, spaCy Universe and various docs improvements (#2274)
* Integrate Python kernel via Binder

* Add live model test for languages with examples

* Update docs and code examples

* Adjust margin (if not bootstrapped)

* Add binder version to global config

* Update terminal and executable code mixins

* Pass attributes through infobox and section

* Hide v-cloak

* Fix example

* Take out model comparison for now

* Add meta text for compat

* Remove chart.js dependency

* Tidy up and simplify JS and port big components over to Vue

* Remove chartjs example

* Add Twitter icon

* Add purple stylesheet option

* Add utility for hand cursor (special cases only)

* Add transition classes

* Add small option for section

* Add thumb object for small round thumbnail images

* Allow unset code block language via "none" value

(workaround to still allow unset language to default to DEFAULT_SYNTAX)

* Pass through attributes

* Add syntax highlighting definitions for Julia, R and Docker

* Add website icon

* Remove user survey from navigation

* Don't hide GitHub icon on small screens

* Make top navigation scrollable on small screens

* Remove old resources page and references to it

* Add Universe

* Add helper functions for better page URL and title

* Update site description

* Increment versions

* Update preview images

* Update mentions of resources

* Fix image

* Fix social images

* Fix problem with cover sizing and floats

* Add divider and move badges into heading

* Add docstrings

* Reference converting section

* Add section on converting word vectors

* Move converting section to custom section and fix formatting

* Remove old fastText example

* Move extensions content to own section

Keep weird ID to not break permalinks for now (we don't want to rewrite URLs if not absolutely necessary)

* Use better component example and add factories section

* Add note on larger model

* Use better example for non-vector

* Remove similarity in context section

Only works via small models with tensors so has always been kind of confusing

* Add note on init-model command

* Fix lightning tour examples and make executable if possible

* Add spacy train CLI section to train

* Fix formatting and add video

* Fix formatting

* Fix textcat example description (resolves #2246)

* Add dummy file to try resolve conflict

* Delete dummy file

* Tidy up [ci skip]

* Ensure sufficient height of loading container

* Add loading animation to universe

* Update Thebelab build and use better startup message

* Fix asset versioning

* Fix typo [ci skip]

* Add note on project idea label
2018-04-29 02:06:46 +02:00
Matthew Honnibal
cca7e7ad11 Merge branch 'master' of https://github.com/explosion/spaCy 2018-03-29 20:27:06 +02:00
Matthew Honnibal
68ad366935 Improve train_new_entity_type example 2018-03-29 20:26:41 +02:00
ines
07b8c255a5 Update example with note to install requests 2018-03-28 12:46:27 +02:00
Matthew Honnibal
1f7229f40f Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop"
This reverts commit c9ba3d3c2d, reversing
changes made to 92c26a35d4.
2018-03-27 19:23:02 +02:00
Justin DuJardin
4eeb178856 Add example using TensorBoard standalone projector
- The TensorBoard standalone projector expects a different set of files than the TensorFlow plugin.
2018-03-25 21:50:13 -07:00
ines
4ec2809eb5 Port over TensorBoard example 2018-03-24 17:15:48 +01:00
Matthew Honnibal
00557c5fdd Add example of NER multitask objective 2018-01-21 19:46:37 +01:00
avinash
b379c9d7d3 typos corrected 2018-01-03 16:54:22 +05:30
mpuels
1e8147aec7 fix: Add missing period in train data 2017-12-13 10:51:05 +01:00
mpuels
ee4d6fdd40 Fix typo in comment 2017-12-09 13:14:57 +01:00
ines
726fb2d0b5 Use fewer iterations by default to avoid overfitting on blank model (resolves #1632) 2017-11-23 15:27:12 +01:00
ines
ec08996000 Add note on tags matching tokenization (see #1613) 2017-11-20 15:12:47 +01:00
ines
1a38575de3 Make example Python 2 compatible (see #1617) 2017-11-20 13:57:51 +01:00
ines
7d5afadf5e Update vectors_loc description 2017-11-17 14:57:11 +01:00
ines
c57e05bec1 Make sure nr_dim is an int
In some languages (e.g. Dutch), the nr_dim is extracted as a byte string, causing an error down the line.
2017-11-17 14:56:27 +01:00
yogendrasoni
334ed433b2 rstrip line before rsplit
Loading the English fastText vectors raised an error because each line ends with a newline, so rsplit split it incorrectly.
2017-11-15 13:55:08 +05:30
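The effect described in commit 334ed433b2 can be reproduced in plain Python. The sketch below is a minimal illustration, not the example script itself. It assumes each vector line has the form `word v1 v2 ... vn` followed by a trailing space and a newline, and the helper name `parse_vector_line` is hypothetical.

```python
def parse_vector_line(line, nr_dim, strip=True):
    """Split a text-format vector line into (word, list of value strings).

    rsplit with maxsplit=nr_dim peels the last nr_dim fields off the right,
    so whatever remains on the left is treated as the word.
    """
    if strip:
        line = line.rstrip()  # drop trailing whitespace, including "\n"
    pieces = line.rsplit(" ", nr_dim)
    return pieces[0], pieces[1:]


# Assumed line layout: a trailing space before the newline.
raw = "hello 0.1 0.2 0.3 \n"

broken = parse_vector_line(raw, 3, strip=False)
fixed = parse_vector_line(raw, 3)

print(broken)
print(fixed)
```

Without the `rstrip`, the newline is counted as the last field, so one value is pushed into the word and a later `float("\n")` conversion would fail.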
Matthew Honnibal
f0e28e8ae5 Make fasttext reader accommodate whitespace 2017-11-12 12:07:13 +01:00
ines
f36fab39b0 Don't rename component in intent parser example (resolves #1551)
Otherwise, the default saved model won't know that it's supposed to create spaCy's 'parser'.
2017-11-10 23:35:38 +01:00
Ines Montani
1a23a0f87e Remove broken link (resolves #1541) 2017-11-10 12:28:39 +01:00
ines
3597a29c24 Update fastText vectors example (see #1525)
Add option to specify language, and add note on "lang" being required to save out model
2017-11-09 14:54:39 +01:00
ines
33b84f4c39 Change clear_vectors to reset_vectors (resolves #1516) 2017-11-08 18:11:23 +01:00
ines
89bd40b821 Fix print statement in textcat training example (resolves #1515) 2017-11-08 17:17:40 +01:00
ines
a09c096d3c Get docs ready for v2.0.0 2017-11-07 12:00:43 +01:00
ines
173b1551af Update examples 2017-11-07 01:22:30 +01:00
ines
1b1c9105b4 Update example compatibility statements 2017-11-07 01:11:45 +01:00
ines
8fb48b9b91 Update and document new util functions 2017-11-07 00:22:43 +01:00
Matthew Honnibal
d7016d4050 Update intent parser example 2017-11-06 23:31:11 +01:00
ines
fe498b3d5e Update training examples to use "simple style" 2017-11-06 23:14:04 +01:00
ines
c646365e2f Port over changes and add note on compat (see #1445) 2017-11-06 13:58:34 +01:00
ines
2dca9e71a1 Add notes on catastrophic forgetting (see #1496) 2017-11-06 13:17:02 +01:00
Matthew Honnibal
717e8124fb Update Keras sentiment analysis example 2017-11-05 17:11:00 +01:00
Matthew Honnibal
cfb83c231c Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-11-04 23:08:19 +01:00
Matthew Honnibal
ba0201de07 Update multiprocessing example 2017-11-04 23:07:57 +01:00
ines
70a9504560 Add in-between print statement 2017-11-04 23:06:55 +01:00
Matthew Honnibal
e033162a1d Update tagger training example 2017-11-01 21:49:08 +01:00
ines
8f1d3fc3ee Update textcat example 2017-11-01 17:09:22 +01:00
Matthew Honnibal
dad8f09fba Fix print statements in text classifier example 2017-11-01 16:34:31 +01:00
ines
bfe17b7df1 Fix begin_training if get_gold_tuples is None 2017-11-01 13:14:31 +01:00
ines
0ca152a015 Fix syntax error 2017-11-01 00:43:28 +01:00
ines
4b196fdf7f Fix formatting 2017-11-01 00:43:22 +01:00
ines
33af6ac69a Use even smaller example size
100 was still too much, so try 20 instead
2017-10-30 19:46:45 +01:00
ines
f02b0af821 Fix path and use smaller example size
500 was too large and caused laggy rendering
2017-10-30 19:44:35 +01:00
ines
18dde7869a Update training data docs and add vocab JSONL 2017-10-30 19:40:05 +01:00
ines
b5643d8575 Update intent parser docs and add to usage docs 2017-10-27 04:49:05 +02:00
ines
9dfca0f2f8 Add example for custom intent parser 2017-10-27 03:55:11 +02:00
ines
4d272e25ee Fix examples 2017-10-27 03:55:04 +02:00
ines
44f83b35bc Update pipeline component examples to use plac 2017-10-27 02:58:14 +02:00
ines
af28ca1ba0 Move example to pipeline directory 2017-10-27 02:00:01 +02:00
ines
1d69a46cd4 Update multi-processing example and add to docs 2017-10-27 01:58:55 +02:00
ines
4eabaafd66 Update docstring and example 2017-10-27 01:50:44 +02:00
ines
ed69bd69f4 Update parallel tagging example 2017-10-27 01:48:52 +02:00
ines
096a80170d Remove old example files 2017-10-27 01:48:39 +02:00
ines
a7b9074b4c Update textcat training example and docs 2017-10-27 00:48:45 +02:00
ines
b61866a2e4 Update textcat example 2017-10-27 00:32:19 +02:00
ines
f81cc0bd1c Fix usage of disable_pipes 2017-10-27 00:31:30 +02:00
ines
b7b285971f Update examples README 2017-10-26 18:47:11 +02:00
ines
cc2917c9e8 Update fastText example and add to examples in docs 2017-10-26 18:47:02 +02:00
ines
db843735d3 Remove outdated examples 2017-10-26 18:46:25 +02:00
ines
daed7ff8fe Update information extraction examples 2017-10-26 18:46:11 +02:00
ines
bca5372fb1 Clean up examples 2017-10-26 17:32:59 +02:00
ines
f57043e6fe Update docstring 2017-10-26 16:29:08 +02:00
ines
b90e958975 Update tagger and parser examples and add to docs 2017-10-26 16:27:42 +02:00
ines
f1529463a8 Update tagger training example 2017-10-26 16:19:02 +02:00
ines
e44bbb5361 Remove old example 2017-10-26 16:12:41 +02:00
ines
421c3837e8 Fix formatting 2017-10-26 16:11:25 +02:00
ines
4d896171ae Use plac annotations for arguments 2017-10-26 16:11:20 +02:00
ines
c3b681e5fb Use plac annotations for arguments and add n_iter 2017-10-26 16:11:05 +02:00
ines
bc2c92f22d Use plac annotations for arguments 2017-10-26 16:10:56 +02:00
ines
b5c74dbb34 Update parser training example 2017-10-26 15:15:37 +02:00
ines
586b9047fd Use create_pipe instead of importing the entity recognizer 2017-10-26 15:15:26 +02:00
ines
d425ede7e9 Fix example 2017-10-26 15:15:08 +02:00
ines
9d58673aaf Update train_ner example for spaCy v2.0 2017-10-26 14:24:12 +02:00
ines
e904075f35 Remove stray print statements 2017-10-26 14:24:00 +02:00
ines
c30258c3a2 Remove old example 2017-10-26 14:23:52 +02:00
ines
615c315d70 Update train_new_entity_type example to use disable_pipes 2017-10-25 14:56:53 +02:00
ines
2b8e7c45e0 Use better training data JSON example 2017-10-24 16:00:56 +02:00
ines
9bf5751064 Pretty-print JSON 2017-10-24 12:22:17 +02:00
ines
6675755005 Add training data JSON example 2017-10-24 12:05:10 +02:00