Justin DuJardin
|
eef9430f07
|
Add example for visualizing word vectors with TensorBoard Projector
Use:
```bash
python vectors_tensorboard.py en_core_web_lg ./output_folder spaCy_large
```
|
2018-03-23 12:49:01 -07:00 |
|
Matthew Honnibal
|
7441fce7ba
|
Fix undefined variable in conllu script
|
2018-02-26 14:59:56 +01:00 |
|
Matthew Honnibal
|
f0478635df
|
Fix Japanese tokenizer flag
|
2018-02-26 10:32:12 +01:00 |
|
Matthew Honnibal
|
5faae803c6
|
Add option to not use Janome for Japanese tokenization
|
2018-02-26 09:39:46 +01:00 |
|
Matthew Honnibal
|
9b406181cd
|
Add Chinese.Defaults.use_jieba setting, for UD
|
2018-02-25 15:12:38 +01:00 |
|
Matthew Honnibal
|
9e960d24fc
|
Refactor conllu script, fix interface, generalize
|
2018-02-25 14:54:47 +01:00 |
|
Matthew Honnibal
|
551c93fe01
|
Shuffle data after each epoch. Improve script
|
2018-02-25 13:35:32 +01:00 |
|
Matthew Honnibal
|
bdb0174571
|
Update conllu training script
|
2018-02-25 13:12:39 +01:00 |
|
Matthew Honnibal
|
e09070eca7
|
Refactor conllu script
|
2018-02-25 12:50:29 +01:00 |
|
Matthew Honnibal
|
44e496a82e
|
Refactor conllu script
|
2018-02-25 12:48:22 +01:00 |
|
Matthew Honnibal
|
c388833ca6
|
Minibatch by number of tokens, support other vectors, refactor CoNLL printing
|
2018-02-25 10:38:06 +01:00 |
|
Matthew Honnibal
|
dd78ef066a
|
Unset data size limit in conll script
|
2018-02-24 18:14:57 +01:00 |
|
Matthew Honnibal
|
8adeea3746
|
Generalize conllu script. Now handling Chinese (maybe badly)
|
2018-02-24 16:04:27 +01:00 |
|
Matthew Honnibal
|
329b14c9e6
|
Clean up conllu script
|
2018-02-24 10:31:53 +01:00 |
|
Matthew Honnibal
|
5be092ee72
|
CONLLU scoring 80.9% UAS with no oracle segments
|
2018-02-23 23:49:17 +01:00 |
|
Matthew Honnibal
|
23236340f4
|
Update CoNLL script. Don't preset SBD. Set batch size to 8, avoid writing twice
|
2018-02-22 21:35:50 +01:00 |
|
Matthew Honnibal
|
a26e399f84
|
Update conllu script
|
2018-02-22 19:43:54 +01:00 |
|
Matthew Honnibal
|
001e2ec6d6
|
Refactor CoNLL training script
|
2018-02-22 16:00:34 +01:00 |
|
Matthew Honnibal
|
6a27a4f77c
|
Set accelerating batch size in CONLL train script
|
2018-02-21 21:02:41 +01:00 |
|
Matthew Honnibal
|
4dc0fc9954
|
Replace labels that didn't make freq cutoff
|
2018-02-21 15:59:22 +01:00 |
|
Matthew Honnibal
|
97164b1763
|
Fix conllu script
|
2018-02-21 14:46:54 +01:00 |
|
Matthew Honnibal
|
24fb2c246f
|
Add script to do conllu training
|
2018-02-21 13:53:59 +01:00 |
|
Matthew Honnibal
|
00557c5fdd
|
Add example of NER multitask objective
|
2018-01-21 19:46:37 +01:00 |
|
avinash
|
b379c9d7d3
|
typos corrected
|
2018-01-03 16:54:22 +05:30 |
|
mpuels
|
1e8147aec7
|
fix: Add missing period in train data
|
2017-12-13 10:51:05 +01:00 |
|
mpuels
|
ee4d6fdd40
|
Fix typo in comment
|
2017-12-09 13:14:57 +01:00 |
|
ines
|
726fb2d0b5
|
Use fewer iterations by default to avoid overfitting on blank model (resolves #1632)
|
2017-11-23 15:27:12 +01:00 |
|
ines
|
ec08996000
|
Add note on tags matching tokenization (see #1613)
|
2017-11-20 15:12:47 +01:00 |
|
ines
|
1a38575de3
|
Make example Python 2 compatible (see #1617)
|
2017-11-20 13:57:51 +01:00 |
|
ines
|
7d5afadf5e
|
Update vectors_loc description
|
2017-11-17 14:57:11 +01:00 |
|
ines
|
c57e05bec1
|
Make sure nr_dim is an int
In some languages (e.g. Dutch), the nr_dim is extracted as a byte string, causing an error down the line.
|
2017-11-17 14:56:27 +01:00 |
|
yogendrasoni
|
334ed433b2
|
rstrip line before rsplit
Loading the English fastText vectors raised an error because each line ends with a newline, which caused rsplit to split it incorrectly.
|
2017-11-15 13:55:08 +05:30 |
|
Matthew Honnibal
|
f0e28e8ae5
|
Make fasttext reader accommodate whitespace
|
2017-11-12 12:07:13 +01:00 |
|
ines
|
f36fab39b0
|
Don't rename component in intent parser example (resolves #1551)
Otherwise, the default saved model won't know that it's supposed to create spaCy's 'parser'.
|
2017-11-10 23:35:38 +01:00 |
|
Ines Montani
|
1a23a0f87e
|
Remove broken link (resolves #1541)
|
2017-11-10 12:28:39 +01:00 |
|
ines
|
3597a29c24
|
Update fastText vectors example (see #1525)
Add option to specify language, and add note on "lang" being required to save out model
|
2017-11-09 14:54:39 +01:00 |
|
ines
|
33b84f4c39
|
Change clear_vectors to reset_vectors (resolves #1516)
|
2017-11-08 18:11:23 +01:00 |
|
ines
|
89bd40b821
|
Fix print statement in textcat training example (resolves #1515)
|
2017-11-08 17:17:40 +01:00 |
|
ines
|
a09c096d3c
|
Get docs ready for v2.0.0
|
2017-11-07 12:00:43 +01:00 |
|
ines
|
173b1551af
|
Update examples
|
2017-11-07 01:22:30 +01:00 |
|
ines
|
1b1c9105b4
|
Update example compatibility statements
|
2017-11-07 01:11:45 +01:00 |
|
ines
|
8fb48b9b91
|
Update and document new util functions
|
2017-11-07 00:22:43 +01:00 |
|
Matthew Honnibal
|
d7016d4050
|
Update intent parser example
|
2017-11-06 23:31:11 +01:00 |
|
ines
|
fe498b3d5e
|
Update training examples to use "simple style"
|
2017-11-06 23:14:04 +01:00 |
|
ines
|
c646365e2f
|
Port over changes and add note on compat (see #1445)
|
2017-11-06 13:58:34 +01:00 |
|
ines
|
2dca9e71a1
|
Add notes on catastrophic forgetting (see #1496)
|
2017-11-06 13:17:02 +01:00 |
|
Matthew Honnibal
|
717e8124fb
|
Update Keras sentiment analysis example
|
2017-11-05 17:11:00 +01:00 |
|
Matthew Honnibal
|
cfb83c231c
|
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
|
2017-11-04 23:08:19 +01:00 |
|
Matthew Honnibal
|
ba0201de07
|
Update multiprocessing example
|
2017-11-04 23:07:57 +01:00 |
|
ines
|
70a9504560
|
Add in-between print statement
|
2017-11-04 23:06:55 +01:00 |
|