Matthew Honnibal
27c00f4f22
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2018-09-14 12:30:57 +02:00
Matthew Honnibal
f32b52e611
Fix bug that caused deprojectivisation to run multiple times
2018-09-14 12:12:54 +02:00
Matthew Honnibal
8f2a6367e9
Fix usage of PyTorch BiLSTM in ud_train
2018-09-13 22:54:59 +00:00
Matthew Honnibal
afeddfff26
Fix PyTorch BiLSTM
2018-09-13 22:54:34 +00:00
Matthew Honnibal
a26fe8e7bb
Small hack in Language.update to make torch work
2018-09-13 22:51:52 +00:00
Matthew Honnibal
445b81ce3f
Support bilstm_depth argument in ud-train
2018-09-13 19:30:22 +02:00
Matthew Honnibal
b43643a953
Support bilstm_depth option in parser
2018-09-13 19:29:49 +02:00
Matthew Honnibal
45032fe9e1
Support option of BiLSTM in Tok2Vec (requires pytorch)
2018-09-13 19:28:35 +02:00
Matthew Honnibal
3eb9f3e2b8
Fix defaults for ud-train
2018-09-13 18:05:48 +02:00
Matthew Honnibal
59cf533879
Improve ud-train script. Make config optional
2018-09-13 14:24:08 +02:00
Matthew Honnibal
3e3a309764
Fix tagger
2018-09-13 14:14:38 +02:00
Matthew Honnibal
da7650e84b
Fix maximum doc length in ud_train script
2018-09-13 14:10:25 +02:00
Matthew Honnibal
a95eea4c06
Fix multi-task objective for parser
2018-09-13 14:08:55 +02:00
Matthew Honnibal
21321cd6cf
Add tok2vec property to parser model
2018-09-13 14:08:43 +02:00
Matthew Honnibal
d6aa60139d
Fix tagger training on GPU
2018-09-13 14:05:37 +02:00
Matthew Honnibal
b2cb1fc67d
Merge matcher tests
2018-09-06 01:39:53 +02:00
Suraj Krishnan Rajan
356af7b0a1
Fix tests
2018-09-06 01:39:36 +02:00
Matthew Honnibal
4d2d7d5866
Fix new feature flags
2018-08-27 02:12:39 +02:00
Matthew Honnibal
598dbf1ce0
Fix character-based tokenization for Japanese
2018-08-27 01:51:38 +02:00
Matthew Honnibal
3763e20afc
Pass subword_features and conv_depth params
2018-08-27 01:51:15 +02:00
Matthew Honnibal
8051136d70
Support subword_features and conv_depth params in Tok2Vec
2018-08-27 01:50:48 +02:00
Matthew Honnibal
9c33d4d1df
Add more hyper-parameters to spacy ud-train
...
* subword_features: Controls whether subword features are used in the
word embeddings. True by default (specifically, prefix, suffix and word
shape). Should be set to False for languages like Chinese and Japanese.
* conv_depth: Depth of the convolutional layers. Defaults to 4.
2018-08-27 01:48:46 +02:00
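The two hyper-parameters described in the commit above can be pictured as a small settings object. This is a minimal sketch, not the code from the commit: `Tok2VecSettings`, `settings_for` and `NO_SUBWORD_LANGS` are hypothetical names, and only the defaults (`subword_features=True`, `conv_depth=4`) and the Chinese/Japanese advice come from the commit message.

```python
from dataclasses import dataclass

# Hypothetical set: the commit message names Chinese and Japanese as examples
# of languages where subword features should be switched off.
NO_SUBWORD_LANGS = {"zh", "ja"}


@dataclass
class Tok2VecSettings:
    """Illustrative container for the two hyper-parameters (assumed name)."""

    subword_features: bool = True  # prefix, suffix and word shape features
    conv_depth: int = 4  # depth of the convolutional layers


def settings_for(lang: str) -> Tok2VecSettings:
    """Return defaults, disabling subword features for the listed languages."""
    return Tok2VecSettings(subword_features=lang not in NO_SUBWORD_LANGS)


if __name__ == "__main__":
    print(settings_for("en"))
    print(settings_for("ja"))
```

Keeping the language check in one helper means a training script only has to pass the language code, rather than repeating the rule wherever a model is built.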
Matthew Honnibal
51a9efbf3b
Add draft Binder class
2018-08-22 13:12:51 +02:00
Matthew Honnibal
f0e6be689a
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2018-08-16 17:18:19 +02:00
Matthew Honnibal
5ce459d2ee
Fix error in vocab
2018-08-16 17:18:09 +02:00
Ines Montani
aeb49eb625
Update version [ci skip]
2018-08-16 16:56:02 +02:00
Ines Montani
a0eacd3293
Merge branch 'master' into develop
2018-08-16 16:55:05 +02:00
Ines Montani
c0fa9903f4
Update model directory JS [ci skip]
...
Prevent the default release URL from being overwritten and add license type
2018-08-16 16:54:50 +02:00
Ines Montani
03f661fefb
Add Greek to models directory [ci skip]
2018-08-16 16:51:56 +02:00
Matthew Honnibal
00febda2e3
Improve alignment around quotes
2018-08-16 01:04:34 +02:00
Matthew Honnibal
66a3f2ba21
Lower-case text before alignment
2018-08-16 00:42:36 +02:00
Matthew Honnibal
595c893791
Expose noise_level option in train CLI
2018-08-16 00:41:44 +02:00
Matthew Honnibal
8365226bf3
Fix lookup of symbols in vocab.
2018-08-15 23:43:34 +02:00
Matthew Honnibal
b9f0588580
Set version to v2.1.0a1
2018-08-15 17:22:39 +02:00
Matthew Honnibal
e968016417
Note link between issues #2671 and #2675
2018-08-15 17:18:28 +02:00
Matthew Honnibal
63bdc734ba
Skip flakey test
2018-08-15 16:56:55 +02:00
Matthew Honnibal
ce512e1d47
Fix #2671: Incorrect match ID on some patterns
2018-08-15 16:19:08 +02:00
Matthew Honnibal
f12b9190f6
Xfail test for issue #2671
2018-08-15 15:55:31 +02:00
Matthew Honnibal
7cfa665ce6
Add failing test for issue 2671: Incorrect rule ID returned from matcher
2018-08-15 15:54:33 +02:00
Matthew Honnibal
1b2a5869ab
Set version to v2.1.0a2.dev0
2018-08-15 15:38:52 +02:00
Matthew Honnibal
5080760288
Add extra comment on 'add label' in parser
2018-08-15 15:37:24 +02:00
Matthew Honnibal
6e749d3c70
Skip flakey parser test
2018-08-15 15:37:04 +02:00
Ines Montani
fd9d175a53
Update live code [ci skip]
2018-08-15 15:28:48 +02:00
Matthew Honnibal
48ed1ca29d
Add branch option to push-tag script
2018-08-15 03:16:43 +02:00
Matthew Honnibal
6ea981c839
Add converter for jsonl NER data
2018-08-14 14:04:32 +02:00
Matthew Honnibal
a9fb6d5511
Fix docs2jsonl function
2018-08-14 14:03:48 +02:00
Matthew Honnibal
ea2edd1e2c
Merge branch 'feature/docs_to_json' into develop
2018-08-14 13:23:42 +02:00
Matthew Honnibal
6ec236ab08
Fix label-clobber bug in parser.begin_training()
...
The parser.begin_training() method was rewritten in v2.1. The rewrite
introduced a regression, where if you added labels prior to
begin_training(), these labels were discarded. This patch fixes that.
2018-08-14 13:20:19 +02:00
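The regression described in the commit above follows a common pattern: an initialisation step rebuilds a label list from the training data and overwrites whatever was registered earlier. The toy class below is a sketch of that pattern and of one way to avoid it. `ToyParser` and its methods are hypothetical stand-ins and do not reproduce spaCy's parser code.

```python
class ToyParser:
    """Hypothetical stand-in for a component that keeps a list of labels."""

    def __init__(self):
        self.labels = []

    def add_label(self, label):
        # Only register labels that have not been seen yet, preserving order.
        if label not in self.labels:
            self.labels.append(label)

    def begin_training(self, gold_labels):
        # The buggy pattern would be `self.labels = list(gold_labels)`,
        # which discards labels added before this call.
        # Merging instead keeps the earlier labels.
        for label in gold_labels:
            self.add_label(label)


if __name__ == "__main__":
    parser = ToyParser()
    parser.add_label("custom_dep")
    parser.begin_training(["nsubj", "dobj"])
    print(parser.labels)
```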
Matthew Honnibal
02c5c114d0
Fix usage of deprecated freqs.txt in init-model
2018-08-14 13:19:15 +02:00
Matthew Honnibal
2a5a61683e
Add function to get train format from Doc objects
...
Our JSON training format is annoying to work with, and we've wanted to
retire it for some time. In the meantime, we can at least add some
missing functions to make it easier to live with.
This patch adds a function that generates the JSON format from a list
of Doc objects, one per paragraph. This should be a convenient way to handle
a lot of data conversions: whatever format you have the source
information in, you can use it to set up a Doc object. This approach
should offer better future-proofing as well. Hopefully, we can steadily
rewrite code that is sensitive to the current data format, so that it
instead goes through this function. Then when we change the data format,
we won't have such a problem.
2018-08-14 13:13:10 +02:00
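The idea in the commit above, building the JSON training data from a list of document objects with one object per paragraph, can be sketched as follows. This is an illustration only: `ToyToken`, `ToyDoc` and `docs_to_training_json` are hypothetical names, and the output field names and the relative head offset are assumptions rather than details stated in the commit.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class ToyToken:
    text: str
    tag: str
    head: int  # absolute index of the syntactic head within the document
    dep: str


@dataclass
class ToyDoc:
    """Hypothetical stand-in for an annotated document (one paragraph)."""

    text: str
    tokens: List[ToyToken]


def docs_to_training_json(docs, doc_id=0):
    """Convert a list of ToyDoc objects into one nested training record."""
    paragraphs = []
    for doc in docs:
        tokens = [
            {
                "id": i,
                "orth": tok.text,
                "tag": tok.tag,
                "head": tok.head - i,  # assumed: head stored as a relative offset
                "dep": tok.dep,
            }
            for i, tok in enumerate(doc.tokens)
        ]
        # Each document becomes one paragraph holding a single sentence.
        paragraphs.append({"raw": doc.text, "sentences": [{"tokens": tokens}]})
    return {"id": doc_id, "paragraphs": paragraphs}


if __name__ == "__main__":
    doc = ToyDoc(
        text="I run",
        tokens=[ToyToken("I", "PRP", 1, "nsubj"), ToyToken("run", "VBP", 1, "ROOT")],
    )
    print(json.dumps(docs_to_training_json([doc]), indent=2))
```

Routing every conversion through one function like this is what gives the future-proofing the commit message hopes for: if the on-disk format changes, only the function body needs to change.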