Ines Montani
017bc2ef2f
Expose TextCategorizer via __all__
2018-11-18 00:06:13 +01:00
Ines Montani
b4581435f6
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2018-11-16 13:08:22 +01:00
Ines Montani
e2f75eb492
Fix message formatting
2018-11-16 13:08:20 +01:00
Matthew Honnibal
c89fd19f66
Hack broken pipe error for Python2
2018-11-16 02:22:05 +01:00
Matthew Honnibal
2874b8efd8
Fix tok2vec loading in spacy train
2018-11-15 23:34:54 +00:00
Matthew Honnibal
2ddd428834
Fix pretrain script
2018-11-15 23:34:35 +00:00
Matthew Honnibal
09a0227656
Temporarily add a script to load reddit
2018-11-15 23:18:35 +00:00
Matthew Honnibal
f8afaa0c1c
Fix pretrain
2018-11-15 22:46:53 +00:00
Matthew Honnibal
6af6950e46
Fix pretrain
2018-11-15 22:45:36 +00:00
Matthew Honnibal
3e7b214e57
Make pretrain script work with stream from stdin
2018-11-15 22:44:07 +00:00
Matthew Honnibal
8fdb9bc278
💫 Add experimental ULMFit/BERT/Elmo-like pretraining ( #2931 )
...
* Add 'spacy pretrain' command
* Fix pretrain command for Python 2
* Fix pretrain command
* Fix pretrain command
2018-11-15 22:17:16 +01:00
Ines Montani
e89708c3eb
💫 Allow matching non-ORTH attributes in PhraseMatcher ( #2925 )
...
* Allow matching non-orth attributes in PhraseMatcher (see #1971 )
Usage: PhraseMatcher(nlp.vocab, attr='POS')
* Allow attr argument to be int
* Fix formatting
* Fix typo
2018-11-15 03:00:58 +01:00
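To illustrate the idea behind the PhraseMatcher change above: matching on a non-ORTH attribute means comparing sequences of attribute values rather than the verbatim token text. The following is a minimal pure-Python sketch, not spaCy's implementation. `make_token` and `AttrPhraseMatcher` are hypothetical names invented for this illustration; only the call `PhraseMatcher(nlp.vocab, attr='POS')` comes from the commit message.

```python
def make_token(orth, pos):
    # Hypothetical stand-in for a token: a dict of attribute values.
    return {"ORTH": orth, "LOWER": orth.lower(), "POS": pos}


class AttrPhraseMatcher:
    """Toy phrase matcher keyed on a single token attribute (illustrative only)."""

    def __init__(self, attr="ORTH"):
        self.attr = attr
        self.patterns = {}  # label -> list of attribute-value tuples

    def add(self, label, phrases):
        # Store each phrase as the tuple of its tokens' values for self.attr.
        for phrase in phrases:
            key = tuple(tok[self.attr] for tok in phrase)
            if key:
                self.patterns.setdefault(label, []).append(key)

    def __call__(self, doc):
        # Slide every stored key over the document's attribute values.
        values = [tok[self.attr] for tok in doc]
        matches = []
        for label, keys in self.patterns.items():
            for key in keys:
                n = len(key)
                for start in range(len(values) - n + 1):
                    if tuple(values[start:start + n]) == key:
                        matches.append((label, start, start + n))
        return sorted(matches, key=lambda m: (m[1], m[2]))


pattern = [make_token("red", "ADJ"), make_token("car", "NOUN")]
doc = [
    make_token("A", "DET"),
    make_token("blue", "ADJ"),
    make_token("bike", "NOUN"),
    make_token("arrived", "VERB"),
]

pos_matcher = AttrPhraseMatcher(attr="POS")
pos_matcher.add("ADJ_NOUN", [pattern])
pos_matches = pos_matcher(doc)

orth_matcher = AttrPhraseMatcher(attr="ORTH")
orth_matcher.add("ADJ_NOUN", [pattern])
orth_matches = orth_matcher(doc)
```

With `attr="POS"` the pattern "red car" matches "blue bike" because both are an adjective followed by a noun; with the default `"ORTH"` nothing matches, since the verbatim text differs.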
Matthew Honnibal
7ed9124a45
Fix Python2 error on example
2018-11-14 19:35:17 +01:00
Ines Montani
0d5b142c78
Fix typos and whitespace
2018-11-14 19:12:34 +01:00
Ines Montani
bd1b0e396a
Add deprecation warning for PhraseMatcher max_length
2018-11-14 19:10:46 +01:00
Ines Montani
64257bf3a7
Fix formatting
2018-11-14 19:10:21 +01:00
Ines Montani
b3cadd5b81
Delete _matcher2_notes.py
2018-11-14 16:19:12 +01:00
Matthew Honnibal
5fc98ade04
Set version to 2.1.0a2
2018-11-08 09:56:56 +01:00
Matthew Honnibal
09aa616182
Make pretraining script work without GPU
2018-11-04 17:09:52 +01:00
Matthew Honnibal
bc8cda818c
Improve pretrain textcat example
2018-11-04 00:17:09 +00:00
Matthew Honnibal
3e7a96f99d
Improve pretrain textcat example
2018-11-03 17:44:12 +00:00
Matthew Honnibal
c87c50af62
Rename new example
2018-11-03 13:09:46 +00:00
Matthew Honnibal
8e8ccc0f92
Work on pretraining script
2018-11-03 12:53:25 +00:00
Matthew Honnibal
ad44982f01
Fix dropout in tensorizer, update comment
2018-11-03 12:46:58 +00:00
Matthew Honnibal
0127f10ba3
Improve train tensorizer script
2018-11-03 10:54:20 +00:00
Matthew Honnibal
ba365ae1c9
Normalize gradient by number of words in tensorizer
2018-11-03 10:53:22 +00:00
Matthew Honnibal
dac3f1b280
Improve Tensorizer
2018-11-03 10:52:50 +00:00
Matthew Honnibal
baf7feae68
Add tensorizer training example
2018-11-02 23:30:06 +00:00
Matthew Honnibal
2527ba68e5
Fix tensorizer
2018-11-02 23:29:54 +00:00
Suraj Rajan
0bf14082a4
Added more constructs for dependency tree matcher ( #2836 )
2018-10-29 23:21:39 +01:00
Matthew Honnibal
817e1fc5e5
Fix out-of-bounds access in NER training
...
The helper method state.B(1) gets the index of the first token of the
buffer, or -1 if no such token exists. Normally this is safe because we
pass this to functions like state.safe_get(), which returns an empty
token. Here we used it directly as an array index, which is not okay!
This error may have been the cause of out-of-bounds access errors during
training. Similar errors may still be around, so they must be hunted down.
Hunting this one down took a long time... I printed out values across
training runs and diffed, looking for points of divergence between
runs, when no randomness should be allowed.
2018-10-27 01:12:50 +02:00
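The class of bug described in the commit above can be sketched in plain Python. This is an illustration under assumptions, not the parser's actual Cython code: `buffer_index`, `safe_get`, and `EMPTY_TOKEN` are hypothetical stand-ins for a helper that returns -1 when no token exists and for a bounds-checked accessor that returns an empty token.

```python
# Hypothetical sentinel standing in for the "empty token".
EMPTY_TOKEN = {"text": "", "is_empty": True}


def buffer_index(buffer, i):
    """Return the token index stored at buffer position i, or -1 if absent."""
    return buffer[i] if 0 <= i < len(buffer) else -1


def safe_get(tokens, index):
    """Return tokens[index], or the empty sentinel when index is out of range."""
    if index < 0 or index >= len(tokens):
        return EMPTY_TOKEN
    return tokens[index]


tokens = [
    {"text": "Hello", "is_empty": False},
    {"text": "world", "is_empty": False},
]
buffer = []  # nothing left on the buffer

idx = buffer_index(buffer, 0)  # -1, because no such token exists

# Using the sentinel index directly is the bug. Python silently wraps -1 to
# the last element; in C or Cython the same access reads outside the array.
unsafe = tokens[idx]

# Routing the index through the bounds-checked accessor is the safe pattern.
safe = safe_get(tokens, idx)
```

The wrong value returned by the direct index is a quiet failure in Python, which is one reason this kind of error can be slow to track down.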
Ines Montani
ea20b72c08
💫 Make like_num work for prefixed numbers ( #2808 )
...
* Only split + prefix if not numbers
* Make like_num work for prefixed numbers
* Add test for like_num
2018-10-01 10:49:14 +02:00
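A rough sketch of what "like_num for prefixed numbers" could look like follows. It is written as a standalone function under assumptions: the set of accepted prefixes and the abbreviated `NUM_WORDS` list are illustrative choices, not taken from the commit.

```python
# Abbreviated, illustrative word list (an assumption, not the full list).
NUM_WORDS = {"zero", "one", "two", "three", "ten", "hundred"}

# Assumed set of sign-like prefixes to ignore.
PREFIXES = ("+", "-", "±", "~")


def like_num(text):
    """Return True if text looks like a number, ignoring one leading prefix."""
    if text.startswith(PREFIXES):
        text = text[1:]
    # Drop thousands separators and decimal points before the digit check.
    text = text.replace(",", "").replace(".", "")
    if text.isdigit():
        return True
    # Accept simple fractions such as 1/2.
    if text.count("/") == 1:
        num, denom = text.split("/")
        if num.isdigit() and denom.isdigit():
            return True
    return text.lower() in NUM_WORDS
```

For example, `like_num("+10")`, `like_num("~1,000")`, and `like_num("-3.5")` all return `True`, while a bare `"+"` returns `False` because nothing numeric remains after the prefix is stripped.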
Matthew Honnibal
b39810d692
Fix copy_reg compatibility on _serialize module
2018-09-28 15:23:14 +02:00
Matthew Honnibal
f82f8ba5dd
Fix serialization when empty parser model. Closes #2482
2018-09-28 15:18:52 +02:00
Matthew Honnibal
d5a6c63b62
Add regression test for #2482
2018-09-28 15:18:30 +02:00
Matthew Honnibal
e3e9fe18d4
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2018-09-28 14:27:35 +02:00
Matthew Honnibal
0323f5be0c
Fix _serialize module
2018-09-28 14:27:24 +02:00
Ines Montani
5d56eb70d7
Tidy up tests
2018-09-27 16:41:57 +02:00
Ines Montani
1f1bab9264
Remove unused import
2018-09-27 16:41:37 +02:00
Matthew Honnibal
b42c123e5d
Fix regression introduced by 1759abf1e
2018-09-25 11:08:58 +02:00
Matthew Honnibal
500898907b
Fix regression in parser.begin_training()
2018-09-25 11:08:31 +02:00
Matthew Honnibal
1759abf1e5
Fix bug in sentence starts for non-projective parses
...
The set_children_from_heads function assumed parse trees were
projective. However, non-projective parses may be passed in during
deserialization, or after deprojectivising. This caused incorrect
sentence boundaries to be set for non-projective parses. Close #2772 .
2018-09-19 14:50:06 +02:00
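For readers unfamiliar with the term used in the commit above: a parse is non-projective when two dependency arcs cross. The sketch below checks for crossing arcs in a list of head indices. It assumes the convention that a root token points at itself, and `is_nonprojective` is a hypothetical helper for illustration, not the `set_children_from_heads` function named in the commit.

```python
def is_nonprojective(heads):
    """Return True if any two dependency arcs cross.

    heads[i] is the index of token i's head; a root token points at itself.
    """
    # Represent each arc by its left and right endpoints, skipping roots.
    arcs = [(min(i, h), max(i, h)) for i, h in enumerate(heads) if i != h]
    for a_start, a_end in arcs:
        for b_start, b_end in arcs:
            # Arc b starts strictly inside arc a and ends strictly outside it.
            if a_start < b_start < a_end < b_end:
                return True
    return False


projective = [1, 1, 1]         # tokens 0 and 2 attach to the root at 1
nonprojective = [2, 3, 2, 2]   # arcs 0-2 and 1-3 cross
```

Code that derives subtree edges or sentence starts by assuming every subtree covers a contiguous span can go wrong on inputs like the second example, which is the situation the commit message describes.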
Matthew Honnibal
48fd36bf05
Fix test for issue 2772
2018-09-19 14:47:27 +02:00
Matthew Honnibal
6cd920e088
Add xfail test for deprojectivization SBD bug
2018-09-19 14:00:31 +02:00
Matthew Honnibal
99a6011580
Avoid adding empty layer in model, to keep models backwards compatible
2018-09-14 22:51:58 +02:00
Matthew Honnibal
c046392317
Trigger on_data hooks in parser model
2018-09-14 20:51:21 +02:00
Matthew Honnibal
5afd98dff5
Add a stepping function, for changing batch sizes or learning rates
2018-09-14 18:37:16 +02:00
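One way a stepping function for batch sizes or learning rates might look is a generator that moves linearly from a start value to a stop value and then holds there. The sketch below is an assumption about the general shape of such a schedule; the name `stepping` and its signature are chosen for illustration and are not confirmed by the commit message.

```python
from itertools import islice


def stepping(start, stop, steps):
    """Yield values moving linearly from start toward stop, then hold at stop."""

    def clip(value):
        # Never overshoot stop, whichever direction the schedule moves in.
        return max(value, stop) if start > stop else min(value, stop)

    curr = float(start)
    while True:
        yield clip(curr)
        curr += (stop - start) / steps


# Grow the batch size from 1 to 16 over 5 steps, then keep it at 16.
batch_sizes = [int(size) for size in islice(stepping(1, 16, 5), 7)]
```

Because the generator is infinite, a training loop can call `next()` on it once per batch or epoch without tracking when the schedule ends.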
Matthew Honnibal
27c00f4f22
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2018-09-14 12:30:57 +02:00
Matthew Honnibal
f32b52e611
Fix bug that caused deprojectivisation to run multiple times
2018-09-14 12:12:54 +02:00
Matthew Honnibal
8f2a6367e9
Fix usage of PyTorch BiLSTM in ud_train
2018-09-13 22:54:59 +00:00