Matthew Honnibal
dd90fe09f5
Remove extraneous label from textcat class
2017-11-06 22:09:02 +01:00
Matthew Honnibal
8fea512ac8
Don't set tensor in textcat
2017-11-06 19:20:14 +01:00
Matthew Honnibal
75e1618ec3
Fix lemma clobbering
2017-11-06 16:56:19 +01:00
Matthew Honnibal
25859dbb48
Return optimizer from begin_training, creating if necessary
2017-11-06 14:26:49 +01:00
Matthew Honnibal
31babe3c3f
Fix non-clobbering lemmatization
2017-11-06 12:36:05 +01:00
Matthew Honnibal
2b35bb76ad
Fix tensorizer on GPU
2017-11-05 15:34:40 +01:00
uwol
a2162b8908
tensorizer return parameter fix
2017-11-05 12:25:10 +01:00
Matthew Honnibal
17c63906f9
Update tensorizer component
2017-11-03 20:20:26 +01:00
Matthew Honnibal
6681058abd
Fix tensor extending in tagger
2017-11-03 13:29:36 +01:00
Matthew Honnibal
d6fc39c8a6
Set Doc.tensor from Tagger
2017-11-03 11:20:05 +01:00
Matthew Honnibal
b30dd36179
Allow Tagger.add_label() before training
2017-11-01 21:49:24 +01:00
Matthew Honnibal
b84d99b281
Revert tagger.add_label() changes, to fix model
2017-11-01 21:10:45 +01:00
Matthew Honnibal
f5855e539b
Fix tagger model loading
2017-11-01 20:42:36 +01:00
Matthew Honnibal
190522efd3
Fix tagger when some tags aren't in Morphology
2017-11-01 19:27:49 +01:00
Matthew Honnibal
7ae1aacdb8
Fix add_label methods
2017-11-01 17:06:43 +01:00
Matthew Honnibal
e7a9174877
Add add_label methods to Tagger and TextCategorizer
2017-11-01 16:32:44 +01:00
ines
ba5e646219
Tidy up pipeline
2017-10-27 20:29:08 +02:00
Ines Montani
4033e70c71
Merge pull request #1461 from explosion/feature/disable-pipes
...
💫 Add Language.disable_pipes(), to temporarily edit pipeline and update code examples
2017-10-27 12:21:40 +02:00
ines
9e372913e0
Remove old 'SP' condition in tag map
2017-10-26 16:11:57 +02:00
Matthew Honnibal
a8abc47811
Rename BaseThincComponent --> Pipe
2017-10-26 12:40:40 +02:00
Matthew Honnibal
b0f3ea2200
Fix names of pipeline components
...
NeuralDependencyParser --> DependencyParser
NeuralEntityRecognizer --> EntityRecognizer
TokenVectorEncoder --> Tensorizer
NeuralLabeller --> MultitaskObjective
2017-10-26 12:38:23 +02:00
Matthew Honnibal
ed8da9b11f
Add missing return statement in SentenceSegmenter
2017-10-17 15:32:56 +02:00
Matthew Honnibal
09d61ada5e
Merge pull request #1396 from explosion/feature/pipeline-management
...
💫 Improve pipeline and factory management
2017-10-10 04:29:54 +02:00
Matthew Honnibal
8978212ee5
Patch serialization bug raised in #1105
2017-10-10 03:58:12 +02:00
Matthew Honnibal
0384f08218
Trigger nonproj.deprojectivize as a postprocess
2017-10-07 02:00:47 +02:00
Matthew Honnibal
563f46f026
Fix multi-label support for text classification
...
The TextCategorizer class is supposed to support multi-label
text classification, and allow training data to contain missing
values.
For this to work, the gradient of the loss should be 0 when labels
are missing. Instead, there was no way to actually denote "missing"
in the GoldParse class, and so the TextCategorizer class treated
the label set within gold.cats as complete.
To fix this, we change GoldParse.cats to be a dict instead of a list.
The GoldParse.cats dict should map labels to floats, with 1. denoting
'present' and 0. denoting 'absent'. Gradients are zeroed for categories
that have no entry in the gold.cats dict. A nice bonus is that you can
also set values between 0 and 1 for partial membership. You can also set
other numeric values, if you're using a text classification model that
uses an appropriate loss function.
Unfortunately this is a breaking change, although the functionality
was only recently introduced and hasn't been properly documented
yet. I've updated the example script accordingly.
2017-10-05 18:43:02 -05:00
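The missing-value behaviour described in the multi-label fix above can be illustrated with a minimal sketch. This is not the TextCategorizer implementation: the function name masked_textcat_gradient, the half squared-error loss, and the label names are assumptions chosen for illustration. Only the idea comes from the commit message: cats is a dict of label to float, and a label with no entry in the dict contributes zero gradient.

```python
def masked_textcat_gradient(scores, labels, cats):
    """Return (gradient, loss) for one document.

    Hypothetical helper, not part of spaCy. Uses a half squared-error
    loss, 0.5 * (score - target) ** 2, as an assumed stand-in.

    scores: predicted score per label, in the same order as `labels`.
    cats:   dict mapping label -> float target (1. present, 0. absent,
            values in between for partial membership). A label with no
            entry in the dict is treated as a missing annotation.
    """
    gradient = []
    loss = 0.0
    for score, label in zip(scores, labels):
        if label in cats:
            diff = score - float(cats[label])
            gradient.append(diff)  # d/d_score of 0.5 * diff ** 2
            loss += 0.5 * diff ** 2
        else:
            # Missing annotation: no gradient and no loss for this label.
            gradient.append(0.0)
    return gradient, loss


labels = ["POSITIVE", "NEGATIVE", "SPAM"]  # illustrative label set
scores = [0.75, 0.5, 0.25]
cats = {"POSITIVE": 1.0, "NEGATIVE": 0.0}  # "SPAM" is left unannotated
gradient, loss = masked_textcat_gradient(scores, labels, cats)
```

Here the "SPAM" score receives no update because the gold dict says nothing about it, whereas "NEGATIVE" is explicitly marked absent with 0. and is pushed towards zero.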
Matthew Honnibal
5454b20cd7
Update thinc imports for 6.9
2017-10-03 20:07:17 +02:00
Matthew Honnibal
4a59f6358c
Fix thinc imports
2017-10-03 19:21:26 +02:00
Matthew Honnibal
66c388ee01
Remove unhelpful multitask objectives
2017-09-27 11:44:16 -05:00
Matthew Honnibal
983201a83a
Fix hard-coded vector width
2017-09-27 11:43:58 -05:00
Matthew Honnibal
defb68e94f
Update feature/noshare with recent develop changes
2017-09-26 08:15:14 -05:00
Matthew Honnibal
ca28590ddd
Use dep and ent multi-task objectives for parser
2017-09-26 08:13:52 -05:00
Matthew Honnibal
18a27c7579
Fix typo in tensorizer serialization
2017-09-26 06:45:14 -05:00
Matthew Honnibal
bf917225ab
Allow multi-task objectives during training
2017-09-26 05:42:52 -05:00
ines
d2d35b63b7
Fix formatting
2017-09-25 18:37:13 +02:00
Matthew Honnibal
8eb0b7b779
Add docstrings for Pipe API
2017-09-25 16:22:07 +02:00
Matthew Honnibal
39f390dba7
Add docstrings for Pipe API
2017-09-25 16:20:49 +02:00
Matthew Honnibal
4348c479fc
Merge pre-trained vectors and noshare patches
2017-09-22 20:07:28 -05:00
Matthew Honnibal
386c1a5bd8
Fix tagger training
2017-09-23 02:58:06 +02:00
Matthew Honnibal
05596159bf
Fix serialization when using pre-trained vectors
2017-09-22 15:33:27 -05:00
Matthew Honnibal
d9124f1aa3
Add link_vectors_to_models function
2017-09-22 09:38:22 -05:00
Matthew Honnibal
40a4873b70
Fix serialization of model options
2017-09-21 13:07:26 -05:00
Matthew Honnibal
20193371f5
Don't share CNN, to reduce complexities
2017-09-21 14:59:48 +02:00
Matthew Honnibal
24e85c2048
Pass values for CNN maxout pieces option
2017-09-20 19:16:12 -05:00
Matthew Honnibal
b36a38f63d
Fix serialization of pretrained_dims property
2017-09-19 23:42:27 +02:00
Matthew Honnibal
40837b275d
Fix tensorizer with pretrained vectors
2017-09-18 18:05:38 -05:00
Matthew Honnibal
84e637e2e6
Pass option for pretrained vectors in pipeline
2017-09-16 12:46:02 -05:00
Matthew Honnibal
7fdafcc4c4
Fix config loading in tagger
2017-09-04 16:38:49 +02:00
Matthew Honnibal
382ce566eb
Fix deserialization bug
2017-09-04 15:19:01 +02:00
Matthew Honnibal
9e378bdac5
Fix textcat serialization
2017-09-02 15:17:20 +02:00