Matthew Honnibal
e7a9174877
Add add_label methods to Tagger and TextCategorizer
2017-11-01 16:32:44 +01:00
ines
ba5e646219
Tidy up pipeline
2017-10-27 20:29:08 +02:00
Ines Montani
4033e70c71
Merge pull request #1461 from explosion/feature/disable-pipes
💫 Add Language.disable_pipes(), to temporarily edit pipeline and update code examples
2017-10-27 12:21:40 +02:00
ines
9e372913e0
Remove old 'SP' condition in tag map
2017-10-26 16:11:57 +02:00
Matthew Honnibal
a8abc47811
Rename BaseThincComponent --> Pipe
2017-10-26 12:40:40 +02:00
Matthew Honnibal
b0f3ea2200
Fix names of pipeline components
NeuralDependencyParser --> DependencyParser
NeuralEntityRecognizer --> EntityRecognizer
TokenVectorEncoder --> Tensorizer
NeuralLabeller --> MultitaskObjective
2017-10-26 12:38:23 +02:00
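Note on b0f3ea2200: the four renames listed in that commit can be captured as a lookup table, which may help when reading older code or logs that still use the previous class names. This is only an illustrative sketch; `RENAMED_COMPONENTS` and `current_name` are hypothetical helpers and are not part of spaCy.

```python
# Old -> new pipeline component class names, as listed in commit b0f3ea2200.
# The dict and helper below are hypothetical; spaCy ships no such table.
RENAMED_COMPONENTS = {
    "NeuralDependencyParser": "DependencyParser",
    "NeuralEntityRecognizer": "EntityRecognizer",
    "TokenVectorEncoder": "Tensorizer",
    "NeuralLabeller": "MultitaskObjective",
}


def current_name(name):
    """Return the post-rename class name, or `name` unchanged if it was not renamed."""
    return RENAMED_COMPONENTS.get(name, name)


print(current_name("TokenVectorEncoder"))
print(current_name("Tagger"))
```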
Matthew Honnibal
ed8da9b11f
Add missing return statement in SentenceSegmenter
2017-10-17 15:32:56 +02:00
Matthew Honnibal
09d61ada5e
Merge pull request #1396 from explosion/feature/pipeline-management
💫 Improve pipeline and factory management
2017-10-10 04:29:54 +02:00
Matthew Honnibal
8978212ee5
Patch serialization bug raised in #1105
2017-10-10 03:58:12 +02:00
Matthew Honnibal
0384f08218
Trigger nonproj.deprojectivize as a postprocess
2017-10-07 02:00:47 +02:00
Matthew Honnibal
563f46f026
Fix multi-label support for text classification
The TextCategorizer class is supposed to support multi-label
text classification, and allow training data to contain missing
values.
For this to work, the gradient of the loss should be 0 when labels
are missing. Instead, there was no way to actually denote "missing"
in the GoldParse class, and so the TextCategorizer class treated
the label set within gold.cats as complete.
To fix this, we change GoldParse.cats to be a dict instead of a list.
The GoldParse.cats dict should map to floats, with 1. denoting
'present' and 0. denoting 'absent'. Gradients are zeroed for categories
absent from the gold.cats dict. A nice bonus is that you can also set
values between 0 and 1 for partial membership. You can also set numeric
values, if you're using a text classification model that uses an
appropriate loss function.
Unfortunately this is a breaking change; although the functionality
was only recently introduced and hasn't been properly documented
yet. I've updated the example script accordingly.
2017-10-05 18:43:02 -05:00
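Note on 563f46f026: the behaviour described in that commit body (a dict of label to float, with zero gradient for labels missing from the dict) can be sketched in plain Python. This is a minimal illustration and not the TextCategorizer implementation: the squared-error loss, the function name `masked_gradient`, and the label names are all assumptions made for the example.

```python
def masked_gradient(scores, cats, labels):
    """Gradient of an assumed squared-error loss with respect to each score.

    `cats` maps label -> float (1.0 present, 0.0 absent, values in between
    for partial membership). Labels missing from `cats` are treated as
    unknown and contribute a gradient of exactly 0.0.
    """
    grads = []
    for label, score in zip(labels, scores):
        if label in cats:
            # d/d_score of 0.5 * (score - target) ** 2
            grads.append(score - float(cats[label]))
        else:
            # Missing annotation: no learning signal for this label.
            grads.append(0.0)
    return grads


labels = ["POSITIVE", "NEGATIVE", "SPAM"]   # hypothetical label set
scores = [0.8, 0.3, 0.6]                    # hypothetical model outputs
cats = {"POSITIVE": 1.0, "NEGATIVE": 0.0}   # "SPAM" is missing, not absent

grads = masked_gradient(scores, cats, labels)
print(grads)
```

Here "NEGATIVE" is explicitly marked absent (0.0) and so still receives a gradient, while "SPAM" has no entry and is masked out. That is the distinction the old list-based `gold.cats` could not express.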
Matthew Honnibal
5454b20cd7
Update thinc imports for 6.9
2017-10-03 20:07:17 +02:00
Matthew Honnibal
4a59f6358c
Fix thinc imports
2017-10-03 19:21:26 +02:00
Matthew Honnibal
66c388ee01
Remove unhelpful multitask objectives
2017-09-27 11:44:16 -05:00
Matthew Honnibal
983201a83a
Fix hard-coded vector width
2017-09-27 11:43:58 -05:00
Matthew Honnibal
defb68e94f
Update feature/noshare with recent develop changes
2017-09-26 08:15:14 -05:00
Matthew Honnibal
ca28590ddd
Use dep and ent multi-task objectives for parser
2017-09-26 08:13:52 -05:00
Matthew Honnibal
18a27c7579
Fix typo in tensorizer serialization
2017-09-26 06:45:14 -05:00
Matthew Honnibal
bf917225ab
Allow multi-task objectives during training
2017-09-26 05:42:52 -05:00
ines
d2d35b63b7
Fix formatting
2017-09-25 18:37:13 +02:00
Matthew Honnibal
8eb0b7b779
Add docstrings for Pipe API
2017-09-25 16:22:07 +02:00
Matthew Honnibal
39f390dba7
Add docstrings for Pipe API
2017-09-25 16:20:49 +02:00
Matthew Honnibal
4348c479fc
Merge pre-trained vectors and noshare patches
2017-09-22 20:07:28 -05:00
Matthew Honnibal
386c1a5bd8
Fix tagger training
2017-09-23 02:58:06 +02:00
Matthew Honnibal
05596159bf
Fix serialization when using pre-trained vectors
2017-09-22 15:33:27 -05:00
Matthew Honnibal
d9124f1aa3
Add link_vectors_to_models function
2017-09-22 09:38:22 -05:00
Matthew Honnibal
40a4873b70
Fix serialization of model options
2017-09-21 13:07:26 -05:00
Matthew Honnibal
20193371f5
Don't share CNN, to reduce complexities
2017-09-21 14:59:48 +02:00
Matthew Honnibal
24e85c2048
Pass values for CNN maxout pieces option
2017-09-20 19:16:12 -05:00
Matthew Honnibal
b36a38f63d
Fix serialization of pretrained_dims property
2017-09-19 23:42:27 +02:00
Matthew Honnibal
40837b275d
Fix tensorizer with pretrained vectors
2017-09-18 18:05:38 -05:00
Matthew Honnibal
84e637e2e6
Pass option for pretrained vectors in pipeline
2017-09-16 12:46:02 -05:00
Matthew Honnibal
7fdafcc4c4
Fix config loading in tagger
2017-09-04 16:38:49 +02:00
Matthew Honnibal
382ce566eb
Fix deserialization bug
2017-09-04 15:19:01 +02:00
Matthew Honnibal
9e378bdac5
Fix textcat serialization
2017-09-02 15:17:20 +02:00
Matthew Honnibal
a3b69bcb3d
Add low_data mode in textcat
2017-09-02 14:56:30 +02:00
Matthew Honnibal
5e6a9e7dcc
Add rule-based SBD
2017-09-02 12:53:38 +02:00
Matthew Honnibal
c1d3ff517a
Track loss in tagger
2017-08-20 14:42:23 +02:00
Matthew Honnibal
ec482580b5
Restore changes to pipeline.pyx from nn-beam-parser branch
2017-08-18 22:02:35 +02:00
Matthew Honnibal
426f84937f
Resolve conflicts when merging new beam parsing stuff
2017-08-18 13:38:32 -05:00
Matthew Honnibal
1cb2f15d65
Clean up unused predict_confidences function
2017-08-16 18:22:26 -05:00
Matthew Honnibal
52c180ecf5
Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop"
This reverts commit ea8de11ad5, reversing changes made to 08e443e083.
2017-08-14 13:00:23 +02:00
Matthew Honnibal
3e30712b62
Improve defaults
2017-08-12 19:24:17 -05:00
Matthew Honnibal
680043ebca
Improve efficiency of tagger.set_annotations for GPU
2017-08-12 08:54:21 -05:00
Matthew Honnibal
3cb8f06881
Fix NeuralLabeller
2017-08-06 14:15:14 +02:00
Matthew Honnibal
e9ab800e15
Fix tagging model
2017-08-06 01:50:08 +02:00
Matthew Honnibal
468c138ab3
WIP: Add fine-tuning logic to tagger model, re #1182
2017-08-06 01:13:23 +02:00
Matthew Honnibal
6780132821
Fix tagger loading
2017-07-25 19:41:11 +02:00
Matthew Honnibal
c4a81a47a4
Fix deserialization
2017-07-23 14:11:07 +02:00
Matthew Honnibal
4fe77bced2
Add cfg attr to pipeline components
2017-07-23 00:52:47 +02:00
Matthew Honnibal
a88a7deffe
Fix save/load of textcat config
2017-07-23 00:33:43 +02:00
Matthew Honnibal
b55714d5d1
Make gold_tuples arg optional in begin_training
2017-07-22 20:04:43 +02:00
Matthew Honnibal
b3a749610e
Fix name of TextCategorizer
2017-07-22 01:14:07 +02:00
Matthew Honnibal
a231b56d40
Add text-classification hook to pipeline
2017-07-20 00:18:15 +02:00
Matthew Honnibal
d59fa32df1
Add experimental SimilarityHook component
2017-06-05 15:40:03 +02:00
Matthew Honnibal
b3b5521625
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-06-04 20:17:18 -05:00
Matthew Honnibal
7b2ede783d
Add SP tag to tag map if missing
2017-06-04 20:16:30 -05:00
Matthew Honnibal
516798e9fc
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-06-05 01:35:21 +02:00
Matthew Honnibal
193bf913c0
Set is_tagged=True after tagging
2017-06-05 01:35:07 +02:00
Matthew Honnibal
b78cc318c3
Fix loading of morphology exceptions
2017-06-04 16:34:32 -05:00
Matthew Honnibal
3680c51b8f
Avoid clobbering preset POS tags
2017-06-04 15:52:42 -05:00
ines
1b593bbd6d
Fix encoding on tagger serialization
2017-06-02 17:29:21 +02:00
Matthew Honnibal
5f4d328e2c
Fix serialization of tag_map in NeuralTagger
2017-06-02 10:18:37 -05:00
Matthew Honnibal
307d615c5f
Fix serialization for tagger when tag_map has changed
2017-06-01 12:18:36 -05:00
ines
7a2380f617
Rename "nn_tagger" to "tagger"
2017-06-01 17:37:53 +02:00
Matthew Honnibal
5eae3b9a1e
Fix to/from disk in tagger
2017-06-01 04:55:49 -05:00
Matthew Honnibal
53d00a0371
Move weight serialization to Thinc
2017-06-01 03:04:36 -05:00
Matthew Honnibal
ae8010b526
Move weight serialization to Thinc
2017-06-01 02:56:12 -05:00
Matthew Honnibal
33e5ec737f
Fix to/from disk methods
2017-05-31 13:43:10 +02:00
Matthew Honnibal
293d1b425b
Serialize in consistent order
2017-05-29 17:53:06 -05:00
Matthew Honnibal
6522ea6c8b
More serialization fixes. Still broken
2017-05-29 13:23:47 -05:00
Matthew Honnibal
aa4c33914b
Work on serialization
2017-05-29 08:40:45 -05:00
Matthew Honnibal
ff26aa6c37
Work on to/from bytes/disk serialization methods
2017-05-29 11:45:45 +02:00
Matthew Honnibal
6b019b0540
Update to/from bytes methods
2017-05-29 10:14:20 +02:00
Matthew Honnibal
6dad4117ad
Work on serialization for models
2017-05-29 01:37:57 +02:00
Matthew Honnibal
8a24c60c1e
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-05-28 08:12:05 -05:00
Matthew Honnibal
bc97bc292c
Fix __call__ method
2017-05-28 08:11:58 -05:00
Matthew Honnibal
c1263a844b
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-05-27 18:32:57 -05:00
Matthew Honnibal
9e711c3476
Divide d_loss by batch size
2017-05-27 18:32:46 -05:00
Matthew Honnibal
34bbad8e0e
Add __reduce__ methods on parser subclasses. Fixes pickling.
2017-05-27 15:46:06 -05:00
Matthew Honnibal
467bbeadb8
Add hidden layers for tagger
2017-05-24 20:09:51 -05:00
Matthew Honnibal
5b67bcbee0
Increase default embed size to 7500
2017-05-23 15:20:16 -05:00
Matthew Honnibal
3959d778ac
Revert "Revert "WIP on improving parser efficiency""
This reverts commit 532afef4a8.
2017-05-23 03:06:53 -05:00
Matthew Honnibal
532afef4a8
Revert "WIP on improving parser efficiency"
This reverts commit bdaac7ab44.
2017-05-23 03:05:25 -05:00
Matthew Honnibal
bdaac7ab44
WIP on improving parser efficiency
2017-05-23 02:59:31 -05:00
Matthew Honnibal
a7ee63c0ac
Fix labeller loss for unseen labels
2017-05-22 10:41:20 -05:00
Matthew Honnibal
83ffd16474
Fix offset calculation for other negative values
2017-05-22 08:00:53 -05:00
Matthew Honnibal
b45b4aa392
PseudoProjectivity --> nonproj
2017-05-22 05:17:44 -05:00
Matthew Honnibal
8d1e64be69
Add experimental NeuralLabeller
2017-05-22 04:51:08 -05:00
Matthew Honnibal
9b1b0742fd
Fix prediction for tok2vec
2017-05-22 04:51:08 -05:00
Matthew Honnibal
5db89053aa
Merge docstrings
2017-05-21 13:46:23 -05:00
Matthew Honnibal
180e5afede
Fix tokvecs flattening in pipeline
2017-05-21 09:05:34 -05:00
ines
99b631617d
Reformat docstrings
2017-05-21 13:32:15 +02:00
ines
d82ae9a585
Change "function" to "callable" in docs
2017-05-21 13:17:40 +02:00
Matthew Honnibal
3b7c108246
Pass tokvecs through as a list, instead of concatenated. Also fix padding
2017-05-20 13:23:32 -05:00
Matthew Honnibal
d52b65aec2
Revert "Move to contiguous buffer for token_ids and d_vectors"
This reverts commit 3ff8c35a79.
2017-05-20 11:26:23 -05:00
Matthew Honnibal
3ff8c35a79
Move to contiguous buffer for token_ids and d_vectors
2017-05-20 04:17:30 -05:00
Matthew Honnibal
c12ab47a56
Remove state argument in pipeline. Other changes
2017-05-19 13:26:36 -05:00
ines
0fc05e54e4
Document TokenVectorEncoder
2017-05-19 00:00:02 +02:00
Matthew Honnibal
c2c825127a
Fix use_params and pipe methods
2017-05-18 08:30:59 -05:00