spaCy

mirror of https://github.com/explosion/spaCy.git synced 2024-11-14 05:37:03 +03:00

Author	SHA1	Message	Date
Matthew Honnibal	ed8da9b11f	Add missing return statement in SentenceSegmenter	2017-10-17 15:32:56 +02:00
Matthew Honnibal	09d61ada5e	Merge pull request #1396 from explosion/feature/pipeline-management 💫 Improve pipeline and factory management	2017-10-10 04:29:54 +02:00
Matthew Honnibal	8978212ee5	Patch serialization bug raised in #1105	2017-10-10 03:58:12 +02:00
Matthew Honnibal	0384f08218	Trigger nonproj.deprojectivize as a postprocess	2017-10-07 02:00:47 +02:00
Matthew Honnibal	563f46f026	Fix multi-label support for text classification The TextCategorizer class is supposed to support multi-label text classification, and allow training data to contain missing values. For this to work, the gradient of the loss should be 0 when labels are missing. Instead, there was no way to actually denote "missing" in the GoldParse class, and so the TextCategorizer class treated the label set within gold.cats as complete. To fix this, we change GoldParse.cats to be a dict instead of a list. The GoldParse.cats dict should map to floats, with 1. denoting 'present' and 0. denoting 'absent'. Gradients are zeroed for categories absent from the gold.cats dict. A nice bonus is that you can also set values between 0 and 1 for partial membership. You can also set numeric values, if you're using a text classification model that uses an appropriate loss function. Unfortunately this is a breaking change; although the functionality was only recently introduced and hasn't been properly documented yet. I've updated the example script accordingly.	2017-10-05 18:43:02 -05:00
Matthew Honnibal	5454b20cd7	Update thinc imports for 6.9	2017-10-03 20:07:17 +02:00
Matthew Honnibal	4a59f6358c	Fix thinc imports	2017-10-03 19:21:26 +02:00
Matthew Honnibal	66c388ee01	Remove unhelpful multitask objectives	2017-09-27 11:44:16 -05:00
Matthew Honnibal	983201a83a	Fix hard-coded vector width	2017-09-27 11:43:58 -05:00
Matthew Honnibal	defb68e94f	Update feature/noshare with recent develop changes	2017-09-26 08:15:14 -05:00
Matthew Honnibal	ca28590ddd	Use dep and ent multi-task objectives for parser	2017-09-26 08:13:52 -05:00
Matthew Honnibal	18a27c7579	Fix typo in tensorizer serialization	2017-09-26 06:45:14 -05:00
Matthew Honnibal	bf917225ab	Allow multi-task objectives during training	2017-09-26 05:42:52 -05:00
ines	d2d35b63b7	Fix formatting	2017-09-25 18:37:13 +02:00
Matthew Honnibal	8eb0b7b779	Add docstrings for Pipe API	2017-09-25 16:22:07 +02:00
Matthew Honnibal	39f390dba7	Add docstrings for Pipe API	2017-09-25 16:20:49 +02:00
Matthew Honnibal	4348c479fc	Merge pre-trained vectors and noshare patches	2017-09-22 20:07:28 -05:00
Matthew Honnibal	386c1a5bd8	Fix tagger training	2017-09-23 02:58:06 +02:00
Matthew Honnibal	05596159bf	Fix serialization when pre-trained vectors	2017-09-22 15:33:27 -05:00
Matthew Honnibal	d9124f1aa3	Add link_vectors_to_models function	2017-09-22 09:38:22 -05:00
Matthew Honnibal	40a4873b70	Fix serialization of model options	2017-09-21 13:07:26 -05:00
Matthew Honnibal	20193371f5	Don't share CNN, to reduce complexities	2017-09-21 14:59:48 +02:00
Matthew Honnibal	24e85c2048	Pass values for CNN maxout pieces option	2017-09-20 19:16:12 -05:00
Matthew Honnibal	b36a38f63d	Fix serialization of pretrained_dims property	2017-09-19 23:42:27 +02:00
Matthew Honnibal	40837b275d	Fix tensorizer with pretrained vectors	2017-09-18 18:05:38 -05:00
Matthew Honnibal	84e637e2e6	Pass option for pretrained vectors in pipeline	2017-09-16 12:46:02 -05:00
Matthew Honnibal	7fdafcc4c4	Fix config loading in tagger	2017-09-04 16:38:49 +02:00
Matthew Honnibal	382ce566eb	Fix deserialization bug	2017-09-04 15:19:01 +02:00
Matthew Honnibal	9e378bdac5	Fix textcat serialization	2017-09-02 15:17:20 +02:00
Matthew Honnibal	a3b69bcb3d	Add low_data mode in textcat	2017-09-02 14:56:30 +02:00
Matthew Honnibal	5e6a9e7dcc	Add rule-based SBD	2017-09-02 12:53:38 +02:00
Matthew Honnibal	c1d3ff517a	Track loss in tagger	2017-08-20 14:42:23 +02:00
Matthew Honnibal	ec482580b5	Restore changes to pipeline.pyx from nn-beam-parser branch	2017-08-18 22:02:35 +02:00
Matthew Honnibal	426f84937f	Resolve conflicts when merging new beam parsing stuff	2017-08-18 13:38:32 -05:00
Matthew Honnibal	1cb2f15d65	Clean up unused predict_confidences function	2017-08-16 18:22:26 -05:00
Matthew Honnibal	52c180ecf5	Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop" This reverts commit `ea8de11ad5`, reversing changes made to `08e443e083`.	2017-08-14 13:00:23 +02:00
Matthew Honnibal	3e30712b62	Improve defaults	2017-08-12 19:24:17 -05:00
Matthew Honnibal	680043ebca	Improve efficiency of tagger.set_annotations for GPU	2017-08-12 08:54:21 -05:00
Matthew Honnibal	3cb8f06881	Fix NeuralLabeller	2017-08-06 14:15:14 +02:00
Matthew Honnibal	e9ab800e15	Fix tagging model	2017-08-06 01:50:08 +02:00
Matthew Honnibal	468c138ab3	WIP: Add fine-tuning logic to tagger model, re #1182	2017-08-06 01:13:23 +02:00
Matthew Honnibal	6780132821	Fix tagger loading	2017-07-25 19:41:11 +02:00
Matthew Honnibal	c4a81a47a4	Fix deserialization	2017-07-23 14:11:07 +02:00
Matthew Honnibal	4fe77bced2	Add cfg attr to pipeline components	2017-07-23 00:52:47 +02:00
Matthew Honnibal	a88a7deffe	Fix save/load of textcat config	2017-07-23 00:33:43 +02:00
Matthew Honnibal	b55714d5d1	Make gold_tuples arg optional in begin_training	2017-07-22 20:04:43 +02:00
Matthew Honnibal	b3a749610e	Fix name of TextCategorizer	2017-07-22 01:14:07 +02:00
Matthew Honnibal	a231b56d40	Add text-classification hook to pipeline	2017-07-20 00:18:15 +02:00
Matthew Honnibal	d59fa32df1	Add experimental SimilarityHook component	2017-06-05 15:40:03 +02:00
Matthew Honnibal	b3b5521625	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-06-04 20:17:18 -05:00


126 Commits
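Commit 563f46f026 describes how missing labels should behave once `GoldParse.cats` became a dict mapping category names to floats: 1. means present, 0. means absent, intermediate values mean partial membership, and categories missing from the dict get a zero gradient. The sketch below illustrates only that masking idea. It is not spaCy's code. The helper name `masked_cat_gradient` is hypothetical, and the per-label gradient `score - truth` assumes a squared-error-style loss, 0.5 * (score - truth) ** 2.

```python
from typing import Dict, List, Sequence


def masked_cat_gradient(
    scores: Sequence[float],
    gold_cats: Dict[str, float],
    labels: Sequence[str],
) -> List[float]:
    """Hypothetical helper (not a spaCy API): per-label gradient of
    0.5 * (score - truth) ** 2, with missing labels masked out.

    Labels absent from gold_cats are treated as unknown, so their gradient
    is 0.0 and they push the model in neither direction.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    grads: List[float] = []
    for score, label in zip(scores, labels):
        if label in gold_cats:
            # 1.0 = present, 0.0 = absent, values in between = partial membership
            grads.append(score - gold_cats[label])
        else:
            # Missing value: contribute no gradient for this category
            grads.append(0.0)
    return grads


labels = ["POSITIVE", "NEGATIVE", "SPORTS"]
scores = [0.75, 0.25, 0.5]
gold = {"POSITIVE": 1.0, "NEGATIVE": 0.0}  # SPORTS is unannotated
print(masked_cat_gradient(scores, gold, labels))  # [-0.25, 0.25, 0.0]
```

In the old list-based representation, a label missing from the annotations could only mean "absent" (a target of 0.0), so an unannotated `SPORTS` would have produced a gradient of 0.5 here instead of 0.0.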