Entries in GoldParse.{words, heads, tags, deps, ner} can now be lists
instead of single values, to handle getting the analysis for fused
tokens. For instance, let's say we have a token like "hows", while the
gold-standard has two tokens, ["how", "s"]. We need to store the gold
data for each of the two subtokens.
Example gold.words: [["how", "s"], "it", "going"]
Things get more complicated for heads, as we need to address particular
subtokens. Let's say the gold heads for ["how", "s", "it", "going"] are
[1, 1, 3, 1], i.e. the root "s" is a subtoken of the fused token. The
gold.heads list would be:
[[(0, 1), (0, 1)], 2, (0, 1)]
The tuples indicate token 0, subtoken 1. A helper method
_flatten_fused_heads is available that unpacks the above to
[1, 1, 3, 1].
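The unpacking can be sketched roughly as follows. This is an illustrative
reimplementation, not the actual _flatten_fused_heads helper, so details
may differ:

```python
def flatten_fused_heads(heads):
    # Compute the flattened start index of each top-level token.
    # A list entry stands for a fused token with multiple subtokens.
    offsets = []
    i = 0
    for head in heads:
        offsets.append(i)
        i += len(head) if isinstance(head, list) else 1

    def resolve(value):
        # (token, subtoken) tuples address a subtoken directly;
        # plain ints address a top-level token.
        if isinstance(value, tuple):
            token, subtoken = value
            return offsets[token] + subtoken
        return offsets[value]

    flat = []
    for head in heads:
        if isinstance(head, list):
            flat.extend(resolve(value) for value in head)
        else:
            flat.append(resolve(head))
    return flat


heads = flatten_fused_heads([[(0, 1), (0, 1)], 2, (0, 1)])
# heads is now [1, 1, 3, 1]
```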
The TextCategorizer class is supposed to support multi-label
text classification, and allow training data to contain missing
values.
For this to work, the gradient of the loss should be 0 when labels
are missing. However, there was previously no way to denote "missing"
in the GoldParse class, so the TextCategorizer class treated
the label set within gold.cats as complete.
To fix this, GoldParse.cats is now a dict instead of a list.
The GoldParse.cats dict should map category names to floats, with 1.
denoting 'present' and 0. denoting 'absent'. Gradients are zeroed for
categories missing from the gold.cats dict entirely. A nice bonus is
that you can also set values between 0 and 1 for partial membership,
or arbitrary numeric values if you're using a text classification
model with an appropriate loss function.
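As a rough sketch of how the new format interacts with the loss, the
snippet below zeroes the gradient for a label missing from gold.cats.
The label names and the squared-error gradient here are illustrative,
not the actual TextCategorizer internals:

```python
import numpy

# Hypothetical gold.cats dict: values are floats, 1. = present,
# 0. = absent, intermediate values = partial membership.
gold_cats = {
    "POSITIVE": 1.0,   # present
    "NEGATIVE": 0.0,   # explicitly absent
    "SARCASM": 0.3,    # partial membership
    # "SPAM" is missing entirely, so its gradient must be zeroed
}

labels = ["POSITIVE", "NEGATIVE", "SARCASM", "SPAM"]
scores = numpy.asarray([0.9, 0.2, 0.5, 0.7])
truths = numpy.asarray([gold_cats.get(label, 0.0) for label in labels])
# Mask out categories that are missing (not merely 0.) in gold.cats.
mask = numpy.asarray([float(label in gold_cats) for label in labels])
d_scores = (scores - truths) * mask
# d_scores for "SPAM" is 0.0: no gradient flows for a missing label,
# while "NEGATIVE" (explicitly 0.) still contributes a gradient.
```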
Unfortunately this is a breaking change, although the functionality
was only recently introduced and hasn't been properly documented
yet. I've updated the example script accordingly.