spaCy

mirror of https://github.com/explosion/spaCy.git synced 2026-01-10 10:41:14 +03:00

Author	SHA1	Message	Date
Matthew Honnibal	e79fc41ff8	Merge pull request #1391 from explosion/feature/multilabel-textcat 💫 Fix multi-label support for text classification	2017-10-09 04:22:31 +02:00
Matthew Honnibal	6c79841c0d	Fix tests for history features	2017-10-09 04:12:24 +02:00
Matthew Honnibal	be4f0b6460	Update defaults	2017-10-08 02:08:12 -05:00
Matthew Honnibal	42b401d08b	Change default hidden depth to 1	2017-10-07 21:05:21 -05:00
Matthew Honnibal	9d66a915da	Update training defaults	2017-10-07 21:02:38 -05:00
Matthew Honnibal	d163115e91	Add non-linearity after history features	2017-10-07 21:00:43 -05:00
Matthew Honnibal	92c5d78b42	Unhack NER.add_action	2017-10-07 19:02:40 +02:00
Matthew Honnibal	f2b590f672	Increment version	2017-10-07 19:01:01 +02:00
Matthew Honnibal	eb0595bea9	Merge pull request #1392 from explosion/feature/parser-history-model 💫 Parser history features	2017-10-07 15:07:02 +02:00
ines	d70cf19158	Fix formatting	2017-10-07 15:06:38 +02:00
Ines Montani	36c68015f3	Merge pull request #1397 from explosion/feature/matcher-wildcard-token 💫 Allow empty dictionaries to match any token in Matcher	2017-10-07 15:05:24 +02:00
ines	c970b4f226	Add missing token attribute	2017-10-07 15:04:16 +02:00
ines	37f755897f	Update rule-based matching docs	2017-10-07 15:04:09 +02:00
Matthew Honnibal	3d22ccf495	Update default hyper-parameters	2017-10-07 07:16:41 -05:00
Matthew Honnibal	e22067e3b5	Document new hyper-parameters	2017-10-07 07:10:10 -05:00
Matthew Honnibal	09442d25ec	Merge remote-tracking branch 'origin/develop' into feature/parser-history-model	2017-10-07 07:05:04 -05:00
Matthew Honnibal	3b67eabfea	Allow empty dictionaries to match any token in Matcher Often patterns need to match "any token". A clean way to denote this is with the empty dict {}: this sets no constraints on the token, so should always match. The problem was that having attributes length==0 was used as an end-of-array signal, so the matcher didn't handle this case correctly. This patch compiles empty token spec dicts into a constraint NULL_ATTR==0. The NULL_ATTR attribute, 0, is always set to 0 on the lexeme -- so this always matches.	2017-10-07 03:36:15 +02:00
Matthew Honnibal	8be46d766e	Remove print statement	2017-10-06 16:19:02 -05:00
ines	3468d535ad	Update model benchmarks	2017-10-06 21:39:06 +02:00
Matthew Honnibal	8e731009fe	Fix parser config serialization	2017-10-06 13:50:52 -05:00
Matthew Honnibal	f4c9a98166	Fix spacy evaluate command on non-GPU	2017-10-06 13:17:47 -05:00
Matthew Honnibal	16ba6aa8a6	Fix parser config serialization	2017-10-06 13:17:31 -05:00
ines	96a4e79d13	Fix PhraseMatcher example	2017-10-06 18:22:10 +02:00
Matthew Honnibal	c66399d8ae	Fix depth definition with history features	2017-10-06 06:20:05 -05:00
Matthew Honnibal	5c750a9c2f	Reserve 0 for 'missing' in history features	2017-10-06 06:10:13 -05:00
Matthew Honnibal	fbba7c517e	Pass dropout through to embed tables	2017-10-06 06:09:18 -05:00
Matthew Honnibal	21d11936fe	Fix significant train/test skew error in history feats	2017-10-06 06:08:50 -05:00
Matthew Honnibal	555d8c8bff	Fix beam history features	2017-10-05 22:21:50 -05:00
Matthew Honnibal	3db0a32fd6	Fix dropout for history features	2017-10-05 22:21:30 -05:00
Matthew Honnibal	b0618def8d	Add support for 2-token state option	2017-10-05 21:54:12 -05:00
Matthew Honnibal	363aa47b40	Clean up dead parsing code	2017-10-05 21:53:49 -05:00
Matthew Honnibal	ca12764772	Enable history features for beam parser	2017-10-05 21:53:29 -05:00
Matthew Honnibal	fc06b0a333	Fix training when hist_size==0	2017-10-05 21:52:28 -05:00
Matthew Honnibal	e25ffcb11f	Move history size under feature flags	2017-10-05 19:38:13 -05:00
Matthew Honnibal	563f46f026	Fix multi-label support for text classification The TextCategorizer class is supposed to support multi-label text classification, and allow training data to contain missing values. For this to work, the gradient of the loss should be 0 when labels are missing. Instead, there was no way to actually denote "missing" in the GoldParse class, and so the TextCategorizer class treated the label set within gold.cats as complete. To fix this, we change GoldParse.cats to be a dict instead of a list. The GoldParse.cats dict should map to floats, with 1. denoting 'present' and 0. denoting 'absent'. Gradients are zeroed for categories absent from the gold.cats dict. A nice bonus is that you can also set values between 0 and 1 for partial membership. You can also set numeric values, if you're using a text classification model that uses an appropriate loss function. Unfortunately this is a breaking change; although the functionality was only recently introduced and hasn't been properly documented yet. I've updated the example script accordingly.	2017-10-05 18:43:02 -05:00
Matthew Honnibal	c36d4596bf	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-10-05 18:27:56 +02:00
Matthew Honnibal	056b08c0df	Delete obsolete nn_text_class example	2017-10-05 18:27:10 +02:00
Matthew Honnibal	c6cd81f192	Wrap try/except around model saving	2017-10-05 08:14:24 -05:00
Matthew Honnibal	5743b06e36	Wrap model saving in try/except	2017-10-05 08:12:50 -05:00
Matthew Honnibal	fd4baff475	Update tests	2017-10-05 08:12:27 -05:00
Matthew Honnibal	dcdfa071aa	Disable LayerNorm hack	2017-10-04 20:06:52 -05:00
Matthew Honnibal	943af4423a	Make depth setting in parser work again	2017-10-04 20:06:05 -05:00
Matthew Honnibal	bfabc333be	Merge remote-tracking branch 'origin/develop' into feature/parser-history-model	2017-10-04 20:00:36 -05:00
Matthew Honnibal	92066b04d6	Fix Embed and HistoryFeatures	2017-10-04 19:55:34 -05:00
ines	b621a2e964	Fix build emoji	2017-10-04 18:37:27 +02:00
Matthew Honnibal	5560c46a59	Update buildkite	2017-10-04 18:29:41 +02:00
Matthew Honnibal	e3c93f87a4	Update sdist	2017-10-04 18:18:07 +02:00
Matthew Honnibal	c4c7def9ce	Fix yml	2017-10-04 18:14:33 +02:00
Matthew Honnibal	71825f9737	Fix yml	2017-10-04 18:12:16 +02:00
Matthew Honnibal	6304c5e146	Fix yml	2017-10-04 18:08:34 +02:00
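Commit 3b67eabfea says an empty token spec `{}` is compiled into the constraint `NULL_ATTR == 0`. The commit also says `NULL_ATTR` (attribute 0) is always 0 on every lexeme, so the constraint matches any token. The sketch below shows that idea in pure Python, with several simplifications. It is not spaCy's Cython implementation. The attribute IDs other than `NULL_ATTR`, and the helpers `get_attr`, `compile_token_spec`, `token_matches` and `match`, are hypothetical names chosen for illustration.

```python
# Hypothetical, simplified model of the Matcher fix; not spaCy's real code.
NULL_ATTR = 0  # per the commit message: attribute 0, always 0 on every lexeme
ORTH = 1       # hypothetical attribute ID
LOWER = 2      # hypothetical attribute ID
ATTR_IDS = {"ORTH": ORTH, "LOWER": LOWER}


def get_attr(token, attr_id):
    """Read an attribute from a token (here just a str)."""
    if attr_id == NULL_ATTR:
        return 0  # NULL_ATTR reads as 0 for every token
    if attr_id == ORTH:
        return token
    if attr_id == LOWER:
        return token.lower()
    raise KeyError(attr_id)


def compile_token_spec(spec):
    """Turn a dict like {"LOWER": "hello"} into (attr_id, value) constraints."""
    constraints = [(ATTR_IDS[name], value) for name, value in spec.items()]
    if not constraints:
        # An empty constraint list was used as an end-of-pattern signal, so {}
        # was mishandled. Emit a constraint every token satisfies instead.
        constraints = [(NULL_ATTR, 0)]
    return constraints


def token_matches(token, constraints):
    return all(get_attr(token, attr_id) == value for attr_id, value in constraints)


def match(pattern, tokens):
    """Return (start, end) spans where the whole pattern matches contiguously."""
    compiled = [compile_token_spec(spec) for spec in pattern]
    n = len(compiled)
    spans = []
    for start in range(len(tokens) - n + 1):
        if all(token_matches(tokens[start + i], compiled[i]) for i in range(n)):
            spans.append((start, start + n))
    return spans
```

With the pattern `[{"LOWER": "hello"}, {}, {"LOWER": "world"}]`, the middle `{}` accepts any single token. The pattern therefore matches "Hello big world" as tokens 0 to 3, but it does not match the two-token "Hello world".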
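Commit 563f46f026 makes two changes. `GoldParse.cats` becomes a dict of floats, where 1.0 means present, 0.0 means absent, and values in between mean partial membership. Gradients are also zeroed for categories missing from that dict. Below is a minimal sketch of that masking, assuming a squared-error loss averaged over the batch. The function `get_loss` and its argument layout are hypothetical, and plain lists stand in for the model's arrays. It is not the TextCategorizer code itself.

```python
# Hypothetical sketch of masking missing labels in a multi-label loss.
def get_loss(scores, gold_cats, labels):
    """Squared-error loss and gradient, ignoring labels missing from gold.

    scores:    per-document lists of predicted scores, one per label
    gold_cats: per-document dicts mapping label -> float (1.0 present,
               0.0 absent, fractional for partial membership); a label that
               is not a key is treated as missing/unknown
    labels:    label names, in the same order as each row of scores
    """
    n_docs = len(scores)
    d_scores = []
    total = 0.0
    for doc_scores, cats in zip(scores, gold_cats):
        row = []
        for label, score in zip(labels, doc_scores):
            if label in cats:
                diff = score - cats[label]
                row.append(diff / n_docs)  # gradient of the batch-mean loss
                total += diff ** 2
            else:
                row.append(0.0)  # missing label: no gradient, no loss
        d_scores.append(row)
    return total / n_docs, d_scores


labels = ["POSITIVE", "SPORTS"]
scores = [[0.8, 0.3], [0.4, 0.9]]
# The first document says nothing about SPORTS, so that entry is masked.
gold = [{"POSITIVE": 1.0}, {"POSITIVE": 0.0, "SPORTS": 1.0}]
loss, d_scores = get_loss(scores, gold, labels)
```

Here `d_scores[0][1]` is 0.0, so the unknown SPORTS label for the first document does not push the model either way. Before this change, the same label would have been treated as absent and trained toward 0.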


6590 Commits