Commit Graph

6706 Commits

Author SHA1 Message Date
Matthew Honnibal
563f46f026 Fix multi-label support for text classification
The TextCategorizer class is supposed to support multi-label
text classification, and allow training data to contain missing
values.

For this to work, the gradient of the loss should be 0 when labels
are missing. Instead, there was no way to actually denote "missing"
in the GoldParse class, and so the TextCategorizer class treated
the label set within gold.cats as complete.

To fix this, we change GoldParse.cats to be a dict instead of a list.
The GoldParse.cats dict should map to floats, with 1. denoting
'present' and 0. denoting 'absent'. Gradients are zeroed for categories
absent from the gold.cats dict. A nice bonus is that you can also set
values between 0 and 1 for partial membership. You can also set numeric
values, if you're using a text classification model that uses an
appropriate loss function.

Unfortunately this is a breaking change, although the functionality
was only recently introduced and hasn't been properly documented
yet. I've updated the example script accordingly.
2017-10-05 18:43:02 -05:00
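
The missing-label behaviour described in commit 563f46f026 can be illustrated with a minimal sketch. This is not the TextCategorizer code: the function name `masked_gradient`, the label names, and the choice of a squared-error loss are assumptions made for illustration only. It shows how a cats-style dict mapping labels to floats lets the gradient be zeroed for any label that is absent from the dict.

```python
def masked_gradient(labels, cats, scores):
    """Gradient of 0.5 * (score - truth)**2 per label, zeroed for missing labels.

    labels: all labels the model predicts (hypothetical names).
    cats:   dict mapping label -> float (1.0 present, 0.0 absent);
            labels not in the dict are treated as missing.
    scores: model outputs, one per label, in the same order as labels.
    """
    d_scores = []
    for label, score in zip(labels, scores):
        if label in cats:
            # Annotated label: the usual squared-error gradient.
            d_scores.append(score - float(cats[label]))
        else:
            # Missing annotation: contribute no gradient at all.
            d_scores.append(0.0)
    return d_scores


labels = ["POSITIVE", "NEGATIVE", "SPAM"]
cats = {"POSITIVE": 1.0, "NEGATIVE": 0.0}  # "SPAM" is deliberately missing
scores = [0.75, 0.25, 0.5]

grad = masked_gradient(labels, cats, scores)
print(grad)  # [-0.25, 0.25, 0.0]
```

Note the difference between a label mapped to 0. (known to be absent, so it still produces a gradient) and a label left out of the dict (unknown, so it produces none). Values between 0 and 1 pass through the same path as partial membership.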
Matthew Honnibal
c36d4596bf Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-10-05 18:27:56 +02:00
Matthew Honnibal
056b08c0df Delete obsolete nn_text_class example 2017-10-05 18:27:10 +02:00
Matthew Honnibal
c6cd81f192 Wrap try/except around model saving 2017-10-05 08:14:24 -05:00
Matthew Honnibal
5743b06e36 Wrap model saving in try/except 2017-10-05 08:12:50 -05:00
Matthew Honnibal
fd4baff475 Update tests 2017-10-05 08:12:27 -05:00
Matthew Honnibal
dcdfa071aa Disable LayerNorm hack 2017-10-04 20:06:52 -05:00
Matthew Honnibal
943af4423a Make depth setting in parser work again 2017-10-04 20:06:05 -05:00
Matthew Honnibal
bfabc333be Merge remote-tracking branch 'origin/develop' into feature/parser-history-model 2017-10-04 20:00:36 -05:00
Matthew Honnibal
92066b04d6 Fix Embed and HistoryFeatures 2017-10-04 19:55:34 -05:00
ines
b621a2e964 Fix build emoji 2017-10-04 18:37:27 +02:00
Matthew Honnibal
5560c46a59 Update buildkite 2017-10-04 18:29:41 +02:00
Matthew Honnibal
e3c93f87a4 Update sdist 2017-10-04 18:18:07 +02:00
Matthew Honnibal
c4c7def9ce Fix yml 2017-10-04 18:14:33 +02:00
Matthew Honnibal
71825f9737 Fix yml 2017-10-04 18:12:16 +02:00
Matthew Honnibal
6304c5e146 Fix yml 2017-10-04 18:08:34 +02:00
Matthew Honnibal
ff24b6d04a Fix yml 2017-10-04 18:05:45 +02:00
Matthew Honnibal
cc29e8b497 Add buildkite.yml for making sdists 2017-10-04 18:00:37 +02:00
Matthew Honnibal
d903986439 Increment version 2017-10-04 17:14:26 +02:00
Matthew Honnibal
fb75eb52f1 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-10-04 16:37:00 +02:00
Matthew Honnibal
40edb65ee7 Make test work for Python 2.7 2017-10-04 16:36:50 +02:00
ines
bb13aa4bf3 Fix typos in PhraseMatcher docs 2017-10-04 16:12:09 +02:00
Matthew Honnibal
bd8e84998a Add nO attribute to TextCategorizer model 2017-10-04 16:07:30 +02:00
Matthew Honnibal
f8a0614527 Improve textcat model slightly 2017-10-04 15:15:53 +02:00
Matthew Honnibal
f1b86dff8c Update textcat example 2017-10-04 15:12:28 +02:00
Matthew Honnibal
39798b0172 Uncomment layernorm adjustment hack 2017-10-04 15:12:09 +02:00
Matthew Honnibal
b3a7082bf8 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-10-04 14:56:46 +02:00
Matthew Honnibal
db05d4d582 Add test for #1380. Passes without fix? 2017-10-04 14:56:31 +02:00
Matthew Honnibal
79a94bc166 Update textcat example 2017-10-04 14:55:30 +02:00
Matthew Honnibal
774f5732bd Fix dimensionality of textcat when no vectors available 2017-10-04 14:55:15 +02:00
Ines Montani
28ba0b9b51 Merge pull request #1385 from explosion/feature/new-website
💫 New spaCy website
2017-10-04 14:35:52 +02:00
ines
33cf9cecdd Port over changes from #1386 2017-10-04 13:34:03 +02:00
Matthew Honnibal
af75b74208 Unset LayerNorm backwards compat hack 2017-10-03 20:47:10 -05:00
ines
36ff525ff5 Add NER P and NER R scores to model overview 2017-10-04 00:37:15 +02:00
ines
15ec7ddd09 Add docs for new spacy evaluate command 2017-10-04 00:19:03 +02:00
ines
464f14019d Fix typos 2017-10-04 00:18:47 +02:00
ines
bfb512f45a Add website package.json and fix gitignore 2017-10-04 00:18:41 +02:00
ines
73ac0aa0b5 Update spacy evaluate and add displaCy option 2017-10-04 00:03:15 +02:00
Matthew Honnibal
246612cb53 Merge remote-tracking branch 'origin/develop' into feature/parser-history-model 2017-10-03 16:56:42 -05:00
Matthew Honnibal
f24c2e3a8a Fix evaluate for non-GPU 2017-10-03 22:47:31 +02:00
Matthew Honnibal
32b9f3d1a6 Require new thinc 2017-10-03 22:17:31 +02:00
Matthew Honnibal
2eb0fe4957 Fix setup.py 2017-10-03 21:40:04 +02:00
Matthew Honnibal
c69b0836a0 Fix fabfile 2017-10-03 21:31:41 +02:00
Matthew Honnibal
252299ca2a Add sdist command 2017-10-03 21:29:43 +02:00
Matthew Honnibal
5cbefcba17 Set backwards compatibility flag 2017-10-03 20:29:58 +02:00
Matthew Honnibal
5454b20cd7 Update thinc imports for 6.9 2017-10-03 20:07:17 +02:00
ines
80a2fb6193 Update visualizers docs and add submenu 2017-10-03 19:40:39 +02:00
Matthew Honnibal
4a59f6358c Fix thinc imports 2017-10-03 19:21:26 +02:00
Matthew Honnibal
cbb1fbef80 Update train_ner_standalone example 2017-10-03 18:49:38 +02:00
Matthew Honnibal
e514d6aa0a Import thinc modules more explicitly, to avoid cycles 2017-10-03 18:49:25 +02:00