Commit Graph

6609 Commits

Author SHA1 Message Date
Matthew Honnibal
748d525801 Add more matcher operator tests 2017-10-16 13:38:01 +02:00
Matthew Honnibal
0433181658 Document operator semantics in Matcher docstring 2017-10-16 12:06:33 +02:00
Matthew Honnibal
2534cd57d7 Add bandaid solution to the 'shadowing' problem in #864 2017-10-09 08:59:35 +02:00
Matthew Honnibal
d8a2506023 Merge pull request #1401 from explosion/feature/add-parser-action
💫 Allow labels to be added to pre-trained parser and NER modes
2017-10-09 04:57:51 +02:00
Matthew Honnibal
689349e32f Merge pull request #1400 from explosion/feature/sentence-parsing
💫 Force parser to respect preset sentence boundaries
2017-10-09 04:31:43 +02:00
Matthew Honnibal
e79fc41ff8 Merge pull request #1391 from explosion/feature/multilabel-textcat
💫 Fix multi-label support for text classification
2017-10-09 04:22:31 +02:00
Matthew Honnibal
fad2b8315f Merge branch 'develop' into feature/add-parser-action 2017-10-09 04:13:04 +02:00
Matthew Honnibal
6c79841c0d Fix tests for history features 2017-10-09 04:12:24 +02:00
Matthew Honnibal
dde87e6b0d Add tests for adding parser actions 2017-10-09 03:42:35 +02:00
Matthew Honnibal
b2b8506f2c Remove whitespace 2017-10-09 03:35:57 +02:00
Matthew Honnibal
d43a83e37a Allow parser.add_label for pretrained models 2017-10-09 03:35:40 +02:00
Matthew Honnibal
81a64119db Fix string-to-unicode problem 2017-10-09 00:59:49 +02:00
Matthew Honnibal
02c2af7119 Fix test 2017-10-09 00:29:37 +02:00
Matthew Honnibal
4cc84b0234 Prohibit Break when sent_start < 0 2017-10-09 00:02:45 +02:00
Matthew Honnibal
5a67efeccc Add tests for sentence segmentation presetting 2017-10-09 00:02:23 +02:00
Matthew Honnibal
e938bce320 Adjust parsing transition system to allow preset sentence segments. 2017-10-08 23:53:34 +02:00
Matthew Honnibal
080afd4924 Add ternary value setting to Token.sent_start 2017-10-08 23:51:58 +02:00
Matthew Honnibal
7ae67ec6a1 Add Span.as_doc method 2017-10-08 23:50:20 +02:00
Matthew Honnibal
20309fb9db Make history features default to zero 2017-10-08 20:32:14 +02:00
Matthew Honnibal
e74c8d2fad Merge remote-tracking branch 'origin/develop' into feature/sentence-parsing 2017-10-08 20:20:41 +02:00
Matthew Honnibal
18063803de Make TokenC.sent_tart an int, to allow ternary value 2017-10-08 19:58:54 +02:00
Matthew Honnibal
be4f0b6460 Update defaults 2017-10-08 02:08:12 -05:00
Matthew Honnibal
42b401d08b Change default hidden depth to 1 2017-10-07 21:05:21 -05:00
Matthew Honnibal
9d66a915da Update training defaults 2017-10-07 21:02:38 -05:00
Matthew Honnibal
d163115e91 Add non-linearity after history features 2017-10-07 21:00:43 -05:00
Matthew Honnibal
92c5d78b42 Unhack NER.add_action 2017-10-07 19:02:40 +02:00
Matthew Honnibal
f2b590f672 Increment version 2017-10-07 19:01:01 +02:00
Matthew Honnibal
eb0595bea9 Merge pull request #1392 from explosion/feature/parser-history-model
💫 Parser history features
2017-10-07 15:07:02 +02:00
ines
d70cf19158 Fix formatting 2017-10-07 15:06:38 +02:00
Ines Montani
36c68015f3 Merge pull request #1397 from explosion/feature/matcher-wildcard-token
💫 Allow empty dictionaries to match any token in Matcher
2017-10-07 15:05:24 +02:00
ines
c970b4f226 Add missing token attribute 2017-10-07 15:04:16 +02:00
ines
37f755897f Update rule-based matching docs 2017-10-07 15:04:09 +02:00
Matthew Honnibal
3d22ccf495 Update default hyper-parameters 2017-10-07 07:16:41 -05:00
Matthew Honnibal
e22067e3b5 Document new hyper-parameters 2017-10-07 07:10:10 -05:00
Matthew Honnibal
09442d25ec Merge remote-tracking branch 'origin/develop' into feature/parser-history-model 2017-10-07 07:05:04 -05:00
Matthew Honnibal
3b67eabfea Allow empty dictionaries to match any token in Matcher
Often patterns need to match "any token". A clean way to denote this
is with the empty dict {}: this sets no constraints on the token,
so should always match.

The problem was that having attributes length==0 was used as an
end-of-array signal, so the matcher didn't handle this case correctly.

This patch compiles empty token spec dicts into a constraint
NULL_ATTR==0. The NULL_ATTR attribute, 0, is always set to 0 on the
lexeme -- so this always matches.
2017-10-07 03:36:15 +02:00
Matthew Honnibal
8be46d766e Remove print statement 2017-10-06 16:19:02 -05:00
ines
3468d535ad Update model benchmarks 2017-10-06 21:39:06 +02:00
Matthew Honnibal
8e731009fe Fix parser config serialization 2017-10-06 13:50:52 -05:00
Matthew Honnibal
f4c9a98166 Fix spacy evaluate command on non-GPU 2017-10-06 13:17:47 -05:00
Matthew Honnibal
16ba6aa8a6 Fix parser config serialization 2017-10-06 13:17:31 -05:00
ines
96a4e79d13 Fix PhraseMatcher example 2017-10-06 18:22:10 +02:00
Matthew Honnibal
c66399d8ae Fix depth definition with history features 2017-10-06 06:20:05 -05:00
Matthew Honnibal
5c750a9c2f Reserve 0 for 'missing' in history features 2017-10-06 06:10:13 -05:00
Matthew Honnibal
fbba7c517e Pass dropout through to embed tables 2017-10-06 06:09:18 -05:00
Matthew Honnibal
21d11936fe Fix significant train/test skew error in history feats 2017-10-06 06:08:50 -05:00
Matthew Honnibal
555d8c8bff Fix beam history features 2017-10-05 22:21:50 -05:00
Matthew Honnibal
3db0a32fd6 Fix dropout for history features 2017-10-05 22:21:30 -05:00
Matthew Honnibal
b0618def8d Add support for 2-token state option 2017-10-05 21:54:12 -05:00
Matthew Honnibal
363aa47b40 Clean up dead parsing code 2017-10-05 21:53:49 -05:00