692 Commits

Author SHA1 Message Date
Matthew Honnibal
3b67eabfea Allow empty dictionaries to match any token in Matcher
Often patterns need to match "any token". A clean way to denote this
is with the empty dict {}: this sets no constraints on the token,
so should always match.

The problem was that having attributes length==0 was used as an
end-of-array signal, so the matcher didn't handle this case correctly.

This patch compiles empty token spec dicts into a constraint
NULL_ATTR==0. The NULL_ATTR attribute, 0, is always set to 0 on the
lexeme -- so this always matches.
2017-10-07 03:36:15 +02:00
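
The mechanism described in commit 3b67eabfea can be illustrated with a small toy matcher. This is only a sketch under stated assumptions and is not spaCy's actual implementation: the names compile_pattern, match_at and find_matches, the integer attribute IDs LOWER and IS_PUNCT, and the dict-based token representation are all hypothetical. It shows why a zero-length constraint list clashes with an end-of-pattern sentinel, and how compiling {} into the always-true constraint NULL_ATTR == 0 avoids that.

```python
# Toy illustration only; every name here is hypothetical, not spaCy's API.
NULL_ATTR = 0   # attribute whose value is treated as 0 on every token
LOWER = 1       # hypothetical attribute ID for the lowercased text
IS_PUNCT = 2    # hypothetical attribute ID for a punctuation flag


def compile_pattern(token_specs):
    """Turn a list of {attr_id: value} dicts into lists of constraints.

    A zero-length constraint list is reserved as the end-of-pattern marker,
    so an empty spec {} is compiled into [(NULL_ATTR, 0)] instead.
    """
    compiled = []
    for spec in token_specs:
        constraints = list(spec.items())
        if not constraints:
            constraints = [(NULL_ATTR, 0)]  # always satisfied, never empty
        compiled.append(constraints)
    compiled.append([])  # sentinel: zero constraints means "pattern ends here"
    return compiled


def get_attr(token, attr_id):
    """Look up an attribute; NULL_ATTR is 0 for every token."""
    if attr_id == NULL_ATTR:
        return 0
    return token.get(attr_id)


def match_at(compiled, tokens, start):
    """Check whether the compiled pattern matches beginning at index start."""
    i = 0
    while compiled[i]:  # an empty constraint list is the sentinel
        pos = start + i
        if pos >= len(tokens):
            return False
        if any(get_attr(tokens[pos], attr) != value for attr, value in compiled[i]):
            return False
        i += 1
    return True


def find_matches(token_specs, tokens):
    """Return (start, end) index pairs for every match of the pattern."""
    compiled = compile_pattern(token_specs)
    width = len(token_specs)
    return [(s, s + width) for s in range(len(tokens)) if match_at(compiled, tokens, s)]


words = ["hello", ",", "world", "hello", "big", "world"]
tokens = [{LOWER: w, IS_PUNCT: w == ","} for w in words]

# "hello", then any token at all, then "world"
pattern = [{LOWER: "hello"}, {}, {LOWER: "world"}]
matches = find_matches(pattern, tokens)
```

Had the empty dict been compiled to a zero-length list, the loop in match_at would have treated it as the sentinel and stopped after the first token, which is the kind of mishandling the commit message describes.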
Matthew Honnibal
c6cd81f192 Wrap try/except around model saving 2017-10-05 08:14:24 -05:00
Matthew Honnibal
fd4baff475 Update tests 2017-10-05 08:12:27 -05:00
Matthew Honnibal
40edb65ee7 Make test work for Python 2.7 2017-10-04 16:36:50 +02:00
Matthew Honnibal
db05d4d582 Add test for #1380. Passes without fix? 2017-10-04 14:56:31 +02:00
Matthew Honnibal
4a59f6358c Fix thinc imports 2017-10-03 19:21:26 +02:00
Ines Montani
959c46eabe Merge pull request #1365 from wannaphongcom/develop
Add Thai language for spaCy v2
2017-09-26 23:43:05 +02:00
Wannaphong Phatthiyaphaibun
7b5263ffa4 fix thai test 2017-09-26 23:54:15 +07:00
Matthew Honnibal
41cc5c4c17 Merge branch 'develop' into feature/phrasematcher 2017-09-26 09:59:17 -05:00
Wannaphong Phatthiyaphaibun
5cba67146c add thai in spacy2 2017-09-26 21:36:27 +07:00
Matthew Honnibal
74f08e1ad5 Update test 2017-09-26 06:45:56 -05:00
Matthew Honnibal
20193371f5 Don't share CNN, to reduce complexities 2017-09-21 14:59:48 +02:00
Matthew Honnibal
cc408fc189 Make PhraseMatcher API like Matcher API 2017-09-20 22:20:35 +02:00
Matthew Honnibal
43ad250dd5 Update matcher tests 2017-09-20 21:54:49 +02:00
Matthew Honnibal
c013e5996f Fix parser test 2017-09-17 13:13:20 -05:00
ines
ece30c28a8 Don't split hyphenated words in German
This way, the tokenizer matches the tokenization in German treebanks
2017-09-16 20:40:15 +02:00
Matthew Honnibal
ebf8942564 Fix test for Python3 2017-09-16 16:22:38 +02:00
Matthew Honnibal
8c945310fb Excuse emoji failure on narrow unicode builds 2017-09-16 16:21:13 +02:00
Matthew Honnibal
3fa5b40b5c Add test for hash consistency 2017-09-16 11:21:35 +02:00
Matthew Honnibal
456bb8a74c Unxfail and close #1305 2017-09-06 19:14:17 +02:00
Matthew Honnibal
99e44fbdbb Update regression test 2017-09-06 19:13:51 +02:00
Matthew Honnibal
497a9308a8 Xfail new lemmatizer test 2017-09-06 18:41:22 +02:00
Matthew Honnibal
5384fff5ce Add test for 1305: Incorrect lemmatization of VBZ for English 2017-09-06 18:40:18 +02:00
Matthew Honnibal
d5fbf27335 Fix test 2017-09-04 16:45:11 +02:00
Matthew Honnibal
cb4839033c Fix loader for EN tests 2017-09-04 15:19:18 +02:00
Matthew Honnibal
644d6c9e1a Improve lemmatization tests, re #1296 2017-09-04 15:17:44 +02:00
Jim Geovedi
fbc62a09c7 added {pre,suf,in}fix tests 2017-08-20 13:43:00 +07:00
Jim Geovedi
713d7c0aa0 added indonesian lang test 2017-08-20 12:17:14 +07:00
Jim Geovedi
fa544e6c9a Merge remote-tracking branch 'upstream/develop' into indonesian 2017-08-20 11:49:40 +07:00
Matthew Honnibal
41c2218c53 Fix test for vectors 2017-08-19 22:09:12 +02:00
Matthew Honnibal
ef87562741 Restore vectors test utils 2017-08-19 20:35:16 +02:00
Matthew Honnibal
1391f9da37 Restore vectors tests 2017-08-19 20:34:58 +02:00
Matthew Honnibal
d55d6e1cfa Fix comparison of Token from different docs. Closes #1257 2017-08-19 16:39:32 +02:00
Matthew Honnibal
4fda02c7e6 Add test for new Span.to_array method 2017-08-19 16:24:38 +02:00
Matthew Honnibal
c606b4a42c Add test for Doc.char_span 2017-08-19 16:18:23 +02:00
Matthew Honnibal
42d47c1e5c Fix tagger serialization 2017-08-19 04:16:32 +02:00
Matthew Honnibal
2da96a0ec7 Fix beam test 2017-08-19 04:15:46 +02:00
Matthew Honnibal
a7309a217d Update tagger serialization 2017-08-18 23:12:05 +02:00
Matthew Honnibal
de7e8703e3 Restore tests for beam parser 2017-08-18 22:27:42 +02:00
Matthew Honnibal
52c180ecf5 Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop"
This reverts commit ea8de11ad5, reversing
changes made to 08e443e083.
2017-08-14 13:00:23 +02:00
Matthew Honnibal
92ebab6073 Update beam-update tests 2017-08-13 08:56:02 +02:00
Matthew Honnibal
24b45b45c6 Add test for beam update 2017-08-12 17:15:28 -05:00
Matthew Honnibal
b353e4d843 Work on parser beam training 2017-08-12 14:47:45 -05:00
Jim Geovedi
cc4772cac2 reworks 2017-08-03 13:08:38 +07:00
Jim Geovedi
783f7d8b86 added test set for Indonesian language 2017-07-29 18:21:07 +07:00
Matthew Honnibal
d6a5c2c85a Add test for NER 2017-07-22 01:48:58 +02:00
Matthew Honnibal
28244df4da Add test for beam parsing 2017-07-22 01:48:35 +02:00
Matthew Honnibal
2424493970 Remove unnecessary import of Mock 2017-07-22 01:13:54 +02:00
Matthew Honnibal
289f23df51 Test beam parsing 2017-07-20 15:03:10 +02:00
Matthew Honnibal
f014138c11 Fix parser tests 2017-07-20 00:16:52 +02:00