Commit Graph

706 Commits

Author SHA1 Message Date
Matthew Honnibal
09d61ada5e Merge pull request #1396 from explosion/feature/pipeline-management
💫 Improve pipeline and factory management
2017-10-10 04:29:54 +02:00
Matthew Honnibal
f0f2739ae3 Add test for serialization issue raised in #1105 2017-10-10 03:57:58 +02:00
Matthew Honnibal
d8a2506023 Merge pull request #1401 from explosion/feature/add-parser-action
💫 Allow labels to be added to pre-trained parser and NER modes
2017-10-09 04:57:51 +02:00
Matthew Honnibal
689349e32f Merge pull request #1400 from explosion/feature/sentence-parsing
💫 Force parser to respect preset sentence boundaries
2017-10-09 04:31:43 +02:00
Matthew Honnibal
fad2b8315f Merge branch 'develop' into feature/add-parser-action 2017-10-09 04:13:04 +02:00
Matthew Honnibal
6c79841c0d Fix tests for history features 2017-10-09 04:12:24 +02:00
Matthew Honnibal
dde87e6b0d Add tests for adding parser actions 2017-10-09 03:42:35 +02:00
Matthew Honnibal
81a64119db Fix string-to-unicode problem 2017-10-09 00:59:49 +02:00
Matthew Honnibal
02c2af7119 Fix test 2017-10-09 00:29:37 +02:00
Matthew Honnibal
5a67efeccc Add tests for sentence segmentation presetting 2017-10-09 00:02:23 +02:00
Matthew Honnibal
3b67eabfea Allow empty dictionaries to match any token in Matcher
Often patterns need to match "any token". A clean way to denote this
is with the empty dict {}: this sets no constraints on the token,
so should always match.

The problem was that having attributes length==0 was used as an
end-of-array signal, so the matcher didn't handle this case correctly.

This patch compiles empty token spec dicts into a constraint
NULL_ATTR==0. The NULL_ATTR attribute, 0, is always set to 0 on the
lexeme -- so this always matches.
2017-10-07 03:36:15 +02:00
ines
0adadcb3f0 Fix beam parse model test 2017-10-07 02:15:15 +02:00
ines
b38a8f4a94 Fix and update pipe methods tests 2017-10-07 02:06:23 +02:00
Matthew Honnibal
3a65a0c970 Start adding tests for new pipeline management 2017-10-07 01:48:23 +02:00
ines
61a503a611 Fix parser test 2017-10-07 00:38:51 +02:00
Matthew Honnibal
c6cd81f192 Wrap try/except around model saving 2017-10-05 08:14:24 -05:00
Matthew Honnibal
fd4baff475 Update tests 2017-10-05 08:12:27 -05:00
Matthew Honnibal
40edb65ee7 Make test work for Python 2.7 2017-10-04 16:36:50 +02:00
Matthew Honnibal
db05d4d582 Add test for #1380. Passes without fix? 2017-10-04 14:56:31 +02:00
Matthew Honnibal
4a59f6358c Fix thinc imports 2017-10-03 19:21:26 +02:00
Ines Montani
959c46eabe Merge pull request #1365 from wannaphongcom/develop
Add Thai language for spaCy v2
2017-09-26 23:43:05 +02:00
Wannaphong Phatthiyaphaibun
7b5263ffa4 fix thai test 2017-09-26 23:54:15 +07:00
Matthew Honnibal
41cc5c4c17 Merge branch 'develop' into feature/phrasematcher 2017-09-26 09:59:17 -05:00
Wannaphong Phatthiyaphaibun
5cba67146c add thai in spacy2 2017-09-26 21:36:27 +07:00
Matthew Honnibal
74f08e1ad5 Update test 2017-09-26 06:45:56 -05:00
Matthew Honnibal
20193371f5 Don't share CNN, to reduce complexities 2017-09-21 14:59:48 +02:00
Matthew Honnibal
cc408fc189 Make PhraseMatcher API like Matcher API 2017-09-20 22:20:35 +02:00
Matthew Honnibal
43ad250dd5 Update matcher tests 2017-09-20 21:54:49 +02:00
Matthew Honnibal
c013e5996f Fix parser test 2017-09-17 13:13:20 -05:00
ines
ece30c28a8 Don't split hyphenated words in German
This way, the tokenizer matches the tokenization in German treebanks
2017-09-16 20:40:15 +02:00
Matthew Honnibal
ebf8942564 Fix test for Python3 2017-09-16 16:22:38 +02:00
Matthew Honnibal
8c945310fb Excuse emoji failure on narrow unicode builds 2017-09-16 16:21:13 +02:00
Matthew Honnibal
3fa5b40b5c Add test for hash consistency 2017-09-16 11:21:35 +02:00
Matthew Honnibal
456bb8a74c Unxfail and close #1305 2017-09-06 19:14:17 +02:00
Matthew Honnibal
99e44fbdbb Update regression test 2017-09-06 19:13:51 +02:00
Matthew Honnibal
497a9308a8 Xfail new lemmatizer test 2017-09-06 18:41:22 +02:00
Matthew Honnibal
5384fff5ce Add test for 1305: Incorrect lemmatization of VBZ for English 2017-09-06 18:40:18 +02:00
Matthew Honnibal
d5fbf27335 Fix test 2017-09-04 16:45:11 +02:00
Matthew Honnibal
cb4839033c Fix loader for EN tests 2017-09-04 15:19:18 +02:00
Matthew Honnibal
644d6c9e1a Improve lemmatization tests, re #1296 2017-09-04 15:17:44 +02:00
Jim Geovedi
fbc62a09c7 added {pre,suf,in}fix tests 2017-08-20 13:43:00 +07:00
Jim Geovedi
713d7c0aa0 added indonesian lang test 2017-08-20 12:17:14 +07:00
Jim Geovedi
fa544e6c9a Merge remote-tracking branch 'upstream/develop' into indonesian 2017-08-20 11:49:40 +07:00
Matthew Honnibal
41c2218c53 Fix test for vectors 2017-08-19 22:09:12 +02:00
Matthew Honnibal
ef87562741 Restore vectors test utils 2017-08-19 20:35:16 +02:00
Matthew Honnibal
1391f9da37 Restore vectors tests 2017-08-19 20:34:58 +02:00
Matthew Honnibal
d55d6e1cfa Fix comparison of Token from different docs. Closes #1257 2017-08-19 16:39:32 +02:00
Matthew Honnibal
4fda02c7e6 Add test for new Span.to_array method 2017-08-19 16:24:38 +02:00
Matthew Honnibal
c606b4a42c Add test for Doc.char_span 2017-08-19 16:18:23 +02:00
Matthew Honnibal
42d47c1e5c Fix tagger serialization 2017-08-19 04:16:32 +02:00