206 Commits

Author SHA1 Message Date
Matthew Honnibal
20c948361b Use local path in test_lemmatizer 2016-10-12 19:35:00 +02:00
Matthew Honnibal
1318d0bc65 Test with the non-loaded versions of the English and German pipelines. 2016-10-12 19:13:31 +02:00
Matthew Honnibal
bd7fe6420c Revert "Changes to test for new string-store"
This reverts commit 21e90d7d0b.
2016-09-30 20:11:01 +02:00
Matthew Honnibal
21e90d7d0b Changes to test for new string-store 2016-09-30 20:00:58 +02:00
Matthew Honnibal
81a47c01d8 Fix test for empty sentence string. 2016-09-27 19:21:22 +02:00
Matthew Honnibal
fc4a7ad794 Test and fix Issue #411: IndexError when .sents property is used on empty string. 2016-09-27 18:49:14 +02:00
Matthew Honnibal
3d370b7d45 Add test for Issue #445, fixed in 3cb4d455d, with improved lemmatizer logic 2016-09-27 18:39:46 +02:00
Matthew Honnibal
9c8ac91d72 Add test for Issue #435 2016-09-27 13:52:38 +02:00
Matthew Honnibal
e233328d38 Fix Issue #371: Lexeme objects were unhashable. 2016-09-27 13:22:30 +02:00
Matthew Honnibal
2debc4e0a2 Add .blank() method to Parser. Start housing default dep labels and entity types within the Defaults class. 2016-09-26 11:57:54 +02:00
Matthew Honnibal
95aaea0d3f Refactor so that the tokenizer data is read from Python data, rather than from disk 2016-09-25 14:49:53 +02:00
Matthew Honnibal
fd65cf6cbb Finish refactoring data loading 2016-09-24 20:26:17 +02:00
Matthew Honnibal
83e364188c Mostly finished loading refactoring. Design is in place, but doesn't work yet. 2016-09-24 15:42:01 +02:00
Matthew Honnibal
b00f683a0c Fix matcher test 2016-09-24 11:20:58 +02:00
Matthew Honnibal
939a791a52 Update tests 2016-09-24 01:17:03 +02:00
Matthew Honnibal
f6e587b1c7 Fix matcher tests 2016-09-21 20:45:20 +02:00
Matthew Honnibal
58e83fe34b Initial, limited support for quantified patterns in Matcher, and tracking of ent_id attribute in Token and Span. The quantifiers need a lot more testing, and there are some known problems. The main known problem is that the zero-plus and one-plus quantifiers won't work if a token can match both the quantified pattern expression AND the tail of the match. 2016-09-21 14:54:55 +02:00
Matthew Honnibal
cc8bf62208 * Fix Issue #360: Tokenizer failed when the infix regex matched the start of the string while trying to tokenize multi-infix tokens. 2016-05-09 13:23:47 +02:00
Matthew Honnibal
5d86c30f0b * Fix Issue #367: Missing has_vector property on Doc and Span objects 2016-05-09 12:36:14 +02:00
Matthew Honnibal
26095f9722 * Add span.sent property, re Issue #366 2016-05-06 00:17:38 +02:00
Matthew Honnibal
a6a25166ba * Remove print from test 2016-05-05 11:10:59 +02:00
Matthew Honnibal
7441ca30ee * Add tests for Issue #361: Lexeme rich comparison 2016-05-05 01:31:58 +02:00
Matthew Honnibal
72564213e3 * Add test for Issue #309 2016-05-04 16:00:28 +02:00
Matthew Honnibal
76f1d871da Merge branch 'master' of ssh://github.com/spacy-io/spaCy 2016-05-04 15:54:00 +02:00
Matthew Honnibal
b4bfc6ae55 * Add test for Issue #351: Indices off when leading whitespace 2016-05-04 15:53:17 +02:00
Wolfgang Seeker
a06fca9fdf German noun chunk iterator now doesn't return tokens more than once 2016-05-03 16:58:59 +02:00
Wolfgang Seeker
7825b75548 add tests for German noun chunker 2016-05-03 15:01:28 +02:00
Wolfgang Seeker
7b246c13cb reformulate noun chunk tests for English 2016-05-03 14:24:35 +02:00
Wolfgang Seeker
1786331cd8 add model sanity test 2016-05-03 12:51:47 +02:00
Matthew Honnibal
308a28c26c * Whitespace 2016-05-02 16:08:11 +02:00
Matthew Honnibal
c1c11a8ae0 * Fix formatting on serializer tests 2016-05-02 16:07:21 +02:00
Matthew Honnibal
902a389d85 * Fix merge conflict in test_parse 2016-05-02 15:28:07 +02:00
Matthew Honnibal
02c23cc1d0 * Fix sentence boundary test 2016-05-02 15:26:07 +02:00
Matthew Honnibal
d2f469b809 * Fix parsing tests, so that labels are added if they're missing, and so that the branching test values are correct 2016-05-02 15:25:27 +02:00
Wolfgang Seeker
b11cbb06c6 remove old tests for sentence boundary detection 2016-05-02 14:36:35 +02:00
Matthew Honnibal
508fd1f6dc * Refactor noun chunk iterators, so that they're simple functions. Install the iterator when the Doc is created, but allow users to write to the noun_chunk_iterator attribute. The iterator functions accept an object and yield (int start, int end, int label) triples. 2016-05-02 14:25:10 +02:00
Wolfgang Seeker
fa961ea694 add tests for serialization bug 2016-05-02 11:01:56 +02:00
Wolfgang Seeker
1003e7ccec remove debug output from tests 2016-04-25 12:12:40 +02:00
Wolfgang Seeker
f57f843e85 fix bug in updating tree structure when introducing additional roots 2016-04-25 12:01:19 +02:00
Wolfgang Seeker
b6477fc4f4 adjusted tests to Travis Setup 2016-04-21 17:15:10 +02:00
Wolfgang Seeker
736ffcb9a2 remove whitespace 2016-04-21 16:55:55 +02:00
Wolfgang Seeker
6c7301cc6d the parser now introduces sentence boundaries properly when predicting dependents with root labels 2016-04-21 16:50:53 +02:00
Wolfgang Seeker
12024b0b0a bugfix: introducing multiple roots now updates original head's properties
adjust tests to rely less on statistical model
2016-04-20 16:42:41 +02:00
Matthew Honnibal
2add5206aa * Fix description of matcher test 2016-04-17 15:40:21 +02:00
Matthew Honnibal
2b419d5b8c * Update test for Issue #242 2016-04-17 15:34:23 +02:00
Matthew Honnibal
f12b043308 * Add test for Issue #242: Overlapping matches not well recognised. 2016-04-17 15:19:17 +02:00
Matthew Honnibal
c0909afe22 Merge pull request #312 from wbwseeker/space_head_bug
add restrictions to L-arc and R-arc to prevent space heads
2016-04-15 20:36:03 +10:00
Matthew Honnibal
6f82065761 * Fix infixed commas in tokenizer, re Issue #326. Need to benchmark on empirical data, to make sure this doesn't break other cases. 2016-04-14 11:36:03 +02:00
Matthew Honnibal
0f957dd586 Merge branch 'master' of ssh://github.com/honnibal/spaCy 2016-04-14 10:37:56 +02:00
Wolfgang Seeker
d99a9cbce9 different handling of space tokens
space tokens are now always attached to the previous non-space token
there are two exceptions:
leading space tokens are attached to the first following non-space token
in input that consists exclusively of space tokens, the last space token
is the head of all others.
2016-04-13 15:28:28 +02:00
Matthew Honnibal
04d0209be9 * Recognise multiple infixes in a token. 2016-04-13 18:38:26 +10:00
Henning Peters
a473d6e937 fix tests (use english model) 2016-04-12 16:41:57 +02:00
Matthew Honnibal
6df3858dbc * Fix Issue #323: Incorrect semantics of Token.__str__ built-in. Add flag to allow users to switch the old semantics back on, to ease transition. 2016-04-12 13:17:59 +10:00
Wolfgang Seeker
80bea62842 bugfix in unit test 2016-04-08 16:46:44 +02:00
Matthew Honnibal
26622f0ffc Merge branch 'master' of ssh://github.com/honnibal/spaCy 2016-03-29 14:31:52 +11:00
Matthew Honnibal
b1fe41b45d * Extend infix test, commenting on limitation of tokenizer w.r.t. infixes at the moment. 2016-03-29 14:31:05 +11:00
Matthew Honnibal
9c73983bdd * Add test for hyphenation problem in Issue #302 2016-03-29 14:27:13 +11:00
Matthew Honnibal
4a37fdcee1 Merge pull request #287 from wbwseeker/deproj_sentbnd_bug
add function to Token for setting head and dep (and dep_)
2016-03-25 09:47:45 +11:00
Henning Peters
c12d3dd200 add __init__.py to empty package dirs 2016-03-14 11:28:03 +01:00
Wolfgang Seeker
46e3f979f1 add function for setting head and label to token
change PseudoProjectivity.deprojectivize to use these functions
2016-03-11 17:31:06 +01:00
Matthew Honnibal
963fe5258e * Add missing __contains__ method to vocab 2016-03-08 15:49:10 +00:00
Wolfgang Seeker
9d1e6de4a0 make a proper list from zip iterator 2016-03-03 19:51:01 +01:00
Wolfgang Seeker
49f9d1c085 change test_nonproj.py to not use zip inside numpy.asarray 2016-03-03 19:42:09 +01:00
Matthew Honnibal
fcaa0ad7ce Merge pull request #280 from wbwseeker/german_parser
German parser
2016-03-04 03:27:42 +11:00
Wolfgang Seeker
690c5acabf adjust train.py to train both english and german models 2016-03-03 15:21:00 +01:00
Wolfgang Seeker
3448cb40a4 integrated pseudo-projective parsing into parser
- nonproj.pyx holds a class PseudoProjectivity which currently holds
  all functionality to implement Nivre & Nilsson 2005's pseudo-projective
  parsing using the HEAD decoration scheme
- changed lefts/rights in Token to account for possible non-projective
  structures
2016-03-01 10:09:08 +01:00
Henning Peters
f3df736e0a remove unidecode-related test 2016-02-24 18:22:22 +01:00
Wolfgang Seeker
4b2297d5d4 add class PseudoProjective for pseudo-projective parsing
PseudoProjective() implements the algorithm from Nivre & Nilsson 2005
using their HEAD decoration scheme.
2016-02-24 11:26:25 +01:00
Wolfgang Seeker
8d531c958b replace tests for non-projectivity
- add functions to find non-projective edges
- add test file for non-projectivity functions
2016-02-22 14:40:40 +01:00
Henning Peters
9d8966a2c0 Update test_tokenizer.py 2016-02-10 19:24:37 +01:00
Henning Peters
3b5f1e753b py26 compatibility 2016-02-10 14:32:54 +01:00
Henning Peters
ee1f1ac300 mark test_sentence_space() as model test 2016-02-10 07:49:11 +01:00
Matthew Honnibal
c6623889c1 * Add test for Issue #251: Incorrect right edges, caused by bad update to r_edge in del_arc, triggered from non-monotonic left-arc 2016-02-06 23:47:51 +01:00
Matthew Honnibal
161b01d4c0 * Tweak usage example for multi-processing 2016-02-06 14:44:11 +01:00
Matthew Honnibal
7f24229f10 * Don't try to pickle the tokenizer 2016-02-06 14:09:05 +01:00
Matthew Honnibal
e66d45bf66 * Restore previous patch to Span.root, as it seems it wasn't the cause of the problem. 2016-02-06 13:37:41 +01:00
Matthew Honnibal
031b00cb91 * Fix Span.root calculation 2016-02-05 20:12:09 +01:00
Matthew Honnibal
1cf0100bf6 * Add test for multithreading 2016-02-05 19:38:22 +01:00
Matthew Honnibal
1ef84a0557 * Merge master into rethinc2 2016-02-05 12:55:59 +01:00
Matthew Honnibal
c0e63feccc * xfail pickle tests 2016-02-05 12:46:58 +01:00
Matthew Honnibal
48ce09687d * Skip pickling the vocab in the tests 2016-02-04 15:51:19 +01:00
Matthew Honnibal
ee975d36d0 * Add stubs to test is_bracket/is_quote/is_left_punct/is_right_punct functions 2016-02-04 13:02:25 +01:00
Matthew Honnibal
907e8cf07d * Add u prefix to string in web example 2016-01-25 15:51:38 +01:00
Matthew Honnibal
eba03695ef * Comment out pickle tests 2016-01-25 15:51:13 +01:00
Matthew Honnibal
de94e6c525 * Mark pickle tests as xfail, due to temp files problem 2016-01-25 15:24:17 +01:00
Matthew Honnibal
87172a15c6 * Fix runtime error bug that arose from updated Span.root function. 2016-01-25 15:22:42 +01:00
Matthew Honnibal
2c8dd91785 * Fix first code example on the website 2016-01-23 18:09:19 +01:00
Matthew Honnibal
82d011ac43 * Fix test for whitespace 2016-01-19 20:38:26 +01:00
Matthew Honnibal
e89069dcae * Fix matcher test 2016-01-19 20:24:01 +01:00
Matthew Honnibal
e1282b7f2f * Require user-custom NER classes to work without adding the label. 2016-01-19 20:11:03 +01:00
Matthew Honnibal
f0f92793f6 * Add test for user NER classes in matcher blocking the NER model. Re Issue #178 and Issue #217 2016-01-19 19:23:16 +01:00
Matthew Honnibal
515493c675 * Add xfail test for Issue #225: tokenization with non-whitespace delimiters 2016-01-19 13:20:14 +01:00
Matthew Honnibal
04177debd0 * Unwind limit to sentence boundary detection that prevents it from inserting boundaries on whitespace. Replace it with a check for whitespace in StateClass.fast_forward, so that whitespace is LeftArced when it's on the stack. This should prevent the previous problem of whitespace-only sentences. Should fix Issue #184, but may cause further problems. Needs testing. 2016-01-19 02:54:15 +01:00
Matthew Honnibal
7893de3203 * Add test for Issue #184: Whitespace at sentence boundary causes sentence boundary error. 2016-01-18 23:04:38 +01:00
Matthew Honnibal
e825fd9554 * Make some of the website tests work without models 2016-01-18 18:14:44 +01:00
Matthew Honnibal
bed36ab0ff * Fix import of HEAD attribute 2016-01-18 17:34:43 +01:00
Matthew Honnibal
28c659c1fe * Fix import for numpy 2016-01-18 17:25:04 +01:00
Matthew Honnibal
fc36bcf458 * Fix import for English 2016-01-18 17:14:40 +01:00
Matthew Honnibal
cc4c335e14 * Set heads for test_merge_tokens, to make the test run without models 2016-01-18 17:00:11 +01:00
Matthew Honnibal
714cbc03d5 * Add test for Issue #203: nested noun chunks. 2016-01-16 18:02:30 +01:00
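Note on commit d99a9cbce9: its message spells out a rule for attaching space tokens. The following is a minimal sketch of that rule as stated there, not the parser's actual implementation. The function name `space_token_heads`, the boolean-list input, and the choice to make the last token its own head in all-space input are assumptions made for illustration only.

```python
from typing import List, Optional


def space_token_heads(is_space: List[bool]) -> List[Optional[int]]:
    """Return a head index for every space token; None for non-space tokens.

    Hypothetical helper illustrating the rule in commit d99a9cbce9:
    - a space token attaches to the previous non-space token;
    - leading space tokens attach to the first following non-space token;
    - if the input is all space tokens, the last one heads all others.
    """
    n = len(is_space)
    heads: List[Optional[int]] = [None] * n
    non_space = [i for i, flag in enumerate(is_space) if not flag]

    if not non_space:
        # All-space input: the last token is the head of all others.
        # Assumption: the last token points at itself, i.e. it is the root.
        return [n - 1] * n

    first_non_space = non_space[0]
    previous_non_space: Optional[int] = None
    for i, flag in enumerate(is_space):
        if not flag:
            # Non-space tokens keep whatever head the parser assigns (None here).
            previous_non_space = i
        elif previous_non_space is None:
            # Leading space: attach to the first following non-space token.
            heads[i] = first_non_space
        else:
            # Ordinary case: attach to the previous non-space token.
            heads[i] = previous_non_space
    return heads


# Tokens: [space, word, space, space, word]
print(space_token_heads([True, False, True, True, False]))  # [1, None, 1, 1, None]
# Input made up only of space tokens
print(space_token_heads([True, True, True]))  # [2, 2, 2]
```

Heads of non-space tokens are left as None because the commit message only describes how space tokens are attached; everything else would come from the parser.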