Commit Graph

1177 Commits

Author SHA1 Message Date
Ines Montani
038002d616 Reformat HU tokenizer tests and adapt to general style
Improve readability of test cases and add conftest.py with fixture
2017-01-05 18:06:44 +01:00
Ines Montani
637f785036 Add general sanity tests for all tokenizers 2017-01-05 16:25:38 +01:00
Ines Montani
c5f2dc15de Move English tokenizer tests to directory /en 2017-01-05 16:25:04 +01:00
Ines Montani
8b45363b4d Modernize and merge general tokenizer tests 2017-01-05 13:17:05 +01:00
Ines Montani
02cfda48c9 Modernize and merge tokenizer tests for string loading 2017-01-05 13:16:55 +01:00
Ines Montani
a11f684822 Modernize and merge tokenizer tests for whitespace 2017-01-05 13:16:33 +01:00
Ines Montani
8b284fc6f1 Modernize and merge tokenizer tests for text from file 2017-01-05 13:15:52 +01:00
Ines Montani
2c2e878653 Modernize and merge tokenizer tests for punctuation 2017-01-05 13:14:16 +01:00
Ines Montani
8a74129cdf Modernize and merge tokenizer tests for prefixes/suffixes/infixes 2017-01-05 13:13:12 +01:00
Ines Montani
0e65dca9a5 Modernize and merge tokenizer tests for exception and emoticons 2017-01-05 13:11:31 +01:00
Ines Montani
34c47bb20d Fix formatting 2017-01-05 13:10:51 +01:00
Ines Montani
2e72683baa Add missing docstrings 2017-01-05 13:10:21 +01:00
Ines Montani
da10a049a6 Add unicode declarations 2017-01-05 13:09:48 +01:00
Ines Montani
58adae8774 Remove unused file 2017-01-05 13:09:22 +01:00
Ines Montani
c6e5a5349d Move regression test for #360 into own file 2017-01-04 00:49:31 +01:00
Ines Montani
8279993a6f Modernize and merge tokenizer tests for punctuation 2017-01-04 00:49:20 +01:00
Ines Montani
550630df73 Update tokenizer tests for contractions 2017-01-04 00:48:42 +01:00
Ines Montani
109f202e8f Update conftest fixture 2017-01-04 00:48:21 +01:00
Ines Montani
ee6b49b293 Modernize tokenizer tests for emoticons 2017-01-04 00:47:59 +01:00
Ines Montani
f09b5a5dfd Modernize tokenizer tests for infixes 2017-01-04 00:47:42 +01:00
Ines Montani
59059fed27 Move regression test for #351 to own file 2017-01-04 00:47:11 +01:00
Ines Montani
667051375d Modernize tokenizer tests for whitespace 2017-01-04 00:46:35 +01:00
Ines Montani
aafc894285 Modernize tokenizer tests for contractions
Use @pytest.mark.parametrize.
2017-01-03 23:02:21 +01:00
Ines Montani
fb9d3bb022 Revert "Merge remote-tracking branch 'origin/master'"
This reverts commit d3b181cdf1, reversing
changes made to b19cfcc144.
2017-01-03 18:21:36 +01:00
Matthew Honnibal
3ba7c167a8 Fix URL tests 2016-12-30 17:10:08 -06:00
Matthew Honnibal
9936a1b9b5 Merge branch 'tokenization_w_exception_patterns' of https://github.com/oroszgy/spaCy.hu into oroszgy-tokenization_w_exception_patterns 2016-12-30 14:53:40 -06:00
kengz
73a38bd4d1 Merge remote-tracking branch 'upstream/master' 2016-12-30 12:19:59 -05:00
kengz
da44183ae1 move parse_tree logic to a new tokens/printers.py file 2016-12-30 12:19:18 -05:00
Matthew Honnibal
3e8d9c772e Test interaction of token_match and punctuation
Check that the new token_match function applies after punctuation is split off.
2016-12-31 00:52:17 +11:00
Gyorgy Orosz
45e045a87b Unicode/UTF8 compatibility for Python2 2016-12-24 00:21:00 +01:00
Gyorgy Orosz
72b61b6d03 Typo fix. 2016-12-24 00:10:29 +01:00
Gyorgy Orosz
1748549aeb Added exception pattern mechanism to the tokenizer. 2016-12-21 23:16:19 +01:00
Gyorgy Orosz
ab2f6ea46c Removed data files from tests.. 2016-12-21 20:22:09 +01:00
Gyorgy Orosz
3d5306acb9 Added further testcases. 2016-12-20 23:49:35 +01:00
Gyorgy Orosz
23956e72ff Improved partial support for tokenzing Hungarian numbers 2016-12-20 23:36:59 +01:00
Gyorgy Orosz
6add156075 Refactored language data structure 2016-12-20 22:28:20 +01:00
Gyorgy Orosz
366b3f8685 Merge branch 'master' into hu_tokenizer 2016-12-20 20:53:31 +01:00
Gyorgy Orosz
c035928156 Partial Hungarian number tokenization is added. 2016-12-20 20:46:20 +01:00
Matthew Honnibal
f38eb25fe1 Fix test for word vector 2016-12-18 23:31:55 +01:00
Matthew Honnibal
e4c951c153 Merge branch 'organize-language-data' of ssh://github.com/explosion/spaCy into organize-language-data 2016-12-18 17:01:08 +01:00
Ines Montani
d1c1d3f9cd Fix tokenizer test 2016-12-18 16:55:32 +01:00
Matthew Honnibal
bdcecb3c96 Add import in regression test 2016-12-18 16:51:31 +01:00
Ines Montani
77cf2fb0f6 Remove unnecessary argument in test 2016-12-18 14:06:27 +01:00
Ines Montani
121c310566 Remove trailing whitespace 2016-12-18 14:06:27 +01:00
Matthew Honnibal
0595cc0635 Change test595 to mock data, instead of requiring model. 2016-12-18 13:28:51 +01:00
Ines Montani
f2c48ef504 Resolve stopwords conflict to merge Dutch 2016-12-17 13:08:16 +01:00
Janneke van der Zwaan
4a3fdcce8a Merge github.com:explosion/spaCy into dutch 2016-12-13 09:25:23 +01:00
Gyorgy Orosz
0cf2144d24 Adding partial hyphen and quote handling support. 2016-12-11 00:14:36 +01:00
Gyorgy Orosz
2051726fd3 Passing Hungatian abbrev tests. 2016-12-10 23:37:58 +01:00
Gyorgy Orosz
0289b8ceaa Additional abbreviation tests. 2016-12-08 12:17:44 +01:00
Gyorgy Orosz
5b00039955 First steps towards the Hungarian tokenizer code. 2016-12-07 23:07:43 +01:00
Ines Montani
8350d65695 Change morphology and lemmatizer API
Take morphology features as object instead of keyword arguments
2016-12-07 21:12:49 +01:00
Ines Montani
52e7d634df Remove trailing whitespace 2016-12-07 21:12:19 +01:00
Ines Montani
07f0efb102 Add test for tokenizer regular expressions 2016-12-07 20:33:28 +01:00
Matthew Honnibal
f6e356aada Add (and test) Span.sentiment attribute. By default we average token.span, but can override with custom hook. Re Issue #667 2016-12-02 11:05:50 +01:00
Janneke van der Zwaan
88869e0e07 Merge github.com:explosion/spaCy into dutch 2016-11-30 17:13:39 +01:00
Matthew Honnibal
6652f2a135 Test #656, #624: special case rules for tokenizer with attributes. 2016-11-25 12:44:13 +01:00
Matthew Honnibal
53d8ca8f51 Add spacy.attrs.intify_attrs function, to normalize strings in token attribute dictionaries. 2016-11-25 11:34:30 +01:00
dafnevk
3db8b0d322 Added language class and some language data (with some TODOs) for Dutch 2016-11-24 15:56:38 +01:00
Matthew Honnibal
e01c1875ee Work on test for #615 2016-11-23 23:48:41 +01:00
Matthew Honnibal
e86f440ca6 Fix test for issue 617 2016-11-10 22:48:10 +01:00
Matthew Honnibal
faa7610c56 Merge branch 'master' of ssh://github.com/explosion/spaCy 2016-11-10 22:46:38 +01:00
Matthew Honnibal
a2c7de8329 spacy/tests/regression/test_issue617.py
Test Issue #617
2016-11-10 22:46:23 +01:00
tiago
2a3e342c1f Added a test case to cover the span.merge returning values 2016-11-09 18:57:50 +00:00
Dmitry Sadovnychyi
86c056ba64 Add basic test for PhraseMatcher
#613
2016-11-09 00:10:32 +08:00
Matthew Honnibal
3ea15b257f Fix test for 605 2016-11-06 11:59:26 +01:00
Matthew Honnibal
efe7790439 Test #590: Order dependence in Matcher rules. 2016-11-06 11:21:36 +01:00
Matthew Honnibal
75805397dd Test Issue #605 2016-11-06 10:42:32 +01:00
Matthew Honnibal
4a8a2b6001 Test #595 -- Bug in lemmatization of base forms. 2016-11-04 00:27:32 +01:00
Matthew Honnibal
72b9bd57ec Test Issue #588: Matcher accepts invalid, empty patterns. 2016-11-03 00:09:35 +01:00
Matthew Honnibal
b6b01d4680 Remove deprecated tokens_from_list test. 2016-11-02 23:47:21 +01:00
Matthew Honnibal
3d6c79e595 Test Issue #599: .is_tagged and .is_parsed attributes not reflected after deserialization for empty documents. 2016-11-02 23:40:11 +01:00
Matthew Honnibal
125c910a8d Test Issue #600 2016-11-02 23:24:13 +01:00
Matthew Honnibal
80824f6d29 Fix test 2016-11-02 20:48:40 +01:00
Matthew Honnibal
c09a8ce5bb Add test for french tokenizer 2016-11-02 20:40:31 +01:00
Matthew Honnibal
b012ae3044 Add test for loading languages 2016-11-02 20:38:48 +01:00
Matthew Honnibal
d8db648ebf Add __init__.py file for regression tests 2016-11-01 13:45:06 +01:00
Matthew Honnibal
6977a2b8cd Add test for Issue #589 2016-11-01 12:33:36 +01:00
Matthew Honnibal
7e5f63a595 Improve test slightly 2016-10-28 17:41:16 +02:00
Matthew Honnibal
782e4814f4 Test Issue #587: Matcher segfaults on particular input 2016-10-28 16:38:32 +02:00
Matthew Honnibal
afea6505f3 Test Issue 429: No valid actions for NER after matcher adds a new entity label. 2016-10-27 18:01:34 +02:00
Matthew Honnibal
6c47048912 Fix test, after IOB tweak. 2016-10-26 17:22:03 +02:00
Matthew Honnibal
d3a617aa99 Test workaround for Issue #285: Streaming data memory growth 2016-10-24 13:48:06 +02:00
Matthew Honnibal
64e5f02cf7 Update test 2016-10-23 21:08:07 +02:00
Matthew Honnibal
66d7a6eca2 Update test 2016-10-23 21:02:05 +02:00
Matthew Honnibal
90bf797125 Update test 2016-10-23 20:54:17 +02:00
Matthew Honnibal
5e76320ffe Update test 2016-10-23 20:44:54 +02:00
Matthew Honnibal
aa105927f3 Update test 2016-10-23 20:31:25 +02:00
Matthew Honnibal
e120561294 Fix vector_norm test. 2016-10-23 19:56:16 +02:00
Matthew Honnibal
c05cd2356e Fix similarity test for Python 3 2016-10-23 18:16:56 +02:00
Matthew Honnibal
79aa03fe98 Test Issue #514: Serializer fails when new entity type has been added. 2016-10-23 17:41:44 +02:00
Matthew Honnibal
f97548c6f1 Fix broken test, re Issue #461 2016-10-23 17:02:23 +02:00
Matthew Honnibal
4de30a8e38 Test Issue #514: Serialization fails after adding a new entity label. 2016-10-23 16:40:27 +02:00
Matthew Honnibal
e99b3f5322 Test Issue #459: Fail to deserialize empty doc 2016-10-23 16:30:22 +02:00
Matthew Honnibal
99ff8b902f Test that huffman codec works with empty freqs dict 2016-10-23 16:27:45 +02:00
Matthew Honnibal
e5627134d9 Test Issue #461: ent_iob tag incorrect after setting entities. 2016-10-23 15:50:04 +02:00
Matthew Honnibal
2989072aac Add tests to verify that Issue #442 is fixed in 1.1 2016-10-23 14:33:13 +02:00
Matthew Honnibal
e838b6d53f Add tests for using the new Entity ID tracking in the rule matcher 2016-10-23 14:04:01 +02:00
Matthew Honnibal
e7af75e0a9 Add test for vector resizing, re Issue #544 2016-10-21 17:07:21 +02:00
Matthew Honnibal
c3a8a1cf51 Update serializer test. 2016-10-18 16:18:46 +02:00
Matthew Honnibal
7d446e5094 Revert "Update matcher test, to reflect character offset return instead of token offset."
This reverts commit f8d3e3bcfe.
2016-10-17 16:49:49 +02:00
Matthew Honnibal
4bf2c53c13 Revert "Hack on matcher tests, for new implementation."
This reverts commit dbe60644ab.
2016-10-17 16:49:48 +02:00
Matthew Honnibal
dbe60644ab Hack on matcher tests, for new implementation. 2016-10-17 16:12:22 +02:00
Matthew Honnibal
f8d3e3bcfe Update matcher test, to reflect character offset return instead of token offset. 2016-10-17 16:00:10 +02:00
Matthew Honnibal
be48a7b4f3 Fix conftest for website tests. 2016-10-17 01:54:26 +02:00
Matthew Honnibal
8951bf6989 Update matcher tests 2016-10-17 01:53:24 +02:00
Matthew Honnibal
0cf4aff470 Set default path in EN/DE tests. 2016-10-17 01:52:49 +02:00
Matthew Honnibal
cd71b6b0a9 Remove test of parser pickle 2016-10-17 01:52:10 +02:00
kengz
fb92e2d061 activate parse_tree test, use from_array, test for root correctness 2016-10-16 15:12:08 -04:00
kengz
17b7832419 mark test as needing models 2016-10-16 14:39:07 -04:00
kengz
f046e0d7c8 add parse_tree method to language, separate from __call__ for efficiency, but will use __call__ to get the doc 2016-10-16 14:20:23 -04:00
Matthew Honnibal
5444d38cc6 Update test for biluo tags 2016-10-16 11:42:45 +02:00
Matthew Honnibal
47afef7d6b Add init.py for gold tests 2016-10-15 21:51:28 +02:00
Matthew Honnibal
2163fd238f Add tests for entity->biluo transformation 2016-10-15 21:50:43 +02:00
Matthew Honnibal
2516382106 Fix loading of English in span test 2016-10-15 14:44:37 +02:00
Matthew Honnibal
049197e0ae Update tests, somewhat messily. 2016-10-15 14:14:04 +02:00
Matthew Honnibal
1e1a1d9517 Update matcher test 2016-10-15 14:13:41 +02:00
Matthew Honnibal
9cc9ce0f14 Load with default path=False in tests. 2016-10-15 14:13:23 +02:00
Matthew Honnibal
788657f062 Ensure words are added to vocab before test, so that the lexicon is updated correctly. 2016-10-15 14:12:18 +02:00
Matthew Honnibal
2cc515b2ed Add add_flag method to Vocab, re Issue #504. 2016-10-14 12:15:38 +02:00
Matthew Honnibal
a42fbcf946 Require model for test_is_properties 2016-10-12 19:35:18 +02:00
Matthew Honnibal
20c948361b Use local path in test_lemmatizer 2016-10-12 19:35:00 +02:00
Matthew Honnibal
1318d0bc65 Test with the non-loaded versions of the English and German pipelines. 2016-10-12 19:13:31 +02:00
Matthew Honnibal
bd7fe6420c Revert "Changes to test for new string-store"
This reverts commit 21e90d7d0b.
2016-09-30 20:11:01 +02:00
Matthew Honnibal
21e90d7d0b Changes to test for new string-store 2016-09-30 20:00:58 +02:00
Matthew Honnibal
81a47c01d8 Fix test for empty sentence string. 2016-09-27 19:21:22 +02:00
Matthew Honnibal
fc4a7ad794 Test and fix Issue #411: IndexError when .sents property is used on empty string. 2016-09-27 18:49:14 +02:00
Matthew Honnibal
3d370b7d45 Add test for Issue #445, fixed in 3cb4d455d, with improved lemmatizer logic 2016-09-27 18:39:46 +02:00
Matthew Honnibal
9c8ac91d72 Add test for Issue #435 2016-09-27 13:52:38 +02:00
Matthew Honnibal
e233328d38 Fix Issue #371: Lexeme objects were unhashable. 2016-09-27 13:22:30 +02:00
Matthew Honnibal
2debc4e0a2 Add .blank() method to Parser. Start housing default dep labels and entity types within the Defaults class. 2016-09-26 11:57:54 +02:00
Matthew Honnibal
95aaea0d3f Refactor so that the tokenizer data is read from Python data, rather than from disk 2016-09-25 14:49:53 +02:00
Matthew Honnibal
fd65cf6cbb Finish refactoring data loading 2016-09-24 20:26:17 +02:00
Matthew Honnibal
83e364188c Mostly finished loading refactoring. Design is in place, but doesn't work yet. 2016-09-24 15:42:01 +02:00
Matthew Honnibal
b00f683a0c Fix matcher test 2016-09-24 11:20:58 +02:00
Matthew Honnibal
939a791a52 Update tests 2016-09-24 01:17:03 +02:00
Matthew Honnibal
f6e587b1c7 Fix matcher tests 2016-09-21 20:45:20 +02:00
Matthew Honnibal
58e83fe34b Initial, limited support for quantified patterns in Matcher, and tracking of ent_id attribute in Token and Span. The quantifiers need a lot more testing, and there are some known problems. The main known problem is that the zero-plus and one-plus quantifiers won't work if a token can match both the quantified pattern expression AND the tail of the match. 2016-09-21 14:54:55 +02:00
Matthew Honnibal
cc8bf62208 * Fix Issue #360: Tokenizer failed when the infix regex matched the start of the string while trying to tokenize multi-infix tokens. 2016-05-09 13:23:47 +02:00
Matthew Honnibal
5d86c30f0b * Fix Issue #367: Missing has_vector property on Doc and Span objects 2016-05-09 12:36:14 +02:00
Matthew Honnibal
26095f9722 * Add span.sent property, re Issue #366 2016-05-06 00:17:38 +02:00
Matthew Honnibal
a6a25166ba * Remove print from test 2016-05-05 11:10:59 +02:00
Matthew Honnibal
7441ca30ee * Add tests for Issue #361: Lexeme rich comparison 2016-05-05 01:31:58 +02:00
Matthew Honnibal
72564213e3 * Add test for Issue #309 2016-05-04 16:00:28 +02:00
Matthew Honnibal
76f1d871da Merge branch 'master' of ssh://github.com/spacy-io/spaCy 2016-05-04 15:54:00 +02:00
Matthew Honnibal
b4bfc6ae55 * Add test for Issue #351: Indices off when leading whitespace 2016-05-04 15:53:17 +02:00
Wolfgang Seeker
a06fca9fdf German noun chunk iterator now doesn't return tokens more than once 2016-05-03 16:58:59 +02:00
Wolfgang Seeker
7825b75548 add tests for German noun chunker 2016-05-03 15:01:28 +02:00
Wolfgang Seeker
7b246c13cb reformulate noun chunk tests for English 2016-05-03 14:24:35 +02:00
Wolfgang Seeker
1786331cd8 add model sanity test 2016-05-03 12:51:47 +02:00
Matthew Honnibal
308a28c26c * Whitespace 2016-05-02 16:08:11 +02:00
Matthew Honnibal
c1c11a8ae0 * Fix formatting on serializer tests 2016-05-02 16:07:21 +02:00
Matthew Honnibal
902a389d85 * Fix merge conflict in test_parse 2016-05-02 15:28:07 +02:00
Matthew Honnibal
02c23cc1d0 * Fix sentence boundary test 2016-05-02 15:26:07 +02:00
Matthew Honnibal
d2f469b809 * Fix parsing tests, so that labels are added if they're missing, and so that the branching test values are correct 2016-05-02 15:25:27 +02:00
Wolfgang Seeker
b11cbb06c6 remove old tests for sentence boundary detection 2016-05-02 14:36:35 +02:00
Matthew Honnibal
508fd1f6dc * Refactor noun chunk iterators, so that they're simple functions. Install the iterator when the Doc is created, but allow users to write to the noun_chunk_iterator attribute. The iterator functions accept an object and yield (int start, int end, int label) triples. 2016-05-02 14:25:10 +02:00
Wolfgang Seeker
fa961ea694 add tests for serialization bug 2016-05-02 11:01:56 +02:00
Wolfgang Seeker
1003e7ccec remove debug output from tests 2016-04-25 12:12:40 +02:00
Wolfgang Seeker
f57f843e85 fix bug in updating tree structure when introducing additional roots 2016-04-25 12:01:19 +02:00
Wolfgang Seeker
b6477fc4f4 adjusted tests to Travis Setup 2016-04-21 17:15:10 +02:00
Wolfgang Seeker
736ffcb9a2 remove whitespace 2016-04-21 16:55:55 +02:00
Wolfgang Seeker
6c7301cc6d the parser now introduces sentence boundaries properly when predicting dependents with root labels 2016-04-21 16:50:53 +02:00
Wolfgang Seeker
12024b0b0a bugfix: introducing multiple roots now updates original head's properties
adjust tests to rely less on statistical model
2016-04-20 16:42:41 +02:00
Matthew Honnibal
2add5206aa * Fix description of matcher test 2016-04-17 15:40:21 +02:00
Matthew Honnibal
2b419d5b8c * Update test for Issue #242 2016-04-17 15:34:23 +02:00
Matthew Honnibal
f12b043308 * Add test for Issue #242: Overlapping matches not well recognised. 2016-04-17 15:19:17 +02:00
Matthew Honnibal
c0909afe22 Merge pull request #312 from wbwseeker/space_head_bug
add restrictions to L-arc and R-arc to prevent space heads
2016-04-15 20:36:03 +10:00
Matthew Honnibal
6f82065761 * Fix infixed commas in tokenizer, re Issue #326. Need to benchmark on empirical data, to make sure this doesn't break other cases. 2016-04-14 11:36:03 +02:00
Matthew Honnibal
0f957dd586 Merge branch 'master' of ssh://github.com/honnibal/spaCy 2016-04-14 10:37:56 +02:00
Wolfgang Seeker
d99a9cbce9 different handling of space tokens
space tokens are now always attached to the previous non-space token
there are two exceptions:
leading space tokens are attached to the first following non-space token
in input that consists exclusively of space tokens, the last space token
is the head of all others.
2016-04-13 15:28:28 +02:00
Matthew Honnibal
04d0209be9 * Recognise multiple infixes in a token. 2016-04-13 18:38:26 +10:00
Henning Peters
a473d6e937 fix tests (use english model) 2016-04-12 16:41:57 +02:00
Matthew Honnibal
6df3858dbc * Fix Issue #323: Incorrect semantics of Token.__str__ built-in. Add flag to allow users to switch the old semantics back on, to ease transition. 2016-04-12 13:17:59 +10:00
Wolfgang Seeker
80bea62842 bugfix in unit test 2016-04-08 16:46:44 +02:00
Matthew Honnibal
26622f0ffc Merge branch 'master' of ssh://github.com/honnibal/spaCy 2016-03-29 14:31:52 +11:00
Matthew Honnibal
b1fe41b45d * Extend infix test, commenting on limitation of tokenizer w.r.t. infixes at the moment. 2016-03-29 14:31:05 +11:00
Matthew Honnibal
9c73983bdd * Add test for hyphenation problem in Issue #302 2016-03-29 14:27:13 +11:00
Matthew Honnibal
4a37fdcee1 Merge pull request #287 from wbwseeker/deproj_sentbnd_bug
add function to Token for setting head and dep (and dep_)
2016-03-25 09:47:45 +11:00
Henning Peters
c12d3dd200 add __init__.py to empty package dirs 2016-03-14 11:28:03 +01:00
Wolfgang Seeker
46e3f979f1 add function for setting head and label to token
change PseudoProjectivity.deprojectivize to use these functions
2016-03-11 17:31:06 +01:00
Matthew Honnibal
963fe5258e * Add missing __contains__ method to vocab 2016-03-08 15:49:10 +00:00
Wolfgang Seeker
9d1e6de4a0 make a proper list from zip iterator 2016-03-03 19:51:01 +01:00
Wolfgang Seeker
49f9d1c085 change test_nonproj.py to not use zip inside numpy.asarray 2016-03-03 19:42:09 +01:00
Matthew Honnibal
fcaa0ad7ce Merge pull request #280 from wbwseeker/german_parser
German parser
2016-03-04 03:27:42 +11:00
Wolfgang Seeker
690c5acabf adjust train.py to train both english and german models 2016-03-03 15:21:00 +01:00
Wolfgang Seeker
3448cb40a4 integrated pseudo-projective parsing into parser
- nonproj.pyx holds a class PseudoProjectivity which currently holds
  all functionality to implement Nivre & Nilsson 2005's pseudo-projective
  parsing using the HEAD decoration scheme
- changed lefts/rights in Token to account for possible non-projective
  structures
2016-03-01 10:09:08 +01:00
Henning Peters
f3df736e0a remove unidecode-related test 2016-02-24 18:22:22 +01:00
Wolfgang Seeker
4b2297d5d4 add class PseudoProjective for pseudo-projective parsing
PseudoProjective() implements the algorithm from Nivre & Nilsson 2005
using their HEAD decoration scheme.
2016-02-24 11:26:25 +01:00
Wolfgang Seeker
8d531c958b replace tests for non-projectivity
- add functions to find non-projective edges
- add test file for non-projectivity functions
2016-02-22 14:40:40 +01:00
Henning Peters
9d8966a2c0 Update test_tokenizer.py 2016-02-10 19:24:37 +01:00
Henning Peters
3b5f1e753b py26 compatibility 2016-02-10 14:32:54 +01:00
Henning Peters
ee1f1ac300 mark test_sentence_space() as model test 2016-02-10 07:49:11 +01:00
Matthew Honnibal
c6623889c1 * Add test for Issue #251: Incorrect right edges, caused by bad update to r_edge in del_arc, triggered from non-monotonic left-arc 2016-02-06 23:47:51 +01:00
Matthew Honnibal
161b01d4c0 * Tweak usage example for multi-processing 2016-02-06 14:44:11 +01:00
Matthew Honnibal
7f24229f10 * Don't try to pickle the tokenizer 2016-02-06 14:09:05 +01:00
Matthew Honnibal
e66d45bf66 * Restore previous patch to Span.root, as it seems it wasn't the cause of the problem. 2016-02-06 13:37:41 +01:00
Matthew Honnibal
031b00cb91 * Fix Span.root calculation 2016-02-05 20:12:09 +01:00
Matthew Honnibal
1cf0100bf6 * Add test for multithreading 2016-02-05 19:38:22 +01:00
Matthew Honnibal
1ef84a0557 * Merge master into rethinc2 2016-02-05 12:55:59 +01:00
Matthew Honnibal
c0e63feccc * xfail pickle tests 2016-02-05 12:46:58 +01:00
Matthew Honnibal
48ce09687d * Skip pickling the vocab in the tests 2016-02-04 15:51:19 +01:00
Matthew Honnibal
ee975d36d0 * Add stubs to test is_bracket/is_quote/is_left_punct/is_right_punct functions 2016-02-04 13:02:25 +01:00
Matthew Honnibal
907e8cf07d * Add u prefix to string in web example 2016-01-25 15:51:38 +01:00
Matthew Honnibal
eba03695ef * Comment out pickle tests 2016-01-25 15:51:13 +01:00
Matthew Honnibal
de94e6c525 * Mark pickle tests as xfail, due to temp files problem 2016-01-25 15:24:17 +01:00
Matthew Honnibal
87172a15c6 * Fix runtime error bug that arose from updated Span.root function. 2016-01-25 15:22:42 +01:00
Matthew Honnibal
2c8dd91785 * Fix first code example on the website 2016-01-23 18:09:19 +01:00
Matthew Honnibal
82d011ac43 * Fix test for whitespace 2016-01-19 20:38:26 +01:00
Matthew Honnibal
e89069dcae * Fix matcher test 2016-01-19 20:24:01 +01:00
Matthew Honnibal
e1282b7f2f * Require user-custom NER classes to work without adding the label. 2016-01-19 20:11:03 +01:00
Matthew Honnibal
f0f92793f6 * Add test for user NER classes in matcher blocking the NER model. Re Issue #178 and Issue #217 2016-01-19 19:23:16 +01:00
Matthew Honnibal
515493c675 * Add xfail test for Issue #225: tokenization with non-whitespace delimiters 2016-01-19 13:20:14 +01:00
Matthew Honnibal
04177debd0 * Unwind limit to sentence boundary detection that prevents it from inserting boundaries on whitespace. Replace it with a check for whitespace in StateClass.fast_forward, so that whitespace is LeftArced when it's on the stack. This should prevent the previous problem of whitespace-only sentences. Should fix Issue #184, but may cause further problems. Needs testing. 2016-01-19 02:54:15 +01:00
Matthew Honnibal
7893de3203 * Add test for Issue #184: Whitespace at sentence boundary causes sentence boundary error. 2016-01-18 23:04:38 +01:00
Matthew Honnibal
e825fd9554 * Make some of the website tests work without models 2016-01-18 18:14:44 +01:00
Matthew Honnibal
bed36ab0ff * Fix import of HEAD attribute 2016-01-18 17:34:43 +01:00
Matthew Honnibal
28c659c1fe * Fix import for numpy 2016-01-18 17:25:04 +01:00
Matthew Honnibal
fc36bcf458 * Fix import for English 2016-01-18 17:14:40 +01:00
Matthew Honnibal
cc4c335e14 * Set heads for test_merge_tokens, to make the test run without models 2016-01-18 17:00:11 +01:00
Matthew Honnibal
714cbc03d5 * Add test for Issue #203: nested noun chunks. 2016-01-16 18:02:30 +01:00
Matthew Honnibal
4e2253170c * Move test for doc.merge to tokens_api file, to avoid name conflicts which upset pytest 2016-01-16 18:01:36 +01:00
Matthew Honnibal
34a157511f * Move test_merge_hang to test_tokens_api 2016-01-16 18:00:26 +01:00
Matthew Honnibal
4a16dbfeca * Add test for Issue #203: noun chunks should be flat, but sometimes are nested 2016-01-16 17:41:25 +01:00
Matthew Honnibal
223d2b3484 * Add test for Issue #154: Additional whitespace introduced when string ends with a whitespace token. 2016-01-16 17:08:07 +01:00
Matthew Honnibal
3dc398b727 * Fix merge conflict in requirements.txt 2016-01-16 16:20:49 +01:00
Matthew Honnibal
fc5962a77d * Improve test for root token in Span 2016-01-16 16:19:09 +01:00
Matthew Honnibal
aa0dd79f52 * Delete test_token_references, which checked a flakey strategy for preventing orphan tokens from a while ago. Now orphan tokens simply hold a reference to Pool, preventing the memory from being freed underneath them. This means that we don't need to run this slow test. 2016-01-16 16:03:35 +01:00
Matthew Honnibal
c1039fa4b4 * Add test for Issue #214. Resolved in change to Span.root 2016-01-16 15:37:47 +01:00
Henning Peters
235f094534 untangle data_path/via 2016-01-16 12:23:45 +01:00
Matthew Honnibal
478a79a3d5 * Add test for Issue #220: Whitespace being tagged as noun 2016-01-15 16:17:07 +01:00
Henning Peters
bc229790ac integrate with sputnik 2016-01-13 19:46:17 +01:00
Matthew Honnibal
3fbfba575a * xfail the contractions test 2015-12-31 13:16:28 +01:00
Matthew Honnibal
3bd910ccad * Merge therell test 2015-12-31 11:55:18 +01:00
Matthew Honnibal
eaf2ad59f1 * Fix use of mock Package object 2015-12-31 04:13:15 +01:00
Matthew Honnibal
a6ba43ecaf * Fix errors in packaging revision 2015-12-29 18:37:26 +01:00
Matthew Honnibal
4b4eec8b47 * Fix Issue #201: Tokenization of there'll 2015-12-29 18:09:09 +01:00
Matthew Honnibal
86ee9d046d * Remove test that belongs to a change for master 2015-12-29 18:07:23 +01:00
Matthew Honnibal
aec130af56 Use util.Package class for io
Previous Sputnik integration caused API change: Vocab, Tagger, etc
were loaded via a from_package classmethod, that required a
sputnik.Package instance. This forced users to first create a
sputnik.Sputnik() instance, in order to acquire a Package via
sp.pool().

Instead I've created a small file-system shim, util.Package, which
allows classes to have a .load() classmethod, that accepts either
util.Package objects, or strings. We can later gut the internals
of this and make it a proxy for Sputnik if we need more functionality
that should live in the Sputnik library.

Sputnik is now only used to download and install the data, in
spacy.en.download
2015-12-29 18:00:48 +01:00
Matthew Honnibal
8b61d45ed0 * Fix merge conflicts for headers branch 2015-12-27 17:46:25 +01:00
Matthew Honnibal
6bb9c7f311 Merge pull request #202 from henningpeters/sputnik
access model via sputnik
2015-12-28 03:29:53 +11:00
Henning Peters
7f7299cafb Merge branch 'tmpdir' into headers 2015-12-18 12:25:25 +01:00
Henning Peters
cfa187aaf0 fix tests 2015-12-18 10:58:02 +01:00
Henning Peters
8359bd4d93 strip data/ from package, friendlier Language invocation, make data_dir backward/forward-compatible 2015-12-18 09:52:55 +01:00
Henning Peters
4f3efb8eaf avoid writing to /tmp (not cross-platform compatible) 2015-12-16 19:56:40 +01:00
Henning Peters
4ada39f472 avoid writing to /tmp (not cross-platform compatible) 2015-12-16 19:53:06 +01:00
Henning Peters
ac318b568c new approach to dependency headers 2015-12-13 11:49:17 +01:00
Henning Peters
9027cef3bc access model via sputnik 2015-12-07 06:01:28 +01:00
Matthew Honnibal
ec7d36c3a4 * Add test for matcher end-point problem 2015-11-12 05:00:40 +11:00
Matthew Honnibal
d309622a27 * Add test for matcher end-point problem 2015-11-12 04:59:11 +11:00
Matthew Honnibal
56ea20a886 * Add test for matcher end-point problem 2015-11-12 04:58:53 +11:00
Matthew Honnibal
cfa4062147 * Add test for matcher end-point problem 2015-11-12 04:56:07 +11:00
Matthew Honnibal
d67d7d5a86 * Add test for NER inconsistency bug 2015-11-08 16:19:33 +01:00
Matthew Honnibal
fde9a22ec2 * Add new test for ner 2015-11-08 13:57:15 +01:00
Matthew Honnibal
31da42eb27 * Mark tests that require models 2015-11-07 19:27:38 +11:00
Matthew Honnibal
8e26a28616 * Mark tests that require models 2015-11-07 19:10:56 +11:00
Matthew Honnibal
15eab7354f * Remove extraneous test files 2015-11-07 18:45:13 +11:00
Matthew Honnibal
06f26d258e * Fix test_basic_create 2015-11-07 10:04:37 +11:00
Matthew Honnibal
1d3884c46d * Fix test_basic_create 2015-11-07 10:03:56 +11:00
Andreas Grivas
83ca4e0b93 * use old merge tests - add more 2015-11-07 07:57:04 +11:00
Matthew Honnibal
3c162dcac3 * Refactor away from the _ml module, to use thinc 4.0. Still some work needs to be done, e.g. to add __reduce__ to the models, more testing, etc. 2015-11-07 03:24:30 +11:00
Matthew Honnibal
ee3f9ba581 * Fix test of serializer 2015-11-03 19:45:16 +11:00
Matthew Honnibal
d06ba26371 * Fix test of serializer 2015-11-03 19:43:27 +11:00
Matthew Honnibal
85372468e3 * Fix serialize test 2015-11-03 08:51:33 +01:00
Matthew Honnibal
389a373807 Merge branch 'master' of ssh://github.com/honnibal/spaCy 2015-11-03 18:07:25 +11:00
Matthew Honnibal
3f44b3e43f * Mark serializer test as requiring models 2015-11-03 18:07:08 +11:00
Matthew Honnibal
25ed7be8f8 Merge branch 'master' of https://github.com/honnibal/spaCy 2015-11-03 07:58:17 +01:00
Matthew Honnibal
5e040855a5 * Ensure morphological features and lemmas are loaded in from_array, re Issue #152 2015-11-03 17:56:50 +11:00
Matthew Honnibal
5668feb235 * Fix pickle test for python3 2015-11-03 04:57:02 +01:00
Andreas Grivas
d418f00eb1 fixed error when printing unicode 2015-11-02 20:23:18 +02:00
Matthew Honnibal
1c0356e4c2 * Set test file mode to w+t 2015-10-26 22:40:48 +11:00
Matthew Honnibal
0fe98f358b * Fix mode on text file for Python3 in strings test 2015-10-26 22:25:16 +11:00
Matthew Honnibal
8ba9cf905e * Fix mode on text file for Python3 in strings test 2015-10-26 21:44:34 +11:00
Matthew Honnibal
a0730699b1 * Fix mode on text file for Python3 in strings test 2015-10-26 21:25:56 +11:00
Matthew Honnibal
725344d349 * Fix tempfile in test 2015-10-26 21:08:18 +11:00
Matthew Honnibal
a824a98312 * Add tests for pickling vectors, re: Issue #125 2015-10-26 12:31:05 +11:00
Matthew Honnibal
4e16f9e435 * Move tests underneath spacy/ 2015-10-26 00:07:31 +11:00