Gyorgy Orosz
45e045a87b
Unicode/UTF8 compatibility for Python2
2016-12-24 00:21:00 +01:00
Gyorgy Orosz
72b61b6d03
Typo fix.
2016-12-24 00:10:29 +01:00
Gyorgy Orosz
1748549aeb
Added exception pattern mechanism to the tokenizer.
2016-12-21 23:16:19 +01:00
Gyorgy Orosz
ab2f6ea46c
Removed data files from tests.
2016-12-21 20:22:09 +01:00
Gyorgy Orosz
3d5306acb9
Added further testcases.
2016-12-20 23:49:35 +01:00
Gyorgy Orosz
23956e72ff
Improved partial support for tokenizing Hungarian numbers
2016-12-20 23:36:59 +01:00
Gyorgy Orosz
6add156075
Refactored language data structure
2016-12-20 22:28:20 +01:00
Gyorgy Orosz
366b3f8685
Merge branch 'master' into hu_tokenizer
2016-12-20 20:53:31 +01:00
Gyorgy Orosz
c035928156
Partial Hungarian number tokenization is added.
2016-12-20 20:46:20 +01:00
Matthew Honnibal
f38eb25fe1
Fix test for word vector
2016-12-18 23:31:55 +01:00
Matthew Honnibal
e4c951c153
Merge branch 'organize-language-data' of ssh://github.com/explosion/spaCy into organize-language-data
2016-12-18 17:01:08 +01:00
Ines Montani
d1c1d3f9cd
Fix tokenizer test
2016-12-18 16:55:32 +01:00
Matthew Honnibal
bdcecb3c96
Add import in regression test
2016-12-18 16:51:31 +01:00
Ines Montani
77cf2fb0f6
Remove unnecessary argument in test
2016-12-18 14:06:27 +01:00
Ines Montani
121c310566
Remove trailing whitespace
2016-12-18 14:06:27 +01:00
Matthew Honnibal
0595cc0635
Change test595 to mock data, instead of requiring model.
2016-12-18 13:28:51 +01:00
Ines Montani
f2c48ef504
Resolve stopwords conflict to merge Dutch
2016-12-17 13:08:16 +01:00
Janneke van der Zwaan
4a3fdcce8a
Merge github.com:explosion/spaCy into dutch
2016-12-13 09:25:23 +01:00
Gyorgy Orosz
0cf2144d24
Adding partial hyphen and quote handling support.
2016-12-11 00:14:36 +01:00
Gyorgy Orosz
2051726fd3
Passing Hungarian abbrev tests.
2016-12-10 23:37:58 +01:00
Gyorgy Orosz
0289b8ceaa
Additional abbreviation tests.
2016-12-08 12:17:44 +01:00
Gyorgy Orosz
5b00039955
First steps towards the Hungarian tokenizer code.
2016-12-07 23:07:43 +01:00
Ines Montani
8350d65695
Change morphology and lemmatizer API
...
Take morphology features as object instead of keyword arguments
2016-12-07 21:12:49 +01:00
Ines Montani
52e7d634df
Remove trailing whitespace
2016-12-07 21:12:19 +01:00
Ines Montani
07f0efb102
Add test for tokenizer regular expressions
2016-12-07 20:33:28 +01:00
Matthew Honnibal
f6e356aada
Add (and test) Span.sentiment attribute. By default we average token.span, but can override with custom hook. Re Issue #667
2016-12-02 11:05:50 +01:00
Janneke van der Zwaan
88869e0e07
Merge github.com:explosion/spaCy into dutch
2016-11-30 17:13:39 +01:00
Matthew Honnibal
6652f2a135
Test #656, #624: special case rules for tokenizer with attributes.
2016-11-25 12:44:13 +01:00
Matthew Honnibal
53d8ca8f51
Add spacy.attrs.intify_attrs function, to normalize strings in token attribute dictionaries.
2016-11-25 11:34:30 +01:00
dafnevk
3db8b0d322
Added language class and some language data (with some TODOs) for Dutch
2016-11-24 15:56:38 +01:00
Matthew Honnibal
e01c1875ee
Work on test for #615
2016-11-23 23:48:41 +01:00
Matthew Honnibal
e86f440ca6
Fix test for issue 617
2016-11-10 22:48:10 +01:00
Matthew Honnibal
faa7610c56
Merge branch 'master' of ssh://github.com/explosion/spaCy
2016-11-10 22:46:38 +01:00
Matthew Honnibal
a2c7de8329
spacy/tests/regression/test_issue617.py
...
Test Issue #617
2016-11-10 22:46:23 +01:00
tiago
2a3e342c1f
Added a test case to cover the span.merge returning values
2016-11-09 18:57:50 +00:00
Dmitry Sadovnychyi
86c056ba64
Add basic test for PhraseMatcher
...
#613
2016-11-09 00:10:32 +08:00
Matthew Honnibal
3ea15b257f
Fix test for 605
2016-11-06 11:59:26 +01:00
Matthew Honnibal
efe7790439
Test #590: Order dependence in Matcher rules.
2016-11-06 11:21:36 +01:00
Matthew Honnibal
75805397dd
Test Issue #605
2016-11-06 10:42:32 +01:00
Matthew Honnibal
4a8a2b6001
Test #595 -- Bug in lemmatization of base forms.
2016-11-04 00:27:32 +01:00
Matthew Honnibal
72b9bd57ec
Test Issue #588: Matcher accepts invalid, empty patterns.
2016-11-03 00:09:35 +01:00
Matthew Honnibal
b6b01d4680
Remove deprecated tokens_from_list test.
2016-11-02 23:47:21 +01:00
Matthew Honnibal
3d6c79e595
Test Issue #599: .is_tagged and .is_parsed attributes not reflected after deserialization for empty documents.
2016-11-02 23:40:11 +01:00
Matthew Honnibal
125c910a8d
Test Issue #600
2016-11-02 23:24:13 +01:00
Matthew Honnibal
80824f6d29
Fix test
2016-11-02 20:48:40 +01:00
Matthew Honnibal
c09a8ce5bb
Add test for french tokenizer
2016-11-02 20:40:31 +01:00
Matthew Honnibal
b012ae3044
Add test for loading languages
2016-11-02 20:38:48 +01:00
Matthew Honnibal
d8db648ebf
Add __init__.py file for regression tests
2016-11-01 13:45:06 +01:00
Matthew Honnibal
6977a2b8cd
Add test for Issue #589
2016-11-01 12:33:36 +01:00
Matthew Honnibal
7e5f63a595
Improve test slightly
2016-10-28 17:41:16 +02:00
Matthew Honnibal
782e4814f4
Test Issue #587: Matcher segfaults on particular input
2016-10-28 16:38:32 +02:00
Matthew Honnibal
afea6505f3
Test Issue 429: No valid actions for NER after matcher adds a new entity label.
2016-10-27 18:01:34 +02:00
Matthew Honnibal
6c47048912
Fix test, after IOB tweak.
2016-10-26 17:22:03 +02:00
Matthew Honnibal
d3a617aa99
Test workaround for Issue #285: Streaming data memory growth
2016-10-24 13:48:06 +02:00
Matthew Honnibal
64e5f02cf7
Update test
2016-10-23 21:08:07 +02:00
Matthew Honnibal
66d7a6eca2
Update test
2016-10-23 21:02:05 +02:00
Matthew Honnibal
90bf797125
Update test
2016-10-23 20:54:17 +02:00
Matthew Honnibal
5e76320ffe
Update test
2016-10-23 20:44:54 +02:00
Matthew Honnibal
aa105927f3
Update test
2016-10-23 20:31:25 +02:00
Matthew Honnibal
e120561294
Fix vector_norm test.
2016-10-23 19:56:16 +02:00
Matthew Honnibal
c05cd2356e
Fix similarity test for Python 3
2016-10-23 18:16:56 +02:00
Matthew Honnibal
79aa03fe98
Test Issue #514: Serializer fails when new entity type has been added.
2016-10-23 17:41:44 +02:00
Matthew Honnibal
f97548c6f1
Fix broken test, re Issue #461
2016-10-23 17:02:23 +02:00
Matthew Honnibal
4de30a8e38
Test Issue #514: Serialization fails after adding a new entity label.
2016-10-23 16:40:27 +02:00
Matthew Honnibal
e99b3f5322
Test Issue #459: Fail to deserialize empty doc
2016-10-23 16:30:22 +02:00
Matthew Honnibal
99ff8b902f
Test that huffman codec works with empty freqs dict
2016-10-23 16:27:45 +02:00
Matthew Honnibal
e5627134d9
Test Issue #461: ent_iob tag incorrect after setting entities.
2016-10-23 15:50:04 +02:00
Matthew Honnibal
2989072aac
Add tests to verify that Issue #442 is fixed in 1.1
2016-10-23 14:33:13 +02:00
Matthew Honnibal
e838b6d53f
Add tests for using the new Entity ID tracking in the rule matcher
2016-10-23 14:04:01 +02:00
Matthew Honnibal
e7af75e0a9
Add test for vector resizing, re Issue #544
2016-10-21 17:07:21 +02:00
Matthew Honnibal
c3a8a1cf51
Update serializer test.
2016-10-18 16:18:46 +02:00
Matthew Honnibal
7d446e5094
Revert "Update matcher test, to reflect character offset return instead of token offset."
...
This reverts commit f8d3e3bcfe.
2016-10-17 16:49:49 +02:00
Matthew Honnibal
4bf2c53c13
Revert "Hack on matcher tests, for new implementation."
...
This reverts commit dbe60644ab.
2016-10-17 16:49:48 +02:00
Matthew Honnibal
dbe60644ab
Hack on matcher tests, for new implementation.
2016-10-17 16:12:22 +02:00
Matthew Honnibal
f8d3e3bcfe
Update matcher test, to reflect character offset return instead of token offset.
2016-10-17 16:00:10 +02:00
Matthew Honnibal
be48a7b4f3
Fix conftest for website tests.
2016-10-17 01:54:26 +02:00
Matthew Honnibal
8951bf6989
Update matcher tests
2016-10-17 01:53:24 +02:00
Matthew Honnibal
0cf4aff470
Set default path in EN/DE tests.
2016-10-17 01:52:49 +02:00
Matthew Honnibal
cd71b6b0a9
Remove test of parser pickle
2016-10-17 01:52:10 +02:00
Matthew Honnibal
5444d38cc6
Update test for biluo tags
2016-10-16 11:42:45 +02:00
Matthew Honnibal
47afef7d6b
Add init.py for gold tests
2016-10-15 21:51:28 +02:00
Matthew Honnibal
2163fd238f
Add tests for entity->biluo transformation
2016-10-15 21:50:43 +02:00
Matthew Honnibal
2516382106
Fix loading of English in span test
2016-10-15 14:44:37 +02:00
Matthew Honnibal
049197e0ae
Update tests, somewhat messily.
2016-10-15 14:14:04 +02:00
Matthew Honnibal
1e1a1d9517
Update matcher test
2016-10-15 14:13:41 +02:00
Matthew Honnibal
9cc9ce0f14
Load with default path=False in tests.
2016-10-15 14:13:23 +02:00
Matthew Honnibal
788657f062
Ensure words are added to vocab before test, so that the lexicon is updated correctly.
2016-10-15 14:12:18 +02:00
Matthew Honnibal
2cc515b2ed
Add add_flag method to Vocab, re Issue #504.
2016-10-14 12:15:38 +02:00
Matthew Honnibal
a42fbcf946
Require model for test_is_properties
2016-10-12 19:35:18 +02:00
Matthew Honnibal
20c948361b
Use local path in test_lemmatizer
2016-10-12 19:35:00 +02:00
Matthew Honnibal
1318d0bc65
Test with the non-loaded versions of the English and German pipelines.
2016-10-12 19:13:31 +02:00
Matthew Honnibal
bd7fe6420c
Revert "Changes to test for new string-store"
...
This reverts commit 21e90d7d0b.
2016-09-30 20:11:01 +02:00
Matthew Honnibal
21e90d7d0b
Changes to test for new string-store
2016-09-30 20:00:58 +02:00
Matthew Honnibal
81a47c01d8
Fix test for empty sentence string.
2016-09-27 19:21:22 +02:00
Matthew Honnibal
fc4a7ad794
Test and fix Issue #411: IndexError when .sents property is used on empty string.
2016-09-27 18:49:14 +02:00
Matthew Honnibal
3d370b7d45
Add test for Issue #445, fixed in 3cb4d455d, with improved lemmatizer logic
2016-09-27 18:39:46 +02:00
Matthew Honnibal
9c8ac91d72
Add test for Issue #435
2016-09-27 13:52:38 +02:00
Matthew Honnibal
e233328d38
Fix Issue #371 : Lexeme objects were unhashable.
2016-09-27 13:22:30 +02:00
Matthew Honnibal
2debc4e0a2
Add .blank() method to Parser. Start housing default dep labels and entity types within the Defaults class.
2016-09-26 11:57:54 +02:00
Matthew Honnibal
95aaea0d3f
Refactor so that the tokenizer data is read from Python data, rather than from disk
2016-09-25 14:49:53 +02:00
Matthew Honnibal
fd65cf6cbb
Finish refactoring data loading
2016-09-24 20:26:17 +02:00
Matthew Honnibal
83e364188c
Mostly finished loading refactoring. Design is in place, but doesn't work yet.
2016-09-24 15:42:01 +02:00
Matthew Honnibal
b00f683a0c
Fix matcher test
2016-09-24 11:20:58 +02:00
Matthew Honnibal
939a791a52
Update tests
2016-09-24 01:17:03 +02:00
Matthew Honnibal
f6e587b1c7
Fix matcher tests
2016-09-21 20:45:20 +02:00
Matthew Honnibal
58e83fe34b
Initial, limited support for quantified patterns in Matcher, and tracking of ent_id attribute in Token and Span. The quantifiers need a lot more testing, and there are some known problems. The main known problem is that the zero-plus and one-plus quantifiers won't work if a token can match both the quantified pattern expression AND the tail of the match.
2016-09-21 14:54:55 +02:00
Matthew Honnibal
cc8bf62208
* Fix Issue #360: Tokenizer failed when the infix regex matched the start of the string while trying to tokenize multi-infix tokens.
2016-05-09 13:23:47 +02:00
Matthew Honnibal
5d86c30f0b
* Fix Issue #367: Missing has_vector property on Doc and Span objects
2016-05-09 12:36:14 +02:00
Matthew Honnibal
26095f9722
* Add span.sent property, re Issue #366
2016-05-06 00:17:38 +02:00
Matthew Honnibal
a6a25166ba
* Remove print from test
2016-05-05 11:10:59 +02:00
Matthew Honnibal
7441ca30ee
* Add tests for Issue #361: Lexeme rich comparison
2016-05-05 01:31:58 +02:00
Matthew Honnibal
72564213e3
* Add test for Issue #309
2016-05-04 16:00:28 +02:00
Matthew Honnibal
76f1d871da
Merge branch 'master' of ssh://github.com/spacy-io/spaCy
2016-05-04 15:54:00 +02:00
Matthew Honnibal
b4bfc6ae55
* Add test for Issue #351: Indices off when leading whitespace
2016-05-04 15:53:17 +02:00
Wolfgang Seeker
a06fca9fdf
German noun chunk iterator now doesn't return tokens more than once
2016-05-03 16:58:59 +02:00
Wolfgang Seeker
7825b75548
add tests for German noun chunker
2016-05-03 15:01:28 +02:00
Wolfgang Seeker
7b246c13cb
reformulate noun chunk tests for English
2016-05-03 14:24:35 +02:00
Wolfgang Seeker
1786331cd8
add model sanity test
2016-05-03 12:51:47 +02:00
Matthew Honnibal
308a28c26c
* Whitespace
2016-05-02 16:08:11 +02:00
Matthew Honnibal
c1c11a8ae0
* Fix formatting on serializer tests
2016-05-02 16:07:21 +02:00
Matthew Honnibal
902a389d85
* Fix merge conflict in test_parse
2016-05-02 15:28:07 +02:00
Matthew Honnibal
02c23cc1d0
* Fix sentence boundary test
2016-05-02 15:26:07 +02:00
Matthew Honnibal
d2f469b809
* Fix parsing tests, so that labels are added if they're missing, and so that the branching test values are correct
2016-05-02 15:25:27 +02:00
Wolfgang Seeker
b11cbb06c6
remove old tests for sentence boundary detection
2016-05-02 14:36:35 +02:00
Matthew Honnibal
508fd1f6dc
* Refactor noun chunk iterators, so that they're simple functions. Install the iterator when the Doc is created, but allow users to write to the noun_chunk_iterator attribute. The iterator functions accept an object and yield (int start, int end, int label) triples.
2016-05-02 14:25:10 +02:00
Wolfgang Seeker
fa961ea694
add tests for serialization bug
2016-05-02 11:01:56 +02:00
Wolfgang Seeker
1003e7ccec
remove debug output from tests
2016-04-25 12:12:40 +02:00
Wolfgang Seeker
f57f843e85
fix bug in updating tree structure when introducing additional roots
2016-04-25 12:01:19 +02:00
Wolfgang Seeker
b6477fc4f4
adjusted tests to Travis Setup
2016-04-21 17:15:10 +02:00
Wolfgang Seeker
736ffcb9a2
remove whitespace
2016-04-21 16:55:55 +02:00
Wolfgang Seeker
6c7301cc6d
the parser now introduces sentence boundaries properly when predicting dependents with root labels
2016-04-21 16:50:53 +02:00
Wolfgang Seeker
12024b0b0a
bugfix: introducing multiple roots now updates original head's properties
...
adjust tests to rely less on statistical model
2016-04-20 16:42:41 +02:00
Matthew Honnibal
2add5206aa
* Fix description of matcher test
2016-04-17 15:40:21 +02:00
Matthew Honnibal
2b419d5b8c
* Update test for Issue #242
2016-04-17 15:34:23 +02:00
Matthew Honnibal
f12b043308
* Add test for Issue #242: Overlapping matches not well recognised.
2016-04-17 15:19:17 +02:00
Matthew Honnibal
c0909afe22
Merge pull request #312 from wbwseeker/space_head_bug
...
add restrictions to L-arc and R-arc to prevent space heads
2016-04-15 20:36:03 +10:00
Matthew Honnibal
6f82065761
* Fix infixed commas in tokenizer, re Issue #326. Need to benchmark on empirical data, to make sure this doesn't break other cases.
2016-04-14 11:36:03 +02:00
Matthew Honnibal
0f957dd586
Merge branch 'master' of ssh://github.com/honnibal/spaCy
2016-04-14 10:37:56 +02:00
Wolfgang Seeker
d99a9cbce9
different handling of space tokens
...
space tokens are now always attached to the previous non-space token
there are two exceptions:
leading space tokens are attached to the first following non-space token
in input that consists exclusively of space tokens, the last space token
is the head of all others.
2016-04-13 15:28:28 +02:00
Matthew Honnibal
04d0209be9
* Recognise multiple infixes in a token.
2016-04-13 18:38:26 +10:00
Henning Peters
a473d6e937
fix tests (use english model)
2016-04-12 16:41:57 +02:00
Matthew Honnibal
6df3858dbc
* Fix Issue #323: Incorrect semantics of Token.__str__ built-in. Add flag to allow users to switch the old semantics back on, to ease transition.
2016-04-12 13:17:59 +10:00
Wolfgang Seeker
80bea62842
bugfix in unit test
2016-04-08 16:46:44 +02:00
Matthew Honnibal
26622f0ffc
Merge branch 'master' of ssh://github.com/honnibal/spaCy
2016-03-29 14:31:52 +11:00
Matthew Honnibal
b1fe41b45d
* Extend infix test, commenting on limitation of tokenizer w.r.t. infixes at the moment.
2016-03-29 14:31:05 +11:00
Matthew Honnibal
9c73983bdd
* Add test for hyphenation problem in Issue #302
2016-03-29 14:27:13 +11:00
Matthew Honnibal
4a37fdcee1
Merge pull request #287 from wbwseeker/deproj_sentbnd_bug
...
add function to Token for setting head and dep (and dep_)
2016-03-25 09:47:45 +11:00
Henning Peters
c12d3dd200
add __init__.py to empty package dirs
2016-03-14 11:28:03 +01:00
Wolfgang Seeker
46e3f979f1
add function for setting head and label to token
...
change PseudoProjectivity.deprojectivize to use these functions
2016-03-11 17:31:06 +01:00
Matthew Honnibal
963fe5258e
* Add missing __contains__ method to vocab
2016-03-08 15:49:10 +00:00