Ines Montani
|
667051375d
|
Modernize tokenizer tests for whitespace
|
2017-01-04 00:46:35 +01:00 |
|
Ines Montani
|
aafc894285
|
Modernize tokenizer tests for contractions
Use @pytest.mark.parametrize.
|
2017-01-03 23:02:21 +01:00 |
|
Ines Montani
|
fb9d3bb022
|
Revert "Merge remote-tracking branch 'origin/master'"
This reverts commit d3b181cdf1, reversing
changes made to b19cfcc144.
|
2017-01-03 18:21:36 +01:00 |
|
Matthew Honnibal
|
3ba7c167a8
|
Fix URL tests
|
2016-12-30 17:10:08 -06:00 |
|
Matthew Honnibal
|
9936a1b9b5
|
Merge branch 'tokenization_w_exception_patterns' of https://github.com/oroszgy/spaCy.hu into oroszgy-tokenization_w_exception_patterns
|
2016-12-30 14:53:40 -06:00 |
|
Matthew Honnibal
|
3e8d9c772e
|
Test interaction of token_match and punctuation
Check that the new token_match function applies after punctuation is split off.
|
2016-12-31 00:52:17 +11:00 |
|
Gyorgy Orosz
|
45e045a87b
|
Unicode/UTF8 compatibility for Python2
|
2016-12-24 00:21:00 +01:00 |
|
Gyorgy Orosz
|
72b61b6d03
|
Typo fix.
|
2016-12-24 00:10:29 +01:00 |
|
Gyorgy Orosz
|
1748549aeb
|
Added exception pattern mechanism to the tokenizer.
|
2016-12-21 23:16:19 +01:00 |
|
Gyorgy Orosz
|
ab2f6ea46c
|
Removed data files from tests.
|
2016-12-21 20:22:09 +01:00 |
|
Gyorgy Orosz
|
3d5306acb9
|
Added further testcases.
|
2016-12-20 23:49:35 +01:00 |
|
Gyorgy Orosz
|
23956e72ff
|
Improved partial support for tokenizing Hungarian numbers
|
2016-12-20 23:36:59 +01:00 |
|
Gyorgy Orosz
|
6add156075
|
Refactored language data structure
|
2016-12-20 22:28:20 +01:00 |
|
Gyorgy Orosz
|
366b3f8685
|
Merge branch 'master' into hu_tokenizer
|
2016-12-20 20:53:31 +01:00 |
|
Gyorgy Orosz
|
c035928156
|
Partial Hungarian number tokenization is added.
|
2016-12-20 20:46:20 +01:00 |
|
Matthew Honnibal
|
f38eb25fe1
|
Fix test for word vector
|
2016-12-18 23:31:55 +01:00 |
|
Matthew Honnibal
|
e4c951c153
|
Merge branch 'organize-language-data' of ssh://github.com/explosion/spaCy into organize-language-data
|
2016-12-18 17:01:08 +01:00 |
|
Ines Montani
|
d1c1d3f9cd
|
Fix tokenizer test
|
2016-12-18 16:55:32 +01:00 |
|
Matthew Honnibal
|
bdcecb3c96
|
Add import in regression test
|
2016-12-18 16:51:31 +01:00 |
|
Ines Montani
|
77cf2fb0f6
|
Remove unnecessary argument in test
|
2016-12-18 14:06:27 +01:00 |
|
Ines Montani
|
121c310566
|
Remove trailing whitespace
|
2016-12-18 14:06:27 +01:00 |
|
Matthew Honnibal
|
0595cc0635
|
Change test595 to mock data, instead of requiring model.
|
2016-12-18 13:28:51 +01:00 |
|
Ines Montani
|
f2c48ef504
|
Resolve stopwords conflict to merge Dutch
|
2016-12-17 13:08:16 +01:00 |
|
Janneke van der Zwaan
|
4a3fdcce8a
|
Merge github.com:explosion/spaCy into dutch
|
2016-12-13 09:25:23 +01:00 |
|
Gyorgy Orosz
|
0cf2144d24
|
Adding partial hyphen and quote handling support.
|
2016-12-11 00:14:36 +01:00 |
|
Gyorgy Orosz
|
2051726fd3
|
Passing Hungarian abbrev tests.
|
2016-12-10 23:37:58 +01:00 |
|
Gyorgy Orosz
|
0289b8ceaa
|
Additional abbreviation tests.
|
2016-12-08 12:17:44 +01:00 |
|
Gyorgy Orosz
|
5b00039955
|
First steps towards the Hungarian tokenizer code.
|
2016-12-07 23:07:43 +01:00 |
|
Ines Montani
|
8350d65695
|
Change morphology and lemmatizer API
Take morphology features as object instead of keyword arguments
|
2016-12-07 21:12:49 +01:00 |
|
Ines Montani
|
52e7d634df
|
Remove trailing whitespace
|
2016-12-07 21:12:19 +01:00 |
|
Ines Montani
|
07f0efb102
|
Add test for tokenizer regular expressions
|
2016-12-07 20:33:28 +01:00 |
|
Matthew Honnibal
|
f6e356aada
|
Add (and test) Span.sentiment attribute. By default we average token.sentiment, but can override with custom hook. Re Issue #667
|
2016-12-02 11:05:50 +01:00 |
|
Janneke van der Zwaan
|
88869e0e07
|
Merge github.com:explosion/spaCy into dutch
|
2016-11-30 17:13:39 +01:00 |
|
Matthew Honnibal
|
6652f2a135
|
Test #656, #624: special case rules for tokenizer with attributes.
|
2016-11-25 12:44:13 +01:00 |
|
Matthew Honnibal
|
53d8ca8f51
|
Add spacy.attrs.intify_attrs function, to normalize strings in token attribute dictionaries.
|
2016-11-25 11:34:30 +01:00 |
|
dafnevk
|
3db8b0d322
|
Added language class and some language data (with some TODOs) for Dutch
|
2016-11-24 15:56:38 +01:00 |
|
Matthew Honnibal
|
e01c1875ee
|
Work on test for #615
|
2016-11-23 23:48:41 +01:00 |
|
Matthew Honnibal
|
e86f440ca6
|
Fix test for issue 617
|
2016-11-10 22:48:10 +01:00 |
|
Matthew Honnibal
|
faa7610c56
|
Merge branch 'master' of ssh://github.com/explosion/spaCy
|
2016-11-10 22:46:38 +01:00 |
|
Matthew Honnibal
|
a2c7de8329
|
spacy/tests/regression/test_issue617.py
Test Issue #617
|
2016-11-10 22:46:23 +01:00 |
|
tiago
|
2a3e342c1f
|
Added a test case to cover the span.merge returning values
|
2016-11-09 18:57:50 +00:00 |
|
Dmitry Sadovnychyi
|
86c056ba64
|
Add basic test for PhraseMatcher
#613
|
2016-11-09 00:10:32 +08:00 |
|
Matthew Honnibal
|
3ea15b257f
|
Fix test for 605
|
2016-11-06 11:59:26 +01:00 |
|
Matthew Honnibal
|
efe7790439
|
Test #590: Order dependence in Matcher rules.
|
2016-11-06 11:21:36 +01:00 |
|
Matthew Honnibal
|
75805397dd
|
Test Issue #605
|
2016-11-06 10:42:32 +01:00 |
|
Matthew Honnibal
|
4a8a2b6001
|
Test #595 -- Bug in lemmatization of base forms.
|
2016-11-04 00:27:32 +01:00 |
|
Matthew Honnibal
|
72b9bd57ec
|
Test Issue #588: Matcher accepts invalid, empty patterns.
|
2016-11-03 00:09:35 +01:00 |
|
Matthew Honnibal
|
b6b01d4680
|
Remove deprecated tokens_from_list test.
|
2016-11-02 23:47:21 +01:00 |
|
Matthew Honnibal
|
3d6c79e595
|
Test Issue #599: .is_tagged and .is_parsed attributes not reflected after deserialization for empty documents.
|
2016-11-02 23:40:11 +01:00 |
|
Matthew Honnibal
|
125c910a8d
|
Test Issue #600
|
2016-11-02 23:24:13 +01:00 |
|