12 Commits

| Author | SHA1 | Message | Date |
|---|---|---|---|
| Matthew Honnibal | 6f82065761 | * Fix infixed commas in tokenizer, re Issue #326. Need to benchmark on empirical data, to make sure this doesn't break other cases. | 2016-04-14 11:36:03 +02:00 |
| Matthew Honnibal | 04d0209be9 | * Recognise multiple infixes in a token. | 2016-04-13 18:38:26 +10:00 |
| Matthew Honnibal | b1fe41b45d | * Extend infix test, commenting on limitation of tokenizer w.r.t. infixes at the moment. | 2016-03-29 14:31:05 +11:00 |
| Matthew Honnibal | 9c73983bdd | * Add test for hyphenation problem in Issue #302 | 2016-03-29 14:27:13 +11:00 |
| Henning Peters | c12d3dd200 | add __init__.py to empty package dirs | 2016-03-14 11:28:03 +01:00 |
| Henning Peters | 9d8966a2c0 | Update test_tokenizer.py | 2016-02-10 19:24:37 +01:00 |
| Matthew Honnibal | 7f24229f10 | * Don't try to pickle the tokenizer | 2016-02-06 14:09:05 +01:00 |
| Matthew Honnibal | 515493c675 | * Add xfail test for Issue #225: tokenization with non-whitespace delimiters | 2016-01-19 13:20:14 +01:00 |
| Matthew Honnibal | 223d2b3484 | * Add test for Issue #154: Additional whitespace introduced when string ends with a whitespace token. | 2016-01-16 17:08:07 +01:00 |
| Matthew Honnibal | 3fbfba575a | * xfail the contractions test | 2015-12-31 13:16:28 +01:00 |
| Matthew Honnibal | 4b4eec8b47 | * Fix Issue #201: Tokenization of there'll | 2015-12-29 18:09:09 +01:00 |
| Matthew Honnibal | 4e16f9e435 | * Move tests underneath spacy/ | 2015-10-26 00:07:31 +11:00 |