Matthew Honnibal | 47a4371fea | * Upd tokenizer with i.e. tests | 2015-02-18 06:37:04 -05:00

leofidus | 363473aeed | Add tokenizer test for zero length string | 2015-02-10 08:20:32 -05:00

Matthew Honnibal | d0e08a5b57 | * Upd index tests | 2015-01-30 12:35:13 +11:00

Matthew Honnibal | 706305ee26 | * Upd tests for new meaning of 'string' | 2015-01-24 07:22:30 +11:00

Matthew Honnibal | 5ed8b2b98f | * Rename sic to orth | 2015-01-23 02:08:25 +11:00

Matthew Honnibal | 93d4bd6c2e | * Add test for ). in tokenizer | 2015-01-22 22:25:18 +11:00

Matthew Honnibal | 7d3c40de7d | * Tests passing after refactor. API has obvious warts, particularly in Token and Lexeme | 2015-01-15 00:33:16 +11:00

Matthew Honnibal | 81d878beb2 | * Upd tests | 2014-12-30 21:34:09 +11:00

Matthew Honnibal | 91a5064b7f | * Upd tests | 2014-12-26 14:26:27 +11:00

Matthew Honnibal | 73f200436f | * Tests passing except for morphology/lemmatization stuff | 2014-12-23 11:40:32 +11:00

Matthew Honnibal | 0d9972f4b0 | * Upd tokenizer test | 2014-12-21 20:38:27 +11:00

Matthew Honnibal | 302e09018b | * Work on fixing special-cases, reading them in as JSON objects so that they can specify lemmas | 2014-12-09 14:48:01 +11:00

Matthew Honnibal | 0de700b566 | * Comment out tests of hyphenation, while we decide what hyphenation policy should be. | 2014-11-05 02:03:22 +11:00

Matthew Honnibal | 63114820cf | * Upd tests for tighter interface | 2014-10-30 18:15:30 +11:00

Matthew Honnibal | 13909a2e24 | * Rewriting Lexeme serialization. | 2014-10-29 23:19:38 +11:00

Matthew Honnibal | 08ce602243 | * Large refactor, particularly to Python API | 2014-10-24 00:59:17 +11:00

Matthew Honnibal | 6fb42c4919 | * Add offsets to Tokens class. Some changes to interfaces, and reorganization of spacy.Lang | 2014-10-14 16:17:45 +11:00

Matthew Honnibal | db191361ee | * Add new tests for fancier tokenization cases | 2014-09-15 06:31:58 +02:00

Matthew Honnibal | 5dcc1a426a | * Update tokenization tests for new tokenizer rules | 2014-09-15 01:32:51 +02:00

Matthew Honnibal | 985bc68327 | * Fix bug with trailing punct on contractions. Reduced efficiency, and slightly hacky implementation. | 2014-09-12 18:26:26 +02:00

Matthew Honnibal | b5b31c6b6e | * Avoid testing for object identity | 2014-09-10 20:58:30 +02:00

Matthew Honnibal | c282e6d5fb | * Redesign proceeding | 2014-08-28 19:45:09 +02:00

Matthew Honnibal | 9815c7649e | * Refactor around Word objects, adapting tests. Tests passing, except for string views. | 2014-08-23 19:55:06 +02:00

Matthew Honnibal | 01469b0888 | * Refactor spacy so that chunks return arrays of lexemes, so that there is properly one lexeme per word. | 2014-08-18 19:14:00 +02:00

Matthew Honnibal | e4263a241a | * Tests passing for reorganized version | 2014-07-07 04:23:46 +02:00