12 Commits

| Author | SHA1 | Message | Date |
| --- | --- | --- | --- |
| Matthew Honnibal | 63114820cf | * Upd tests for tighter interface | 2014-10-30 18:15:30 +11:00 |
| Matthew Honnibal | 13909a2e24 | * Rewriting Lexeme serialization. | 2014-10-29 23:19:38 +11:00 |
| Matthew Honnibal | 08ce602243 | * Large refactor, particularly to Python API | 2014-10-24 00:59:17 +11:00 |
| Matthew Honnibal | 6fb42c4919 | * Add offsets to Tokens class. Some changes to interfaces, and reorganization of spacy.Lang | 2014-10-14 16:17:45 +11:00 |
| Matthew Honnibal | db191361ee | * Add new tests for fancier tokenization cases | 2014-09-15 06:31:58 +02:00 |
| Matthew Honnibal | 5dcc1a426a | * Update tokenization tests for new tokenizer rules | 2014-09-15 01:32:51 +02:00 |
| Matthew Honnibal | 985bc68327 | * Fix bug with trailing punct on contractions. Reduced efficiency, and slightly hacky implementation. | 2014-09-12 18:26:26 +02:00 |
| Matthew Honnibal | b5b31c6b6e | * Avoid testing for object identity | 2014-09-10 20:58:30 +02:00 |
| Matthew Honnibal | c282e6d5fb | * Redesign proceeding | 2014-08-28 19:45:09 +02:00 |
| Matthew Honnibal | 9815c7649e | * Refactor around Word objects, adapting tests. Tests passing, except for string views. | 2014-08-23 19:55:06 +02:00 |
| Matthew Honnibal | 01469b0888 | * Refactor spacy so that chunks return arrays of lexemes, so that there is properly one lexeme per word. | 2014-08-18 19:14:00 +02:00 |
| Matthew Honnibal | e4263a241a | * Tests passing for reorganized version | 2014-07-07 04:23:46 +02:00 |