Matthew Honnibal | 67c8c8019f | * Update lexeme serialization, using a binary file format | 2014-10-30 01:01:00 +11:00
Matthew Honnibal | 43d5964e13 | * Add function to read detokenization rules | 2014-10-22 12:54:59 +11:00
Matthew Honnibal | 12742f4f83 | * Add detokenize method and test | 2014-10-18 18:07:29 +11:00
Matthew Honnibal | 6fb42c4919 | * Add offsets to Tokens class. Some changes to interfaces, and reorganization of spacy.Lang | 2014-10-14 16:17:45 +11:00
Matthew Honnibal | e40caae51f | * Update Lexicon class to expect a list of lexeme dict descriptions | 2014-10-09 14:51:35 +11:00
Matthew Honnibal | 2e44fa7179 | * Add util.py | 2014-09-25 18:26:22 +02:00
Matthew Honnibal | e9a62b6eba | * Refactoring with Lexeme as a class now compiles. Basic design seems to work | 2014-08-27 17:15:39 +02:00
Matthew Honnibal | d10993f41a | * More docs work | 2014-08-21 16:37:13 +02:00
Matthew Honnibal | 3379d7a571 | * Reforming data model for lexemes | 2014-08-19 02:40:37 +02:00
Matthew Honnibal | 01469b0888 | * Refactor spacy so that chunks return arrays of lexemes, so that there is properly one lexeme per word. | 2014-08-18 19:14:00 +02:00
Matthew Honnibal | ff1869ff07 | * Fixed major efficiency problem, from not quite grokking pass by reference in cython c++ | 2014-07-07 07:36:43 +02:00
Matthew Honnibal | 25849fc926 | * Generalize tokenization rules to capitals | 2014-07-07 05:07:21 +02:00
Matthew Honnibal | 4e79446dc2 | * Reading in tokenization rules correctly. Passing tests. | 2014-07-07 00:02:55 +02:00
Matthew Honnibal | 556f6a18ca | * Initial commit. Tests passing for punctuation handling. Need contractions, file transport, tokenize function, etc. | 2014-07-05 20:51:42 +02:00