Matthew Honnibal | 8423e8627f | Work on Issue #285: intern strings into document-specific pools, to address streaming data memory growth. StringStore.__getitem__ now raises KeyError when it can't find the string. Use StringStore.intern() to get the old behaviour. Still need to hunt down all uses of StringStore.__getitem__ in library and do testing, but logic looks good. | 2016-09-30 10:14:47 +02:00
Stefan Behnel | f2cfbfc412 | remove internal redundancy and overhead from StringStore | 2016-03-24 15:25:27 +01:00
Matthew Honnibal | 864a8f45d8 | * Use unicode in StringStore.intern, instead of unreliably casting to bytes. | 2015-11-05 11:32:19 +00:00
Matthew Honnibal | 109106a949 | * Replace UniStr, using unicode objects instead | 2015-07-22 04:52:05 +02:00
Matthew Honnibal | 01a97b90f3 | * Fix header for string store | 2015-07-20 12:06:10 +02:00
Matthew Honnibal | 4dddc8a69b | * Fix type declarations for attr_t. Remove unused id_t. | 2015-07-18 22:39:57 +02:00
Matthew Honnibal | 95e57c2780 | * Remove unnecessary key and id properties from Utf8String. | 2015-07-17 01:40:18 +02:00
Matthew Honnibal | ce2edd6312 | * Tmp commit. Refactoring to create a Python Lexeme class. | 2015-01-12 10:26:22 +11:00
Matthew Honnibal | 73f200436f | * Tests passing except for morphology/lemmatization stuff | 2014-12-23 11:40:32 +11:00
Matthew Honnibal | cf8d26c3d2 | * POS tagger training working after reorg | 2014-12-22 08:54:47 +11:00
Matthew Honnibal | 4c4aa2c5c9 | * Work on train | 2014-12-22 07:25:43 +11:00
Matthew Honnibal | e1c1a4b868 | * Tmp | 2014-12-21 05:36:29 +11:00
Matthew Honnibal | 89a1cc1a48 | * Move murmurhash to .pxd in strings file | 2014-12-20 07:41:08 +11:00
Matthew Honnibal | 7d48bba6c4 | * Move StringStore class to its own file | 2014-12-20 06:42:01 +11:00