Matthew Honnibal | 3de1b3ef1d | * Change get_freqs to take a list of files | 2015-07-14 10:55:56 +02:00
Matthew Honnibal | 935ac53ee3 | * Extend count_by method | 2015-07-14 03:20:09 +02:00
Matthew Honnibal | 39c93116eb | * Add get_freqs script | 2015-07-14 02:31:32 +02:00
Matthew Honnibal | 3b5baa660f | * Fix tokenizer | 2015-07-14 00:10:51 +02:00
Matthew Honnibal | 2ae0b439b2 | * Fix space check in gold.pyx | 2015-07-14 00:10:27 +02:00
Matthew Honnibal | 81aa4e6dcc | * Go back to having token reference doc, instead of complicated gymnastics. Rename the attr 'doc', to expose it in the API | 2015-07-14 00:10:11 +02:00
Matthew Honnibal | e1c702e498 | * Upd tests after refactor | 2015-07-14 00:08:50 +02:00
Matthew Honnibal | ba9a22ae0b | * Ignore cpp files in spacy/tokens | 2015-07-13 22:30:15 +02:00
Matthew Honnibal | 98382bd7a0 | * Update tests after refactor | 2015-07-13 22:30:01 +02:00
Matthew Honnibal | d87d71caf4 | * Compile the new modules after refactor | 2015-07-13 22:29:33 +02:00
Matthew Honnibal | 24d6ce99ec | * Add comment to tokenizer, explaining the spacy attr | 2015-07-13 22:29:13 +02:00
Matthew Honnibal | 8214b74eec | * Restore _py_tokens cache, to handle orphan tokens. | 2015-07-13 22:28:10 +02:00
Matthew Honnibal | 67641f3b58 | * Refactor tokenizer, to set the 'spacy' field on TokenC instead of passing a string | 2015-07-13 21:46:02 +02:00
Matthew Honnibal | 6eef0bf9ab | * Break up tokens.pyx into tokens/doc.pyx, tokens/token.pyx, tokens/spans.pyx | 2015-07-13 20:20:58 +02:00
Matthew Honnibal | 3ea8756c24 | * Add spacy/tokens/doc.pyx, for Doc class in its own file | 2015-07-13 19:58:26 +02:00
Matthew Honnibal | c99387155f | * Refactor tokens, moving classes into a module instead of a single file | 2015-07-13 19:49:55 +02:00
Matthew Honnibal | d27899658e | * Import classes in spacy.tokens.__init__ | 2015-07-13 19:48:55 +02:00
Matthew Honnibal | aa82caf8f5 | * Add TokenC.spacy attr | 2015-07-13 19:48:07 +02:00
Matthew Honnibal | dba6b47d4e | * Refactor monster tokens.pyx file, into a tokens/ subpackage. Try to break the cycle between Doc and Token, and remove the need to pass around a unicode string reference | 2015-07-13 19:20:48 +02:00
Matthew Honnibal | 5b0a7190c9 | * Round-trip for serialization finally working. Needs a lot of optimization. | 2015-07-13 18:39:38 +02:00
Matthew Honnibal | edd371246c | * Make huffman coder take BitArray in encode/decode. Add __iter__ method to BitArray. | 2015-07-13 17:33:33 +02:00
Matthew Honnibal | af5cc926a4 | * Add codec property to Vocab, to use the Huffman encoding | 2015-07-13 13:55:14 +02:00
Matthew Honnibal | 77385d5580 | * Make .pxd file for huffman codec | 2015-07-13 13:54:51 +02:00
Matthew Honnibal | 0628e0e2a8 | * Add tests for huffman encoding | 2015-07-13 12:58:07 +02:00
Matthew Honnibal | 083b6ea7ae | * Clean up encoder a bit. Now ready for integration into Vocab. | 2015-07-13 12:57:22 +02:00
Matthew Honnibal | 8d0f1d98da | * Draft docstring for HuffmanCache | 2015-07-13 12:01:18 +02:00
Matthew Honnibal | 281f1faefb | * Nearly finished huffman coder | 2015-07-12 23:48:46 +02:00
Matthew Honnibal | e1a25fba32 | * Work on huffman coder | 2015-07-12 19:58:05 +02:00
Matthew Honnibal | 3fb9de2d13 | * Remove vector[bint], in favor of simple Code struct. | 2015-07-12 17:58:27 +02:00
Matthew Honnibal | aa7bfd932b | * Work on compressor | 2015-07-12 16:03:43 +02:00
Matthew Honnibal | 14eafcab15 | * Refactor to use vector[bint] | 2015-07-12 05:27:47 +02:00
Matthew Honnibal | 6a6e852a39 | * Refactor huffman coding stuff into class | 2015-07-12 05:06:36 +02:00
Matthew Honnibal | aad96fdb5c | * Improve efficiency of huffman coding | 2015-07-12 01:31:37 +02:00
Matthew Honnibal | ff9ff6f3fa | * Ensure unseen words are given low log probability | 2015-07-12 01:31:09 +02:00
Matthew Honnibal | 9d3b0d83de | * Refactor huffman coding | 2015-07-11 22:27:43 +02:00
Matthew Honnibal | 8d29406cd6 | * Rename span.right to span.rights | 2015-07-11 22:15:04 +02:00
Matthew Honnibal | da9f358166 | * Fix span getting | 2015-07-11 21:41:41 +02:00
Matthew Honnibal | 11e8f2ffb4 | * Huffman codes working | 2015-07-11 20:01:10 +02:00
Matthew Honnibal | cb6fc81909 | * Work on huffman coding. | 2015-07-11 15:23:35 +02:00
Matthew Honnibal | 4c9b77fe95 | * Begin working on serialization code | 2015-07-11 10:57:30 +02:00
Matthew Honnibal | 11a380e00f | * Draft v0.89 update notes | 2015-07-10 19:41:42 +02:00
Matthew Honnibal | 53d1f5b2eb | * Rename Span.head to Span.root. | 2015-07-09 17:30:58 +02:00
Matthew Honnibal | c0255ed7d8 | * Allow slice indexing in Doc.__getitem__, returning a Span object | 2015-07-09 15:15:32 +02:00
Matthew Honnibal | 7d2964f673 | * Test that whitespace is not assigned a tag | 2015-07-09 13:31:40 +02:00
Matthew Honnibal | b5223c4824 | * Add whitespace to specials.json | 2015-07-09 13:31:12 +02:00
Matthew Honnibal | 89a91ad726 | * Add SPACE part-of-speech tag, and train tagger to assign it. Also train tagger not to make whitespace an entity | 2015-07-09 13:30:41 +02:00
Matthew Honnibal | f95da0bd52 | * Allow tests to read model dir from SPACY_DATA environment variable | 2015-07-09 12:18:02 +02:00
Matthew Honnibal | 55f1042443 | * Improve efficiency of L and R features, correcting the non-linear-in-length problem. | 2015-07-09 12:17:26 +02:00
Matthew Honnibal | 70d2acb579 | * Fix edge features | 2015-07-09 12:15:01 +02:00
Matthew Honnibal | 8a7bbd5850 | * Announce v0.88 | 2015-07-09 12:12:45 +02:00