Commit Graph

  • 5b4c78bbb2 * Use an AttributeCodec based on orth for words. Still no oov handling mechanism. Matthew Honnibal 2015-07-18 22:43:18 +0200
  • 82d84b0f2b * Index lexemes by orth, instead of a lexemes vector. Breaks the mechanism for deciding not to own LexemeC structs during parsing. Need to reinstate this. Matthew Honnibal 2015-07-18 22:42:15 +0200
  • 4dddc8a69b * Fix type declarations for attr_t. Remove unused id_t. Matthew Honnibal 2015-07-18 22:39:57 +0200
  • ced59ab9ea * Make minor efficiency improvement in Doc.__iter__ Matthew Honnibal 2015-07-18 04:10:53 +0200
  • cd91914dd8 * Fix hard-coded length Matthew Honnibal 2015-07-18 04:09:56 +0200
  • b1d74ce60d * Remove unused joint.pyx and joint.pxd files Matthew Honnibal 2015-07-17 23:31:44 +0200
  • c27514512b * Remove cruft ner/ directory Matthew Honnibal 2015-07-17 23:24:32 +0200
  • f8d6d319f4 * Remove cruft module Matthew Honnibal 2015-07-17 23:23:05 +0200
  • fb0a641a2d * Don't release the gil around Parser.parse. Does this indicate thread problems? Matthew Honnibal 2015-07-17 23:07:37 +0200
  • a6ff7e6ca4 * Fix redundant options in train.py Matthew Honnibal 2015-07-17 22:38:05 +0200
  • e29daea85f * Fix bint/int typing problem in TransitionSystem. In C++ bint* means bool*, but in C it means int*. So, type-casting to bint* is unsafe. Matthew Honnibal 2015-07-17 22:37:24 +0200
  • 6cfa83157e Merge branch 'refactor' of ssh://github.com/honnibal/spaCy into refactor Matthew Honnibal 2015-07-17 21:38:04 +0200
  • f7f0ad1a78 * Fix tests Matthew Honnibal 2015-07-17 21:31:44 +0200
  • 68374149ae * Move huffman encoding test to tests/serialize directory Matthew Honnibal 2015-07-17 21:22:18 +0200
  • e950f5a408 * Tests for serializer Matthew Honnibal 2015-07-17 21:21:10 +0200
  • cf0c788892 * Tests passing on round-trip pack/unpack on basic example Matthew Honnibal 2015-07-17 21:20:48 +0200
  • 44f39a876f * Add a blank attrs.pyx Matthew Honnibal 2015-07-17 16:40:42 +0200
  • c2c83120d4 * Remove codec property from Vocab Matthew Honnibal 2015-07-17 16:40:11 +0200
  • dfdf19f6a9 * Draft a from_orth method for Doc Matthew Honnibal 2015-07-17 16:39:54 +0200
  • a9149fdcbd * Compile attrs.pyx Matthew Honnibal 2015-07-17 16:39:25 +0200
  • 9e3f17051b * Move to ORTH instead of ID for encoding lexemes. Basic tests of the codec wrappers now passing Matthew Honnibal 2015-07-17 16:38:29 +0200
  • 15ff739996 * Fix passing of ID attribute in string store Matthew Honnibal 2015-07-17 14:49:42 +0200
  • 95e57c2780 * Remove unnecessary key and id properties from Utf8String. Matthew Honnibal 2015-07-16 19:29:02 +0200
  • 234c7e440a * Add spacy/serialize/__init__ files Matthew Honnibal 2015-07-17 01:37:33 +0200
  • 221f7e51c7 * Ignore spacy/serialize/*.cpp Matthew Honnibal 2015-07-17 01:36:49 +0200
  • db9dfd2e23 * Major refactor of serialization. Nearly complete now. Matthew Honnibal 2015-07-17 01:19:29 +0200
  • c8282f9934 * Work on serialization. Needs more reorganisation Matthew Honnibal 2015-07-16 19:55:47 +0200
  • d8458d6a25 * Fix attr_id_t import in Spans Matthew Honnibal 2015-07-16 19:55:21 +0200
  • d1cb30dbc4 * Remove unnecessary key and id properties from Utf8String. Matthew Honnibal 2015-07-16 19:29:02 +0200
  • 897de2d438 * Add 'bitter' property for serializer in English class Matthew Honnibal 2015-07-16 17:47:53 +0200
  • fb54052ae0 * Work on serializer design Matthew Honnibal 2015-07-16 17:46:46 +0200
  • a6f401580d * Add from_array function to Doc. Matthew Honnibal 2015-07-16 17:46:11 +0200
  • 2a5d050134 * Give codec loading back to Vocab. Matthew Honnibal 2015-07-16 17:45:42 +0200
  • 8bf0f65f1c * Remove dead code in strings.pyx Matthew Honnibal 2015-07-16 17:35:53 +0200
  • a9c3863665 * Fix inefficiency in StringStore.dump function Matthew Honnibal 2015-07-16 17:34:32 +0200
  • b59d271510 * Move serialization functionality into Serializer class Matthew Honnibal 2015-07-16 11:23:48 +0200
  • 30be4f15da * Import attrs from spacy.attrs, not spacy.typedefs Matthew Honnibal 2015-07-16 11:23:25 +0200
  • 6c99e5f4aa * Move serialization into Serializer class, with __call__ and train() api Matthew Honnibal 2015-07-16 11:22:35 +0200
  • e2133d990e * Move serialization functionality out into a Serializer object Matthew Honnibal 2015-07-16 11:21:44 +0200
  • a6d040bd11 * Import Lexeme attrs from spacy.attrs, not spacy.typedefs Matthew Honnibal 2015-07-16 11:20:08 +0200
  • d8bc279e0c * Fix 'you' contraction capitals in specials.json Matthew Honnibal 2015-07-16 01:28:32 +0200
  • 45ae1ce428 * Remove unused declaration in parser Matthew Honnibal 2015-07-16 01:27:11 +0200
  • efa80096f1 * Upd attrs id list Matthew Honnibal 2015-07-16 01:26:54 +0200
  • 01fab6bb90 * Improve de/serialize functions Matthew Honnibal 2015-07-16 01:26:35 +0200
  • 0e07c1ed2a * draft de/serialization functions in doc.pyx Matthew Honnibal 2015-07-16 01:16:33 +0200
  • 9d956b07e9 * Fix import of attrs in doc.pyx, and update the get_token_attr function. Matthew Honnibal 2015-07-16 01:15:34 +0200
  • 65251e7625 * Remove redundant attr_id_t from typedefs.pxd Matthew Honnibal 2015-07-16 00:58:51 +0200
  • 9a8db9743c * Remove gil from parser.call Matthew Honnibal 2015-07-14 23:47:03 +0200
  • 3c1e3e9ee8 * Fix capitalization problems in specials.json Matthew Honnibal 2015-07-14 23:46:31 +0200
  • 38ca0c33f5 Merge branch 'neuralnet' into refactor Matthew Honnibal 2015-07-14 14:11:23 +0200
  • 6405d2384c * Add first draft of annotation standards doc Matthew Honnibal 2015-07-14 12:50:13 +0200
  • af54d05d60 * Remove sense stuff from init_model Matthew Honnibal 2015-07-14 10:56:17 +0200
  • 3de1b3ef1d * Change get_freqs to take a list of files Matthew Honnibal 2015-07-14 10:55:56 +0200
  • 935ac53ee3 * Extend count_by method Matthew Honnibal 2015-07-14 03:20:09 +0200
  • 39c93116eb * Add get_freqs script Matthew Honnibal 2015-07-14 02:31:32 +0200
  • 3b5baa660f * Fix tokenizer Matthew Honnibal 2015-07-14 00:10:51 +0200
  • 2ae0b439b2 * Fix space check in gold.pyx Matthew Honnibal 2015-07-14 00:10:27 +0200
  • 81aa4e6dcc * Go back to having token reference doc, instead of complicated gymnastics. Rename the attr 'doc', to expose it in the API Matthew Honnibal 2015-07-14 00:10:11 +0200
  • e1c702e498 * Upd tests after refactor Matthew Honnibal 2015-07-14 00:08:50 +0200
  • ba9a22ae0b * Ignore cpp files in spacy/tokens Matthew Honnibal 2015-07-13 22:30:15 +0200
  • 98382bd7a0 * Update tests after refactor Matthew Honnibal 2015-07-13 22:30:01 +0200
  • d87d71caf4 * Compile the new modules after refactor Matthew Honnibal 2015-07-13 22:29:33 +0200
  • 24d6ce99ec * Add comment to tokenizer, explaining the spacy attr Matthew Honnibal 2015-07-13 22:29:13 +0200
  • 8214b74eec * Restore _py_tokens cache, to handle orphan tokens. Matthew Honnibal 2015-07-13 22:28:10 +0200
  • 67641f3b58 * Refactor tokenizer, to set the 'spacy' field on TokenC instead of passing a string Matthew Honnibal 2015-07-13 21:46:02 +0200
  • 6eef0bf9ab * Break up tokens.pyx into tokens/doc.pyx, tokens/token.pyx, tokens/spans.pyx Matthew Honnibal 2015-07-13 20:20:58 +0200
  • 3ea8756c24 * Add spacy/tokens/doc.pyx, for Doc class in its own file Matthew Honnibal 2015-07-13 19:58:26 +0200
  • c99387155f * Refactor tokens, moving classes into a module instead of a single file Matthew Honnibal 2015-07-13 19:49:55 +0200
  • d27899658e * Import classes in spacy.tokens.__init__ Matthew Honnibal 2015-07-13 19:48:55 +0200
  • aa82caf8f5 * Add TokenC.spacy attr Matthew Honnibal 2015-07-13 19:48:07 +0200
  • dba6b47d4e * Refactor monster tokens.pyx file, into a tokens/ subpackage. Try to break the cycle between Doc and Token, and remove the need to pass around a unicode string reference Matthew Honnibal 2015-07-13 19:20:48 +0200
  • 5b0a7190c9 * Round-trip for serialization finally working. Needs a lot of optimization. Matthew Honnibal 2015-07-13 18:39:38 +0200
  • edd371246c * Make huffman coder take BitArray in encode/decode. Add __iter__ method to BitArray. Matthew Honnibal 2015-07-13 17:33:33 +0200
  • af5cc926a4 * Add codec property to Vocab, to use the Huffman encoding Matthew Honnibal 2015-07-13 13:55:14 +0200
  • 77385d5580 * Make .pxd file for huffman codec Matthew Honnibal 2015-07-13 13:54:51 +0200
  • 0628e0e2a8 * Add tests for huffman encoding Matthew Honnibal 2015-07-13 12:58:07 +0200
  • 083b6ea7ae * Clean up encoder a bit. Now ready for integration into Vocab. Matthew Honnibal 2015-07-13 12:57:22 +0200
  • 8d0f1d98da * Draft docstring for HuffmanCache Matthew Honnibal 2015-07-13 12:01:18 +0200
  • 281f1faefb * Nearly finished huffman coder Matthew Honnibal 2015-07-12 23:48:46 +0200
  • e1a25fba32 * Work on huffman coder Matthew Honnibal 2015-07-12 19:58:05 +0200
  • 3fb9de2d13 * Remove vector[bint], in favor of simple Code struct. Matthew Honnibal 2015-07-12 17:58:27 +0200
  • aa7bfd932b * Work on compressor Matthew Honnibal 2015-07-12 16:03:43 +0200
  • 14eafcab15 * Refactor to use vector[bint] Matthew Honnibal 2015-07-12 05:27:47 +0200
  • 6a6e852a39 * Refactor huffman coding stuff into class Matthew Honnibal 2015-07-12 05:06:36 +0200
  • aad96fdb5c * Improve efficiency of huffman coding Matthew Honnibal 2015-07-12 01:31:37 +0200
  • ff9ff6f3fa * Ensure unseen words are given low log probability Matthew Honnibal 2015-07-12 01:31:09 +0200
  • 9d3b0d83de * Refactor huffman coding Matthew Honnibal 2015-07-11 22:27:43 +0200
  • 8d29406cd6 * Rename span.right to span.rights Matthew Honnibal 2015-07-11 22:15:04 +0200
  • da9f358166 * Fix span getting Matthew Honnibal 2015-07-11 21:41:41 +0200
  • 11e8f2ffb4 * Huffman codes working Matthew Honnibal 2015-07-11 20:01:10 +0200
  • cb6fc81909 * Work on huffman coding. Matthew Honnibal 2015-07-11 15:23:35 +0200
  • 4c9b77fe95 * Begin working on serialization code Matthew Honnibal 2015-07-11 10:57:30 +0200
  • 11a380e00f * Draft v0.89 update notes Matthew Honnibal 2015-07-10 19:41:42 +0200
  • 53d1f5b2eb * Rename Span.head to Span.root. Matthew Honnibal 2015-07-09 17:30:58 +0200
  • c0255ed7d8 * Allow slice indexing in Doc.__getitem__, returning a Span object Matthew Honnibal 2015-07-09 15:15:32 +0200
  • 7d2964f673 * Test that whitespace is not assigned a tag Matthew Honnibal 2015-07-09 13:31:40 +0200
  • b5223c4824 * Add whitespace to specials.json Matthew Honnibal 2015-07-09 13:31:12 +0200
  • 89a91ad726 * Add SPACE part-of-speech tag, and train tagger to assign it. Also train tagger not to make whitespace an entity Matthew Honnibal 2015-07-09 13:30:41 +0200
  • f95da0bd52 * Allow tests to read model dir from SPACY_DATA environment variable Matthew Honnibal 2015-07-09 12:18:02 +0200
  • 55f1042443 * Improve efficiency of L and R features, correcting the non-linear-in-length problem. Matthew Honnibal 2015-07-09 12:17:26 +0200