Commit Graph

  • 9d3ca13909 * Start work on parse-tree iteration classes Matthew Honnibal 2014-12-20 03:48:10 +1100
  • bed680c632 * Remove commented-out features Matthew Honnibal 2014-12-20 03:47:32 +1100
  • 3d178c03ae * Prune the features a bit Matthew Honnibal 2014-12-20 02:46:14 +1100
  • a0408e1758 * Working DecisionMemory class Matthew Honnibal 2014-12-20 01:43:26 +1100
  • 7920ea72b4 * Working parser with the decision memory idea. Disabling that for now, for simplicity Matthew Honnibal 2014-12-20 01:43:15 +1100
  • a2f2a48da9 * Add some extra features Matthew Honnibal 2014-12-20 01:42:24 +1100
  • 8fd9762d91 * Start laying out parse tree iteration methods Matthew Honnibal 2014-12-20 01:42:09 +1100
  • 53b8bc1f3c * Work on implementing a trainable cache for the parser. So far, doesn't improve efficiency Matthew Honnibal 2014-12-19 09:30:50 +1100
  • 033d6c9ac2 * Adapt POS tagger decision-memory for use in parser Matthew Honnibal 2014-12-19 07:23:04 +1100
  • 809ddf7887 * Add index.pxd Matthew Honnibal 2014-12-19 07:23:00 +1100
  • 1879abd16a * Set const-correctness for tagger Matthew Honnibal 2014-12-18 20:41:52 +1100
  • f72243b156 * Set const-correctness for Feature* array Matthew Honnibal 2014-12-18 20:41:32 +1100
  • 6ab7e40590 * Add non-monotonic parsing with cost-sensitive update. 92.26 on Y&M set Matthew Honnibal 2014-12-18 11:33:25 +1100
  • 7e0c692daf * Automatically push when the stack is empty Matthew Honnibal 2014-12-18 09:16:10 +1100
  • 61142a8eff * Tweak features Matthew Honnibal 2014-12-18 09:15:03 +1100
  • e3b123e6e0 * Ignore cpp files from parser Matthew Honnibal 2014-12-18 09:05:51 +1100
  • 8446ebfbbb * Work on parser. Up to 92 UAS on YM labels Matthew Honnibal 2014-12-18 09:05:31 +1100
  • 55de747bfc * Remove .cpp files Matthew Honnibal 2014-12-18 02:43:13 +1100
  • 4448a840f7 * Work on greedy parsing. Scoring about 91.2 Matthew Honnibal 2014-12-18 02:42:55 +1100
  • 87e9487d76 * Work on parser Matthew Honnibal 2014-12-17 21:10:12 +1100
  • 9d7d97978d * Work on greedy parser Matthew Honnibal 2014-12-17 21:09:29 +1100
  • d524dd306a * Work on greedy parser Matthew Honnibal 2014-12-17 03:19:43 +1100
  • 95ccea03b2 * Work on greedy parser Matthew Honnibal 2014-12-16 22:44:43 +1100
  • a432862fde * Add exception type to _arg_max_among in tagger Matthew Honnibal 2014-12-16 09:44:19 +1100
  • 9e00798820 * Work on integrating a greedy dependency parser Matthew Honnibal 2014-12-16 08:06:04 +1100
  • 24ffc32f2f * Another redraft of index.rst Matthew Honnibal 2014-12-15 16:32:03 +1100
  • 77dd7a212a * More thoughts on intro Matthew Honnibal 2014-12-15 09:19:29 +1100
  • 792802b2b9 * POS tag memoisation working, with good speed-up Matthew Honnibal 2014-12-12 14:33:51 +1100
  • ca54d58638 * Merge setup.py Matthew Honnibal 2014-12-10 15:21:27 +1100
  • 9959a64f7b * Working morphology and lemmatisation. POS tagging quite fast. Matthew Honnibal 2014-12-10 08:09:32 +1100
  • 7831b06610 * Compile morphology.pyx file Matthew Honnibal 2014-12-10 08:09:13 +1100
  • df3be14987 * Add pos_type features to POS tagger Matthew Honnibal 2014-12-10 08:08:55 +1100
  • 42973c4b37 * Improve efficiency of tagger, and improve morphological processing Matthew Honnibal 2014-12-10 01:02:04 +1100
  • 6b34a2f34b * Move morphological analysis into its own module, morphology.pyx Matthew Honnibal 2014-12-09 21:16:17 +1100
  • b962fe73d7 * Make suffixes file use full-power regex, so that we can handle periods properly Matthew Honnibal 2014-12-09 19:04:27 +1100
  • accdbe989b * Remove Tokens.extend method Matthew Honnibal 2014-12-09 17:09:23 +1100
  • 495e1c7366 * Use fused type in Tokens.push_back, simplifying the use of the cache Matthew Honnibal 2014-12-09 16:50:01 +1100
  • 516f0f1e14 * Remove test for loading ad hoc rules format Matthew Honnibal 2014-12-09 16:08:45 +1100
  • 6369835306 * Add false positive test for emoticons Matthew Honnibal 2014-12-09 16:08:17 +1100
  • f15deaad5b * Upd docs Matthew Honnibal 2014-12-09 16:08:01 +1100
  • 1ccabc806e * Work on lemmatization Matthew Honnibal 2014-12-09 16:06:18 +1100
  • 2a6bd2818f * Load the lexicon before we check flag values Matthew Honnibal 2014-12-09 15:18:43 +1100
  • 302e09018b * Work on fixing special-cases, reading them in as JSON objects so that they can specify lemmas Matthew Honnibal 2014-12-09 14:48:01 +1100
  • cda9ea9a4a * Add test to make sure iterating over the lexicon isn't broken Matthew Honnibal 2014-12-08 21:12:51 +1100
  • 99bbbb6feb * Work on morphological processing Matthew Honnibal 2014-12-08 21:12:15 +1100
  • 7b68f911cf * Add WordNet lemmatizer Matthew Honnibal 2014-12-08 01:39:13 +1100
  • c20dd79748 * Fiddle with const correctness and comments Matthew Honnibal 2014-12-08 00:03:55 +1100
  • b031c7c430 * Remove language-general context module Matthew Honnibal 2014-12-07 23:53:01 +1100
  • ef4398b204 * Rearrange POS stuff, so that language-specific stuff can live in language-specific modules Matthew Honnibal 2014-12-07 23:52:41 +1100
  • 327383e38a * Remove unused code in tagger.pyx Matthew Honnibal 2014-12-07 22:14:51 +1100
  • 8f2f319c57 * Add a couple more contractions tests Matthew Honnibal 2014-12-07 22:08:04 +1100
  • 9f17467c2e * Fix EMPTY_TOKEN Matthew Honnibal 2014-12-07 22:07:41 +1100
  • 3819a88e1b * Add support for tag dictionary, and fix error-code for predict method Matthew Honnibal 2014-12-07 22:07:16 +1100
  • f00afe12c4 * Load POS tagger in load() function if path exists Matthew Honnibal 2014-12-07 22:05:57 +1100
  • 677e111ee7 * Revise tokenization rules to match PTB. Rules are pretty messy around periods, need better support for these. Matthew Honnibal 2014-12-07 22:04:47 +1100
  • 5fe5e6e66b * Move context functions to header, inlining them. Matthew Honnibal 2014-12-07 21:59:04 +1100
  • 91e8d9ea1c * Compile context.pyx and tagger.pyx modules Matthew Honnibal 2014-12-07 15:29:54 +1100
  • 5caabec789 * Link in tagger, to work on integrating POS tagging Matthew Honnibal 2014-12-07 15:29:41 +1100
  • 0c7aeb9de7 * Begin revising tagger, focussing on POS tagging Matthew Honnibal 2014-12-07 15:29:04 +1100
  • f5c4f2eb52 * Revise context, focussing on POS tagging for now Matthew Honnibal 2014-12-07 15:28:22 +1100
  • e27b912ef9 * Remove need for confusing _data pointer to be stored on Tokens Matthew Honnibal 2014-12-05 16:31:30 +1100
  • 1c9253701d * Introduce a TokenC struct, to handle token indices, pos tags and sense tags Matthew Honnibal 2014-12-05 15:56:14 +1100
  • 187372c7f3 * Allow the lexicon to create lexemes using an external memory pool, so that it can decide to make some lexemes temporary, rather than cached Matthew Honnibal 2014-12-05 03:29:50 +1100
  • 75b8dfb348 * Remove upper_pc from lexeme.pyx Matthew Honnibal 2014-12-04 22:14:34 +1100
  • a14f9eaf63 * Add index.pyx to setup Matthew Honnibal 2014-12-04 22:14:11 +1100
  • 49f3780ff5 * Fiddle with lexeme attrs Matthew Honnibal 2014-12-04 21:22:38 +1100
  • 564082e48e * Hack Token class to take lex.dense in place of the old lex.norm. This needs to be fixed... Matthew Honnibal 2014-12-04 20:51:29 +1100
  • 69bb022204 * Add as_array and count_by method Matthew Honnibal 2014-12-04 20:46:55 +1100
  • e1b1f45cc9 * Add STEM attribute to lexeme Matthew Honnibal 2014-12-04 20:46:20 +1100
  • d7952634ca * Make the string-store serve const pointers to Utf8Str Matthew Honnibal 2014-12-03 16:01:47 +1100
  • 7e04c22f8f * const added to Lexicon interface. Seems to work. Matthew Honnibal 2014-12-03 15:58:17 +1100
  • d70d31aa45 * Introduce first attempt at const-ness Matthew Honnibal 2014-12-03 15:44:25 +1100
  • d0d812c548 * Hack setup.py to exclude tagger stuff Matthew Honnibal 2014-12-03 11:06:57 +1100
  • 4560ada85b * Add typedef for attr_t. Change flag_t to flags_t Matthew Honnibal 2014-12-03 11:06:31 +1100
  • e600f7b327 * Move String struct stuff into the utf8string module, from spacy.lang Matthew Honnibal 2014-12-03 11:06:00 +1100
  • e170faf5b0 * Hack Tokens to work without tagger.pyx Matthew Honnibal 2014-12-03 11:05:15 +1100
  • b463a7eb86 * Make flag-setting a language-specific thing Matthew Honnibal 2014-12-03 11:04:00 +1100
  • 71b009e323 * Fix bug in refactored StringStore.__getitem__ Matthew Honnibal 2014-12-03 11:02:24 +1100
  • 14097311ae * Make StringStore.__getitem__ accept unicode-typed keys. Matthew Honnibal 2014-12-03 01:33:20 +1100
  • 522bb0346e * Work on get_array method of Tokens Matthew Honnibal 2014-12-02 23:48:05 +1100
  • 8c2938fe01 * Rename Lexicon._dict to Lexicon._map Matthew Honnibal 2014-12-02 23:46:59 +1100
  • 2ee8a1e61f * Make intro chattier, explain philosophy better Matthew Honnibal 2014-12-02 15:20:18 +1100
  • ea19850a69 * Add tokenizer section Matthew Honnibal 2014-12-02 04:39:12 +1100
  • 3430d5f629 * Revise intro copy. Add NLTK comparison Matthew Honnibal 2014-12-01 22:55:13 +1100
  • 33dfb4933c * Remove taggers from Language class. Work on doc strings Matthew Honnibal 2014-11-26 19:53:29 +1100
  • 80baa2e3db * Work on beam parser Matthew Honnibal 2014-11-20 19:49:33 +1100
  • 5c3016bac8 * Tmp commit of ner code Matthew Honnibal 2014-11-14 18:27:47 +1100
  • 33c421bcf8 * More feature tweaks Matthew Honnibal 2014-11-12 23:59:16 +1100
  • 41dedfb14e * Add label features for NER parsing Matthew Honnibal 2014-11-12 23:55:10 +1100
  • cf55b48ba6 * Switch to predict label on shift. Big increase in accuracy. Matthew Honnibal 2014-11-12 23:50:12 +1100
  • 8f84e8a78b * Neaten oracle Matthew Honnibal 2014-11-12 23:38:07 +1100
  • 66cb4f96e1 * Upd gitignore Matthew Honnibal 2014-11-12 23:25:27 +1100
  • 60c1e78596 * Commit outstanding tests Matthew Honnibal 2014-11-12 23:24:32 +1100
  • 7e0a9077dd * Add context files Matthew Honnibal 2014-11-12 23:22:36 +1100
  • 9b13392ac7 * Add conll experiments Matthew Honnibal 2014-11-12 23:22:05 +1100
  • b934bf1c69 * Compile IOB Matthew Honnibal 2014-11-12 23:21:40 +1100
  • 3b0b902384 * IOB-style parsing working. Accuracy down from BILOU, from 87-88 to 85-86 Matthew Honnibal 2014-11-12 23:21:09 +1100
  • e6bb8aa3a9 * Move moves to bilou_moves. Refactor context, returning to the simpler giant-enum style Matthew Honnibal 2014-11-12 00:54:25 +1100
  • c788633429 * Add tokens_from_list method to Language Matthew Honnibal 2014-11-11 23:43:14 +1100
  • da70b6bd60 * Upd tokenization special-cases Matthew Honnibal 2014-11-11 22:10:15 +1100