Commit Graph

  • 99bbbb6feb * Work on morphological processing Matthew Honnibal 2014-12-08 21:12:15 +1100
  • 7b68f911cf * Add WordNet lemmatizer Matthew Honnibal 2014-12-08 01:39:13 +1100
  • c20dd79748 * Fiddle with const correctness and comments Matthew Honnibal 2014-12-08 00:03:55 +1100
  • b031c7c430 * Remove language-general context module Matthew Honnibal 2014-12-07 23:53:01 +1100
  • ef4398b204 * Rearrange POS stuff, so that language-specific stuff can live in language-specific modules Matthew Honnibal 2014-12-07 23:52:41 +1100
  • 327383e38a * Remove unused code in tagger.pyx Matthew Honnibal 2014-12-07 22:14:51 +1100
  • 8f2f319c57 * Add a couple more contractions tests Matthew Honnibal 2014-12-07 22:08:04 +1100
  • 9f17467c2e * Fix EMPTY_TOKEN Matthew Honnibal 2014-12-07 22:07:41 +1100
  • 3819a88e1b * Add support for tag dictionary, and fix error-code for predict method Matthew Honnibal 2014-12-07 22:07:16 +1100
  • f00afe12c4 * Load POS tagger in load() function if path exists Matthew Honnibal 2014-12-07 22:05:57 +1100
  • 677e111ee7 * Revise tokenization rules to match PTB. Rules are pretty messy around periods, need better support for these. Matthew Honnibal 2014-12-07 22:04:47 +1100
  • 5fe5e6e66b * Move context functions to header, inlining them. Matthew Honnibal 2014-12-07 21:59:04 +1100
  • 91e8d9ea1c * Compile context.pyx and tagger.pyx modules Matthew Honnibal 2014-12-07 15:29:54 +1100
  • 5caabec789 * Link in tagger, to work on integrating POS tagging Matthew Honnibal 2014-12-07 15:29:41 +1100
  • 0c7aeb9de7 * Begin revising tagger, focussing on POS tagging Matthew Honnibal 2014-12-07 15:29:04 +1100
  • f5c4f2eb52 * Revise context, focussing on POS tagging for now Matthew Honnibal 2014-12-07 15:28:22 +1100
  • e27b912ef9 * Remove need for confusing _data pointer to be stored on Tokens Matthew Honnibal 2014-12-05 16:31:30 +1100
  • 1c9253701d * Introduce a TokenC struct, to handle token indices, pos tags and sense tags Matthew Honnibal 2014-12-05 15:56:14 +1100
  • 187372c7f3 * Allow the lexicon to create lexemes using an external memory pool, so that it can decide to make some lexemes temporary, rather than cached Matthew Honnibal 2014-12-05 03:29:50 +1100
  • 75b8dfb348 * Remove upper_pc from lexeme.pyx Matthew Honnibal 2014-12-04 22:14:34 +1100
  • a14f9eaf63 * Add index.pyx to setup Matthew Honnibal 2014-12-04 22:14:11 +1100
  • 49f3780ff5 * Fiddle with lexeme attrs Matthew Honnibal 2014-12-04 21:22:38 +1100
  • 564082e48e * Hack Token class to take lex.dense in place of the old lex.norm. This needs to be fixed... Matthew Honnibal 2014-12-04 20:51:29 +1100
  • 69bb022204 * Add as_array and count_by method Matthew Honnibal 2014-12-04 20:46:55 +1100
  • e1b1f45cc9 * Add STEM attribute to lexeme Matthew Honnibal 2014-12-04 20:46:20 +1100
  • d7952634ca * Make the string-store serve const pointers to Utf8Str Matthew Honnibal 2014-12-03 16:01:47 +1100
  • 7e04c22f8f * const added to Lexicon interface. Seems to work. Matthew Honnibal 2014-12-03 15:58:17 +1100
  • d70d31aa45 * Introduce first attempt at const-ness Matthew Honnibal 2014-12-03 15:44:25 +1100
  • d0d812c548 * Hack setup.py to exclude tagger stuff Matthew Honnibal 2014-12-03 11:06:57 +1100
  • 4560ada85b * Add typedef for attr_t. Change flag_t to flags_t Matthew Honnibal 2014-12-03 11:06:31 +1100
  • e600f7b327 * Move String struct stuff into the utf8string module, from spacy.lang Matthew Honnibal 2014-12-03 11:06:00 +1100
  • e170faf5b0 * Hack Tokens to work without tagger.pyx Matthew Honnibal 2014-12-03 11:05:15 +1100
  • b463a7eb86 * Make flag-setting a language-specific thing Matthew Honnibal 2014-12-03 11:04:00 +1100
  • 71b009e323 * Fix bug in refactored StringStore.__getitem__ Matthew Honnibal 2014-12-03 11:02:24 +1100
  • 14097311ae * Make StringStore.__getitem__ accept unicode-typed keys. Matthew Honnibal 2014-12-03 01:33:20 +1100
  • 522bb0346e * Work on get_array method of Tokens Matthew Honnibal 2014-12-02 23:48:05 +1100
  • 8c2938fe01 * Rename Lexicon._dict to Lexicon._map Matthew Honnibal 2014-12-02 23:46:59 +1100
  • 2ee8a1e61f * Make intro chattier, explain philosophy better Matthew Honnibal 2014-12-02 15:20:18 +1100
  • ea19850a69 * Add tokenizer section Matthew Honnibal 2014-12-02 04:39:12 +1100
  • 3430d5f629 * Revise intro copy. Add NLTK comparison Matthew Honnibal 2014-12-01 22:55:13 +1100
  • 33dfb4933c * Remove taggers from Language class. Work on doc strings Matthew Honnibal 2014-11-26 19:53:29 +1100
  • 80baa2e3db * Work on beam parser Matthew Honnibal 2014-11-20 19:49:33 +1100
  • 5c3016bac8 * Tmp commit of ner code Matthew Honnibal 2014-11-14 18:27:47 +1100
  • 33c421bcf8 * More feature tweaks Matthew Honnibal 2014-11-12 23:59:16 +1100
  • 41dedfb14e * Add label features for NER parsing Matthew Honnibal 2014-11-12 23:55:10 +1100
  • cf55b48ba6 * Switch to predict label on shift. Big increase in accuracy. Matthew Honnibal 2014-11-12 23:50:12 +1100
  • 8f84e8a78b * Neaten oracle Matthew Honnibal 2014-11-12 23:38:07 +1100
  • 66cb4f96e1 * Upd gitignore Matthew Honnibal 2014-11-12 23:25:27 +1100
  • 60c1e78596 * Commit outstanding tests Matthew Honnibal 2014-11-12 23:24:32 +1100
  • 7e0a9077dd * Add context files Matthew Honnibal 2014-11-12 23:22:36 +1100
  • 9b13392ac7 * Add conll experiments Matthew Honnibal 2014-11-12 23:22:05 +1100
  • b934bf1c69 * Compile IOB Matthew Honnibal 2014-11-12 23:21:40 +1100
  • 3b0b902384 * IOB-style parsing working. Accuracy down from BILOU, from 87-88 to 85-86 Matthew Honnibal 2014-11-12 23:21:09 +1100
  • e6bb8aa3a9 * Move moves to bilou_moves. Refactor context, returning to the simpler giant-enum style Matthew Honnibal 2014-11-12 00:54:25 +1100
  • c788633429 * Add tokens_from_list method to Language Matthew Honnibal 2014-11-11 23:43:14 +1100
  • da70b6bd60 * Upd tokenization special-cases Matthew Honnibal 2014-11-11 22:10:15 +1100
  • 95282d4993 * Use the dynamic oracle 'follow' strategy Matthew Honnibal 2014-11-11 21:11:17 +1100
  • 60ffdc2eb7 * Upd fabfile Matthew Honnibal 2014-11-11 21:10:40 +1100
  • d5e9dce039 * Compile NER code Matthew Honnibal 2014-11-11 21:10:22 +1100
  • b01604b303 * Upd NER tests Matthew Honnibal 2014-11-11 21:10:04 +1100
  • 5aaf7a024d * Move ner features to ner subdir Matthew Honnibal 2014-11-11 21:09:03 +1100
  • ff8989b63c * Use greedy NER parser Matthew Honnibal 2014-11-11 21:08:35 +1100
  • 0d943ab358 * Fixed greedy NER parsing. With static oracle, replicates accuracy from tagger. Matthew Honnibal 2014-11-11 17:17:54 +1100
  • 399239760b * Fix moves for new State struct Matthew Honnibal 2014-11-10 22:16:05 +1100
  • 82247169f2 * Implement validation and oracle on pystate, for testing Matthew Honnibal 2014-11-10 22:15:18 +1100
  • 3709ed9d6d * Add curr field to State, to handle entity being built Matthew Honnibal 2014-11-10 22:14:36 +1100
  • 10e9e14c4f * Add tests for NER oracle Matthew Honnibal 2014-11-10 22:13:46 +1100
  • af9ed18cf1 * Bug fixes to NER Matthew Honnibal 2014-11-10 17:39:23 +1100
  • d7b2843643 * Add some tests for ner Matthew Honnibal 2014-11-10 16:29:19 +1100
  • 9f2587f5ec * Work on shift-reduce NER Matthew Honnibal 2014-11-10 16:28:56 +1100
  • f307eb2e36 * Refactor context extraction, and start breaking out gold standards into their own functions Matthew Honnibal 2014-11-09 15:43:07 +1100
  • 602f993af9 * Moving tagger to accept multiple correct answers Matthew Honnibal 2014-11-09 15:18:33 +1100
  • 10a33ec725 * Upd fabfile for experiments Matthew Honnibal 2014-11-07 04:44:14 +1100
  • f37d896a42 * Upd NER feats. With adadelta learner, getting 76.9 on NER Matthew Honnibal 2014-11-07 04:43:54 +1100
  • a42321bd4e * Upd shape test Matthew Honnibal 2014-11-07 04:42:54 +1100
  • 68d1cdad62 * When encoding POS/NER tags, accept '-' as a missing value Matthew Honnibal 2014-11-07 04:42:31 +1100
  • 949a6245f9 * Increase default number of iterations from 5 to 10 Matthew Honnibal 2014-11-07 04:42:04 +1100
  • 3cab1d9a29 * Refine word_shape feature, by trimming the max sequence length Matthew Honnibal 2014-11-07 04:41:29 +1100
  • b4454cf036 * Add extra context tokens Matthew Honnibal 2014-11-07 04:40:36 +1100
  • 50309e6e49 * Fix context vector, importing all features Matthew Honnibal 2014-11-05 22:11:39 +1100
  • 07a23768de * Play with NER feats a bit. Up to 82.00 training on MUC7. Matthew Honnibal 2014-11-05 21:47:17 +1100
  • edf739134c * Make make quiet by default, and add a vmake option for verbose make Matthew Honnibal 2014-11-05 20:46:29 +1100
  • dbbb914480 * Upd setup Matthew Honnibal 2014-11-05 20:45:44 +1100
  • 4ecbe8c893 * Complete refactor of Tagger features, to use a generic list of context names. Matthew Honnibal 2014-11-05 20:45:29 +1100
  • 0a8c84625d * Moving feature context stuff to a generalized place Matthew Honnibal 2014-11-05 19:55:10 +1100
  • 3733444101 * Generalize tagger code, in preparation for NER and supersense tagging. Matthew Honnibal 2014-11-05 03:42:14 +1100
  • 81da61f3cf * Remove out-dated POS data test Matthew Honnibal 2014-11-05 02:04:12 +1100
  • 0de700b566 * Comment out tests of hyphenation, while we decide what hyphenation policy should be. Matthew Honnibal 2014-11-05 02:03:22 +1100
  • abbe3e44b0 * Move spacy.pos tagger to spacy.tagger, and generalize it so that it can take on other tagging tasks, given a different set of feature templates. Matthew Honnibal 2014-11-05 00:37:59 +1100
  • 2420d944cb * Upd sales copy Matthew Honnibal 2014-11-04 17:01:54 +1100
  • 954c970415 * Add __iter__ method to tokens Matthew Honnibal 2014-11-04 01:07:08 +1100
  • f07457a91f * Remove POS alignment stuff. Now use training data based on raw text, instead of clumsy detokenization stuff Matthew Honnibal 2014-11-04 01:06:43 +1100
  • bea762ec04 * Update tokenization rules Matthew Honnibal 2014-11-04 01:06:00 +1100
  • b8d5881333 * Update sales copy Matthew Honnibal 2014-11-03 13:54:18 +1100
  • ae52f9f38c * Remove vocab10k from tokens Matthew Honnibal 2014-11-03 00:23:20 +1100
  • 11915e5238 * Update tests Matthew Honnibal 2014-11-03 00:23:04 +1100
  • 75329e9ef8 * Add Co. abbreviation to tokenization rules Matthew Honnibal 2014-11-03 00:16:20 +1100
  • 32fb50dc35 * Remove non_sparse method --- features wanting this can do it easily enough. Matthew Honnibal 2014-11-03 00:15:47 +1100
  • b5ae1471db * Fiddle with POS tag features Matthew Honnibal 2014-11-03 00:15:03 +1100
  • 70ea862703 * Remove vocab10k field, and add flags for gazetteers Matthew Honnibal 2014-11-03 00:13:51 +1100