Commit Graph

458 Commits

Author SHA1 Message Date
Matthew Honnibal
043b758cf4 * Resurrect old NER code. This version won't be the one that runs; we want to re-use the parser code. But for now this is a useful reference. 2015-03-26 16:44:41 +01:00
Matthew Honnibal
b139aa92ba * Start setting out how NER will be implemented in the data model 2015-03-26 16:44:41 +01:00
Matthew Honnibal
0962ffc095 * Fix issue #37: missing check_flag attribute from Token class 2015-03-26 15:06:26 +01:00
Matthew Honnibal
2e8d0e5d45 * Upd download script 2015-03-03 05:47:16 -05:00
Matthew Honnibal
dbe26f5793 * Add children and subtree methods to Token, which are generators to assist parse-tree navigation. 2015-03-03 04:18:41 -05:00
Matthew Honnibal
ea90d136e8 * Fix bug in labelled parsing that caused an 8% drop in labelled accuracy. 2015-02-27 03:56:10 -05:00
Matthew Honnibal
caf046b220 * Hastily add method to apply tags from a list of strings, instead of predicting the tags. 2015-02-23 15:40:17 -05:00
Matthew Honnibal
cae077b583 * Work on fixing orphaned Token objects bug 2015-02-16 15:20:31 -05:00
Matthew Honnibal
7572e31f5e * Pass ownership of C data to Token instances if Tokens object is being garbage-collected, but Token instances are staying alive. 2015-02-11 18:05:06 -05:00
Matthew Honnibal
64645a1c2f * Improve docstring on English 2015-02-11 15:13:20 -05:00
Matthew Honnibal
594e50bd45 * Add option to download speech-parsing data set. 2015-02-11 14:20:29 -05:00
Matthew Honnibal
0b7e769211 * Add POS tags to support SWBD tag set 2015-02-11 14:08:28 -05:00
Matthew Honnibal
312b3a45f3 * Fix issue #19: Allow parsing/pos tagging of empty strings 2015-02-10 10:15:58 -05:00
Matthew Honnibal
2a0615104b * Upd download script 2015-02-09 10:22:59 -05:00
Matthew Honnibal
5c3513583d * Clear buffered python tokens when modifying the Tokens object. Need to clean this up, and modify via a method on Tokens. 2015-02-09 03:57:10 -05:00
Matthew Honnibal
be5536d239 * Fix Issue #22: PRP and PRP$ were mapped to NOUN. Should be PRON. 2015-02-08 18:36:18 -05:00
Matthew Honnibal
0492cee8b4 * Fix Issue #24: Lemmas are empty when the L field is missing for special-cased tokens 2015-02-08 18:30:30 -05:00
Matthew Honnibal
d229fbd228 * Give better error on out-of-bounds array access 2015-02-07 12:59:12 -05:00
Matthew Honnibal
ab8bb047d0 * Fix negative index for __getitem__ 2015-02-07 12:58:46 -05:00
Matthew Honnibal
44c7eafe44 * Fix download.py 2015-02-07 12:00:36 -05:00
Matthew Honnibal
6ca7f2eedc * Upd download script 2015-02-07 11:32:33 -05:00
Matthew Honnibal
f0e0588833 * Fill L2 norm attribute on LexemeC struct 2015-02-07 08:44:42 -05:00
Matthew Honnibal
75f9b7d6bf * Add L2 norm field to LexemeC struct 2015-02-07 08:43:17 -05:00
Matthew Honnibal
51b618d646 * Add a has_repvec property to Lexeme, and a check function to check flags 2015-02-07 08:42:44 -05:00
Matthew Honnibal
321b402739 * Store the l2 norm of the word's vector 2015-02-07 08:42:16 -05:00
Matthew Honnibal
c7d8644149 * Fix regression on 'prob' attr of Token. 2015-02-03 03:32:18 +11:00
Matthew Honnibal
c55a33d045 * Catch oracle errors 2015-02-02 23:02:04 +11:00
Matthew Honnibal
de772088e6 * Use parse tree for sbd in Tokens.sents 2015-02-02 12:17:32 +11:00
Matthew Honnibal
56c2ef2982 * Tweak POS features for web text 2015-02-02 11:59:36 +11:00
Matthew Honnibal
d68678a93e * Add Exception class, OracleError 2015-02-02 11:57:32 +11:00
Matthew Honnibal
a20fdbd8ee * Upd download script 2015-02-01 13:22:23 +11:00
Matthew Honnibal
76d9394cb4 * Fix vocab.pyx for Python3 2015-02-01 13:14:04 +11:00
Matthew Honnibal
63abdf154c * Hastily hack download file 2015-01-31 22:48:32 +11:00
Matthew Honnibal
7de00c5a79 * Try not holding a reference to Pool, since that seems to confuse the GC 2015-01-31 22:10:22 +11:00
Matthew Honnibal
ce3ae8b5d9 * Fix platform-specific lexicon bug. 2015-01-31 16:38:58 +11:00
Matthew Honnibal
a1ed574b7b * Fix default model path for English 2015-01-31 16:38:27 +11:00
Matthew Honnibal
018e0bfa24 * Bug fixes to parse navigation 2015-01-31 16:37:13 +11:00
Matthew Honnibal
e013555b25 * Add option to download script 2015-01-31 13:51:56 +11:00
Matthew Honnibal
08ca5c8970 * Add sent_end flag to TokenC struct 2015-01-31 13:44:16 +11:00
Matthew Honnibal
024cfd485c * Pass tag_strings as a tuple, to support new Tokens API 2015-01-31 13:43:37 +11:00
Matthew Honnibal
77d62d0179 * Large refactor of Token objects, making them much thinner. This is to support fast parse-tree navigation. 2015-01-31 13:42:58 +11:00
Matthew Honnibal
88170e6295 * Supply dep_strings as a tuple, for the changed API on Tokens 2015-01-31 13:42:09 +11:00
Matthew Honnibal
0981d68022 * Set a sent_end flag during parsing, for later use 2015-01-31 13:41:46 +11:00
Matthew Honnibal
251dbf24d7 * Fix uninitialised variable error 2015-01-30 20:46:34 +11:00
Matthew Honnibal
83a4df5a1a * Fix download script 2015-01-30 20:40:42 +11:00
Matthew Honnibal
6f9ebc2f34 * Fix download script 2015-01-30 20:33:19 +11:00
Matthew Honnibal
8b85d0bb8a * Only download small data if no data dir exists 2015-01-30 20:27:14 +11:00
Matthew Honnibal
1a7a1c2771 * Fix Issue #16: tokens recurse when printing 2015-01-30 19:47:50 +11:00
Matthew Honnibal
cb95ef6934 * Fix download script 2015-01-30 19:28:43 +11:00
Matthew Honnibal
e578bd37bd * Fix download script 2015-01-30 18:59:31 +11:00