Commit Graph

78 Commits

Author
SHA1 Message Date
Matthew Honnibal
567388e38d * Use values encoded by StringStore in POS tagging, rather than indices into a list of tags 2015-03-26 16:44:45 +01:00
Matthew Honnibal
801bf14f4f * Clean up handling of dep_strings and ent_strings, using StringStore to encode the label names. 2015-03-26 16:44:45 +01:00
Matthew Honnibal
f21ab2d7fb * Fix bug in ugly ent_strings hack on English class 2015-03-26 16:44:45 +01:00
Matthew Honnibal
8057a95f20 * NER seems to be working, scoring 69 F. Need to add decision-history features --- currently only use current word, 2 words context. Need refactoring. 2015-03-26 16:44:44 +01:00
Matthew Honnibal
220ce8bfed * Prepare English class for NER 2015-03-26 16:44:44 +01:00
Matthew Honnibal
179b7eb0a7 * Specify parser transition system in language 2015-03-26 16:44:43 +01:00
Matthew Honnibal
8cc3524dc9 * Ws 2015-03-26 16:44:41 +01:00
Matthew Honnibal
2e8d0e5d45 * Upd download script 2015-03-03 05:47:16 -05:00
Matthew Honnibal
caf046b220 * Hastily add method to apply tags from a list of strings, instead of predicting the tags. 2015-02-23 15:40:17 -05:00
Matthew Honnibal
64645a1c2f * Improve docstring on English 2015-02-11 15:13:20 -05:00
Matthew Honnibal
594e50bd45 * Add option to download speech-parsing data set. 2015-02-11 14:20:29 -05:00
Matthew Honnibal
0b7e769211 * Add POS tags to support SWBD tag set 2015-02-11 14:08:28 -05:00
Matthew Honnibal
312b3a45f3 * Fix issue #19: Allow parsing/pos tagging of empty strings 2015-02-10 10:15:58 -05:00
Matthew Honnibal
2a0615104b * Upd download script 2015-02-09 10:22:59 -05:00
Matthew Honnibal
5c3513583d * Clear buffered python tokens when modifying the Tokens object. Need to clean this up, and modify via a method on Tokens. 2015-02-09 03:57:10 -05:00
Matthew Honnibal
be5536d239 * Fix Issue #22: PRP and PRP$ were mapped to NOUN. Should be PRON. 2015-02-08 18:36:18 -05:00
Matthew Honnibal
44c7eafe44 * Fix download.py 2015-02-07 12:00:36 -05:00
Matthew Honnibal
6ca7f2eedc * Upd download script 2015-02-07 11:32:33 -05:00
Matthew Honnibal
56c2ef2982 * Tweak POS features for web text 2015-02-02 11:59:36 +11:00
Matthew Honnibal
a20fdbd8ee * Upd download script 2015-02-01 13:22:23 +11:00
Matthew Honnibal
63abdf154c * Hastily hack download file 2015-01-31 22:48:32 +11:00
Matthew Honnibal
a1ed574b7b * Fix default model path for English 2015-01-31 16:38:27 +11:00
Matthew Honnibal
e013555b25 * Add option to download script 2015-01-31 13:51:56 +11:00
Matthew Honnibal
024cfd485c * Pass tag_strings as a tuple, to support new Tokens API 2015-01-31 13:43:37 +11:00
Matthew Honnibal
83a4df5a1a * Fix download script 2015-01-30 20:40:42 +11:00
Matthew Honnibal
6f9ebc2f34 * Fix download script 2015-01-30 20:33:19 +11:00
Matthew Honnibal
8b85d0bb8a * Only download small data if no data dir exists 2015-01-30 20:27:14 +11:00
Matthew Honnibal
cb95ef6934 * Fix download script 2015-01-30 19:28:43 +11:00
Matthew Honnibal
e578bd37bd * Fix download script 2015-01-30 18:59:31 +11:00
Matthew Honnibal
df52014d12 * Fix download script 2015-01-30 18:36:24 +11:00
Matthew Honnibal
998b607f65 * Upd download script, having it download all data if there's no data/ directory, allowing easier compilation from source 2015-01-30 18:04:01 +11:00
Matthew Honnibal
67d6e53a69 * Ensure parser and tagger function correctly when training from missing values, indicated by -1 2015-01-30 14:08:56 +11:00
Matthew Honnibal
c38c62d4a3 * Add docstring to English class 2015-01-27 02:45:21 +11:00
Matthew Honnibal
7f87716cf7 * Fix download script 2015-01-25 23:01:10 +11:00
Matthew Honnibal
12b034e3ef * Move POS tag definitions to parts_of_speech.pxd 2015-01-25 16:31:07 +11:00
Matthew Honnibal
7431c133d8 * Add error if try to access head and not is_parsed 2015-01-25 15:33:54 +11:00
Matthew Honnibal
951d06c824 * Silently don't parse if data is not present 2015-01-25 14:47:38 +11:00
Matthew Honnibal
4e857ab7a6 * Fix bug in POS tagger feature 2015-01-25 02:20:15 +11:00
Matthew Honnibal
dd56e298e2 * Ensure tagging is applied if parse=True 2015-01-25 02:19:44 +11:00
Matthew Honnibal
94750819cd * Set parse=True by default --- i.e. parse unless told not to. 2015-01-25 01:28:28 +11:00
Matthew Honnibal
a97bed9359 * Fix POS and dependency label tag names. Add parse and string navigation functions. 2015-01-24 17:29:04 +11:00
Matthew Honnibal
fda94271af * Rename NORM1 and NORM2 attrs to lower and norm 2015-01-24 06:17:03 +11:00
Matthew Honnibal
5ed8b2b98f * Rename sic to orth 2015-01-23 02:08:25 +11:00
Matthew Honnibal
f2a229136c * Fix data_dir=None argument to English class 2015-01-21 18:27:31 +11:00
Matthew Honnibal
ef49b8c179 * Add stop-word flag 2015-01-21 18:22:31 +11:00
Matthew Honnibal
6646bfc5df * Add LOWER attr 2015-01-21 18:19:08 +11:00
Matthew Honnibal
6c7e44140b * Work on word vectors, and other stuff 2015-01-17 16:21:17 +11:00
Matthew Honnibal
7d3c40de7d * Tests passing after refactor. API has obvious warts, particularly in Token and Lexeme 2015-01-15 00:33:16 +11:00
Matthew Honnibal
0930892fc1 * Tmp. Working on refactor. Compiles, must hook up lexical feats. 2015-01-14 00:03:48 +11:00
Matthew Honnibal
46da3d74d2 * Tmp. Refactoring, introducing a Lexeme PyObject. 2015-01-12 11:23:44 +11:00