Commit Graph

93 Commits

Author SHA1 Message Date
Matthew Honnibal
dbe26f5793 * Add children and subtree methods to Token, which are generators to assist parse-tree navigation. 2015-03-03 04:18:41 -05:00
Matthew Honnibal
cae077b583 * Work on fixing orphaned Token objects bug 2015-02-16 15:20:31 -05:00
Matthew Honnibal
7572e31f5e * Pass ownership of C data to Token instances if Tokens object is being garbage-collected, but Token instances are staying alive. 2015-02-11 18:05:06 -05:00
Matthew Honnibal
ab8bb047d0 * Fix negative index for __getitem__ 2015-02-07 12:58:46 -05:00
Matthew Honnibal
c7d8644149 * Fix regression on 'prob' attr of Token. 2015-02-03 03:32:18 +11:00
Matthew Honnibal
de772088e6 * Use parse tree for sbd in Tokens.sents 2015-02-02 12:17:32 +11:00
Matthew Honnibal
7de00c5a79 * Try not holding a reference to Pool, since that seems to confuse the GC 2015-01-31 22:10:22 +11:00
Matthew Honnibal
018e0bfa24 * Bug fixes to parse navigation 2015-01-31 16:37:13 +11:00
Matthew Honnibal
77d62d0179 * Large refactor of Token objects, making them much thinner. This is to support fast parse-tree navigation. 2015-01-31 13:42:58 +11:00
Matthew Honnibal
251dbf24d7 * Fix unintialised variable error 2015-01-30 20:46:34 +11:00
Matthew Honnibal
1a7a1c2771 * Fix Issue #16: tokens recurse when printing 2015-01-30 19:47:50 +11:00
Matthew Honnibal
b68f563c2f * Fix Issue #14: Improve parsing API 2015-01-30 18:04:41 +11:00
Matthew Honnibal
e6c3d3471f * Tweak documentation for Tokens, and hide constructor as __cinit__ 2015-01-27 18:57:52 +11:00
Matthew Honnibal
12b034e3ef * Move POS tag definitions to parts_of_speech.pxd 2015-01-25 16:31:07 +11:00
Matthew Honnibal
7431c133d8 * Add error if try to access head and not is_parsed 2015-01-25 15:33:54 +11:00
Matthew Honnibal
a97bed9359 * Fix POS and dependency label tag names. Add parse and string navigation functions. 2015-01-24 17:29:04 +11:00
Matthew Honnibal
76cd024095 * Add whitespace property to Token 2015-01-24 07:41:21 +11:00
Matthew Honnibal
5fd72bc220 * Have 'string' refer to the whitespace-padded string 2015-01-24 07:32:38 +11:00
Matthew Honnibal
fda94271af * Rename NORM1 and NORM2 attrs to lower and norm 2015-01-24 06:17:03 +11:00
Matthew Honnibal
5ed8b2b98f * Rename sic to orth 2015-01-23 02:08:25 +11:00
Matthew Honnibal
a27b23cc8f * Have SBD return start/end indices 2015-01-22 22:24:44 +11:00
Matthew Honnibal
9cd0b6b3e9 * Various tweaks to Tokens class 2015-01-22 02:05:37 +11:00
Matthew Honnibal
d6ac60e91c * Bug fixes to sentences method, and improved vector transport for tokens 2015-01-21 18:56:32 +11:00
Matthew Honnibal
f149259bf5 * Fix negative indices in tokens 2015-01-20 01:16:29 +11:00
Matthew Honnibal
b65b0c07bf * Messily hook up vector in tokens 2015-01-19 19:59:55 +11:00
Matthew Honnibal
6c7e44140b * Work on word vectors, and other stuff 2015-01-17 16:21:17 +11:00
Matthew Honnibal
802867e96a * Revise interface to Token. Strings now have attribute names like norm1_ 2015-01-15 03:51:47 +11:00
Matthew Honnibal
7d3c40de7d * Tests passing after refactor. API has obvious warts, particularly in Token and Lexeme 2015-01-15 00:33:16 +11:00
Matthew Honnibal
0930892fc1 * Tmp. Working on refactor. Compiles, must hook up lexical feats. 2015-01-14 00:03:48 +11:00
Matthew Honnibal
46da3d74d2 * Tmp. Refactoring, introducing a Lexeme PyObject. 2015-01-12 11:23:44 +11:00
Matthew Honnibal
ce2edd6312 * Tmp commit. Refactoring to create a Python Lexeme class. 2015-01-12 10:26:22 +11:00
Matthew Honnibal
3f1944d688 * Make PyPy work 2015-01-05 17:54:38 +11:00
Matthew Honnibal
c1ef3febee * Embedsignature in tokens.pyx 2014-12-30 21:22:00 +11:00
Matthew Honnibal
fe2a5e0370 * Work on docstrings 2014-12-27 21:46:04 +11:00
Matthew Honnibal
bb80937544 * Upd docstrings 2014-12-27 18:45:16 +11:00
Matthew Honnibal
b8b65903fc * Tmp 2014-12-24 17:42:00 +11:00
Matthew Honnibal
ab61673edd * Fix api of array method 2014-12-23 15:18:48 +11:00
Matthew Honnibal
73f200436f * Tests passing except for morphology/lemmatization stuff 2014-12-23 11:40:32 +11:00
Matthew Honnibal
4c4aa2c5c9 * Work on train 2014-12-22 07:25:43 +11:00
Matthew Honnibal
4c6ce7ee84 * Update tokens.pyx as part of reorg 2014-12-20 07:03:26 +11:00
Matthew Honnibal
9d3ca13909 * Start work on parse-tree iteration classes 2014-12-20 03:48:10 +11:00
Matthew Honnibal
87e9487d76 * Work on parser 2014-12-17 21:10:12 +11:00
Matthew Honnibal
9959a64f7b * Working morphology and lemmatisation. POS tagging quite fast. 2014-12-10 08:09:32 +11:00
Matthew Honnibal
accdbe989b * Remove Tokens.extend method 2014-12-09 17:09:23 +11:00
Matthew Honnibal
495e1c7366 * Use fused type in Tokens.push_back, simplifying the use of the cache 2014-12-09 16:50:01 +11:00
Matthew Honnibal
99bbbb6feb * Work on morphological processing 2014-12-08 21:12:15 +11:00
Matthew Honnibal
ef4398b204 * Rearrange POS stuff, so that language-specific stuff can live in language-specific modules 2014-12-07 23:52:41 +11:00
Matthew Honnibal
9f17467c2e * Fix EMPTY_TOKEN 2014-12-07 22:07:41 +11:00
Matthew Honnibal
e27b912ef9 * Remove need for confusing _data pointer to be stored on Tokens 2014-12-05 16:31:30 +11:00
Matthew Honnibal
1c9253701d * Introduce a TokenC struct, to handle token indices, pos tags and sense tags 2014-12-05 15:56:14 +11:00