Commit Graph

68 Commits

Author SHA1 Message Date
Matthew Honnibal
07c09a0e1b * Fix attribute getters and setters in Lexeme 2015-09-09 14:29:22 +02:00
Matthew Honnibal
86c888667f * Merge in changes from de branch 2015-09-06 19:49:28 +02:00
Matthew Honnibal
d2fc104a26 * Begin merge of Gazetteer and DE branches 2015-09-06 19:45:15 +02:00
Matthew Honnibal
7cc56ada6e * Temporarily add py_set_flag attribute in Lexeme 2015-09-06 17:52:51 +02:00
Matthew Honnibal
3acf60df06 * Add missing properties in Lexeme class 2015-08-26 19:16:28 +02:00
Matthew Honnibal
6f1743692a * Work on language-independent refactoring 2015-08-23 20:49:18 +02:00
Matthew Honnibal
cad0cca4e3 * Tmp 2015-08-22 22:04:34 +02:00
Matthew Honnibal
8e4c69ee8c * Add is_oov property, and fix up handling of attributes 2015-07-27 01:50:06 +02:00
Matthew Honnibal
6bb96c122d * Host IS_ flags in attrs.pxd, and add properties for them on Token and Lexeme objects 2015-07-26 16:37:16 +02:00
Matthew Honnibal
3c270fc8ff * Remove has_sense method from Lexeme 2015-07-08 19:28:29 +02:00
Matthew Honnibal
b64c843861 * Remove senses attr 2015-07-08 19:26:24 +02:00
Matthew Honnibal
e23d1582a2 * Add supersense data to Lexeme objects. Add simple has_sense method to check the flag. 2015-07-01 18:50:37 +02:00
Jordan Suchow
3a8d9b37a6 Remove trailing whitespace 2015-04-19 13:01:38 -07:00
Matthew Honnibal
51b618d646 * Add a has_repvec property to Lexeme, and a check function to check flags 2015-02-07 08:42:44 -05:00
Matthew Honnibal
6d1c08dafd * Add docstring to Lexeme 2015-01-24 20:48:34 +11:00
Matthew Honnibal
fda94271af * Rename NORM1 and NORM2 attrs to lower and norm 2015-01-24 06:17:03 +11:00
Matthew Honnibal
5ed8b2b98f * Rename sic to orth 2015-01-23 02:08:25 +11:00
Matthew Honnibal
5e63c606ad * Rename vec to repvec 2015-01-22 02:03:54 +11:00
Matthew Honnibal
6c7e44140b * Work on word vectors, and other stuff 2015-01-17 16:21:17 +11:00
Matthew Honnibal
7d3c40de7d * Tests passing after refactor. API has obvious warts, particularly in Token and Lexeme 2015-01-15 00:33:16 +11:00
Matthew Honnibal
0930892fc1 * Tmp. Working on refactor. Compiles, must hook up lexical feats. 2015-01-14 00:03:48 +11:00
Matthew Honnibal
46da3d74d2 * Tmp. Refactoring, introducing a Lexeme PyObject. 2015-01-12 11:23:44 +11:00
Matthew Honnibal
ce2edd6312 * Tmp commit. Refactoring to create a Python Lexeme class. 2015-01-12 10:26:22 +11:00
Matthew Honnibal
90c143bd85 * Fix orth import 2015-01-05 18:49:19 +11:00
Matthew Honnibal
4c4aa2c5c9 * Work on train 2014-12-22 07:25:43 +11:00
Matthew Honnibal
ef4398b204 * Rearrange POS stuff, so that language-specific stuff can live in language-specific modules 2014-12-07 23:52:41 +11:00
Matthew Honnibal
75b8dfb348 * Remove upper_pc from lexeme.pyx 2014-12-04 22:14:34 +11:00
Matthew Honnibal
e1b1f45cc9 * Add STEM attribute to lexeme 2014-12-04 20:46:20 +11:00
Matthew Honnibal
d70d31aa45 * Introduce first attempt at const-ness 2014-12-03 15:44:25 +11:00
Matthew Honnibal
b463a7eb86 * Make flag-setting a language-specific thing 2014-12-03 11:04:32 +11:00
Matthew Honnibal
70ea862703 * Remove vocab10k field, and add flags for gazetteers 2014-11-03 00:13:51 +11:00
Matthew Honnibal
8335706321 * Add LIKE_URL and LIKE_NUMBER flag features 2014-11-02 13:19:23 +11:00
Matthew Honnibal
6c807aa45f * Restore id attribute to lexeme, and rename pos field to postype, to store clustered tag dictionaries 2014-10-31 17:43:00 +11:00
Matthew Honnibal
c6fcd03692 * Small efficiency tweak to lexeme init 2014-10-30 17:56:11 +11:00
Matthew Honnibal
87c2418a89 * Fiddle with data types on Lexeme, to compress them to a much smaller size. 2014-10-30 15:42:15 +11:00
Matthew Honnibal
e6b87766fe * Remove lexemes vector from Lexicon, and the id and hash attributes from Lexeme 2014-10-30 15:21:38 +11:00
Matthew Honnibal
67c8c8019f * Update lexeme serialization, using a binary file format 2014-10-30 01:01:00 +11:00
Matthew Honnibal
13909a2e24 * Rewriting Lexeme serialization. 2014-10-29 23:19:38 +11:00
Matthew Honnibal
08ce602243 * Large refactor, particularly to Python API 2014-10-24 00:59:17 +11:00
Matthew Honnibal
e5e951ae67 * Remove the feature array stuff from Tokens class, and replace vector with array-based implementation, with padding. 2014-10-23 01:57:59 +11:00
Matthew Honnibal
0a0e41f6c8 * Add prefix and suffix features 2014-10-22 12:56:09 +11:00
Matthew Honnibal
65d3ead4fd * Rename LexStr_casefix to LexStr_norm and LexInt_i to LexInt_id 2014-10-14 15:19:07 +11:00
Matthew Honnibal
71ee921055 * Slight cleaning of tokenizer code 2014-10-10 19:17:22 +11:00
Matthew Honnibal
59b41a9fd3 * Switch to new data model, tests passing 2014-10-10 08:11:31 +11:00
Matthew Honnibal
1b0e01d3d8 * Revising data model of lexeme. Compiles. 2014-10-09 19:53:30 +11:00
Matthew Honnibal
51d75b244b * Add serialize/deserialize functions for lexeme, transport to/from python dict. 2014-10-09 14:10:46 +11:00
Matthew Honnibal
d73d89a2de * Add i attribute to lexeme, giving lexemes sequential IDs. 2014-10-09 13:50:05 +11:00
Matthew Honnibal
ac522e2553 * Switch from own memory class to cymem, in pip 2014-09-17 23:09:24 +02:00
Matthew Honnibal
6266cac593 * Switch to using a Python ref counted gateway to malloc/free, to prevent memory leaks 2014-09-17 20:02:26 +02:00
Matthew Honnibal
c396581a0b * Fiddle with the way strings are interned in lexeme 2014-09-15 06:34:45 +02:00