Commit Graph

29 Commits

Author SHA1 Message Date
Matthew Honnibal
4ecbe8c893 * Complete refactor of Tagger features, to use a generic list of context names. 2014-11-05 20:45:29 +11:00
Matthew Honnibal
3733444101 * Generalize tagger code, in preparation for NER and supersense tagging. 2014-11-05 03:42:14 +11:00
Matthew Honnibal
ae52f9f38c * Remove vocab10k from tokens 2014-11-03 00:23:20 +11:00
Matthew Honnibal
b186a66bae * Rename Token.lex_pos to Token.postype, and Token.lex_supersense to Token.sensetype 2014-10-31 17:44:39 +11:00
Matthew Honnibal
e6b87766fe * Remove lexemes vector from Lexicon, and the id and hash attributes from Lexeme 2014-10-30 15:21:38 +11:00
Matthew Honnibal
13909a2e24 * Rewriting Lexeme serialization. 2014-10-29 23:19:38 +11:00
Matthew Honnibal
08ce602243 * Large refactor, particularly to Python API 2014-10-24 00:59:17 +11:00
Matthew Honnibal
e5e951ae67 * Remove the feature array stuff from Tokens class, and replace vector with array-based implementation, with padding. 2014-10-23 01:57:59 +11:00
Matthew Honnibal
7018b53d3a * Improve array features in tokens 2014-10-22 12:55:42 +11:00
Matthew Honnibal
43743a5d63 * Work on efficiency 2014-10-14 18:22:41 +11:00
Matthew Honnibal
6fb42c4919 * Add offsets to Tokens class. Some changes to interfaces, and reorganization of spacy.Lang 2014-10-14 16:17:45 +11:00
Matthew Honnibal
2805068ca8 * Have tokens track tuples that record the start offset and pos tag as well as a lexeme pointer 2014-10-14 15:21:03 +11:00
Matthew Honnibal
71ee921055 * Slight cleaning of tokenizer code 2014-10-10 19:17:22 +11:00
Matthew Honnibal
59b41a9fd3 * Switch to new data model, tests passing 2014-10-10 08:11:31 +11:00
Matthew Honnibal
08cef75ffd * Switch to using a heap-allocated vector in tokens 2014-09-15 03:46:14 +02:00
Matthew Honnibal
f77b7098c0 * Upd Tokens to use vector, with bounds checking. 2014-09-15 03:22:40 +02:00
Matthew Honnibal
df24e3708c * Move EnglishTokens stuff to Tokens 2014-09-15 01:31:44 +02:00
Matthew Honnibal
5aa591106b * Fiddle with token features 2014-09-12 15:49:36 +02:00
Matthew Honnibal
073ee0de63 * Restore dense_hash_map for cache dictionary. Seems to double efficiency 2014-09-12 02:23:51 +02:00
Matthew Honnibal
1a3222af4b * Moving tokens to use an array internally, instead of a list of Lexeme objects. 2014-09-11 16:57:08 +02:00
Matthew Honnibal
cf412adba8 * Refactoring to use Tokens object 2014-09-10 18:11:13 +02:00
Matthew Honnibal
68bae2fec6 * More refactoring 2014-08-25 16:42:22 +02:00
Matthew Honnibal
07ecf5d2f4 * Fixed group_by, removed idea of general attr_of function. 2014-08-22 00:02:37 +02:00
Matthew Honnibal
a78ad4152d * Broken version being refactored for docs 2014-08-20 13:39:39 +02:00
Matthew Honnibal
01469b0888 * Refactor spacy so that chunks return arrays of lexemes, so that there is properly one lexeme per word. 2014-08-18 19:14:00 +02:00
Matthew Honnibal
a895fe5ddb * Upd from spacy 2014-07-23 17:35:18 +01:00
Matthew Honnibal
571808a274 Group-by seems to be working 2014-07-07 20:27:02 +02:00
Matthew Honnibal
057c21969b * Refactor for string view features. Working on setting up flags and enums. 2014-07-07 16:58:48 +02:00
Matthew Honnibal
f1bcbd4c4e * Reorganized code to accomodate Tokens class. Need string views before group_by and count_by can be done well. 2014-07-07 12:47:21 +02:00