113 Commits

Author SHA1 Message Date
Matthew Honnibal 6eef0bf9ab * Break up tokens.pyx into tokens/doc.pyx, tokens/token.pyx, tokens/spans.pyx 2015-07-13 20:20:58 +02:00
Matthew Honnibal ff9ff6f3fa * Ensure unseen words are given low log probability 2015-07-12 01:31:09 +02:00
Matthew Honnibal 89a91ad726 * Add SPACE part-of-speech tag, and train tagger to assign it. Also train tagger not to make whitespace an entity 2015-07-09 13:30:41 +02:00
Matthew Honnibal 6ddb2f5e45 * Restore merge_mwe in English class 2015-07-08 19:35:30 +02:00
Matthew Honnibal 6859f6adac * Restore merge_mwe in English class 2015-07-08 19:34:55 +02:00
Matthew Honnibal e3c53f5ecd * Fix mention of Tokens in docstring 2015-07-08 18:56:27 +02:00
Matthew Honnibal bb522496dd * Rename Tokens to Doc 2015-07-08 18:53:00 +02:00
Matthew Honnibal 4e4fac452b * Refactor __init__ for simplicity. Allow parse=True, tag=True etc flags to be passed at top-level. Do not lazy-load parser. 2015-07-08 12:35:29 +02:00
Matthew Honnibal 1d2deb4616 * Work on refactoring default arguments to English.__init__ 2015-07-07 15:53:25 +02:00
Matthew Honnibal 6788c86b2f * Begin refactor 2015-07-07 14:00:07 +02:00
Matthew Honnibal 9af86b0b0b * Fix attrs.pxd 2015-06-30 18:16:30 +02:00
Matthew Honnibal 5d595b5a8c * Inc versions 2015-06-30 18:11:06 +02:00
Matthew Honnibal d2eeba6667 * Start wiring up color and emotion lexicons. Hopefully we get to use them. 2015-06-30 16:22:23 +02:00
Matthew Honnibal b266a63f2c * Inc version of downloadble data 2015-06-24 04:53:08 +02:00
Matthew Honnibal 7d265a9c62 * Revert to wget in spacy.en.download 2015-06-08 00:48:56 +02:00
Matthew Honnibal 1515862861 * Fix download.py 2015-06-08 00:08:05 +02:00
Matthew Honnibal 7e9e8f654a * Use urllib in spacy.en.download 2015-06-07 23:51:38 +02:00
Matthew Honnibal 80cff41a9c * Upd download.py 2015-06-07 19:13:28 +02:00
Matthew Honnibal 58d5ac0944 * Add beam search capabilities to Parser. Rename GreedyParser to Parser. 2015-06-02 00:28:02 +02:00
Matthew Honnibal 62424e6c76 * Remove unused regularize argument from _ml.Model 2015-06-02 00:27:07 +02:00
Matthew Honnibal 04bda8648d * Pass parameter for regularization to model 2015-05-27 03:16:58 +02:00
Matthew Honnibal eba7b34f66 * Add flag to disable loading of word vectors 2015-05-25 01:02:42 +02:00
Matthew Honnibal 03ebf70a66 * Inc version to 0.84 2015-05-12 02:38:51 +02:00
Matthew Honnibal fb8d50b3d5 Merge branch 'master' of ssh://github.com/honnibal/spaCy 2015-04-30 12:45:15 +02:00
Matthew Honnibal 378c2a6435 * Fix POS model: make it use tag instead of pos in history features 2015-04-29 00:02:53 +02:00
Jordan Suchow 3a8d9b37a6 Remove trailing whitespace 2015-04-19 13:01:38 -07:00
Matthew Honnibal cc4e395927 * Add some ad hoc regexes, for multi-word location prepositions 2015-04-17 04:44:24 +02:00
Matthew Honnibal 684d0e5e85 * Download updated data 2015-04-16 04:29:15 +02:00
Matthew Honnibal 42617548af * Disable merge_mwes by default 2015-04-16 04:20:31 +02:00
Matthew Honnibal 77d0700caf * Add on X way regexes 2015-04-16 01:35:46 +02:00
Matthew Honnibal c6707778dd * Fix Issue #51: Handle non-ascii lemmas correctly 2015-04-13 22:28:59 +02:00
Matthew Honnibal 761a19113a * Fix /tmp moving thing in download.py 2015-04-12 07:04:10 +02:00
Matthew Honnibal b64b2bd910 * Fix Issue #43: TAG attr not supported. Also add DEP attr, while I'm at it. Need better way of ensuring future changes don't break in similar way. 2015-04-07 06:00:30 +02:00
Matthew Honnibal b8d34531c4 * Add support for units to English.__init__, by loading and applying regular expressions 2015-04-07 04:02:32 +02:00
Matthew Honnibal 2fee67cfa3 * Add regular expressions for English multi-word expressions 2015-04-07 03:45:18 +02:00
Matthew Honnibal 567388e38d * Use values encoded by StringStore in POS tagging, rather than indices into a list of tags 2015-03-26 16:44:45 +01:00
Matthew Honnibal 801bf14f4f * Clean up handling of dep_strings and ent_strings, using StringStore to encode the label names. 2015-03-26 16:44:45 +01:00
Matthew Honnibal f21ab2d7fb * Fix bug in ugly ent_strings hack on English class 2015-03-26 16:44:45 +01:00
Matthew Honnibal 8057a95f20 * NER seems to be working, scoring 69 F. Need to add decision-history features --- currently only use current word, 2 words context. Need refactoring. 2015-03-26 16:44:44 +01:00
Matthew Honnibal 220ce8bfed * Prepare English class for NER 2015-03-26 16:44:44 +01:00
Matthew Honnibal 179b7eb0a7 * Specify parser transition system in language 2015-03-26 16:44:43 +01:00
Matthew Honnibal 8cc3524dc9 * Ws 2015-03-26 16:44:41 +01:00
Matthew Honnibal 2e8d0e5d45 * Upd download script 2015-03-03 05:47:16 -05:00
Matthew Honnibal caf046b220 * Hastily add method to apply tags from a list of strings, instead of predicting the tags. 2015-02-23 15:40:17 -05:00
Matthew Honnibal 64645a1c2f * Improve docstring on English 2015-02-11 15:13:20 -05:00
Matthew Honnibal 594e50bd45 * Add option to download speech-parsing data set. 2015-02-11 14:20:29 -05:00
Matthew Honnibal 0b7e769211 * Add POS tags to support SWBD tag set 2015-02-11 14:08:28 -05:00
Matthew Honnibal 312b3a45f3 * Fix issue #19: Allow parsing/pos tagging of empty strings 2015-02-10 10:15:58 -05:00
Matthew Honnibal 2a0615104b * Upd download script 2015-02-09 10:22:59 -05:00
Matthew Honnibal 5c3513583d * Clear buffered python tokens when modifying the Tokens object. Need to clean this up, and modify via a method on Tokens. 2015-02-09 03:57:10 -05:00