1720 Commits

Author SHA1 Message Date
Matthew Honnibal
b96bf9b8cc Merge branch 'master' of ssh://github.com/honnibal/spaCy 2015-07-27 22:57:48 +02:00
Matthew Honnibal
aa7a964a4f * Add a type declaration for doc.from_array 2015-07-27 22:57:22 +02:00
Matthew Honnibal
9034f8a1cf * Update test_docs 2015-07-27 22:15:19 +02:00
Matthew Honnibal
25a8774f42 * Fix regression in packer 2015-07-27 21:53:38 +02:00
Matthew Honnibal
174ed1ad20 * Tighten the frequency filter in init_model 2015-07-27 21:44:51 +02:00
Matthew Honnibal
1601e488ee * Fix bug in decoding non-ascii characters 2015-07-27 21:43:58 +02:00
Matthew Honnibal
6deb1e84b6 * Upd serialization tests 2015-07-27 21:25:48 +02:00
Matthew Honnibal
6a95409cd2 * Fix type on bits 2015-07-27 21:16:49 +02:00
Matthew Honnibal
a296d72b54 * Fix en/attrs 2015-07-27 21:16:33 +02:00
Matthew Honnibal
45460f505c * Fix data type on read32 in BitArray 2015-07-27 21:12:13 +02:00
Matthew Honnibal
3d43f49f69 * Revert prev change 2015-07-27 10:58:15 +02:00
Matthew Honnibal
6b586cdad4 * Change lexemes.bin format. Add a header specifying size of LexemeC and number of lexemes, and don't have the redundant orth information. 2015-07-27 08:31:51 +02:00
Matthew Honnibal
6047f2aa35 * Fix path to freqs.txt 2015-07-27 02:22:35 +02:00
Matthew Honnibal
4a0f40ec2d * Ensure data is packaged in vocab 2015-07-27 02:14:36 +02:00
Matthew Honnibal
af6ed18f2a * Ensure we don't use orth_encode on OOV words. 2015-07-27 02:12:01 +02:00
Matthew Honnibal
912511f0aa * Update prebuild command, for shell bug 2015-07-27 01:52:04 +02:00
Matthew Honnibal
b532f4eaa2 * Ensure serialize is packaged. 2015-07-27 01:51:37 +02:00
Matthew Honnibal
8535d872e8 * Set is_oov property in get_flags 2015-07-27 01:51:24 +02:00
Matthew Honnibal
0f4d0d51ab * Test is_oov property 2015-07-27 01:50:34 +02:00
Matthew Honnibal
8e4c69ee8c * Add is_oov property, and fix up handling of attributes 2015-07-27 01:50:06 +02:00
Matthew Honnibal
fc268f03eb * Assert against null pointer exceptions in vocab 2015-07-27 01:00:10 +02:00
Matthew Honnibal
2b5cde87fd * Add prebuild command, to test clean builds 2015-07-26 22:40:04 +02:00
Matthew Honnibal
0368889d6c * Support gzipped frequencies in init_model 2015-07-26 22:39:22 +02:00
Matthew Honnibal
62da5eb338 * Inc version 2015-07-26 22:22:54 +02:00
Matthew Honnibal
b997b1122b * Mark test_io as requiring the model 2015-07-26 21:36:22 +02:00
Matthew Honnibal
0f093fdb30 * Fix get_by_orth for py3 2015-07-26 19:26:41 +02:00
Matthew Honnibal
ceeda5a739 * Fix get_by_orth for py3 2015-07-26 18:39:27 +02:00
Matthew Honnibal
5c9b8d05e4 * Upd test_docs 2015-07-26 17:41:13 +02:00
Matthew Honnibal
609f729cc5 * Fix infix test 2015-07-26 17:32:55 +02:00
Matthew Honnibal
3cfe3d8c1c * Revert bad infix change 2015-07-26 17:32:37 +02:00
Matthew Honnibal
460b4c3207 * Add more infix tests 2015-07-26 17:30:34 +02:00
Matthew Honnibal
bd608559bc * Fix infix-period tokenization 2015-07-26 17:14:52 +02:00
Matthew Honnibal
94f314c271 * Fix tokenization of email addresses. 2015-07-26 16:38:08 +02:00
Matthew Honnibal
48a4d15264 * Test token properties 2015-07-26 16:37:39 +02:00
Matthew Honnibal
6bb96c122d * Host IS_ flags in attrs.pxd, and add properties for them on Token and Lexeme objects 2015-07-26 16:37:16 +02:00
Matthew Honnibal
eeaea25f0c * Check oov_prob file is present 2015-07-26 16:36:38 +02:00
Matthew Honnibal
847c08e411 * Unhack serialization api tests 2015-07-26 16:23:41 +02:00
Matthew Honnibal
c4f20847da * Fix init_model for travis tests 2015-07-26 14:03:30 +02:00
Matthew Honnibal
09312b9353 * Fix init_model for travis tests 2015-07-26 13:55:47 +02:00
Matthew Honnibal
3a4c2a3276 * Update doctests 2015-07-26 13:04:18 +02:00
Matthew Honnibal
2b2032d1a0 * Update doctests 2015-07-26 12:57:59 +02:00
Matthew Honnibal
90ad717dc4 * Update default freq thresholds in init_model 2015-07-26 01:41:17 +02:00
Matthew Honnibal
6c01e01f12 * Fix some casing problems in specials.json 2015-07-26 01:38:29 +02:00
Matthew Honnibal
6a5e035a48 * Ensure data files are copied for tokenizer in init_model 2015-07-26 01:36:19 +02:00
Matthew Honnibal
ab93898ac6 * Make heuristics more explicit in init_model 2015-07-26 00:22:19 +02:00
Matthew Honnibal
7eb2446082 * Return empty lexeme on empty string 2015-07-26 00:18:30 +02:00
Matthew Honnibal
1b5d1da2a7 * Allow an OOV probability to be specified in get_lex_props 2015-07-26 00:03:43 +02:00
Matthew Honnibal
cd6e25132b * Allow an OOV probability to be specified in get_lex_props 2015-07-26 00:01:46 +02:00
Matthew Honnibal
5c04dcd7c1 * Fix init_model 2015-07-25 23:33:02 +02:00
Matthew Honnibal
fd525f0675 * Pass OOV probability around 2015-07-25 23:29:51 +02:00