Commit Graph

86 Commits

Author SHA1 Message Date
ines
0957737ee8 Add Python-formatted lemmatizer data and rules 2017-03-12 13:58:22 +01:00
ines
66c1f194f9 Use consistent unicode declarations 2017-03-12 13:07:28 +01:00
Matthew Honnibal
d108534dc2 Fix 2/3 problems for training 2017-03-08 01:37:52 +01:00
JM
70ff0639b5 Fixed missing vec_path declaration that was failing if 'add_vectors' was set
Added vec_path variable declaration to avoid accessing it before assignment in case 'add_vectors' is in overrides.
2016-12-20 18:21:05 +01:00
Matthew Honnibal
13a0b31279 Another tweak to GloVe path hackery. 2016-12-18 23:12:49 +01:00
Matthew Honnibal
2c6228565e Fix vector loading re glove hack 2016-12-18 23:06:44 +01:00
Matthew Honnibal
618b50a064 Fix issue #684: GloVe vectors not loaded in spacy.en.English. 2016-12-18 22:46:31 +01:00
Matthew Honnibal
2ef9d53117 Untested fix for issue #684: GloVe vectors hack should be inserted in English, not in spacy.load. 2016-12-18 22:29:31 +01:00
Ines Montani
b99d683a93 Fix formatting 2016-12-18 16:58:28 +01:00
Ines Montani
b11d8cd3db Merge remote-tracking branch 'origin/organize-language-data' into organize-language-data 2016-12-18 16:57:12 +01:00
Ines Montani
2b2ea8ca11 Reorganise language data 2016-12-18 16:54:19 +01:00
Matthew Honnibal
44f4f008bd Wire up lemmatizer rules for English 2016-12-18 15:50:09 +01:00
Ines Montani
5445074cbd Expand tokenizer exceptions with unicode apostrophe (fixes #685) 2016-12-17 12:34:08 +01:00
Ines Montani
e0a7b5c612 Fix formatting 2016-12-17 12:33:09 +01:00
Ines Montani
08162dce67 Move shared functions and constants to global language data 2016-12-17 12:32:48 +01:00
Ines Montani
6a60a61086 Move update_exc to global language data utils 2016-12-17 12:29:02 +01:00
Ines Montani
487ce1e20a Add encoding declaration 2016-12-17 12:25:44 +01:00
Ines Montani
311b30ab35 Reorganize exceptions for English and German 2016-12-08 13:58:32 +01:00
Matthew Honnibal
8c8f5c62c6 Add LANG attribute to English and German 2016-10-18 18:52:48 +02:00
Matthew Honnibal
ea23b64cc8 Refactor training, with new spacy.train module. Defaults still a little awkward. 2016-10-09 12:24:24 +02:00
Matthew Honnibal
7db956133e Move tokenizer data for German into spacy.de.language_data 2016-09-25 15:37:33 +02:00
Matthew Honnibal
95aaea0d3f Refactor so that the tokenizer data is read from Python data, rather than from disk 2016-09-25 14:49:53 +02:00
Matthew Honnibal
fd65cf6cbb Finish refactoring data loading 2016-09-24 20:26:17 +02:00
Henning Peters
470cdf5bf9 remove deprecated LOCAL_DATA_DIR 2016-04-05 11:25:54 +02:00
Matthew Honnibal
445164d5b4 * Restore the LOCAL_DATA_DIR global in spacy/en/__init__.py, although this is now deprecated 2016-01-19 02:54:56 +01:00
Matthew Honnibal
187960606f * Fix pickle problems 2015-12-28 16:54:03 +01:00
Henning Peters
8359bd4d93 strip data/ from package, friendlier Language invocation, make data_dir backward/forward-compatible 2015-12-18 09:52:55 +01:00
Henning Peters
9027cef3bc access model via sputnik 2015-12-07 06:01:28 +01:00
Matthew Honnibal
e13e47e9e5 * Add English stop words 2015-09-14 17:48:51 +10:00
Matthew Honnibal
c4d8754385 * Specify LOCAL_DATA_DIR global in spacy.en.__init__.py 2015-08-26 19:15:07 +02:00
Matthew Honnibal
8083a07c3e * Use language base class 2015-08-25 15:37:30 +02:00
Matthew Honnibal
6f1743692a * Work on language-independent refactoring 2015-08-23 20:49:18 +02:00
Matthew Honnibal
cad0cca4e3 * Tmp 2015-08-22 22:04:34 +02:00
Matthew Honnibal
5737115e1e * Work on gazetteer matching 2015-08-06 14:33:21 +02:00
Matthew Honnibal
ddc1a5cfe5 * Fix training under python3 2015-07-28 14:09:30 +02:00
Matthew Honnibal
8e4c69ee8c * Add is_oov property, and fix up handling of attributes 2015-07-27 01:50:06 +02:00
Matthew Honnibal
eeaea25f0c * Check oov_prob file is present 2015-07-26 16:36:38 +02:00
Matthew Honnibal
1b5d1da2a7 * Allow an OOV probability to be specified in get_lex_props 2015-07-26 00:03:43 +02:00
Matthew Honnibal
cd6e25132b * Allow an OOV probability to be specified in get_lex_props 2015-07-26 00:01:46 +02:00
Matthew Honnibal
5b41744270 * Check for directory presence before loading annotators 2015-07-23 09:27:37 +02:00
Matthew Honnibal
680bb47b55 * Write serializer freqs to single file, vocab/serializer.json 2015-07-23 01:15:25 +02:00
Matthew Honnibal
c86dbe4944 * Update English.save_models for new Packer save/load stuff 2015-07-22 13:40:23 +02:00
Matthew Honnibal
317cbbc015 * Serialization round trip now working with decent API, but with rough spots in the organisation and requiring vocabulary to be fixed ahead of time. 2015-07-19 15:18:17 +02:00
Matthew Honnibal
db9dfd2e23 * Major refactor of serialization. Nearly complete now. 2015-07-17 01:27:54 +02:00
Matthew Honnibal
897de2d438 * Add 'bitter' property for serializer in English class 2015-07-16 17:47:53 +02:00
Matthew Honnibal
ff9ff6f3fa * Ensure unseen words are given low log probability 2015-07-12 01:31:09 +02:00
Matthew Honnibal
6ddb2f5e45 * Restore merge_mwe in English class 2015-07-08 19:35:30 +02:00
Matthew Honnibal
6859f6adac * Restore merge_mwe in English class 2015-07-08 19:34:55 +02:00
Matthew Honnibal
e3c53f5ecd * Fix mention of Tokens in docstring 2015-07-08 18:56:27 +02:00
Matthew Honnibal
bb522496dd * Rename Tokens to Doc 2015-07-08 18:53:00 +02:00