| Author | Commit | Message | Date |
|---|---|---|---|
| ines | d24589aa72 | Clean up imports, unused code, whitespace, docstrings | 2017-04-15 12:05:47 +02:00 |
| Matthew Honnibal | 8843b84bd1 | Merge remote-tracking branch 'origin/develop-downloads' | 2017-03-16 12:00:42 -05:00 |
| ines | 1101fd3855 | Fix formatting and remove unused imports | 2017-03-15 17:33:39 +01:00 |
| ines | 842782c128 | Move fix_deprecated_glove_vectors_loading to deprecated.py | 2017-03-15 17:33:29 +01:00 |
| Matthew Honnibal | 8dbff4f5f4 | Wire up English lemma and morph rules. | 2017-03-15 09:23:22 -05:00 |
| Matthew Honnibal | f70be44746 | Use lemmatizer in code, not from downloaded model. | 2017-03-15 04:52:50 -05:00 |
| ines | 0957737ee8 | Add Python-formatted lemmatizer data and rules | 2017-03-12 13:58:22 +01:00 |
| ines | 66c1f194f9 | Use consistent unicode declarations | 2017-03-12 13:07:28 +01:00 |
| Matthew Honnibal | d108534dc2 | Fix 2/3 problems for training | 2017-03-08 01:37:52 +01:00 |
| JM | 70ff0639b5 | Fixed missing vec_path declaration that was failing if 'add_vectors' was set. Added vec_path variable declaration to avoid accessing it before assignment in case 'add_vectors' is in overrides. | 2016-12-20 18:21:05 +01:00 |
| Matthew Honnibal | 13a0b31279 | Another tweak to GloVe path hackery. | 2016-12-18 23:12:49 +01:00 |
| Matthew Honnibal | 2c6228565e | Fix vector loading re glove hack | 2016-12-18 23:06:44 +01:00 |
| Matthew Honnibal | 618b50a064 | Fix issue #684: GloVe vectors not loaded in spacy.en.English. | 2016-12-18 22:46:31 +01:00 |
| Matthew Honnibal | 2ef9d53117 | Untested fix for issue #684: GloVe vectors hack should be inserted in English, not in spacy.load. | 2016-12-18 22:29:31 +01:00 |
| Ines Montani | b99d683a93 | Fix formatting | 2016-12-18 16:58:28 +01:00 |
| Ines Montani | b11d8cd3db | Merge remote-tracking branch 'origin/organize-language-data' into organize-language-data | 2016-12-18 16:57:12 +01:00 |
| Ines Montani | 2b2ea8ca11 | Reorganise language data | 2016-12-18 16:54:19 +01:00 |
| Matthew Honnibal | 44f4f008bd | Wire up lemmatizer rules for English | 2016-12-18 15:50:09 +01:00 |
| Ines Montani | 5445074cbd | Expand tokenizer exceptions with unicode apostrophe (fixes #685) | 2016-12-17 12:34:08 +01:00 |
| Ines Montani | e0a7b5c612 | Fix formatting | 2016-12-17 12:33:09 +01:00 |
| Ines Montani | 08162dce67 | Move shared functions and constants to global language data | 2016-12-17 12:32:48 +01:00 |
| Ines Montani | 6a60a61086 | Move update_exc to global language data utils | 2016-12-17 12:29:02 +01:00 |
| Ines Montani | 487ce1e20a | Add encoding declaration | 2016-12-17 12:25:44 +01:00 |
| Ines Montani | 311b30ab35 | Reorganize exceptions for English and German | 2016-12-08 13:58:32 +01:00 |
| Matthew Honnibal | 8c8f5c62c6 | Add LANG attribute to English and German | 2016-10-18 18:52:48 +02:00 |
| Matthew Honnibal | ea23b64cc8 | Refactor training, with new spacy.train module. Defaults still a little awkward. | 2016-10-09 12:24:24 +02:00 |
| Matthew Honnibal | 7db956133e | Move tokenizer data for German into spacy.de.language_data | 2016-09-25 15:37:33 +02:00 |
| Matthew Honnibal | 95aaea0d3f | Refactor so that the tokenizer data is read from Python data, rather than from disk | 2016-09-25 14:49:53 +02:00 |
| Matthew Honnibal | fd65cf6cbb | Finish refactoring data loading | 2016-09-24 20:26:17 +02:00 |
| Henning Peters | 470cdf5bf9 | remove deprecated LOCAL_DATA_DIR | 2016-04-05 11:25:54 +02:00 |
| Matthew Honnibal | 445164d5b4 | * Restore the LOCAL_DATA_DIR global in spacy/en/__init__.py, although this is now deprecated | 2016-01-19 02:54:56 +01:00 |
| Matthew Honnibal | 187960606f | * Fix pickle problems | 2015-12-28 16:54:03 +01:00 |
| Henning Peters | 8359bd4d93 | strip data/ from package, friendlier Language invocation, make data_dir backward/forward-compatible | 2015-12-18 09:52:55 +01:00 |
| Henning Peters | 9027cef3bc | access model via sputnik | 2015-12-07 06:01:28 +01:00 |
| Matthew Honnibal | e13e47e9e5 | * Add English stop words | 2015-09-14 17:48:51 +10:00 |
| Matthew Honnibal | c4d8754385 | * Specify LOCAL_DATA_DIR global in spacy.en.__init__.py | 2015-08-26 19:15:07 +02:00 |
| Matthew Honnibal | 8083a07c3e | * Use language base class | 2015-08-25 15:37:30 +02:00 |
| Matthew Honnibal | 6f1743692a | * Work on language-independent refactoring | 2015-08-23 20:49:18 +02:00 |
| Matthew Honnibal | cad0cca4e3 | * Tmp | 2015-08-22 22:04:34 +02:00 |
| Matthew Honnibal | 5737115e1e | * Work on gazetteer matching | 2015-08-06 14:33:21 +02:00 |
| Matthew Honnibal | ddc1a5cfe5 | * Fix training under python3 | 2015-07-28 14:09:30 +02:00 |
| Matthew Honnibal | 8e4c69ee8c | * Add is_oov property, and fix up handling of attributes | 2015-07-27 01:50:06 +02:00 |
| Matthew Honnibal | eeaea25f0c | * Check oov_prob file is present | 2015-07-26 16:36:38 +02:00 |
| Matthew Honnibal | 1b5d1da2a7 | * Allow an OOV probability to be specified in get_lex_props | 2015-07-26 00:03:43 +02:00 |
| Matthew Honnibal | cd6e25132b | * Allow an OOV probability to be specified in get_lex_props | 2015-07-26 00:01:46 +02:00 |
| Matthew Honnibal | 5b41744270 | * Check for directory presence before loading annotators | 2015-07-23 09:27:37 +02:00 |
| Matthew Honnibal | 680bb47b55 | * Write serializer freqs to single file, vocab/serializer.json | 2015-07-23 01:15:25 +02:00 |
| Matthew Honnibal | c86dbe4944 | * Update English.save_models for new Packer save/load stuff | 2015-07-22 13:40:23 +02:00 |
| Matthew Honnibal | 317cbbc015 | * Serialization round trip now working with decent API, but with rough spots in the organisation and requiring vocabulary to be fixed ahead of time. | 2015-07-19 15:18:17 +02:00 |
| Matthew Honnibal | db9dfd2e23 | * Major refactor of serialization. Nearly complete now. | 2015-07-17 01:27:54 +02:00 |