Commit Graph

82 Commits

Author SHA1 Message Date
Matthew Honnibal
5ec32f5d97 Fix loading of GloVe vectors, to address Issue #541 2016-10-20 18:27:48 +02:00
Matthew Honnibal
d4aaf2752c Fix issue #535: Pipeline elements added even when data not installed. 2016-10-19 19:55:19 +02:00
Matthew Honnibal
1b651db9c5 Fix parser creation in Language class. 2016-10-18 19:36:44 +02:00
Matthew Honnibal
45a6f9b9c7 Fix loading of tagger. 2016-10-18 19:33:04 +02:00
Matthew Honnibal
7d5212f131 Refactor defaults 2016-10-18 16:18:25 +02:00
Matthew Honnibal
f787cd29fe Refactor the pipeline classes to make them more consistent, and remove the redundant blank() constructor. 2016-10-16 21:34:57 +02:00
Matthew Honnibal
ca51f3b77e Use DependencyParser and EntityRecognizer in the Language class. 2016-10-16 17:58:12 +02:00
Matthew Honnibal
a81c5a7abf Fix name of labels keyword to 'actions'. 2016-10-16 12:00:27 +02:00
Matthew Honnibal
8a6b35d266 Delay binding in MakeDoc 2016-10-16 11:41:55 +02:00
Matthew Honnibal
08e9134760 Change default value of path to True 2016-10-15 14:12:54 +02:00
Matthew Honnibal
6d8cb515ac Break the tokenization stage out of the pipeline into a function 'make_doc'. This allows all pipeline methods to have the same signature. 2016-10-14 17:38:29 +02:00
Matthew Honnibal
41f88ce938 Fix dep model loading in parser 2016-10-12 20:26:38 +02:00
Matthew Honnibal
0e2bedc373 Fix default labels for parser and NER 2016-10-12 19:12:40 +02:00
Matthew Honnibal
847a4a4182 Refactor Language, dropping Language.blank() method. 2016-10-12 13:45:58 +02:00
Matthew Honnibal
ea23b64cc8 Refactor training, with new spacy.train module. Defaults still a little awkward. 2016-10-09 12:24:24 +02:00
Matthew Honnibal
eceeaefe53 Fix defaults for Parser and Entity, adding a blank= argument. 2016-09-30 19:56:06 +02:00
Matthew Honnibal
e382e48d9f Temporarily patch handling of defaul templates for tagger. Need to move these to language_data. 2016-09-27 13:21:28 +02:00
Matthew Honnibal
b14b9b096b Return None if /deps directory not present, instead of trying to load the parser. 2016-09-26 18:48:03 +02:00
Matthew Honnibal
0b2d7ae9d6 Fix Entity creation 2016-09-26 15:41:22 +02:00
Matthew Honnibal
2debc4e0a2 Add .blank() method to Parser. Start housing default dep labels and entity types within the Defaults class. 2016-09-26 11:57:54 +02:00
Matthew Honnibal
722199acb8 Add spacy.blank() method, that doesn't load data. Don't try to load data if path is falsey 2016-09-26 11:07:46 +02:00
Matthew Honnibal
7db956133e Move tokenizer data for German into spacy.de.language_data 2016-09-25 15:37:33 +02:00
Matthew Honnibal
95aaea0d3f Refactor so that the tokenizer data is read from Python data, rather than from disk 2016-09-25 14:49:53 +02:00
Matthew Honnibal
fd58f7655a Python 3 compatible basestring 2016-09-24 22:16:43 +02:00
Matthew Honnibal
fd65cf6cbb Finish refactoring data loading 2016-09-24 20:26:17 +02:00
Matthew Honnibal
83e364188c Mostly finished loading refactoring. Design is in place, but doesn't work yet. 2016-09-24 15:42:01 +02:00
Matthew Honnibal
9dc8043a7e Refactor Language to use new Defaults class, and work on revised data loading. We're getting rid of sputnik's weird file-system wrapper, and using pathlib. 2016-09-24 14:08:53 +02:00
Matthew Honnibal
4d7f5468bb * Change Language class to use a .pipeline attribute, instead of having the pipeline hard coded 2016-05-17 16:55:42 +02:00
Matthew Honnibal
0f957dd586 Merge branch 'master' of ssh://github.com/honnibal/spaCy 2016-04-14 10:37:56 +02:00
Matthew Honnibal
61d20de35d * Fix language.py docstring 2016-04-14 10:36:57 +02:00
Henning Peters
ff690f76ba fix loading non-german models 2016-04-12 16:00:56 +02:00
Wolfgang Seeker
03fb498dbe introduce lang field for LexemeC to hold language id
put noun_chunk logic into iterators.py for each language separately
2016-03-10 13:01:34 +01:00
Wolfgang Seeker
bc9c62e279 replace Language functions with corresponding orth functions
implement punctuation functions in orth
2016-03-09 18:07:37 +01:00
Henning Peters
931c07a609 initial proposal for separate vector package 2016-03-04 11:09:06 +01:00
Matthew Honnibal
a95974ad3f * Fix oov probability 2016-02-06 15:13:55 +01:00
Matthew Honnibal
1ef84a0557 * Merge master into rethinc2 2016-02-05 12:55:59 +01:00
Matthew Honnibal
249dccbe95 * Fix Language.pipe 2016-02-05 12:47:57 +01:00
Matthew Honnibal
af58f273b3 * Fix spacy.language.pipe 2016-02-05 12:20:29 +01:00
Matthew Honnibal
419edfab50 * Use generic flags for the new attributes until they're added 2016-02-04 15:50:54 +01:00
Matthew Honnibal
e5c96c969f * Wire up new attributes 2016-02-04 13:04:58 +01:00
Matthew Honnibal
84b247ef83 * Add a .pipe method, that takes a stream of input, operates on it, and streams the output. Internally, the stream may be buffered, to allow multi-threading. 2016-02-03 02:10:58 +01:00
Matthew Honnibal
fcfc17a164 Merge branch 'master' into rethinc2 2016-02-02 23:05:34 +01:00
Matthew Honnibal
59123443e2 * Check for presence/absence of the different models in Language.end_training 2016-02-02 22:49:55 +01:00
Matthew Honnibal
9e9d4c8706 * Fix stupid error in Language.batch 2016-02-01 09:49:32 +01:00
Matthew Honnibal
98fbdf2856 * Add Language.batch() method, to support multi-threaded jobs 2016-02-01 09:01:13 +01:00
Matthew Honnibal
c4a89d56bd * Automatically register any entity types pre-set on the tokens, so that the NER works with user-given entity types. 2016-01-19 20:09:26 +01:00
Matthew Honnibal
bba0a5e078 * Handle string paths in default_vocab, default_parser, default_entity in Language class 2016-01-18 22:37:24 +01:00
Henning Peters
41ea14a56f fix pickling 2016-01-16 13:23:11 +01:00
Henning Peters
235f094534 untangle data_path/via 2016-01-16 12:23:45 +01:00
Henning Peters
846fa49b2a distinct load() and from_package() methods 2016-01-16 10:00:57 +01:00