Matthew Honnibal
10877a7791
* Update for thinc 5.0, including changing cost from int to weight_t, and updating the tagger and parser
2016-01-30 14:31:36 +01:00
Matthew Honnibal
b0718b6ee1
* Move to thinc 5.0
2016-01-29 03:58:55 +01:00
Henning Peters
235f094534
untangle data_path/via
2016-01-16 12:23:45 +01:00
Henning Peters
846fa49b2a
distinct load() and from_package() methods
2016-01-16 10:00:57 +01:00
Henning Peters
788f734513
refactored data_dir->via, add zip_safe, add spacy.load()
2016-01-15 18:01:02 +01:00
Henning Peters
bc229790ac
integrate with sputnik
2016-01-13 19:46:17 +01:00
Matthew Honnibal
aec130af56
Use util.Package class for io
The previous Sputnik integration caused an API change: Vocab, Tagger, etc.
were loaded via a from_package classmethod that required a
sputnik.Package instance. This forced users to first create a
sputnik.Sputnik() instance in order to acquire a Package via
sp.pool().
Instead I've created a small file-system shim, util.Package, which
allows classes to have a .load() classmethod that accepts either
util.Package objects or strings. We can later gut the internals
of this and make it a proxy for Sputnik if we need more functionality
that should live in the Sputnik library.
Sputnik is now only used to download and install the data, in
spacy.en.download.
2015-12-29 18:00:48 +01:00
Henning Peters
8359bd4d93
strip data/ from package, friendlier Language invocation, make data_dir backward/forward-compatible
2015-12-18 09:52:55 +01:00
Henning Peters
345dda6f53
small fixes, add package build step
2015-12-07 06:50:26 +01:00
Henning Peters
9027cef3bc
access model via sputnik
2015-12-07 06:01:28 +01:00
Matthew Honnibal
6f47074214
* Make constructor of ParserModel and TaggerModel the same as AveragedPerceptron, for easier pickling.
2015-11-07 18:25:17 +11:00
Matthew Honnibal
3c162dcac3
* Refactor away from the _ml module, to use thinc 4.0. Still some work needs to be done, e.g. to add __reduce__ to the models, more testing, etc.
2015-11-07 03:24:30 +11:00
Matthew Honnibal
b9991fbd20
* Update to use thinc 3.0
2015-11-06 00:25:59 +11:00
Matthew Honnibal
68f479e821
* Rename Doc.data to Doc.c
2015-11-04 00:15:14 +11:00
Matthew Honnibal
20fd36a0f7
* Very scrappy, likely buggy first-cut pickle implementation, to work on Issue #125: allow pickle for Apache Spark. The current implementation sends stuff to temp files, and does almost nothing to ensure all modifiable state is actually preserved. The Language() instance is a deep tree of extension objects, and if pickling during training, some of the C-data state is hard to preserve.
2015-10-13 13:44:41 +11:00
Matthew Honnibal
534e3dda3c
* More work on language independent parsing
2015-08-28 03:44:54 +02:00
Matthew Honnibal
c2307fa9ee
* More work on language-generic parsing
2015-08-28 02:02:33 +02:00
Matthew Honnibal
5b89e2454c
* Improve error-reporting in tagger
2015-08-27 10:26:36 +02:00
Matthew Honnibal
0af139e183
* Tagger training now working. Still need to test load/save of model. Morphology still broken.
2015-08-27 09:16:11 +02:00
Matthew Honnibal
b4faf551f5
* Refactor language-independent tagger class
2015-08-26 19:19:21 +02:00
Matthew Honnibal
5dd76be446
* Split EnPosTagger up into base class and subclass
2015-08-24 05:25:55 +02:00
Matthew Honnibal
aac5028b6e
* Move tagger to _ml
2014-12-30 21:21:38 +11:00
Matthew Honnibal
73f200436f
* Tests passing except for morphology/lemmatization stuff
2014-12-23 11:40:32 +11:00
Matthew Honnibal
cf8d26c3d2
* POS tagger training working after reorg
2014-12-22 08:54:47 +11:00
Matthew Honnibal
1879abd16a
* Set const-correctness for tagger
2014-12-18 20:41:52 +11:00
Matthew Honnibal
a432862fde
* Add exception type to _arg_max_among in tagger
2014-12-16 09:44:19 +11:00
Matthew Honnibal
42973c4b37
* Improve efficiency of tagger, and improve morphological processing
2014-12-10 01:02:04 +11:00
Matthew Honnibal
6b34a2f34b
* Move morphological analysis into its own module, morphology.pyx
2014-12-09 21:16:17 +11:00
Matthew Honnibal
99bbbb6feb
* Work on morphological processing
2014-12-08 21:12:15 +11:00
Matthew Honnibal
c20dd79748
* Fiddle with const correctness and comments
2014-12-08 00:03:55 +11:00
Matthew Honnibal
ef4398b204
* Rearrange POS stuff, so that language-specific stuff can live in language-specific modules
2014-12-07 23:52:41 +11:00
Matthew Honnibal
327383e38a
* Remove unused code in tagger.pyx
2014-12-07 22:16:17 +11:00
Matthew Honnibal
3819a88e1b
* Add support for tag dictionary, and fix error-code for predict method
2014-12-07 22:07:16 +11:00
Matthew Honnibal
0c7aeb9de7
* Begin revising tagger, focussing on POS tagging
2014-12-07 15:29:04 +11:00
Matthew Honnibal
f307eb2e36
* Refactor context extraction, and start breaking out gold standards into their own functions
2014-11-09 15:43:07 +11:00
Matthew Honnibal
602f993af9
* Moving tagger to accept multiple correct answers
2014-11-09 15:18:33 +11:00
Matthew Honnibal
949a6245f9
* Increase default number of iterations from 5 to 10
2014-11-07 04:42:04 +11:00
Matthew Honnibal
4ecbe8c893
* Complete refactor of Tagger features, to use a generic list of context names.
2014-11-05 20:45:29 +11:00
Matthew Honnibal
3733444101
* Generalize tagger code, in preparation for NER and supersense tagging.
2014-11-05 03:42:14 +11:00
Matthew Honnibal
abbe3e44b0
* Move spacy.pos tagger to spacy.tagger, and generalize it so that it can take on other tagging tasks, given a different set of feature templates.
2014-11-05 00:37:59 +11:00