Matthew Honnibal
e30ffdd003
Use ftrl optimizer in tagger
2017-03-11 06:59:13 -06:00
Matthew Honnibal
798450136d
Set L1 penalty to 0 in tagger.
2017-03-09 18:43:47 -06:00
Roman Inflianskas
66e1109b53
Add support for Universal Dependencies v2.0
2017-03-03 13:17:34 +01:00
Matthew Honnibal
97a1286129
Revert changes to tagger and parser for thinc 6
2017-01-09 10:08:34 -06:00
Matthew Honnibal
af81ac8bb0
Use thinc 6.0
2016-12-29 11:58:42 +01:00
Yanhao
762169da29
Fixed bug: eg.guess is a tag id, rather than tag
2016-11-15 14:11:22 +08:00
Matthew Honnibal
1fb09c3dc1
Fix morphology tagger
2016-11-04 19:19:09 +01:00
Matthew Honnibal
b86f8af0c1
Fix doc strings
2016-11-01 12:25:36 +01:00
Matthew Honnibal
6eb73a095f
Fix JSON in tagger
2016-10-21 01:44:10 +02:00
Matthew Honnibal
517f090cbf
Use GoldParse in tagger.update
2016-10-17 00:55:15 +02:00
Matthew Honnibal
f787cd29fe
Refactor the pipeline classes to make them more consistent, and remove the redundant blank() constructor.
2016-10-16 21:34:57 +02:00
Matthew Honnibal
195d998a12
Fix GoldParse argument to tagger.update
2016-10-16 17:05:09 +02:00
Matthew Honnibal
ea23b64cc8
Refactor training, with new spacy.train module. Defaults still a little awkward.
2016-10-09 12:24:24 +02:00
Matthew Honnibal
ba51cb8325
Revert "Changes to tagger for new string store scheme"
...
This reverts commit f5a6aac906
.
2016-09-30 20:09:53 +02:00
Matthew Honnibal
f5a6aac906
Changes to tagger for new string store scheme
2016-09-30 20:01:51 +02:00
Matthew Honnibal
fd65cf6cbb
Finish refactoring data loading
2016-09-24 20:26:17 +02:00
Matthew Honnibal
83e364188c
Mostly finished loading refactoring. Design is in place, but doesn't work yet.
2016-09-24 15:42:01 +02:00
Matthew Honnibal
070af4af9d
Revert "* Working neural net, but features hacky. Switching to extractor."
...
This reverts commit 7c2f1a673b
.
2016-09-21 12:26:14 +02:00
Matthew Honnibal
7c2f1a673b
* Working neural net, but features hacky. Switching to extractor.
2016-05-26 19:06:10 +02:00
Wolfgang Seeker
690c5acabf
adjust train.py to train both english and german models
2016-03-03 15:21:00 +01:00
Wolfgang Seeker
4b2297d5d4
add class PseudoProjective for pseudo-projective parsing
...
PseudoProjective() implements the algorithm from Nivre & Nilsson 2005
using their HEAD decoration scheme.
2016-02-24 11:26:25 +01:00
Wolfgang Seeker
8d531c958b
replace tests for non-projectivity
...
- add functions to find non-projective edges
- add test file for non-projectivity functions
2016-02-22 14:40:40 +01:00
Wolfgang Seeker
eae35e9b27
add tokenizer files for German, add/change code to train German pos tagger
...
- add files to specify rules for German tokenization
- change generate_specials.py to generate from an external file (abbrev.de.tab)
- copy gazetteer.json from lang_data/en/
- init_model.py
- change doc freq threshold to 0
- add train_german_tagger.py
- expects conll09-formatted input
2016-02-18 13:24:20 +01:00
Matthew Honnibal
8a13cebdcc
* Update for modified thinc interface
2016-02-05 11:44:39 +01:00
Matthew Honnibal
84b247ef83
* Add a .pipe method, that takes a stream of input, operates on it, and streams the output. Internally, the stream may be buffered, to allow multi-threading.
2016-02-03 02:10:58 +01:00
Matthew Honnibal
fcfc17a164
Merge branch 'master' into rethinc2
2016-02-02 23:05:34 +01:00
Matthew Honnibal
f204daf27b
* Add error warning that a gold tag is unrecognised
2016-02-02 22:59:59 +01:00
Matthew Honnibal
10877a7791
* Update for thinc 5.0, including changing cost from int to weight_t, and updating the tagger and parser
2016-01-30 14:31:36 +01:00
Matthew Honnibal
b0718b6ee1
* Move to thinc 5.0
2016-01-29 03:58:55 +01:00
Henning Peters
235f094534
untangle data_path/via
2016-01-16 12:23:45 +01:00
Henning Peters
846fa49b2a
distinct load() and from_package() methods
2016-01-16 10:00:57 +01:00
Henning Peters
788f734513
refactored data_dir->via, add zip_safe, add spacy.load()
2016-01-15 18:01:02 +01:00
Henning Peters
bc229790ac
integrate with sputnik
2016-01-13 19:46:17 +01:00
Matthew Honnibal
aec130af56
Use util.Package class for io
...
Previous Sputnik integration caused API change: Vocab, Tagger, etc
were loaded via a from_package classmethod, that required a
sputnik.Package instance. This forced users to first create a
sputnik.Sputnik() instance, in order to acquire a Package via
sp.pool().
Instead I've created a small file-system shim, util.Package, which
allows classes to have a .load() classmethod, that accepts either
util.Package objects, or strings. We can later gut the internals
of this and make it a proxy for Sputnik if we need more functionality
that should live in the Sputnik library.
Sputnik is now only used to download and install the data, in
spacy.en.download
2015-12-29 18:00:48 +01:00
Henning Peters
8359bd4d93
strip data/ from package, friendlier Language invocation, make data_dir backward/forward-compatible
2015-12-18 09:52:55 +01:00
Henning Peters
345dda6f53
small fixes, add package build step
2015-12-07 06:50:26 +01:00
Henning Peters
9027cef3bc
access model via sputnik
2015-12-07 06:01:28 +01:00
Matthew Honnibal
6f47074214
* Make constructor of ParserModel and TaggerModel the same as AveragedPerceptron, for each pickling.
2015-11-07 18:25:17 +11:00
Matthew Honnibal
3c162dcac3
* Refactor away from the _ml module, to use thinc 4.0. Still some work needs to be done, e.g. to add __reduce__ to the models, more testing, etc.
2015-11-07 03:24:30 +11:00
Matthew Honnibal
b9991fbd20
* Update to use thinc 3.0
2015-11-06 00:25:59 +11:00
Matthew Honnibal
68f479e821
* Rename Doc.data to Doc.c
2015-11-04 00:15:14 +11:00
Matthew Honnibal
20fd36a0f7
* Very scrappy, likely buggy first-cut pickle implementation, to work on Issue #125 : allow pickle for Apache Spark. The current implementation sends stuff to temp files, and does almost nothing to ensure all modifiable state is actually preserved. The Language() instance is a deep tree of extension objects, and if pickling during training, some of the C-data state is hard to preserve.
2015-10-13 13:44:41 +11:00
Matthew Honnibal
534e3dda3c
* More work on language independent parsing
2015-08-28 03:44:54 +02:00
Matthew Honnibal
c2307fa9ee
* More work on language-generic parsing
2015-08-28 02:02:33 +02:00
Matthew Honnibal
5b89e2454c
* Improve error-reporting in tagger
2015-08-27 10:26:36 +02:00
Matthew Honnibal
0af139e183
* Tagger training now working. Still need to test load/save of model. Morphology still broken.
2015-08-27 09:16:11 +02:00
Matthew Honnibal
b4faf551f5
* Refactor language-independent tagger class
2015-08-26 19:19:21 +02:00
Matthew Honnibal
5dd76be446
* Split EnPosTagger up into base class and subclass
2015-08-24 05:25:55 +02:00
Matthew Honnibal
aac5028b6e
* Move tagger to _ml
2014-12-30 21:21:38 +11:00
Matthew Honnibal
73f200436f
* Tests passing except for morphology/lemmatization stuff
2014-12-23 11:40:32 +11:00