spaCy

mirror of https://github.com/explosion/spaCy.git synced 2024-11-10 19:57:17 +03:00

Author	SHA1	Message	Date
ines	8bc05c2ba9	Delete old training scripts (resolves #911 )	2017-03-23 11:07:59 +01:00
Raphaël Bournhonesque	08346dba1a	Use specific language class instead of base Language class	2017-03-21 23:18:54 +01:00
Raphaël Bournhonesque	7568cd6bf8	Split CONLLX file using tabs and not default split separators	2017-03-21 23:00:13 +01:00
Matthew Honnibal	ef6bd08e6c	Update train_ud for Universal Dependencies 2	2017-03-16 17:08:15 -05:00
Matthew Honnibal	a155482fda	Improve printing in train_ud script	2017-03-11 11:11:05 -06:00
Matthew Honnibal	35124b144a	Add L1 penalty option to parser	2017-03-09 18:44:53 -06:00
Matthew Honnibal	04a51dab62	Print active parser features during training	2017-03-08 01:37:19 +01:00
Matthew Honnibal	c744ce4b6d	Fix bad change to cythonize.py script, re subprocess call	2017-02-16 19:01:25 +01:00
Matthew Honnibal	071d11cb35	Pass environment to Cythonize script. Closes #791	2017-02-17 01:04:16 +11:00
Matthew Honnibal	4ff92184f1	Improve train_ud script	2017-01-09 09:53:46 -06:00
Matthew Honnibal	c1ef07788c	Update train_ud.py Create deps folder if it doesn't exist.	2017-01-09 10:55:44 +11:00
Matthew Honnibal	46e98ec029	Move init_model.py script from repo. These meta-tools should live elsewhere	2016-12-18 14:03:40 +01:00
dafnevk	cdf5dcc40a	fixed bug in init_model so that it runs for dutch	2016-12-13 14:33:44 +01:00
Matthew Honnibal	c7889492f9	Fix model saving error for Python 3	2016-11-25 18:04:30 -06:00
Matthew Honnibal	22189e60db	Use unicode literals in train_ud	2016-11-25 17:45:45 -06:00
Matthew Honnibal	da5f0cce36	Fix train_ud script, which trains models from the Universal Dependencies format.	2016-11-25 11:19:33 -06:00
Matthew Honnibal	314bc8d34f	Fix train script for 1.0	2016-11-25 08:57:37 -06:00
Matthew Honnibal	bd1bfcca61	Update train.py	2016-10-13 03:23:48 +02:00
Matthew Honnibal	ea23b64cc8	Refactor training, with new spacy.train module. Defaults still a little awkward.	2016-10-09 12:24:24 +02:00
Matthew Honnibal	53fbd3dd1c	Fix train.py for v1.0.0-rc1	2016-10-05 01:11:46 +02:00
Matthew Honnibal	ae202e7a60	Fix init_model.py	2016-09-25 15:58:51 +02:00
Matthew Honnibal	af847e07fc	Fix usage of pathlib for Python3 -- turning paths to strings.	2016-09-24 21:05:27 +02:00
Matthew Honnibal	d310dc73ef	Fix bin/init_model.py after refactoring	2016-09-24 20:38:18 +02:00
Matthew Honnibal	8036368d96	* Fix model saving	2016-05-23 12:01:46 +00:00
Matthew Honnibal	35214053fd	* Work around get_lex_attr bug introduced during German parsing	2016-05-23 10:53:00 +00:00
Wolfgang Seeker	dae6bc05eb	define German dummy lemmatizer until morphology is done	2016-05-02 16:04:53 +02:00
Matthew Honnibal	8569dbc2d0	* Add initial stuff for Chinese parsing	2016-04-24 18:44:24 +02:00
Wolfgang Seeker	f9150ccf2a	rename vectors.tgz to vectors.bz2 because it's not compressed with gzip but bzip	2016-04-08 13:38:07 +02:00
Wolfgang Seeker	a8f4e49900	update init_model.py to previous (better) state	2016-03-29 16:12:13 +02:00
Matthew Honnibal	d249e2f7f3	* Improve error message in bin/parser/train.py	2016-03-29 13:04:33 +11:00
Yaser Martinez Palenzuela	3c210f45fa	make use of log_smooth_count	2016-03-17 12:19:52 +01:00
Matthew Honnibal	fcaa0ad7ce	Merge pull request #280 from wbwseeker/german_parser German parser	2016-03-04 03:27:42 +11:00
Wolfgang Seeker	690c5acabf	adjust train.py to train both english and german models	2016-03-03 15:21:00 +01:00
Matthew Honnibal	9d51e4d13c	Delete gather_freqs.py This script was in a broken state, and should be unnecessary. The functionality is subsumed by `get_freqs.py`	2016-03-02 00:42:55 +11:00
Yaser Martinez Palenzuela	1a93d7f725	replace codecs.open with io.open	2016-03-01 14:10:11 +01:00
Wolfgang Seeker	eae35e9b27	add tokenizer files for German, add/change code to train German pos tagger - add files to specify rules for German tokenization - change generate_specials.py to generate from an external file (abbrev.de.tab) - copy gazetteer.json from lang_data/en/ - init_model.py - change doc freq threshold to 0 - add train_german_tagger.py - expects conll09-formatted input	2016-02-18 13:24:20 +01:00
Henning Peters	a89ca6537b	fix cythonize	2016-02-05 16:17:23 +01:00
Henning Peters	3a50448bf3	py3 compatibility	2016-02-05 15:43:50 +01:00
Henning Peters	7627969aba	refactor, listen on setup.py, *.pxd	2016-02-05 15:37:00 +01:00
Matthew Honnibal	5dc6cffc67	* Fix gather_freqs.py	2016-02-04 20:21:58 +01:00
Matthew Honnibal	e2ed6251d7	* Fancy up the CLI for the conll train script	2016-02-02 22:58:06 +01:00
Matthew Honnibal	a676d66807	* Update the CoNLL train script, to get working on other languages	2016-02-02 22:29:34 +01:00
Henning Peters	73674a4afb	try using system-wide headers	2015-12-13 12:51:23 +01:00
Henning Peters	92fabd0114	wrap virtualenv around cythonize	2015-12-13 12:32:22 +01:00
Henning Peters	9662cf04c9	new approach to dependency headers	2015-12-13 11:53:02 +01:00
Matthew Honnibal	6e68b344c1	* Train after parsing, not before.	2015-11-12 04:43:52 +11:00
Matthew Honnibal	4fb038a9eb	* Update conll_train.py script for spaCy v0.97	2015-10-31 00:53:51 +11:00
Matthew Honnibal	cfaa4bde5d	* Add train and parse scripts that use CoNLL formatted data	2015-10-30 12:54:49 +11:00
Matthew Honnibal	2348a08481	* Load/dump strings with a json file, instead of the hacky strings file we were using.	2015-10-22 21:13:03 +11:00
Matthew Honnibal	0ce12e4548	* Import io in get_freqs	2015-10-19 12:56:18 +11:00

1 2 3 4 5

216 Commits