spaCy

Mirror of https://github.com/explosion/spaCy.git (synced 2026-02-19 05:30:47 +03:00)

Author	SHA1	Message	Date
Matthew Honnibal	8be231d490	Tmp. Working on NN NER.	2016-09-08 13:00:24 +02:00
Matthew Honnibal	7c7a05a466	Fix bin/parser/train	2016-09-05 01:44:30 +02:00
Matthew Honnibal	cf2131d649	Decay learn rate for parser	2016-09-04 16:57:10 +02:00
Matthew Honnibal	ed23476c82	Insert into vocab, and print nr_weight and nr_feat	2016-09-01 10:45:06 +02:00
Matthew Honnibal	0c7520dbb7	Update conll_train script	2016-08-29 14:24:30 +02:00
Matthew Honnibal	c57bf6485d	Print dev score before averaging	2016-08-20 04:16:50 +02:00
Matthew Honnibal	ad8f851faa	Default to 0 gradient noise	2016-08-06 16:23:59 +02:00
Matthew Honnibal	532fa36c13	Add parameter for gradient noise	2016-08-05 18:24:01 +02:00
Matthew Honnibal	e38632003d	Save models properly in conll_train.py	2016-07-31 11:42:17 +02:00
Matthew Honnibal	ac63274e15	Tmp	2016-07-27 02:56:36 +02:00
Matthew Honnibal	407ed4652d	* Work on neural network beam	2016-07-24 10:44:59 +02:00
Matthew Honnibal	de7c6c48d8	Working NN, but very messy. Relies on BLIS.	2016-07-20 16:28:02 +02:00
Matthew Honnibal	8036368d96	* Fix model saving	2016-05-23 12:01:46 +00:00
Matthew Honnibal	35214053fd	* Work around get_lex_attr bug introduced during German parsing	2016-05-23 10:53:00 +00:00
Wolfgang Seeker	dae6bc05eb	define German dummy lemmatizer until morphology is done	2016-05-02 16:04:53 +02:00
Matthew Honnibal	8569dbc2d0	* Add initial stuff for Chinese parsing	2016-04-24 18:44:24 +02:00
Wolfgang Seeker	f9150ccf2a	rename vectors.tgz to vectors.bz2 because it's not compressed with gzip but bzip	2016-04-08 13:38:07 +02:00
Wolfgang Seeker	a8f4e49900	update init_model.py to previous (better) state	2016-03-29 16:12:13 +02:00
Matthew Honnibal	d249e2f7f3	* Improve error message in bin/parser/train.py	2016-03-29 13:04:33 +11:00
Yaser Martinez Palenzuela	3c210f45fa	make use of log_smooth_count	2016-03-17 12:19:52 +01:00
Matthew Honnibal	fcaa0ad7ce	Merge pull request #280 from wbwseeker/german_parser German parser	2016-03-04 03:27:42 +11:00
Wolfgang Seeker	690c5acabf	adjust train.py to train both english and german models	2016-03-03 15:21:00 +01:00
Matthew Honnibal	9d51e4d13c	Delete gather_freqs.py This script was in a broken state, and should be unnecessary. The functionality is subsumed by `get_freqs.py`	2016-03-02 00:42:55 +11:00
Yaser Martinez Palenzuela	1a93d7f725	replace codecs.open with io.open	2016-03-01 14:10:11 +01:00
Wolfgang Seeker	eae35e9b27	add tokenizer files for German, add/change code to train German pos tagger - add files to specify rules for German tokenization - change generate_specials.py to generate from an external file (abbrev.de.tab) - copy gazetteer.json from lang_data/en/ - init_model.py - change doc freq threshold to 0 - add train_german_tagger.py - expects conll09-formatted input	2016-02-18 13:24:20 +01:00
Henning Peters	a89ca6537b	fix cythonize	2016-02-05 16:17:23 +01:00
Henning Peters	3a50448bf3	py3 compatibility	2016-02-05 15:43:50 +01:00
Henning Peters	7627969aba	refactor, listen on setup.py, *.pxd	2016-02-05 15:37:00 +01:00
Matthew Honnibal	5dc6cffc67	* Fix gather_freqs.py	2016-02-04 20:21:58 +01:00
Matthew Honnibal	e2ed6251d7	* Fancy up the CLI for the conll train script	2016-02-02 22:58:06 +01:00
Matthew Honnibal	a676d66807	* Update the CoNLL train script, to get working on other languages	2016-02-02 22:29:34 +01:00
Henning Peters	73674a4afb	try using system-wide headers	2015-12-13 12:51:23 +01:00
Henning Peters	92fabd0114	wrap virtualenv around cythonize	2015-12-13 12:32:22 +01:00
Henning Peters	9662cf04c9	new approach to dependency headers	2015-12-13 11:53:02 +01:00
Matthew Honnibal	6e68b344c1	* Train after parsing, not before.	2015-11-12 04:43:52 +11:00
Matthew Honnibal	4fb038a9eb	* Update conll_train.py script for spaCy v0.97	2015-10-31 00:53:51 +11:00
Matthew Honnibal	cfaa4bde5d	* Add train and parse scripts that use CoNLL formatted data	2015-10-30 12:54:49 +11:00
Matthew Honnibal	2348a08481	* Load/dump strings with a json file, instead of the hacky strings file we were using.	2015-10-22 21:13:03 +11:00
Matthew Honnibal	0ce12e4548	* Import io in get_freqs	2015-10-19 12:56:18 +11:00
Matthew Honnibal	17fffb4c57	* Update get_freqs.py script	2015-10-16 04:33:49 +11:00
Matthew Honnibal	5ff4454177	* Update get_freqs.py script	2015-10-16 04:31:15 +11:00
Matthew Honnibal	a748146dd3	* Update get_freqs.py script	2015-10-16 04:24:50 +11:00
Matthew Honnibal	a29fd79fbc	* Update get_freqs.py script	2015-10-16 04:24:08 +11:00
Matthew Honnibal	e08a4b46a2	* Update get_freqs.py script	2015-10-16 04:20:35 +11:00
Matthew Honnibal	92f750cf8b	* Use a gzipped frequencies file in init_model	2015-10-11 06:59:44 +02:00
Matthew Honnibal	064bd69ad0	* Refactor symbols, so that frequency rank can be derived from the orth id of a word.	2015-10-10 16:03:48 +11:00
Matthew Honnibal	83dccf0fd7	* Use io module insteads of deprecated codecs module	2015-10-10 14:13:01 +11:00
Matthew Honnibal	f35632e2e5	* Remove SBD print statement in train, after SBD evaluation was removed from Scorer	2015-10-09 11:08:58 +02:00
Matthew Honnibal	6ea1601e93	* Add script to train models off the UD treebanks. Note that the UD data is restricted to research purposes only, and should only be used to train models for academic experiments.	2015-10-08 12:01:08 +11:00
Matthew Honnibal	c503654ec1	* Update bin/parser/train for printing output.	2015-10-06 10:35:22 +11:00

205 Commits