spaCy

mirror of https://github.com/explosion/spaCy.git synced 2024-12-26 18:06:29 +03:00

Author	SHA1	Message	Date
ines	bb5c631402	Implement like_num getter for French (via #1161)	2017-09-26 16:47:45 +02:00
ines	15479b3bae	Add comment to like_num re: future work	2017-09-26 16:43:28 +02:00
ines	adda08fe14	Implement like_num getter for Dutch (via #1177)	2017-09-26 16:39:15 +02:00
ines	5ee10379db	Port over changes from #1340	2017-09-26 16:38:08 +02:00
Wannaphong Phatthiyaphaibun	5cba67146c	add thai in spacy2	2017-09-26 21:36:27 +07:00
ines	10d291f129	Port over change from #1351	2017-09-26 16:11:41 +02:00
Matthew Honnibal	3274b46a0d	Try to fix compile error on Windows	2017-09-26 09:05:53 -05:00
Matthew Honnibal	19c7c09bf7	Fix PhraseMatcher.__contains__	2017-09-26 08:35:53 -05:00
Matthew Honnibal	d02a41a8c9	Merge remote-tracking branch 'origin/develop' into feature/phrasematcher	2017-09-26 08:32:55 -05:00
Matthew Honnibal	698fc0d016	Remove merge artefact	2017-09-26 08:31:37 -05:00
Matthew Honnibal	defb68e94f	Update feature/noshare with recent develop changes	2017-09-26 08:15:14 -05:00
Matthew Honnibal	ca28590ddd	Use dep and ent multi-task objectives for parser'	2017-09-26 08:13:52 -05:00
Matthew Honnibal	9bfd585a11	Fix parameter name in .pxd file	2017-09-26 07:28:50 -05:00
Matthew Honnibal	74f08e1ad5	Update test	2017-09-26 06:45:56 -05:00
Matthew Honnibal	5aaef3e7b8	Dont link vectors in vocab deserialize	2017-09-26 06:45:47 -05:00
Matthew Honnibal	18a27c7579	Fix typo in tensorizer serialization	2017-09-26 06:45:14 -05:00
Matthew Honnibal	5056743ad5	Fix parser serialization	2017-09-26 06:44:56 -05:00
Ines Montani	7123139b2b	Add __contains__ to PhraseMatcher	2017-09-26 13:13:27 +02:00
Ines Montani	50ad50f96a	Update matcher.pyx	2017-09-26 13:11:17 +02:00
Matthew Honnibal	e34e70673f	Allow tagger models to be built with pre-defined tok2vec layer	2017-09-26 05:51:52 -05:00
Matthew Honnibal	bf917225ab	Allow multi-task objectives during training	2017-09-26 05:42:52 -05:00
Matthew Honnibal	4ae9ea7684	Remove unused argument in Language	2017-09-26 05:41:35 -05:00
ines	edf7e4881d	Add meta.json option to cli.train and add relevant properties. Add accuracy scores to meta.json instead of accuracy.json and replace all relevant properties like lang, pipeline, spacy_version in existing meta.json. If not present, also add name and version placeholders to make it packagable.	2017-09-25 19:00:47 +02:00
ines	d2d35b63b7	Fix formatting	2017-09-25 18:37:13 +02:00
Matthew Honnibal	8eb0b7b779	Add docstrings for Pipe API	2017-09-25 16:22:07 +02:00
Matthew Honnibal	39f390dba7	Add docstrings for Pipe API	2017-09-25 16:20:49 +02:00
Matthew Honnibal	8716ffe57d	Serialize vocab last	2017-09-24 05:01:45 -05:00
Matthew Honnibal	72bbcc0871	Handle lemmatization for unknown string IDs	2017-09-24 05:01:31 -05:00
Matthew Honnibal	204b58c864	Fix evaluation during training	2017-09-24 05:01:03 -05:00
Matthew Honnibal	dc3a623d00	Remove unused update_shared argument	2017-09-24 05:00:37 -05:00
Matthew Honnibal	63bd87508d	Don't use iterated convolutions	2017-09-23 04:39:17 -05:00
Matthew Honnibal	5a7fd0fd36	Fix vector linkage	2017-09-22 20:11:52 -05:00
Matthew Honnibal	4348c479fc	Merge pre-trained vectors and noshare patches	2017-09-22 20:07:28 -05:00
Matthew Honnibal	7dc61b3f43	Whitespace	2017-09-22 20:00:50 -05:00
Matthew Honnibal	e93d43a43a	Fix training with preset vectors	2017-09-22 20:00:40 -05:00
Matthew Honnibal	0795857dcb	Fix beam parsing	2017-09-23 02:59:53 +02:00
Matthew Honnibal	4bd6a12b1f	Fix Tok2Vec	2017-09-23 02:58:54 +02:00
Matthew Honnibal	386c1a5bd8	Fix tagger training	2017-09-23 02:58:06 +02:00
Matthew Honnibal	a2357cce3f	Set random seed in train script	2017-09-23 02:57:31 +02:00
Matthew Honnibal	05596159bf	Fix serialization when pre-trained vectors	2017-09-22 15:33:27 -05:00
Matthew Honnibal	980fb6e854	Refactor Tok2Vec	2017-09-22 09:38:36 -05:00
Matthew Honnibal	d9124f1aa3	Add link_vectors_to_models function	2017-09-22 09:38:22 -05:00
Matthew Honnibal	a186596307	Add 'reapply' combinator, for iterated CNN	2017-09-22 09:37:03 -05:00
Matthew Honnibal	40a4873b70	Fix serialization of model options	2017-09-21 13:07:26 -05:00
Matthew Honnibal	0a9016cade	Fix serialization during training	2017-09-21 13:06:45 -05:00
Matthew Honnibal	20193371f5	Don't share CNN, to reduce complexities	2017-09-21 14:59:48 +02:00
Matthew Honnibal	1d73dec8b1	Refactor train script	2017-09-20 19:17:10 -05:00
Matthew Honnibal	ffda38356a	Add util function to enable GPU	2017-09-20 19:16:35 -05:00
Matthew Honnibal	24e85c2048	Pass values for CNN maxout pieces option	2017-09-20 19:16:12 -05:00
Matthew Honnibal	b832f89ff8	Add resume_training function	2017-09-20 19:15:20 -05:00
Matthew Honnibal	f5144f04be	Add argument for CNN maxout pieces	2017-09-20 19:14:41 -05:00
Matthew Honnibal	842e21de9f	Fix int type error for Python 2	2017-09-20 23:55:30 +02:00
Matthew Honnibal	0c93c73e49	Add __reduce__ method for PhraseMatcher	2017-09-20 22:26:40 +02:00
Matthew Honnibal	cc408fc189	Make PhraseMatcher API like Matcher API	2017-09-20 22:20:35 +02:00
Matthew Honnibal	43ad250dd5	Update matcher tests	2017-09-20 21:54:49 +02:00
Matthew Honnibal	828cc91545	Fix PhraseMatcher for spaCy 2	2017-09-20 21:54:31 +02:00
Matthew Honnibal	78301b2d29	Avoid comparison to None in Tok2Vec	2017-09-20 00:19:34 +02:00
Matthew Honnibal	b36a38f63d	Fix serialization of pretrained_dims property	2017-09-19 23:42:27 +02:00
Matthew Honnibal	2489dcaccf	Fix serialization of parser	2017-09-19 23:42:12 +02:00
Matthew Honnibal	40837b275d	Fix tensorizer with pretrained vectors	2017-09-18 18:05:38 -05:00
Matthew Honnibal	a0c4b33d03	Support resuming a model during spacy train	2017-09-18 18:04:47 -05:00
Matthew Honnibal	c858927271	Copy vectors to GPU on begin training	2017-09-18 18:04:16 -05:00
Matthew Honnibal	3fa76c17d1	Refactor Tok2Vec	2017-09-18 15:00:05 -05:00
Matthew Honnibal	217e7891cd	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-09-18 11:36:21 -05:00
Matthew Honnibal	7b3f391f80	Try dropping the Affine layer, conditionally	2017-09-18 11:35:59 -05:00
ines	2480f8f521	Add missing return in Doc.from_disk() (closes #1330)	2017-09-18 15:32:00 +02:00
Matthew Honnibal	2148ae605b	Dont use iterated convolutions	2017-09-17 17:36:04 -05:00
Matthew Honnibal	c013e5996f	Fix parser test	2017-09-17 13:13:20 -05:00
Matthew Honnibal	8f42f8d305	Remove unused 'preprocess' argument in Tok2Vec'	2017-09-17 12:30:16 -05:00
Matthew Honnibal	039d609362	Remove hard-coded default vectors width	2017-09-17 12:29:39 -05:00
Matthew Honnibal	4f38a67a89	Make width default to 0 in vectors.pyx	2017-09-17 12:29:14 -05:00
Matthew Honnibal	16122f566e	Fix cpdef enum in attrs.pyx	2017-09-17 12:28:53 -05:00
Matthew Honnibal	b159e0eb50	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-09-17 05:47:50 -05:00
Matthew Honnibal	2b0efc77ae	Fix wiring of pre-trained vectors in parser loading	2017-09-17 05:47:34 -05:00
Matthew Honnibal	31c2e91c35	Fix wiring of pre-trained vectors in parser loading	2017-09-17 05:46:55 -05:00
Matthew Honnibal	8f913a74ca	Fix defaults and args to build_tagger_model	2017-09-17 05:46:36 -05:00
Matthew Honnibal	c003c561c3	Revert NER action loading change, for model compatibility	2017-09-17 05:46:03 -05:00
Matthew Honnibal	43210abacc	Resolve fine-tuning conflict	2017-09-17 05:30:04 -05:00
ines	ece30c28a8	Don't split hyphenated words in German. This way, the tokenizer matches the tokenization in German treebanks	2017-09-16 20:40:15 +02:00
ines	68f66aebf8	Use pkg_resources instead of pip for is_package (resolves #1293)	2017-09-16 20:27:59 +02:00
Matthew Honnibal	5ff2491f24	Pass option for pre-trained vectors in parser	2017-09-16 12:47:21 -05:00
Matthew Honnibal	8665a77f48	Fix feature error in NER	2017-09-16 12:46:57 -05:00
Matthew Honnibal	e37a50a436	Pass documents to tensorizer, not 'features'	2017-09-16 12:46:36 -05:00
Matthew Honnibal	84e637e2e6	Pass option for pretrained vectors in pipeline	2017-09-16 12:46:02 -05:00
Matthew Honnibal	2a93404da6	Support optional pre-trained vectors in tensorizer model	2017-09-16 12:45:37 -05:00
Matthew Honnibal	e0a2aa9289	Support having word vectors data on GPU	2017-09-16 12:45:09 -05:00
Matthew Honnibal	ebf8942564	Fix test for Python3	2017-09-16 16:22:38 +02:00
Matthew Honnibal	8c945310fb	Excuse emoji failure on narrow unicode builds	2017-09-16 16:21:13 +02:00
Matthew Honnibal	11f2a05ede	Fix code explosion from long enum in Python 3, Cython 0.24+	2017-09-16 12:20:04 +02:00
Matthew Honnibal	3fa5b40b5c	Add test for hash consistency	2017-09-16 11:21:35 +02:00
Matthew Honnibal	f730d07e4e	Fix prange error for Windows	2017-09-16 00:25:33 +02:00
Matthew Honnibal	4b2065430e	Merge branch 'feature/parser-history' into develop	2017-09-15 10:42:20 +02:00
Matthew Honnibal	2f08489694	Remove AddHistory layer -- didnt work as planned	2017-09-15 10:41:40 +02:00
Matthew Honnibal	8b481e0465	Remove redundant brackets	2017-09-15 10:38:08 +02:00
Matthew Honnibal	d84607f6bb	Vectorize update in AddHistory	2017-09-14 20:34:40 +02:00
Ines Montani	bd3da3d6fb	Port over change from #1323 and tidy up	2017-09-14 19:23:13 +02:00
Matthew Honnibal	18347ab69c	Implement AddHistory layer wrapper	2017-09-14 19:07:35 +02:00
Matthew Honnibal	d4ca6cef9e	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-09-14 17:00:07 +02:00
Matthew Honnibal	8c503487af	Fix lookup of missing NER actions	2017-09-14 16:59:45 +02:00
Matthew Honnibal	664c5af745	Revert padding in parser	2017-09-14 16:59:25 +02:00
Matthew Honnibal	8496d76224	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-09-14 09:21:20 -05:00
Matthew Honnibal	d1518027a9	Increment version	2017-09-14 16:18:46 +02:00
Matthew Honnibal	70da88a3a7	Update comment on Language.begin_training	2017-09-14 16:18:30 +02:00
Matthew Honnibal	c6395b057a	Improve parser feature extraction, for missing values	2017-09-14 16:18:02 +02:00
Matthew Honnibal	daf869ab3b	Fix add_action for NER, so labelled 'O' actions aren't added	2017-09-14 16:16:41 +02:00
Matthew Honnibal	9cb2aef587	Remove print statement	2017-09-14 13:38:28 +02:00
Matthew Honnibal	ba23d63c35	Fix minibatch function, for fixed batch size	2017-09-14 13:37:41 +02:00
Matthew Honnibal	456bb8a74c	Unxfail and close #1305	2017-09-06 19:14:17 +02:00
Matthew Honnibal	99e44fbdbb	Update regression test	2017-09-06 19:13:51 +02:00
Matthew Honnibal	5c3ff06924	Fix lemmatizer rules	2017-09-06 19:13:24 +02:00
Matthew Honnibal	dd9cab0faf	Fix type-check for int/long	2017-09-06 19:03:05 +02:00
Matthew Honnibal	497a9308a8	Xfail new lemmatizer test	2017-09-06 18:41:22 +02:00
Matthew Honnibal	dcbf866970	Merge parser changes	2017-09-06 18:41:05 +02:00
Matthew Honnibal	5384fff5ce	Add test for 1305: Incorrect lemmatization of VBZ for English	2017-09-06 18:40:18 +02:00
Matthew Honnibal	24ff6b0ad9	Fix parsing and tok2vec models	2017-09-06 05:50:58 -05:00
Matthew Honnibal	1b65115bc2	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-09-04 20:02:53 -05:00
Matthew Honnibal	33fa91feb7	Restore correctness of parser model	2017-09-04 21:19:30 +02:00
Matthew Honnibal	e88a42e460	Increment version	2017-09-04 21:14:39 +02:00
Matthew Honnibal	9d65d67985	Preserve model compatibility in parser, for now	2017-09-04 16:46:22 +02:00
Matthew Honnibal	d5fbf27335	Fix test	2017-09-04 16:45:11 +02:00
Matthew Honnibal	7fdafcc4c4	Fix config loading in tagger	2017-09-04 16:38:49 +02:00
Matthew Honnibal	058372d120	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-09-04 16:27:53 +02:00
Matthew Honnibal	16e25ce3b5	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-09-04 09:26:53 -05:00
Matthew Honnibal	9f512e657a	Fix drop_layer calculation	2017-09-04 09:26:38 -05:00
Matthew Honnibal	cb4839033c	Fix loader for EN tests	2017-09-04 15:19:18 +02:00
Matthew Honnibal	382ce566eb	Fix deserialization bug	2017-09-04 15:19:01 +02:00
Matthew Honnibal	bfddf50081	Fix #1296: Incorrect lemmatization of base form verbs	2017-09-04 15:18:41 +02:00
Matthew Honnibal	b29e6bff46	Improve lemmatization rule for am|VBP	2017-09-04 15:18:10 +02:00
Matthew Honnibal	644d6c9e1a	Improve lemmatization tests, re #1296	2017-09-04 15:17:44 +02:00
Matthew Honnibal	3cf3fa1704	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-09-02 12:46:11 -05:00
Matthew Honnibal	e920885676	Fix pickle during train	2017-09-02 12:46:01 -05:00
Matthew Honnibal	c0eaba8b28	Fix low-data textcat	2017-09-02 15:17:32 +02:00
Matthew Honnibal	9e378bdac5	Fix textcat serialization	2017-09-02 15:17:20 +02:00
Matthew Honnibal	e3ea6ee02b	Increment version	2017-09-02 15:17:01 +02:00
Matthew Honnibal	a3b69bcb3d	Add low_data mode in textcat	2017-09-02 14:56:30 +02:00
Matthew Honnibal	ead78c7b9b	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-09-02 12:55:25 +02:00
Matthew Honnibal	5e6a9e7dcc	Add rule-based SBD	2017-09-02 12:53:38 +02:00
Matthew Honnibal	a824cf8f9a	Adjust text classification model	2017-09-02 11:41:00 +02:00
Matthew Honnibal	ac040b99bb	Add support for pre-trained vectors in text classifier	2017-09-01 16:39:55 +02:00
Matthew Honnibal	7742a6d559	Add GloVe vectors reader	2017-09-01 16:39:22 +02:00
Matthew Honnibal	789e1a3980	Use 13 parser features, not 8	2017-08-31 14:13:00 -05:00
Matthew Honnibal	30e35d9666	Fix syntax error	2017-08-30 17:35:39 -05:00
Matthew Honnibal	4ceebde523	Fix gradient bug in parser	2017-08-30 17:32:56 -05:00
ines	173089a45a	Add more validation for model meta	2017-08-29 11:21:46 +02:00
Matthew Honnibal	2e28982e28	Merge pull request #1288 from geovedi/indonesian: Indonesian language support	2017-08-26 21:31:13 +02:00
ines	7e04b7f89c	Fix info text on pipeline in package cli	2017-08-26 18:30:59 +02:00
ines	40afa13a8a	Increment version	2017-08-26 18:30:49 +02:00
Matthew Honnibal	876f38c548	Merge pull request #1279 from oroszgy/model_cli_v2: Added vector loading to model cli	2017-08-26 15:57:50 +02:00
Matthew Honnibal	cfc055734e	Split % in units, for compatibility with corpus	2017-08-25 20:03:37 -05:00
Matthew Honnibal	4bb6bc3f9e	Add support for sent_start to GoldParse	2017-08-25 20:03:14 -05:00

3931 Commits