spaCy

mirror of https://github.com/explosion/spaCy.git synced 2026-02-08 00:09:45 +03:00

Author	SHA1	Message	Date
Matthew Honnibal	cf5fcf0546	Update serialization test	2018-03-28 20:12:53 +02:00
Matthew Honnibal	4555e3e251	Dont assume pretrained_vectors cfg set in build_tagger	2018-03-28 20:12:45 +02:00
Matthew Honnibal	f8dd905a24	Warn and fallback if vectors have no name	2018-03-28 18:24:53 +02:00
Matthew Honnibal	fd9e259414	Add test for #1660	2018-03-28 18:22:51 +02:00
Matthew Honnibal	bc4afa9881	Remove print statement	2018-03-28 17:48:37 +02:00
Matthew Honnibal	79dc241caa	Set pretrained_vectors in parser cfg	2018-03-28 17:35:07 +02:00
Matthew Honnibal	17c3e7efa2	Add message noting vectors	2018-03-28 16:33:43 +02:00
Matthew Honnibal	9bf6e93b3e	Set pretrained_vectors in begin_training	2018-03-28 16:32:41 +02:00
Matthew Honnibal	95a9615221	Fix loading of multiple pre-trained vectors This patch addresses #1660, which was caused by keying all pre-trained vectors with the same ID when telling Thinc how to refer to them. This meant that if multiple models were loaded that had pre-trained vectors, errors or incorrect behaviour resulted. The vectors class now includes a .name attribute, which defaults to: {nlp.meta['lang']_nlp.meta['name']}.vectors The vectors name is set in the cfg of the pipeline components under the key pretrained_vectors. This replaces the previous cfg key pretrained_dims. In order to make existing models compatible with this change, we check for the pretrained_dims key when loading models in from_disk and from_bytes, and add the cfg key pretrained_vectors if we find it.	2018-03-28 16:02:59 +02:00
Matthew Honnibal	070b6c6495	Remove dependency on ftfy	2018-03-28 12:07:02 +02:00
ines	6d2c85f428	Drop six and related hacks as a dependency	2018-03-28 10:45:25 +02:00
ines	9e83513004	Add position of invalid token to error message	2018-03-27 23:56:59 +02:00
ines	11c4735ccf	Fix issue in Italian lemmatizer data (resolves #2050)	2018-03-27 23:55:22 +02:00
ines	693971dd8f	Improve error message if token text is empty string (see #2101)	2018-03-27 22:25:40 +02:00
ines	0c829e6605	Fix whitespace	2018-03-27 22:20:59 +02:00
Matthew Honnibal	d4680e4d83	Merge branch 'master' of https://github.com/explosion/spaCy	2018-03-27 13:36:37 +02:00
Matthew Honnibal	63a267b34d	Fix #2073: Token.set_extension not working	2018-03-27 13:36:20 +02:00
Ines Montani	68226109f4	Merge pull request #2142 from jimregan/polish-more-tokens more exceptions	2018-03-24 19:06:44 +01:00
Matthew Honnibal	d566e673bf	Set version to v2.0.10	2018-03-24 18:09:03 +01:00
Matthew Honnibal	0d3bf0d4eb	Merge branch 'master' of https://github.com/explosion/spaCy	2018-03-24 17:31:49 +01:00
dejanmarich	ccd1c04c63	Update stop_words.py Added more words	2018-03-24 17:31:24 +01:00
ines	f1446b0257	Port over Turkish changes	2018-03-24 17:31:07 +01:00
DuyguA	cd604878a4	quick typo fix	2018-03-24 17:26:35 +01:00
Matthew Honnibal	406548b976	Support .gz and .tar.gz files in spacy init-model	2018-03-24 17:18:32 +01:00
Jim O'Regan	efe037e8be	more exceptions	2018-03-24 00:05:27 +00:00
Matthew Honnibal	e3be3d65b3	Version as v2.0.10.dev0	2018-03-15 17:31:22 +01:00
ines	f3f8bfc367	Add built-in factories for merge_entities and merge_noun_chunks Allows adding those components to the pipeline out-of-the-box if they're defined in a model's meta.json. Also allows usage as nlp.add_pipe(nlp.create_pipe('merge_entities')).	2018-03-15 17:16:54 +01:00
alldefector	f4e5904fc2	Fix Spanish noun_chunks failure caused by typo	2018-03-14 17:03:17 +01:00
Thomas Opsomer	fbf48b3f9f	lemma property to return hash instead of unicode	2018-03-14 17:03:00 +01:00
Matthew Honnibal	8cefc58abc	Fix Vectors pickling	2018-03-14 16:59:37 +01:00
Matthew Honnibal	307aefe131	Increment version to v2.0.9	2018-02-22 17:07:53 +01:00
Ines Montani	14e7e0f12a	Merge pull request #2000 from jimregan/polish-tag-map Polish tag map	2018-02-18 19:05:58 +01:00
Jim O'Regan	664407de5d	missing PrepCase attribute	2018-02-18 14:46:12 +00:00
Jim O'Regan	95f0673fbc	fix typo/missing here too	2018-02-18 14:38:27 +00:00
Matthew Honnibal	cf0e320f2b	Add doc.is_sentenced attribute, re #1959	2018-02-18 14:16:55 +01:00
Matthew Honnibal	1e5aeb4eec	Merge pull request #1987 from thomasopsomer/span-sent Make span.sent work when only manual / custom sbd	2018-02-18 14:05:37 +01:00
Matthew Honnibal	1cf774bdc1	Add output options return_matches and as_tuples to Matcher	2018-02-18 14:00:45 +01:00
Matthew Honnibal	dd9b0945af	Fix inconsistencies in the symbols table	2018-02-18 13:51:31 +01:00
Matthew Honnibal	66496ac8e1	Set version to v2.1.0.dev0	2018-02-18 13:48:39 +01:00
Matthew Honnibal	eb3040ce46	Merge pull request #1891 from fucking-signup/master Fix issue #1889	2018-02-18 13:47:47 +01:00
ines	6bba1db4cc	Drop six and related hacks as a dependency	2018-02-18 13:29:56 +01:00
Matthew Honnibal	b30b09192a	Merge pull request #1665 from jimregan/animacy typo in "inan", add "nhum"	2018-02-18 13:26:53 +01:00
Matthew Honnibal	1b3c98e01b	Set version to v2.0.8	2018-02-18 12:16:31 +01:00
Matthew Honnibal	f9f46e5a07	Revert matcher fixes from GregDubbin	2018-02-18 10:59:28 +01:00
Matthew Honnibal	86405e4ad1	Fix CLI for multitask objectives	2018-02-18 10:59:11 +01:00
Matthew Honnibal	a34749b2bf	Add multitask objectives options to train CLI	2018-02-17 22:03:54 +01:00
Matthew Honnibal	8f06903e09	Fix multitask objectives	2018-02-17 18:41:36 +01:00
Matthew Honnibal	d1246c95fb	Fix model loading when using multitask objectives	2018-02-17 18:11:36 +01:00
Matthew Honnibal	262d0a3148	Fix overwriting of lexical attributes when loading vectors during training	2018-02-17 18:11:11 +01:00
Matthew Honnibal	c0caf7cf27	Fix LANG symbol	2018-02-17 18:10:50 +01:00


4810 Commits
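
Note on commit 95a9615221: the message describes a backward-compatibility step in which a legacy component cfg carrying the pretrained_dims key gains a pretrained_vectors key on load, with the vectors name defaulting to the model's language and name followed by ".vectors". The following is a minimal sketch of that idea only, not spaCy's actual code. The function name upgrade_component_cfg, its parameters, and the example values are hypothetical and chosen for illustration.

```python
def upgrade_component_cfg(cfg, lang, name):
    """Return a copy of a component cfg with 'pretrained_vectors' filled in.

    Hypothetical helper: mirrors the behaviour described in the commit
    message, not the real from_disk / from_bytes implementation.
    """
    upgraded = dict(cfg)  # leave the caller's dict untouched
    # Legacy models only carry 'pretrained_dims'; add the new key for them.
    if "pretrained_dims" in upgraded and "pretrained_vectors" not in upgraded:
        # Default vectors name as described: "<lang>_<name>.vectors"
        upgraded["pretrained_vectors"] = "{}_{}.vectors".format(lang, name)
    return upgraded


# Example values are assumptions, not taken from the commit.
legacy_cfg = {"pretrained_dims": 300}
new_cfg = upgrade_component_cfg(legacy_cfg, "en", "core_web_md")
print(new_cfg["pretrained_vectors"])
```

Because each loaded model gets a vectors name derived from its own metadata, two models with pre-trained vectors no longer share one key, which is the collision the commit message attributes to #1660.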