spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-02-05 14:10:34 +03:00

Author	SHA1	Message	Date
Matthew Honnibal	95a9615221	Fix loading of multiple pre-trained vectors. This patch addresses #1660, which was caused by keying all pre-trained vectors with the same ID when telling Thinc how to refer to them. This meant that if multiple models were loaded that had pre-trained vectors, errors or incorrect behaviour resulted. The vectors class now includes a .name attribute, which defaults to: {nlp.meta['lang']_nlp.meta['name']}.vectors. The vectors name is set in the cfg of the pipeline components under the key pretrained_vectors. This replaces the previous cfg key pretrained_dims. In order to make existing models compatible with this change, we check for the pretrained_dims key when loading models in from_disk and from_bytes, and add the cfg key pretrained_vectors if we find it.	2018-03-28 16:02:59 +02:00
ines	6d2c85f428	Drop six and related hacks as a dependency	2018-03-28 10:45:25 +02:00
ines	f3f8bfc367	Add built-in factories for merge_entities and merge_noun_chunks. Allows adding those components to the pipeline out-of-the-box if they're defined in a model's meta.json. Also allows usage as nlp.add_pipe(nlp.create_pipe('merge_entities')).	2018-03-15 17:16:54 +01:00
Matthew Honnibal	f9f46e5a07	Revert matcher fixes from GregDubbin	2018-02-18 10:59:28 +01:00
Aaron Marquez	f0d3672e17	Changed loading EN model	2018-02-15 14:28:38 -08:00
Aaron Marquez	7ba4111554	Add test for issue-1959	2018-02-15 12:46:22 -08:00
Matthew Honnibal	4cb861e080	Merge pull request #1968 from DuyguA/is_currency New lexical feature is_currency	2018-02-15 12:13:36 +01:00
Claudiu-Vlad Ursache	e28de12cbd	Ensure files opened in `from_disk` are closed. Fixes [issue 1706](https://github.com/explosion/spaCy/issues/1706).	2018-02-13 20:49:43 +01:00
4altinok	471d3c9e23	added lex test for is_currency	2018-02-11 18:50:50 +01:00
Matthew Honnibal	fd9fd275c5	Make test for #1945 more precise	2018-02-07 02:06:11 +01:00
Matthew Honnibal	c087a14380	Merge branch 'master' of https://github.com/explosion/spaCy	2018-02-07 01:29:39 +01:00
Matthew Honnibal	76d89b2180	Add test for #1945 : PhraseMatcher regression	2018-02-07 01:29:23 +01:00
Matthew Honnibal	2e7391e627	Merge pull request #1916 from tokestermw/bug/fix-not-passing-in-model-cfg-in-nlp Bug/fix not passing in model cfg in nlp	2018-02-05 01:19:40 +01:00
Matthew Honnibal	f74a802d09	Test and fix #1919 : Error resuming training	2018-02-02 02:32:40 +01:00
Motoki Wu	54062b7326	added tests for issue #1915	2018-01-30 18:30:19 -08:00
ines	8901814248	Improve error handling if pipeline component is not callable (resolves #1911). Also add help message if user accidentally calls nlp.add_pipe() with a string of a built-in component name.	2018-01-30 15:43:03 +01:00
Matthew Honnibal	512e6adb08	Merge pull request #1896 from thomasopsomer/fix-sent Fix sentence boundaries serialization (issue #1834)	2018-01-28 21:18:51 +01:00
Matthew Honnibal	f5b1ad4100	Limit parser model size, to hopefully reduce memory during CI tests	2018-01-28 21:00:32 +01:00
Thomas Opsomer	45d62561f7	add test for the issue	2018-01-28 19:49:56 +01:00
Matthew Honnibal	6a8cb905aa	Merge pull request #1876 from GregDubbin/master Pattern matcher fixes	2018-01-24 16:38:11 +01:00
Matthew Honnibal	edb71a280e	Add test for #1883 : Unpickling Matcher	2018-01-24 15:42:33 +01:00
Matthew Honnibal	42a18ef903	Add test for #1868 : Vocab.__contains__ with ints	2018-01-23 23:27:05 +01:00
greg	85ab99e692	Correct test examples	2018-01-23 15:00:14 -05:00
Matthew Honnibal	91e916cb67	Add comment to new test	2018-01-23 19:11:53 +01:00
Matthew Honnibal	fd187d71ad	Add test for #1727	2018-01-23 19:11:01 +01:00
Matthew Honnibal	7e6dc283db	Fix unicode import in test	2018-01-22 23:55:44 +01:00
greg	686735b94e	Fix matcher import	2018-01-22 16:53:05 -05:00
Matthew Honnibal	4ce7d24fd5	Add test for #1799 : Set left and right edges (and thus sentences) in non-projective parses.	2018-01-22 20:18:38 +01:00
greg	7072b395c9	Add greedy matcher tests	2018-01-16 15:46:13 -05:00
Matthew Honnibal	ccb51a9f36	Make .similarity() return 1.0 if all orth attrs match	2018-01-15 16:29:48 +01:00
Matthew Honnibal	82135d85b7	Fix test	2018-01-15 15:55:15 +01:00
Matthew Honnibal	4b09616b58	Add test for #1757 : Comparison against None	2018-01-15 15:55:01 +01:00
Matthew Honnibal	9e413449f6	Fix unicode error in new test	2018-01-15 15:39:00 +01:00
Matthew Honnibal	6b215d2dd3	Add test for Issue #1537	2018-01-15 15:20:56 +01:00
ines	5babb7d6f6	Merge branch 'master' of https://github.com/explosion/spaCy	2018-01-14 17:31:09 +01:00
ines	793890cb4d	Remove test for removed deprecation warning	2018-01-14 17:31:06 +01:00
Matthew Honnibal	1a1cca6052	Fix vectors.resize() on Py3. Closes #1539	2018-01-14 14:48:51 +01:00
Matthew Honnibal	0153220304	Make set_vector add word to vocab. Fixes #1807	2018-01-14 13:57:57 +01:00
Ines Montani	55754f0cee	Merge pull request #1836 from fucking-signup/master Add tests for issue #1769	2018-01-13 00:23:35 +00:00
Kit	4ee97f20a0	Mark like_num tests as slow	2018-01-13 00:44:15 +01:00
Kit	855531537e	Rewrite tests for issue #1769	2018-01-12 23:49:51 +01:00
Kit	5b541cb5ec	Simplify tests for issue #1769	2018-01-12 23:34:27 +01:00
Kit	7a2adc4633	Remove some tests to see build status changes	2018-01-12 22:49:16 +01:00
Kit	0e62809a43	Rewrite tests for issue #1769	2018-01-12 22:26:06 +01:00
Ines Montani	36f426fe0a	Merge pull request #1808 from fucking-signup/master Fix issue #1769	2018-01-12 21:12:02 +00:00
Kit	76f4eeca44	Remove tests to see build changes on Windows (Python 2.7)	2018-01-12 20:30:51 +01:00
Kit	7ec0956e8d	Add regression test (issue #1769 )	2018-01-08 03:42:04 +01:00
Søren Lind Kristiansen	62de5da1ff	Remove unused dummy variable	2018-01-05 09:57:24 +01:00
Søren Lind Kristiansen	10dab8eef8	Remove dummy variable from function calls	2018-01-05 09:37:05 +01:00
Kevin Humphreys	597df5bf83	add test	2018-01-03 13:00:05 -08:00
Ines Montani	ff9fc945ab	Merge pull request #1749 from sorenlind/da_ud_tokenization Tune Danish tokenizer to more closely match Universal Dependencies	2017-12-22 16:00:49 +00:00
ines	26f313dabc	Fix missing import	2017-12-22 16:21:44 +01:00
ines	8dc1c27841	Merge branch 'master' of https://github.com/explosion/spaCy	2017-12-22 16:01:00 +01:00
ines	b10ba848b8	xfail test that causes MemoryError on Python 2 on Windows. Need to investigate this further!	2017-12-22 16:00:58 +01:00
Ines Montani	a3dd167d7f	Merge branch 'master' into da_ud_tokenization	2017-12-20 21:05:34 +00:00
Ines Montani	d682a8803e	Merge pull request #1672 from cbilgili/master Adds Turkish Lemmatization	2017-12-20 21:01:00 +00:00
Søren Lind Kristiansen	15d13efafd	Tune Danish tokenizer to more closely match tokenization in Universal Dependencies.	2017-12-20 17:36:52 +01:00
Ines Montani	9c1ee65268	Add regression test for #1698	2017-12-12 10:36:11 +01:00
Isaac Sijaranamual	38021fbb00	Switch from python 3 only TemporaryDirectory to pytest's tmpdir	2017-12-11 00:16:04 +01:00
Isaac Sijaranamual	568130ce7c	Adds regression test_issue1622	2017-12-10 23:00:48 +01:00
Matthew Honnibal	36b47e3fa6	Fix (and test) vector pickling	2017-12-07 09:53:30 +01:00
Canbey Bilgili	abe098b255	Adds Turkish Lemmatization	2017-12-01 17:04:32 +03:00
Vadim Mazaev	4ba7ddf651	Bugfixes	2017-11-30 12:29:38 +03:00
Matthew Honnibal	6bc0f4d29f	Merge pull request #1611 from fsonntag/master Solving #1494	2017-11-29 23:11:23 +01:00
Matthew Honnibal	f9ed9ea529	Merge pull request #1624 from GreenRiverRUS/russian Add support for Russian	2017-11-29 23:10:01 +01:00
ines	a31506e060	Fix off-by-one error in nlp.add_pipe(after=name) (fixes #1654 )	2017-11-28 20:37:55 +01:00
ines	b62739fbfe	Add regression test for #1654	2017-11-28 20:27:54 +01:00
ines	2e50dbb9d7	Simplify test	2017-11-28 20:27:27 +01:00
Felix Sonntag	724ae7dc55	Fixed issue of infix capturing prefixes	2017-11-28 17:17:12 +01:00
Søren Lind Kristiansen	0ffd27b0f6	Add several Danish alternative spellings	2017-11-27 13:35:41 +01:00
Vadim Mazaev	53e7c38637	Fixed tests depends on pymorphy2	2017-11-26 21:04:44 +03:00
Vadim Mazaev	cacd859dcd	Added tag map, fixed tests fails, added more exceptions	2017-11-26 20:54:48 +03:00
Ines Montani	a7bb8f1b42	Merge pull request #1637 from sorenlind/da_tokenization Improve Danish tokenization	2017-11-26 15:41:38 +00:00
ines	c699aec089	Add offsets_from_biluo_tags helper and tests (see #1626 )	2017-11-26 16:38:01 +01:00
Søren Lind Kristiansen	6aa241bcec	Add day of month tokenizer exceptions for Danish.	2017-11-24 15:03:24 +01:00
Søren Lind Kristiansen	0c276ed020	Add weekday abbreviations and remove ambiguous month abbreviations for Danish.	2017-11-24 14:43:29 +01:00
Søren Lind Kristiansen	056547e989	Add multiple tokenizer exceptions for Danish.	2017-11-24 11:51:26 +01:00
Søren Lind Kristiansen	8dc265ac0c	Add test for tokenization of 'i.' for Danish.	2017-11-24 11:29:37 +01:00
Matthew Honnibal	30ba81f881	Merge pull request #1576 from ligser/master Actually reset caches in pipe [wip]	2017-11-23 12:54:48 +01:00
ines	c90fe92e15	Fix displaCy test	2017-11-22 05:04:39 +01:00
ines	a6f33ac27d	Fix displaCy test	2017-11-22 04:19:28 +01:00
Vadim Mazaev	81314f8659	Fixed tokenizer: added char classes; added first lemmatizer and tokenizer tests	2017-11-21 22:23:59 +03:00
Burton DeWilde	635792997c	Add regression test for #1612	2017-11-20 12:05:35 -06:00
ines	d70a64d78b	Fix syntax error and formatting in test (see #1617 )	2017-11-20 14:01:25 +01:00
ines	17849dee4b	Fix French test (see #1617 )	2017-11-20 13:59:59 +01:00
Felix Sonntag	8be3392302	Added regression test for 1494	2017-11-19 16:30:35 +01:00
Motoki Wu	b818afaa0e	Added failing test for Issue #1207 . The noun chunk iterator should work for `Doc` but not for `Span`.	2017-11-17 17:04:27 -08:00
ines	a3d4dd1a5d	Test adding of lots of pipeline components (see #1585). Just to make sure that there's no error now or in the future with adding a large number of pipeline components.	2017-11-15 17:28:06 +01:00
Roman Domrachev	505c6a2f2f	Completely clean up tokenizer cache. The tokenizer cache can have keys other than strings. That modification can slow down the tokenizer and needs to be measured.	2017-11-15 17:55:48 +03:00
Roman Domrachev	3e21680814	Use safer method to get string without hit	2017-11-14 22:58:46 +03:00
Roman Domrachev	4e378dc4a4	Remove all obsolete code and test only initial problem	2017-11-14 20:45:04 +03:00
Roman	47ce2347b0	Create test that fails when actual cleanup caused	2017-11-14 20:28:13 +03:00
Roman Domrachev	3d247d2bb8	Get back previous testcase	2017-11-14 18:01:37 +03:00
Roman Domrachev	a2745b0e84	StringStore now actually cleaned. Do not lose docs in ref tracking	2017-11-14 17:45:50 +03:00
Roman Domrachev	ee60a52ee7	Fix test imports and last batch cleanup	2017-11-11 11:32:16 +03:00
Roman Domrachev	3c600adf23	Try to fix StringStore clean up (see #1506 )	2017-11-11 03:11:27 +03:00
ines	ee97fd3cb4	Add regression test for #1547	2017-11-11 00:14:03 +01:00
ines	2df27db671	Add unicode declaration	2017-11-11 00:13:56 +01:00
ines	1c218397f6	Ensure path in Doc.to_disk/from_disk (resolves #1521). Also add Doc serialization tests with both Path and string path options	2017-11-09 02:29:03 +01:00
Matthew Honnibal	a5ea0fdf5a	Fix #1518 : vocab.vectors.resize() didn't work	2017-11-08 22:18:37 +01:00
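
The backward-compatibility check described in commit 95a9615221 can be illustrated roughly as follows. This is only a sketch based on the commit message, not spaCy's actual code: the helper names `default_vectors_name` and `upgrade_cfg` are hypothetical, the `meta` values are example data, and keeping the legacy `pretrained_dims` key in place is an assumption.

```python
# Hypothetical helpers for illustration; these are not spaCy functions.

def default_vectors_name(meta):
    """Build the default name from the commit message: <lang>_<name>.vectors."""
    return "{}_{}.vectors".format(meta["lang"], meta["name"])


def upgrade_cfg(cfg, meta):
    """Return a copy of cfg with 'pretrained_vectors' added when only the
    legacy 'pretrained_dims' key is present."""
    upgraded = dict(cfg)
    if "pretrained_dims" in upgraded and "pretrained_vectors" not in upgraded:
        # Assumption: the legacy key is left untouched rather than removed.
        upgraded["pretrained_vectors"] = default_vectors_name(meta)
    return upgraded


# Example values only; a real model's meta.json would supply these.
meta = {"lang": "en", "name": "core_web_md"}
old_cfg = {"pretrained_dims": 300}
new_cfg = upgrade_cfg(old_cfg, meta)
print(new_cfg["pretrained_vectors"])
```

Because each model gets its own vectors name, two models loaded side by side would no longer share a single key, which is the failure mode the commit message attributes to #1660.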

962 Commits