spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-03-17 16:42:16 +03:00

Author	SHA1	Message	Date
Ines Montani	139428c20f	Set unique vector names in tests	2019-09-16 15:16:54 +02:00
Ines Montani	655b434553	Merge branch 'master' into develop	2019-09-12 11:39:18 +02:00
tamuhey	71909cdf22	Fix iss4278 (#4279) * fix: len(tuple) == 2 * (#4278) add fail test * add contributor's agreement	2019-09-12 10:44:49 +02:00
Ines Montani	e82a8d0d7a	Merge branch 'master' into develop	2019-09-11 11:52:38 +02:00
Ines Montani	8f9f48b04c	Add GreekLemmatizer.lookup (resolves #4272)	2019-09-11 11:44:40 +02:00
Ines Montani	6279d74c65	Tidy up and auto-format	2019-09-11 11:38:22 +02:00
Matthew Honnibal	7b858ba606	Update from master	2019-09-10 20:14:08 +02:00
adrianeboyd	3780e2ff50	Flush tokenizer cache when necessary (#4258) Flush tokenizer cache when affixes, token_match, or special cases are modified. Fixes #4238, same issue as in #1250.	2019-09-08 20:52:46 +02:00
Matthew Honnibal	1a65c5b7af	Update develop from master	2019-09-08 18:21:41 +02:00
Adriane Boyd	0f28418446	Add regression test for #1061 back to test suite	2019-09-04 20:42:24 +02:00
Ines Montani	419ae59c79	Make flaky test test_issue_1971_4 more explicit	2019-08-31 14:08:05 +02:00
svlandeg	7bec0ebbcb	failing unit test for Issue 4190	2019-08-28 14:16:34 +02:00
Matthew Honnibal	22250cf6b7	Make regression test less sensitive to tag-map stuff	2019-08-25 21:54:26 +02:00
Matthew Honnibal	bb911e5f4e	Fix #3830: 'subtok' label being added even if learn_tokens=False (#4188) * Prevent subtok label if not learning tokens The parser introduces the subtok label to mark tokens that should be merged during post-processing. Previously this happened even if we did not have the --learn-tokens flag set. This patch passes the config through to the parser, to prevent the problem. * Make merge_subtokens a parser post-process if learn_subtokens * Fix train script * Add test for 3830: subtok problem * Fix handling of non-subtok in parser training	2019-08-23 17:54:00 +02:00
Sofie Van Landeghem	c417c380e3	Matcher ID fixes (#4179) * allow phrasematcher to link one match to multiple original patterns * small fix for defining ent_id in the matcher (anti-ghost prevention) * cleanup * formatting	2019-08-22 17:17:07 +02:00
Sofie Van Landeghem	de272f8b82	adding double match for optional operator at the end (#4166)	2019-08-21 22:46:56 +02:00
Sofie Van Landeghem	01c5980187	Serialize POS attribute when doc.is_tagged (#4092) * fix and unit test for issue 3959 * additional unit test for manifestation of the same (resolved) bug	2019-08-21 21:59:30 +02:00
Sofie Van Landeghem	7539a4f3a8	use states[q] in while retry loop (#4162)	2019-08-21 21:58:04 +02:00
Ines Montani	f580302673	Tidy up and auto-format	2019-08-20 17:36:34 +02:00
Ines Montani	364aaf5bc2	Simplify test	2019-08-20 16:41:58 +02:00
Sofie Van Landeghem	68ee0384fd	Unit test for Issue 3879 (#4153) * failing unit test for Issue #3879 * mark test as failing	2019-08-20 16:40:25 +02:00
Ines Montani	86cd7f0efd	Add regression test for #4120	2019-08-20 16:33:09 +02:00
Ines Montani	009280fbc5	Tidy up and auto-format	2019-08-18 15:09:16 +02:00
AJ Rader	2f3648700c	Correction of default lemmatizer lookup in English (Issue #4104) (#4110) * pytest file for issue4104 established * edited default lookup english lemmatizer for spun; fixes issue 4102 * eliminated parameterization and sorted dictionary dependency in issue 4104 test * added contributor agreement	2019-08-15 11:39:10 +02:00
Sofie Van Landeghem	963ea5e8d0	Update lemma and vector information after splitting a token (#4097) * fixing vector and lemma attributes after retokenizer.split * fixing unit test with mockup tensor * xp instead of numpy	2019-08-08 15:09:44 +02:00
Sofie Van Landeghem	ad09b0d6f3	fetch norm from lex if necessary for matching (#4080)	2019-08-05 23:51:04 +02:00
adrianeboyd	925a852bb6	Improve NER per type scoring (#4052) * Improve NER per type scoring * include all gold labels in per type scoring, not only when recall > 0 * improve efficiency of per type scoring * Create Scorer tests, initially with NER tests * move regression test #3968 (per type NER scoring) to Scorer tests * add new test for per type NER scoring with imperfect P/R/F and per type P/R/F including a case where R == 0.0	2019-08-01 17:15:36 +02:00
Sofie Van Landeghem	f7d950de6d	ensure the lang of vocab and nlp stay consistent (#4057) * ensure the language of vocab and nlp stay consistent across serialization * equality with =	2019-08-01 17:13:01 +02:00
Sofie Van Landeghem	7de3b129ab	Resolve edge case when calling textcat.predict with empty doc (#4035) * resolve edge case where no doc has tokens when calling textcat.predict * more explicit value test	2019-07-30 14:58:01 +02:00
Sofie Van Landeghem	ba02957c80	Fix dependency copy for as_doc (#3969) * failing unit test for issue 3962 * attempt to fix Issue #3962 * create artificial unit test example * using length instead of self.length * sp * reformat with black * find better ancestor within span and use generic 'dep' * attach to span.root if there is no appropriate ancestor * comment span text * clean up ancestor code * reconstruct dep tree to keep same number of sentences	2019-07-23 18:28:54 +02:00
Ines Montani	a32b033b8c	Add regression test for #4002 Test that the PhraseMatcher can match on overwritten NORM attributes.	2019-07-22 14:18:24 +02:00
Falak Asad	ff1e73e35c	Bugfix/issue 3968 (#3982) * Fix for issue-3968 * Added contributor agreement * Made suggested changes	2019-07-18 00:20:32 +02:00
Ines Montani	073013f129	Auto-format [ci skip]	2019-07-17 12:34:13 +02:00
Ines Montani	62ff128888	Add regression test for #3951	2019-07-16 14:00:00 +02:00
Ines Montani	7f551050b1	Add regression test for #3972	2019-07-16 13:07:35 +02:00
Sofie Van Landeghem	ed774cb953	Fixing ngram bug (#3953) * minimal failing example for Issue #3661 * referenced Issue #3661 instead of Issue #3611 * cleanup	2019-07-12 10:01:35 +02:00
Ines Montani	673c864a06	Fix doc.count_by functionality (#3950) Fix doc.count_by functionality	2019-07-11 13:44:00 +02:00
Ines Montani	2426f4d44c	Fix default punctuation rules for splitting Hindi text (#3948) Fix default punctuation rules for splitting Hindi text Co-authored-by: yash <patadiayash@gmail.com> Co-authored-by: Ines Montani <ines@ines.io>	2019-07-11 13:36:28 +02:00
svlandeg	349107daa3	cleanup	2019-07-11 13:09:22 +02:00
Matthew Honnibal	b40b4c2c31	💫 Fix issue #3839: Incorrect entity IDs from Matcher with operators (#3949) * Add regression test for issue #3541 * Add comment on bugfix * Remove incorrect test * Un-xfail test	2019-07-11 12:55:11 +02:00
Ines Montani	197cfd7ebc	Merge branch 'master' into pr/3948	2019-07-11 12:18:31 +02:00
Ines Montani	0b8406a05c	Tidy up and auto-format	2019-07-11 12:02:25 +02:00
yash	ae2d52e323	Add default encoding utf-8 for test file	2019-07-11 15:26:27 +05:30
yash	d5311b3c42	Add test file for issue (#3625) and spacy contributor agreement	2019-07-11 14:53:14 +05:30
svlandeg	e080412385	tracked the bug down to PreshCounter.inc - still unclear what goes wrong	2019-07-11 01:53:06 +02:00
svlandeg	a89fecce97	failing unit test for issue #3869	2019-07-11 00:43:55 +02:00
Matthew Honnibal	465456edb9	Un-xfail test #3880	2019-07-10 14:01:17 +02:00
Matthew Honnibal	87f7ec34d5	Add test for #3880	2019-07-10 13:53:55 +02:00
Ines Montani	82045aac8a	Merge regression tests	2019-07-10 12:49:18 +02:00
Ines Montani	570ab1f481	Fix handling of old entity ruler files Expected an `entity_ruler.jsonl` file in the top-level model directory, so the path passed to from_disk by default (model path plus component name), but with the suffix ".jsonl".	2019-07-10 12:14:12 +02:00

405 Commits