spaCy

mirror of https://github.com/explosion/spaCy.git synced 2024-11-11 12:18:04 +03:00

Author	SHA1	Message	Date
ines	f0986df94b	Add test for #1488 (passes on v2.0.0a18?)	2017-11-03 14:44:36 +01:00
Matthew Honnibal	711278b667	Make test less flakey	2017-11-03 14:36:08 +01:00
Matthew Honnibal	0a534ae96a	Fix test for backprop d_pad	2017-11-03 14:04:16 +01:00
Matthew Honnibal	a22f96c3f1	Add test for backpropagating padding	2017-11-03 00:48:54 +01:00
ines	3af281a334	Update test model name	2017-11-01 23:02:00 +01:00
ines	8c2260e18c	Move span tests to /doc	2017-11-01 16:56:35 +01:00
ines	260cb37224	Catch deprecation warning	2017-11-01 16:49:18 +01:00
ines	5914faafbb	Fix .merge tests to not use deprecated API	2017-11-01 16:49:11 +01:00
Matthew Honnibal	9e0ebee81c	Add Token.is_sent_start property, so can deprecate Token.sent_start	2017-11-01 13:27:14 +01:00
Matthew Honnibal	c047498f87	Fix vectors test	2017-11-01 13:24:47 +01:00
Matthew Honnibal	86eba61fae	Fix token.vector when vectors are missing	2017-11-01 00:47:35 +01:00
Ines Montani	d11659463b	Merge pull request #1152 from jimregan/develop-irish [WIP] attempt a port from #1147	2017-11-01 00:23:43 +01:00
Jim O'Regan	08b0bfd153	merge	2017-10-31 22:55:59 +00:00
Jim O'Regan	00ecfa5417	Ó, not O	2017-10-31 22:54:42 +00:00
Ines Montani	25b1d6cd91	Fix syntax error	2017-10-31 22:36:03 +01:00
Matthew Honnibal	92dc127569	Fix test for Python 3	2017-10-31 22:21:55 +01:00
Jim O'Regan	fe4b10346a	replace example sentence until I get around to adding a punctuation.py	2017-10-31 20:24:53 +00:00
Matthew Honnibal	77d8f5de9a	Revise and simplify Vectors class	2017-10-31 18:25:08 +01:00
Jim O'Regan	d4a8160c36	change quotes	2017-10-31 15:15:44 +00:00
Jim O'Regan	34ca59691b	no idea what is wrong here	2017-10-31 14:50:13 +00:00
Jim O'Regan	41dd29e48e	merge	2017-10-31 14:07:45 +00:00
Matthew Honnibal	cb5217012f	Fix vector remapping	2017-10-31 11:40:46 +01:00
Matthew Honnibal	9c11ee4a1c	WIP on vectors fixes	2017-10-31 11:22:56 +01:00
Matthew Honnibal	368fdb389a	WIP on refactoring and fixing vectors	2017-10-31 02:00:26 +01:00
Explosion Bot	72aea8f105	Update vectors.add() to allow setting keys to rows	2017-10-30 10:03:08 +01:00
Matthew Honnibal	64e4ff7c4b	Merge 'tidy-up' changes into branch. Resolve conflicts	2017-10-28 13:16:06 +02:00
Ines Montani	4033e70c71	Merge pull request #1461 from explosion/feature/disable-pipes 💫 Add Language.disable_pipes(), to temporarily edit pipeline and update code examples	2017-10-27 12:21:40 +02:00
Matthew Honnibal	b0f3ea2200	Fix names of pipeline components: NeuralDependencyParser --> DependencyParser, NeuralEntityRecognizer --> EntityRecognizer, TokenVectorEncoder --> Tensorizer, NeuralLabeller --> MultitaskObjective	2017-10-26 12:38:23 +02:00
ines	de1e5f35d5	Merge branch 'develop' into feature/disable-pipes	2017-10-25 16:33:12 +02:00
ines	c0b55ebdac	Fix PhraseMatcher.__contains__ and add more tests	2017-10-25 16:31:11 +02:00
ines	657a4d91bc	Merge branch 'develop' into feature/disable-pipes	2017-10-25 15:19:05 +02:00
ines	1a722dac31	Merge branch 'develop' into feature/disable-pipes	2017-10-25 15:18:18 +02:00
Matthew Honnibal	b5de768852	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-10-25 14:44:16 +02:00
Matthew Honnibal	094512fd47	Fix model-mark on regression test.	2017-10-25 14:44:00 +02:00
Matthew Honnibal	e70f80f29e	Add Language.disable_pipes()	2017-10-25 13:46:41 +02:00
Ines Montani	d3bf488e16	Merge pull request #1171 from mollerhoj/support-danish Improve basic support for Danish	2017-10-24 20:29:57 +02:00
Matthew Honnibal	908809d488	Update tests	2017-10-24 17:05:15 +02:00
Matthew Honnibal	30e67fa808	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-10-24 16:08:23 +02:00
Matthew Honnibal	63f0bde749	Add test for #1250: Tokenizer cache clobbered special-case attrs	2017-10-24 16:07:18 +02:00
ines	090aed940a	Add test for currently failing span.as_doc case	2017-10-24 16:00:56 +02:00
ines	4ef81a9ebc	Fix whitespace	2017-10-24 16:00:56 +02:00
Matthew Honnibal	4bea65a1a8	Fix Issue #1450: Off-by-1 in * and ? matches. Patterns that end in variable-length operators, e.g. * and ?, now end on the correct token. Previously, they were off by 1: the next token was pulled into the match, even if that's where the pattern failed.	2017-10-24 14:26:27 +02:00
Matthew Honnibal	391d5ef0d1	Normalize imports in regression test	2017-10-24 14:25:49 +02:00
Matthew Honnibal	b66b8f028b	Fix #1375 -- out-of-bounds on token.nbor()	2017-10-24 12:10:39 +02:00
Matthew Honnibal	a68d89a4f3	Add failing test for bug #1375 -- no out-of-bounds error for token.nbor()	2017-10-24 12:05:25 +02:00
Ines Montani	facf77e541	Merge branch 'develop' into support-danish	2017-10-24 11:53:19 +02:00
Matthew Honnibal	ccd2ab1a62	Merge pull request #1443 from ramananbalakrishnan/develop-get-lca-matrix Add LCA matrix for spans and docs	2017-10-24 11:22:46 +02:00
Matthew Honnibal	ef3e5a361b	Merge pull request #1442 from explosion/feature/fix-sp 💫Fix SP tag, tweak Vectors.__init__, fix Morphology	2017-10-24 10:24:07 +02:00
Matthew Honnibal	fdf25d10ba	Merge pull request #1440 from ramananbalakrishnan/develop Support single value for attribute list in doc.to_array	2017-10-24 10:23:12 +02:00
Matthew Honnibal	490ad3eaf0	Check that empty strings are handled. Closes #1242	2017-10-21 00:52:14 +02:00
Ramanan Balakrishnan	d2fe56a577	Add LCA matrix for spans and docs	2017-10-20 23:58:00 +05:30
Matthew Honnibal	d8391b1c4d	Fix #1434: Matcher failed on ending ? if no token	2017-10-20 16:49:36 +02:00
Matthew Honnibal	f111b228e0	Fix re-parsing of previously parsed text. If a Doc object had been previously parsed, it was possible for invalid parses to be added. There were two problems: 1) the parse was only being partially erased; 2) the RightArc action was able to create a 1-cycle. This patch fixes both errors, and avoids resetting the parse if one is present. In theory, this might allow a better parse to be predicted by running the parser twice. Closes #1253.	2017-10-20 16:27:36 +02:00
Matthew Honnibal	ebecaddb76	Make 'data_or_width' two keyword args in Vectors.__init__. Previously the data and width options were one argument in Vectors, which meant you couldn't say vectors = Vectors(strings, width=300). It's better to have two keywords.	2017-10-20 14:17:15 +02:00
Ramanan Balakrishnan	b3ab124fc5	Support strings for attribute list in doc.to_array	2017-10-20 11:46:57 +05:30
ines	bf415fd778	Add test for serializing extension attrs (see #1085)	2017-10-19 00:53:08 +02:00
Matthew Honnibal	fe844148f6	Test pickling hooks	2017-10-17 19:43:52 +02:00
Matthew Honnibal	374819edf8	Test user_data deserialization, re #1085	2017-10-17 19:28:54 +02:00
Matthew Honnibal	8ca97f32a3	Fix doc pickling test	2017-10-17 18:19:57 +02:00
Matthew Honnibal	45d1dd90b1	Add tests for pickling doc	2017-10-17 17:20:58 +02:00
Matthew Honnibal	4174477161	Fix equality check in test	2017-10-16 19:50:35 +02:00
Matthew Honnibal	010a7309ff	Merge pull request #1402 from explosion/feature/fix-matcher-operators 💫 Fix Matcher variable-length operators	2017-10-16 17:53:19 +02:00
Matthew Honnibal	c29927d2e7	Fix matcher test	2017-10-16 17:22:18 +02:00
Matthew Honnibal	a928ae2f35	Merge branch 'develop' into feature/fix-matcher-operators	2017-10-16 13:38:36 +02:00
Matthew Honnibal	748d525801	Add more matcher operator tests	2017-10-16 13:38:01 +02:00
ines	3516aa0cea	Port over changes from #1389	2017-10-14 13:32:55 +02:00
ines	cd6a29dce7	Port over changes from #1294	2017-10-14 13:28:46 +02:00
ines	38c756fd85	Port over changes from #1287	2017-10-14 13:16:21 +02:00
ines	612224c10d	Port over changes from #1157	2017-10-14 13:11:39 +02:00
ines	9b3f8f9ec3	Fix formatting and add comment on languages	2017-10-14 13:11:18 +02:00
ines	a4d974d97b	Port over URL pattern changes from #1411	2017-10-14 12:58:07 +02:00
Matthew Honnibal	cf6da9301a	Update lemmatizer test	2017-10-12 22:50:52 +02:00
Matthew Honnibal	462caf835a	Fix SBD test	2017-10-12 21:18:22 +02:00
Ines Montani	37aa523a8e	Merge pull request #1408 from explosion/feature/dot-underscore 💫 Custom attributes via Doc._, Token._ and Span._	2017-10-11 18:35:56 +02:00
ines	51519251c2	Fix underscore method test	2017-10-11 13:34:19 +02:00
ines	c6ae49e8bf	Fix formatting	2017-10-11 13:34:11 +02:00
ines	453c47ca24	Add German lemmatizer tests	2017-10-11 13:27:26 +02:00
ines	15fe0fd82d	Fix tests	2017-10-11 13:27:18 +02:00
ines	e0ff145a8b	Merge branch 'develop' into feature/dot-underscore	2017-10-11 11:57:05 +02:00
Matthew Honnibal	fd47f8e89f	Fix failing test	2017-10-11 08:38:34 +02:00
Matthew Honnibal	462b2e26b4	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-10-11 08:23:04 +02:00
Matthew Honnibal	2c118ab3a6	Add tests for Doc creation	2017-10-11 03:21:23 +02:00
Matthew Honnibal	d84136b4a9	Update add label test	2017-10-10 22:57:41 +02:00
Matthew Honnibal	e0a9b02b67	Merge Span._ and Span.as_doc methods	2017-10-09 22:00:15 -05:00
Matthew Honnibal	09d61ada5e	Merge pull request #1396 from explosion/feature/pipeline-management 💫 Improve pipeline and factory management	2017-10-10 04:29:54 +02:00
Matthew Honnibal	f0f2739ae3	Add test for serialization issue raised in #1105	2017-10-10 03:57:58 +02:00
ines	de374dc72a	Merge branch 'feature/pipeline-management' into feature/dot-underscore	2017-10-09 14:37:51 +02:00
Matthew Honnibal	2534cd57d7	Add bandaid solution to the 'shadowing' problem in #864	2017-10-09 08:59:35 +02:00
Matthew Honnibal	d8a2506023	Merge pull request #1401 from explosion/feature/add-parser-action 💫 Allow labels to be added to pre-trained parser and NER modes	2017-10-09 04:57:51 +02:00
Matthew Honnibal	689349e32f	Merge pull request #1400 from explosion/feature/sentence-parsing 💫 Force parser to respect preset sentence boundaries	2017-10-09 04:31:43 +02:00
Matthew Honnibal	fad2b8315f	Merge branch 'develop' into feature/add-parser-action	2017-10-09 04:13:04 +02:00
Matthew Honnibal	6c79841c0d	Fix tests for history features	2017-10-09 04:12:24 +02:00
Matthew Honnibal	dde87e6b0d	Add tests for adding parser actions	2017-10-09 03:42:35 +02:00
Matthew Honnibal	81a64119db	Fix string-to-unicode problem	2017-10-09 00:59:49 +02:00
Matthew Honnibal	02c2af7119	Fix test	2017-10-09 00:29:37 +02:00
Matthew Honnibal	5a67efeccc	Add tests for sentence segmentation presetting	2017-10-09 00:02:23 +02:00
Matthew Honnibal	9bd8191739	Add tests for Underscore	2017-10-07 18:56:19 +02:00
Matthew Honnibal	3b67eabfea	Allow empty dictionaries to match any token in Matcher. Often patterns need to match "any token". A clean way to denote this is with the empty dict {}: this sets no constraints on the token, so it should always match. The problem was that having attributes length==0 was used as an end-of-array signal, so the matcher didn't handle this case correctly. This patch compiles empty token spec dicts into a constraint NULL_ATTR==0. The NULL_ATTR attribute, 0, is always set to 0 on the lexeme, so this always matches.	2017-10-07 03:36:15 +02:00
ines	0adadcb3f0	Fix beam parse model test	2017-10-07 02:15:15 +02:00
ines	b38a8f4a94	Fix and update pipe methods tests	2017-10-07 02:06:23 +02:00
Matthew Honnibal	3a65a0c970	Start adding tests for new pipeline management	2017-10-07 01:48:23 +02:00
ines	61a503a611	Fix parser test	2017-10-07 00:38:51 +02:00
Matthew Honnibal	c6cd81f192	Wrap try/except around model saving	2017-10-05 08:14:24 -05:00
Matthew Honnibal	fd4baff475	Update tests	2017-10-05 08:12:27 -05:00
Matthew Honnibal	40edb65ee7	Make test work for Python 2.7	2017-10-04 16:36:50 +02:00
Matthew Honnibal	db05d4d582	Add test for #1380. Passes without fix?	2017-10-04 14:56:31 +02:00
Matthew Honnibal	4a59f6358c	Fix thinc imports	2017-10-03 19:21:26 +02:00
Ines Montani	959c46eabe	Merge pull request #1365 from wannaphongcom/develop Add Thai language for spaCy v2	2017-09-26 23:43:05 +02:00
Wannaphong Phatthiyaphaibun	7b5263ffa4	fix thai test	2017-09-26 23:54:15 +07:00
Matthew Honnibal	41cc5c4c17	Merge branch 'develop' into feature/phrasematcher	2017-09-26 09:59:17 -05:00
Wannaphong Phatthiyaphaibun	5cba67146c	add thai in spacy2	2017-09-26 21:36:27 +07:00
Matthew Honnibal	74f08e1ad5	Update test	2017-09-26 06:45:56 -05:00
Matthew Honnibal	20193371f5	Don't share CNN, to reduce complexities	2017-09-21 14:59:48 +02:00
Matthew Honnibal	cc408fc189	Make PhraseMatcher API like Matcher API	2017-09-20 22:20:35 +02:00
Matthew Honnibal	43ad250dd5	Update matcher tests	2017-09-20 21:54:49 +02:00
Matthew Honnibal	c013e5996f	Fix parser test	2017-09-17 13:13:20 -05:00
ines	ece30c28a8	Don't split hyphenated words in German. This way, the tokenizer matches the tokenization in German treebanks.	2017-09-16 20:40:15 +02:00
Matthew Honnibal	ebf8942564	Fix test for Python3	2017-09-16 16:22:38 +02:00
Matthew Honnibal	8c945310fb	Excuse emoji failure on narrow unicode builds	2017-09-16 16:21:13 +02:00
Matthew Honnibal	3fa5b40b5c	Add test for hash consistency	2017-09-16 11:21:35 +02:00
Jim O'Regan	7de709483b	missed adding here	2017-09-11 10:51:21 +01:00
Jim O'Regan	b1b6123867	add ga_tokenizer	2017-09-11 10:31:41 +01:00
Jim O'Regan	187be6d372	copy/paste error	2017-09-11 09:33:17 +01:00
Jim O'Regan	c283e9edfe	first stab at test	2017-09-11 08:57:48 +01:00
Matthew Honnibal	456bb8a74c	Unxfail and close #1305	2017-09-06 19:14:17 +02:00
Matthew Honnibal	99e44fbdbb	Update regression test	2017-09-06 19:13:51 +02:00
Matthew Honnibal	497a9308a8	Xfail new lemmatizer test	2017-09-06 18:41:22 +02:00
Matthew Honnibal	5384fff5ce	Add test for 1305: Incorrect lemmatization of VBZ for English	2017-09-06 18:40:18 +02:00
Matthew Honnibal	d5fbf27335	Fix test	2017-09-04 16:45:11 +02:00
Matthew Honnibal	cb4839033c	Fix loader for EN tests	2017-09-04 15:19:18 +02:00
Matthew Honnibal	644d6c9e1a	Improve lemmatization tests, re #1296	2017-09-04 15:17:44 +02:00
Jim Geovedi	fbc62a09c7	added {pre,suf,in}fix tests	2017-08-20 13:43:00 +07:00
Jim Geovedi	713d7c0aa0	added indonesian lang test	2017-08-20 12:17:14 +07:00
Jim Geovedi	fa544e6c9a	Merge remote-tracking branch 'upstream/develop' into indonesian	2017-08-20 11:49:40 +07:00
Matthew Honnibal	41c2218c53	Fix test for vectors	2017-08-19 22:09:12 +02:00
Matthew Honnibal	ef87562741	Restore vectors test utils	2017-08-19 20:35:16 +02:00
Matthew Honnibal	1391f9da37	Restore vectors tests	2017-08-19 20:34:58 +02:00
Matthew Honnibal	d55d6e1cfa	Fix comparison of Token from different docs. Closes #1257	2017-08-19 16:39:32 +02:00
Matthew Honnibal	4fda02c7e6	Add test for new Span.to_array method	2017-08-19 16:24:38 +02:00
Matthew Honnibal	c606b4a42c	Add test for Doc.char_span	2017-08-19 16:18:23 +02:00
Matthew Honnibal	42d47c1e5c	Fix tagger serialization	2017-08-19 04:16:32 +02:00
Matthew Honnibal	2da96a0ec7	Fix beam test	2017-08-19 04:15:46 +02:00
Matthew Honnibal	a7309a217d	Update tagger serialization	2017-08-18 23:12:05 +02:00
Matthew Honnibal	de7e8703e3	Restore tests for beam parser	2017-08-18 22:27:42 +02:00
Matthew Honnibal	52c180ecf5	Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop". This reverts commit `ea8de11ad5`, reversing changes made to `08e443e083`.	2017-08-14 13:00:23 +02:00
Matthew Honnibal	92ebab6073	Update beam-update tests	2017-08-13 08:56:02 +02:00
Matthew Honnibal	24b45b45c6	Add test for beam update	2017-08-12 17:15:28 -05:00
Matthew Honnibal	b353e4d843	Work on parser beam training	2017-08-12 14:47:45 -05:00
Jim Geovedi	cc4772cac2	reworks	2017-08-03 13:08:38 +07:00
Jim Geovedi	783f7d8b86	added test set for Indonesian language	2017-07-29 18:21:07 +07:00
Matthew Honnibal	d6a5c2c85a	Add test for NER	2017-07-22 01:48:58 +02:00
Matthew Honnibal	28244df4da	Add test for beam parsing	2017-07-22 01:48:35 +02:00
Matthew Honnibal	2424493970	Remove unnecessary import of Mock	2017-07-22 01:13:54 +02:00
Matthew Honnibal	289f23df51	Test beam parsing	2017-07-20 15:03:10 +02:00
Matthew Honnibal	f014138c11	Fix parser tests	2017-07-20 00:16:52 +02:00
mollerhoj	e840077601	Add some basic tests for Danish	2017-07-03 15:49:51 +02:00
ines	34a2eecb17	Add simple "naughty strings" test (see #1107)	2017-06-06 17:43:51 +02:00
ines	cc9c5dc7a3	Fix noun chunks test	2017-06-05 16:39:04 +02:00
Matthew Honnibal	b4cdd05466	Add vectors.pyx in setup	2017-06-05 12:45:29 +02:00
Matthew Honnibal	30369d580f	Start testing Vectors class	2017-06-05 12:32:49 +02:00
ines	51d7414e94	Make sure sents are a list	2017-06-05 12:30:13 +02:00
ines	a0f4592f0a	Update tests	2017-06-05 02:26:13 +02:00
ines	3e105bcd36	Update tests	2017-06-05 02:09:27 +02:00
ines	078232932c	Fix tokenizer fixture scope	2017-06-05 01:06:34 +02:00
Matthew Honnibal	58be0e1f6f	Update tests	2017-06-04 16:35:06 -05:00
Matthew Honnibal	bb98d45a63	Fix tests	2017-06-04 16:00:44 -05:00
Matthew Honnibal	55d0621532	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-06-04 15:53:25 -05:00
Matthew Honnibal	5b9f116aca	Update tests	2017-06-04 15:53:17 -05:00
ines	8a29308d0b	Remove unused imports	2017-06-04 22:39:29 +02:00
Ines Montani	112c5787eb	Merge pull request #1101 from oroszgy/hu_tokenizer_fix More robust Hungarian tokenizer.	2017-06-04 22:37:51 +02:00
ines	96867a24ae	Fix typo	2017-06-04 22:36:40 +02:00
ines	f432bb4b48	Fix fixture scopes	2017-06-04 22:34:31 +02:00
ines	a66cf24ee8	xfail tokenizer serialization tests for now. Tests pass locally, but not on Travis; needs more investigation.	2017-06-04 13:58:20 +02:00
ines	e47eef5e03	Update German tokenizer exceptions and tests	2017-06-03 21:07:44 +02:00
ines	d77c2cc8bb	Add tests for English norm exceptions	2017-06-03 20:59:50 +02:00
ines	3152ee5ca2	Update serialization tests for tokenizer	2017-06-03 17:05:28 +02:00
ines	1ebd0d3f27	Add assert_packed_msg_equal util function	2017-06-03 17:04:30 +02:00
ines	de974f7bef	Add serializer tests for tokenizer	2017-06-03 13:26:34 +02:00
ines	d21459f87d	Update serializer tests	2017-06-02 21:42:26 +02:00
ines	d86e7cde93	Add entity recognizer to parser serialization tests	2017-06-02 18:40:06 +02:00
ines	0051c05964	Add tests for serializing parser	2017-06-02 18:37:19 +02:00
ines	cef547a9f0	Add serialization tests for tensorizer	2017-06-02 18:18:30 +02:00
ines	f74a45c1fe	Remove unnecessary argument	2017-06-02 18:17:46 +02:00
ines	43b4d63f85	Add serialization tests for tagger	2017-06-02 17:29:34 +02:00
ines	acd65c00f6	Add serialization tests for StringStore and Vocab	2017-06-02 10:57:42 +02:00
ines	9692c98f57	Add test utils for temp file and temp dir	2017-06-02 10:56:09 +02:00
Matthew Honnibal	4c97371051	Fixes for thinc 6.7	2017-06-01 04:22:16 -05:00
Gyorgy Orosz	f0c3b09242	More robust Hungarian tokenizer.	2017-05-31 22:28:40 +02:00
ines	5e1c361270	Update tests README with info on model tests	2017-05-31 12:22:58 +02:00
Ines Montani	e6cf3c7e1c	Merge pull request #1093 from oroszgy/hu_emoji_fix Fixed emoji handling for Hungarian	2017-05-31 11:33:24 +02:00
Matthew Honnibal	6937e311a4	Update doc tests	2017-05-30 23:34:23 +02:00
Gyorgy Orosz	8c0b4b850e	Fixed emoji handling for Hungarian	2017-05-30 21:34:46 +02:00
Matthew Honnibal	b127645afc	Fix test_misc merge conflict	2017-05-29 18:31:44 -05:00
Matthew Honnibal	e0e8eae7c7	Tweak package test	2017-05-29 18:30:42 -05:00
ines	20a7003c0d	Update model fixtures and reorganise tests	2017-05-29 22:14:31 +02:00
ines	795fe43a4d	Add load_test_model function with importorskip(). Loads model only if it can be imported, i.e. if it's installed as a package.	2017-05-29 22:11:31 +02:00
ines	6e3937efc5	Check for arguments of model markers to specify models to test. Lets user set --models --en to test only English models.	2017-05-29 22:10:16 +02:00
Matthew Honnibal	f4aafca222	Merge changes to test_misc	2017-05-29 12:26:02 +02:00
Matthew Honnibal	ff26aa6c37	Work on to/from bytes/disk serialization methods	2017-05-29 11:45:45 +02:00
ines	df920ba0e7	Add tests for displaCy and util functions and fix util typo	2017-05-29 10:51:19 +02:00
ines	c5714d4fb2	xfail matcher test for now until setting norm via Span.merge works	2017-05-29 10:51:02 +02:00
Matthew Honnibal	c91b121aeb	Move serialization functions to util	2017-05-29 10:13:42 +02:00
Matthew Honnibal	1fa2bfb600	Add model_to_bytes and model_from_bytes helpers. Probably belong in thinc.	2017-05-29 09:27:04 +02:00
Matthew Honnibal	6dad4117ad	Work on serialization for models	2017-05-29 01:37:57 +02:00
ines	7b1ddcc04d	Add test for vocab serialization	2017-05-29 01:09:52 +02:00
ines	00b2094dc3	Fix typos, long integers and tests	2017-05-29 01:09:52 +02:00
ines	804dbb8d25	Add StringStore test for API docs	2017-05-29 01:09:52 +02:00
Matthew Honnibal	92dbf28c1e	Hack a fixture in the vectors tests, for xfail	2017-05-28 20:28:32 +02:00
Matthew Honnibal	fe11564b8e	Finish stringstore change. Also xfail vectors tests	2017-05-28 15:10:22 +02:00
Matthew Honnibal	b007a2b0d3	Update stringstore tests	2017-05-28 14:08:09 +02:00
Matthew Honnibal	84e66ca6d4	WIP on stringstore change. 27 failures	2017-05-28 14:06:40 +02:00
Matthew Honnibal	fe4a746300	Accommodate symbols in new string scheme	2017-05-28 13:03:16 +02:00
Matthew Honnibal	a5606c3eda	Work on changing StringStore to return hashes.	2017-05-28 12:36:27 +02:00
ines	a8e58e04ef	Add symbols class to punctuation rules to handle emoji (see #1088). Currently doesn't work for Hungarian, because of conflicts with the custom punctuation rules. Also doesn't take multi-character emoji like 👩🏽‍💻 into account.	2017-05-27 17:57:10 +02:00
Matthew Honnibal	4917cbb484	Include sent_start test	2017-05-23 18:40:37 +02:00
ines	fb0ff0272f	xfail neural parser tests for now and remove test for deprecated method	2017-05-23 12:40:37 +02:00
Matthew Honnibal	5418bcf5d7	Resolve conflict on test	2017-05-23 04:37:16 -05:00
ines	e6acd3bbf2	Fix matcher tests and matcher docs	2017-05-23 11:36:02 +02:00
ines	d0c6d4f76d	Fix formatting	2017-05-23 11:32:00 +02:00
Matthew Honnibal	3959d778ac	Revert "Revert "WIP on improving parser efficiency"". This reverts commit `532afef4a8`.	2017-05-23 03:06:53 -05:00
Matthew Honnibal	532afef4a8	Revert "WIP on improving parser efficiency". This reverts commit `bdaac7ab44`.	2017-05-23 03:05:25 -05:00
Matthew Honnibal	bdaac7ab44	WIP on improving parser efficiency	2017-05-23 02:59:31 -05:00
ines	b3c7ee0148	Fix tests and use the new Matcher API	2017-05-22 13:54:20 +02:00
Matthew Honnibal	187f370734	Update tests for matcher changes	2017-05-22 12:59:50 +02:00
Matthew Honnibal	7e2cdc0c81	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-05-22 12:39:34 +02:00
Matthew Honnibal	2f78413a02	PseudoProjectivity->nonproj	2017-05-22 05:39:03 -05:00
Matthew Honnibal	d8bb5bb959	Implement StringStore serialization, and update tests	2017-05-22 12:38:00 +02:00
Matthew Honnibal	5db89053aa	Merge docstrings	2017-05-21 13:46:23 -05:00
Matthew Honnibal	836fe1d880	Update neural net tests	2017-05-19 18:11:29 -05:00
ines	a804045597	Use is_ancestor instead of deprecated is_ancestor_of	2017-05-19 20:23:40 +02:00
Matthew Honnibal	793430aa7a	Get spaCy train command working with neural network: integrate models into pipeline; add basic serialization (maybe incorrect); fix pickle on vocab	2017-05-17 12:04:50 +02:00
Matthew Honnibal	c9a5d5d24b	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-05-16 16:22:05 +02:00
Matthew Honnibal	8cf097ca88	Redesign training to integrate NN components: obsolete .parser, .entity etc. names in favour of .pipeline; components no longer create models on initialization; models are created by a loading method (from_disk(), from_bytes() etc.) or by .begin_training(); add .predict() and .set_annotations() methods in components; pass state through the pipeline, to allow components to share information more flexibly.	2017-05-16 16:17:30 +02:00
Matthew Honnibal	221b4c1ee8	Fix test for Python 3	2017-05-16 13:06:30 +02:00
Matthew Honnibal	1d7c18e58a	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-05-15 21:53:47 +02:00
Matthew Honnibal	a9edb3aa1d	Improve integration of NN parser, to support unified training API	2017-05-15 21:53:27 +02:00
ines	b462076d80	Merge load_lang_class and get_lang_class	2017-05-14 01:31:10 +02:00
ines	5858857a78	Update languages list in conftest	2017-05-13 15:37:54 +02:00
ines	8c2a0c026d	Fix parse_tree test	2017-05-13 12:32:45 +02:00
Matthew Honnibal	ee1d35bdb0	Fix merge conflict	2017-05-13 03:20:19 +02:00
Matthew Honnibal	b2540d2379	Merge Kengz's tree_print patch	2017-05-13 03:18:49 +02:00
Matthew Honnibal	7253b4e649	Remove old serialization tests	2017-05-09 18:12:58 +02:00
Matthew Honnibal	f9327343ce	Start updating serializer test	2017-05-09 18:12:03 +02:00
ines	2c3bdd09b1	Add English test for like_num	2017-05-09 11:06:34 +02:00
ines	22375eafb0	Fix and merge attrs and lex_attrs tests	2017-05-09 11:06:25 +02:00
ines	c714841cc8	Move language-specific tests to tests/lang	2017-05-09 00:02:37 +02:00
ines	bd57b611cc	Update conftest to lazy load languages	2017-05-09 00:02:21 +02:00
ines	3c0f85de8e	Remove imports in /lang/__init__.py	2017-05-08 23:58:07 +02:00
ines	be5541bd16	Fix import and tokenizer exceptions	2017-05-08 16:20:14 +02:00
ines	2324788970	Remove bad tests	2017-05-08 16:15:27 +02:00
Gregory Howard	c0afcd22bb	Merge remote-tracking branch 'remotes/upstream/master'	2017-04-27 14:42:54 +02:00
Gregory Howard	8ff4682255	correcting tokenizer exception. Adding tests for lemmatization	2017-04-27 11:52:14 +02:00
Ines Montani	7da9cefd25	Merge pull request #1022 from luvogels/master Initial support for Norwegian Bokmål	2017-04-27 11:16:06 +02:00
Gregory Howard	44cb486849	Adding unit test for tokenization in French (with title)	2017-04-27 10:59:38 +02:00
luvogels	d12a0b6431	Hooked up tokenizer tests	2017-04-26 23:21:41 +02:00
luvogels	8de59ce3b9	Added tokenizer tests	2017-04-26 19:10:18 +02:00
Matthew Honnibal	4d98511db7	Make Span hashable. Closes #1019	2017-04-26 19:01:05 +02:00
Matthew Honnibal	24c4c51f13	Try to make test999 less flakey	2017-04-26 18:42:06 +02:00
Gregory Howard	ed5f094451	Adding insensitive lemmatisation test	2017-04-25 18:07:02 +02:00
ghoward	26e31afc18	renaming tests	2017-04-25 17:46:01 +02:00
ghoward	c085c2d391	Adding some unit tests	2017-04-25 17:44:16 +02:00
Matthew Honnibal	c4be9c36fe	Fix unicode header in tests	2017-04-24 10:09:01 +02:00
Matthew Honnibal	65f10b53e5	Fix test	2017-04-24 00:25:55 +02:00
Matthew Honnibal	70a43858e1	Fix flakey test	2017-04-24 00:06:30 +02:00
Matthew Honnibal	3973af2d15	Make training test less flakey	2017-04-23 22:59:34 +02:00
ines	42305bc519	Remove unnecessary test	2017-04-23 21:21:41 +02:00
ines	012ea594d1	Add file for misc tests	2017-04-23 21:06:51 +02:00
ines	83f66947dc	Rename test_download to test_cli	2017-04-23 21:06:50 +02:00
Matthew Honnibal	874a3cbb07	Add test for Issue #955	2017-04-23 17:57:01 +02:00
Matthew Honnibal	5d8af40445	Add test for Issue #999	2017-04-23 17:06:30 +02:00
Matthew Honnibal	040751ad17	Remove xfail on Test #910	2017-04-23 16:28:55 +02:00
Ben Eyal	e90e8a3f10	Enable test	2017-04-20 02:25:24 +03:00
ines	2bd89e7ade	Tidy up Hebrew tests and test for punctuation (see #995)	2017-04-19 19:28:03 +02:00
ines	13d30b6c01	xfail lemmatizer test that's causing problems (see #546)	2017-04-16 21:18:39 +02:00
ines	0084466a66	Remove unused utf8open util and replace os.path with ensure_path	2017-04-16 20:37:45 +02:00
Matthew Honnibal	1dca7eeb03	Add unicode declaration on new regression test	2017-04-07 18:09:23 +02:00
ines	887827fc6a	Merge branch 'develop'	2017-04-07 17:36:23 +02:00
ines	444dd511c5	Fix xpassing URL test case	2017-04-07 17:36:05 +02:00
ines	bf0f15e762	Add / to tokenizer infixes (resolves #891)	2017-04-07 17:30:44 +02:00
ines	00b9011a49	Fix whitespace	2017-04-07 17:29:59 +02:00
Matthew Honnibal	0513c43bf0	Merge branch 'master' of https://github.com/explosion/spaCy	2017-04-07 17:07:10 +02:00
Matthew Honnibal	cc36c308f4	Fix noun_chunk rules around coordination. Closes #693.	2017-04-07 17:06:40 +02:00
Matthew Honnibal	ab846256cf	Merge pull request #966 from recognai/master Prepare Spanish language for training models, including configuration, rich-UD tag map and tests	2017-04-07 16:12:29 +02:00
Matthew Honnibal	83dca920d4	Rename test #913 -> #957, comment. Make test for #957 reference correct bug. Add comment. Previous commit closes #957.	2017-04-07 15:54:25 +02:00
Matthew Honnibal	5887383fc0	Add test for Issue #913: Hang from bad regex	2017-04-07 15:47:27 +02:00
oeg	c693d40791	feature(model): Add support for creating the Spanish model, including rich tagset, configuration, and basic tests	2017-04-06 18:48:45 +02:00
Matthew Honnibal	cfff4e0f61	Improve test	2017-03-31 13:59:32 +02:00
Matthew Honnibal	e854f28304	Add test for Issue #758. Issue #758 occurs when no actions are available for a single-token doc after merging.	2017-03-31 13:26:25 +02:00
Matthew Honnibal	0fefdfcbda	Merge pull request #935 from ericzhao28/master Add option to use label=ent_type in doc.merge arguments (Bug fix for issue #862)	2017-03-30 02:51:24 +02:00
Eric Zhao	aafdf6ffb8	Add option to use label karg to determine ent_type in doc.merge	2017-03-28 23:35:03 -07:00
Matthew Honnibal	b94286de30	Fix regression test	2017-03-25 22:35:07 +01:00
Matthew Honnibal	4f400fa486	Prevent lemmatization of base nouns. Update lemmatizer's base-form check for change in morphology class. Closes #903.	2017-03-25 21:51:12 +01:00
Matthew Honnibal	4454c1b23f	Block lemmatization of base-form adjectives. Fixes check that an adjective is a base form (as opposed to a comparative or superlative), so that it's not lemmatized, e.g. inner -!> inn. Closes #912.	2017-03-25 21:29:57 +01:00
Ines Montani	97cb4d5e3c	Merge branch 'master' into master	2017-03-25 10:03:47 +01:00
Iddo Berger	da135bd823	add hebrew tokenizer	2017-03-24 18:27:44 +03:00
Matthew Honnibal	f40fbc3710	Add test for Issue #910: Resuming entity training	2017-03-23 23:38:57 +01:00
ines	f830213c4c	Remove compatibility check test. Will only cause problems when incrementing version and not updating table. Also depends on external URL, which is bad.	2017-03-20 13:20:26 +01:00
Ines Montani	b6ee241e26	Fix print statements	2017-03-20 11:46:37 +01:00
ines	fe0ff00fe1	Fix spacing	2017-03-19 11:55:37 +01:00
ines	5712da6095	Add regression test for #891	2017-03-19 11:48:01 +01:00
ines	aefb898e37	Add title-case version of morph rules (resolves #686)	2017-03-18 17:27:11 +01:00
ines	64ec17abc1	Pass xpassing tests and add xfails for failures	2017-03-18 17:20:46 +01:00
ines	d0b85faf69	Pass regression test for #401 (resolves #401). Fixed in new English models.	2017-03-18 17:06:49 +01:00
ines	be9daefbdd	Remove actual model downloading from tests	2017-03-18 17:01:10 +01:00
Matthew Honnibal	de0e6385b4	Merge branch 'master' of https://github.com/explosion/spaCy	2017-03-18 16:17:28 +01:00
Matthew Honnibal	fe442cac53	Fix #717: Set correct lemma for contracted verbs	2017-03-18 16:16:10 +01:00
ines	ad934a9abd	Add regression test for #693	2017-03-18 16:12:30 +01:00
ines	f57c616830	Add regression test for #704 and test new model (resolves #704) (using new English model)	2017-03-18 16:04:14 +01:00
Matthew Honnibal	413138de79	Fix #719: Lemmatizer can no longer output empty string	2017-03-18 16:02:06 +01:00
ines	ab1451f997	Don't mark compatibility test as slow	2017-03-18 15:17:39 +01:00
ines	ec3e810662	Add directory cli and set up command line interface	2017-03-18 15:14:48 +01:00
Matthew Honnibal	6420f86f02	Merge changes to __init__.py	2017-03-17 19:51:45 +01:00
ines	0e533ad0cc	Mark compatibility table test as slow (temporary). Prevent Travis from running test until models repo is published.	2017-03-17 13:11:36 +01:00
Matthew Honnibal	a630726b13	Fix typo in tests	2017-03-16 20:50:36 -05:00
Matthew Honnibal	f98b30583f	Fix tests	2017-03-16 19:48:00 -05:00
Matthew Honnibal	db51abf685	Fix tests	2017-03-16 18:53:47 -05:00
Matthew Honnibal	fea9fe08af	Merge pull request #866 from juanmirocks/master Fix lemmatization of OOV words	2017-03-16 23:37:36 +01:00
Matthew Honnibal	28bb546939	Merge pull request #883 from ericzhao28/master Add `lower_` and `upper_` properties to `Span` class	2017-03-16 23:35:47 +01:00
Matthew Honnibal	8843b84bd1	Merge remote-tracking branch 'origin/develop-downloads'	2017-03-16 12:00:42 -05:00
ines	4cfc8ffbd2	Reformat pickle tests	2017-03-15 17:39:54 +01:00
ines	2a0fcf1354	Add tests for new download module	2017-03-15 17:39:43 +01:00
Matthew Honnibal	4cab8ac136	Update morph exceptions test	2017-03-15 09:31:34 -05:00
ines	42ba740dde	Revert "Merge branch 'debug'" This reverts commit `89b79d1178`, reversing changes made to `02bdf490a1`.	2017-03-13 20:11:52 +01:00
ines	4c5f51e49e	Update regression test	2017-03-13 15:16:11 +01:00
ines	02bdf490a1	Remove regression test to see if it caused pytest Travis error	2017-03-13 13:00:22 +01:00
ines	17018750ac	Add regression test for #717	2017-03-13 12:58:22 +01:00
ines	2883ebfca2	Remove print statement	2017-03-13 12:30:42 +01:00
ines	98c13d8aa9	Add regression test for #401	2017-03-13 12:28:41 +01:00
ines	444d665f9d	Add regression test for #686	2017-03-13 12:23:35 +01:00
ines	46b17e5b51	Add regression test for #719	2017-03-13 12:17:35 +01:00
ines	c8ae682ff9	Add regression test for #636	2017-03-13 12:08:31 +01:00
ines	337f9601f2	Add missing unicode declaration	2017-03-13 12:08:19 +01:00
ines	d70386ec6e	Update docstring in #886 regression test	2017-03-13 12:00:38 +01:00
ines	51ba3ef0a8	Add regression test for #886	2017-03-13 11:44:58 +01:00
ines	1da29a7146	Use new Lemmatizer data and remove file import Since there's currently only an English lemmatizer, the global Lemmatizer imports from spacy.en. This is unideal and still needs to be fixed.	2017-03-12 13:58:22 +01:00
ines	c89e30d1a3	Add test for English time exceptions ("1a.m." etc.)	2017-03-12 13:58:22 +01:00
ines	66c1f194f9	Use consistent unicode declarations	2017-03-12 13:07:28 +01:00
Em	9c809efc25	Removed mapStr	2017-03-11 16:23:26 -08:00
Matthew Honnibal	ea2592879f	Merge branch 'master' of https://github.com/explosion/spaCy	2017-03-11 11:13:37 -06:00
Em	426d17167f	Added string manipulation for spans	2017-03-10 16:50:02 -08:00
ines	10e29189ac	Adjust URL testcases and xfail problems (instead of comment)	2017-03-10 14:22:50 +01:00
Matthew Honnibal	ea53647362	Merge branch 'develop'	2017-03-10 02:49:39 -06:00
Dan Rapp	123d3f2d38	Fix error in test case parameterization	2017-03-09 12:18:21 -07:00
Dan Rapp	b9307dfcd7	Merge branch 'master' into rappdw/tokenizer_exceptions_url_fix	2017-03-09 11:42:14 -07:00
Dan Rapp	3b1df3808d	Issue #840 - URL pattern too broad	2017-03-09 11:39:39 -07:00
Matthew Honnibal	5b0b968d13	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-03-08 15:03:10 +01:00
Matthew Honnibal	0ac3d27689	Fix handling of trailing whitespace Fix off-by-one error that meant trailing spaces were being dropped. Closes #792	2017-03-08 15:01:40 +01:00
ines	c2e3e651b8	Re-add regression test for #859	2017-03-08 14:36:09 +01:00
Matthew Honnibal	16670d3251	Xfail the vocab pickling for now	2017-03-07 21:43:28 +01:00
Matthew Honnibal	a89c3500f6	Fixes to hacky vocab pickling	2017-03-07 20:58:55 +01:00
Matthew Honnibal	3edb8ae207	Whitespace	2017-03-07 17:16:26 +01:00
Matthew Honnibal	5de7e712b7	Add support for pickling StringStore.	2017-03-07 17:15:18 +01:00
Matthew Honnibal	4e75e74247	Update regression test for variable-length pattern problem in the matcher.	2017-03-07 16:08:32 +01:00
Matthew Honnibal	6d67213b80	Add test for 850: Matcher fails on zero-or-more.	2017-03-07 15:55:28 +01:00
Aniruddha Adhikary	696215a3fb	add tests for Bengali	2017-03-05 11:25:12 +06:00
ines	8dff040032	Revert "Add regression test for #859 " This reverts commit `c4f16c66d1`.	2017-03-01 21:56:20 +01:00
Juan Miguel Cejuela	a8cfde46d3	#781 Fix test — colocalizes is lemmatized to colocaliz and colicalize	2017-03-01 21:43:08 +01:00
Juan Miguel Cejuela	a471114eb2	#781 add regression test, failing previous bug fix	2017-03-01 21:30:51 +01:00
ines	c4f16c66d1	Add regression test for #859	2017-03-01 16:07:27 +01:00
Matthew Honnibal	34bcc8706d	Merge branch 'french-tokenizer-exceptions'	2017-02-27 11:21:21 +01:00
Matthew Honnibal	0aaa546435	Fix test after updating the French tokenizer stuff	2017-02-27 11:20:47 +01:00
ines	376c5813a7	Remove print statements from test	2017-02-24 18:26:32 +01:00
ines	7c1260e98c	Add regression test	2017-02-24 18:22:49 +01:00
ines	51eb190ef4	Remove print statements from test	2017-02-24 17:41:12 +01:00
Matthew Honnibal	db5ada3995	Merge branch 'master' of https://github.com/explosion/spaCy	2017-02-24 14:28:12 +01:00
Matthew Honnibal	8f94897d07	Add 1 operator to matcher, and make sure open patterns are closed at end of document. Closes Issue #766	2017-02-24 14:27:02 +01:00
ines	67991b6e5f	Add more test cases to #775 regression test to cover #847	2017-02-18 14:10:44 +01:00
ines	44de3c7642	Reformat test and use text_file fixture	2017-02-16 23:49:19 +01:00
ines	3dd22e9c88	Mark vectors test as xfail (temporary)	2017-02-16 23:28:51 +01:00
ines	85d249d451	Revert "Revert "Merge pull request #836 from raphael0202/load_vectors (closes #834)"" This reverts commit `ea05f78660`.	2017-02-16 23:26:25 +01:00
ines	ea05f78660	Revert "Merge pull request #836 from raphael0202/load_vectors (closes #834)" This reverts commit `7d8c9eee7f`, reversing changes made to `f6b69babcc`.	2017-02-16 15:27:12 +01:00
Raphaël Bournhonesque	06a71d22df	Fix test failure by using unicode literals	2017-02-16 14:48:00 +01:00
Raphaël Bournhonesque	3ba109622c	Add regression test with non ' ' space character as token	2017-02-16 12:23:27 +01:00
ines	21f09d10d7	Revert "Revert "Merge pull request #818 from raphael0202/tokenizer_exceptions"" This reverts commit `f02a2f9322`.	2017-02-10 13:17:05 +01:00
ines	f02a2f9322	Revert "Merge pull request #818 from raphael0202/tokenizer_exceptions" This reverts commit `b95afdf39c`, reversing changes made to `b0ccf32378`.	2017-02-09 17:07:21 +01:00
Raphaël Bournhonesque	309da78bf0	Merge branch 'master' into tokenizer_exceptions	2017-02-09 16:32:12 +01:00
Raphaël Bournhonesque	4ce0bbc6b6	Update unit tests	2017-02-09 16:30:43 +01:00
ines	654fe447b1	Add Swedish tokenizer tests (see #807)	2017-02-05 11:47:07 +01:00
Michael Wallin	35100c8bdd	[issue 805] Add regression test and the required fixture	2017-02-04 16:21:34 +02:00
Michael Wallin	1a1952afa5	[finnish] Add initial tests for tokenizer	2017-02-04 13:54:10 +02:00
Ines Montani	afc6365388	Update regression test for #801 to match current expected behaviour	2017-02-02 16:23:05 +01:00
Ines Montani	13a4ab37e0	Add regression test for #801	2017-02-02 15:33:52 +01:00
Raphaël Bournhonesque	85f951ca99	Add tokenizer exceptions for French	2017-02-02 08:36:16 +01:00
Ines Montani	e4875834fe	Fix formatting	2017-01-31 15:19:33 +01:00
Ines Montani	c304834e45	Add missing import	2017-01-31 15:18:30 +01:00
Ines Montani	e6465b9ca3	Parametrize test cases and mark as xfail	2017-01-31 15:14:42 +01:00
latkins	e4c84321a5	Added regression test for Issue #792.	2017-01-31 13:47:42 +00:00
Ines Montani	19501f3340	Add regression test for #775	2017-01-25 13:16:52 +01:00
Raphaël Bournhonesque	1be9c0e724	Add fr tokenization unit tests	2017-01-24 10:57:37 +01:00
Ines Montani	0967eb07be	Add regression test for #768	2017-01-23 21:25:46 +01:00
Ines Montani	5f6f48e734	Add regression test for #759	2017-01-20 15:11:48 +01:00
Ines Montani	d704cfa60d	Fix typo	2017-01-16 21:30:33 +01:00
Matthew Honnibal	2c60d0cb1e	Test #743: Tokens unhashable.	2017-01-16 13:27:26 +01:00
Ines Montani	50878ef598	Exclude "were" and "Were" from tokenizer exceptions and add regression test (resolves #744)	2017-01-16 13:10:38 +01:00
Ines Montani	e053c7693b	Fix formatting	2017-01-16 13:09:52 +01:00
Ines Montani	116c675c3c	Merge pull request #742 from oroszgy/hu_tokenizer_fix Improved Hungarian tokenizer	2017-01-14 23:52:44 +01:00
Gyorgy Orosz	92345b6a41	Further numeric test.	2017-01-14 22:44:19 +01:00
Gyorgy Orosz	b4df202bfa	Better error handling	2017-01-14 22:24:58 +01:00
Gyorgy Orosz	b03a46792c	Better error handling	2017-01-14 22:09:29 +01:00
Ines Montani	332ce2d758	Update README.md	2017-01-14 21:12:11 +01:00
Gyorgy Orosz	9505c6a72b	Passing all old tests.	2017-01-14 20:39:21 +01:00
Gyorgy Orosz	63037e79af	Fixed hyphen handling in the Hungarian tokenizer.	2017-01-14 16:30:11 +01:00
Gyorgy Orosz	f77c0284d6	Maintaining compatibility with other spacy tokenizers.	2017-01-14 16:19:15 +01:00
Gyorgy Orosz	1be5da1ac6	Fixed Hungarian tokenizer for numbers	2017-01-14 15:51:59 +01:00
Ines Montani	a89e269a5a	Fix test formatting and consistency	2017-01-14 13:41:19 +01:00
Ines Montani	3424e3a7e5	Update README.md	2017-01-13 15:54:54 +01:00
Ines Montani	49186b34a1	Mark lemmatizer tests as models since they use installed data	2017-01-13 15:12:07 +01:00
Ines Montani	138deb80a1	Modernise vector tests, use add_vecs_to_vocab and don't depend on models	2017-01-13 15:12:07 +01:00
Ines Montani	96f0caa28a	Fix test name for consistency	2017-01-13 15:12:07 +01:00
Ines Montani	dc2bb1259f	Add util function to add vectors to vocab	2017-01-13 15:12:07 +01:00
Ines Montani	db9b25663d	Reformat add_docs_equal and add docstring	2017-01-13 15:12:07 +01:00
Ines Montani	62ce0a0073	Add README.md to tests to explain organisation and conventions	2017-01-13 15:11:18 +01:00
Ines Montani	38d60f6b90	Modernise serializer I/O tests and don't depend on models where possible	2017-01-13 02:24:56 +01:00
Ines Montani	4bb5b89ee4	Add text_file_b fixture using BytesIO	2017-01-13 02:23:50 +01:00
Ines Montani	49febd8c62	Modernise noun chunks tests and don't depend on models	2017-01-13 02:01:00 +01:00
Ines Montani	3ee97b5686	Rename test_parser to test_noun_chunks	2017-01-13 01:36:33 +01:00
Ines Montani	a308703f47	Remove old tests	2017-01-13 01:34:48 +01:00
Ines Montani	12eb8edf26	Move parser tests from unit to parser	2017-01-13 01:34:38 +01:00
Ines Montani	138c53ff2e	Merge tokenizer tests	2017-01-13 01:34:14 +01:00
Ines Montani	01f36ca3ff	Move attrs tests from unit to root and modernise	2017-01-13 01:33:50 +01:00
Ines Montani	3610d27967	Move alignment tests from munge to gold and modernise	2017-01-13 01:33:31 +01:00
Ines Montani	094ff7396a	Reformat and rename Pragmatic Segmenter tests and mark xfails	2017-01-13 01:30:20 +01:00
Ines Montani	affcf1b19d	Modernise lemmatizer tests	2017-01-12 23:41:17 +01:00
Ines Montani	33d9cf87f9	Modernise tagger tests and fix xpassing test	2017-01-12 23:40:52 +01:00
Ines Montani	33e5f8dc2e	Create basic and extended test set for URLs	2017-01-12 23:40:02 +01:00
Ines Montani	5e4f5ebfc8	Modernise BILUO tests	2017-01-12 23:39:18 +01:00
Ines Montani	09acfbca01	Add Lemmatizer fixture	2017-01-12 23:38:55 +01:00
Ines Montani	514bfa2597	Add path fixture for spaCy data path	2017-01-12 23:38:47 +01:00
Ines Montani	e9e99a5670	Add regression test for #740	2017-01-12 22:57:38 +01:00
Ines Montani	6935d55409	Fix formatting	2017-01-12 22:56:20 +01:00
Ines Montani	5f0d196a31	Modernise and merge matcher tests	2017-01-12 22:23:11 +01:00
Ines Montani	d5d774413a	Update comments on EN and DE fixtures	2017-01-12 22:03:07 +01:00
Ines Montani	9b4bea1df9	Tidy up and rename regression tests and remove unnecessary imports	2017-01-12 22:00:37 +01:00
Ines Montani	5e1b6178e3	Fix formatting and consistency	2017-01-12 22:00:06 +01:00
Ines Montani	a3fd32455e	Remove redundant language loading integration tests	2017-01-12 21:59:48 +01:00
Ines Montani	61f1ca09c2	Modernise serializer codecs tests	2017-01-12 21:58:55 +01:00
Ines Montani	5dbc6e59f6	Modernise Huffman tests	2017-01-12 21:58:40 +01:00
Ines Montani	edeeeccea5	Modernise packer tests and don't depend on models where possible	2017-01-12 21:58:07 +01:00
Ines Montani	d084676cd0	Modernise and merge serialization tests	2017-01-12 21:57:19 +01:00
Ines Montani	442237787c	Add assert_docs_equal util to compare two docs	2017-01-12 21:56:52 +01:00
Ines Montani	eac3f700fb	Add fixture for entity recognizer	2017-01-12 21:56:32 +01:00
Ines Montani	b438cfddbc	Modernise matcher tests and split into two files	2017-01-12 17:51:46 +01:00
Ines Montani	27482ebed8	Move matcher tests for #188 and #242 to regression tests Modernise tests and remove unnecessary imports	2017-01-12 17:33:57 +01:00
Ines Montani	0a4dc632bd	Update test to not create redundant Doc object	2017-01-12 17:33:18 +01:00
Ines Montani	a2526e66d8	Fix formatting, naming and unicode declaration	2017-01-12 16:51:13 +01:00
Ines Montani	052cdff07d	Modernise vector similarity tests	2017-01-12 16:51:13 +01:00
Ines Montani	bd20ec0a6a	Add get_cosine util function	2017-01-12 16:51:13 +01:00
Ines Montani	51ef75f629	Fix regression test for #615 and remove unnecessary imports	2017-01-12 16:51:12 +01:00
Ines Montani	aeb747e10c	Adjust formatting	2017-01-12 16:51:12 +01:00
Ines Montani	8e3e58a7e6	Modernise and merge lexeme vocab tests	2017-01-12 16:51:12 +01:00
Ines Montani	c3d4516fc2	Move test for #361 to regression tests	2017-01-12 16:51:12 +01:00
Ines Montani	7cb3d74426	Modernise span tests and don't depend on models	2017-01-12 15:30:49 +01:00
Ines Montani	92e3d8b3ee	Modernise vocab API tests and remove old xfailing tests	2017-01-12 15:27:46 +01:00
Ines Montani	7ea87684cd	Rename test_vocab.py to test_vocab_api.py	2017-01-12 15:12:21 +01:00
Ines Montani	0da2ee5c68	Merge flag features tests into orth tests in tests root	2017-01-12 15:12:00 +01:00
Ines Montani	03c136cfd3	Remove StringStore tests from vocab tests	2017-01-12 15:11:15 +01:00
Ines Montani	d7bd57abdf	Modernise add vectors vocab test	2017-01-12 15:09:49 +01:00
Ines Montani	89525ef345	Use consistent test names	2017-01-12 15:09:21 +01:00
Ines Montani	f8803808ce	Remove old unused tests and conftest files	2017-01-12 15:09:05 +01:00
Ines Montani	4d0bfebcd9	Move Pragmatic Segmenter test cases (currently unused) to parser tests	2017-01-12 15:08:02 +01:00
Ines Montani	26d018d874	Add tests for StringStore	2017-01-12 15:07:31 +01:00
Ines Montani	9b6784bab5	Add fixture for StringStore	2017-01-12 15:05:40 +01:00
Ines Montani	99d66d613a	Modernise tests for merging spans and don't depend on models	2017-01-12 12:26:26 +01:00
Ines Montani	fa8f67596d	Remove unused old test	2017-01-12 12:26:08 +01:00
Ines Montani	359f73a96b	Move test for #54 to regression tests	2017-01-12 12:25:51 +01:00
Ines Montani	3f3a46722c	Remove unused conftest	2017-01-12 12:25:24 +01:00
Ines Montani	c2406e92bc	Allow setting ents in get_doc	2017-01-12 12:25:10 +01:00
Ines Montani	c5914c6fe5	Fix and pass regression test for #736	2017-01-12 11:48:56 +01:00
Ines Montani	a6790b6694	Rename tags to pos in get_doc and allow adding tags to tokens	2017-01-12 11:18:36 +01:00
Ines Montani	1add8ace67	Merge lemmatizer tests	2017-01-12 11:16:53 +01:00
Ines Montani	3bc082abdf	Modernise morph exceptions test and don't depend on models	2017-01-12 11:14:29 +01:00
Ines Montani	ec7739b76e	Add regression test for #736	2017-01-12 11:12:44 +01:00
Ines Montani	6c1c564891	Move language-specific tests out of redundant tokenizer directories	2017-01-12 02:17:18 +01:00
Ines Montani	8fecedac3a	Tidy up	2017-01-12 02:16:37 +01:00
Ines Montani	ae7edd30e7	Move text file back to tokenizer tests directory	2017-01-12 02:10:23 +01:00
Ines Montani	ffcaba9017	Remove old and/or redundant tests	2017-01-12 02:10:18 +01:00
Ines Montani	19c4132097	Modernise space attachment parser tests and don't depend on models	2017-01-12 01:54:44 +01:00
Ines Montani	69778924c8	Modernise and merge parser tests and don't depend on models	2017-01-12 01:07:29 +01:00
Ines Montani	178c147612	Modernise nonprojectivity tests and don't depend on models	2017-01-12 01:06:36 +01:00
Ines Montani	1a3984742c	Modernise sentence boundary detection tests and don't depend on models (where possible)	2017-01-11 23:53:08 +01:00
Ines Montani	0cdb6ea61d	Remove old unused pickle test	2017-01-11 23:52:28 +01:00
Ines Montani	c9671329dc	Move test for #309 to regression tests	2017-01-11 23:52:13 +01:00
Ines Montani	d0e37b5670	Modernise parser tests and don't depend on models	2017-01-11 21:30:27 +01:00
Ines Montani	342cb41782	Add apply_transition_sequence util function to utils	2017-01-11 21:30:14 +01:00
Ines Montani	09807addff	Add en_parser fixture	2017-01-11 21:29:59 +01:00
Ines Montani	55d151aa61	Modernise Doc parse tree navigation tests and don't depend on models	2017-01-11 21:14:15 +01:00
Ines Montani	7262421bb2	Use consistent test names	2017-01-11 19:00:52 +01:00
Ines Montani	33800c9367	Rename "tokens" tests to "doc"	2017-01-11 18:59:01 +01:00
Ines Montani	3a9c6a9563	Remove old unused files	2017-01-11 18:58:38 +01:00
Ines Montani	8e962de39f	Remove old word vector tests	2017-01-11 18:55:08 +01:00
Ines Montani	e027936920	Modernise Doc noun chunks tests	2017-01-11 18:54:56 +01:00
Ines Montani	439f396acd	Modernise Doc array tests and don't depend on models	2017-01-11 18:54:46 +01:00
Ines Montani	05447be884	Modernise test for adding entities	2017-01-11 18:54:24 +01:00
Ines Montani	6e883f4c00	Modernise Doc API tests and don't depend on models	2017-01-11 18:05:36 +01:00
Ines Montani	8bf3bb5c44	Make words optional for get_doc	2017-01-11 18:05:10 +01:00
Ines Montani	928db7e419	Fix StringIO import for Python 3	2017-01-11 14:07:48 +01:00
Ines Montani	69998f216b	Rename test_tokens_api.py to test_doc_api.py	2017-01-11 13:58:56 +01:00
Ines Montani	d94dea1b18	Merge token tests into token API tests	2017-01-11 13:57:02 +01:00
Ines Montani	eb23424ab0	Modernise token API tests and don't depend on loading models	2017-01-11 13:56:54 +01:00
Ines Montani	c682b8ca90	Merge conftests into one cohesive file	2017-01-11 13:56:32 +01:00
Ines Montani	909f24d7df	Add test utils and get_doc helper function Create Doc object from given vocab, words and annotations to allow tests not to depend on loading the models.	2017-01-11 13:55:33 +01:00
Ines Montani	3e6e1f0251	Tidy up regression tests	2017-01-10 19:24:10 +01:00
Ines Montani	869963c3c4	Mark extensive prefix/suffix tests as slow	2017-01-10 15:57:35 +01:00
Ines Montani	487e020ebe	Add simple test for surrounding brackets	2017-01-10 15:57:26 +01:00
Ines Montani	0ba5cf51d2	Assert length first	2017-01-10 15:57:00 +01:00
Ines Montani	2185d31907	Adjust names and formatting	2017-01-10 15:56:35 +01:00
Ines Montani	e10d4ca964	Remove semi-redundant URLs and punctuation for faster testing	2017-01-10 15:54:25 +01:00
Ines Montani	3a3cb2c90c	Add unicode declaration	2017-01-10 15:53:15 +01:00
Matthew Honnibal	64f747cb65	Token comparison test	2017-01-09 19:12:00 +01:00
Matthew Honnibal	18c3c2d05c	Add tests for token comparison, re Issue #631	2017-01-09 19:09:59 +01:00
Matthew Honnibal	42cd598f57	Use correct fixtures in URL tokenizer	2017-01-09 14:10:40 +01:00
Ines Montani	aa876884f0	Revert "Revert "Merge remote-tracking branch 'origin/master'"" This reverts commit `fb9d3bb022`.	2017-01-09 13:28:13 +01:00
Ines Montani	d5c72c40eb	Remove old tests for old website example code	2017-01-08 22:28:53 +01:00
Ines Montani	5d28664fc5	Don't test Hungarian for numbers and hyphens for now Reinvestigate behaviour of case affixes given reorganised tokenizer patterns.	2017-01-08 20:45:40 +01:00
Ines Montani	abb09782f9	Move sun.txt to original location and fix path to not break parser tests	2017-01-08 20:32:54 +01:00
Ines Montani	8328925e1f	Add newlines to long German text	2017-01-05 18:13:30 +01:00
Ines Montani	55b46d7cf6	Add tokenizer tests for German	2017-01-05 18:11:25 +01:00
Ines Montani	5bb4081f52	Remove redundant test_tokenizer.py for English	2017-01-05 18:11:11 +01:00
Ines Montani	8216ba599b	Add tests for longer and mixed English texts	2017-01-05 18:11:04 +01:00
Ines Montani	65f937d5c6	Move basic contraction tests to test_contractions.py	2017-01-05 18:09:53 +01:00
Ines Montani	bbe7cab3a1	Move non-English-specific tests back to general tokenizer tests	2017-01-05 18:09:29 +01:00
Ines Montani	038002d616	Reformat HU tokenizer tests and adapt to general style Improve readability of test cases and add conftest.py with fixture	2017-01-05 18:06:44 +01:00
Ines Montani	637f785036	Add general sanity tests for all tokenizers	2017-01-05 16:25:38 +01:00
Ines Montani	c5f2dc15de	Move English tokenizer tests to directory /en	2017-01-05 16:25:04 +01:00
Ines Montani	8b45363b4d	Modernize and merge general tokenizer tests	2017-01-05 13:17:05 +01:00
Ines Montani	02cfda48c9	Modernize and merge tokenizer tests for string loading	2017-01-05 13:16:55 +01:00
Ines Montani	a11f684822	Modernize and merge tokenizer tests for whitespace	2017-01-05 13:16:33 +01:00
Ines Montani	8b284fc6f1	Modernize and merge tokenizer tests for text from file	2017-01-05 13:15:52 +01:00
Ines Montani	2c2e878653	Modernize and merge tokenizer tests for punctuation	2017-01-05 13:14:16 +01:00
Ines Montani	8a74129cdf	Modernize and merge tokenizer tests for prefixes/suffixes/infixes	2017-01-05 13:13:12 +01:00
Ines Montani	0e65dca9a5	Modernize and merge tokenizer tests for exception and emoticons	2017-01-05 13:11:31 +01:00
Ines Montani	34c47bb20d	Fix formatting	2017-01-05 13:10:51 +01:00
Ines Montani	2e72683baa	Add missing docstrings	2017-01-05 13:10:21 +01:00
Ines Montani	da10a049a6	Add unicode declarations	2017-01-05 13:09:48 +01:00
Ines Montani	58adae8774	Remove unused file	2017-01-05 13:09:22 +01:00
Ines Montani	c6e5a5349d	Move regression test for #360 into own file	2017-01-04 00:49:31 +01:00
Ines Montani	8279993a6f	Modernize and merge tokenizer tests for punctuation	2017-01-04 00:49:20 +01:00
Ines Montani	550630df73	Update tokenizer tests for contractions	2017-01-04 00:48:42 +01:00
Ines Montani	109f202e8f	Update conftest fixture	2017-01-04 00:48:21 +01:00
Ines Montani	ee6b49b293	Modernize tokenizer tests for emoticons	2017-01-04 00:47:59 +01:00
Ines Montani	f09b5a5dfd	Modernize tokenizer tests for infixes	2017-01-04 00:47:42 +01:00
Ines Montani	59059fed27	Move regression test for #351 to own file	2017-01-04 00:47:11 +01:00
Ines Montani	667051375d	Modernize tokenizer tests for whitespace	2017-01-04 00:46:35 +01:00
Ines Montani	aafc894285	Modernize tokenizer tests for contractions Use @pytest.mark.parametrize.	2017-01-03 23:02:21 +01:00
Ines Montani	fb9d3bb022	Revert "Merge remote-tracking branch 'origin/master'" This reverts commit `d3b181cdf1`, reversing changes made to `b19cfcc144`.	2017-01-03 18:21:36 +01:00
Matthew Honnibal	3ba7c167a8	Fix URL tests	2016-12-30 17:10:08 -06:00
Matthew Honnibal	9936a1b9b5	Merge branch 'tokenization_w_exception_patterns' of https://github.com/oroszgy/spaCy.hu into oroszgy-tokenization_w_exception_patterns	2016-12-30 14:53:40 -06:00
kengz	73a38bd4d1	Merge remote-tracking branch 'upstream/master'	2016-12-30 12:19:59 -05:00
kengz	da44183ae1	move parse_tree logic to a new tokens/printers.py file	2016-12-30 12:19:18 -05:00
Matthew Honnibal	3e8d9c772e	Test interaction of token_match and punctuation Check that the new token_match function applies after punctuation is split off.	2016-12-31 00:52:17 +11:00

1298 Commits