spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-03-21 18:34:14 +03:00

Author	SHA1	Message	Date
Vadim Mazaev	81314f8659	Fixed tokenizer: added char classes; added first lemmatizer and tokenizer tests	2017-11-21 22:23:59 +03:00
Burton DeWilde	635792997c	Add regression test for #1612	2017-11-20 12:05:35 -06:00
ines	d70a64d78b	Fix syntax error and formatting in test (see #1617)	2017-11-20 14:01:25 +01:00
ines	17849dee4b	Fix French test (see #1617)	2017-11-20 13:59:59 +01:00
Felix Sonntag	8be3392302	Added regression test for 1494	2017-11-19 16:30:35 +01:00
Motoki Wu	b818afaa0e	Added failing test for Issue #1207. The noun chunk iterator should work for `Doc` but not for `Span`.	2017-11-17 17:04:27 -08:00
ines	a3d4dd1a5d	Test adding of lots of pipeline components (see #1585). Just to make sure that there's no error now or in the future with adding a large number of pipeline components.	2017-11-15 17:28:06 +01:00
Roman Domrachev	505c6a2f2f	Completely clean up tokenizer cache. Tokenizer cache can have keys other than strings. This modification can slow down the tokenizer and needs to be measured.	2017-11-15 17:55:48 +03:00
Roman Domrachev	3e21680814	Use safer method to get string without hit	2017-11-14 22:58:46 +03:00
Roman Domrachev	4e378dc4a4	Remove all obsolete code and test only initial problem	2017-11-14 20:45:04 +03:00
Roman	47ce2347b0	Create test that fails when actual cleanup caused	2017-11-14 20:28:13 +03:00
Roman Domrachev	3d247d2bb8	Get back previous testcase	2017-11-14 18:01:37 +03:00
Roman Domrachev	a2745b0e84	StringStore now actually cleaned. Do not lose docs in ref tracking.	2017-11-14 17:45:50 +03:00
Roman Domrachev	ee60a52ee7	Fix test imports and last batch cleanup	2017-11-11 11:32:16 +03:00
Roman Domrachev	3c600adf23	Try to fix StringStore clean up (see #1506 )	2017-11-11 03:11:27 +03:00
ines	ee97fd3cb4	Add regression test for #1547	2017-11-11 00:14:03 +01:00
ines	2df27db671	Add unicode declaration	2017-11-11 00:13:56 +01:00
ines	1c218397f6	Ensure path in Doc.to_disk/from_disk (resolves #1521). Also add Doc serialization tests with both Path and string path options.	2017-11-09 02:29:03 +01:00
Matthew Honnibal	a5ea0fdf5a	Fix #1518: vocab.vectors.resize() didn't work	2017-11-08 22:18:37 +01:00
Matthew Honnibal	4194bc5744	Xfail flakey serialization test	2017-11-08 13:55:13 +01:00
ines	42a0fbf291	Fix textcat simple train example	2017-11-07 01:25:54 +01:00
ines	5f43953536	Move test	2017-11-06 23:14:10 +01:00
Matthew Honnibal	1831dbd065	Add test of simple textcat workflow	2017-11-06 22:04:29 +01:00
Matthew Honnibal	2f7e9f390d	Make test less flakey	2017-11-06 17:34:50 +01:00
Matthew Honnibal	407b08017e	Make test less flakey	2017-11-06 17:31:40 +01:00
Matthew Honnibal	102f797933	Fix lemma ordering in test	2017-11-06 17:02:17 +01:00
Matthew Honnibal	63c6ae4191	Fix lemmatizer test	2017-11-06 11:57:06 +01:00
Matthew Honnibal	00435d8f0c	Add extra beam parsing test	2017-11-05 14:39:57 +01:00
ines	5e7d98f72a	Remove test for #1491	2017-11-03 22:10:57 +01:00
ines	718f1c50fb	Add regression test for #1491	2017-11-03 21:11:20 +01:00
Matthew Honnibal	144a93c2a5	Back-off to tensor for similarity if no vectors	2017-11-03 20:56:33 +01:00
Matthew Honnibal	d6e831bf89	Fix lemmatizer tests	2017-11-03 19:46:34 +01:00
ines	eef930c73e	Assert instead of print	2017-11-03 18:50:57 +01:00
ines	f0986df94b	Add test for #1488 (passes on v2.0.0a18?)	2017-11-03 14:44:36 +01:00
Matthew Honnibal	711278b667	Make test less flakey	2017-11-03 14:36:08 +01:00
Matthew Honnibal	0a534ae96a	Fix test for backprop d_pad	2017-11-03 14:04:16 +01:00
Matthew Honnibal	a22f96c3f1	Add test for backpropagating padding	2017-11-03 00:48:54 +01:00
ines	3af281a334	Update test model name	2017-11-01 23:02:00 +01:00
ines	8c2260e18c	Move span tests to /doc	2017-11-01 16:56:35 +01:00
ines	260cb37224	Catch deprecation warning	2017-11-01 16:49:18 +01:00
ines	5914faafbb	Fix .merge tests to not use deprecated API	2017-11-01 16:49:11 +01:00
Matthew Honnibal	9e0ebee81c	Add Token.is_sent_start property, so can deprecate Token.sent_start	2017-11-01 13:27:14 +01:00
Matthew Honnibal	c047498f87	Fix vectors test	2017-11-01 13:24:47 +01:00
Matthew Honnibal	86eba61fae	Fix token.vector when vectors are missing	2017-11-01 00:47:35 +01:00
Ines Montani	d11659463b	Merge pull request #1152 from jimregan/develop-irish [WIP] attempt a port from #1147	2017-11-01 00:23:43 +01:00
Jim O'Regan	08b0bfd153	merge	2017-10-31 22:55:59 +00:00
Jim O'Regan	00ecfa5417	Ó, not O	2017-10-31 22:54:42 +00:00
Ines Montani	25b1d6cd91	Fix syntax error	2017-10-31 22:36:03 +01:00
Matthew Honnibal	92dc127569	Fix test for Python 3	2017-10-31 22:21:55 +01:00
Jim O'Regan	fe4b10346a	replace example sentence until I get around to adding a punctuation.py	2017-10-31 20:24:53 +00:00
Matthew Honnibal	77d8f5de9a	Revise and simplify Vectors class	2017-10-31 18:25:08 +01:00
Jim O'Regan	d4a8160c36	change quotes	2017-10-31 15:15:44 +00:00
Jim O'Regan	34ca59691b	no idea what is wrong here	2017-10-31 14:50:13 +00:00
Jim O'Regan	41dd29e48e	merge	2017-10-31 14:07:45 +00:00
Matthew Honnibal	cb5217012f	Fix vector remapping	2017-10-31 11:40:46 +01:00
Matthew Honnibal	9c11ee4a1c	WIP on vectors fixes	2017-10-31 11:22:56 +01:00
Matthew Honnibal	368fdb389a	WIP on refactoring and fixing vectors	2017-10-31 02:00:26 +01:00
Explosion Bot	72aea8f105	Update vectors.add() to allow setting keys to rows	2017-10-30 10:03:08 +01:00
Matthew Honnibal	64e4ff7c4b	Merge 'tidy-up' changes into branch. Resolve conflicts	2017-10-28 13:16:06 +02:00
Ines Montani	4033e70c71	Merge pull request #1461 from explosion/feature/disable-pipes 💫 Add Language.disable_pipes(), to temporarily edit pipeline and update code examples	2017-10-27 12:21:40 +02:00
Matthew Honnibal	b0f3ea2200	Fix names of pipeline components: NeuralDependencyParser --> DependencyParser; NeuralEntityRecognizer --> EntityRecognizer; TokenVectorEncoder --> Tensorizer; NeuralLabeller --> MultitaskObjective	2017-10-26 12:38:23 +02:00
ines	de1e5f35d5	Merge branch 'develop' into feature/disable-pipes	2017-10-25 16:33:12 +02:00
ines	c0b55ebdac	Fix PhraseMatcher.__contains__ and add more tests	2017-10-25 16:31:11 +02:00
ines	657a4d91bc	Merge branch 'develop' into feature/disable-pipes	2017-10-25 15:19:05 +02:00
ines	1a722dac31	Merge branch 'develop' into feature/disable-pipes	2017-10-25 15:18:18 +02:00
Matthew Honnibal	b5de768852	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-10-25 14:44:16 +02:00
Matthew Honnibal	094512fd47	Fix model-mark on regression test.	2017-10-25 14:44:00 +02:00
Matthew Honnibal	e70f80f29e	Add Language.disable_pipes()	2017-10-25 13:46:41 +02:00
Ines Montani	d3bf488e16	Merge pull request #1171 from mollerhoj/support-danish Improve basic support for Danish	2017-10-24 20:29:57 +02:00
Matthew Honnibal	908809d488	Update tests	2017-10-24 17:05:15 +02:00
Matthew Honnibal	30e67fa808	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-10-24 16:08:23 +02:00
Matthew Honnibal	63f0bde749	Add test for #1250: Tokenizer cache clobbered special-case attrs	2017-10-24 16:07:18 +02:00
ines	090aed940a	Add test for currently failing span.as_doc case	2017-10-24 16:00:56 +02:00
ines	4ef81a9ebc	Fix whitespace	2017-10-24 16:00:56 +02:00
Matthew Honnibal	4bea65a1a8	Fix Issue #1450: Off-by-1 in * and ? matches. Patterns that end in variable-length operators, e.g. * and ?, now end on the correct token. Previously, they were off by 1: the next token was pulled into the match, even if that's where the pattern failed.	2017-10-24 14:26:27 +02:00
Matthew Honnibal	391d5ef0d1	Normalize imports in regression test	2017-10-24 14:25:49 +02:00
Matthew Honnibal	b66b8f028b	Fix #1375 -- out-of-bounds on token.nbor()	2017-10-24 12:10:39 +02:00
Matthew Honnibal	a68d89a4f3	Add failing test for bug #1375 -- no out-of-bounds error for token.nbor()	2017-10-24 12:05:25 +02:00
Ines Montani	facf77e541	Merge branch 'develop' into support-danish	2017-10-24 11:53:19 +02:00
Matthew Honnibal	ccd2ab1a62	Merge pull request #1443 from ramananbalakrishnan/develop-get-lca-matrix Add LCA matrix for spans and docs	2017-10-24 11:22:46 +02:00
Matthew Honnibal	ef3e5a361b	Merge pull request #1442 from explosion/feature/fix-sp 💫 Fix SP tag, tweak Vectors.__init__, fix Morphology	2017-10-24 10:24:07 +02:00
Matthew Honnibal	fdf25d10ba	Merge pull request #1440 from ramananbalakrishnan/develop Support single value for attribute list in doc.to_array	2017-10-24 10:23:12 +02:00
Matthew Honnibal	490ad3eaf0	Check that empty strings are handled. Closes #1242	2017-10-21 00:52:14 +02:00
Ramanan Balakrishnan	d2fe56a577	Add LCA matrix for spans and docs	2017-10-20 23:58:00 +05:30
Matthew Honnibal	d8391b1c4d	Fix #1434: Matcher failed on ending ? if no token	2017-10-20 16:49:36 +02:00
Matthew Honnibal	f111b228e0	Fix re-parsing of previously parsed text. If a Doc object had been previously parsed, it was possible for invalid parses to be added. There were two problems: 1) The parse was only being partially erased; 2) The RightArc action was able to create a 1-cycle. This patch fixes both errors, and avoids resetting the parse if one is present. In theory this might allow a better parse to be predicted by running the parser twice. Closes #1253.	2017-10-20 16:27:36 +02:00
Matthew Honnibal	ebecaddb76	Make 'data_or_width' two keyword args in Vectors.__init__. Previously the data and width options were one argument in Vectors, which meant you couldn't say vectors = Vectors(strings, width=300). It's better to have two keywords.	2017-10-20 14:17:15 +02:00
Ramanan Balakrishnan	b3ab124fc5	Support strings for attribute list in doc.to_array	2017-10-20 11:46:57 +05:30
ines	bf415fd778	Add test for serializing extension attrs (see #1085)	2017-10-19 00:53:08 +02:00
Matthew Honnibal	fe844148f6	Test pickling hooks	2017-10-17 19:43:52 +02:00
Matthew Honnibal	374819edf8	Test user_data deserialization, re #1085	2017-10-17 19:28:54 +02:00
Matthew Honnibal	8ca97f32a3	Fix doc pickling test	2017-10-17 18:19:57 +02:00
Matthew Honnibal	45d1dd90b1	Add tests for pickling doc	2017-10-17 17:20:58 +02:00
Matthew Honnibal	4174477161	Fix equality check in test	2017-10-16 19:50:35 +02:00
Matthew Honnibal	010a7309ff	Merge pull request #1402 from explosion/feature/fix-matcher-operators 💫 Fix Matcher variable-length operators	2017-10-16 17:53:19 +02:00
Matthew Honnibal	c29927d2e7	Fix matcher test	2017-10-16 17:22:18 +02:00
Matthew Honnibal	a928ae2f35	Merge branch 'develop' into feature/fix-matcher-operators	2017-10-16 13:38:36 +02:00
Matthew Honnibal	748d525801	Add more matcher operator tests	2017-10-16 13:38:01 +02:00
ines	3516aa0cea	Port over changes from #1389	2017-10-14 13:32:55 +02:00
ines	cd6a29dce7	Port over changes from #1294	2017-10-14 13:28:46 +02:00
ines	38c756fd85	Port over changes from #1287	2017-10-14 13:16:21 +02:00
ines	612224c10d	Port over changes from #1157	2017-10-14 13:11:39 +02:00
ines	9b3f8f9ec3	Fix formatting and add comment on languages	2017-10-14 13:11:18 +02:00
ines	a4d974d97b	Port over URL pattern changes from #1411	2017-10-14 12:58:07 +02:00
Matthew Honnibal	cf6da9301a	Update lemmatizer test	2017-10-12 22:50:52 +02:00
Matthew Honnibal	462caf835a	Fix SBD test	2017-10-12 21:18:22 +02:00
Ines Montani	37aa523a8e	Merge pull request #1408 from explosion/feature/dot-underscore 💫 Custom attributes via Doc._, Token._ and Span._	2017-10-11 18:35:56 +02:00
ines	51519251c2	Fix underscore method test	2017-10-11 13:34:19 +02:00
ines	c6ae49e8bf	Fix formatting	2017-10-11 13:34:11 +02:00
ines	453c47ca24	Add German lemmatizer tests	2017-10-11 13:27:26 +02:00
ines	15fe0fd82d	Fix tests	2017-10-11 13:27:18 +02:00
ines	e0ff145a8b	Merge branch 'develop' into feature/dot-underscore	2017-10-11 11:57:05 +02:00
Matthew Honnibal	fd47f8e89f	Fix failing test	2017-10-11 08:38:34 +02:00
Matthew Honnibal	462b2e26b4	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-10-11 08:23:04 +02:00
Matthew Honnibal	2c118ab3a6	Add tests for Doc creation	2017-10-11 03:21:23 +02:00
Matthew Honnibal	d84136b4a9	Update add label test	2017-10-10 22:57:41 +02:00
Matthew Honnibal	e0a9b02b67	Merge Span._ and Span.as_doc methods	2017-10-09 22:00:15 -05:00
Matthew Honnibal	09d61ada5e	Merge pull request #1396 from explosion/feature/pipeline-management 💫 Improve pipeline and factory management	2017-10-10 04:29:54 +02:00
Matthew Honnibal	f0f2739ae3	Add test for serialization issue raised in #1105	2017-10-10 03:57:58 +02:00
ines	de374dc72a	Merge branch 'feature/pipeline-management' into feature/dot-underscore	2017-10-09 14:37:51 +02:00
Matthew Honnibal	2534cd57d7	Add bandaid solution to the 'shadowing' problem in #864	2017-10-09 08:59:35 +02:00
Matthew Honnibal	d8a2506023	Merge pull request #1401 from explosion/feature/add-parser-action 💫 Allow labels to be added to pre-trained parser and NER models	2017-10-09 04:57:51 +02:00
Matthew Honnibal	689349e32f	Merge pull request #1400 from explosion/feature/sentence-parsing 💫 Force parser to respect preset sentence boundaries	2017-10-09 04:31:43 +02:00
Matthew Honnibal	fad2b8315f	Merge branch 'develop' into feature/add-parser-action	2017-10-09 04:13:04 +02:00
Matthew Honnibal	6c79841c0d	Fix tests for history features	2017-10-09 04:12:24 +02:00
Matthew Honnibal	dde87e6b0d	Add tests for adding parser actions	2017-10-09 03:42:35 +02:00
Matthew Honnibal	81a64119db	Fix string-to-unicode problem	2017-10-09 00:59:49 +02:00
Matthew Honnibal	02c2af7119	Fix test	2017-10-09 00:29:37 +02:00
Matthew Honnibal	5a67efeccc	Add tests for sentence segmentation presetting	2017-10-09 00:02:23 +02:00
Matthew Honnibal	9bd8191739	Add tests for Underscore	2017-10-07 18:56:19 +02:00
Matthew Honnibal	3b67eabfea	Allow empty dictionaries to match any token in Matcher. Often patterns need to match "any token". A clean way to denote this is with the empty dict {}: this sets no constraints on the token, so should always match. The problem was that having attributes length==0 was used as an end-of-array signal, so the matcher didn't handle this case correctly. This patch compiles empty token spec dicts into a constraint NULL_ATTR==0. The NULL_ATTR attribute, 0, is always set to 0 on the lexeme -- so this always matches.	2017-10-07 03:36:15 +02:00
ines	0adadcb3f0	Fix beam parse model test	2017-10-07 02:15:15 +02:00
ines	b38a8f4a94	Fix and update pipe methods tests	2017-10-07 02:06:23 +02:00
Matthew Honnibal	3a65a0c970	Start adding tests for new pipeline management	2017-10-07 01:48:23 +02:00
ines	61a503a611	Fix parser test	2017-10-07 00:38:51 +02:00
Matthew Honnibal	c6cd81f192	Wrap try/except around model saving	2017-10-05 08:14:24 -05:00
Matthew Honnibal	fd4baff475	Update tests	2017-10-05 08:12:27 -05:00
Matthew Honnibal	40edb65ee7	Make test work for Python 2.7	2017-10-04 16:36:50 +02:00
Matthew Honnibal	db05d4d582	Add test for #1380. Passes without fix?	2017-10-04 14:56:31 +02:00
Matthew Honnibal	4a59f6358c	Fix thinc imports	2017-10-03 19:21:26 +02:00
Ines Montani	959c46eabe	Merge pull request #1365 from wannaphongcom/develop Add Thai language for spaCy v2	2017-09-26 23:43:05 +02:00
Wannaphong Phatthiyaphaibun	7b5263ffa4	fix thai test	2017-09-26 23:54:15 +07:00
Matthew Honnibal	41cc5c4c17	Merge branch 'develop' into feature/phrasematcher	2017-09-26 09:59:17 -05:00
Wannaphong Phatthiyaphaibun	5cba67146c	add thai in spacy2	2017-09-26 21:36:27 +07:00
Matthew Honnibal	74f08e1ad5	Update test	2017-09-26 06:45:56 -05:00
Matthew Honnibal	20193371f5	Don't share CNN, to reduce complexities	2017-09-21 14:59:48 +02:00
Matthew Honnibal	cc408fc189	Make PhraseMatcher API like Matcher API	2017-09-20 22:20:35 +02:00
Matthew Honnibal	43ad250dd5	Update matcher tests	2017-09-20 21:54:49 +02:00
Matthew Honnibal	c013e5996f	Fix parser test	2017-09-17 13:13:20 -05:00
ines	ece30c28a8	Don't split hyphenated words in German. This way, the tokenizer matches the tokenization in German treebanks.	2017-09-16 20:40:15 +02:00

931 Commits