spaCy

mirror of https://github.com/explosion/spaCy.git synced 2026-01-25 01:33:59 +03:00

Author	SHA1	Message	Date
Matthew Honnibal	e7e22d8be6	Move import within get_exceptions() function, to speed import	2017-02-27 11:34:48 +01:00
Matthew Honnibal	0aaa546435	Fix test after updating the French tokenizer stuff	2017-02-27 11:20:47 +01:00
Matthew Honnibal	26446aa728	Avoid loading all French exceptions on import. Move exceptions loading behind a get_tokenizer_exceptions() function for French, instead of loading into the top-level namespace. This cuts import times from 0.6s to 0.2s, at the expense of making the French data a little different from the others (there's no top-level TOKENIZER_EXCEPTIONS variable). The current solution feels somewhat unsatisfying.	2017-02-25 11:55:00 +01:00
ines	7c1260e98c	Add regression test	2017-02-24 18:22:49 +01:00
ines	0e2e331b58	Convert exceptions to Python list	2017-02-24 18:22:40 +01:00
ines	51eb190ef4	Remove print statements from test	2017-02-24 17:41:12 +01:00
ines	67991b6e5f	Add more test cases to #775 regression test to cover #847	2017-02-18 14:10:44 +01:00
ines	30ce2a6793	Exclude "shed" and "Shed" from tokenizer exceptions (see #847)	2017-02-18 14:10:44 +01:00
Ines Montani	de997c1a33	Merge pull request #842 from magnusburton/master: Added regular verb rules for Swedish	2017-02-17 11:18:20 +01:00
Magnus Burton	41fcfd06b8	Added regular verb rules for Swedish	2017-02-17 10:04:04 +01:00
ines	aa92d4e9b5	Fix unicode regex for Python 2 (see #834)	2017-02-16 23:49:54 +01:00
ines	44de3c7642	Reformat test and use text_file fixture	2017-02-16 23:49:19 +01:00
ines	3dd22e9c88	Mark vectors test as xfail (temporary)	2017-02-16 23:28:51 +01:00
ines	85d249d451	Revert "Revert "Merge pull request #836 from raphael0202/load_vectors (closes #834)"". This reverts commit `ea05f78660`.	2017-02-16 23:26:25 +01:00
ines	ea05f78660	Revert "Merge pull request #836 from raphael0202/load_vectors (closes #834)". This reverts commit `7d8c9eee7f`, reversing changes made to `f6b69babcc`.	2017-02-16 15:27:12 +01:00
Raphaël Bournhonesque	06a71d22df	Fix test failure by using unicode literals	2017-02-16 14:48:00 +01:00
Raphaël Bournhonesque	3ba109622c	Add regression test with non ' ' space character as token	2017-02-16 12:23:27 +01:00
Raphaël Bournhonesque	e17dc2db75	Remove useless import	2017-02-16 12:10:24 +01:00
Raphaël Bournhonesque	3fd2742649	load_vectors should accept arbitrary space characters as word tokens. Fix bug #834	2017-02-16 12:08:30 +01:00
ines	f08e180a47	Make groups non-capturing. Prevents hitting the 100 named groups limit in Python	2017-02-10 13:35:02 +01:00
ines	fa3b8512da	Use consistent imports and exports. Bundle everything in language_data to keep it consistent with other languages and make TOKENIZER_EXCEPTIONS importable from there.	2017-02-10 13:34:09 +01:00
ines	21f09d10d7	Revert "Revert "Merge pull request #818 from raphael0202/tokenizer_exceptions"". This reverts commit `f02a2f9322`.	2017-02-10 13:17:05 +01:00
ines	f02a2f9322	Revert "Merge pull request #818 from raphael0202/tokenizer_exceptions". This reverts commit `b95afdf39c`, reversing changes made to `b0ccf32378`.	2017-02-09 17:07:21 +01:00
Raphaël Bournhonesque	309da78bf0	Merge branch 'master' into tokenizer_exceptions	2017-02-09 16:32:12 +01:00
Raphaël Bournhonesque	4ce0bbc6b6	Update unit tests	2017-02-09 16:30:43 +01:00
Raphaël Bournhonesque	5d706ab95d	Merge tokenizer exceptions from PR #802	2017-02-09 16:30:28 +01:00
ines	654fe447b1	Add Swedish tokenizer tests (see #807)	2017-02-05 11:47:07 +01:00
ines	6715615d55	Add missing EXC variable and combine tokenizer exceptions	2017-02-05 11:42:52 +01:00
Ines Montani	30a52d576b	Merge pull request #807 from magnusburton/master: Added swedish lemma rules and more verb contractions	2017-02-05 11:34:19 +01:00
Magnus Burton	19c0ce745a	Added swedish lemma rules	2017-02-04 17:53:32 +01:00
Michael Wallin	d25556bf80	[issue 805] Fix issue	2017-02-04 16:22:21 +02:00
Michael Wallin	35100c8bdd	[issue 805] Add regression test and the required fixture	2017-02-04 16:21:34 +02:00
ines	0ab353b0ca	Add line breaks to Finnish stop words for better readability	2017-02-04 13:40:25 +01:00
Michael Wallin	1a1952afa5	[finnish] Add initial tests for tokenizer	2017-02-04 13:54:10 +02:00
Michael Wallin	f9bb25d1cf	[finnish] Reformat and correct stop words	2017-02-04 13:54:10 +02:00
Michael Wallin	73f66ec570	Add preliminary support for Finnish	2017-02-04 13:54:10 +02:00
Ines Montani	65d6202107	Merge pull request #802 from Tpt/fr-tokenizer: Adds more French tokenizer exceptions	2017-02-03 10:52:20 +01:00
Tpt	75a74857bb	Adds more French tokenizer exceptions	2017-02-03 13:45:18 +04:00
Ines Montani	afc6365388	Update regression test for #801 to match current expected behaviour	2017-02-02 16:23:05 +01:00
Ines Montani	012f4820cb	Keep infixes of punctuation + hyphens as one token (see #801)	2017-02-02 16:22:40 +01:00
Ines Montani	1219a5f513	Add = to tokenizer prefixes	2017-02-02 16:21:11 +01:00
Ines Montani	ff04748eb6	Add missing emoticon	2017-02-02 16:21:00 +01:00
Ines Montani	13a4ab37e0	Add regression test for #801	2017-02-02 15:33:52 +01:00
Raphaël Bournhonesque	85f951ca99	Add tokenizer exceptions for French	2017-02-02 08:36:16 +01:00
Matvey Ezhov	32a22291bc	Small `Doc.count_by` documentation update. Current example doesn't work	2017-01-31 19:18:45 +03:00
Ines Montani	e4875834fe	Fix formatting	2017-01-31 15:19:33 +01:00
Ines Montani	c304834e45	Add missing import	2017-01-31 15:18:30 +01:00
Ines Montani	e6465b9ca3	Parametrize test cases and mark as xfail	2017-01-31 15:14:42 +01:00
latkins	e4c84321a5	Added regression test for Issue #792.	2017-01-31 13:47:42 +00:00
Matthew Honnibal	6c665b81df	Fix redundant == TAG in from_array conditional	2017-01-31 00:46:21 +11:00

2368 Commits