spaCy

mirror of https://github.com/explosion/spaCy.git synced 2026-01-08 09:41:11 +03:00

Author	SHA1	Message	Date
Eric Zhao	aafdf6ffb8	Add option to use label karg to determine ent_type in doc.merge	2017-03-28 23:35:03 -07:00
Em	9c809efc25	Removed mapStr	2017-03-11 16:23:26 -08:00
Em	426d17167f	Added string manipulation for spans	2017-03-10 16:50:02 -08:00
Ines Montani	a16aff17aa	Merge pull request #876 from PySUST/master [Bangla] Update "tokenizer_exceptions.py"	2017-03-10 14:46:00 +01:00
ines	10e29189ac	Adjust URL testcases and xfail problems (instead of comment)	2017-03-10 14:22:50 +01:00
ines	b04893a059	Make regex locale-independent for Python 2	2017-03-10 14:21:57 +01:00
Ines Montani	1c40890321	Add missing comma Should fix Travis build error	2017-03-10 09:34:54 +01:00
Shuvanon Razik	c251703428	Update abbreviations	2017-03-10 10:45:01 +06:00
Dan Rapp	123d3f2d38	Fix error in test case parameterization	2017-03-09 12:18:21 -07:00
Dan Rapp	b9307dfcd7	Merge branch 'master' into rappdw/tokenizer_exceptions_url_fix	2017-03-09 11:42:14 -07:00
Dan Rapp	3b1df3808d	Issue #840 - URL pattenr too broad	2017-03-09 11:39:39 -07:00
shuvanon	85438aee1b	update tokenizertokenizer	2017-03-08 17:29:39 +06:00
shuvanon	45bc78461c	update tokenizertokenizer	2017-03-08 17:27:12 +06:00
Aniruddha Adhikary	696215a3fb	add tests for Bengali	2017-03-05 11:25:12 +06:00
Aniruddha Adhikary	8f3bfe9bfc	[Bengali] basic tag map, morph, lemma rules and exceptions	2017-03-04 12:36:59 +06:00
ines	8dff040032	Revert "Add regression test for #859" This reverts commit `c4f16c66d1`.	2017-03-01 21:56:20 +01:00
ines	c4f16c66d1	Add regression test for #859	2017-03-01 16:07:27 +01:00
Aniruddha Adhikary	d91be7aed4	add punctuations for Bengali	2017-02-28 21:07:14 +06:00
Aniruddha Adhikary	5a4fc09576	add basic Bengali support	2017-02-28 07:48:37 +06:00
Matthew Honnibal	cc9b2b74e3	Merge branch 'french-tokenizer-exceptions'	2017-02-27 11:44:39 +01:00
Matthew Honnibal	bd4375a2e6	Remove comment	2017-02-27 11:44:26 +01:00
Matthew Honnibal	e7e22d8be6	Move import within get_exceptions() function, to speed import	2017-02-27 11:34:48 +01:00
Matthew Honnibal	34bcc8706d	Merge branch 'french-tokenizer-exceptions'	2017-02-27 11:21:21 +01:00
Matthew Honnibal	0aaa546435	Fix test after updating the French tokenizer stuff	2017-02-27 11:20:47 +01:00
Matthew Honnibal	26446aa728	Avoid loading all French exceptions on import Move exceptions loading behind a get_tokenizer_exceptions() function for French, instead of loading into the top-level namespace. This cuts import times from 0.6s to 0.2s, at the expense of making the French data a little different from the others (there's no top-level TOKENIZER_EXCEPTIONS variable.) The current solution feels somewhat unsatisfying.	2017-02-25 11:55:00 +01:00
ines	376c5813a7	Remove print statements from test	2017-02-24 18:26:32 +01:00
ines	7c1260e98c	Add regression test	2017-02-24 18:22:49 +01:00
ines	0e2e331b58	Convert exceptions to Python list	2017-02-24 18:22:40 +01:00
ines	51eb190ef4	Remove print statements from test	2017-02-24 17:41:12 +01:00
Matthew Honnibal	db5ada3995	Merge branch 'master' of https://github.com/explosion/spaCy	2017-02-24 14:28:12 +01:00
Matthew Honnibal	8f94897d07	Add 1 operator to matcher, and make sure open patterns are closed at end of document. Closes Issue #766	2017-02-24 14:27:02 +01:00
ines	67991b6e5f	Add more test cases to #775 regression test to cover #847	2017-02-18 14:10:44 +01:00
ines	30ce2a6793	Exclude "shed" and "Shed" from tokenizer exceptions (see #847)	2017-02-18 14:10:44 +01:00
Ines Montani	de997c1a33	Merge pull request #842 from magnusburton/master Added regular verb rules for Swedish	2017-02-17 11:18:20 +01:00
Magnus Burton	41fcfd06b8	Added regular verb rules for Swedish	2017-02-17 10:04:04 +01:00
ines	aa92d4e9b5	Fix unicode regex for Python 2 (see #834)	2017-02-16 23:49:54 +01:00
ines	44de3c7642	Reformat test and use text_file fixture	2017-02-16 23:49:19 +01:00
ines	3dd22e9c88	Mark vectors test as xfail (temporary)	2017-02-16 23:28:51 +01:00
ines	85d249d451	Revert "Revert "Merge pull request #836 from raphael0202/load_vectors (closes #834)"" This reverts commit `ea05f78660`.	2017-02-16 23:26:25 +01:00
ines	ea05f78660	Revert "Merge pull request #836 from raphael0202/load_vectors (closes #834)" This reverts commit `7d8c9eee7f`, reversing changes made to `f6b69babcc`.	2017-02-16 15:27:12 +01:00
Raphaël Bournhonesque	06a71d22df	Fix test failure by using unicode literals	2017-02-16 14:48:00 +01:00
Raphaël Bournhonesque	3ba109622c	Add regression test with non ' ' space character as token	2017-02-16 12:23:27 +01:00
Raphaël Bournhonesque	e17dc2db75	Remove useless import	2017-02-16 12:10:24 +01:00
Raphaël Bournhonesque	3fd2742649	load_vectors should accept arbitrary space characters as word tokens Fix bug #834	2017-02-16 12:08:30 +01:00
ines	f08e180a47	Make groups non-capturing Prevents hitting the 100 named groups limit in Python	2017-02-10 13:35:02 +01:00
ines	fa3b8512da	Use consistent imports and exports Bundle everything in language_data to keep it consistent with other languages and make TOKENIZER_EXCEPTIONS importable from there.	2017-02-10 13:34:09 +01:00
ines	21f09d10d7	Revert "Revert "Merge pull request #818 from raphael0202/tokenizer_exceptions"" This reverts commit `f02a2f9322`.	2017-02-10 13:17:05 +01:00
ines	f02a2f9322	Revert "Merge pull request #818 from raphael0202/tokenizer_exceptions" This reverts commit `b95afdf39c`, reversing changes made to `b0ccf32378`.	2017-02-09 17:07:21 +01:00
Raphaël Bournhonesque	309da78bf0	Merge branch 'master' into tokenizer_exceptions	2017-02-09 16:32:12 +01:00
Raphaël Bournhonesque	4ce0bbc6b6	Update unit tests	2017-02-09 16:30:43 +01:00

2393 Commits