spaCy

mirror of https://github.com/explosion/spaCy.git synced 2026-01-10 10:41:14 +03:00

Author	SHA1	Message	Date
Ines Montani	363f09e68c	Merge pull request #726 from magnusburton/master Added Swedish abbreviations as token exceptions	2017-01-09 14:58:15 +01:00
Matthew Honnibal	42cd598f57	Use correct fixtures in URL tokenizer	2017-01-09 14:10:40 +01:00
Matthew Honnibal	d9a77ddf14	Return None for data path if it doesn't exist	2017-01-09 14:10:05 +01:00
Matthew Honnibal	e4862d1dab	Merge branch 'develop'	2017-01-09 13:36:01 +01:00
Ines Montani	aa876884f0	Revert "Revert "Merge remote-tracking branch 'origin/master'"" This reverts commit `fb9d3bb022`.	2017-01-09 13:28:13 +01:00
Ines Montani	d5c72c40eb	Remove old tests for old website example code	2017-01-08 22:28:53 +01:00
Ines Montani	eef94e3ee2	Split off period after two or more uppercase letters (fixes #483)	2017-01-08 22:28:25 +01:00
Ines Montani	a89a6000e5	Remove unused import	2017-01-08 22:17:37 +01:00
Ines Montani	5d28664fc5	Don't test Hungarian for numbers and hyphens for now Reinvestigate behaviour of case affixes given reorganised tokenizer patterns.	2017-01-08 20:45:40 +01:00
Ines Montani	53362b6b93	Reorganise Hungarian prefixes/suffixes/infixes Use global prefixes and suffixes for non-language-specific rules, import list of alpha unicode characters and adjust regexes.	2017-01-08 20:40:33 +01:00
Ines Montani	347c4a2d06	Reorganise and reformat global tokenizer prefixes, suffixes and infixes	2017-01-08 20:37:39 +01:00
Ines Montani	0dec90e9f7	Use global abbreviation data languages and remove duplicates	2017-01-08 20:36:00 +01:00
Ines Montani	7c3cb2a652	Add global abbreviations data	2017-01-08 20:34:03 +01:00
Ines Montani	de5aa92bc2	Handle deprecated tokenizer prefix data	2017-01-08 20:33:28 +01:00
Ines Montani	abb09782f9	Move sun.txt to original location and fix path to not break parser tests	2017-01-08 20:32:54 +01:00
Ines Montani	cab39c59c5	Add missing contractions to English tokenizer exceptions Inspired by https://github.com/kootenpv/contractions/blob/master/contractions/__init__.py	2017-01-05 19:59:06 +01:00
Ines Montani	a23504fe07	Move abbreviations below other exceptions	2017-01-05 19:58:07 +01:00
Ines Montani	7d2cf934b9	Generate he/she/it correctly with 's instead of 've	2017-01-05 19:57:00 +01:00
Ines Montani	8328925e1f	Add newlines to long German text	2017-01-05 18:13:30 +01:00
Ines Montani	55b46d7cf6	Add tokenizer tests for German	2017-01-05 18:11:25 +01:00
Ines Montani	5bb4081f52	Remove redundant test_tokenizer.py for English	2017-01-05 18:11:11 +01:00
Ines Montani	8216ba599b	Add tests for longer and mixed English texts	2017-01-05 18:11:04 +01:00
Ines Montani	65f937d5c6	Move basic contraction tests to test_contractions.py	2017-01-05 18:09:53 +01:00
Ines Montani	bbe7cab3a1	Move non-English-specific tests back to general tokenizer tests	2017-01-05 18:09:29 +01:00
Ines Montani	038002d616	Reformat HU tokenizer tests and adapt to general style Improve readability of test cases and add conftest.py with fixture	2017-01-05 18:06:44 +01:00
Ines Montani	bc911322b3	Move ") to emoticons (see Tweebo challenge test)	2017-01-05 18:05:38 +01:00
Ines Montani	637f785036	Add general sanity tests for all tokenizers	2017-01-05 16:25:38 +01:00
Ines Montani	c5f2dc15de	Move English tokenizer tests to directory /en	2017-01-05 16:25:04 +01:00
Ines Montani	8b45363b4d	Modernize and merge general tokenizer tests	2017-01-05 13:17:05 +01:00
Ines Montani	02cfda48c9	Modernize and merge tokenizer tests for string loading	2017-01-05 13:16:55 +01:00
Ines Montani	a11f684822	Modernize and merge tokenizer tests for whitespace	2017-01-05 13:16:33 +01:00
Ines Montani	8b284fc6f1	Modernize and merge tokenizer tests for text from file	2017-01-05 13:15:52 +01:00
Ines Montani	2c2e878653	Modernize and merge tokenizer tests for punctuation	2017-01-05 13:14:16 +01:00
Ines Montani	8a74129cdf	Modernize and merge tokenizer tests for prefixes/suffixes/infixes	2017-01-05 13:13:12 +01:00
Ines Montani	0e65dca9a5	Modernize and merge tokenizer tests for exception and emoticons	2017-01-05 13:11:31 +01:00
Ines Montani	34c47bb20d	Fix formatting	2017-01-05 13:10:51 +01:00
Ines Montani	2e72683baa	Add missing docstrings	2017-01-05 13:10:21 +01:00
Ines Montani	da10a049a6	Add unicode declarations	2017-01-05 13:09:48 +01:00
Ines Montani	58adae8774	Remove unused file	2017-01-05 13:09:22 +01:00
Ines Montani	c6e5a5349d	Move regression test for #360 into own file	2017-01-04 00:49:31 +01:00
Ines Montani	8279993a6f	Modernize and merge tokenizer tests for punctuation	2017-01-04 00:49:20 +01:00
Ines Montani	550630df73	Update tokenizer tests for contractions	2017-01-04 00:48:42 +01:00
Ines Montani	109f202e8f	Update conftest fixture	2017-01-04 00:48:21 +01:00
Ines Montani	ee6b49b293	Modernize tokenizer tests for emoticons	2017-01-04 00:47:59 +01:00
Ines Montani	f09b5a5dfd	Modernize tokenizer tests for infixes	2017-01-04 00:47:42 +01:00
Ines Montani	59059fed27	Move regression test for #351 to own file	2017-01-04 00:47:11 +01:00
Ines Montani	667051375d	Modernize tokenizer tests for whitespace	2017-01-04 00:46:35 +01:00
Ines Montani	aafc894285	Modernize tokenizer tests for contractions Use @pytest.mark.parametrize.	2017-01-03 23:02:21 +01:00
Ines Montani	1d237664af	Add lowercase lemma to tokenizer exceptions	2017-01-03 23:02:21 +01:00
Ines Montani	84a87951eb	Fix typos	2017-01-03 18:27:43 +01:00

2163 Commits