spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-04-23 18:41:59 +03:00

Author	SHA1	Message	Date
ines	24606d364c	Remove redundant language_data.py files in languages. Originally intended to collect all components of a language, but just made things messy. Now each component is in charge of exporting itself properly.	2017-05-08 15:55:29 +02:00
ines	7f05e977fa	Reorganise French language data	2017-05-08 15:49:05 +02:00
Gregory Howard	f2ab7d77b4	Lazy imports language	2017-05-03 11:01:42 +02:00
Gregory Howard	92f368f83b	Removing extra spaces	2017-04-27 12:02:14 +02:00
Gregory Howard	8ff4682255	correcting tokenizer exception. Adding tests for lemmatization	2017-04-27 11:52:14 +02:00
Gregory Howard	ad8129cb45	Improvement of rules: now title insensitive and have same declaration format	2017-04-27 10:23:56 +02:00
Gregory Howard	ed5f094451	Adding insensitive lemmatisation test	2017-04-25 18:07:02 +02:00
ghoward	55c6910f90	Lookup table for languages in spaCy. Need to find another name for lemmatizerlookup. I was not inspired. Trying to use new files in fr language.	2017-04-24 16:39:00 +02:00
Ben Eyal	d8098a8be2	Use `regex` instead of `re`	2017-04-20 02:22:52 +03:00
ines	66c1f194f9	Use consistent unicode declarations	2017-03-12 13:07:28 +01:00
Matthew Honnibal	bd4375a2e6	Remove comment	2017-02-27 11:44:26 +01:00
Matthew Honnibal	e7e22d8be6	Move import within get_exceptions() function, to speed import	2017-02-27 11:34:48 +01:00
Matthew Honnibal	26446aa728	Avoid loading all French exceptions on import. Move exceptions loading behind a get_tokenizer_exceptions() function for French, instead of loading into the top-level namespace. This cuts import times from 0.6s to 0.2s, at the expense of making the French data a little different from the others (there's no top-level TOKENIZER_EXCEPTIONS variable). The current solution feels somewhat unsatisfying.	2017-02-25 11:55:00 +01:00
ines	0e2e331b58	Convert exceptions to Python list	2017-02-24 18:22:40 +01:00
ines	f08e180a47	Make groups non-capturing. Prevents hitting the 100 named groups limit in Python.	2017-02-10 13:35:02 +01:00
ines	fa3b8512da	Use consistent imports and exports. Bundle everything in language_data to keep it consistent with other languages and make TOKENIZER_EXCEPTIONS importable from there.	2017-02-10 13:34:09 +01:00
ines	21f09d10d7	Revert "Revert "Merge pull request #818 from raphael0202/tokenizer_exceptions"" This reverts commit `f02a2f9322`.	2017-02-10 13:17:05 +01:00
ines	f02a2f9322	Revert "Merge pull request #818 from raphael0202/tokenizer_exceptions" This reverts commit `b95afdf39c`, reversing changes made to `b0ccf32378`.	2017-02-09 17:07:21 +01:00
Raphaël Bournhonesque	5d706ab95d	Merge tokenizer exceptions from PR #802	2017-02-09 16:30:28 +01:00
Raphaël Bournhonesque	85f951ca99	Add tokenizer exceptions for French	2017-02-02 08:36:16 +01:00
Raphaël Bournhonesque	1faaf698ca	Add infixes and abbreviation exceptions (fr)	2017-01-24 10:57:37 +01:00
Raphaël Bournhonesque	cf8474401b	Remove unused import statement	2017-01-24 10:57:37 +01:00
Raphaël Bournhonesque	902f136f18	Add support for elision in French	2017-01-24 10:57:37 +01:00
Ines Montani	0dec90e9f7	Use global abbreviation data languages and remove duplicates	2017-01-08 20:36:00 +01:00
Ines Montani	2b2ea8ca11	Reorganise language data	2016-12-18 16:54:19 +01:00
Ines Montani	e0a7b5c612	Fix formatting	2016-12-17 12:33:09 +01:00
Ines Montani	08162dce67	Move shared functions and constants to global language data	2016-12-17 12:32:48 +01:00
Ines Montani	6a60a61086	Move update_exc to global language data utils	2016-12-17 12:29:02 +01:00
Ines Montani	487ce1e20a	Add encoding declaration	2016-12-17 12:25:44 +01:00
Ines Montani	1b3b043660	Add French stopwords	2016-12-08 20:12:43 +01:00
Ines Montani	8863e504eb	Update French language data	2016-12-08 20:07:14 +01:00
Matthew Honnibal	3d4bd96e8a	Fix infixes in french	2016-11-02 20:41:43 +01:00
Matthew Honnibal	ad1c747c6b	Fix stray POS in language stubs	2016-11-02 20:37:55 +01:00
Matthew Honnibal	6dbf4f7ad7	Stub out support for French, Spanish, Italian and Portuguese	2016-11-02 20:02:41 +01:00

34 Commits