spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-01-30 11:14:08 +03:00

Author	SHA1	Message	Date
Mathias Deschamps	c0691b2ab4	Add tokenizer exceptions for ing verbs. Extend list of tokenizing exceptions introduced in `123810b`.	2017-11-13 17:46:05 +01:00
Mathias Deschamps	288298ead9	Add norm exception for ing verbs. Some ing verbs are sometimes written with in or in'. Make the NORM form correct.	2017-11-13 17:46:05 +01:00
Abhinav Sharma	59f5740ede	improved upon the list of included stop_words	2017-11-13 17:13:49 +05:30
ines	123810b6de	Add "lovin'" to tokenizer exceptions (see #1248)	2017-11-09 17:09:30 +01:00
Ines Montani	42b241ccd0	Update language code in usage example in comment	2017-11-08 11:36:38 +01:00
Abhinav Sharma	84edade82d	Create examples.py. Populated the file with the translations of English example sentences.	2017-11-08 13:23:08 +05:30
ines	bcf42b8846	Fix typo	2017-11-08 01:06:37 +01:00
ines	acb9bdb852	Fix PRON_LEMMA imports	2017-11-06 17:41:53 +01:00
ines	baa231745c	Fix Dutch tag map	2017-11-05 21:41:50 +01:00
ines	507ecb67af	Fix Spanish tag map	2017-11-05 19:23:34 +01:00
ines	975e1042ff	Fix Italian tag map	2017-11-05 18:34:09 +01:00
ines	6b2d6e4937	Fix Portuguese tag map	2017-11-05 18:31:00 +01:00
ines	fa2687fded	Fix Dutch tag map	2017-11-05 17:57:59 +01:00
ines	fb8990d916	Fix Spanish tag map	2017-11-05 17:48:46 +01:00
ines	9d13288f73	Fix French tag map	2017-11-05 17:47:59 +01:00
ines	54579805c5	Fix French tag map	2017-11-05 17:44:05 +01:00
Matthew Honnibal	0d4bd6414e	Fix Italian tag map	2017-11-05 14:11:03 +01:00
ines	ef597622a6	Add Portuguese tag map	2017-11-05 13:58:34 +01:00
ines	793c62dfda	Add Dutch tag map	2017-11-05 13:48:07 +01:00
ines	f7485a09c8	Fix Italian tag map	2017-11-05 13:12:58 +01:00
ines	3cef901834	Add tag map for French and Italian	2017-11-04 23:32:51 +01:00
ines	6c15aafebd	Fix formatting	2017-11-04 23:07:02 +01:00
ines	9baab241b4	Add skeleton language data for Turkish	2017-11-02 16:32:24 +01:00
ines	c6fea3e5f6	Add Romanian and Croatian skeletons (experimental). Add language data templates to make it easier for others to contribute to the language support.	2017-11-01 23:04:28 +01:00
ines	18c859500b	Add missing imports	2017-11-01 23:02:51 +01:00
ines	819e30a26e	Tidy up tokenizer exceptions	2017-11-01 23:02:45 +01:00
ines	9659391944	Update deprecated methods and add warnings	2017-11-01 16:49:42 +01:00
Ines Montani	d11659463b	Merge pull request #1152 from jimregan/develop-irish: [WIP] attempt a port from #1147	2017-11-01 00:23:43 +01:00
ines	7e424a1804	Don't copy exception dicts if not necessary and tidy up	2017-10-31 21:05:29 +01:00
Ines Montani	06c25a8882	Remove comma that caused list to wrap in tuple! Also removed extra dict wrappings for performance (we used to have them in there, but they should only really exist if copying the dict is absolutely necessary)	2017-10-31 20:13:16 +01:00
Ines Montani	147448b65b	Add missing symbols	2017-10-31 19:34:45 +01:00
Ines Montani	9b0de9fb43	Fix import of symbols (now nested one level lower)	2017-10-31 19:17:58 +01:00
Jim O'Regan	41dd29e48e	merge	2017-10-31 14:07:45 +00:00
Ines Montani	090bd00369	Merge pull request #1464 from mayukh18/develop_bengali_pronouns: added the Bengali pronouns for v2.0	2017-10-25 21:55:25 +02:00
mayukh18	1bc07758fa	added few bengali pronouns	2017-10-25 22:24:40 +05:30
Ines Montani	d3bf488e16	Merge pull request #1171 from mollerhoj/support-danish: Improve basic support for Danish	2017-10-24 20:29:57 +02:00
Matthew Honnibal	66766c1454	Restore SP tag to English tag_map, until models migrate	2017-10-24 17:05:00 +02:00
ines	c55db0a4a1	Add example sentences for Japanese and Chinese (see #1107)	2017-10-24 13:02:24 +02:00
ines	66f8f9d4a0	Fix Japanese tokenizer. JapaneseTokenizer now returns a Doc, not individual words.	2017-10-24 13:02:19 +02:00
Ines Montani	facf77e541	Merge branch 'develop' into support-danish	2017-10-24 11:53:19 +02:00
Matthew Honnibal	49895fbef6	Rename 'SP' special tag to '_SP'. Renaming the tag with an underscore lets us add it to the tag map without worrying that we'll change the sequence of tags, which throws off the tag-to-ID mapping. For instance, if we inserted an 'SP' tag, the "VERB" tag is pushed to a different class ID, and the model is all messed up.	2017-10-20 14:01:12 +02:00
Ines Montani	f0d577e460	Merge pull request #1425 from explosion/feature/hindi-tokenizer: 💫 Basic Hindi tokenization support	2017-10-18 13:34:52 +02:00
Matthew Honnibal	839de87ca9	Make lambda func a named function, for pickling	2017-10-17 18:21:20 +02:00
Matthew Honnibal	9ce7d6af87	Make lex attr functions top-level functions, to promote pickling	2017-10-17 18:19:18 +02:00
Ines Montani	aab299c8ae	Merge pull request #1429 from vishnunekkanti/develop: fix syntax error in zh	2017-10-17 14:45:02 +02:00
ines	485c4f6df5	Add Hungarian examples (see #1107)	2017-10-17 02:37:45 +02:00
Vishnu Kumar Nekkanti	d3c54cf39a	fixed SyntaxError while checking for jieba	2017-10-16 18:51:33 +05:30
ines	266e7180a7	Add Language class, stop words and basic stemmer that sets NORM	2017-10-14 14:59:52 +02:00
ines	e85e1d571b	Update base punctuation	2017-10-14 14:59:23 +02:00
ines	9d6c8eaa49	Update base norm exceptions with more unicode characters, e.g. unicode variations of punctuation used in Chinese	2017-10-14 14:58:52 +02:00
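
The reasoning in commit 49895fbef6 (renaming 'SP' to '_SP') can be illustrated with a small sketch. It assumes, purely for illustration, that class IDs are assigned by the sorted position of the tag names; the log does not confirm how spaCy actually builds the mapping. The helper `tag_ids` and the four-tag list are hypothetical and not part of spaCy.

```python
def tag_ids(tags):
    """Map each tag to a class ID by its sorted position.

    Assumption for illustration only: IDs follow sorted tag order.
    """
    return {tag: i for i, tag in enumerate(sorted(tags))}


# Hypothetical, shortened tag set standing in for a real tag map.
base = ["ADJ", "NOUN", "PUNCT", "VERB"]

before = tag_ids(base)
with_sp = tag_ids(base + ["SP"])              # new tag sorts before "VERB"
with_underscore_sp = tag_ids(base + ["_SP"])  # "_" sorts after "A".."Z"

# Tags whose class ID changed after adding the new tag.
shifted_by_sp = [t for t in base if before[t] != with_sp[t]]
shifted_by_underscore = [t for t in base if before[t] != with_underscore_sp[t]]

print(shifted_by_sp)
print(shifted_by_underscore)
```

Under that assumption, 'SP' sorts before 'VERB' and shifts its ID, while '_SP' sorts after the uppercase tags and leaves the existing IDs unchanged.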

210 Commits