spaCy

mirror of https://github.com/explosion/spaCy.git synced 2026-01-25 01:33:59 +03:00

Jani Monoses 42b34832e4 Update Romanian stopword list (#2316)	2018-05-10 12:16:56 +02:00
* Contributor agreement for janimo
* Update Romanian stopword list
Include the correct spellings of all the words already in the repo that are using cedillas (ş and ţ) instead of commas (ș and ț). Add another unrelated spelling fix.
See https://github.com/stopwords-iso/stopwords-ro/pull/1 and https://github.com/stopwords-iso/stopwords-ro/pull/2
bn	Fix PRON_LEMMA imports	2017-11-06 17:41:53 +01:00
da	Add Danish lemmatizer (#2184)	2018-04-07 19:07:28 +02:00
de	Fix PRON_LEMMA imports	2017-11-06 17:41:53 +01:00
en	quick typo fix	2018-03-24 17:26:35 +01:00
es	Fix Spanish noun_chunks (resolves #2210)	2018-04-18 18:44:01 -04:00
fa	add persian language	2018-01-27 13:27:26 +03:30
fi	Tidy up tokenizer exceptions	2017-11-01 23:02:45 +01:00
fr	Update stop_words.py for French language (#2310)	2018-05-09 12:04:38 +02:00
ga	Remove comma that caused list to wrap in tuple!	2017-10-31 20:13:16 +01:00
he	Don't make copies of language data components	2017-10-11 15:34:55 +02:00
hi	remove no-break spaces from Hindi example (fixes #1750)	2017-12-20 11:35:30 -08:00
hr	Update stop_words.py	2018-03-24 17:31:24 +01:00
hu	Don't copy exception dicts if not necessary and tidy up	2017-10-31 21:05:29 +01:00
id	Find lowercased forms of numeric words	2018-01-08 03:25:08 +01:00
it	Fix syntax error in italian lemmatizer	2018-04-03 23:13:22 +02:00
ja	Port Japanese mecab tokenizer from v1 (#2036)	2018-05-03 18:38:26 +02:00
nb	Copied French syntax iterator to simplify future changes	2018-02-05 14:45:05 +01:00
nl	Find lowercased forms of ordinal words, where possible	2018-01-08 03:28:50 +01:00
pl	Merge pull request #2142 from jimregan/polish-more-tokens	2018-03-24 19:06:44 +01:00
pt	Update lex_attrs.py (#2307)	2018-05-09 20:49:31 +02:00
ro	Update Romanian stopword list (#2316)	2018-05-10 12:16:56 +02:00
ru	Add Russian example sentences (see #1107)	2018-02-01 20:09:40 +01:00
sv	fixes #2238 (#2241)	2018-04-28 14:55:22 +02:00
th	Don't copy exception dicts if not necessary and tidy up	2017-10-31 21:05:29 +01:00
tr	Port over Turkish changes	2018-03-24 17:31:07 +01:00
vi	Add support for Vietnamese in spaCy by leveraging Pyvi, an external Vietnamese tokenizer (#2155)	2018-03-29 12:19:51 +02:00
xx	Tidy up language data	2017-10-11 02:22:49 +02:00
zh	add ChineseDefaults class for pickling	2017-12-28 17:13:58 +08:00
__init__.py	Remove imports in /lang/__init__.py	2017-05-08 23:58:07 +02:00
char_classes.py	add ٪ as punctuation	2018-01-23 18:11:33 +03:30
entity_rules.py	Reorganise entity rules	2017-05-09 01:37:10 +02:00
lex_attrs.py	Merge pull request #1891 from fucking-signup/master	2018-02-18 13:47:47 +01:00
norm_exceptions.py	Update base norm exceptions with more unicode characters	2017-10-14 14:58:52 +02:00
punctuation.py	Add symbols class to punctuation rules to handle emoji (see #1088)	2017-05-27 17:57:10 +02:00
tag_map.py	Fix formatting	2017-05-09 11:08:14 +02:00
tokenizer_exceptions.py	Tidy up tokenizer exceptions	2017-11-01 23:02:45 +01:00