spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-04-13 05:34:15 +03:00

Lucas Abbade be7fdc59d1 Update lex_attrs.py (#2307) * Update lex_attrs.py Fixed spelling mistakes of some numbers (according to Brazilian Portuguese). * Update lex_attrs.py As requested, I've included the correct spelling for both Brazilian Portuguese and Portuguese Portuguese. I will advise however, that the two are separated in the future. Brazilian Portuguese is a very different language from the original one, although most of the writing is unified, the way people talk in both countries is radically different. Keeping both languages as one may lead to bigger issues in the future, especially when it comes to spell checking.	2018-05-09 20:49:31 +02:00
bn	Fix PRON_LEMMA imports	2017-11-06 17:41:53 +01:00
da	Add Danish lemmatizer (#2184)	2018-04-07 19:07:28 +02:00
de	Fix PRON_LEMMA imports	2017-11-06 17:41:53 +01:00
en	quick typo fix	2018-03-24 17:26:35 +01:00
es	Fix Spanish noun_chunks (resolves #2210)	2018-04-18 18:44:01 -04:00
fa	add persian language	2018-01-27 13:27:26 +03:30
fi	Tidy up tokenizer exceptions	2017-11-01 23:02:45 +01:00
fr	Update stop_words.py for French language (#2310)	2018-05-09 12:04:38 +02:00
ga	Remove comma that caused list to wrap in tuple!	2017-10-31 20:13:16 +01:00
he	Don't make copies of language data components	2017-10-11 15:34:55 +02:00
hi	remove no-break spaces from Hindi example (fixes #1750)	2017-12-20 11:35:30 -08:00
hr	Update stop_words.py	2018-03-24 17:31:24 +01:00
hu	Don't copy exception dicts if not necessary and tidy up	2017-10-31 21:05:29 +01:00
id	Find lowercased forms of numeric words	2018-01-08 03:25:08 +01:00
it	Fix syntax error in italian lemmatizer	2018-04-03 23:13:22 +02:00
ja	Port Japanese mecab tokenizer from v1 (#2036)	2018-05-03 18:38:26 +02:00
nb	Copied French syntax iterator to simplify future changes	2018-02-05 14:45:05 +01:00
nl	Find lowercased forms of ordinal words, where possible	2018-01-08 03:28:50 +01:00
pl	Merge pull request #2142 from jimregan/polish-more-tokens	2018-03-24 19:06:44 +01:00
pt	Update lex_attrs.py (#2307)	2018-05-09 20:49:31 +02:00
ro	Add Romanian and Croatian skeletons (experimental)	2017-11-01 23:04:28 +01:00
ru	Add Russian example sentences (see #1107)	2018-02-01 20:09:40 +01:00
sv	fixes #2238 (#2241)	2018-04-28 14:55:22 +02:00
th	Don't copy exception dicts if not necessary and tidy up	2017-10-31 21:05:29 +01:00
tr	Port over Turkish changes	2018-03-24 17:31:07 +01:00
vi	Add support for Vietnamese in spaCy by leveraging Pyvi, an external Vietnamese tokenizer (#2155)	2018-03-29 12:19:51 +02:00
xx	Tidy up language data	2017-10-11 02:22:49 +02:00
zh	add ChineseDefaults class for pickling	2017-12-28 17:13:58 +08:00
__init__.py	Remove imports in /lang/__init__.py	2017-05-08 23:58:07 +02:00
char_classes.py	add ٪ as punctuation	2018-01-23 18:11:33 +03:30
entity_rules.py	Reorganise entity rules	2017-05-09 01:37:10 +02:00
lex_attrs.py	Merge pull request #1891 from fucking-signup/master	2018-02-18 13:47:47 +01:00
norm_exceptions.py	Update base norm exceptions with more unicode characters	2017-10-14 14:58:52 +02:00
punctuation.py	Add symbols class to punctuation rules to handle emoji (see #1088)	2017-05-27 17:57:10 +02:00
tag_map.py	Fix formatting	2017-05-09 11:08:14 +02:00
tokenizer_exceptions.py	Tidy up tokenizer exceptions	2017-11-01 23:02:45 +01:00