spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-07-24 07:00:04 +03:00

Brixjohn 52f3c95004 Added alpha support for Tagalog language (#3062)	2018-12-18 13:08:38 +01:00

I have added alpha support for the Tagalog language from the Philippines. It is the basis for the country's national language Filipino. I have heavily based the format to the EN and ES languages. I have provided several words in the lemmatizer lookup table, added stop words from a source, translated numeric words to its Tagalog counterpart, added some tokenizer exceptions, and kept the tag map the same as the English language.

While the alpha language passed the preliminary testing that you provided, I think it needs more data to be useful for most cases.

* Added alpha support for Tagalog language
* Edited contributor template
* Included SCA; Reverted templates
* Fixed SCA template
* Fixed changes in SCA template

ar	Additions to Arabic stop words. (#2422)	2018-06-08 02:33:23 +02:00
bn	update bengali token rules for hyphen and digits (#2731)	2018-09-05 21:49:00 +02:00
ca	Catalan Language Support (#2940)	2018-11-26 15:25:47 +01:00
da	Add Danish lemmatizer (#2184)	2018-04-07 19:07:28 +02:00
de	Also include lowercase norm exceptions	2018-10-13 15:37:30 +02:00
el	Optimize Greek language support (#2658)	2018-08-14 02:31:32 +02:00
en	quick typo fix	2018-03-24 17:26:35 +01:00
es	Fix Spanish noun_chunks (resolves #2210)	2018-04-18 18:44:01 -04:00
fa	Add Persian(Farsi) language support (#2797)	2018-10-13 15:31:49 +02:00
fi	Enhancement/lang fi examples (#2547)	2018-07-15 09:50:27 +02:00
fr	Lemmatization of Adjectives - French : adding rules and vocabulary (#3045)	2018-12-16 18:11:07 +01:00
ga	Remove comma that caused list to wrap in tuple!	2017-10-31 20:13:16 +01:00
he	Don't make copies of language data components	2017-10-11 15:34:55 +02:00
hi	Fix missing comma	2018-10-28 00:09:16 +02:00
hr	Update stop_words.py	2018-03-24 17:31:24 +01:00
hu	Don't copy exception dicts if not necessary and tidy up	2017-10-31 21:05:29 +01:00
id	Update Indonesian model (#2752)	2018-09-14 12:30:32 +02:00
it	Fix syntax error in italian lemmatizer	2018-04-03 23:13:22 +02:00
ja	Add Japanese stop words. (#2549)	2018-07-17 10:12:48 +02:00
nb	Updated wordforms for Norwegian lemmatizer (#3007)	2018-12-06 15:46:18 +01:00
nl	Fix typo [ci skip]	2018-07-24 18:45:40 +02:00
pl	Lex _attrs for polish language (#2750)	2018-09-10 11:53:57 +02:00
pt	Update Portuguese Language (#2790)	2018-09-29 09:51:45 +02:00
ro	Updates to Romanian support (#2354)	2018-05-24 11:40:00 +02:00
ru	Correcting lang/ru/examples.py (#2845)	2018-10-13 15:19:43 +02:00
si	Adding "This is a sentence" example to Sinhala (#2846)	2018-10-14 00:06:40 +02:00
sv	Add abbreviations from UD_Swedish-Talbanken (#2613)	2018-08-07 13:53:17 +02:00
te	Basic support for Telugu language (#2751)	2018-09-10 11:53:18 +02:00
th	Change PyThaiNLP Url (#2876)	2018-10-27 14:46:07 +02:00
tl	Added alpha support for Tagalog language (#3062)	2018-12-18 13:08:38 +01:00
tr	Port over Turkish changes	2018-03-24 17:31:07 +01:00
tt	Add Tatar Language Support (#2444)	2018-06-19 10:17:53 +02:00
ur	Add Urdu Language Support (#2430)	2018-06-22 11:14:03 +02:00
vi	Add support for Vietnamese in spaCy by leveraging Pyvi, an external Vietnamese tokenizer (#2155)	2018-03-29 12:19:51 +02:00
xx	Tidy up language data	2017-10-11 02:22:49 +02:00
zh	Fix Chinese language related bugs (#2634)	2018-08-07 11:26:31 +02:00
__init__.py	Remove imports in /lang/__init__.py	2017-05-08 23:58:07 +02:00
char_classes.py	Adding basic support for Sinhala language. (#2788)	2018-09-25 12:18:25 +02:00
entity_rules.py	Reorganise entity rules	2017-05-09 01:37:10 +02:00
lex_attrs.py	Merge pull request #1891 from fucking-signup/master	2018-02-18 13:47:47 +01:00
norm_exceptions.py	Update base norm exceptions with more unicode characters	2017-10-14 14:58:52 +02:00
punctuation.py	Add symbols class to punctuation rules to handle emoji (see #1088)	2017-05-27 17:57:10 +02:00
tag_map.py	Fix formatting	2017-05-09 11:08:14 +02:00
tokenizer_exceptions.py	Tidy up tokenizer exceptions	2017-11-01 23:02:45 +01:00