spaCy

mirror of https://github.com/explosion/spaCy.git synced 2026-01-10 02:31:16 +03:00


Duygu Altinok 0e55f806dd Turkish tokenization improvements (#6268)	2020-10-29 09:43:17 +01:00
* added single and paired orth variants
* added token match
* added long text tokenization test
* inverted init
* normalized lemmas to lowercase
* more abbrevs
* tests for ordinals and abbrevs
* separated period abbvrevs to another list
* fiex typo
* added ordinal and abbrev tests
* added number tests for dates
* minor refinement
* added inflected abbrevs regex
* added percentage and inflection
* cosmetics
* added token match
* added url inflection tests
* excluded url tokens from custom pattern
* removed url match import
ar	Revert #4334	2019-09-29 17:32:12 +02:00
bn	Revert #4334	2019-09-29 17:32:12 +02:00
ca	Revert #4334	2019-09-29 17:32:12 +02:00
cs	Adding num_like test for Czech (#5946)	2020-08-21 17:06:33 +02:00
da	Reduce stored lexemes data, move feats to lookups (#5238)	2020-05-19 15:59:14 +02:00
de	Tidy up and auto-format	2020-05-21 14:14:01 +02:00
el	Tidy up and auto-format	2020-05-21 14:14:01 +02:00
en	English: adds ordinal numbers (#5830)	2020-07-29 20:22:47 +02:00
es	Tidy up and auto-format	2020-05-21 14:14:01 +02:00
eu	Add __init__.py to eu and hy tests (#5278)	2020-04-08 20:03:06 +02:00
fa	Fix syntax iterators for Persian (#5437)	2020-05-14 16:51:03 +02:00
fi	add two abbreviations and some additional unit tests (#5040)	2020-02-22 14:12:32 +01:00
fr	Tidy up and auto-format	2020-05-21 14:14:01 +02:00
ga	Revert #4334	2019-09-29 17:32:12 +02:00
gu	Tidy up and auto-format	2020-05-21 14:14:01 +02:00
he	Hebrew like num (#5952)	2020-08-24 14:30:05 +02:00
hi	Hindi: Adds tests for lexical attributes (norm and like_num) (#5829)	2020-10-07 10:23:32 +02:00
hu	Tidy up and auto-format	2020-03-25 12:28:12 +01:00
hy	Add missing declaration	2020-05-21 17:30:05 +02:00
id	Tidy up and auto-format	2020-05-21 14:14:01 +02:00
it	Revert #4334	2019-09-29 17:32:12 +02:00
ja	Revert "Convert custom user_data to token extension format for Japanese tokenizer (#5652)" (#5665)	2020-06-29 14:34:15 +02:00
ko	Revert #4334	2019-09-29 17:32:12 +02:00
lb	Reduce stored lexemes data, move feats to lookups (#5238)	2020-05-19 15:59:14 +02:00
lt	Improve Lithuanian tokenization (#5205)	2020-03-25 11:28:12 +01:00
mk	Include Macedonian language (#6230)	2020-10-15 15:55:01 +02:00
ml	Tidy up and auto-format	2020-05-21 14:14:01 +02:00
nb	Tidy up and auto-format	2020-05-21 14:14:01 +02:00
ne	Add Nepali Language (#5622)	2020-06-22 10:25:46 +02:00
nl	Move lookup tables out of the core library (#4346)	2019-10-01 00:01:27 +02:00
pl	Update Polish tokenizer for UD_Polish-PDB (#5432)	2020-05-19 15:59:55 +02:00
pt	Revert #4334	2019-09-29 17:32:12 +02:00
ro	Move lookup tables out of the core library (#4346)	2019-10-01 00:01:27 +02:00
ru	Revert #4334	2019-09-29 17:32:12 +02:00
sa	Added support for Sanskrit language (#5956)	2020-08-25 10:56:29 +02:00
sr	Move lookup tables out of the core library (#4346)	2019-10-01 00:01:27 +02:00
sv	Tidy up and auto-format	2020-05-21 14:14:01 +02:00
th	Revert #4334	2019-09-29 17:32:12 +02:00
tr	Turkish tokenization improvements (#6268)	2020-10-29 09:43:17 +01:00
tt	Add trailing whitespace to multiline test text (#4877)	2020-01-06 14:58:59 +01:00
uk	Revert #4334	2019-09-29 17:32:12 +02:00
ur	Revert #4334	2019-09-29 17:32:12 +02:00
yo	Adding support for Yoruba Language (#4614)	2019-12-21 14:11:50 +01:00
zh	Tidy up and auto-format	2020-05-21 14:14:01 +02:00
__init__.py	Revert #4334	2019-09-29 17:32:12 +02:00
test_attrs.py	Tidy up and auto-format	2019-12-21 19:04:17 +01:00
test_initialize.py	Adding support for Yoruba Language (#4614)	2019-12-21 14:11:50 +01:00