spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-12-09 11:14:21 +03:00

Adriane Boyd f94168a41e Backport bugfixes from v3.1.0 to v3.0 (#8739)	2021-07-19 09:20:40 +02:00

* Fix scoring normalization (#7629)
* fix scoring normalization
* score weights by total sum instead of per component
* cleanup
* more cleanup
* Use a context manager when reading model (fix #7036) (#8244)
* Fix other open calls without context managers (#8245)
* Don't add duplicate patterns all the time in EntityRuler (fix #8216) (#8246)
* Don't add duplicate patterns (fix #8216)
* Refactor EntityRuler init
  This simplifies the EntityRuler init code. This is helpful as prep for allowing the EntityRuler to reset itself.
* Make EntityRuler.clear reset matchers
  Includes a new test for this.
* Tidy PhraseMatcher instantiation
  Since the attr can be None safely now, the guard `if` is no longer required here. Also renamed the `_validate` attr. Maybe it's not needed?
* Fix NER test
* Add test to make sure patterns aren't increasing
* Move test to regression tests
* Exclude generated .cpp files from package (#8271)
* Fix non-deterministic deduplication in Greek lemmatizer (#8421)
* Fix setting empty entities in Example.from_dict (#8426)
* Filter W036 for entity ruler, etc. (#8424)
* Preserve paths.vectors/initialize.vectors setting in quickstart template
* Various fixes for spans in Doc.from_docs (#8487)
* Fix spans offsets if a doc ends in a single space and no space is inserted
* Also include spans key in merged doc for empty spans lists
* Fix duplicate spacy package CLI opts (#8551)
  Use `-c` for `--code` and not additionally for `--create-meta`, in line with the docs.
* Raise an error for textcat with <2 labels (#8584)
* Raise an error for textcat with <2 labels
  Raise an error if initializing a `textcat` component without at least two labels.
* Add similar note to docs
* Update positive_label description in API docs
* Add Macedonian models to website (#8637)
* Fix Azerbaijani init, extend lang init tests (#8656)
* Extend langs in initialize tests
* Fix az init
* Fix ru/uk lemmatizer mp with spawn (#8657)
  Use an instance variable instead of a class variable for the morphological analyzer so that multiprocessing with spawn is possible.
* Use 0-vector for OOV lexemes (#8639)
* Set version to v3.0.7

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>
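The "score weights by total sum instead of per component" item from #7629 is easiest to see with numbers. This is a minimal sketch under stated assumptions, not spaCy's code: it assumes each pipeline component contributes a dict of score weights, that `None` marks a score that is reported but not weighted, and that results are rounded to two decimals. The helper name `combine_weights` is hypothetical.

```python
from typing import Dict, List, Optional

Weights = Dict[str, Optional[float]]


def combine_weights(per_component: List[Weights]) -> Weights:
    """Merge per-component score weights and normalize by one overall sum.

    Hypothetical helper for illustration only; not spaCy's implementation.
    A weight of None means "show the score but don't count it".
    """
    merged: Weights = {}
    for weights in per_component:
        merged.update(weights)
    # A single denominator for the whole pipeline, not one per component.
    total = sum(w for w in merged.values() if w)
    return {
        # Zero and None weights are passed through unchanged.
        key: (round(w / total, 2) if w and total > 0 else w)
        for key, w in merged.items()
    }


if __name__ == "__main__":
    ner = {"ents_f": 1.0, "ents_p": 0.0, "ents_r": 0.0}
    tagger = {"tag_acc": 1.0, "pos_acc": None}
    print(combine_weights([ner, tagger]))
```

With one shared denominator, `ents_f` and `tag_acc` each get 0.5 and the combined weights sum to 1.0. Normalizing each component's dict separately would give both 1.0, so the combined weights would sum to 2.0 instead.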
af	Simplify language data and revert detailed configs	2020-07-24 14:50:26 +02:00
am	Tidy up and auto-format	2021-02-13 12:55:56 +11:00
ar	Simplify language data and revert detailed configs	2020-07-24 14:50:26 +02:00
az	Backport bugfixes from v3.1.0 to v3.0 (#8739)	2021-07-19 09:20:40 +02:00
bg	Bulgarian tokenizer exceptions (#7114)	2021-02-19 19:19:19 +01:00
bn	Implement overwrite param for all custom lemmatizers (#6794)	2021-01-26 14:53:43 +11:00
ca	Simplify language data and revert detailed configs	2020-07-24 14:50:26 +02:00
cs	Tidy up and auto-format	2021-01-05 13:41:53 +11:00
da	Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master-rc3	2021-01-14 11:49:58 +01:00
de	Merge branch 'develop' into master-tmp	2020-10-04 14:52:20 +02:00
el	Backport bugfixes from v3.1.0 to v3.0 (#8739)	2021-07-19 09:20:40 +02:00
en	Fix/fix en ordinals (#8028)	2021-05-07 10:26:42 +02:00
es	Tidy up and auto-format	2021-01-30 12:52:33 +11:00
et	Simplify language data and revert detailed configs	2020-07-24 14:50:26 +02:00
eu	Simplify language data and revert detailed configs	2020-07-24 14:50:26 +02:00
fa	Implement overwrite param for all custom lemmatizers (#6794)	2021-01-26 14:53:43 +11:00
fi	Simplify language data and revert detailed configs	2020-07-24 14:50:26 +02:00
fr	Improvements to French stopwords list (#7941)	2021-06-02 11:50:49 +02:00
ga	Simplify language data and revert detailed configs	2020-07-24 14:50:26 +02:00
gu	Simplify language data and revert detailed configs	2020-07-24 14:50:26 +02:00
he	raise NotImplementedError when noun_chunks iterator is not implemented (#6711)	2021-01-17 19:56:05 +08:00
hi	Auto-format [ci skip]	2020-10-15 10:08:53 +02:00
hr	Remove tag map	2020-12-09 11:13:49 +11:00
hu	Fix Hungarian % tokenization (#6013)	2020-09-02 13:06:16 +02:00
hy	Simplify language data and revert detailed configs	2020-07-24 14:50:26 +02:00
id	Merge branch 'develop' into master-tmp	2020-10-04 14:52:20 +02:00
is	Simplify language data and revert detailed configs	2020-07-24 14:50:26 +02:00
it	Added more exceptions to the Italian language from https://forum.wordr … (#7246)	2021-03-30 10:23:32 +02:00
ja	Add lexeme norm defaults	2020-09-30 10:20:14 +02:00
kn	Simplify language data and revert detailed configs	2020-07-24 14:50:26 +02:00
ko	Add lexeme norm defaults	2020-09-30 10:20:14 +02:00
ky	Tidy up and auto-format	2021-01-30 12:52:33 +11:00
lb	Remove default initialize lookups	2020-10-01 21:54:33 +02:00
lij	Simplify language data and revert detailed configs	2020-07-24 14:50:26 +02:00
lt	Fix escape sequence	2021-01-30 12:39:58 +11:00
lv	Simplify language data and revert detailed configs	2020-07-24 14:50:26 +02:00
mk	Tidy up and auto-format	2021-01-30 12:52:33 +11:00
ml	Add missing lex_attr_getters (resolves #5806)	2020-07-25 12:55:18 +02:00
mr	Simplify language data and revert detailed configs	2020-07-24 14:50:26 +02:00
nb	Add / to nb infixes (#7991)	2021-05-04 11:00:10 +02:00
ne	Remove unicode declarations and update language data	2020-09-04 13:19:16 +02:00
nl	Implement overwrite param for all custom lemmatizers (#6794)	2021-01-26 14:53:43 +11:00
pl	Implement overwrite param for all custom lemmatizers (#6794)	2021-01-26 14:53:43 +11:00
pt	Tidy up and auto-format	2021-01-15 11:57:36 +11:00
ro	Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master-rc3	2021-01-14 11:49:58 +01:00
ru	Backport bugfixes from v3.1.0 to v3.0 (#8739)	2021-07-19 09:20:40 +02:00
sa	Tidy up and auto-format	2020-09-29 21:39:28 +02:00
si	Simplify language data and revert detailed configs	2020-07-24 14:50:26 +02:00
sk	Simplify language data and revert detailed configs	2020-07-24 14:50:26 +02:00
sl	Simplify language data and revert detailed configs	2020-07-24 14:50:26 +02:00
sq	Simplify language data and revert detailed configs	2020-07-24 14:50:26 +02:00
sr	Remove default initialize lookups	2020-10-01 21:54:33 +02:00
sv	Implement overwrite param for all custom lemmatizers (#6794)	2021-01-26 14:53:43 +11:00
ta	Merge branch 'develop' into master-tmp	2020-10-15 09:06:03 +02:00
te	Simplify language data and revert detailed configs	2020-07-24 14:50:26 +02:00
th	Add Thai tag map (LST20 Corpus) (#6163)	2020-10-07 11:12:01 +02:00
ti	Tidy up and auto-format	2021-01-15 11:57:36 +11:00
tl	Simplify language data and revert detailed configs	2020-07-24 14:50:26 +02:00
tn	Tidy up and auto-format	2021-02-13 12:55:56 +11:00
tr	Tidy up and auto-format	2021-01-05 13:41:53 +11:00
tt	Simplify language data and revert detailed configs	2020-07-24 14:50:26 +02:00
uk	Backport bugfixes from v3.1.0 to v3.0 (#8739)	2021-07-19 09:20:40 +02:00
ur	Simplify language data and revert detailed configs	2020-07-24 14:50:26 +02:00
vi	Merge pull request #6165 from explosion/feature/update-tokenizers-initialize	2020-10-01 09:49:47 +02:00
xx	Simplify language data and revert detailed configs	2020-07-24 14:50:26 +02:00
yo	Simplify language data and revert detailed configs	2020-07-24 14:50:26 +02:00
zh	Setup / install / quickstart updates	2020-10-23 11:27:54 +02:00
__init__.py	Remove imports in /lang/__init__.py	2017-05-08 23:58:07 +02:00
char_classes.py	Add all symbols in Unicode Currency Symbols block (#8212)	2021-05-31 18:03:40 +10:00
lex_attrs.py	Merge branch 'develop' into master-tmp	2020-09-04 13:15:36 +02:00
norm_exceptions.py	Tidy up and auto-format	2020-02-18 15:38:18 +01:00
punctuation.py	Simplify language data and revert detailed configs	2020-07-24 14:50:26 +02:00
tokenizer_exceptions.py	Merge branch 'develop' into master-tmp	2020-09-04 13:15:36 +02:00