spaCy/spacy/lang
Adriane Boyd f94168a41e
Backport bugfixes from v3.1.0 to v3.0 (#8739)
* Fix scoring normalization (#7629)

* fix scoring normalization

* score weights by total sum instead of per component

* cleanup

* more cleanup

* Use a context manager when reading model (fix #7036) (#8244)

* Fix other open calls without context managers (#8245)
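
A minimal sketch of the pattern behind the two fixes above. The helper name `read_bytes` and the file name are hypothetical and not taken from spaCy's code. Opening the file in a `with` block ensures the handle is closed even if reading raises.

```python
from pathlib import Path
import tempfile


def read_bytes(path: Path) -> bytes:
    """Hypothetical helper: read a file and always close the handle."""
    # The with-block closes the file on normal exit and on exceptions.
    with path.open("rb") as file_:
        return file_.read()


# Usage: write a small temporary file, then read it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = Path(tmp_dir) / "model.bin"
    model_path.write_bytes(b"weights")
    data = read_bytes(model_path)
```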

* Don't add duplicate patterns all the time in EntityRuler (fix #8216) (#8246)

* Don't add duplicate patterns (fix #8216)

* Refactor EntityRuler init

This simplifies the EntityRuler init code. This is helpful as prep for
allowing the EntityRuler to reset itself.

* Make EntityRuler.clear reset matchers

Includes a new test for this.

* Tidy PhraseMatcher instantiation

Since the attr can safely be None now, the `if` guard is no longer
required here.

Also renamed the `_validate` attr. Maybe it's not needed?

* Fix NER test

* Add test to make sure patterns aren't increasing

* Move test to regression tests

* Exclude generated .cpp files from package (#8271)

* Fix non-deterministic deduplication in Greek lemmatizer (#8421)
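
One common way to make deduplication deterministic, shown as a sketch rather than the actual change in #8421: `list(set(...))` can return strings in an order that differs between interpreter runs because of hash randomization, whereas `dict.fromkeys` keeps first-seen order. The function name `dedupe_stable` and the sample forms are made up for illustration.

```python
from typing import Iterable, List


def dedupe_stable(forms: Iterable[str]) -> List[str]:
    """Hypothetical helper: drop duplicates, keep first-seen order."""
    # dicts preserve insertion order (Python 3.7+), so the result is stable.
    return list(dict.fromkeys(forms))


candidates = ["βλέπω", "είδα", "βλέπω", "δω", "είδα"]
unique = dedupe_stable(candidates)
```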

* Fix setting empty entities in Example.from_dict (#8426)

* Filter W036 for entity ruler, etc. (#8424)

* Preserve paths.vectors/initialize.vectors setting in quickstart template

* Various fixes for spans in Docs.from_docs (#8487)

* Fix spans offsets if a doc ends in a single space and no space is
  inserted
* Also include spans key in merged doc for empty spans lists

* Fix duplicate spacy package CLI opts (#8551)

Use `-c` for `--code` and not additionally for `--create-meta`, in line
with the docs.

* Raise an error for textcat with <2 labels (#8584)

* Raise an error for textcat with <2 labels

Raise an error if initializing a `textcat` component without at least
two labels.

* Add similar note to docs

* Update positive_label description in API docs
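
The label check described for #8584 could look roughly like the following. This is a sketch under assumptions: `init_textcat_labels` and the error message are hypothetical and do not reproduce spaCy's code or error codes.

```python
from typing import Iterable, List


def init_textcat_labels(labels: Iterable[str]) -> List[str]:
    """Hypothetical initializer: reject label sets with fewer than two labels."""
    unique_labels = list(dict.fromkeys(labels))
    if len(unique_labels) < 2:
        # With fewer than two labels there is nothing to choose between.
        raise ValueError(
            f"textcat needs at least two labels, got {len(unique_labels)}: {unique_labels}"
        )
    return unique_labels


labels = init_textcat_labels(["POSITIVE", "NEGATIVE"])

# A single label should be rejected at initialization time.
try:
    init_textcat_labels(["POSITIVE"])
    single_label_rejected = False
except ValueError:
    single_label_rejected = True
```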

* Add Macedonian models to website (#8637)

* Fix Azerbaijani init, extend lang init tests (#8656)

* Extend langs in initialize tests

* Fix az init

* Fix ru/uk lemmatizer mp with spawn (#8657)

Use an instance variable instead of a class variable for the morphological
analyzer so that multiprocessing with spawn is possible.
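
Why this matters can be sketched as follows. With the spawn start method, a worker process imports modules from scratch, so a class variable assigned at runtime in the parent is back at its default in the worker, while instance state travels with the pickled object. All class names here are hypothetical stand-ins, and resetting the class variable by hand only simulates a fresh import; no real worker process is started.

```python
import pickle


class Analyzer:
    """Hypothetical stand-in for a morphological analyzer."""

    def __init__(self, lang: str):
        self.lang = lang


class ClassLevelLemmatizer:
    _analyzer = None  # shared class variable, assigned at runtime

    def __init__(self, lang: str):
        ClassLevelLemmatizer._analyzer = Analyzer(lang)


class InstanceLevelLemmatizer:
    def __init__(self, lang: str):
        self._analyzer = Analyzer(lang)  # stored on the instance


# Pickle both objects, as multiprocessing does when sending them to workers.
class_payload = pickle.dumps(ClassLevelLemmatizer("ru"))
instance_payload = pickle.dumps(InstanceLevelLemmatizer("ru"))

# Simulate a freshly spawned worker: class variables are at their defaults.
ClassLevelLemmatizer._analyzer = None

restored_class_level = pickle.loads(class_payload)
restored_instance_level = pickle.loads(instance_payload)
```

In this sketch the restored class-level object has lost its analyzer, while the instance-level object still carries one.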

* Use 0-vector for OOV lexemes (#8639)

* Set version to v3.0.7

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>
2021-07-19 09:20:40 +02:00
af Simplify language data and revert detailed configs 2020-07-24 14:50:26 +02:00
am Tidy up and auto-format 2021-02-13 12:55:56 +11:00
ar Simplify language data and revert detailed configs 2020-07-24 14:50:26 +02:00
az Backport bugfixes from v3.1.0 to v3.0 (#8739) 2021-07-19 09:20:40 +02:00
bg Bulgarian tokenizer exceptions (#7114) 2021-02-19 19:19:19 +01:00
bn Implement overwrite param for all custom lemmatizers (#6794) 2021-01-26 14:53:43 +11:00
ca Simplify language data and revert detailed configs 2020-07-24 14:50:26 +02:00
cs Tidy up and auto-format 2021-01-05 13:41:53 +11:00
da Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master-rc3 2021-01-14 11:49:58 +01:00
de Merge branch 'develop' into master-tmp 2020-10-04 14:52:20 +02:00
el Backport bugfixes from v3.1.0 to v3.0 (#8739) 2021-07-19 09:20:40 +02:00
en Fix/fix en ordinals (#8028) 2021-05-07 10:26:42 +02:00
es Tidy up and auto-format 2021-01-30 12:52:33 +11:00
et Simplify language data and revert detailed configs 2020-07-24 14:50:26 +02:00
eu Simplify language data and revert detailed configs 2020-07-24 14:50:26 +02:00
fa Implement overwrite param for all custom lemmatizers (#6794) 2021-01-26 14:53:43 +11:00
fi Simplify language data and revert detailed configs 2020-07-24 14:50:26 +02:00
fr Improvements to French stopwords list (#7941) 2021-06-02 11:50:49 +02:00
ga Simplify language data and revert detailed configs 2020-07-24 14:50:26 +02:00
gu Simplify language data and revert detailed configs 2020-07-24 14:50:26 +02:00
he raise NotImplementedError when noun_chunks iterator is not implemented (#6711) 2021-01-17 19:56:05 +08:00
hi Auto-format [ci skip] 2020-10-15 10:08:53 +02:00
hr Remove tag map 2020-12-09 11:13:49 +11:00
hu Fix Hungarian % tokenization (#6013) 2020-09-02 13:06:16 +02:00
hy Simplify language data and revert detailed configs 2020-07-24 14:50:26 +02:00
id Merge branch 'develop' into master-tmp 2020-10-04 14:52:20 +02:00
is Simplify language data and revert detailed configs 2020-07-24 14:50:26 +02:00
it Added more exceptions to the Italian language from https://forum.wordr… (#7246) 2021-03-30 10:23:32 +02:00
ja Add lexeme norm defaults 2020-09-30 10:20:14 +02:00
kn Simplify language data and revert detailed configs 2020-07-24 14:50:26 +02:00
ko Add lexeme norm defaults 2020-09-30 10:20:14 +02:00
ky Tidy up and auto-format 2021-01-30 12:52:33 +11:00
lb Remove default initialize lookups 2020-10-01 21:54:33 +02:00
lij Simplify language data and revert detailed configs 2020-07-24 14:50:26 +02:00
lt Fix escape sequence 2021-01-30 12:39:58 +11:00
lv Simplify language data and revert detailed configs 2020-07-24 14:50:26 +02:00
mk Tidy up and auto-format 2021-01-30 12:52:33 +11:00
ml Add missing lex_attr_getters (resolves #5806) 2020-07-25 12:55:18 +02:00
mr Simplify language data and revert detailed configs 2020-07-24 14:50:26 +02:00
nb Add / to nb infixes (#7991) 2021-05-04 11:00:10 +02:00
ne Remove unicode declarations and update language data 2020-09-04 13:19:16 +02:00
nl Implement overwrite param for all custom lemmatizers (#6794) 2021-01-26 14:53:43 +11:00
pl Implement overwrite param for all custom lemmatizers (#6794) 2021-01-26 14:53:43 +11:00
pt Tidy up and auto-format 2021-01-15 11:57:36 +11:00
ro Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master-rc3 2021-01-14 11:49:58 +01:00
ru Backport bugfixes from v3.1.0 to v3.0 (#8739) 2021-07-19 09:20:40 +02:00
sa Tidy up and auto-format 2020-09-29 21:39:28 +02:00
si Simplify language data and revert detailed configs 2020-07-24 14:50:26 +02:00
sk Simplify language data and revert detailed configs 2020-07-24 14:50:26 +02:00
sl Simplify language data and revert detailed configs 2020-07-24 14:50:26 +02:00
sq Simplify language data and revert detailed configs 2020-07-24 14:50:26 +02:00
sr Remove default initialize lookups 2020-10-01 21:54:33 +02:00
sv Implement overwrite param for all custom lemmatizers (#6794) 2021-01-26 14:53:43 +11:00
ta Merge branch 'develop' into master-tmp 2020-10-15 09:06:03 +02:00
te Simplify language data and revert detailed configs 2020-07-24 14:50:26 +02:00
th Add Thai tag map (LST20 Corpus) (#6163) 2020-10-07 11:12:01 +02:00
ti Tidy up and auto-format 2021-01-15 11:57:36 +11:00
tl Simplify language data and revert detailed configs 2020-07-24 14:50:26 +02:00
tn Tidy up and auto-format 2021-02-13 12:55:56 +11:00
tr Tidy up and auto-format 2021-01-05 13:41:53 +11:00
tt Simplify language data and revert detailed configs 2020-07-24 14:50:26 +02:00
uk Backport bugfixes from v3.1.0 to v3.0 (#8739) 2021-07-19 09:20:40 +02:00
ur Simplify language data and revert detailed configs 2020-07-24 14:50:26 +02:00
vi Merge pull request #6165 from explosion/feature/update-tokenizers-initialize 2020-10-01 09:49:47 +02:00
xx Simplify language data and revert detailed configs 2020-07-24 14:50:26 +02:00
yo Simplify language data and revert detailed configs 2020-07-24 14:50:26 +02:00
zh Setup / install / quickstart updates 2020-10-23 11:27:54 +02:00
__init__.py Remove imports in /lang/__init__.py 2017-05-08 23:58:07 +02:00
char_classes.py Add all symbols in Unicode Currency Symbols block (#8212) 2021-05-31 18:03:40 +10:00
lex_attrs.py Merge branch 'develop' into master-tmp 2020-09-04 13:15:36 +02:00
norm_exceptions.py Tidy up and auto-format 2020-02-18 15:38:18 +01:00
punctuation.py Simplify language data and revert detailed configs 2020-07-24 14:50:26 +02:00
tokenizer_exceptions.py Merge branch 'develop' into master-tmp 2020-09-04 13:15:36 +02:00