spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-12-05 09:14:22 +03:00

adrianeboyd adc9745718 Modify morphology to support arbitrary features (#4932)		2020-01-23 22:01:54 +01:00

* Restructure tag maps for MorphAnalysis changes

  Prepare tag maps for upcoming MorphAnalysis changes that allow arbitrary features.

* Use default tag map rather than duplicating for ca / uk / vi

* Import tag map into defaults for ga

* Modify tag maps so all morphological fields and features are strings
  * Move features from `"Other"` to the top level
  * Rewrite tuples as strings separated by `","`

* Rewrite morph symbols for fr lemmatizer as strings

* Export MorphAnalysis under spacy.tokens

* Modify morphology to support arbitrary features

  Modify `Morphology` and `MorphAnalysis` so that arbitrary features are supported.

  * Modify `MorphAnalysisC` so that it can support arbitrary features and multiple values per field. `MorphAnalysisC` is redesigned to contain:
    * key: hash of UD FEATS string of morphological features
    * array of `MorphFeatureC` structs that each contain a hash of `Field` and `Field=Value` for a given morphological feature, which makes it possible to:
      * find features by field
      * represent multiple values for a given field
  * `get_field()` is renamed to `get_by_field()` and is no longer `nogil`. Instead a new helper function `get_n_by_field()` is `nogil` and returns `n` features by field.
  * `MorphAnalysis.get()` returns all possible values for a field as a list of individual features such as `["Tense=Pres", "Tense=Past"]`.
  * `MorphAnalysis`'s `str()` and `repr()` are the UD FEATS string.
  * `Morphology.feats_to_dict()` converts a UD FEATS string to a dict where:
    * Each field has one entry in the dict
    * Multiple values remain separated by a separator in the value string
  * `Token.morph_` returns the UD FEATS string and you can set `Token.morph_` with a UD FEATS string or with a tag map dict.

* Modify get_by_field to use np.ndarray

  Modify `get_by_field()` to use np.ndarray. Remove `max_results` from `get_n_by_field()` and always iterate over all the fields.

* Rewrite without MorphFeatureC

* Add shortcut for existing feats strings as keys

  Add shortcut for existing feats strings as keys in `Morphology.add()`.

* Check for '_' as empty analysis when adding morphs

* Extend helper converters in Morphology

  Add and extend helper converters that convert and normalize between:

  * UD FEATS strings (`"Case=dat,gen|Number=sing"`)
  * per-field dict of feats (`{"Case": "dat,gen", "Number": "sing"}`)
  * list of individual features (`["Case=dat", "Case=gen", "Number=sing"]`)

  All converters sort fields and values where applicable.
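
The three representations named in the commit message (UD FEATS string, per-field dict, list of individual features) can be illustrated with a small standalone sketch. This is not spaCy's implementation: the function names, the separator constants, the choice to return `"_"` for an empty analysis, and the plain case-sensitive sort are all assumptions made for illustration.

```python
# Standalone sketch of converting between the three representations.
# All names below are hypothetical and are NOT spaCy's actual API.

FEATURE_SEP = "|"  # separates features in a UD FEATS string
FIELD_SEP = "="    # separates a field from its value(s)
VALUE_SEP = ","    # separates multiple values of one field
EMPTY = "_"        # assumed marker for an empty analysis


def feats_to_dict(feats):
    """'Case=dat,gen|Number=sing' -> {'Case': 'dat,gen', 'Number': 'sing'}"""
    if not feats or feats == EMPTY:
        return {}
    result = {}
    for feature in feats.split(FEATURE_SEP):
        field, values = feature.split(FIELD_SEP, 1)
        # Multiple values stay joined in one string, but sorted.
        result[field] = VALUE_SEP.join(sorted(values.split(VALUE_SEP)))
    # Sort fields so the dict has a normalized key order.
    return dict(sorted(result.items()))


def dict_to_feats(feats_dict):
    """{'Number': 'sing', 'Case': 'gen,dat'} -> 'Case=dat,gen|Number=sing'"""
    if not feats_dict:
        return EMPTY
    parts = []
    for field, values in sorted(feats_dict.items()):
        sorted_values = VALUE_SEP.join(sorted(values.split(VALUE_SEP)))
        parts.append(f"{field}{FIELD_SEP}{sorted_values}")
    return FEATURE_SEP.join(parts)


def feats_to_list(feats):
    """'Case=dat,gen|Number=sing' -> ['Case=dat', 'Case=gen', 'Number=sing']"""
    return [
        f"{field}{FIELD_SEP}{value}"
        for field, values in feats_to_dict(feats).items()
        for value in values.split(VALUE_SEP)
    ]


def list_to_feats(features):
    """['Number=sing', 'Case=gen', 'Case=dat'] -> 'Case=dat,gen|Number=sing'"""
    grouped = {}
    for feature in features:
        field, value = feature.split(FIELD_SEP, 1)
        grouped.setdefault(field, set()).add(value)
    return dict_to_feats(
        {field: VALUE_SEP.join(sorted(values)) for field, values in grouped.items()}
    )
```

Under these assumptions, `list_to_feats(["Number=sing", "Case=gen", "Case=dat"])` yields `"Case=dat,gen|Number=sing"`, and feeding that string to `feats_to_list()` returns the same features in sorted order, so each representation round-trips through the others.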
af	Drop Python 2.7 and 3.5 (#4828)	2019-12-22 01:53:56 +01:00
ar	Drop Python 2.7 and 3.5 (#4828)	2019-12-22 01:53:56 +01:00
bg	Drop Python 2.7 and 3.5 (#4828)	2019-12-22 01:53:56 +01:00
bn	Modify morphology to support arbitrary features (#4932)	2020-01-23 22:01:54 +01:00
ca	Modify morphology to support arbitrary features (#4932)	2020-01-23 22:01:54 +01:00
cs	Drop Python 2.7 and 3.5 (#4828)	2019-12-22 01:53:56 +01:00
da	Modify morphology to support arbitrary features (#4932)	2020-01-23 22:01:54 +01:00
de	Drop Python 2.7 and 3.5 (#4828)	2019-12-22 01:53:56 +01:00
el	Modify morphology to support arbitrary features (#4932)	2020-01-23 22:01:54 +01:00
en	More formatting changes	2019-12-25 17:59:52 +01:00
es	More formatting changes	2019-12-25 17:59:52 +01:00
et	Drop Python 2.7 and 3.5 (#4828)	2019-12-22 01:53:56 +01:00
fa	Drop Python 2.7 and 3.5 (#4828)	2019-12-22 01:53:56 +01:00
fi	Drop Python 2.7 and 3.5 (#4828)	2019-12-22 01:53:56 +01:00
fr	Modify morphology to support arbitrary features (#4932)	2020-01-23 22:01:54 +01:00
ga	Modify morphology to support arbitrary features (#4932)	2020-01-23 22:01:54 +01:00
he	Drop Python 2.7 and 3.5 (#4828)	2019-12-22 01:53:56 +01:00
hi	Drop Python 2.7 and 3.5 (#4828)	2019-12-22 01:53:56 +01:00
hr	Drop Python 2.7 and 3.5 (#4828)	2019-12-22 01:53:56 +01:00
hu	Drop Python 2.7 and 3.5 (#4828)	2019-12-22 01:53:56 +01:00
id	Drop Python 2.7 and 3.5 (#4828)	2019-12-22 01:53:56 +01:00
is	Drop Python 2.7 and 3.5 (#4828)	2019-12-22 01:53:56 +01:00
it	Drop Python 2.7 and 3.5 (#4828)	2019-12-22 01:53:56 +01:00
ja	Drop Python 2.7 and 3.5 (#4828)	2019-12-22 01:53:56 +01:00
kn	Drop Python 2.7 and 3.5 (#4828)	2019-12-22 01:53:56 +01:00
ko	Drop Python 2.7 and 3.5 (#4828)	2019-12-22 01:53:56 +01:00
lb	Drop Python 2.7 and 3.5 (#4828)	2019-12-22 01:53:56 +01:00
lt	Drop Python 2.7 and 3.5 (#4828)	2019-12-22 01:53:56 +01:00
lv	Drop Python 2.7 and 3.5 (#4828)	2019-12-22 01:53:56 +01:00
mr	Drop Python 2.7 and 3.5 (#4828)	2019-12-22 01:53:56 +01:00
nb	Modify morphology to support arbitrary features (#4932)	2020-01-23 22:01:54 +01:00
nl	Drop Python 2.7 and 3.5 (#4828)	2019-12-22 01:53:56 +01:00
pl	Drop Python 2.7 and 3.5 (#4828)	2019-12-22 01:53:56 +01:00
pt	Drop Python 2.7 and 3.5 (#4828)	2019-12-22 01:53:56 +01:00
ro	Drop Python 2.7 and 3.5 (#4828)	2019-12-22 01:53:56 +01:00
ru	Drop Python 2.7 and 3.5 (#4828)	2019-12-22 01:53:56 +01:00
si	Drop Python 2.7 and 3.5 (#4828)	2019-12-22 01:53:56 +01:00
sk	Drop Python 2.7 and 3.5 (#4828)	2019-12-22 01:53:56 +01:00
sl	Drop Python 2.7 and 3.5 (#4828)	2019-12-22 01:53:56 +01:00
sq	Drop Python 2.7 and 3.5 (#4828)	2019-12-22 01:53:56 +01:00
sr	Drop Python 2.7 and 3.5 (#4828)	2019-12-22 01:53:56 +01:00
sv	Modify morphology to support arbitrary features (#4932)	2020-01-23 22:01:54 +01:00
ta	Drop Python 2.7 and 3.5 (#4828)	2019-12-22 01:53:56 +01:00
te	Drop Python 2.7 and 3.5 (#4828)	2019-12-22 01:53:56 +01:00
th	Drop Python 2.7 and 3.5 (#4828)	2019-12-22 01:53:56 +01:00
tl	Drop Python 2.7 and 3.5 (#4828)	2019-12-22 01:53:56 +01:00
tr	Drop Python 2.7 and 3.5 (#4828)	2019-12-22 01:53:56 +01:00
tt	Drop Python 2.7 and 3.5 (#4828)	2019-12-22 01:53:56 +01:00
uk	Modify morphology to support arbitrary features (#4932)	2020-01-23 22:01:54 +01:00
ur	Modify morphology to support arbitrary features (#4932)	2020-01-23 22:01:54 +01:00
vi	Modify morphology to support arbitrary features (#4932)	2020-01-23 22:01:54 +01:00
xx	Drop Python 2.7 and 3.5 (#4828)	2019-12-22 01:53:56 +01:00
yo	Drop Python 2.7 and 3.5 (#4828)	2019-12-22 01:53:56 +01:00
zh	Drop Python 2.7 and 3.5 (#4828)	2019-12-22 01:53:56 +01:00
__init__.py	Remove imports in /lang/__init__.py	2017-05-08 23:58:07 +02:00
char_classes.py	Drop Python 2.7 and 3.5 (#4828)	2019-12-22 01:53:56 +01:00
lex_attrs.py	Drop Python 2.7 and 3.5 (#4828)	2019-12-22 01:53:56 +01:00
norm_exceptions.py	Drop Python 2.7 and 3.5 (#4828)	2019-12-22 01:53:56 +01:00
punctuation.py	Drop Python 2.7 and 3.5 (#4828)	2019-12-22 01:53:56 +01:00
tag_map.py	Drop Python 2.7 and 3.5 (#4828)	2019-12-22 01:53:56 +01:00
tokenizer_exceptions.py	Drop Python 2.7 and 3.5 (#4828)	2019-12-22 01:53:56 +01:00