spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-07-13 01:32:32 +03:00

Author	SHA1	Message	Date
Adriane Boyd	55614d6799	Add profile=False to currently unprofiled cython	2023-09-28 17:09:41 +02:00
Basile Dura	b0228d8ea6	ci: add cython linter (#12694) * chore: add cython-linter dev dependency * fix: lexeme.pyx * fix: morphology.pxd * fix: tokenizer.pxd * fix: vocab.pxd * fix: morphology.pxd (line length) * ci: add cython-lint * ci: fix cython-lint call * Fix kb/candidate.pyx. * Fix kb/kb.pyx. * Fix kb/kb_in_memory.pyx. * Fix kb. * Fix training/ partially. * Fix training/. Ignore trailing whitespaces and too long lines. * Fix ml/. * Fix matcher/. * Fix pipeline/. * Fix tokens/. * Fix build errors. Fix vocab.pyx. * Fix cython-lint install and run. * Fix lexeme.pyx, parts_of_speech.pxd, vectors.pyx. Temporarily disable cython-lint execution. * Fix attrs.pyx, lexeme.pyx, symbols.pxd, isort issues. * Make cython-lint install conditional. Fix tokenizer.pyx. * Fix remaining files. Reenable cython-lint check. * Readded parentheses. * Fix test_build_dependencies(). * Add explanatory comment to cython-lint execution. --------- Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com>	2023-07-19 12:03:31 +02:00
Daniël de Kok	e2b70df012	Configure isort to use the Black profile, recursively isort the `spacy` module (#12721) * Use isort with Black profile * isort all the things * Fix import cycles as a result of import sorting * Add DOCBIN_ALL_ATTRS type definition * Add isort to requirements * Remove isort from build dependencies check * Typo	2023-06-14 17:48:41 +02:00
Adriane Boyd	b60b027927	Add default option to MorphAnalysis.get (#12545) * Add default to MorphAnalysis.get Similar to `dict`, allow a `default` option for `MorphAnalysis.get` for the user to provide a default return value if the field is not found. The default return value remains `[]`, which is not the same as `dict.get`, but is already established as this method's default return value with the return type `List[str]`. However the new `default` option does not enforce that the user-provided default is actually `List[str]`. * Restore test case	2023-04-20 14:06:32 +02:00
Adriane Boyd	fdb8815ef5	Minor refactor for Morphology and MorphAnalysis (#5804) * `MorphAnalysis.get` returns only the field values * Move `_normalize_props` inside `Morphology` as `Morphology.normalize_attrs` and simplify * Simplify POS field detection/conversion * Convert all non-POS features to strings * `Morphology` returns an empty string for a missing morph to align with the FEATS string returned for an existing morph * Remove unused `list_to_feats`	2020-07-24 09:28:06 +02:00
Sofie Van Landeghem	8d3c0306e1	refactor fixes (#5664) * fixes in ud_train, UX for morphs * update pyproject with new version of thinc * fixes in debug_data script * cleanup of old unused error messages * remove obsolete TempErrors * move error messages to errors.py * add ENT_KB_ID to default DocBin serialization * few fixes to simple_ner * fix tags	2020-06-29 14:33:00 +02:00
svlandeg	dc069e90b3	fix token.morph_ for v.3 (cf PR #5517)	2020-06-20 21:13:11 +02:00
Ines Montani	52728d8fa3	Merge branch 'develop' into master-tmp	2020-06-20 15:52:00 +02:00
Adriane Boyd	e1b7cbd197	Remove MorphAnalysis __str__ and __repr__	2020-05-29 14:33:47 +02:00
adrianeboyd	adc9745718	Modify morphology to support arbitrary features (#4932) * Restructure tag maps for MorphAnalysis changes Prepare tag maps for upcoming MorphAnalysis changes that allow arbitrary features. * Use default tag map rather than duplicating for ca / uk / vi * Import tag map into defaults for ga * Modify tag maps so all morphological fields and features are strings * Move features from `"Other"` to the top level * Rewrite tuples as strings separated by `","` * Rewrite morph symbols for fr lemmatizer as strings * Export MorphAnalysis under spacy.tokens * Modify morphology to support arbitrary features Modify `Morphology` and `MorphAnalysis` so that arbitrary features are supported. * Modify `MorphAnalysisC` so that it can support arbitrary features and multiple values per field. `MorphAnalysisC` is redesigned to contain: * key: hash of UD FEATS string of morphological features * array of `MorphFeatureC` structs that each contain a hash of `Field` and `Field=Value` for a given morphological feature, which makes it possible to: * find features by field * represent multiple values for a given field * `get_field()` is renamed to `get_by_field()` and is no longer `nogil`. Instead a new helper function `get_n_by_field()` is `nogil` and returns `n` features by field. * `MorphAnalysis.get()` returns all possible values for a field as a list of individual features such as `["Tense=Pres", "Tense=Past"]`. * `MorphAnalysis`'s `str()` and `repr()` are the UD FEATS string. * `Morphology.feats_to_dict()` converts a UD FEATS string to a dict where: * Each field has one entry in the dict * Multiple values remain separated by a separator in the value string * `Token.morph_` returns the UD FEATS string and you can set `Token.morph_` with a UD FEATS string or with a tag map dict. * Modify get_by_field to use np.ndarray Modify `get_by_field()` to use np.ndarray. Remove `max_results` from `get_n_by_field()` and always iterate over all the fields. * Rewrite without MorphFeatureC * Add shortcut for existing feats strings as keys Add shortcut for existing feats strings as keys in `Morphology.add()`. * Check for '_' as empty analysis when adding morphs * Extend helper converters in Morphology Add and extend helper converters that convert and normalize between: * UD FEATS strings (`"Case=dat,gen|Number=sing"`) * per-field dict of feats (`{"Case": "dat,gen", "Number": "sing"}`) * list of individual features (`["Case=dat", "Case=gen", "Number=sing"]`) All converters sort fields and values where applicable.	2020-01-23 22:01:54 +01:00
Matthew Honnibal	7d782aa97b	Add more docstrings for MorphAnalysis	2019-09-12 16:48:30 +02:00
Matthew Honnibal	41a3016019	Refactor morphologizer class map	2019-03-09 20:55:33 +01:00
Matthew Honnibal	9dceb97570	Extend morphanalysis API	2019-03-08 01:38:34 +01:00
Matthew Honnibal	3300e3d7ab	Implement more MorphAnalysis API	2019-03-08 00:09:16 +01:00
Matthew Honnibal	a40d73cb2a	Build out morphological analysis API	2019-03-07 21:59:25 +01:00
Matthew Honnibal	357066ee2f	Work on morphanalysis class	2019-03-07 18:32:51 +01:00
Matthew Honnibal	932d7dde1c	Fix compile error	2019-03-07 14:34:54 +01:00
Matthew Honnibal	b9ade7d4e0	Add MorphAnalysisC struct	2019-03-07 14:03:07 +01:00
Matthew Honnibal	d0ca64bb07	Fix imports in morphanalysis	2019-03-07 12:14:53 +01:00
Matthew Honnibal	21008ad2d8	Draft API for morphological analysis class	2019-03-07 10:45:24 +01:00

20 Commits
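
Note on the three feature representations named in adc9745718 and the `default` lookup described in b60b027927 and fdb8815ef5: the following is a minimal pure-Python sketch of how such conversions could behave, not spaCy's implementation. All function names here (`feats_to_dict`, `dict_to_feats`, `dict_to_list`, `get_values`) are hypothetical stand-ins written for illustration, and the plain `sorted()` ordering is an assumption; the real library may order fields and values differently.

```python
from typing import Dict, List, Optional

FEATURE_SEP = "|"   # separates Field=Value pairs in a UD FEATS string
FIELD_SEP = "="     # separates a field from its value(s)
VALUE_SEP = ","     # separates multiple values of one field
EMPTY_MORPH = "_"   # treated as an empty analysis, as in the commit message


def feats_to_dict(feats: str) -> Dict[str, str]:
    """Hypothetical converter: FEATS string -> per-field dict, sorted."""
    if feats in ("", EMPTY_MORPH):
        return {}
    result = {}
    for item in feats.split(FEATURE_SEP):
        field, _, values = item.partition(FIELD_SEP)
        # Multiple values stay in one string, joined by the value separator.
        result[field] = VALUE_SEP.join(sorted(values.split(VALUE_SEP)))
    return dict(sorted(result.items()))


def dict_to_feats(feats_dict: Dict[str, str]) -> str:
    """Hypothetical converter: per-field dict -> FEATS string ("" if empty)."""
    return FEATURE_SEP.join(
        f"{field}{FIELD_SEP}{feats_dict[field]}" for field in sorted(feats_dict)
    )


def dict_to_list(feats_dict: Dict[str, str]) -> List[str]:
    """Hypothetical converter: per-field dict -> list of single features."""
    return [
        f"{field}{FIELD_SEP}{value}"
        for field in sorted(feats_dict)
        for value in sorted(feats_dict[field].split(VALUE_SEP))
    ]


def get_values(feats: str, field: str,
               default: Optional[List[str]] = None) -> List[str]:
    """Return only the values of one field; [] if missing and no default."""
    values = feats_to_dict(feats).get(field)
    if values is None:
        return [] if default is None else default
    return values.split(VALUE_SEP)


if __name__ == "__main__":
    as_dict = feats_to_dict("Number=sing|Case=gen,dat")
    print(as_dict)
    print(dict_to_feats(as_dict))
    print(dict_to_list(as_dict))
    print(get_values("Case=dat,gen|Number=sing", "Tense", default=["unknown"]))
```

In this sketch the missing-field result stays `[]` unless the caller passes a `default`, mirroring the behaviour the b60b027927 message describes; like that message notes, nothing checks that a caller-supplied default is actually a list of strings.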