spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-08-04 20:30:24 +03:00

Author	SHA1	Message	Date
Adriane Boyd	c155f333bb	Revert "Temporarily use v3.1.0 models in CI" This reverts commit `bd6433bbab`.	2021-11-02 14:25:05 +01:00
Adriane Boyd	53a3523910	Revert "Temporarily ignore W095 in assemble CLI CI test (#9460)" This reverts commit `8db574e0b5`.	2021-11-02 14:24:54 +01:00
Adriane Boyd	4d5db737e9	Revert "Temporarily skip compat tests (#9594)" This reverts commit `667572adca`.	2021-11-02 14:24:06 +01:00
Adriane Boyd	667572adca	Temporarily skip compat tests (#9594)	2021-11-02 14:10:48 +01:00
Lj Miranda	f1bc655a38	Add initial Tagalog (tl) tests (#9582) * Add tl_tokenizer to test fixtures * Add tagalog tests	2021-11-02 08:35:49 +01:00
xxyzz	90ec820f05	Add WordDumb to spaCy Universe (#9572) * Add WordDumb to spaCy Universe * Add standalone category Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>	2021-11-01 18:38:41 +09:00
Bruce W. Lee (이웅성)	a4dcb68cf6	Adding LingFeat Software to spaCy Universe. (#9574) * add lingfeat in universe * add lingfeat in universe * Fix JSON * Minor cleanup Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>	2021-11-01 18:38:14 +09:00
Vasundhara	5279c7c4ba	Fix broken link to mappings-exceptions (#9573)	2021-10-31 13:44:29 +09:00
svlandeg	87cf72d1c8	pass nO through	2021-10-29 17:38:11 +02:00
svlandeg	1cc0d05812	fixes	2021-10-29 17:10:07 +02:00
Adriane Boyd	bb26550e22	Fix StaticVectors after floret+mypy merge (#9566)	2021-10-29 16:25:43 +02:00
Adriane Boyd	322635e371	Set version to v3.2.0 (#9565)	2021-10-29 15:22:40 +02:00
svlandeg	dbaf68a439	formatting	2021-10-29 14:19:30 +02:00
svlandeg	87fb268f76	Merge remote-tracking branch 'upstream/master' into refactor/parser-gpu	2021-10-29 14:16:43 +02:00
Adriane Boyd	5e9db156c2	Merge pull request #9563 from adrianeboyd/chore/update-develop-from-master-v3.2-3 Update develop from master for v3.2	2021-10-29 14:08:14 +02:00
svlandeg	753f9ee685	cleanup	2021-10-29 13:25:15 +02:00
Adriane Boyd	2d430958e1	Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master-v3.2-3	2021-10-29 12:18:15 +02:00
Paul O'Leary McCann	006df1ae1f	Clarify error when words are of wrong type (#9541) * Clarify error when words are of wrong type See #9437 * Update docs * Use try/except * Apply suggestions from code review Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2021-10-29 12:08:40 +02:00
Paul O'Leary McCann	2fd8d616e7	Add docs section for spacy.cli.train.train (#9545) * Add section for spacy.cli.train.train * Add link from training page to train function * Ensure path in train helper * Update docs Co-authored-by: Ines Montani <ines@ines.io>	2021-10-29 10:36:34 +02:00
Adriane Boyd	5477453ea3	Docs for thinc-apple-ops (#9549) * Docs for thinc-apple-ops * Ignore thinc-apple-ops in reqs tests * Fix install quickstart * Add cupy cuda 113, 114 extras * Remove draft section Co-authored-by: Ines Montani <ines@ines.io>	2021-10-29 10:35:31 +02:00
Adriane Boyd	12974bf4d9	Add micro PRF for morph scoring (#9546) * Add micro PRF for morph scoring For pipelines where morph features are added by more than one component and a reference training corpus may not contain all features, a micro PRF score is more flexible than a simple accuracy score. An example is the reading and inflection features added by the Japanese tokenizer. * Use `morph_micro_f` as the default morph score for Japanese morphologizers. * Update docstring * Fix typo in docstring * Update Scorer API docs * Fix results type * Organize score list by attribute prefix	2021-10-29 10:29:29 +02:00
Philip Vollet	76173b0866	fixed typo and URL (#9560)	2021-10-29 13:57:44 +09:00
Adriane Boyd	72dc63b3fb	Update for python 3.10 (#9519) * Update for python 3.10 * Update mac image * Update build constraints for python 3.10 * Add extras for cupy cuda 11.3-11.5 * Remove cupy-cuda115 extra * Require thinc>=8.0.12 * Switch CI to windows-2019 * Skip mypy for python 3.10	2021-10-28 15:32:06 +02:00
Adriane Boyd	554fa414ec	Require spacy-transformers v1.1 in transformers extra (#9557) So that the install/upgrade quickstart also upgrades `spacy-transformers` with `pip install spacy[transformers]`, require `spacy-transformers>=1.1.2` in the `transformers` extra.	2021-10-28 11:18:19 +02:00
Matthew Honnibal	79d5957c47	Xfail. 6 failures	2021-10-27 23:26:07 +02:00
Matthew Honnibal	6b5302cdf3	More xfail. 7 failures	2021-10-27 23:24:33 +02:00
Matthew Honnibal	7309e49286	Xfail beam stuff. 9 failures	2021-10-27 23:21:55 +02:00
Matthew Honnibal	880182afdb	Work on parser. 15 tests failing	2021-10-27 23:02:29 +02:00
Matthew Honnibal	af9a30b192	Keep working through errors	2021-10-27 17:13:11 +02:00
Matthew Honnibal	b67dd0cf89	Keep working through errors	2021-10-27 17:10:33 +02:00
Adriane Boyd	c053f158c5	Add support for floret vectors (#8909) * Add support for fasttext-bloom hash-only vectors Overview: * Extend `Vectors` to have two modes: `default` and `ngram` * `default` is the default mode and equivalent to the current `Vectors` * `ngram` supports the hash-only ngram tables from `fasttext-bloom` * Extend `spacy.StaticVectors.v2` to handle both modes with no changes for `default` vectors * Extend `spacy init vectors` to support ngram tables The `ngram` mode only supports vector tables produced by this fork of fastText, which adds an option to represent all vectors using only the ngram buckets table and which uses the exact same ngram generation algorithm and hash function (`MurmurHash3_x64_128`). `fasttext-bloom` produces an additional `.hashvec` table, which can be loaded by `spacy init vectors --fasttext-bloom-vectors`. https://github.com/adrianeboyd/fastText/tree/feature/bloom Implementation details: * `Vectors` now includes the `StringStore` as `Vectors.strings` so that the API can stay consistent for both `default` (which can look up from `str` or `int`) and `ngram` (which requires `str` to calculate the ngrams). * In ngram mode `Vectors` uses a default `Vectors` object as a cache since the ngram vectors lookups are relatively expensive. * The default cache size is the same size as the provided ngram vector table. * Once the cache is full, no more entries are added. The user is responsible for managing the cache in cases where the initial documents are not representative of the texts. * The cache can be resized by setting `Vectors.ngram_cache_size` or cleared with `vectors._ngram_cache.clear()`. * The API ends up a bit split between methods for `default` and for `ngram`, so functions that only make sense for `default` or `ngram` include warnings with custom messages suggesting alternatives where possible. * `Vocab.vectors` becomes a property so that the string stores can be synced when assigning vectors to a vocab. * `Vectors` serializes its own config settings as `vectors.cfg`. * The `Vectors` serialization methods have added support for `exclude` so that the `Vocab` can exclude the `Vectors` strings while serializing. Removed: * The `minn` and `maxn` options and related code from `Vocab.get_vector`, which does not work in a meaningful way for default vector tables. * The unused `GlobalRegistry` in `Vectors`. * Refactor to use reduce_mean Refactor to use reduce_mean and remove the ngram vectors cache. * Rename to floret * Rename to floret in error messages * Use --vectors-mode in CLI, vector init * Fix vectors mode in init * Remove unused var * Minor API and docstrings adjustments * Rename `--vectors-mode` to `--mode` in `init vectors` CLI * Rename `Vectors.get_floret_vectors` to `Vectors.get_batch` and support both modes. * Minor updates to Vectors docstrings. * Update API docs for Vectors and init vectors CLI * Update types for StaticVectors	2021-10-27 14:08:31 +02:00
Adriane Boyd	0c97ed2746	Rename ja morph features to Inflection and Reading (#9520) * Rename ja morph features to Inflection and Reading	2021-10-27 13:13:03 +02:00
Adriane Boyd	2ea9b58006	Ignore prefix in suffix matches (#9155) * Ignore prefix in suffix matches Ignore the currently matched prefix when looking for suffix matches in the tokenizer. Otherwise a lookbehind in the suffix pattern may match incorrectly due to the presence of the prefix in the token string. * Move °[cfkCFK]. to a tokenizer exception * Adjust exceptions for same tokenization as v3.1 * Also update test accordingly * Continue to split . after °CFK if ° is not a prefix * Exclude new ° exceptions for pl * Switch back to default tokenization of "° C ." * Revert "Exclude new ° exceptions for pl" This reverts commit `952013a5b4`. * Add exceptions for °C for hu	2021-10-27 13:02:25 +02:00
Adriane Boyd	4170110ce7	Merge pull request #9540 from adrianeboyd/chore/update-develop-from-master-v3.2-1 Update develop from master for v3.2	2021-10-27 08:23:57 +02:00
Adriane Boyd	386dcada1c	Address random results in slow readers tests (#9544) * Set random seed for dataset shuffling * Use more dev examples for non-zero scores	2021-10-26 16:53:10 +02:00
Adriane Boyd	a803af9dfa	Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master-v3.2-1	2021-10-26 11:53:50 +02:00
Matthew Honnibal	c538eaf1c8	Work through tests	2021-10-26 01:21:51 +02:00
Matthew Honnibal	d765a4f8ee	Cleaner handling of unseen classes	2021-10-25 22:34:29 +02:00
Matthew Honnibal	07a3581ff8	Support unseen classes in parser	2021-10-25 22:26:52 +02:00
Matthew Honnibal	4b5d1b53f6	Support unseen_classes in parser model	2021-10-25 22:21:17 +02:00
Matthew Honnibal	03018904ef	Work on parser model	2021-10-25 16:11:58 +02:00
Matthew Honnibal	9c4a04d0c5	Uncython	2021-10-25 12:51:32 +02:00
Matthew Honnibal	1921e86813	Uncython ner.pyx and dep_parser.pyx	2021-10-25 12:51:14 +02:00
Matthew Honnibal	45ca12f07a	Wire up parser model	2021-10-25 12:50:33 +02:00
Matthew Honnibal	71abe2e42d	Wire up tb_framework to new parser model	2021-10-25 12:50:20 +02:00
Matthew Honnibal	0279aa036a	Delete _precomputable_affine module	2021-10-25 12:28:57 +02:00
Matthew Honnibal	9b459f9ef2	Delete spacy.ml.parser_model	2021-10-25 12:28:31 +02:00
Matthew Honnibal	7b9c282469	Convert parser from cdef class	2021-10-25 12:28:13 +02:00
Matthew Honnibal	34aab9899f	Prepare to remove parser_model.pyx	2021-10-25 12:22:46 +02:00
Matthew Honnibal	de8c88babb	New progress on parser model refactor	2021-10-25 03:13:31 +02:00

15952 Commits