spaCy

mirror of https://github.com/explosion/spaCy.git synced 2026-02-06 07:19:45 +03:00

Author	SHA1	Message	Date
Ryn Daniels	d30ee14ab3	Pass the matrix branch to the checkout action (#10304 )	2022-02-16 15:39:42 +01:00
Adriane Boyd	22066f4e0f	Also exclude workflows from non-PR CI runs (#10305 )	2022-02-16 13:45:30 +01:00
Ryn Daniels	f6250015ab	Fix the datemath for reals (#10294 ) * add debugging branch and quotes to daily slowtest action * Apparently the quotes fixed it	2022-02-15 14:18:36 +01:00
Paul O'Leary McCann	23bd103d89	Add tmtoolkit setup steps	2022-02-14 15:17:25 +09:00
Markus Konrad	8818a44a39	add tmtoolkit package to spaCy universe (#10245 )	2022-02-14 15:16:43 +09:00
github-actions[bot]	5adedb8587	Auto-format code with black (#10260 ) Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>	2022-02-11 14:23:01 +01:00
Adriane Boyd	9a06a210ec	Exclude github workflow edits from CI (#10261 )	2022-02-11 14:22:43 +01:00
Adriane Boyd	bbaf41fb3b	Set version to v3.2.2 (#10262 )	2022-02-11 11:45:26 +01:00
Edward	7961a0a959	Fix typo in errors (#10256 )	2022-02-10 13:45:46 +01:00
Ryn Daniels	2d6cabb23c	Fix the date command and the matrix failure mode (#10254 )	2022-02-10 12:06:30 +01:00
Peter Baumgartner	ee662ec381	Raise error in spacy package when model name is not a valid python identifier (#10192 ) * MultiHashEmbed vector docs correction * raise error for invalid identifier as model name * more succinct error message * update success message * permitted package name + double underscore * clarify package name error * clarify underscore run message * tweak language + simplify underscore run * cleanup underscore run warning * spacing correction * Update spacy/tests/test_cli.py Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2022-02-10 08:15:23 +01:00
Ryn Daniels	3877f78ff9	fix the syntax for the slow/gpu test crons (#10244 )	2022-02-09 11:21:20 +01:00
John Boy	10c77af83d	add textnets to spaCy universe (#10216 ) https://github.com/jboynyc/textnets/issues/38	2022-02-09 15:04:26 +09:00
Ines Montani	7b883da9fd	Merge pull request #10239 from explosion/docs/spacy-tailored-pipelines [ci skip]	2022-02-08 18:04:01 +01:00
Ramon Ziai	6477dafac2	fix(phrasematcher.pyi): change type annotation of `docs` in `add()` to `List[Doc]` (#10235 ) https://github.com/explosion/spaCy/issues/10234	2022-02-08 13:37:27 +01:00
Ines Montani	f2c2b97e56	Add spaCy Tailored Pipelines	2022-02-08 11:46:42 +01:00
Adriane Boyd	a9ee5bff98	Support mixed case model package names (#10223 )	2022-02-08 10:52:46 +01:00
Ryn Daniels	f939da0bfa	Add github actions for slow and gpu tests (#10225 ) * Add github actions for slow and gpu tests * change weekly GPU tests to also run slow tests, and change the time * only run the tests if there were commits in the past day	2022-02-08 10:05:35 +01:00
Sofie Van Landeghem	deb143fa70	Token sent attributes more consistent (#10164 ) * remove duplicate line * add sent start/end token attributes to the docs * let has_annotation work with IS_SENT_END * elif instead of if * add has_annotation test for sent attributes * fix typo * remove duplicate is_sent_start entry in docs	2022-02-08 08:35:37 +01:00
Peter Baumgartner	836f689cc7	YAML multiline tip for project.yml files (#10187 ) * MultiHashEmbed vector docs correction * add in multi-line tip * convert to sidebar tip	2022-02-08 08:35:09 +01:00
Kenneth Enevoldsen	e4625d2fc3	Added Augmenty to universe (#10229 ) * Added Augmenty to universe * Update website/meta/universe.json Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Update website/meta/universe.json Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>	2022-02-08 08:32:11 +01:00
Lj Miranda	42072f4468	Add spancat pipeline in spacy debug data (#10070 ) * Setup debug data for spancat * Add check for missing labels * Add low-level data warning error * Improve logic when compiling the gold train data * Implement check for negative examples * Remove breakpoint * Remove ws_ents and missing entity checks * Fix mypy errors * Make variable name spans_key consistent * Rename pipeline -> component for consistency * Account for missing labels per spans_key * Cleanup variable names for consistency * Improve brevity of conditional statements * Remove unused variables * Include spans_key as an argument for _get_examples * Add a conditional check for spans_key * Update spancat debug data based on new API - Instead of using _get_labels_from_model(), I'm now using _get_labels_from_spancat() (cf. https://github.com/explosion/spaCy/pull/10079) - The way information is displayed was also changed (text -> table) * Rename model_labels to ensure mypy works * Update wording on warning messages Use "span type" instead of "entity type" in wording the warning messages. This is because Spans aren't necessarily entities. * Update component type into a Literal This is to make it clear that the component parameter should only accept either 'spancat' or 'ner'. * Update checks to include actual model span_keys Instead of looking at everything in the data, we only check those span_keys from the actual spancat component. Instead of doing the filter inside the for-loop, I just made another dictionary, data_labels_in_component to hold this value. * Update spacy/cli/debug_data.py * Show label counts only when verbose is True Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2022-02-07 15:03:36 +01:00
Lj Miranda	72fece712f	Add shuffle parameter to Corpus API docs (#10220 ) * Add shuffle parameter to Corpus API docs * Update website/docs/api/corpus.md Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2022-02-07 14:55:53 +01:00
Adriane Boyd	63e1e4e8f6	Fix debug data check for ents that cross sents (#10188 ) * Fix debug data check for ents that cross sents * Use aligned sent starts to have the same indices for the NER and sent start annotation * Add a temporary, insufficient hack for the case where a sentence-initial reference token is split into multiple tokens in the predicted doc, since `Example.get_aligned("SENT_START")` currently aligns `True` to all the split tokens. * Improve test example * Use Example.get_aligned_sent_starts * Add test for crossing entity	2022-02-07 08:53:30 +01:00
github-actions[bot]	91ccacea12	Auto-format code with black (#10209 ) * Auto-format code with black * add black requirement to dev dependencies and pin to 22.x * ignore black dependency for comparison with setup.cfg Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com> Co-authored-by: svlandeg <svlandeg@github.com>	2022-02-06 16:30:30 +01:00
Adriane Boyd	0668a449ba	Add Pipe.hide_labels to omit labels from pipeline meta (#10175 )	2022-02-05 17:59:24 +01:00
Adriane Boyd	6f551043e4	Use paths.vectors for vectors in init config (#10146 ) So that overriding `paths.vectors` works consistently in generated configs, set vectors model in `paths.vectors` and always refer to this path in `initialize.vectors`.	2022-02-04 21:09:48 +01:00
Kenneth Enevoldsen	a2f27ff83a	Added spacy-wrap to universe (#10168 ) * Added spacy-wrap to universe Added spacy-wrap to universe a small package for wrapping fine-tuned huggingface transformers to a spacy pipeline following the same API as spacy-transformers. (Currently limited to classification models) * Update website/meta/universe.json * Update website/meta/universe.json * Update website/meta/universe.json * Update website/meta/universe.json Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2022-02-03 12:30:09 +01:00
Lj Miranda	345e7f6bc4	Clarify Span.ents documentation (#10154 ) * Clarify Span.ents documentation Ref: #10135 Retain current behaviour. Span.ents will only include entities within said span. You can't get tokens outside of the original span. * Reword docstrings Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Update API docs in the website Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2022-01-31 08:41:42 +01:00
Marek Šuppa	f09c799a96	fix: Add missing comma to `_eleven_to_beyond` (#10166 ) * This comma has most probably been left out unintentionally, leading to string concatenation between the two consecutive lines. This issue has been found automatically using a regular expression.	2022-01-30 16:45:06 +09:00
Marek Šuppa	67ecac633f	fix: Add missing comma to `examples.py` (#10167 ) * This comma has most probably been left out unintentionally, leading to string concatenation between the two consecutive lines. This issue has been found automatically using a regular expression.	2022-01-30 16:43:29 +09:00
Adriane Boyd	4f441dfa24	Fix infix as prefix in Tokenizer.explain (#10140 ) * Fix infix as prefix in Tokenizer.explain Update `Tokenizer.explain` to align with the `Tokenizer` algorithm: * skip infix matches that are prefixes in the current substring * Update tokenizer pseudocode in docs	2022-01-28 17:00:54 +01:00
Eduard Zorita	30cf9d6a05	Update typing hints (#10109 ) * Improve typing hints for Matcher.__call__ * Add typing hints for DependencyMatcher * Add typing hints to underscore extensions * Update Doc.tensor type (requires numpy 1.21) * Fix typing hints for Language.component decorator * Use generic np.ndarray type in Doc to avoid numpy version update * Fix mypy errors * Fix cyclic import caused by Underscore typing hints * Use Literal type from spacy.compat * Update matcher.pyi import format Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>	2022-01-28 16:59:54 +01:00
Adriane Boyd	09734c56fc	Use simple suggester for spancat initialization (#10143 ) Instead of running the actual suggester, which may require annotation from annotating components that is not necessarily present in the reference docs, use the built-in 1-gram suggester.	2022-01-28 09:34:23 +01:00
github-actions[bot]	6d4db5c3c7	Auto-format code with black (#10106 ) Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>	2022-01-21 10:01:10 +01:00
Ines Montani	34ed93ef68	Support version tags in universe and add note about reporting (#10093 ) * Support version tags in universe and add note about reporting * Apply suggestions from code review Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>	2022-01-20 23:21:26 +01:00
Peter Baumgartner	a69005037a	Docker Image for Website Dev (#10098 ) * add docker instructions * Update website/README.md Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Update website/README.md Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * clarifying language on docker image * fix markdown formatting Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>	2022-01-20 23:02:13 +01:00
Duygu Altinok	47a2916801	Intify IOB (#9738 ) * added iob to int * added tests * added iob strings * added error * blacked attrs * Update spacy/tests/lang/test_attrs.py Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Update spacy/attrs.pyx Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * added iob strings as global * minor refinement with iob * removed iob strings from token * changed to uppercase * cleaned and went back to master version * imported iob from attrs * Update and format errors * Support and test both str and int ENT_IOB key Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2022-01-20 13:19:38 +01:00
Duygu Altinok	268ddf8a06	Add ENT_IOB key to Matcher (#9649 ) * added new field * added exception for IOB strings * minor refinement to schema * removed field * fixed typo * imported numerical val * changed the code bit * cosmetics * added test for matcher * set ents of mock docs * added invalid pattern * minor update to documentation * blacked matcher * added pattern validation * add IOB vals to schema * changed into test * mypy compat * cleaned left over * added compat import * changed type * added compat import * changed literal a bit * went back to old * made explicit type * Update spacy/schemas.py Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Update spacy/schemas.py Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Update spacy/schemas.py Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2022-01-20 13:18:39 +01:00
Paul O'Leary McCann	32bd3856b3	Rename FACILITY to FAC in color list (#10067 ) This matches the English models	2022-01-20 12:00:28 +01:00
Adriane Boyd	a55212fca0	Determine labels by factory name in debug data (#10079 ) * Determine labels by factory name in debug data For all components, return labels for all components with the corresponding factory name rather than for only the default name. For `spancat`, return labels as a dict keyed by `spans_key`. * Refactor for typing * Add test * Use assert instead of cast, removed unneeded arg * Mark test as slow	2022-01-20 11:42:52 +01:00
Richard Hudson	e9c6314539	Bugfix for similarity return types (#10051 )	2022-01-20 11:40:46 +01:00
Adriane Boyd	7d528e607c	Update quickstart install steps (#10092 ) * For conda: * Use conda environment rather than venv * Install `spacy-transformers` as a conda package * For pip: * Add quotes if extras are included	2022-01-20 10:53:40 +01:00
Paul O'Leary McCann	2ff53834bb	Add link to pattern file info in EntityRuler.initialize docs (#10091 ) * Add link to pattern file info in EntityRuler.initialize docs * Update website/docs/api/entityruler.md Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>	2022-01-19 10:45:11 +01:00
Daniël de Kok	50d2a2c930	Use fewer Vector internals (#9879 ) * Use Vectors.shape rather than Vectors.data.shape * Use Vectors.size rather than Vectors.data.size * Add Vectors.to_ops to move data between different ops * Add documentation for Vectors.to_ops	2022-01-18 17:14:35 +01:00
Adriane Boyd	4dfd559e55	Fix spaces in Doc.from_docs for empty docs (#10052 ) Fix spaces in `Doc.from_docs(ensure_whitespace=True)` for cases where a doc ending in whitespace is followed by an empty doc.	2022-01-18 17:12:42 +01:00
Paul O'Leary McCann	c28e33637b	Mark flaky spancat test so it doesn't fail the build (#10075 ) * Mark flaky spancat test so it doesn't fail the build * Skip, don't run and ignore	2022-01-18 09:36:28 +01:00
Adriane Boyd	39f1b13e77	Update sudachipy extras (#10072 ) By @polm, redone from #9917 after incorrect (reverted) rebase. `sudachipy>=0.5.2` is needed for newer dictionaries. `sudachipy<0.6.0` is kept for users who might still prefer the older version, in particular to be able to compile it without rust.	2022-01-17 11:48:39 +01:00
Adriane Boyd	add52935ff	Revert "Bump sudachipy version (#9917 )" (#10071 ) This reverts commit `58bdd8607b`.	2022-01-17 10:38:37 +01:00
Tuomo Hiippala	6a8619dd73	Update the entry for Applied Language Technology in spaCy Universe (#10068 ) * add entry for Applied Language Technology under "Courses" Added the following entry into `universe.json`: ``` { "type": "education", "id": "applt-course", "title": "Applied Language Technology", "slogan": "NLP for newcomers using spaCy and Stanza", "description": "These learning materials provide an introduction to applied language technology for audiences who are unfamiliar with language technology and programming. The learning materials assume no previous knowledge of the Python programming language.", "url": "https://applied-language-technology.readthedocs.io/", "image": "https://www.mv.helsinki.fi/home/thiippal/images/applt-preview.jpg", "thumb": "https://applied-language-technology.readthedocs.io/en/latest/_static/logo.png", "author": "Tuomo Hiippala", "author_links": { "twitter": "tuomo_h", "github": "thiippal", "website": "https://www.mv.helsinki.fi/home/thiippal/" }, "category": ["courses"] }, ``` * Update the entry for "Applied Language Technology"	2022-01-17 08:28:51 +01:00
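
Note on the missing-comma fixes listed above (#10166, #10167): Python joins adjacent string literals, so a comma dropped inside a list silently merges two entries instead of raising an error. A minimal sketch of that behaviour follows. The word list is a made-up illustration, not the actual contents of `_eleven_to_beyond` or `examples.py`.

```python
# Hypothetical word list for illustration only; not taken from the spaCy source.
with_bug = [
    "eleven",
    "twelve"  # missing comma: this literal is joined with the next one
    "thirteen",
    "fourteen",
]

fixed = [
    "eleven",
    "twelve",
    "thirteen",
    "fourteen",
]

# The buggy list has one entry fewer, and one entry is the merged string.
print(len(with_bug), with_bug)
print(len(fixed), fixed)
```

Because the merged list is still valid Python, this kind of slip tends to surface only as a missing or malformed entry at runtime, which is presumably why the commits describe finding it with a pattern search rather than through a failing test.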


15276 Commits