spaCy

mirror of https://github.com/explosion/spaCy.git synced 2026-02-14 19:20:39 +03:00

Author	SHA1	Message	Date
Adriane Boyd	297dd82c86	Fix initial special cases for Tokenizer.explain (#10460 ) Add the missing initial check for special cases to `Tokenizer.explain` to align with `Tokenizer._tokenize_affixes`.	2022-03-11 10:50:47 +01:00
Peter Baumgartner	01ec6349ea	Add `path.mkdir` to custom component examples of `to_disk` (#10348 ) * add `path.mkdir` to examples * add ensure_path + mkdir * update highlights	2022-03-08 16:04:10 +01:00
Adriane Boyd	60520d8669	Fix types in API docs for moves in parser and ner (#10464 )	2022-03-08 13:51:11 +01:00
Adriane Boyd	b2bbefd0b5	Add Finnish, Korean, and Swedish models and Korean support notes (#10355 ) * Add Finnish, Korean, and Swedish models to website * Add Korean language support notes	2022-03-07 17:03:45 +01:00
David Berenstein	a6d5824e5f	added classy-classification package to spacy universe (#10393 ) * Update universe.json added classy-classification to Spacy universe * Update universe.json added classy-classification to the spacy universe resources * Update universe.json corrected a small typo in json * Update website/meta/universe.json Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Update website/meta/universe.json Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Update website/meta/universe.json Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Update universe.json processed merge feedback * Update universe.json Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>	2022-03-07 12:47:26 +01:00
Paul O'Leary McCann	91acc3ea75	Fix entity linker batching (#9669 ) * Partial fix of entity linker batching * Add import * Better name * Add `use_gold_ents` option, docs * Change to v2, create stub v1, update docs etc. * Fix error type Honestly no idea what the right type to use here is. ConfigValidationError seems wrong. Maybe a NotImplementedError? * Make mypy happy * Add hacky fix for init issue * Add legacy pipeline entity linker * Fix references to class name * Add __init__.py for legacy * Attempted fix for loss issue * Remove placeholder V1 * formatting * slightly more interesting train data * Handle batches with no usable examples This adds a test for batches that have docs but not entities, and a check in the component that detects such cases and skips the update step as thought the batch were empty. * Remove todo about data verification Check for empty data was moved further up so this should be OK now - the case in question shouldn't be possible. * Fix gradient calculation The model doesn't know which entities are not in the kb, so it generates embeddings for the context of all of them. However, the loss does know which entities aren't in the kb, and it ignores them, as there's no sensible gradient. This has the issue that the gradient will not be calculated for some of the input embeddings, which causes a dimension mismatch in backprop. That should have caused a clear error, but with numpyops it was causing nans to happen, which is another problem that should be addressed separately. This commit changes the loss to give a zero gradient for entities not in the kb. * add failing test for v1 EL legacy architecture * Add nasty but simple working check for legacy arch * Clarify why init hack works the way it does * Clarify use_gold_ents use case * Fix use gold ents related handling * Add tests for no gold ents and fix other tests * Use aligned ents function (not working) This doesn't actually work because the "aligned" ents are gold-only. But if I have a different function that returns the intersection, then this will work as desired. * Use proper matching ent check This changes the process when gold ents are not used so that the intersection of ents in the pred and gold is used. * Move get_matching_ents to Example * Use model attribute to check for legacy arch * Rename flag * bump spacy-legacy to lower 3.0.9 Co-authored-by: svlandeg <svlandeg@github.com>	2022-03-04 09:17:36 +01:00
Adriane Boyd	8e93fa8507	Fix Vectors.n_keys for floret vectors (#10394 ) Fix `Vectors.n_keys` for floret vectors to match docstring description and avoid W007 warnings in similarity methods.	2022-03-01 09:21:25 +01:00
Sofie Van Landeghem	3f68bbcfec	Clean up loggers docs (#10351 ) * update docs to point to spacy-loggers docs * remove unused error code	2022-02-25 16:29:12 +01:00
Sam Edwardes	5f568f7e41	Updated spaCy universe for spacytextblob (#10335 ) * Updated spacytextblob in universe.json * Fixed json * Update website/meta/universe.json Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Added spacy_version tag to spacytextblob Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>	2022-02-24 14:18:10 +09:00
Sofie Van Landeghem	a16b14e591	Merge branch 'master' into copy/develop	2022-02-16 14:04:59 +01:00
Paul O'Leary McCann	23bd103d89	Add tmtoolkit setup steps	2022-02-14 15:17:25 +09:00
Markus Konrad	8818a44a39	add tmtoolkit package to spaCy universe (#10245 )	2022-02-14 15:16:43 +09:00
John Boy	10c77af83d	add textnets to spaCy universe (#10216 ) https://github.com/jboynyc/textnets/issues/38	2022-02-09 15:04:26 +09:00
Ines Montani	7b883da9fd	Merge pull request #10239 from explosion/docs/spacy-tailored-pipelines [ci skip]	2022-02-08 18:04:01 +01:00
Ines Montani	f2c2b97e56	Add spaCy Tailored Pipelines	2022-02-08 11:46:42 +01:00
Sofie Van Landeghem	deb143fa70	Token sent attributes more consistent (#10164 ) * remove duplicate line * add sent start/end token attributes to the docs * let has_annotation work with IS_SENT_END * elif instead of if * add has_annotation test for sent attributes * fix typo * remove duplicate is_sent_start entry in docs	2022-02-08 08:35:37 +01:00
Peter Baumgartner	836f689cc7	YAML multiline tip for project.yml files (#10187 ) * MultiHashEmbed vector docs correction * add in multi-line tip * convert to sidebar tip	2022-02-08 08:35:09 +01:00
Kenneth Enevoldsen	e4625d2fc3	Added Augmenty to universe (#10229 ) * Added Augmenty to universe * Update website/meta/universe.json Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Update website/meta/universe.json Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>	2022-02-08 08:32:11 +01:00
Lj Miranda	72fece712f	Add shuffle parameter to Corpus API docs (#10220 ) * Add shuffle parameter to Corpus API docs * Update website/docs/api/corpus.md Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2022-02-07 14:55:53 +01:00
Sofie Van Landeghem	14513f82da	Merge pull request #10215 from explosion/master update develop	2022-02-06 13:45:41 +01:00
Kenneth Enevoldsen	a2f27ff83a	Added spacy-wrap to universe (#10168 ) * Added spacy-wrap to universe Added spacy-wrap to universe a small package for wrapping fine-tuned huggingface transformers to a spacy pipeline following the same API as spacy-transformers. (Currently limited to classification models) * Update website/meta/universe.json * Update website/meta/universe.json * Update website/meta/universe.json * Update website/meta/universe.json Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2022-02-03 12:30:09 +01:00
Lj Miranda	345e7f6bc4	Clarify Span.ents documentation (#10154 ) * Clarify Span.ents documentation Ref: #10135 Retain current behaviour. Span.ents will only include entities within said span. You can't get tokens outside of the original span. * Reword docstrings Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Update API docs in the website Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2022-01-31 08:41:42 +01:00
Adriane Boyd	4f441dfa24	Fix infix as prefix in Tokenizer.explain (#10140 ) * Fix infix as prefix in Tokenizer.explain Update `Tokenizer.explain` to align with the `Tokenizer` algorithm: * skip infix matches that are prefixes in the current substring * Update tokenizer pseudocode in docs	2022-01-28 17:00:54 +01:00
Ines Montani	34ed93ef68	Support version tags in universe and add note about reporting (#10093 ) * Support version tags in universe and add note about reporting * Apply suggestions from code review Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>	2022-01-20 23:21:26 +01:00
Peter Baumgartner	a69005037a	Docker Image for Website Dev (#10098 ) * add docker instructions * Update website/README.md Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Update website/README.md Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * clarifying language on docker image * fix markdown formatting Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>	2022-01-20 23:02:13 +01:00
Sofie Van Landeghem	4465fe0306	Merge branch 'develop' into feature/master_copy	2022-01-20 13:36:17 +01:00
Duygu Altinok	268ddf8a06	Add ENT_IOB key to Matcher (#9649 ) * added new field * added exception for IOb strings * minor refinement to schema * removed field * fixed typo * imported numeriacla val * changed the code bit * cosmetics * added test for matcher * set ents of moc docs * added invalid pattern * minor update to documentation * blacked matcher * added pattern validation * add IOB vals to schema * changed into test * mypy compat * cleaned left over * added compat import * changed type * added compat import * changed literal a bit * went back to old * made explicit type * Update spacy/schemas.py Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Update spacy/schemas.py Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Update spacy/schemas.py Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2022-01-20 13:18:39 +01:00
Adriane Boyd	7d528e607c	Update quickstart install steps (#10092 ) * For conda: * Use conda environment rather than venv * Install `spacy-transformers` as a conda package * For pip: * Add quotes if extras are included	2022-01-20 10:53:40 +01:00
Paul O'Leary McCann	2ff53834bb	Add link to pattern file info in EntityRuler.initialize docs (#10091 ) * Add link to pattern file info in EntityRuler.initialize docs * Update website/docs/api/entityruler.md Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>	2022-01-19 10:45:11 +01:00
Daniël de Kok	50d2a2c930	User fewer Vector internals (#9879 ) * Use Vectors.shape rather than Vectors.data.shape * Use Vectors.size rather than Vectors.data.size * Add Vectors.to_ops to move data between different ops * Add documentation for Vector.to_ops	2022-01-18 17:14:35 +01:00
Tuomo Hiippala	6a8619dd73	Update the entry for Applied Language Technology in spaCy Universe (#10068 ) * add entry for Applied Language Technology under "Courses" Added the following entry into `universe.json`: ``` { "type": "education", "id": "applt-course", "title": "Applied Language Technology", "slogan": "NLP for newcomers using spaCy and Stanza", "description": "These learning materials provide an introduction to applied language technology for audiences who are unfamiliar with language technology and programming. The learning materials assume no previous knowledge of the Python programming language.", "url": "https://applied-language-technology.readthedocs.io/", "image": "https://www.mv.helsinki.fi/home/thiippal/images/applt-preview.jpg", "thumb": "https://applied-language-technology.readthedocs.io/en/latest/_static/logo.png", "author": "Tuomo Hiippala", "author_links": { "twitter": "tuomo_h", "github": "thiippal", "website": "https://www.mv.helsinki.fi/home/thiippal/" }, "category": ["courses"] }, ``` * Update the entry for "Applied Language Technology"	2022-01-17 08:28:51 +01:00
ColleterVi	a784b12eff	fix: new restcountries url (#10043 ) Url extension "eu" and path "rest" are no longer available. Replacing them for a working url.	2022-01-13 20:25:06 +09:00
Sofie Van Landeghem	d8a3012539	Merge pull request #10037 from explosion/master Update develop with master	2022-01-12 12:29:23 +01:00
Ines Montani	a437ca6737	Update website to use new Algolia search API	2022-01-05 13:21:06 +01:00
Sofie Van Landeghem	067a44a417	Merge pull request #9987 from explosion/master Update develop with commits from master	2022-01-05 11:49:50 +01:00
Sofie Van Landeghem	56dcb39fb7	Fix references to config file in the docs & UX (#9961 ) * doc fixes around config file * fix typo * clarify default	2022-01-04 14:31:26 +01:00
Sam Edwardes	6f65e2b544	Added spacypdfreader to universe.json (#9963 )	2022-01-03 16:34:36 +09:00
Paul O'Leary McCann	f40e237c5a	Remove denomme from universe (#9952 ) Package seems to have been deleted.	2021-12-29 11:41:29 +01:00
Florian Cäsar	86e71e7b19	Fix Scorer.score_cats for missing labels (#9443 ) * Fix Scorer.score_cats for missing labels * Add test case for Scorer.score_cats missing labels * semantic nitpick * black formatting * adjust test to give different results depending on multi_label setting * fix loss function according to whether or not missing values are supported * add note to docs * small fixes * make mypy happy * Update spacy/pipeline/textcat.py Co-authored-by: Florian Cäsar <florian.caesar@pm.me> Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> Co-authored-by: svlandeg <svlandeg@github.com>	2021-12-29 11:04:39 +01:00
Yoav Vollansky	9d63dfacfc	Update UNIVERSE.md (#9941 ) typo	2021-12-27 13:46:04 +01:00
Peter Baumgartner	72abf9e102	MultiHashEmbed vector docs correction (#9918 )	2021-12-27 11:18:08 +01:00
Edward	018827e9fd	Add healthsea to universe (#9838 ) * Add healthsea to universe * Update website/meta/universe.json * Add thumbnail * Update website/meta/universe.json Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>	2021-12-15 17:57:19 +01:00
Ines Montani	ba0fa7a64e	Support Google Sheets embeds in docs (#9861 )	2021-12-15 09:27:08 +01:00
Adriane Boyd	51a3b60027	Document Tagger neg_prefix, fix typo (#9821 )	2021-12-07 09:42:40 +01:00
Duygu Altinok	b56b9e7f31	Entity ruler remove pattern (#9685 ) * added ruler coe * added error for none existing pattern * changed error to warning * changed error to warning * added basic tests * fixed place * added test files * went back to error * went back to pattern error * minor change to docs * changed style * changed doc * changed error slightly * added remove to phrasem api * error key already existed * phrase matcher match code to api * blacked tests * moved comments before expr * corrected error no * Update website/docs/api/entityruler.md Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Update website/docs/api/entityruler.md Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>	2021-12-06 15:32:49 +01:00
Natalia Rodnova	472740d613	Added sents property to Span for Spans spanning over several sentences (#9699 ) * Added sents property to Span class that returns a generator of sentences the Span belongs to * Added description to Span.sents property * Update test_span to clarify the difference between span.sent and span.sents Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Update spacy/tests/doc/test_span.py Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Fix documentation typos in spacy/tokens/span.pyx Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Update Span.sents doc string in spacy/tokens/span.pyx Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Parametrized test_span_spans * Corrected Span.sents to check for span-level hook first. Also, made Span.sent respect doc-level sents hook if no span-level hook is provided * Corrected Span ocumentation copy/paste issue * Put back accidentally deleted lines * Fixed formatting in span.pyx * Moved check for SENT_START annotation after user hooks in Span.sents * add version where the property was introduced Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>	2021-12-06 09:58:01 +01:00
Narayan Acharya	1be8a4dab3	Displacy serve entity linking support without `manual=True` support. (#9748 ) * Add support for kb_id to be displayed via displacy.serve. The current support is only limited to the manual option in displacy.render * Commit to check pre-commit hooks are run. * Update spacy/displacy/__init__.py Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Changes as per suggestions on the PR. * Update website/docs/api/top-level.md Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Update website/docs/api/top-level.md Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * tag option as new from 3.2.1 onwards Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> Co-authored-by: svlandeg <sofie.vanlandeghem@gmail.com>	2021-11-29 17:13:26 +01:00
Adriane Boyd	6763cbfdc0	Update Catalan acknowledgements for v3.2 (#9763 )	2021-11-29 14:14:21 +01:00
Tuomo Hiippala	5c44533263	add entry for Applied Language Technology under "Courses" (#9755 ) Added the following entry into `universe.json`: ``` { "type": "education", "id": "applt-course", "title": "Applied Language Technology", "slogan": "NLP for newcomers using spaCy and Stanza", "description": "These learning materials provide an introduction to applied language technology for audiences who are unfamiliar with language technology and programming. The learning materials assume no previous knowledge of the Python programming language.", "url": "https://applied-language-technology.readthedocs.io/", "image": "https://www.mv.helsinki.fi/home/thiippal/images/applt-preview.jpg", "thumb": "https://applied-language-technology.readthedocs.io/en/latest/_static/logo.png", "author": "Tuomo Hiippala", "author_links": { "twitter": "tuomo_h", "github": "thiippal", "website": "https://www.mv.helsinki.fi/home/thiippal/" }, "category": ["courses"] }, ```	2021-11-28 19:33:16 +09:00
Natalia Rodnova	a4c43e5c57	Allow Matcher to match on ENT_ID and ENT_KB_ID (#9688 ) * Added ENT_ID and ENT_KB_ID into the list of the attributes that Matcher matches on * Added ENT_ID and ENT_KB_ID to TEST_PATTERNS in test_pattern_validation.py. Disabled tests that I added before * Update website/docs/api/matcher.md * Format * Remove skipped tests Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2021-11-24 10:37:10 +01:00

2818 Commits