spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-11-13 14:25:52 +03:00

Author	SHA1	Message	Date
Richard Hudson	4b227f4861	Merge pull request #10669 from mgrojo/develop Fix some issues in Spanish stop-word list and examples	2022-04-19 09:37:34 +02:00
mgr	3d50b1a989	Fix some issues in Spanish examples - Spelling: nationalities in lowercase, accent. - Incorrect verb composition - Untranslated word	2022-04-18 22:12:57 +02:00
mgr	2a2654c756	Remove significant or not very frequent words from stop word list [es] The list of stop words for Spanish contained many inadequate words, see: https://github.com/explosion/spaCy/issues/3052#issuecomment-1100760100 Removed words: - verb forms of 'trabajar' (work) and intentar (try) - words related to 'empleo' (employment) - incorrect words: ampleamos, arribaabajo, soyos, paìs - miscellaneous words due to being too significant or too infrequent: actualmente, aproximadamente, antaño, cosas, ejemplo, horas, general, pais, principalmente, raras Added other stop words for completion: - Spanish one-letter words - numbers up to twelve Some reformatting to 79 columns. When in doubt, the English and German lists have been consulted as good examples.	2022-04-18 22:04:02 +02:00
Madeesh Kannan	aa6780eb27	`Matcher`: Remove superfluous GIL-acquiring check in `get_is_final` (#10659) * `Matcher`: Remove superfluous GIL-acquiring check in `get_is_final` This check incurred a significant performance penalty due to implicit interactions between the GIL and Cython ref-counting code. * `Matcher`: Inline `PatternStateC` accessors	2022-04-18 12:59:34 +02:00
Duy Ngo	229ecaf0ea	Add numbers and definitions (#10665 )	2022-04-18 12:58:32 +02:00
Paul O'Leary McCann	683f470852	Merge branch 'master' into feature/coref	2022-04-18 18:39:08 +09:00
Schero1994	d622883a42	Adding and updating content in the spacy universe (#10493 ) * signing contributor agreement * adding new content to the spaCy universe * updating outdated example codes * resolving issues for the PR * resolve review for klayers * remove contributor-agreement file from the PR * Update code example of spaCySentiWS Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Update spacy-sentiws code example Co-authored-by: schaeran <schaeran1994@gmail.com> Co-authored-by: schaeran <schaeran@explosion.ai> Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2022-04-15 15:36:54 +02:00
Joachim Fainberg	4e1716223c	displaCy: Avoid increasing levels for identical arcs (#10639 ) * Test for arc levels for identical arcs Also moves the test in order with the other numbered tests. * displaCy: filter identical arcs Avoid increased levels due to identical arcs by first filtering any identical arcs. * Sort keys before filtering Manual entry with keys out of order would previously become different tuples and therefore not filtered correctly. Co-authored-by: Joachim Fainberg <joachimfainberg@Joachims-MBP.lan>	2022-04-14 16:48:00 +02:00
Philip Vollet	e63a5d4888	Update newsletter id (#10655 )	2022-04-14 13:34:01 +02:00
Paul O'Leary McCann	afd255c0ed	Undo multiply by 100 This was mistaken, not sure why my score seemed to be off before.	2022-04-14 18:42:09 +09:00
Paul O'Leary McCann	08729e0fbd	Remove end adjustment The difference in environments was due to a change in Thinc, the code here is fine.	2022-04-14 18:31:30 +09:00
fonfonx	028cbad05e	Add feminine form of word "one" in French (#10653 ) * Add French number * Add fonfonx.md * Add feminine ordinal words for French	2022-04-14 10:21:27 +02:00
Schero1994	caf8528af7	Batch #1 | spaCy universe cleanup (#10642) * delete universe object: wmd-relax * delete universe object: spaCy.jl * delete universe object: saber * delete universe object: languagecrunch * delete universe object: gracyql * delete universe object: ExcelCy * delete universe object: EpiTator Co-authored-by: schaeran <schaeran1994@gmail.com>	2022-04-14 10:08:19 +02:00
single-fingal	4228f3c757	Fix a few minor bugs in the SpanGroup API web docs (#10650 ) * Fix a few minor bugs in the SpanGroup API web docs * Update SpanGroup docs examples to have Spans reflect intended "errors"	2022-04-14 09:59:48 +02:00
Paul O'Leary McCann	8181d4570c	Multiply accuracy by 100 This seems to match with the scorer expectations better	2022-04-14 15:56:38 +09:00
Paul O'Leary McCann	e8af02700f	Remove all coref scoring except LEA This is necessary because one of the three old methods relied on scipy for some complex problem solving. LEA is generally better for evaluations. The downside is that this means evaluations aren't comparable with many papers, but canonical scoring can be supported using external eval scripts or other methods.	2022-04-13 21:02:18 +09:00
Paul O'Leary McCann	2300f4df3d	Fix span score logging	2022-04-13 20:37:06 +09:00
Paul O'Leary McCann	d470fa03c1	Adjust end indices It's not clear if this is technically correct or not but it won't run without it for me.	2022-04-13 20:19:21 +09:00
kadarakos	b53113e3b8	Preparing span predictor for predicting from gold (#10547) Note this is squashed because rebasing had conflicts. * remove unnecessary .device * span predictor debug start * gearing up SpanPredictor for gold-heads * merge SpanPredictor attributes * remove useless extra prefix and device from spanpredictor * make sure predicted and reference keeps aligned * handle empty head_ids * handle empty clusters * addressing suggestions by @polm * nicer restore * fix score overwriting bug * prepare for aligned heads-spans training * span accuracy score * update with eg.predicted as other components * add backprop callback to spanpredictor * report start- and end-accuracies separately * fixing scorer Co-authored-by: Kádár Ákos <akos@onyx.uvt.nl>	2022-04-13 19:42:49 +09:00
Adriane Boyd	64602d997d	Require srsly v2.4.3+ due to buffer overflow vulnerability (#10651 )	2022-04-13 11:41:40 +02:00
Richard Hudson	75fbbcdc18	Display warning when spacy.explain() finds no term (#10645 ) * Display warning when spacy.explain() finds no term * Updated warning message text	2022-04-12 10:48:28 +02:00
Kádár Ákos	6aedd98d02	fixing scorer	2022-04-11 16:10:14 +02:00
Kádár Ákos	7a239f2ec7	report start- and end-accuracies separately	2022-04-08 14:57:19 +02:00
Kádár Ákos	2a1ad4c5d2	add backprop callback to spanpredictor	2022-04-08 14:56:44 +02:00
David Berenstein	d4196a62f1	added crosslingual coreference to spacy universe without additional commits (#10580 ) * added crosslingual coreference to spacy universe * Updated example to introduce batching example. Co-authored-by: David Berenstein <david.berenstein@pandoraintelligence.com>	2022-04-08 08:23:58 +02:00
Madeesh Kannan	9ba3e1cb2f	Basic tests for the Tamil language (#10629 ) * Add basic tests for Tamil (ta) * Add comment Remove superfluous condition * Remove superfluous call to `pipe` Instantiate new tokenizer for special case	2022-04-07 14:47:37 +02:00
Kádár Ákos	3ba913109d	update with eg.predicted as other components	2022-04-07 13:20:12 +02:00
Lj Miranda	02dafa3a84	Add debug diff command in spaCy CLI (#10502 ) * Add initial design for diff command For now, the diffing process looks like this: - The default config is created based from some values in the user config (e.g. which pipeline components were used, the lang, etc.) - The user must supply manually if it was optimized for acc/efficiency and if pretraining was involved. * Make diff command structure similar to siblings * Include gpu as a user option for CLI * Make variables more explicit * Fix type declaration for optimize enum * Improve docstrings for diff CLI * Add debug-diff to website API docs * Switch position of configs so that user config is modded * Add markdown flag for debug diff This commit adds a --markdown (--md) flag that allows easier copy-pasting to Github issues. Please note that this commit is dependent on an unreleased version of wasabi (for the time being). For posterity, the related PR is found here: https://github.com/ines/wasabi/pull/20 * Bump version of wasabi to 0.9.1 So that we can use the add_symbols parameter. * Apply suggestions from code review Co-authored-by: Ines Montani <ines@ines.io> * Update docs based on code review suggestions Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Change command name from diff -> diff-config * Clarify when options are relevant or not * Rerun prettier on cli.md Co-authored-by: Ines Montani <ines@ines.io> Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>	2022-04-07 10:48:45 +02:00
Joachim Fainberg	b91255a454	displacy: avoid overlapping arcs in manual mode (#10534 ) * Added test for overlapping arcs * Provide distinct levels to overlapping arcs * Update return type hint for get_levels * Improved formatting spacy/displacy/render.py Co-authored-by: Ines Montani <ines@ines.io> Co-authored-by: Joachim Fainberg <joachimfainberg@Joachims-MacBook-Pro.local> Co-authored-by: Ines Montani <ines@ines.io>	2022-04-05 09:08:02 +02:00
Kádár Ákos	ef141ad399	span accuracy score	2022-04-04 18:10:09 +02:00
Adriane Boyd	0d0153db63	Update default spans_key to sc in API docs (#10616 )	2022-04-04 18:09:15 +02:00
Bram Vanroy	f966bf6a15	Update to spacy_conll in universe (#10617 ) * update to spacy_conll * Update website/meta/universe.json Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Update website/meta/universe.json Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>	2022-04-04 17:57:52 +02:00
Madeesh Kannan	cfd9217bae	Update link to flake8 config (#10620 ) * Update link to flake8 config * Run prettier	2022-04-04 17:35:37 +02:00
Kádár Ákos	a1d0219903	prepare for aligned heads-spans training	2022-04-04 15:26:15 +02:00
Adriane Boyd	849bef2de6	Merge pull request #10596 from adrianeboyd/chore/v3.3.0.dev0 Set version to v3.3.0.dev0	2022-04-04 09:18:07 +02:00
Adriane Boyd	e422101e00	Temporarily skip tests that require models/compat	2022-04-01 11:09:28 +02:00
Adriane Boyd	ca54de27bb	Support more internal methods for SpanGroup (#10476) * Added new convenience cython functions to SpanGroup to avoid unnecessary allocation/deallocation of objects * Replaced sorting in has_overlap with C++ for efficiency. Also, added a test for has_overlap * Added a method to efficiently merge SpanGroups * Added __delitem__, __add__ and __iadd__. Also, allowed to pass span lists to merge function. Replaced extend() body with call to merge * Renamed merge to concat and added missing things to documentation * Added operator+ and operator += in the documentation * Added a test for Doc deallocation * Update spacy/tokens/span_group.pyx * Updated SpanGroup tests to use new span list comparison function rather than assert_span_list_equal, eliminating the need to have a separate assert_not_equal function * Fixed typos in SpanGroup documentation Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Minor changes requested by Sofie: rearranged import statements. Added new=3.2.1 tag to SpanGroup.__setitem__ documentation * SpanGroup: moved repetitive list index check/adjustment in a separate function * Turn off formatting that hurts readability spacy/tests/doc/test_span_group.py Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Remove formatting that hurts readability spacy/tests/doc/test_span_group.py Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Turn off formatting that hurts readability in spacy/tests/doc/test_span_group.py Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Support more internal methods for SpanGroup Add support for: * `__setitem__` * `__delitem__` * `__iadd__`: for `SpanGroup` or `Iterable[Span]` * `__add__`: for `SpanGroup` only Adapted from #9698 with the scope limited to the magic methods. * Use v3.3 as new version in docs * Add new tag to SpanGroup.copy in API docs * Remove duplicate import * Apply suggestions from code review Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Remaining suggestions and formatting Co-authored-by: nrodnova <nrodnova@hotmail.com> Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> Co-authored-by: Natalia Rodnova <4512370+nrodnova@users.noreply.github.com>	2022-04-01 09:56:26 +02:00
Adriane Boyd	d56b1400d2	Set version to v3.3.0.dev0	2022-04-01 09:54:52 +02:00
Daniël de Kok	c90dd6f265	Alignment: use a simplified ragged type for performance (#10319) * Alignment: use a simplified ragged type for performance This introduces the AlignmentArray type, which is a simplified version of Ragged that performs better on the simple(r) indexing performed for alignment. * AlignmentArray: raise an error when using unsupported index * AlignmentArray: move error messages to Errors * AlignmentArray: remove simplified ... with simplifications * AlignmentArray: fix typo that broke a[n:n] indexing	2022-04-01 09:02:06 +02:00
Adriane Boyd	03762b4b92	Add spancat, trainable_lemmatizer to quickstart (#10524 ) * Add `SPACY` and `IS_SPACE` as default `tok2vec` features	2022-04-01 09:01:04 +02:00
Adriane Boyd	7d1edc0c25	Merge pull request #10593 from adrianeboyd/chore/undo-click-pin Revert "Add click pin to avoid typer issues (#10573)"	2022-04-01 09:00:38 +02:00
Adriane Boyd	e3ccc1973b	Provide debug data info for floret vectors (#10592 )	2022-03-31 15:11:32 +02:00
Adriane Boyd	88933ca878	Revert "Add click pin to avoid typer issues (#10573 )" This reverts commit `9966e08f32`.	2022-03-31 14:16:21 +02:00
Kádár Ákos	63a41ba50a	fix score overwriting bug	2022-03-30 17:28:20 +02:00
Yunus Atahan	36d3af3013	Fixed typo in Turkish lang. (#10582 ) * added failing test case for the issue. * Fixed typo. * fixed typo in test. * added corrected typo word into test_tr_lex_attrs_capitals as param. Test passes. Also tried and confirmed that test is failing after fixing the typo in the test case I wrote. Deleted the test case for typo. Co-authored-by: Yunus Atahan <yunus.atahan@trmotor.local>	2022-03-30 13:16:08 +02:00
Adriane Boyd	f98b41c390	Add vector deduplication (#10551 ) * Add vector deduplication * Add `Vocab.deduplicate_vectors()` * Always run deduplication in `spacy init vectors` * Clean up a few vector-related error messages and docs examples * Always unique with numpy * Fix types	2022-03-30 08:54:23 +02:00
Adriane Boyd	9966e08f32	Add click pin to avoid typer issues (#10573 )	2022-03-29 11:15:24 +02:00
Kádár Ákos	7ff99a3acc	nicer restore	2022-03-28 18:16:41 +02:00
Kádár Ákos	06d680b269	addressing suggestions by @polm	2022-03-28 14:31:51 +02:00
Kádár Ákos	e4b4b67ef6	handle empty clusters	2022-03-28 11:29:00 +02:00
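
Note on commit 4e1716223c (displaCy: Avoid increasing levels for identical arcs): the message says identical arcs are filtered first, and that keys are sorted before filtering because manually entered arcs with keys in a different order would otherwise become different tuples. The following is a minimal sketch of that idea in plain Python, not the actual displaCy code. The function name `filter_identical_arcs` and the arc fields used here (`start`, `end`, `label`, `dir`) are assumptions chosen for illustration.

```python
from typing import Dict, List, Union

# Hypothetical arc representation: a flat dict of hashable values.
Arc = Dict[str, Union[int, str]]


def filter_identical_arcs(arcs: List[Arc]) -> List[Arc]:
    """Drop arcs that repeat an earlier arc, keeping the first occurrence."""
    seen = set()
    unique: List[Arc] = []
    for arc in arcs:
        # Sort the items so that key order does not affect the comparison.
        key = tuple(sorted(arc.items()))
        if key not in seen:
            seen.add(key)
            unique.append(arc)
    return unique


arcs = [
    {"start": 0, "end": 1, "label": "det", "dir": "left"},
    # Same arc, but entered manually with the keys in a different order.
    {"label": "det", "dir": "left", "end": 1, "start": 0},
    {"start": 1, "end": 2, "label": "nsubj", "dir": "left"},
]
unique_arcs = filter_identical_arcs(arcs)
```

Without the `sorted` call, `tuple(arc.items())` would follow insertion order, so the first two arcs above would produce different tuples and both would be kept.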

15679 Commits