spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-07-22 14:10:02 +03:00

Author	SHA1	Message	Date
Ines Montani	15e6578f7d	Adjust formatting	2021-07-17 10:49:13 +10:00
Mario Šaško	47c5a63a83	Add TakeLab/spacy-udpipe to Universe (#8698) * Add TakeLab/spacy-udpipe to universe * Add SCA * Sign SCA	2021-07-16 11:18:09 +02:00
Mario Šaško	1ba2e8a646	Add TakeLab/spacy-udpipe to Universe (#8698) * Add TakeLab/spacy-udpipe to universe * Add SCA * Sign SCA	2021-07-16 11:15:52 +02:00
explosion-bot	eff3d1088b	Auto-format code with black	2021-07-16 08:03:36 +00:00
Adriane Boyd	e76e2addd1	Remove TrainablePipe as base class for Lemmatizer in API docs (#8725)	2021-07-15 16:42:14 +02:00
Adriane Boyd	f5acc48111	Remove TrainablePipe as base class for Lemmatizer in API docs (#8725)	2021-07-15 16:41:36 +02:00
Adriane Boyd	ac45c7c045	Add pre-commit to ignored requirements (#8728)	2021-07-15 16:41:15 +02:00
Paul O'Leary McCann	9b63cbb775	Add extract spans import	2021-07-15 18:16:53 +09:00
jmyerston	993b0fab0e	Added ancient Greek language support (#8606) * Add ancient Greek language support Initial commit * Contributor Agreement * grc tokenizer test added and files formatted with black, unnecessary import removed Co-Authored-By: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Commas in lists fixed. __init__.py added to test * Update lex_attrs.py * Update stop_words.py * Update stop_words.py Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>	2021-07-15 10:27:17 +02:00
Sofie Van Landeghem	77859beb99	spacy.ngram_range_suggester.v1 (#8699)	2021-07-15 10:01:22 +02:00
Julien Rossi	e117573822	Adding noun_chunks to the DUTCH language model (nl) (#8529) * ✨ implement noun_chunks for dutch language * copy/paste FR and SV syntax iterators to accommodate UD tags * added tests with dutch text * signed contributor agreement * 🐛 fix noun chunks generator * built from scratch * define noun chunk as a single Noun-Phrase * includes some corner cases debugging (incorrect POS tagging) * test with provided annotated sample (POS, DEP) * ✅ fix failing test * CI pipeline did not like the added sample file * add the sample as a pytest fixture * Update spacy/lang/nl/syntax_iterators.py * Update spacy/lang/nl/syntax_iterators.py Code readability Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Update spacy/tests/lang/nl/test_noun_chunks.py correct comment Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * finalize code * change "if next_word" into "if next_word is not None" Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>	2021-07-14 14:01:02 +02:00
Paul O'Leary McCann	e9626e38c1	Fix serialization test This test was failing not because the thing it was testing wasn't working, but because of the way span equality works. Span equality relies on doc equality, and doc equality is object identity, so spans from different docs will never be equal.	2021-07-14 18:37:34 +09:00
Paul O'Leary McCann	4a9dc00d86	Use relative indices for mentions Was using batch absolute indices to manage mentions, but extract_spans expects doc-relative ones.	2021-07-14 18:36:18 +09:00
Paul O'Leary McCann	3684f7fdfd	Remove comment from fixed test	2021-07-14 18:22:14 +09:00
Paul O'Leary McCann	f1796e4af7	Fix mention list bug There was an off-by-one error in how mentions are generated that would affect mentions at the end of a sentence. This was pretty nasty.	2021-07-14 18:19:00 +09:00
Ines Montani	8ca6c58625	Merge pull request #8703 from thomashacker/update/spacy-stanza [ci skip] Update spacy-stanza universe.json	2021-07-13 19:03:56 +10:00
Ines Montani	2a8eeed5da	Merge pull request #8703 from thomashacker/update/spacy-stanza [ci skip] Update spacy-stanza universe.json	2021-07-13 19:03:42 +10:00
thomashacker	aafb89df78	Update universe.json code_example	2021-07-13 10:22:49 +02:00
KennethEnevoldsen	e5127992a0	added agreement	2021-07-13 10:11:02 +02:00
Kenneth Enevoldsen	94ce904e10	added missing comma	2021-07-13 09:59:34 +02:00
Kenneth Enevoldsen	a81fcc81b0	added dacy to universe	2021-07-13 09:54:08 +02:00
Adriane Boyd	f9fd2889b7	Use 0-vector for OOV lexemes (#8639)	2021-07-13 14:48:12 +10:00
Edward	8233359225	Fix preservation of spacy package meta (#8663) * update package meta with existing_meta and nlp_meta * Add spaCy contributor agreement * Added more info when creating readme	2021-07-12 11:18:52 +02:00
Paul O'Leary McCann	80a17071d3	Remove unused code	2021-07-11 18:46:39 +09:00
Paul O'Leary McCann	447c7070e3	Fix loss Accidentally deleted it	2021-07-10 22:45:25 +09:00
Paul O'Leary McCann	c25ec292a9	Cleanup	2021-07-10 22:42:55 +09:00
Paul O'Leary McCann	e00bd422d9	Fix span embeds Some of the lengths and backprop weren't right. Also various cleanup.	2021-07-10 21:38:53 +09:00
Paul O'Leary McCann	d7d317a1b5	Clean up span embedding code This is now cleaner and significantly faster. There's still some messy parts in the code (particularly variable names), will get to that later.	2021-07-10 19:59:08 +09:00
Paul O'Leary McCann	dc1f974d39	Merge branch 'master' into feature/coref	2021-07-10 18:10:40 +09:00
Paul O'Leary McCann	f34915c1e8	Use scatter_add to speed up span embed backprop This was the slowest part of the code, and using scatter_add here probably reduces the runtime by 50%.	2021-07-10 18:08:51 +09:00
Paul O'Leary McCann	1c70c87daf	Fix autoblack The conditional needs double equals.	2021-07-10 16:02:39 +09:00
Ines Montani	616f4de034	Merge pull request #8674 from polm/fix/autoblack-no-forks [ci skip] Make the autoblack job not run on forks	2021-07-10 16:41:59 +10:00
Paul O'Leary McCann	b8cdbb4bb6	Make the autoblack job not run on forks The autoblack job is an occasional cleanup job. If it runs on forks and those PRs are accepted the git history will be weird and that doesn't help anyone. The way to make the job not run on forks is a little non-obvious but based on this thread. https://github.com/prisma/prisma/issues/3539	2021-07-10 15:38:20 +09:00
Ines Montani	d8ae5750a6	Merge pull request #8665 from rynoV/patch-1 [ci skip]	2021-07-10 10:52:39 +10:00
Ines Montani	d4fecdfb82	Merge pull request #8665 from rynoV/patch-1 [ci skip]	2021-07-10 10:52:15 +10:00
Ines Montani	50000d37e4	Avoid double parentheses [ci skip]	2021-07-10 10:52:01 +10:00
Calum Sieppert	e2d53aa1a6	Typo fixes	2021-07-09 10:25:56 -06:00
Adriane Boyd	d8805a1073	Fix ru/uk lemmatizer mp with spawn (#8657) Use an instance variable instead of a class variable for the morphological analyzer so that multiprocessing with spawn is possible.	2021-07-09 15:36:56 +02:00
Adriane Boyd	b8e720fdb9	Fix Azerbaijani init, extend lang init tests (#8656) * Extend langs in initialize tests * Fix az init	2021-07-09 15:36:35 +02:00
Ines Montani	1c0ed22d1e	Merge pull request #8573 from julien-talkair/code-quality-pre-commit	2021-07-09 23:09:24 +10:00
Ines Montani	bbca56687f	Merge pull request #8655 from explosion/autoblack Auto-format code with black	2021-07-09 23:08:05 +10:00
explosion-bot	334f1f98d8	Auto-format code with black	2021-07-09 08:06:06 +00:00
Adriane Boyd	363230de19	Add Macedonian models to website (#8637)	2021-07-08 09:32:41 +02:00
Adriane Boyd	1ee5bee29d	Add Macedonian models to website (#8637)	2021-07-08 09:32:14 +02:00
Paul O'Leary McCann	d0b041aff4	Switch to using Thinc tuplify The tuplify code here was added to Thinc proper and that's been released, so no need to have it here any more.	2021-07-08 16:08:36 +09:00
Suqi Sun	20a2beafb5	Update pip	2021-07-08 15:09:52 +09:00
Suqi Sun	c61ecb6f7c	Update pip and code example	2021-07-08 15:09:52 +09:00
Suqi Sun	f011126ebd	Add forte to universe.json	2021-07-08 15:09:52 +09:00
Paul O'Leary McCann	1d9209d43a	Merge pull request #8547 from mylibrar/update-universe Add forte to universe.json	2021-07-08 14:59:49 +09:00
Ines Montani	cdc0d669c1	Add code preview for textcat_multilabel [ci skip]	2021-07-08 13:33:33 +10:00

15355 Commits