spaCy

mirror of https://github.com/explosion/spaCy.git synced 2026-02-17 04:30:49 +03:00

Paul O'Leary McCann 58bdd8607b Bump sudachipy version (#9917)	2022-01-17 08:16:22 +01:00

* Edited Slovenian stop words list (#9707)
* Noun chunks for Italian (#9662)
  * added it vocab
  * copied portuguese
  * added possessive determiner
  * added conjed Nps
  * added nmoded Nps
  * test misc
  * more examples
  * fixed typo
  * fixed parenth
  * fixed comma
  * comma fix
  * added syntax iters
  * fix some index problems
  * fixed index
  * corrected heads for test case
  * fixed tets case
  * fixed determiner gender
  * cleaned left over
  * added example with apostophe
* French NP review (#9667)
  * adapted from pt
  * added basic tests
  * added fr vocab
  * fixed noun chunks
  * more examples
  * typo fix
  * changed naming
  * changed the naming
  * typo fix
* Add Japanese kana characters to default exceptions (fix #9693) (#9742)

  This includes the main kana, or phonetic characters, used in Japanese. There are some supplemental kana blocks in Unicode outside the BMP that could also be included, but because their actual use is rare I omitted them for now, but maybe they should be added.

  The omitted blocks are:

  - Kana Supplement
  - Kana Extended (A and B)
  - Small Kana Extension
* Remove NER words from stop words in Norwegian (#9820)

  Default stop words in Norwegian bokmål (nb) in Spacy contain important entities, e.g. France, Germany, Russia, Sweden and USA, police district, important units of time, e.g. months and days of the week, and organisations. Nobody expects their presence among the default stop words. There is a danger of users complying with the general recommendation of filtering out stop words, while being unaware of filtering out important entities from their data.

  See explanation in https://github.com/explosion/spaCy/issues/3052#issuecomment-986756711 and comment https://github.com/explosion/spaCy/issues/3052#issuecomment-986951831
* Bump sudachipy version
* Update sudachipy versions
* Bump versions

  Bumping to the most recent dictionary just to keep thing current.

  Bumping sudachipy to 5.2 because older versions don't support recent dictionaries.

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: Richard Hudson <richard@explosion.ai>
Co-authored-by: Duygu Altinok <duygu@explosion.ai>
Co-authored-by: Haakon Meland Eriksen <haakon.eriksen@far.no>
cli	Check for assets with size of 0 bytes (#10026)	2022-01-12 10:34:23 +01:00
displacy	Displacy serve entity linking support without `manual=True` support. (#9748)	2021-11-29 17:13:26 +01:00
lang	Bump sudachipy version (#9917)	2022-01-17 08:16:22 +01:00
matcher	Entity ruler remove pattern (#9685)	2021-12-06 15:32:49 +01:00
ml	MultiHashEmbed vector docs correction (#9918)	2021-12-27 11:18:08 +01:00
pipeline	Speed up the StateC::L feature function (#10019)	2022-01-13 09:03:55 +01:00
tests	Bump sudachipy version (#9917)	2022-01-17 08:16:22 +01:00
tokens	Added sents property to Span for Spans spanning over several sentences (#9699)	2021-12-06 09:58:01 +01:00
training	Exclude strings from v3.2+ source vector checks (#9697)	2021-11-19 08:51:19 +01:00
__init__.pxd	* Seems to be working after refactor. Need to wire up more POS tag features, and wire up save/load of POS tags.	2014-10-24 02:23:42 +11:00
__init__.py	Tidy up and auto-format	2021-07-18 15:44:56 +10:00
__main__.py	Tidy up	2020-06-22 00:45:40 +02:00
about.py	Set version to v3.2.1 (#9823)	2021-12-07 10:51:45 +01:00
attrs.pxd	Merge branch 'develop' into master-tmp	2020-05-21 18:39:06 +02:00
attrs.pyx	Update Cython string types (#9143)	2021-09-13 17:02:17 +02:00
compat.py	Custom component types in spacy.ty (#9469)	2021-10-21 15:31:06 +02:00
default_config_pretraining.cfg	Add new parameter for saving every n epoch in pretraining (#8912)	2021-08-12 11:14:48 +02:00
default_config.cfg	Add a few docs to the default_config.cfg (#9981)	2022-01-05 09:16:40 +01:00
errors.py	Fix references to config file in the docs & UX (#9961)	2022-01-04 14:31:26 +01:00
glossary.py	Add glossary entry for _SP (#8983)	2021-08-20 12:04:02 +02:00
kb.pxd	Replace cpdef variables with cdef (#7834)	2021-04-26 16:54:02 +02:00
kb.pyx	Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master-v3.2-1	2021-10-26 11:53:50 +02:00
language.py	Use Language.pipe in evaluate (#9800)	2021-12-06 20:39:15 +01:00
lexeme.pxd	Fix Lexeme.from_ptr	2020-08-10 16:43:37 +02:00
lexeme.pyi	fix type of lexeme.rank (#9979)	2022-01-04 13:15:25 +01:00
lexeme.pyx	Update Cython string types (#9143)	2021-09-13 17:02:17 +02:00
lookups.py	🏷 Add Mypy check to CI and ignore all existing Mypy errors (#9167)	2021-10-14 15:21:40 +02:00
morphology.pxd	Clean up Morphology imports and definitions (#7441)	2021-04-26 16:54:23 +02:00
morphology.pyx	Clean up Morphology imports and definitions (#7441)	2021-04-26 16:54:23 +02:00
parts_of_speech.pxd	Add support for Universal Dependencies v2.0	2017-03-03 13:17:34 +01:00
parts_of_speech.pyx	Drop Python 2.7 and 3.5 (#4828)	2019-12-22 01:53:56 +01:00
pipe_analysis.py	🏷 Add Mypy check to CI and ignore all existing Mypy errors (#9167)	2021-10-14 15:21:40 +02:00
py.typed	Add py.typed	2021-03-16 09:48:31 +01:00
schemas.py	Allow Matcher to match on ENT_ID and ENT_KB_ID (#9688)	2021-11-24 10:37:10 +01:00
scorer.py	Allow Scorer.score_spans to handle pred docs with missing annotation (#9701)	2021-11-23 15:17:19 +01:00
strings.pxd	Update Cython string types (#9143)	2021-09-13 17:02:17 +02:00
strings.pyi	🏷 Add Mypy check to CI and ignore all existing Mypy errors (#9167)	2021-10-14 15:21:40 +02:00
strings.pyx	Update Cython string types (#9143)	2021-09-13 17:02:17 +02:00
structs.pxd	Add SpanGroup and Graph container types to represent arbitrary annotations (#6696)	2021-01-14 17:30:41 +11:00
symbols.pxd	introduce token.has_head and refer to MISSING_DEP_ (WIP)	2021-01-12 17:17:06 +01:00
symbols.pyx	introduce token.has_head and refer to MISSING_DEP_ (WIP)	2021-01-12 17:17:06 +01:00
tokenizer.pxd	Remove two attributes marked for removal in 3.1 (#9150)	2021-09-15 23:07:21 +02:00
tokenizer.pyx	Update Tokenizer documentation to reflect token_match and url_match signatures (#9859)	2021-12-15 09:34:33 +01:00
ty.py	Custom component types in spacy.ty (#9469)	2021-10-21 15:31:06 +02:00
typedefs.pxd	Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master	2020-11-25 11:49:34 +01:00
typedefs.pyx	Tidy up rest	2017-10-27 21:07:59 +02:00
util.py	Fix references to config file in the docs & UX (#9961)	2022-01-04 14:31:26 +01:00
vectors.pyx	Make floret murmurhash endian-neutral (#9735)	2021-12-20 17:11:31 +01:00
vocab.pxd	Add support for floret vectors (#8909)	2021-10-27 14:08:31 +02:00
vocab.pyi	Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master-v3.2-1	2021-10-26 11:53:50 +02:00
vocab.pyx	Add support for floret vectors (#8909)	2021-10-27 14:08:31 +02:00