spaCy

mirror of https://github.com/explosion/spaCy.git synced 2026-02-09 00:39:45 +03:00

Author	SHA1	Message	Date
Ines Montani	cce428298b	Merge branch 'v2.x' into spacy.io	2021-02-01 11:48:56 +11:00
Ines Montani	c70e6ee72d	Fix code branch for v2.x site [ci skip]	2021-02-01 11:48:35 +11:00
Ines Montani	6daf2381fa	Update meta [ci skip]	2021-01-30 20:18:01 +11:00
Ines Montani	7d28fc121f	Update netlify.toml [ci skip]	2021-01-30 19:59:47 +11:00
Ines Montani	fba7550537	Set to legacy [ci skip]	2021-01-30 19:57:14 +11:00
Ines Montani	44dc987d85	Fix icon [ci skip]	2021-01-30 18:27:55 +11:00
Ines Montani	8d293a4c4b	Update website to support legacy state [ci skip]	2021-01-30 18:27:31 +11:00
Ines Montani	8ddf53f8e1	Merge pull request #6857 from tupui/patch-1	2021-01-30 12:07:05 +11:00
Pamphile ROY	e496b8623f	SCA tupui	2021-01-29 15:46:53 +01:00
Pamphile ROY	41ee75ac6d	Remove --no-cache-dir when downloading models. When `--no-cache-dir` is present, caching cannot function properly. Users who still want this behavior can pass the option via `user_pip_args`, but it should not be enforced. In my case, it prevented some Docker builds (using BuildKit caching) from caching models properly.	2021-01-29 15:37:44 +01:00
Adriane Boyd	4096a79de7	Add alignment mode error and fix Doc.char_span docs (#6820) * Raise an error on an unrecognized alignment mode rather than defaulting to `strict` * Fix the `Doc.char_span` API doc alignment mode details	2021-01-27 23:40:42 +11:00
Ines Montani	d5ef245bb1	Merge pull request #6822 from jganseman/master [ci skip]	2021-01-27 13:04:30 +11:00
Ines Montani	560b7acece	Merge pull request #6802 from jumasheff/add-ky	2021-01-27 13:02:54 +11:00
jganseman	907bce7a78	Merge pull request #1 from jganseman/patch-1 Patch 1	2021-01-26 11:12:30 +01:00
jganseman	8bc57ec372	also update is_oov in lexeme docs	2021-01-26 11:09:16 +01:00
jganseman	c9103d60fa	Create jganseman.md	2021-01-26 11:02:31 +01:00
jganseman	1f2b0ec168	proposing a more concise explanation for is_oov	2021-01-26 10:53:39 +01:00
muratjumashev	2b19ebad59	Remove Kyrgyz chars fr. char_classes since Tatar ones already cover	2021-01-25 00:46:45 +06:00
muratjumashev	7d0154a36e	Added language meta data	2021-01-25 00:42:19 +06:00
muratjumashev	79327197d1	Add contributor agreement	2021-01-25 00:34:12 +06:00
muratjumashev	87168eb81f	Add tests	2021-01-24 20:56:16 +06:00
muratjumashev	53abf759ad	Fix punctuation	2021-01-24 20:54:22 +06:00
muratjumashev	2a2646362b	Fix language subclass	2021-01-23 22:00:50 +06:00
muratjumashev	fe3b5b8ff5	Add kyrgyz to char_classes	2021-01-23 21:53:41 +06:00
muratjumashev	e30bbf5432	Add examples	2021-01-23 21:49:08 +06:00
muratjumashev	2f385385a9	Remove comment	2021-01-23 21:36:28 +06:00
muratjumashev	d53724ba1d	Add lex_attrs	2021-01-23 21:35:25 +06:00
muratjumashev	4418ec2eee	Add punctuation	2021-01-23 21:31:31 +06:00
muratjumashev	101d265778	Add stopwords	2021-01-23 21:25:28 +06:00
muratjumashev	28d06ab860	Add tokenizer_exceptions	2021-01-22 23:08:41 +06:00
Sofie Van Landeghem	5ace559201	ensure span.text works for an empty span (#6772)	2021-01-21 23:18:46 +08:00
Sofie Van Landeghem	fdf8c77630	support IS_SENT_START in PhraseMatcher (#6771) * support IS_SENT_START in PhraseMatcher * add unit test and friendlier error * use IDS.get instead	2021-01-21 09:59:17 +01:00
Adriane Boyd	bc7d83d4be	Skip 0-length matches (#6759) Add hack to prevent matcher from returning 0-length matches.	2021-01-19 07:38:11 +08:00
Santiago Castro	28256522c8	Fix `spacy.util.minibatch` when the size iterator is finished (#6745)	2021-01-17 19:48:43 +08:00
Adriane Boyd	e649242927	Prevent overlapping noun chunks for Spanish (#6712) * Prevent overlapping noun chunks in Spanish noun chunk iterator * Clean up similar code in Danish noun chunk iterator	2021-01-14 17:33:31 +11:00
Adriane Boyd	9957ed7897	Override language defaults for null token and URL match (#6705) * Override language defaults for null token and URL match When the serialized `token_match` or `url_match` is `None`, override the language defaults to preserve `None` on deserialization. * Fix fixtures in tests	2021-01-14 17:31:29 +11:00
Ines Montani	06c2eae08f	Merge branch 'master' into spacy.io	2021-01-14 13:38:59 +11:00
Ines Montani	29c3ca7e34	Fix SVG integration [ci skip]	2021-01-14 13:33:41 +11:00
Ines Montani	30cb6ced1e	Merge branch 'master' into spacy.io	2021-01-14 11:38:24 +11:00
Antonio Miras	b4bd8f347a	spaCy Universe: New project; SpacyDotNet (#6702) * Universe: SpacyDotNet a .NET Core spaCy wrapper * Signed contributor agreement Co-authored-by: Antonio Miras <antonio@amiras.net>	2021-01-13 12:47:30 +11:00
Alex Combessie	9cc880014c	Remove questionable French stopwords (#6310) * Remove questionable French stopwords * Create alexcombessie.md	2021-01-08 11:36:22 +11:00
Cristiana S Parada	7a0222f260	Update stop_words.py in Portuguese (a,o,e) (#6345) * Update stop_words.py Added three additional stopwords: "a" and "o", which mean "the", and "e", which means "and" * Create cristianasp.md * zero edit to push CI Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>	2021-01-08 11:35:38 +11:00
Lorena Ciutacu	f11002f1f1	add new Romanian stopwords (#6621) * add contributor agreement * update ro stopwords list * add new stopwords	2021-01-08 11:34:47 +11:00
ophelielacroix	e3222fdec9	Add (noun chunks) syntax iterators for Danish (#6246) * add syntax iterators for danish * add test noun chunks for danish syntax iterators * add contributor agreement * update da syntax iterators to remove nested chunks * add tests for da noun chunks * Fix test * add missing import * fix example * Prevent overlapping noun chunks Prevent overlapping noun chunks by tracking the end index of the previous noun chunk span. Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2021-01-07 16:33:00 +11:00
Sofie Van Landeghem	6f7e7d88b9	remove cause without apostrophe from norm exceptions (#6636)	2021-01-06 12:30:30 +08:00
Sofie Van Landeghem	fa3b374c8a	fix backticks in docs (#6635)	2020-12-27 22:13:34 +01:00
Sofie Van Landeghem	87562e470d	fix backticks in docs (#6635)	2020-12-27 22:12:37 +01:00
Sofie Van Landeghem	aa50aca519	fix documentation of 'path' in tokenizer.to_disk (#6634)	2020-12-27 22:05:05 +01:00
Sofie Van Landeghem	8df5b7f513	fix documentation of 'path' in tokenizer.to_disk (#6634)	2020-12-27 22:01:06 +01:00
Yosi	cf52510631	Add Amharic አማርኛ Language support (#6583) * Add Amharic to space * clean up * Add some PRON_LEMMA * add Tigrinya support * remove text_noun_chunks * Tigrinya Support * added some more details for ti * fix unit test * add amharic char range * changes from review * amharic and tigrinya share same unicode block * get rid of _amharic/_tigrinya in char_classes Co-authored-by: Josiah Solomon <jsolomon@meteorcomm.com>	2020-12-22 16:50:34 +01:00

11965 Commits