spaCy/.github/contributors
Connor Brinton 657af5f91f
🏷 Add Mypy check to CI and ignore all existing Mypy errors (#9167)
* 🚨 Ignore all existing Mypy errors

* 🏗 Add Mypy check to CI

* Add types-mock and types-requests as dev requirements

* Add additional type ignore directives

* Add types packages to dev-only list in reqs test

* Add types-dataclasses for python 3.6

* Add ignore to pretrain

* 🏷 Improve type annotation on `run_command` helper

The `run_command` helper previously declared that it returned an
`Optional[subprocess.CompletedProcess]`, but it isn't actually possible
for the function to return `None`. These changes modify the type
annotation of the `run_command` helper and remove all now-unnecessary
`# type: ignore` directives.
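
As a rough illustration of why the non-`Optional` annotation is accurate, the sketch below uses a hypothetical stand-in named `run_command_sketch` (not the actual spaCy helper, whose body is not shown here). Every code path either returns a `CompletedProcess` or raises, so callers never need to handle `None`:

```python
import subprocess
import sys
from typing import List


def run_command_sketch(command: List[str]) -> subprocess.CompletedProcess:
    """Hypothetical stand-in for a `run_command`-style helper.

    Every code path either returns a CompletedProcess or raises, so the
    return type does not need to be Optional.
    """
    try:
        # check=True raises CalledProcessError on a non-zero exit code.
        return subprocess.run(command, check=True, capture_output=True, text=True)
    except FileNotFoundError as err:
        # Raising (rather than returning None) keeps the return type non-Optional.
        raise RuntimeError(f"Command not found: {command[0]}") from err


# Callers can use the result directly, without a None check or a type ignore.
result = run_command_sketch([sys.executable, "-c", "print('ok')"])
print(result.returncode)
```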

* 🔧 Allow variable type redefinition in limited contexts

These changes modify how Mypy is configured to allow variables to have
their type automatically redefined under certain conditions. The Mypy
documentation contains the following example:

```python
def process(items: List[str]) -> None:
    # 'items' has type List[str]
    items = [item.split() for item in items]
    # 'items' now has type List[List[str]]
    ...
```

This configuration change is especially helpful in reducing the number
of `# type: ignore` directives needed to handle the common pattern of:
* Accepting a filepath as a string
* Overwriting the variable using `filepath = ensure_path(filepath)`

These changes enable redefinition and remove all `# type: ignore`
directives rendered redundant by this change.
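
The filepath pattern described above might look like the following sketch. The `ensure_path` function here is a simplified stand-in written for this example, `read_suffix` is a hypothetical caller, and the Mypy setting is assumed to be `allow_redefinition`:

```python
from pathlib import Path
from typing import Union


def ensure_path(path: Union[str, Path]) -> Path:
    """Simplified stand-in: convert strings to Path, pass Paths through."""
    return Path(path) if isinstance(path, str) else path


def read_suffix(filepath: str) -> str:
    # 'filepath' has type str here.
    filepath = ensure_path(filepath)
    # With redefinition allowed (assumed: allow_redefinition = True in the
    # Mypy config), 'filepath' now has type Path and needs no type ignore.
    return filepath.suffix


print(read_suffix("model.bin"))
```

The redefinition happens in the same block and nesting level as the original definition, which is the condition under which Mypy permits it.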

* 🏷 Add type annotation to converters mapping

* 🚨 Fix Mypy error in convert CLI argument verification

* 🏷 Improve type annotation on `resolve_dot_names` helper

* 🏷 Add type annotations for `Vocab` attributes `strings` and `vectors`

* 🏷 Add type annotations for more `Vocab` attributes

* 🏷 Add loose type annotation for gold data compilation

* 🏷 Improve `_format_labels` type annotation

* 🏷 Fix `get_lang_class` type annotation

* 🏷 Loosen return type of `Language.evaluate`

* 🏷 Don't accept `Scorer` in `handle_scores_per_type`

* 🏷 Add `string_to_list` overloads

* 🏷 Fix non-Optional command-line options

* 🙈 Ignore redefinition of `wandb_logger` in `loggers.py`

* Install `typing_extensions` in Python 3.8+

The `typing_extensions` package states that it should be used when
"writing code that must be compatible with multiple Python versions".
Since SpaCy needs to support multiple Python versions, it should be used
when newer `typing` module members are required. One example of this is
`Literal`, which is available starting with Python 3.8.

Previously spaCy tried to import `Literal` from `typing`, falling back
to `typing_extensions` if the import failed. However, Mypy doesn't seem
to be able to understand what `Literal` means when the initial import
fails. Therefore, these changes modify how `compat` imports `Literal` by
always importing it from `typing_extensions`.

These changes also modify how `typing_extensions` is installed, so that
it is a requirement for all Python versions, including those greater
than or equal to 3.8.

* 🏷 Improve type annotation for `Language.pipe`

These changes add a missing overload variant to the type signature of
`Language.pipe`. Additionally, the type signature is enhanced to allow
type checkers to differentiate between the two overload variants based
on the `as_tuples` parameter.

Fixes #8772
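
A minimal sketch of how two overload variants can be told apart by a boolean keyword argument is shown below. The `pipe` function here is a toy stand-in that upper-cases strings instead of producing `Doc` objects, and it assumes Python 3.8+ so that `Literal` can be imported from `typing` directly:

```python
from typing import Any, Iterable, Iterator, Literal, Tuple, overload


@overload
def pipe(texts: Iterable[str], *, as_tuples: Literal[False] = ...) -> Iterator[str]:
    ...


@overload
def pipe(
    texts: Iterable[Tuple[str, Any]], *, as_tuples: Literal[True]
) -> Iterator[Tuple[str, Any]]:
    ...


def pipe(texts, *, as_tuples=False):
    # Toy implementation: upper-casing stands in for real processing.
    if as_tuples:
        for text, context in texts:
            yield text.upper(), context
    else:
        for text in texts:
            yield text.upper()


plain = list(pipe(["a", "b"]))
paired = list(pipe([("a", 1), ("b", 2)], as_tuples=True))
print(plain, paired)
```

Because each overload pins `as_tuples` to a `Literal` value, a type checker can pick the matching return type from the call site alone.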

* Don't install `typing-extensions` in Python 3.8+

After more detailed analysis of how to implement Python version-specific
type annotations in spaCy, it has been determined that branching on a
comparison against `sys.version_info` can be statically analyzed by
Mypy well enough to enable us to conditionally use
`typing_extensions.Literal`. This means that we no longer need to
install `typing_extensions` for Python versions greater than or equal to
3.8! 🎉

These changes revert previous changes installing `typing-extensions`
regardless of Python version and modify how we import the `Literal` type
to ensure that Mypy treats it properly.
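
The version-branching import described above can be sketched roughly as follows. The `Mode` alias and `describe_mode` function are hypothetical names added only to show the imported `Literal` in use:

```python
import sys

# Mypy evaluates comparisons against sys.version_info statically, so it
# knows which branch applies for the Python version being checked.
if sys.version_info >= (3, 8):
    from typing import Literal
else:
    # Only reached on older interpreters, where the backport is installed.
    from typing_extensions import Literal

# Hypothetical alias, for illustration only.
Mode = Literal["train", "eval"]


def describe_mode(mode: Mode) -> str:
    return f"running in {mode} mode"


print(describe_mode("train"))
```

Unlike a `try`/`except ImportError` fallback, this form gives the type checker a condition it can resolve without running the code.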

* resolve mypy errors for Strict pydantic types

* refactor code to avoid missing return statement

* fix types of convert CLI command

* avoid list-set confusion in debug_data

* fix typo and formatting

* small fixes to avoid type ignores

* fix types in profile CLI command and make it more efficient

* type fixes in projects CLI

* put one ignore back

* type fixes for render

* fix render types - the sequel

* fix BaseDefault in language definitions

* fix type of noun_chunks iterator - yields tuple instead of span

* fix types in language-specific modules

* 🏷 Expand accepted inputs of `get_string_id`

`get_string_id` accepts either a string (in which case it returns its 
ID) or an ID (in which case it immediately returns the ID). These 
changes extend the type annotation of `get_string_id` to indicate that 
it can accept either strings or IDs.
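
The string-or-ID behaviour can be illustrated with the sketch below. `get_string_id_sketch` is a hypothetical stand-in, and the SHA-256-based hashing is an assumption chosen only to keep the example self-contained; it is not the hash the library uses:

```python
import hashlib
from typing import Union


def get_string_id_sketch(key: Union[str, int]) -> int:
    """Return an integer ID for a string, or pass an existing ID through."""
    if isinstance(key, int):
        # Already an ID: return it immediately.
        return key
    # Stand-in hash (assumption): first 8 bytes of SHA-256 as an integer.
    digest = hashlib.sha256(key.encode("utf8")).digest()
    return int.from_bytes(digest[:8], "little")


apple_id = get_string_id_sketch("apple")
print(apple_id == get_string_id_sketch(apple_id))
```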

* 🏷 Handle override types in `combine_score_weights`

The `combine_score_weights` function allows users to pass an `overrides` 
mapping to override data extracted from the `weights` argument. Since it 
allows `Optional` dictionary values, the return value may also include 
`Optional` dictionary values.

These changes update the type annotations for `combine_score_weights` to 
reflect this fact.
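
To show why `Optional` values propagate to the return type, here is a deliberately simplified sketch. `combine_weights_sketch` is a hypothetical function that only merges dictionaries and omits whatever other processing the real `combine_score_weights` performs:

```python
from typing import Dict, List, Optional


def combine_weights_sketch(
    weights: List[Dict[str, Optional[float]]],
    overrides: Dict[str, Optional[float]],
) -> Dict[str, Optional[float]]:
    """Merge score weights, then apply overrides on top (simplified)."""
    combined: Dict[str, Optional[float]] = {}
    for component_weights in weights:
        combined.update(component_weights)
    # An override may be None, so the result can contain None values too.
    combined.update(overrides)
    return combined


merged = combine_weights_sketch([{"acc": 1.0, "f1": 0.5}], {"acc": None})
print(merged)
```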

* 🏷 Fix tokenizer serialization method signatures in `DummyTokenizer`

* 🏷 Fix redefinition of `wandb_logger`

These changes fix the redefinition of `wandb_logger` by giving a 
separate name to each `WandbLogger` version. For 
backwards-compatibility, `spacy.train` still exports `wandb_logger_v3` 
as `wandb_logger` for now.
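
The renaming approach can be sketched as below, using hypothetical functions `logger_v1` and `logger_v3` rather than the real logger factories. Each version gets its own name, and a single alias preserves the old import name:

```python
def logger_v1() -> str:
    # Hypothetical first version of a logger factory.
    return "v1"


def logger_v3() -> str:
    # Hypothetical later version with a distinct name, so nothing is redefined.
    return "v3"


# Backwards-compatible alias: one assignment, no conflicting definitions.
logger = logger_v3

print(logger())
```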

* more fixes for typing in language

* type fixes in model definitions

* 🏷 Annotate `_RandomWords.probs` as `NDArray`

* 🏷 Annotate `tok2vec` layers to help Mypy

* 🐛 Fix `_RandomWords.probs` type annotations for Python 3.6

Also remove an import that I forgot to move to the top of the module 😅

* more fixes for matchers and other pipeline components

* quick fix for entity linker

* fixing types for spancat, textcat, etc

* bugfix for tok2vec

* type annotations for scorer

* add runtime_checkable for Protocol

* type and import fixes in tests

* mypy fixes for training utilities

* few fixes in util

* fix import

* 🐵 Remove unused `# type: ignore` directives

* 🏷 Annotate `Language._components`

* 🏷 Annotate `spacy.pipeline.Pipe`

* add doc as property to span.pyi

* small fixes and cleanup

* explicit type annotations instead of via comment

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
Co-authored-by: svlandeg <sofie.vanlandeghem@gmail.com>
Co-authored-by: svlandeg <svlandeg@github.com>
2021-10-14 15:21:40 +02:00
..
5hirish.md Added Adam project to spaCy Universe (#2275) 2018-04-30 22:25:01 +02:00
0x2b3bfa0.md Create 0x2b3bfa0.md (#6916) 2021-02-04 23:25:11 +01:00
aajanki.md Improvements to the Finnish language data (#4738) 2019-12-03 12:55:28 +01:00
aaronkub.md fixing regex matcher examples (#3708) (#3719) 2019-05-10 14:23:52 +02:00
aashishg.md Added numbers to ../lang/hi/lex_attrs.py (#2629) 2018-08-08 16:06:11 +02:00
abchapman93.md Add VA COVID-19 NLP project to spaCy Universe (#5777) 2020-07-19 13:35:31 +02:00
abhi18av.md Create abhi18av.md 2017-11-13 17:23:05 +05:30
adrianeboyd.md Update TIGER/German dependency relations in documentation (#3204) 2019-01-30 14:23:12 +01:00
adrienball.md Fix egg fragments in direct download (#3369) 2019-03-07 21:07:19 +01:00
ajrader.md Correction of default lemmatizer lookup in English (Issue # 4104) (#4110) 2019-08-15 11:39:10 +02:00
akki2825.md add kannada support (#3264) 2019-02-12 18:28:39 +01:00
akornilo.md Update gold corpus code to properly ingest a directory of jsonl… (#4067) 2019-08-02 09:58:51 +02:00
alexcombessie.md Remove questionable French stopwords (#6310) 2021-01-08 11:36:22 +11:00
alexvy86.md Fix code sample for Doc.set_extension (#2282) 2018-05-02 10:16:05 +02:00
aliiae.md Add Tatar Language Support (#2444) 2018-06-19 10:17:53 +02:00
AlJohri.md sign contributor agreement for AlJohri (#4839) [ci skip] 2019-12-29 14:17:28 +01:00
alldefector.md Port over contributor agreements 2018-03-24 17:17:37 +01:00
ALSchwalm.md Fix bug where Vocab.prune_vector did not use 'batch_size' (#2977) 2018-11-28 19:49:33 +01:00
alvaroabascar.md Fix issue 2396 (#3089) 2018-12-29 18:05:52 +01:00
alvations.md Create alvations.md (#3119) 2019-01-05 13:11:06 +01:00
AMArostegui.md spaCy Universe: New project; SpacyDotNet (#6702) 2021-01-13 12:47:30 +11:00
ameyuuno.md added contributor agreement ameyuuno.md (#3925) 2019-07-09 10:09:52 +02:00
amitness.md Fix broken link to Dive Into Python 3 website (#3656) 2019-04-29 19:44:00 +02:00
amperinet.md add small fix for French lemmatizer (#3206) 2019-01-31 23:44:10 +01:00
aniruddha-adhikary.md update bengali token rules for hyphen and digits (#2731) 2018-09-05 21:49:00 +02:00
ansgar-t.md escape html in displacy.render (#2378) (closes #2361) 2018-05-28 18:36:41 +02:00
aongko.md Update Indonesian model (#2752) 2018-09-14 12:30:32 +02:00
aristorinjuang.md adding more words and rephrasing (#2351) 2018-05-24 11:40:57 +02:00
armsp.md Default code for Setting Entity annotations on the website errors (#7738) 2021-04-21 09:16:32 +02:00
Arvindcheenu.md Added Tamil Example Sentences (#5583) 2020-06-13 15:56:26 +02:00
aryaprabhudesai.md Create aryaprabhudesai.md (#2681) 2018-08-20 18:56:14 +02:00
askhogan.md Update example and sign contributor agreement (#3916) 2019-07-08 10:27:20 +02:00
avadhpatel.md Signed contributor agreement 2018-01-17 06:33:37 -06:00
avramandrei.md Added RONEC to spaCy Universe (#4151) 2019-08-20 14:46:07 +02:00
AyushExel.md W&B integration: Optional support for dataset and model checkpoint logging and versioning (#7429) 2021-04-01 19:36:23 +02:00
Azagh3l.md Create Azagh3l.md (#3836) 2019-06-11 10:58:32 +02:00
azarezade.md add contributors.md 2018-01-23 13:47:30 +03:30
b1uec0in.md Fix error when Korean text contains regexp special characters. (#4022) 2019-07-25 17:53:33 +02:00
Baciccin.md Add Ligurian language 2020-03-19 21:37:01 -07:00
bbieniek.md added contribution license 2021-08-19 21:45:18 +02:00
bdewilde.md Add contributor agreement 2017-11-20 11:28:31 -06:00
beatesi.md Updated wordforms for Norwegian lemmatizer (#3007) 2018-12-06 15:46:18 +01:00
bellabie.md Fix filename 2019-03-16 13:46:58 +01:00
Bharat123rox.md Made changes suggested by @ines 2019-03-20 07:43:19 +05:30
BigstickCarpet.md Better formatting for spacy train CLI (#2357) 2018-05-25 13:08:45 +02:00
bintay.md most_similar() return the k most similar vectors (#4364) 2019-10-03 14:09:44 +02:00
bittlingmayer.md Add Armenian sentence-final verchaket, Greek question mark and Arabic question mark to default punct (#5910) 2020-08-12 15:36:14 +02:00
bjascob.md Update Universe Website for pyInflect (#3641) 2019-04-26 13:17:36 +02:00
bodak.md Add hmrb to spaCy Universe (#8129) 2021-05-31 18:40:48 +10:00
boena.md Updates to Swedish Language (#3164) 2019-01-16 13:45:50 +01:00
borijang.md Include Macedonian language (#6230) 2020-10-15 15:55:01 +02:00
BramVanroy.md Documentation improvement regarding joblib and SO (#2867) 2018-10-24 15:19:17 +02:00
bratao.md spaCy v3 is not saving the best version in training loop (#6629) 2021-01-06 12:51:30 +11:00
BreakBB.md Fix symlink creation to show error message on failure (#3589) (resolves #3307)) 2019-04-16 11:58:31 +02:00
Bri-Will.md Adds contributor agreement for Bri-Will 2017-12-11 14:38:37 -08:00
Brixjohn.md Added alpha support for Tagalog language (#3062) 2018-12-18 13:08:38 +01:00
broaddeep.md Support match alignments (#7321) 2021-04-08 18:10:14 +10:00
bryant1410.md Fix website docs for Vectors.from_glove (#3565) 2019-04-10 15:23:27 +02:00
bsweileh.md Update _training.md - Fix broken link on backpropagation (#7431) 2021-03-15 09:21:35 +01:00
btrungchi.md Fix loading tokenizer with custom prefix search (#2495) 2018-07-04 12:56:07 +02:00
calumcalder.md Port over contributor agreements 2018-03-24 17:17:37 +01:00
cbilgili.md Adds Canbey Bilgili's Contributor Agreement 2017-12-01 17:27:41 +03:00
cclauss.md Create cclauss.md 2017-11-20 14:57:30 +01:00
cedar101.md Korean support (#3901) 2019-07-09 22:23:16 +02:00
celikomer.md Signed agreement (#3577) 2019-04-11 11:31:27 +02:00
ceteri.md Submitting PyTextRank for inclusion in the spaCy uniVerse (#4942) 2020-01-28 11:37:54 +01:00
charlax.md Add charlax's contributor agreement (#2805) 2018-09-27 12:24:42 +02:00
chezou.md Upadate the document for Unidic link with latest version URL (#3022) 2018-12-07 17:24:48 +01:00
chopeen.md [Closes #5292] Fix typo in option name "--n-save_every" (#5293) 2020-04-11 23:35:01 +02:00
chrisdubois.md Re-add existing contributor agreements 2016-11-09 16:42:02 +01:00
cicorias.md fixes symbolic link on py3 and windows (#2949) 2018-11-24 15:34:23 +01:00
Cinnamy.md Correcting lang/ru/examples.py (#2845) 2018-10-13 15:19:43 +02:00
clarus.md Typo (#3865) 2019-06-20 10:31:19 +02:00
clippered.md issue #3012: add test (#3021) 2018-12-18 15:02:49 +01:00
connorbrinton.md 🏷 Add Mypy check to CI and ignore all existing Mypy errors (#9167) 2021-10-14 15:21:40 +02:00
coryhurst.md Silent keyword in info function in init (#2459) 2018-06-18 12:24:21 +02:00
cristianasp.md Update stop_words.py in Portuguese (a,o,e) (#6345) 2021-01-08 11:35:38 +11:00
d99kris.md Rename d99kris to d99kris.md 2017-12-17 13:44:55 +01:00
danielhers.md Signed contributor agreement 2017-11-08 16:28:56 +02:00
danielkingai2.md Don't use numpy directly for similarity (#3362) 2019-03-06 22:58:38 +00:00
danielruf.md chore: cache dependencies (#2418) 2018-06-11 00:22:41 +02:00
danielvasic.md Added Multext-East V5 tagset for Croatian language (#6248) 2020-11-05 12:19:22 +01:00
dardoria.md Bulgarian tokenizer exceptions (#7114) 2021-02-19 19:19:19 +01:00
darindf.md Fix error (#2802) 2018-09-26 21:31:03 +02:00
delzac.md Reflect on usage doc that IS_SENT_START attribute exist (#6114) 2020-10-09 10:14:40 +02:00
demfier.md Port over contributor agreements 2017-10-24 20:13:34 +02:00
demongolem.md Update tokenizer.md for construction example (#3790) 2019-06-16 14:32:56 +02:00
DeNeutoy.md Allow vectors to be optional in init-model, more robust string counting (#3155) 2019-01-14 23:48:30 +01:00
dhpollack.md fix typo in svg file 2020-03-05 17:04:33 +01:00
dhruvrnaik.md Fix Span.char_span bug (#6816) 2021-01-26 15:50:37 +08:00
DimaBryuhanov.md DimaBryuhanov.md (#2590) 2018-07-24 18:43:27 +02:00
Dobita21.md Create Dobita21.md (#3614) 2019-04-18 12:51:54 +02:00
DoomCoder.md Improved polish tokenizer and stop words. (#2974) 2019-02-08 14:27:21 +11:00
doug-descombaz.md Port over contributor agreements 2018-03-24 17:17:37 +01:00
drndos.md Add Slovak language tools implementation (#4943) 2020-02-03 13:03:59 +01:00
DuyguA.md added contributor agreement for DuyguA 2017-11-13 15:45:13 +01:00
dvsrepo.md Adds contributor agreement dvsrepo 2017-04-07 11:58:28 +02:00
EARL_GREYT.md fix typo in first token (#4327) 2019-09-27 14:49:36 +02:00
elbaulp.md Changed learning rate by its param name. (#3855) 2019-06-20 10:29:20 +02:00
elben10 Fixes #5413 (#5315) 2020-04-16 13:29:02 +02:00
Eleni170.md Add support for Greek language (#2535) 2018-07-10 13:48:38 +02:00
EmilStenstrom.md Add abbreviations from UD_Swedish-Talbanken (#2613) 2018-08-07 13:53:17 +02:00
emulbreh.md Add contributor agreement for emulbreh 2018-02-13 13:40:33 +01:00
enerrio.md add contributor agreement for @enerrio 2018-02-15 12:43:04 -08:00
er-raoniz.md Fix example sentences in Hindi for grammatical errors (#4343) 2019-09-30 23:32:49 +02:00
erip.md Add initial Korean support (#4660) 2019-11-18 12:56:07 +01:00
estr4ng7d.md Marathi Language Support (#3767) 2019-05-24 14:29:42 +02:00
ezorita.md Add stub files for main cython classes (#8427) 2021-08-07 12:30:03 +02:00
F0rge1cE.md Fix offset bug in loading pre-trained word2vec. (#3689) 2019-05-06 23:00:38 +02:00
FallakAsad.md Bugfix/issue 3968 (#3982) 2019-07-18 00:20:32 +02:00
filipecaixeta.md Add words to portuguese language _num_words (#2759) 2018-09-14 12:30:16 +02:00
fizban99.md Create fizban99.md (#3601) 2019-04-17 11:22:19 +02:00
florijanstamenkovic.md Fix Issue 6207 (#6208) 2020-10-09 10:14:40 +02:00
forest1988.md Avoid a SyntaxError in self-attentive-parser (#6428) 2020-11-22 21:59:37 +01:00
foufaster.md Create foufaster.md (#3179) 2019-01-21 15:45:54 +01:00
frascuchon.md Include universe spec for spacy-wordnet component (#2919) 2018-11-13 23:54:46 +01:00
free-variation.md Fixed spaCy+Keras example (#2763) 2018-09-15 13:06:39 +02:00
fsonntag.md Add contributer aggreement 2017-11-19 16:30:35 +01:00
fucking-signup.md Add contributor agreement 2018-01-08 03:08:57 +01:00
gandersen101.md Adding spaczz package to universe.json (#5717) 2020-07-07 20:55:24 +02:00
gavrieltal.md Initialize trues to 0.0 in training example (#3004) 2018-12-03 01:33:22 +01:00
giannisdaras.md Greek language optimizations (#2558) 2018-07-18 18:51:38 +02:00
GiorgioPorgio.md Port over contributor agreement from spacy-lookups-data [ci skip] 2019-10-25 13:06:10 +02:00
Gizzio.md Improved polish tokenizer and stop words. (#2974) 2019-02-08 14:27:21 +11:00
graue70.md Fix typos in comments (#5904) 2020-08-12 15:35:25 +02:00
graus.md adds textpipe to universe (#3500) [ci skip] 2019-03-28 15:13:19 +01:00
greenriverrus.md Added contributor agreement 2017-11-26 22:14:08 +03:00
grivaz.md Introduces a bulk merge function, in order to solve issue #653 (#2696) 2018-09-10 16:41:42 +02:00
gtoffoli.md Added Italian POS-aware lemmatizer. (#8079) 2021-06-16 11:14:45 +02:00
guerda.md Update guerda.md 2020-03-24 10:42:30 +01:00
GuiGel.md Bugfix/fix entity ruler from disk (#4670) 2019-11-21 16:26:37 +01:00
gustavengstrom.md Adding noun_chunks to the Swedish language model (sv) (#4422) 2019-10-21 12:57:06 +02:00
Hazoom.md Improve speed of _merge method (#4300) 2019-09-18 21:34:34 +02:00
henry860916.md update response after calling add_pipe (#3661) 2019-05-01 12:02:18 +02:00
hertelm.md Website: fixed the token span in the text about the rule-based matching example (#5669) 2020-06-30 19:58:55 +02:00
himkt.md fix wrong indexing (#2416) 2018-06-19 10:20:57 +02:00
HiromuHota.md Tags are joined with a comma and padded with asterisks (#3491) 2019-03-28 16:17:31 +01:00
hiroshi-matsuda-rit.md fix a bug causing mis-alignments (#5560) 2020-06-08 15:49:34 +02:00
hlasse.md add textdescriptives to universe 2021-08-13 14:35:18 +02:00
holubvl3.md Create holubvl3 (#5845) 2020-07-30 17:40:31 +02:00
honnibal.md Port over contributor agreements 2017-10-24 20:13:34 +02:00
howl-anderson.md Port over contributor agreements 2018-03-24 17:17:37 +01:00
hugovk.md CLA 2017-11-29 10:25:20 +02:00
iann0036.md Port over contributor agreements 2018-03-24 17:17:37 +01:00
ICLRandD.md Add entry for Blackstone in universe.json (#4101) 2019-08-09 17:16:51 +02:00
idealley.md Added agrement (#2374) 2018-05-26 18:19:08 +02:00
idoshr.md Hebrew like num (#5952) 2020-08-24 14:30:05 +02:00
iechevarria.md Add n_process to Language.pipe documentation (#4842) [ci skip] 2019-12-29 14:23:33 +01:00
ilivans.md Add ilivans' contributor agreement 2020-05-14 15:59:06 +02:00
ines.md Port over contributor agreements 2017-10-24 20:13:34 +02:00
intrafindBreno.md Create intrafindBreno.md (#3814) 2019-06-03 18:33:09 +02:00
IsaacHaze.md Adds contributor agreement IsaacHaze 2017-12-10 23:15:06 +01:00
isaric.md Issue #1107 - adds examples.py for Croatian language (#4143) 2019-08-18 23:04:41 +02:00
iurshina.md Fixes typos (#4843) 2019-12-29 14:24:13 +01:00
ivigamberdiev.md Update links and http -> https (#3532) 2019-04-02 17:36:22 +02:00
ivyleavedtoadflax.md Add missing comma to NN example in docs (#2255) 2018-04-28 14:56:00 +02:00
jabortell.md Add jabortell to the contributors (#6422) 2020-11-24 16:15:31 +01:00
jacopofar.md Visual C++ link updated (#2842) (closes #2841) [ci skip] 2018-10-12 14:59:45 +02:00
jacse.md Extend and fix Danish examples (#5227) 2020-04-02 10:42:35 +02:00
Jan-711.md Fix/Improve german stop words (#5024) 2020-02-17 18:59:22 +01:00
janimo.md Update Romanian stopword list (#2316) 2018-05-10 12:16:56 +02:00
jankrepl.md Add agreement 2021-03-09 10:57:32 +01:00
JannisTriesToCode.md Documentation Typo Fix (#5492) 2020-05-22 19:50:26 +02:00
jarib.md Add three missing tags from the nb tag map (#3085) 2018-12-27 14:48:40 +01:00
jaydeepborkar.md Update stop_words.py and add name in contributors (#4325) 2019-09-27 11:57:27 +02:00
jbesomi.md Add texthero to universe.json (#5716) 2020-07-07 20:54:22 +02:00
jeannefukumaru.md fix typos in tag_map flagged by python -m debug-data (#3542) 2019-04-05 12:06:09 +02:00
jenojp.md Raise error if annotation dict in simple training style has unexpected keys #4074 (#4079) 2019-08-06 11:01:25 +02:00
jerbob92.md Port over contributor agreements 2017-10-24 20:13:34 +02:00
Jette16.md Add universe test (#9278) 2021-10-11 11:08:46 +02:00
jganseman.md Create jganseman.md 2021-01-26 11:02:31 +01:00
jgutix.md Update suffixes example (#5989) 2020-08-31 12:44:56 +02:00
jimregan.md CLA 2017-06-26 21:32:48 +01:00
JKhakpour.md Add Persian(Farsi) language support (#2797) 2018-10-13 15:31:49 +02:00
jklaise.md Update load_lookups return type and docstring (#7907) 2021-04-27 09:13:39 +02:00
jmargeta.md Add contributor agreement for jmargeta 2020-10-16 00:38:42 +02:00
jmyerston.md Added ancient Greek language support (#8606) 2021-07-15 10:27:17 +02:00
johnhaley81.md Port over contributor agreements 2017-10-24 20:13:34 +02:00
jonesmartins.md Add missing pronoums/determiners (#5569) 2020-06-10 18:47:04 +02:00
juliamakogon.md Ukrainian language added. Small fixes in Russian (#3241) 2019-02-07 21:05:11 +01:00
julien-talkair.md add spacy contributor agreement 2021-07-01 17:41:12 +02:00
juliensalinas.md Sign contributors agreement. 2021-05-14 11:00:27 +02:00
jumasheff.md Add contributor agreement 2021-01-25 00:34:12 +06:00
justindujardin.md Port over contributor agreements 2018-03-24 17:17:37 +01:00
kabirkhan.md Add optional id property to EntityRuler patterns (#3591) 2019-06-16 13:29:04 +02:00
katarkor.md changed tag_map, morph_rules, lemmatizer for Norwegian (#2565) 2018-07-19 19:38:24 +02:00
katrinleinweber.md Formalise citation info (#2167) 2018-03-30 10:34:14 +02:00
kbulygin.md Fix the first nlp call for ja (closes #2901) (#3065) 2018-12-18 15:01:06 +01:00
KennethEnevoldsen.md added agreement 2021-07-13 10:11:02 +02:00
keshan.md Adding basic support for Sinhala language. (#2788) 2018-09-25 12:18:25 +02:00
keshav.md Spacy Cli info method causing backward compatibility issues (#6793) 2021-01-23 11:21:43 +01:00
kevinlu1248.md Create kevinlu1248.md 2020-05-19 20:25:45 -07:00
khellan.md Norwegian tweaks (#3894) 2019-07-08 10:28:47 +02:00
Kimahriman.md Fixed auto linking after download and added simple test to check 2018-01-29 14:25:21 -05:00
kimfalk.md agreeing to the contributor agreement. 2017-12-19 15:31:52 +01:00
KKsharma99.md Adding MindMeld to Universe JSON (#6275) 2020-10-21 18:42:11 +02:00
knoxdw.md Test and fix for Issue #2219 (#2272) 2018-05-03 18:40:46 +02:00
koaning.md add "whatlies" to spaCy universe (#5252) 2020-04-06 11:29:30 +02:00
kognate.md Added support for serializing overwrite and ent_id_sep (#3918) 2019-07-08 17:28:28 +02:00
kororo.md Add ExcelCy into Universe list (#2572) 2018-07-19 19:28:33 +02:00
kowaalczyk.md Improved polish tokenizer and stop words. (#2974) 2019-02-08 14:27:21 +11:00
kwhumphreys.md add agreement 2018-01-03 13:00:14 -08:00
laszabine.md Amend documentation to Language.evaluate (#5319) 2020-04-16 20:00:18 +02:00
lauraBaakman.md Fix contributor agreement 2019-02-07 20:56:13 +01:00
ldorigo.md Submit contributor agreement (#3705) 2019-05-10 14:19:18 +02:00
leicmi.md Remove duplicated branch in if/else-if statement (#5234) 2020-04-02 14:47:42 +02:00
leomrocha.md contributor agreement signed (#5525) 2020-05-31 20:13:39 +02:00
leyendecker.md Fix on EntityRendered to support break lines (after last entity) (closes #5838) 2020-07-29 18:48:39 +02:00
lfiedler.md issue5230: added contributors agreement 2020-04-06 21:04:06 +02:00
ligser.md Fill contributer agreement 2017-11-11 11:39:31 +03:00
lizhe2004.md fix the wrong hash url in adding-languages.md file (#5810) 2020-07-25 13:13:38 +02:00
Loghijiaha.md Tamil language support (#3154) 2019-01-14 15:32:30 +01:00
lorenanda.md add new Romanian stopwords (#6621) 2021-01-08 11:34:47 +11:00
louisguitton.md Add mlflow to spaCy universe (#5352) 2020-04-29 10:18:03 +02:00
LRAbbade.md Adding my contributor agreement (#2315) 2018-05-09 21:25:05 +02:00
luvogels.md Update luvogels.md 2017-04-27 10:42:07 +02:00
mabraham.md Tokenizer to_disk and from_disk now ensure paths (#5116) 2020-03-08 13:25:56 +01:00
magnusburton.md Initial commit for Swedish 2016-12-20 11:05:06 +01:00
mahnerak.md Create mahnerak.md (#5615) 2020-06-20 11:14:26 +02:00
mariosasko.md Add TakeLab/spacy-udpipe to Universe (#8698) 2021-07-16 11:15:52 +02:00
markulrich.md Use correct local parameter in example MyComponent (and added markulrich.md contributor file) 2017-11-22 15:59:08 -08:00
MartinoMensio.md adding spacy-universal-sentence-encoder (#5534) 2020-06-08 20:26:30 +02:00
MateuszOlko.md Improved polish tokenizer and stop words. (#2974) 2019-02-08 14:27:21 +11:00
MathiasDesch.md Add spaCy Contributor Agreement 2017-11-09 11:56:47 +01:00
mauryaland.md Update stop_words.py for French language (#2310) 2018-05-09 12:04:38 +02:00
mbkupfer.md added contributor agreement for mbkupfer (#2738) 2018-09-10 11:32:03 +02:00
mdaudali.md Correct typo for AllenAI url on homepage (#4050) 2019-07-31 00:16:33 +02:00
mdcclv.md Port over contributor agreements 2017-10-24 20:13:34 +02:00
mdda.md Create mdda.md 2017-12-18 18:09:27 +08:00
meghanabhange.md Project Idea : denomme | Multilingual Name Detection (#7845) 2021-04-22 08:48:17 +02:00
melanuria.pdf Add contributor agreement (see #1672) 2017-12-20 22:00:12 +01:00
merrcury.md Create merrcury.md 2020-03-10 15:11:07 +05:30
michael-k.md Add !=3.4.* to python_requires (#5344) 2020-04-27 22:02:09 +02:00
mihaigliga21.md adding Romanian tag_map (#4257) 2019-09-09 11:53:09 +02:00
mikeizbicki.md fix bug in Korean language, resulting in 100x speedup by reducing overhead of mecab (#5701) 2020-07-06 17:03:33 +02:00
mikelibg.md Removed space in docs + added contributor indo (#2909) 2018-11-08 14:18:25 +01:00
MiniLau.md Add is_sent_end token property (#5375) 2020-04-29 12:53:16 +02:00
mirfan899.md Add Urdu Language Support (#2430) 2018-06-22 11:14:03 +02:00
miroli.md Remove incorrect lemma lookup gäng->gänga (#2252) 2018-04-28 14:54:41 +02:00
MisterKeefe.md make idx available via to_array (#5030) 2020-02-22 14:13:06 +01:00
Mlawrence95.md [minor doc change] embedding vis. link is broken in website/docs/usage/examples.md (#5325) 2020-04-21 20:35:12 +02:00
mmaybeno.md Agnostic vocab array fix (#4680) 2019-11-23 14:59:52 +01:00
mn3mos.md #2211 - Support for ssl certs config on download command (#2212) 2018-05-03 18:37:02 +02:00
mollerhoj.md Add Danish lemmatizer (#2184) 2018-04-07 19:07:28 +02:00
moreymat.md Support CUDA 10 (#3126) 2019-01-09 03:10:45 +01:00
mpszumowski.md Fix bug in CLI iob and ner converter (#2392) (fixes #2385) 2018-05-30 12:28:44 +02:00
mpuig.md Catalan Language Support (#2940) 2018-11-26 15:25:47 +01:00
mr-bjerre.md Fix link to user hooks in docs (#4778) 2019-12-06 19:17:12 +01:00
msklvsk.md fix UD data file extensions (#2425) 2018-06-08 14:26:11 +02:00
munozbravo.md Overwrites default getter for like_num in Spanish by adding _num_words and like_num to lex_attrs.py (#3810) (closes #3803)) 2019-06-02 12:22:57 +02:00
myavrum.md Create myavrum.md (#5612) 2020-06-19 18:34:27 +02:00
narayanacharya6.md Address missing config overrides post load of models (#8208) 2021-05-31 18:36:52 +10:00
neelkamath.md Add "spaCy Server" to spaCy Universe (#4553) 2019-10-30 13:20:46 +01:00
nikhilsaldanha.md Add kannada examples (#5162) 2020-03-29 13:54:42 +02:00
nipunsadvilkar.md Incorrect Token attribute ent_iob_ description (#3800) 2019-05-31 16:50:45 +02:00
NirantK.md Create NirantK.md (#3807) [ci skip] 2019-06-01 17:36:06 +02:00
njsmith.md When calling getoption() in conftest.py, pass a default option (#2709) 2018-09-03 09:57:52 +02:00
nlptown.md Improved Dutch language resources and Dutch lemmatization (#3409) 2019-04-03 14:13:26 +02:00
nourshalabi.md Additions to Arabic stop words. (#2422) 2018-06-08 02:33:23 +02:00
NSchrading.md Re-add existing contributor agreements 2016-11-09 16:42:02 +01:00
nsorros.md Add logger debug for project push and pull (#8860) 2021-08-02 18:13:53 +02:00
Nuccy90.md Update morph_rules.py (#6102) 2020-10-06 15:14:47 +02:00
ohenrik.md Added contributors agreement 2018-01-25 11:05:29 +01:00
Olamyy.md Adding support for Yoruba Language (#4614) 2019-12-21 14:11:50 +01:00
onlyanegg.md Fix for Issue 4665 - conllu2json (#4953) 2020-02-03 13:01:48 +01:00
ophelielacroix.md Add (noun chunks) syntax iterators for Danish (#6246) 2021-01-07 16:33:00 +11:00
oroszgy.md Accepted contributor agreement. 2016-12-26 22:37:02 +01:00
osori.md Very minor issues in Korean example sentences (#5446) 2020-05-17 13:43:34 +02:00
ottosulin.md Port over contributor agreements 2018-03-24 17:17:37 +01:00
oxinabox.md squashme 2018-02-09 23:19:11 +08:00
ozcankasal.md trilyon forgotten (#3083) 2018-12-27 14:44:23 +01:00
paoloq.md Matcher support for Span as well as Doc (#5113) 2020-04-15 13:51:33 +02:00
Pavle992.md Stopwords for Serbian language. (#4078) 2019-08-05 10:22:27 +02:00
pberba.md Update vocab.get_vector docs to include features on Fasttext ngram (#4464) 2019-10-20 01:28:18 +02:00
pbnsilva.md Adds contributor agreement 2018-01-11 17:40:12 +01:00
peter-exos.md Run PhraseMatcher on Spans (#6918) 2021-02-10 23:43:32 +11:00
PeterGilles.md Initial commit: New language Luxembourgish (lb) (#4424) 2019-10-14 12:27:50 +02:00
phiedulxp.md update lang/zh (#4103) 2019-08-12 10:37:48 +02:00
philipvollet.md Add projects to spaCy Universe (#9269) 2021-09-23 10:56:45 +02:00
phojnacki.md agreement of contributor, may I introduce a tiny pl languge contribution (#2799) 2018-09-27 12:25:22 +02:00
pickfire.md Add myself to contributors (#3575) 2019-04-11 11:31:04 +02:00
pinealan.md Fill in contributor agreement 2020-03-15 03:45:20 +08:00
pktippa.md Added pktippa contributor agreement 2018-02-07 15:37:28 +05:30
plison.md adding skweak to the SpaCy universe 2021-04-22 01:16:34 +02:00
PluieElectrique.md Reduce memory usage of Lookup's BloomFilter (#5606) 2020-06-26 14:09:10 +02:00
pmbaumgartner.md contributor agreement 2019-07-14 20:46:06 -04:00
polm.md Port over contributor agreements 2017-10-24 20:13:34 +02:00
Poluglottos.md Fix typo 2019-03-16 13:45:46 +01:00
PolyglotOpenstreetmap.md Create PolyglotOpenstreetmap.md (#3198) 2019-01-26 14:02:54 +01:00
prilopes.md Bugfix/dep matcher issue 4590 (#4601) 2019-11-07 12:01:06 +01:00
punitvara.md This PR adds Gujarati Language class along with (#5355) 2020-04-27 11:07:37 +02:00
pzelasko.md Less norm computations in token similarity (#2730) 2018-09-05 21:50:23 +02:00
questoph.md Fix basic language support for Luxembourgish (by adding punctuation.py) (#4648) 2019-11-15 16:16:47 +01:00
R1j1t.md update spacy universe with my project (#5497) 2020-05-25 11:30:23 +02:00
rafguns.md Add contributor agreement 2020-12-14 22:01:14 +01:00
rahul1990gupta.md Hindi: Adds tests for lexical attributes (norm and like_num) (#5829) 2020-10-07 10:23:32 +02:00
ramananbalakrishnan.md Support single value for attribute list in doc.to_array 2017-10-19 17:00:41 +05:30
rameshhpathak.md Add Nepali Language (#5622) 2020-06-22 10:25:46 +02:00
rasyidf.md Update Indonesian Example Phrases (#6124) 2020-09-23 14:02:26 +02:00
reneoctavio.md fix: Fix textcat labels to expect a Optional[Iterable[str]] instead of Optional[Dict] (#6911) 2021-02-04 23:37:13 +01:00
retnuh.md Update call to mkdir() to create the parents (#3139) 2019-01-11 03:02:18 +01:00
revuel.md Update universe.json (include PatternOmatic) (#6399) 2020-11-19 13:15:50 +01:00
richardliaw.md contribute (#5632) 2020-06-23 08:53:58 +02:00
richardpaulhudson.md Request to include Holmes in spaCy Universe (#3685) 2019-05-08 02:42:03 +02:00
robertsipek.md Fill contributor agreement by robertsipek (#6285) 2020-10-22 22:13:17 +02:00
rokasramas.md Lithuanian language support (#3895) 2019-07-08 10:25:22 +02:00
roshni-b.md updates for Bengali language (#3286) 2019-02-18 10:02:28 +01:00
RvanNieuwpoort.md Signed Contributer Agreement by Rob van Nieuwpoort 2016-12-15 10:34:19 +01:00
ryanzhe.md biluo_tags_from_offsets throw exception for overlapping entities (#4021) 2019-08-15 18:13:32 +02:00
sabiqueqb.md Gh 5339 language class for malayalam (#5342) 2020-04-27 09:45:08 +02:00
sainathadapa.md Basic support for Telugu language (#2751) 2018-09-10 11:53:18 +02:00
SamEdwardes.md Updates to universe.json for spaCyTextBlob (#7647) 2021-04-04 20:17:57 +02:00
sammous.md Updating description and code snippet spacy-lefff (#2623) 2018-08-02 17:25:27 +02:00
SamuelLKane.md fix(util): fix decaying function output (#3495) 2019-03-28 13:24:47 +01:00
savkov.md Renamed the file 2018-01-11 17:49:29 +00:00
Schibsted.png Add contributor agreement [ci skip] 2019-08-30 17:02:43 +02:00
seanBE.md add return_matches and as_tuples back to Matcher.pipe (#4303) 2019-09-18 22:00:33 +02:00
sebastienharinck.md contrib: add contributor agreement for user sebastienharinck (#5316) 2020-04-16 11:32:09 +02:00
sevdimali.md Azerbaijani language added (#7911) 2021-04-28 14:42:02 +02:00
shigapov.md added spaCyOpenTapioca (#9181) 2021-09-11 13:16:51 +09:00
shuvanon.md Port over contributor agreements 2017-10-24 20:13:34 +02:00
skrcode.md Restore contributor agreement 2018-03-31 14:06:37 +02:00
sloev.md add spacy_syllables to universe + sign contributor agreement 2020-03-13 18:09:42 +01:00
snsten.md Added support for Sanskrit language (#5956) 2020-08-25 10:56:29 +02:00
socool.md Update Thai tokenizer_exception list (#3529) 2019-04-03 09:13:36 +02:00
solarmist.md Mark Japanese documents as tagged. (#5803) 2020-07-23 08:57:01 +02:00
sorenlind.md Add contributor agreement. 2017-11-24 15:29:54 +01:00
Stannislav.md Change type of texts argument in pipe to iterable (#6186) 2020-10-02 21:00:11 +02:00
suchow.md Re-add existing contributor agreements 2016-11-09 16:42:02 +01:00
svlandeg.md Fix small typo bug in French regexp + relevant unit test (#2980) 2018-11-29 20:16:13 +01:00
swfarnsworth.md Refactor dependencymatcher.pyx to use list comps and enumerate. (#8956) 2021-08-18 09:55:45 +02:00
tamuhey.md Fix iss4278 (#4279) 2019-09-12 10:44:49 +02:00
therealronnie.md Addresses Issue #2228 - Deserialization fails when using tensor=False or sentiment=False (#2230) 2018-05-01 13:40:22 +02:00
theudas.md Added Parameter to NEL to take n sentences into account (#5548) 2020-06-12 02:03:23 +02:00
thomasbird.md Add SCA for @thomasbird (#6576) 2020-12-15 20:59:47 +01:00
thomashacker.md Fix preservation of spacy package meta (#8663) 2021-07-12 11:18:52 +02:00
thomasopsomer.md add contributor agreement 2018-01-28 20:12:05 +01:00
thomasthiebaud.md Add spacy_fastlang to universe (#5271) 2020-04-15 13:50:46 +02:00
thoppe.md Added author information for NLPre (#5414) 2020-05-08 11:28:54 +02:00
tiangolo.md 📄 Add spaCy Contributor Agreement 2020-07-01 20:57:21 +02:00
Tiljander.md Describing priority rules for overlapping matches (#5197) 2020-03-26 13:13:22 +01:00
tilusnet.md Create tilusnet.md (#5914) 2020-08-12 22:46:08 +02:00
tjkemp.md Enhancement/lang fi examples (#2547) 2018-07-15 09:50:27 +02:00
tmetzl.md Merge branch 'master' into develop [ci skip] 2019-03-11 12:23:24 +01:00
tokestermw.md added contributor agreement 2017-11-17 17:27:20 -08:00
tommilligan.md Limit to cupy-cuda v8, so as not to pull in v9 automatically. (#5194) 2020-03-29 13:52:08 +02:00
trungtv.md Add support for Vietnamese in spaCy by leveraging Pyvi, an external Vietnamese tokenizer (#2155) 2018-03-29 12:19:51 +02:00
tupui.md SCA tupui 2021-01-29 15:46:53 +01:00
tyburam.md Lex _attrs for polish language (#2750) 2018-09-10 11:53:57 +02:00
tzano.md Add Arabic language (#2314) 2018-05-15 00:27:19 +02:00
ujwal-narayan.md Enhancing Kannada language Resources (#3755) 2019-05-20 12:56:10 +02:00
umarbutler.md Fixed Typo in Warning (#5284) 2020-04-09 15:46:15 +02:00
ursachec.md Add contributor agreement for @ursachec 2018-02-13 20:49:42 +01:00
uwol.md added contributor agreement 2017-11-05 12:33:43 +01:00
veer-bains.md Fixed syntax error in lang/ko when using python 2 (#4082) (closes #4068) 2019-08-05 10:19:32 +02:00
vha14.md add oprd to the list of accepted deps for noun chunking (#6302) 2020-11-05 09:17:35 +01:00
vikaskyadav.md Create vikaskyadav.md (#2621) 2018-08-02 14:03:44 +02:00
vishnumenon.md Fix the code for FACILITIY entities (#2324) 2018-05-12 15:19:17 +02:00
vishnupriyavr.md Limiting noun_chunks for specific languages (#5396) 2020-05-14 12:58:06 +02:00
vondersam.md Swedish like_num (#5371) 2020-04-29 21:25:22 +02:00
vsolovyov.md Re-add existing contributor agreements 2016-11-09 16:42:02 +01:00
w4nderlust.md Added Ludwig among the projects (#3548) [ci skip] 2019-04-07 13:01:26 +02:00
wallinm1.md [finnish] Add contributor file 2017-02-04 13:54:10 +02:00
walterhenry.md User contributor agreement 2020-10-19 16:25:09 +02:00
wannaphongcom.md Update Thai tag map (#3480) 2019-03-25 16:53:26 +01:00
werew.md DependencyMatcher improvements (fix #6678) (#6744) 2021-01-22 11:20:08 +11:00
willismonroe.md Port over contributor agreements 2018-03-24 17:17:37 +01:00
willprice.md Improve random prefix generation in displaCy arcs (#3096) 2018-12-27 14:46:02 +01:00
wojtuch.md User correct variable name in the examples (#2664) 2018-08-13 22:21:24 +02:00
wxv.md Fix is_ascii documentation and create contributor file (#2988) 2018-11-30 15:57:58 +01:00
x-ji.md Fix venv command examples (#2560) [ci skip] 2018-07-18 10:31:24 +02:00
xadrianzetx.md Raise custom error in EntityLinker when KB is not set (#8442) 2021-06-25 23:04:00 +02:00
xssChauhan.md Change default output format from jsonl to json for cli convert (#3583) (closes #3523) 2019-04-12 11:31:23 +02:00
yanaiela.md Custom entity render (#4117) 2019-08-16 18:39:25 +02:00
yaph.md Create yaph.md so I can contribute (#3658) 2019-04-29 19:43:06 +02:00
yashpatadia.md Add test file for issue (#3625) and spacy contributor agreement 2019-07-11 14:53:14 +05:30
YohannesDatasci.md Armenian language support (#5246) 2020-04-03 13:02:18 +02:00
yohasebe.md Create yohasebe.md 2021-07-04 08:57:04 +09:00
yosiasz.md Add Amharic አማርኛ Language support (#6583) 2020-12-22 16:50:34 +01:00
yuukos.md Port over contributor agreements 2017-10-24 20:13:34 +02:00
zaibacu.md Website (Universe): An entry for rita-dsl (#6138) 2020-10-09 10:14:40 +02:00
ZeeD.md applying suggestion to avoid mypy errors (#8265) 2021-06-02 19:25:30 +10:00
zhuorulin.md Bugfix/fix wikidata train entity linker (#4509) 2019-10-24 12:52:59 +02:00
zqhZY.md add contributors.md 2017-12-28 18:04:52 +08:00
zqianem.md Fix typo in documentation (#4322) 2019-09-25 19:42:18 +02:00