spaCy/.github/contributors
Joshua Smith e8420ab2b7 Added support for serializing overwrite and ent_id_sep (#3918)
* Preserve flags in EntityRuler

The EntityRuler (explosion/spaCy#3526) does not preserve
overwrite flags (or `ent_id_sep`) when serialized.  This
commit adds support for serialization/deserialization preserving
overwrite and ent_id_sep flags.

* add signed contributor agreement

* flake8 cleanup

mostly blank line issues.

* mark test from the issue as needing a model

The test from the issue needs some language model for serialization
but the test wasn't originally marked correctly.

* remove unneeded model loading

The model didn't need to be loaded, and I replaced it with
a change that doesn't require it (using existing fixtures).

* change tempdir handling to be compatible with python 2.7

* Adds code to handle items saved before this change.

This code changes how the save files are handled and how the bytes
are stored as well.  It also adds a check to dispatch correctly
if it encounters bytes or files saved in the old format (and tests
for those cases).
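
A minimal sketch of that kind of format dispatch, for illustration only. It
is not the code from this commit: it uses the standard-library `json` module
rather than spaCy's serializer, and the names `dump_ruler`, `load_ruler` and
`DEFAULT_ENT_ID_SEP` (and its value) are hypothetical. It assumes the old
format stored a bare list of patterns, while the new format stores a mapping
that also carries the two flags.

```python
import json

# Assumed default separator, chosen for illustration only.
DEFAULT_ENT_ID_SEP = "||"


def dump_ruler(patterns, overwrite=False, ent_id_sep=DEFAULT_ENT_ID_SEP):
    """Serialize patterns together with both flags (new-style payload)."""
    payload = {
        "overwrite": overwrite,
        "ent_id_sep": ent_id_sep,
        "patterns": patterns,
    }
    return json.dumps(payload).encode("utf8")


def load_ruler(data):
    """Deserialize either payload shape into a settings dict."""
    decoded = json.loads(data.decode("utf8"))
    if isinstance(decoded, dict):
        # New format: flags travel with the patterns.
        return {
            "overwrite": decoded.get("overwrite", False),
            "ent_id_sep": decoded.get("ent_id_sep", DEFAULT_ENT_ID_SEP),
            "patterns": decoded.get("patterns", []),
        }
    # Old format: a bare list of patterns, so fall back to default flags.
    return {
        "overwrite": False,
        "ent_id_sep": DEFAULT_ENT_ID_SEP,
        "patterns": decoded,
    }
```

Checking the decoded type keeps previously saved data loadable without a
version field; the old payload simply gets the default flag values.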

* use util function for tempdir management

Updated after PR comments: this code now uses the make_tempdir function from util
instead of doing it by hand.
2019-07-08 17:28:28 +02:00
5hirish.md Added Adam project to spaCy Universe (#2275) 2018-04-30 22:25:01 +02:00
aaronkub.md fixing regex matcher examples (#3708) (#3719) 2019-05-10 14:23:52 +02:00
aashishg.md Added numbers to ../lang/hi/lex_attrs.py (#2629) 2018-08-08 16:06:11 +02:00
abhi18av.md Create abhi18av.md 2017-11-13 17:23:05 +05:30
adrianeboyd.md Update TIGER/German dependency relations in documentation (#3204) 2019-01-30 14:23:12 +01:00
adrienball.md Fix egg fragments in direct download (#3369) 2019-03-07 21:07:19 +01:00
akki2825.md add kannada support (#3264) 2019-02-12 18:28:39 +01:00
alexvy86.md Fix code sample for Doc.set_extension (#2282) 2018-05-02 10:16:05 +02:00
aliiae.md Add Tatar Language Support (#2444) 2018-06-19 10:17:53 +02:00
alldefector.md Port over contributor agreements 2018-03-24 17:17:37 +01:00
ALSchwalm.md Fix bug where Vocab.prune_vector did not use 'batch_size' (#2977) 2018-11-28 19:49:33 +01:00
alvaroabascar.md Fix issue 2396 (#3089) 2018-12-29 18:05:52 +01:00
alvations.md Create alvations.md (#3119) 2019-01-05 13:11:06 +01:00
amitness.md Fix broken link to Dive Into Python 3 website (#3656) 2019-04-29 19:44:00 +02:00
amperinet.md add small fix for French lemmatizer (#3206) 2019-01-31 23:44:10 +01:00
aniruddha-adhikary.md update bengali token rules for hyphen and digits (#2731) 2018-09-05 21:49:00 +02:00
ansgar-t.md escape html in displacy.render (#2378) (closes #2361) 2018-05-28 18:36:41 +02:00
aongko.md Update Indonesian model (#2752) 2018-09-14 12:30:32 +02:00
aristorinjuang.md adding more words and rephrasing (#2351) 2018-05-24 11:40:57 +02:00
armsp.md Update _training.jade (#2340) 2018-05-21 11:09:33 +02:00
aryaprabhudesai.md Create aryaprabhudesai.md (#2681) 2018-08-20 18:56:14 +02:00
askhogan.md Update example and sign contributor agreement (#3916) 2019-07-08 10:27:20 +02:00
avadhpatel.md Signed contributor agreement 2018-01-17 06:33:37 -06:00
Azagh3l.md Create Azagh3l.md (#3836) 2019-06-11 10:58:32 +02:00
azarezade.md add contributors.md 2018-01-23 13:47:30 +03:30
bdewilde.md Add contributor agreement 2017-11-20 11:28:31 -06:00
beatesi.md Updated wordforms for Norwegian lemmatizer (#3007) 2018-12-06 15:46:18 +01:00
bellabie.md Fix filename 2019-03-16 13:46:58 +01:00
Bharat123rox.md Made changes suggested by @ines 2019-03-20 07:43:19 +05:30
BigstickCarpet.md Better formatting for spacy train CLI (#2357) 2018-05-25 13:08:45 +02:00
bjascob.md Update Universe Website for pyInflect (#3641) 2019-04-26 13:17:36 +02:00
boena.md Updates to Swedish Language (#3164) 2019-01-16 13:45:50 +01:00
BramVanroy.md Documentation improvement regarding joblib and SO (#2867) 2018-10-24 15:19:17 +02:00
BreakBB.md Fix symlink creation to show error message on failure (#3589) (resolves #3307)) 2019-04-16 11:58:31 +02:00
Bri-Will.md Adds contributor agreement for Bri-Will 2017-12-11 14:38:37 -08:00
Brixjohn.md Added alpha support for Tagalog language (#3062) 2018-12-18 13:08:38 +01:00
bryant1410.md Fix website docs for Vectors.from_glove (#3565) 2019-04-10 15:23:27 +02:00
btrungchi.md Fix loading tokenizer with custom prefix search (#2495) 2018-07-04 12:56:07 +02:00
calumcalder.md Port over contributor agreements 2018-03-24 17:17:37 +01:00
cbilgili.md Adds Canbey Bilgili's Contributor Agreement 2017-12-01 17:27:41 +03:00
cclauss.md Create cclauss.md 2017-11-20 14:57:30 +01:00
celikomer.md Signed agreement (#3577) 2019-04-11 11:31:27 +02:00
charlax.md Add charlax's contributor agreement (#2805) 2018-09-27 12:24:42 +02:00
chezou.md Upadate the document for Unidic link with latest version URL (#3022) 2018-12-07 17:24:48 +01:00
chrisdubois.md Re-add existing contributor agreements 2016-11-09 16:42:02 +01:00
cicorias.md fixes symbolic link on py3 and windows (#2949) 2018-11-24 15:34:23 +01:00
Cinnamy.md Correcting lang/ru/examples.py (#2845) 2018-10-13 15:19:43 +02:00
clarus.md Typo (#3865) 2019-06-20 10:31:19 +02:00
clippered.md issue #3012: add test (#3021) 2018-12-18 15:02:49 +01:00
coryhurst.md Silent keyword in info function in init (#2459) 2018-06-18 12:24:21 +02:00
d99kris.md Rename d99kris to d99kris.md 2017-12-17 13:44:55 +01:00
danielhers.md Signed contributor agreement 2017-11-08 16:28:56 +02:00
danielkingai2.md Don't use numpy directly for similarity (#3362) 2019-03-06 22:58:38 +00:00
danielruf.md chore: cache dependencies (#2418) 2018-06-11 00:22:41 +02:00
darindf.md Fix error (#2802) 2018-09-26 21:31:03 +02:00
demfier.md Port over contributor agreements 2017-10-24 20:13:34 +02:00
demongolem.md Update tokenizer.md for construction example (#3790) 2019-06-16 14:32:56 +02:00
DeNeutoy.md Allow vectors to be optional in init-model, more robust string counting (#3155) 2019-01-14 23:48:30 +01:00
DimaBryuhanov.md DimaBryuhanov.md (#2590) 2018-07-24 18:43:27 +02:00
Dobita21.md Create Dobita21.md (#3614) 2019-04-18 12:51:54 +02:00
DoomCoder.md Improved polish tokenizer and stop words. (#2974) 2019-02-08 14:27:21 +11:00
doug-descombaz.md Port over contributor agreements 2018-03-24 17:17:37 +01:00
DuyguA.md added contributor agreement for DuyguA 2017-11-13 15:45:13 +01:00
dvsrepo.md Adds contributor agreement dvsrepo 2017-04-07 11:58:28 +02:00
elbaulp.md Changed learning rate by its param name. (#3855) 2019-06-20 10:29:20 +02:00
Eleni170.md Add support for Greek language (#2535) 2018-07-10 13:48:38 +02:00
EmilStenstrom.md Add abbreviations from UD_Swedish-Talbanken (#2613) 2018-08-07 13:53:17 +02:00
emulbreh.md Add contributor agreement for emulbreh 2018-02-13 13:40:33 +01:00
enerrio.md add contributor agreement for @enerrio 2018-02-15 12:43:04 -08:00
estr4ng7d.md Marathi Language Support (#3767) 2019-05-24 14:29:42 +02:00
F0rge1cE.md Fix offset bug in loading pre-trained word2vec. (#3689) 2019-05-06 23:00:38 +02:00
filipecaixeta.md Add words to portuguese language _num_words (#2759) 2018-09-14 12:30:16 +02:00
fizban99.md Create fizban99.md (#3601) 2019-04-17 11:22:19 +02:00
foufaster.md Create foufaster.md (#3179) 2019-01-21 15:45:54 +01:00
frascuchon.md Include universe spec for spacy-wordnet component (#2919) 2018-11-13 23:54:46 +01:00
free-variation.md Fixed spaCy+Keras example (#2763) 2018-09-15 13:06:39 +02:00
fsonntag.md Add contributer aggreement 2017-11-19 16:30:35 +01:00
fucking-signup.md Add contributor agreement 2018-01-08 03:08:57 +01:00
gavrieltal.md Initialize trues to 0.0 in training example (#3004) 2018-12-03 01:33:22 +01:00
giannisdaras.md Greek language optimizations (#2558) 2018-07-18 18:51:38 +02:00
Gizzio.md Improved polish tokenizer and stop words. (#2974) 2019-02-08 14:27:21 +11:00
graus.md adds textpipe to universe (#3500) [ci skip] 2019-03-28 15:13:19 +01:00
greenriverrus.md Added contributor agreement 2017-11-26 22:14:08 +03:00
grivaz.md Introduces a bulk merge function, in order to solve issue #653 (#2696) 2018-09-10 16:41:42 +02:00
henry860916.md update response after calling add_pipe (#3661) 2019-05-01 12:02:18 +02:00
himkt.md fix wrong indexing (#2416) 2018-06-19 10:20:57 +02:00
HiromuHota.md Tags are joined with a comma and padded with asterisks (#3491) 2019-03-28 16:17:31 +01:00
honnibal.md Port over contributor agreements 2017-10-24 20:13:34 +02:00
howl-anderson.md Port over contributor agreements 2018-03-24 17:17:37 +01:00
hugovk.md CLA 2017-11-29 10:25:20 +02:00
iann0036.md Port over contributor agreements 2018-03-24 17:17:37 +01:00
idealley.md Added agrement (#2374) 2018-05-26 18:19:08 +02:00
ines.md Port over contributor agreements 2017-10-24 20:13:34 +02:00
intrafindBreno.md Create intrafindBreno.md (#3814) 2019-06-03 18:33:09 +02:00
IsaacHaze.md Adds contributor agreement IsaacHaze 2017-12-10 23:15:06 +01:00
ivigamberdiev.md Update links and http -> https (#3532) 2019-04-02 17:36:22 +02:00
ivyleavedtoadflax.md Add missing comma to NN example in docs (#2255) 2018-04-28 14:56:00 +02:00
jacopofar.md Visual C++ link updated (#2842) (closes #2841) [ci skip] 2018-10-12 14:59:45 +02:00
janimo.md Update Romanian stopword list (#2316) 2018-05-10 12:16:56 +02:00
jarib.md Add three missing tags from the nb tag map (#3085) 2018-12-27 14:48:40 +01:00
jeannefukumaru.md fix typos in tag_map flagged by python -m debug-data (#3542) 2019-04-05 12:06:09 +02:00
jerbob92.md Port over contributor agreements 2017-10-24 20:13:34 +02:00
jimregan.md CLA 2017-06-26 21:32:48 +01:00
JKhakpour.md Add Persian(Farsi) language support (#2797) 2018-10-13 15:31:49 +02:00
johnhaley81.md Port over contributor agreements 2017-10-24 20:13:34 +02:00
juliamakogon.md Ukrainian language added. Small fixes in Russian (#3241) 2019-02-07 21:05:11 +01:00
justindujardin.md Port over contributor agreements 2018-03-24 17:17:37 +01:00
kabirkhan.md Add optional id property to EntityRuler patterns (#3591) 2019-06-16 13:29:04 +02:00
katarkor.md changed tag_map, morph_rules, lemmatizer for Norwegian (#2565) 2018-07-19 19:38:24 +02:00
katrinleinweber.md Formalise citation info (#2167) 2018-03-30 10:34:14 +02:00
kbulygin.md Fix the first nlp call for ja (closes #2901) (#3065) 2018-12-18 15:01:06 +01:00
keshan.md Adding basic support for Sinhala language. (#2788) 2018-09-25 12:18:25 +02:00
khellan.md Norwegian tweaks (#3894) 2019-07-08 10:28:47 +02:00
Kimahriman.md Fixed auto linking after download and added simple test to check 2018-01-29 14:25:21 -05:00
kimfalk.md agreeing to the contributor agreement. 2017-12-19 15:31:52 +01:00
knoxdw.md Test and fix for Issue #2219 (#2272) 2018-05-03 18:40:46 +02:00
kognate.md Added support for serializing overwrite and ent_id_sep (#3918) 2019-07-08 17:28:28 +02:00
kororo.md Add ExcelCy into Universe list (#2572) 2018-07-19 19:28:33 +02:00
kowaalczyk.md Improved polish tokenizer and stop words. (#2974) 2019-02-08 14:27:21 +11:00
kwhumphreys.md add agreement 2018-01-03 13:00:14 -08:00
lauraBaakman.md Fix contributor agreement 2019-02-07 20:56:13 +01:00
ldorigo.md Submit contributor agreement (#3705) 2019-05-10 14:19:18 +02:00
ligser.md Fill contributer agreement 2017-11-11 11:39:31 +03:00
Loghijiaha.md Tamil language support (#3154) 2019-01-14 15:32:30 +01:00
LRAbbade.md Adding my contributor agreement (#2315) 2018-05-09 21:25:05 +02:00
luvogels.md Update luvogels.md 2017-04-27 10:42:07 +02:00
magnusburton.md Initial commit for Swedish 2016-12-20 11:05:06 +01:00
markulrich.md Use correct local parameter in example MyComponent (and added markulrich.md contributor file) 2017-11-22 15:59:08 -08:00
MartinoMensio.md added contributor agreement 2017-11-17 16:30:09 +01:00
MateuszOlko.md Improved polish tokenizer and stop words. (#2974) 2019-02-08 14:27:21 +11:00
MathiasDesch.md Add spaCy Contributor Agreement 2017-11-09 11:56:47 +01:00
mauryaland.md Update stop_words.py for French language (#2310) 2018-05-09 12:04:38 +02:00
mbkupfer.md added contributor agreement for mbkupfer (#2738) 2018-09-10 11:32:03 +02:00
mdcclv.md Port over contributor agreements 2017-10-24 20:13:34 +02:00
mdda.md Create mdda.md 2017-12-18 18:09:27 +08:00
melanuria.pdf Add contributor agreement (see #1672) 2017-12-20 22:00:12 +01:00
mikelibg.md Removed space in docs + added contributor indo (#2909) 2018-11-08 14:18:25 +01:00
mirfan899.md Add Urdu Language Support (#2430) 2018-06-22 11:14:03 +02:00
miroli.md Remove incorrect lemma lookup gäng->gänga (#2252) 2018-04-28 14:54:41 +02:00
mn3mos.md #2211 - Support for ssl certs config on download command (#2212) 2018-05-03 18:37:02 +02:00
mollerhoj.md Add Danish lemmatizer (#2184) 2018-04-07 19:07:28 +02:00
moreymat.md Support CUDA 10 (#3126) 2019-01-09 03:10:45 +01:00
mpszumowski.md Fix bug in CLI iob and ner converter (#2392) (fixes #2385) 2018-05-30 12:28:44 +02:00
mpuig.md Catalan Language Support (#2940) 2018-11-26 15:25:47 +01:00
msklvsk.md fix UD data file extensions (#2425) 2018-06-08 14:26:11 +02:00
munozbravo.md Overwrites default getter for like_num in Spanish by adding _num_words and like_num to lex_attrs.py (#3810) (closes #3803)) 2019-06-02 12:22:57 +02:00
nipunsadvilkar.md Incorrect Token attribute ent_iob_ description (#3800) 2019-05-31 16:50:45 +02:00
NirantK.md Create NirantK.md (#3807) [ci skip] 2019-06-01 17:36:06 +02:00
njsmith.md When calling getoption() in conftest.py, pass a default option (#2709) 2018-09-03 09:57:52 +02:00
nlptown.md Improved Dutch language resources and Dutch lemmatization (#3409) 2019-04-03 14:13:26 +02:00
nourshalabi.md Additions to Arabic stop words. (#2422) 2018-06-08 02:33:23 +02:00
NSchrading.md Re-add existing contributor agreements 2016-11-09 16:42:02 +01:00
ohenrik.md Added contributors agreement 2018-01-25 11:05:29 +01:00
oroszgy.md Accepted contributor agreement. 2016-12-26 22:37:02 +01:00
ottosulin.md Port over contributor agreements 2018-03-24 17:17:37 +01:00
oxinabox.md squashme 2018-02-09 23:19:11 +08:00
ozcankasal.md trilyon forgotten (#3083) 2018-12-27 14:44:23 +01:00
pbnsilva.md Adds contributor agreement 2018-01-11 17:40:12 +01:00
phojnacki.md agreement of contributor, may I introduce a tiny pl languge contribution (#2799) 2018-09-27 12:25:22 +02:00
pickfire.md Add myself to contributors (#3575) 2019-04-11 11:31:04 +02:00
pktippa.md Added pktippa contributor agreement 2018-02-07 15:37:28 +05:30
polm.md Port over contributor agreements 2017-10-24 20:13:34 +02:00
Poluglottos.md Fix typo 2019-03-16 13:45:46 +01:00
PolyglotOpenstreetmap.md Create PolyglotOpenstreetmap.md (#3198) 2019-01-26 14:02:54 +01:00
pzelasko.md Less norm computations in token similarity (#2730) 2018-09-05 21:50:23 +02:00
ramananbalakrishnan.md Support single value for attribute list in doc.to_array 2017-10-19 17:00:41 +05:30
retnuh.md Update call to mkdir() to create the parents (#3139) 2019-01-11 03:02:18 +01:00
richardpaulhudson.md Request to include Holmes in spaCy Universe (#3685) 2019-05-08 02:42:03 +02:00
rokasramas.md Lithuanian language support (#3895) 2019-07-08 10:25:22 +02:00
roshni-b.md updates for Bengali language (#3286) 2019-02-18 10:02:28 +01:00
RvanNieuwpoort.md Signed Contributer Agreement by Rob van Nieuwpoort 2016-12-15 10:34:19 +01:00
sainathadapa.md Basic support for Telugu language (#2751) 2018-09-10 11:53:18 +02:00
sammous.md Updating description and code snippet spacy-lefff (#2623) 2018-08-02 17:25:27 +02:00
SamuelLKane.md fix(util): fix decaying function output (#3495) 2019-03-28 13:24:47 +01:00
savkov.md Renamed the file 2018-01-11 17:49:29 +00:00
shuvanon.md Port over contributor agreements 2017-10-24 20:13:34 +02:00
skrcode.md Restore contributor agreement 2018-03-31 14:06:37 +02:00
socool.md Update Thai tokenizer_exception list (#3529) 2019-04-03 09:13:36 +02:00
sorenlind.md Add contributor agreement. 2017-11-24 15:29:54 +01:00
suchow.md Re-add existing contributor agreements 2016-11-09 16:42:02 +01:00
svlandeg.md Fix small typo bug in French regexp + relevant unit test (#2980) 2018-11-29 20:16:13 +01:00
therealronnie.md Addresses Issue #2228 - Deserialization fails when using tensor=False or sentiment=False (#2230) 2018-05-01 13:40:22 +02:00
thomasopsomer.md add contributor agreement 2018-01-28 20:12:05 +01:00
tjkemp.md Enhancement/lang fi examples (#2547) 2018-07-15 09:50:27 +02:00
tmetzl.md Merge branch 'master' into develop [ci skip] 2019-03-11 12:23:24 +01:00
tokestermw.md added contributor agreement 2017-11-17 17:27:20 -08:00
trungtv.md Add support for Vietnamese in spaCy by leveraging Pyvi, an external Vietnamese tokenizer (#2155) 2018-03-29 12:19:51 +02:00
tyburam.md Lex _attrs for polish language (#2750) 2018-09-10 11:53:57 +02:00
tzano.md Add Arabic language (#2314) 2018-05-15 00:27:19 +02:00
ujwal-narayan.md Enhancing Kannada language Resources (#3755) 2019-05-20 12:56:10 +02:00
ursachec.md Add contributor agreement for @ursachec 2018-02-13 20:49:42 +01:00
uwol.md added contributor agreement 2017-11-05 12:33:43 +01:00
vikaskyadav.md Create vikaskyadav.md (#2621) 2018-08-02 14:03:44 +02:00
vishnumenon.md Fix the code for FACILITIY entities (#2324) 2018-05-12 15:19:17 +02:00
vsolovyov.md Re-add existing contributor agreements 2016-11-09 16:42:02 +01:00
w4nderlust.md Added Ludwig among the projects (#3548) [ci skip] 2019-04-07 13:01:26 +02:00
wallinm1.md [finnish] Add contributor file 2017-02-04 13:54:10 +02:00
wannaphongcom.md Update Thai tag map (#3480) 2019-03-25 16:53:26 +01:00
willismonroe.md Port over contributor agreements 2018-03-24 17:17:37 +01:00
willprice.md Improve random prefix generation in displaCy arcs (#3096) 2018-12-27 14:46:02 +01:00
wojtuch.md User correct variable name in the examples (#2664) 2018-08-13 22:21:24 +02:00
wxv.md Fix is_ascii documentation and create contributor file (#2988) 2018-11-30 15:57:58 +01:00
x-ji.md Fix venv command examples (#2560) [ci skip] 2018-07-18 10:31:24 +02:00
xssChauhan.md Change default output format from jsonl to json for cli convert (#3583) (closes #3523) 2019-04-12 11:31:23 +02:00
yaph.md Create yaph.md so I can contribute (#3658) 2019-04-29 19:43:06 +02:00
yuukos.md Port over contributor agreements 2017-10-24 20:13:34 +02:00
zqhZY.md add contributors.md 2017-12-28 18:04:52 +08:00