spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-07-13 09:42:26 +03:00

Author	SHA1	Message	Date
Zhuoru Lin	10d88b09bb	Bugfix/fix wikidata train entity linker (#4509) * Fix labels_discard Nonetype iteration error * Contributor agreement for Zhuoru Lin * Enhance EntityLinker.predict() to handle labels_discard is None case.	2019-10-24 12:52:59 +02:00
adrianeboyd	8516e9d53b	Support train dict format as JSONL (#4471) * Support train dict format as JSONL * Add (overly simple) check for dict vs. tuple to read JSONL lines as either train dicts or train tuples * Extend JSON/JSONL roundtrip conversion tests using `docs_to_json()` and `GoldCorpus.train_tuples` * Revert docs to default JSON output with convert	2019-10-23 16:01:44 +02:00
adrianeboyd	7fc39f124c	Fix logic in rules+model entity example [ci skip] (#4510)	2019-10-23 14:41:21 +02:00
Ines Montani	835498d24f	Update azure-pipelines.yml	2019-10-23 14:31:09 +02:00
Matthw Honnibal	9ca109597d	Fix parser model for depth 1	2019-10-23 05:14:00 +02:00
Matthw Honnibal	9e32d8271c	Avoid failing if cupy random can'tbe set	2019-10-23 05:13:28 +02:00
Matthw Honnibal	f8bf5b7fe5	Fx gpu_id arg in pretrain	2019-10-23 04:41:40 +02:00
Matthw Honnibal	95648dcdd7	Pass parser settings better	2019-10-23 04:41:20 +02:00
Matthw Honnibal	8892ce98aa	Improve settings in _ml	2019-10-23 04:40:10 +02:00
Matthw Honnibal	6c8785a238	Add option for GPU ID to pretrain	2019-10-22 22:44:24 +02:00
Matthew Honnibal	ca7f0e669e	Set version to v2.2.2.dev1	2019-10-22 20:11:25 +02:00
Matthew Honnibal	9489c5f6b2	Clip most_similar to range [-1, 1] (fixes #4506) (#4507) * Clip most_similar to range [-1, 1] * Add/fix vectors tests * Fix test	2019-10-22 20:10:42 +02:00
Ines Montani	74a19aeb1c	Add xfailing test [ci skip]	2019-10-22 18:18:43 +02:00
Matthew Honnibal	3f6cb618a9	Set version to v2.2.2.dev0	2019-10-22 17:47:36 +02:00
Sofie Van Landeghem	48886afc78	prevent zero-length mem alloc (#4429) * raise specific error when removing a matcher rule that doesn't exist * rephrasing * goldparse init: allocate fields only if doc is not empty * avoid zero length alloc in saving tokenizer cache * avoid allocating zero length mem in matcher * asserts to avoid allocating zero length mem * fix zero-length allocation in matcher * bump cymem version * revert cymem version bump	2019-10-22 16:54:33 +02:00
adrianeboyd	3dfc764577	Free pointers in parser activations (#4486) * Free pointers in ActivationsC * Restructure alloc/free for parser activations * Rewrite/restructure to have allocation and free in parallel functions in `_parser_model` rather than partially in `_parseC()` in `Parser`. * Remove `resize_activations` from `_parser_model.pxd`.	2019-10-22 15:06:44 +02:00
Ines Montani	388ea03065	Update universe.json [ci skip]	2019-10-22 14:54:47 +02:00
Kabir Khan	8a7a30ea1d	Add cookiecutter-spacy-fastapi to spacy universe (#4498)	2019-10-22 14:50:40 +02:00
tamuhey	fb89f6792b	refactor: remove unused variable (#4499)	2019-10-22 14:38:17 +02:00
Ines Montani	4659435573	Fix argument type in PhraseMatcher.add docs (closes #4496 ) [ci skip]	2019-10-22 14:37:30 +02:00
Julin S	3ee15fce0d	Update information about Rasa (#4492) Rasa has been updated and rasa core and rasa nlu have been merged.	2019-10-22 14:32:31 +02:00
Matthw Honnibal	1dce86c555	Pass settings better from parser	2019-10-22 03:26:43 +02:00
Matthw Honnibal	ab7f85dfa2	Use Mish layer if pieces==1 in CNN	2019-10-22 03:26:27 +02:00
Matthw Honnibal	7ef3bcdc1c	Support cnn_maxout_pieces arg in pretrain	2019-10-22 03:25:30 +02:00
Ines Montani	b0bdb18b4d	Fix typo	2019-10-21 18:36:22 +02:00
Ines Montani	945b8ecfba	Relax plac version requirement	2019-10-21 18:08:33 +02:00
gustavengstrom	050e2445a8	Adding noun_chunks to the Swedish language model (sv) (#4422) * Create syntax_iterators.py Replica of spacy/lang/fr/syntax_iterators.py * Added import statements for SYNTAX_ITERATORS * Create gustavengstrom.md * Added "dobj" to list of labels in noun_chunks method and a test_noun_chunks method to the Swedish language model. * Delete README-checkpoint.md Co-authored-by: Gustav <gustav@davcon.se> Co-authored-by: Ines Montani <ines@ines.io>	2019-10-21 12:57:06 +02:00
Ines Montani	b2f88e2060	Fix formatting [ci skip]	2019-10-21 12:26:07 +02:00
adrianeboyd	f5c551a43a	Checks/errors related to ill-formed IOB input in CLI convert and debug-data (#4487) * Error for ill-formed input to iob_to_biluo() Check for empty label in iob_to_biluo(), which can result from ill-formed input. * Check for empty NER label in debug-data	2019-10-21 12:20:28 +02:00
adrianeboyd	3195a8f170	Add Entity Linking to menu (#4489)	2019-10-21 12:17:30 +02:00
Sofie Van Landeghem	d5d55312b2	prevent division by zero in most_similar method (#4488)	2019-10-21 12:04:46 +02:00
Matthw Honnibal	5a272d9029	Add method to decode predicted characters	2019-10-21 03:56:15 +02:00
Matthw Honnibal	f2808f78a7	Fix parser_maxout_pieces for depth=0	2019-10-21 01:25:03 +02:00
Matthw Honnibal	b4e0040d10	Pass Tok2Vec settings a bit better	2019-10-21 01:11:47 +02:00
Matthw Honnibal	fef50277d7	Support parser depth=0	2019-10-21 01:11:30 +02:00
Ines Montani	a98d1cd58e	Update Thinc version and remove GPU ops	2019-10-20 19:01:45 +02:00
Matthw Honnibal	eba89f08bd	Use chars loss in ClozeMultitask	2019-10-20 17:47:15 +02:00
Matthw Honnibal	77af446d04	Move characters_loss function, add window option	2019-10-20 17:47:00 +02:00
Matthw Honnibal	5a601ef46a	Add cnn_window option to pretrain	2019-10-20 17:46:34 +02:00
Matthw Honnibal	3a67aa857e	Clarify parser model CPU/GPU code The previous version worked with previous thinc, but only because some thinc ops happened to have gpu/cpu compatible implementations. It's better to call the right Ops instance.	2019-10-20 17:15:17 +02:00
Pepe Berba	7772d5d3c5	Update `vocab.get_vector` docs to include features on Fasttext ngram (#4464) * Update `vocab.get_vector` * Added contrib agreement	2019-10-20 01:28:18 +02:00
Ines Montani	2c96a5e5b0	Remove lemma attrs on BaseDefaults (#4468)	2019-10-19 23:18:09 +02:00
Ines Montani	f6af3cf8d9	Add 3.8 classifier [ci skip]	2019-10-19 18:13:25 +02:00
Matthw Honnibal	ee56c6a4e1	Implement character-based pretraining objective	2019-10-19 11:42:38 +02:00
Ines Montani	5e59c9b3ee	Fix unicode strings in examples [ci skip]	2019-10-18 18:47:59 +02:00
adrianeboyd	8d3de90bc4	Suppress convert output if writing to stdout (#4472)	2019-10-18 18:12:59 +02:00
Matthw Honnibal	36de9bf72a	Add more spacy pretrain options	2019-10-18 17:24:13 +02:00
Matthw Honnibal	f3e2aaea1e	Fix GPU selection in spacy train	2019-10-18 17:23:55 +02:00
Matthw Honnibal	49c0adc706	Add character-based bilstm tok2vec	2019-10-18 17:23:37 +02:00
Matthw Honnibal	727ede6599	Make character copy non-blocking	2019-10-18 17:23:19 +02:00

11108 Commits