spaCy

Mirror of https://github.com/explosion/spaCy.git, synced 2025-11-25 20:36:02 +03:00

Author	SHA1	Message	Date
Ines Montani	ce1d441de5	Add docs for Vectors.most_similar [ci skip]	2019-10-03 14:29:47 +02:00
Ines Montani	80cf385f65	Update v2-2.md [ci skip]	2019-10-02 16:58:21 +02:00
Ines Montani	b6670bf0c2	Use consistent spelling	2019-10-02 10:37:39 +02:00
Ines Montani	475e3188ce	Add docs on filtering overlapping spans for merging (resolves #4352) [ci skip]	2019-10-01 21:59:50 +02:00
Ines Montani	0dd127bb00	Update v2-2.md [ci skip]	2019-10-01 21:37:06 +02:00
Ines Montani	cf65a80f36	Refactor lemmatizer and data table integration (#4353) * Move test * Allow default in Lookups.get_table * Start with blank tables in Lookups.from_bytes * Refactor lemmatizer to hold instance of Lookups * Get lookups table within the lemmatization methods to make sure it references the correct table (even if the table was replaced or modified, e.g. when loading a model from disk) * Deprecate other arguments on Lemmatizer.__init__ and expect Lookups for consistency * Remove old and unsupported Lemmatizer.load classmethod * Refactor language-specific lemmatizers to inherit as much as possible from base class and override only what they need * Update tests and docs * Fix more tests * Fix lemmatizer * Upgrade pytest to try and fix weird CI errors * Try pytest 4.6.5	2019-10-01 21:36:03 +02:00
Ines Montani	bc7e7db208	Fix wording [ci skip]	2019-10-01 14:20:44 +02:00
Ines Montani	2a3a4565cd	Update infobox [ci skip]	2019-10-01 14:19:34 +02:00
Ines Montani	66aa0d479f	Update v2.2 page [ci skip]	2019-10-01 14:11:05 +02:00
Ines Montani	a8a1800f2a	Update lemma data documentation [ci skip]	2019-10-01 13:22:13 +02:00
Ines Montani	932ad9cb91	Fix typos and formatting [ci skip]	2019-10-01 12:30:04 +02:00
Ines Montani	3d8fd4b461	Revert #4334	2019-09-29 17:32:12 +02:00
Ines Montani	3bd4da068e	Fix link [ci skip]	2019-09-29 17:30:38 +02:00
Ines Montani	089f44cc56	Update serialization docs [ci skip]	2019-09-29 17:11:13 +02:00
Ines Montani	c9cd516d96	Move tests out of package (#4334) * Move tests out of package * Fix typo	2019-09-28 18:05:00 +02:00
Ines Montani	10742d3219	Update v2 docs [ci skip]	2019-09-28 15:57:22 +02:00
Ines Montani	f8d1e2f214	Update CLI docs [ci skip]	2019-09-28 13:12:30 +02:00
Ines Montani	59beab8405	Update v2-2.md [ci skip]	2019-09-27 18:10:43 +02:00
Ines Montani	685e4b2554	Update v2-2.md [ci skip]	2019-09-27 16:35:01 +02:00
Ines Montani	aad66d9bb9	Document PhraseMatcher.remove [ci skip]	2019-09-27 16:34:53 +02:00
Ines Montani	eb0649e38e	Fix tag [ci skip]	2019-09-26 16:22:33 +02:00
Ines Montani	da9a869d3f	Update vectors name docs [ci skip]	2019-09-26 16:21:32 +02:00
Em Zhan	aafa091541	Fix typo in documentation (#4322) * Fix typo 'probj' instead of 'pobj' * Add spaCy contributor agreement for zqianem	2019-09-25 19:42:18 +02:00
Matthew Honnibal	92ed4dc5e0	Allow vectors name to be set in init-model (#4321) * Allow vectors name to be specified in init-model * Document --vectors-name argument to init-model * Update website/docs/api/cli.md Co-Authored-By: Ines Montani <ines@ines.io>	2019-09-25 13:11:00 +02:00
Ines Montani	197406de1d	Update v2-2.md [ci skip]	2019-09-19 14:33:58 +02:00
Ines Montani	ddc09b08ed	Update v2-2.md [ci skip]	2019-09-19 00:58:30 +02:00
Matthew Honnibal	e2047576c4	Fix merge conflict	2019-09-18 21:42:11 +02:00
Matthew Honnibal	46c02d25b1	Merge changes to test_ner	2019-09-18 21:41:24 +02:00
Ines Montani	9c940eab94	Update version in examples [ci skip]	2019-09-18 21:23:26 +02:00
Ines Montani	f873548f6c	Add backwards incompatibility [ci skip]	2019-09-18 21:21:48 +02:00
Ines Montani	6ebdc5f7d2	Update download docs [ci skip]	2019-09-18 21:21:39 +02:00
Ines Montani	dd1810f05a	Update DocBin and add docs	2019-09-18 20:23:21 +02:00
Ines Montani	d62690b3ba	Update examples	2019-09-18 19:57:36 +02:00
Ines Montani	bd435faddd	Add note about usage docs [ci skip]	2019-09-18 19:56:43 +02:00
Matthew Honnibal	931e96b6c7	DocPallet->DocBin in docs	2019-09-18 15:17:26 +02:00
Matthew Honnibal	f537cbeacc	Update v2-2 docs	2019-09-18 14:07:55 +02:00
Ines Montani	ee15fdfe88	Fix wording [ci skip]	2019-09-17 14:59:42 +02:00
Ines Montani	f566e69f38	Fix --vectors-loc docs (closes #4270)	2019-09-17 14:59:12 +02:00
Ines Montani	25c2b4b9a5	Improve init-model docs (see #4137)	2019-09-17 14:51:44 +02:00
Ines Montani	198b7e9789	Auto-format [ci skip]	2019-09-17 14:48:35 +02:00
adrianeboyd	b5d999e510	Add textcat to train CLI (#4226) * Add doc.cats to spacy.gold at the paragraph level Support `doc.cats` as `"cats": [{"label": string, "value": number}]` in the spacy JSON training format at the paragraph level. * `spacy.gold.docs_to_json()` writes `docs.cats` * `GoldCorpus` reads in cats in each `GoldParse` * Update instances of gold_tuples to handle cats Update iteration over gold_tuples / gold_parses to handle addition of cats at the paragraph level. * Add textcat to train CLI * Add textcat options to train CLI * Add textcat labels in `TextCategorizer.begin_training()` * Add textcat evaluation to `Scorer`: * For binary exclusive classes with provided label: F1 for label * For 2+ exclusive classes: F1 macro average * For multilabel (not exclusive): ROC AUC macro average (currently relying on sklearn) * Provide user info on textcat evaluation settings, potential incompatibilities * Provide pipeline to Scorer in `Language.evaluate` for textcat config * Customize train CLI output to include only metrics relevant to current pipeline * Add textcat evaluation to evaluate CLI * Fix handling of unset arguments and config params Fix handling of unset arguments and model config parameters in Scorer initialization. * Temporarily add sklearn requirement * Remove sklearn version number * Improve Scorer handling of models without textcats * Fixing Scorer handling of models without textcats * Update Scorer output for python 2.7 * Modify inf in Scorer for python 2.7 * Auto-format Also make small adjustments to make auto-formatting with black easier and produce nicer results * Move error message to Errors * Update documentation * Add cats to annotation JSON format [ci skip] * Fix tpl flag and docs [ci skip] * Switch to internal roc_auc_score Switch to internal `roc_auc_score()` adapted from scikit-learn. * Add AUCROCScore tests and improve errors/warnings * Add tests for AUCROCScore and roc_auc_score * Add missing error for only positive/negative values * Remove unnecessary warnings and errors * Make reduced roc_auc_score functions private Because most of the checks and warnings have been stripped for the internal functions and access is only intended through `ROCAUCScore`, make the functions for roc_auc_score adapted from scikit-learn private. * Check that data corresponds with multilabel flag Check that the training instances correspond with the multilabel flag, adding the multilabel flag if required. * Add textcat score to early stopping check * Add more checks to debug-data for textcat * Add example training data for textcat * Add more checks to textcat train CLI * Check configuration when extending base model * Fix typos * Update textcat example data * Provide licensing details and licenses for data * Remove two labels with no positive instances from jigsaw-toxic-comment data. Co-authored-by: Ines Montani <ines@ines.io>	2019-09-15 22:31:31 +02:00
Ines Montani	bab9976d9a	💫 Adjust Table API and add docs (#4289) * Adjust Table API and add docs * Add attributes and update description [ci skip] * Use strings.get_string_id instead of hash_string * Fix table method calls * Make orth arg in Lemmatizer.lookup optional Fall back to string, which is now handled by Table.__contains__ out-of-the-box * Fix method name * Auto-format	2019-09-15 22:08:13 +02:00
Ines Montani	16c2522791	Merge branch 'master' into develop	2019-09-14 16:42:01 +02:00
Ines Montani	86befc80bf	WIP: Add v2.2 page [ci skip]	2019-09-14 16:41:48 +02:00
Ines Montani	04d36d2471	Remove unused link [ci skip]	2019-09-14 16:41:19 +02:00
Ines Montani	5c8b5e68ec	Fix docs consistency [ci skip]	2019-09-14 16:23:37 +02:00
Ines Montani	bbf7337eaf	Update adding languages docs [ci skip]	2019-09-14 15:32:15 +02:00
Ines Montani	3126dd0904	Tidy up and auto-format [ci skip]	2019-09-14 12:58:06 +02:00
Ines Montani	3c3658ef9f	Merge branch 'master' into develop	2019-09-12 18:03:01 +02:00
Sofie Van Landeghem	9be4d1c105	Allow copying of user_data in as_doc (#4282) * Allow copying the user_data with as_doc + unit test * add option to docs * add typing * import fix * workaround to avoid bool clashing ... * bint instead of bool	2019-09-12 17:08:14 +02:00

661 Commits