spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-12-11 12:14:30 +03:00

Author	SHA1	Message	Date
svlandeg	d7c0f40a96	update comment	2021-01-22 18:55:18 +01:00
svlandeg	a071279bc7	add speed comparison to docs	2021-01-22 18:46:35 +01:00
svlandeg	b132cb3036	update accuracies for new a1 models	2021-01-21 20:24:05 +01:00
Adriane Boyd	d0236136a2	Fix default config init in Transformer API docs (#6781)	2021-01-21 23:18:03 +08:00
Sofie Van Landeghem	e680efc7cc	Set annotations in update (#6767) * bump to 3.0.0rc4 * do set_annotations in component update calls * update docs and remove set_annotations flag * fix EL test	2021-01-20 11:49:25 +11:00
Sofie Van Landeghem	57640aa838	warn when frozen components break listener pattern (#6766) * warn when frozen components break listener pattern * few notes in the documentation * update arg name * formatting * cleanup * specify listeners return type	2021-01-20 11:12:35 +11:00
Ines Montani	4a1029a9b6	Add infobox [ci skip]	2021-01-19 19:18:39 +11:00
Adriane Boyd	7cd5c9e098	Add xx_sent_ud_sm model to website	2021-01-19 09:02:35 +01:00
Ines Montani	76e25afcd7	Merge pull request #6757 from adrianeboyd/docs/mk-ru-langs [ci skip] Update languages for website	2021-01-19 11:10:48 +11:00
Ines Montani	f50502dad7	Update docs [ci skip]	2021-01-19 00:22:47 +11:00
Adriane Boyd	e8f6400923	Update languages for website * Add Macedonian * Add Russian dependencies * Switch Chinese dependency to spacy-pkuseg	2021-01-18 14:09:34 +01:00
Ines Montani	2ae8dfbb93	Fix website [ci skip]	2021-01-18 22:31:32 +11:00
Ines Montani	09cacbb7ee	Fix website [ci skip]	2021-01-18 11:37:04 +11:00
Sofie Van Landeghem	fed8f48965	raise NotImplementedError when noun_chunks iterator is not implemented (#6711) * raise NotImplementedError when noun_chunks iterator is not implemented * bring back, fix and document span.noun_chunks * formatting Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>	2021-01-17 19:56:05 +08:00
Adriane Boyd	bf0cdae8d4	Add token_splitter component (#6726) * Add long_token_splitter component Add a `long_token_splitter` component for use with transformer pipelines. This component splits up long tokens like URLs into smaller tokens. This is particularly relevant for pretrained pipelines with `strided_spans`, since the user can't change the length of the span `window` and may not wish to preprocess the input texts. The `long_token_splitter` splits tokens that are at least `long_token_length` tokens long into smaller tokens of `split_length` size. Notes: * Since this is intended for use as the first component in a pipeline, the token splitter does not try to preserve any token annotation. * API docs to come when the API is stable. * Adjust API, add test * Fix name in factory	2021-01-17 19:54:41 +08:00
Adriane Boyd	9328dd5625	Handle unset token.morph in Morphologizer (#6704) * Handle unset token.morph in Morphologizer Handle unset `token.morph` in `Morphologizer.initialize` and `Morphologizer.get_loss`. If both `token.morph` and `token.pos` are unset, treat the annotation as missing rather than empty. * Add token.has_morph()	2021-01-15 17:20:10 +01:00
Adriane Boyd	0c936004d1	Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master-rc3	2021-01-14 11:49:58 +01:00
Matthew Honnibal	f277bfdf0f	Add SpanGroup and Graph container types to represent arbitrary annotations (#6696) * Draft out initial Spans data structure * Initial span group commit * Basic span group support on Doc * Basic test for span group * Compile span_group.pyx * Draft addition of SpanGroup to DocBin * Add deserialization for SpanGroup * Add tests for serializing SpanGroup * Fix serialization of SpanGroup * Add EdgeC and GraphC structs * Add draft Graph data structure * Compile graph * More work on Graph * Update GraphC * Upd graph * Fix walk functions * Let Graph take nodes and edges on construction * Fix walking and getting * Add graph tests * Fix import * Add module with the SpanGroups dict thingy * Update test * Rename 'span_groups' attribute * Try to fix c++11 compilation * Fix test * Update DocBin * Try to fix compilation * Try to fix graph * Improve SpanGroup docstrings * Add doc.spans to documentation * Fix serialization * Tidy up and add docs * Update docs [ci skip] * Add SpanGroup.has_overlap * WIP updated Graph API * Start testing new Graph API * Update Graph tests * Update Graph * Add docstring Co-authored-by: Ines Montani <ines@ines.io>	2021-01-14 17:30:41 +11:00
Ines Montani	29c3ca7e34	Fix SVG integration [ci skip]	2021-01-14 13:33:41 +11:00
Antonio Miras	b4bd8f347a	spaCy Universe: New project; SpacyDotNet (#6702) * Universe: SpacyDotNet a .NET Core spaCy wrapper * Signed contributor agreement Co-authored-by: Antonio Miras <antonio@amiras.net>	2021-01-13 12:47:30 +11:00
Adriane Boyd	a45d89f09a	Add initialize.before_init and after_init callbacks Add `initialize.before_init` and `initialize.after_init` callbacks to the config. The `initialize.before_init` callback is a place to implement one-time tokenizer customizations that are then saved with the model.	2021-01-12 13:07:44 +01:00
Sofie Van Landeghem	a612a5ba3f	fix small typos (#6698)	2021-01-08 09:39:47 +01:00
Sofie Van Landeghem	75d9019343	Fix types of Tok2Vec encoding architectures (#6442) * fix TorchBiLSTMEncoder documentation * ensure the types of the encoding Tok2vec layers are correct * update references from v1 to v2 for the new architectures	2021-01-07 16:39:27 +11:00
Sofie Van Landeghem	82ae95267a	Docs for pretrain architectures (#6605) * document pretraining architectures * formatting * bit more info * small fixes	2021-01-06 16:12:30 +11:00
Sofie Van Landeghem	afc5714d32	multi-label textcat component (#6474) * multi-label textcat component * formatting * fix comment * cleanup * fix from #6481 * random edit to push the tests * add explicit error when textcat is called with multi-label gold data * fix error nr * small fix	2021-01-06 13:07:14 +11:00
Ines Montani	6f83abb971	Merge pull request #6647 from svlandeg/feature/init_config_overwrite	2021-01-05 14:59:04 +11:00
Ines Montani	3614472e29	Merge pull request #6646 from svlandeg/feature/cli-docs [ci skip]	2021-01-05 13:52:49 +11:00
Ines Montani	9c078a5885	Update formatting for consistency [ci skip]	2021-01-05 13:52:28 +11:00
Ines Montani	a9e845426f	Use --force for consistency and add docs	2021-01-05 13:49:59 +11:00
svlandeg	d5ff0fecf8	add docs	2020-12-30 14:01:13 +01:00
svlandeg	2fa23b0304	fix capitalization for link	2020-12-29 15:01:22 +01:00
svlandeg	43cc6aea93	remove non-existing link	2020-12-29 14:59:39 +01:00
svlandeg	543073bf9d	add pretrain example	2020-12-29 14:51:23 +01:00
svlandeg	1d0ef98873	move example	2020-12-29 14:46:03 +01:00
svlandeg	20113b8063	add train CLI example	2020-12-29 14:44:56 +01:00
Sofie Van Landeghem	87562e470d	fix backticks in docs (#6635)	2020-12-27 22:12:37 +01:00
Sofie Van Landeghem	8df5b7f513	fix documentation of 'path' in tokenizer.to_disk (#6634)	2020-12-27 22:01:06 +01:00
Sofie Van Landeghem	282a3b49ea	Fix parser resizing when there is no upper layer (#6460) * allow resizing of the parser model even when upper=False * update from spacy.TransitionBasedParser.v1 to v2 * bugfix	2020-12-18 18:56:57 +08:00
Gareth Sparks	efc229c3f4	Doc.char_span arg: alignment_mode (#6591) Currently labeled "mode", actually "alignment_mode"	2020-12-18 09:54:56 +01:00
Jeno Pizarro	a6fe35a0f9	Update universe.json	2020-12-15 21:53:20 -05:00
Jeno Pizarro	343a44abe9	Merge branch 'master' of https://github.com/explosion/spaCy	2020-12-15 21:49:46 -05:00
Ines Montani	85ca8c2bdd	Merge branch 'master' into develop	2020-12-11 13:44:41 +11:00
Ines Montani	fb43a30a71	Merge pull request #6545 from svlandeg/feature/discussions [ci skip]	2020-12-11 10:20:35 +11:00
Ines Montani	76cfd89dea	Update site.json	2020-12-11 10:19:42 +11:00
Ines Montani	43a69eecb7	Update site.json	2020-12-11 10:05:21 +11:00
svlandeg	d156b423ae	remove gitter and reddit links	2020-12-10 20:41:02 +01:00
svlandeg	5afa567767	replace gitter with discussions in 101	2020-12-10 20:17:36 +01:00
svlandeg	ae1ccf2b04	update link to discussion forum	2020-12-10 20:02:49 +01:00
Adriane Boyd	27bb75e2a0	Docs and extras updates for v2.3.5 * Update install instructions for updated packages * Add `cuda110` and `cuda111` extras, remove upper `cupy` pins (only compatible with `thinc>=7.4.4`)	2020-12-10 15:34:34 +01:00
Ines Montani	513c4e332a	Include custom code via spacy package command (#6531)	2020-12-10 20:36:46 +08:00
Ines Montani	2a6043fabb	Merge pull request #6530 from explosion/feature/init-config-cpu-gpu	2020-12-10 09:38:46 +11:00
Ines Montani	9d32e839d3	Merge branch 'develop' into feature/init-config-cpu-gpu	2020-12-10 08:50:53 +11:00
Adriane Boyd	972820e2b3	Add batch_size to data formats docs	2020-12-09 12:44:04 +01:00
Adriane Boyd	80ac8af1bf	Format	2020-12-09 12:44:01 +01:00
Adriane Boyd	795b5bd049	Update website/docs/api/language.md Co-authored-by: Ines Montani <ines@ines.io>	2020-12-09 12:23:32 +01:00
Adriane Boyd	fa8fa474a3	Add nlp.batch_size setting Add a default `batch_size` setting for `Language.pipe` and `Language.evaluate` as `nlp.batch_size`.	2020-12-09 09:13:26 +01:00
Ines Montani	04b3068747	Revert landing [ci skip]	2020-12-09 11:20:45 +11:00
Ines Montani	34449b66fd	Update matcher.md	2020-12-09 11:09:45 +11:00
Ines Montani	1980203229	Merge branch 'master' into pr/6444	2020-12-09 11:09:40 +11:00
Ines Montani	05a2812ae0	Merge branch 'develop' into pr/6444	2020-12-09 11:04:03 +11:00
Ines Montani	758ad6c3cd	Make CPU the default for init config	2020-12-09 11:00:51 +11:00
Ines Montani	8921364579	Merge pull request #6521 from explosion/feature/config-stdin Allow reading config from stdin in spacy train	2020-12-08 22:07:43 +11:00
Ines Montani	94a5a9814f	Update argument handling and documentation	2020-12-08 20:41:18 +11:00
Adriane Boyd	5ceac425ee	Remove non-working --use-chars from train CLI Remove the non-working `--use-chars` option from the train CLI. The implementation of the option across component types and the CLI settings could be fixed, but the `CharacterEmbed` model does not work on GPU in v2 so it's better to remove it.	2020-12-08 08:30:00 +01:00
Ines Montani	ef59ce783b	Adjust install instructions [ci skip]	2020-12-08 18:06:50 +11:00
Sofie Van Landeghem	2c27093c5f	require_cpu functionality (#6336) * add require_cpu from Thinc 8.0.0rc2 * add docs * fix test if cupy is not installed	2020-12-08 14:42:40 +08:00
Ines Montani	d8e01ca931	Merge pull request #6391 from adrianeboyd/docs/install-guide	2020-12-08 07:42:16 +01:00
Ines Montani	ee2ec52f48	Merge pull request #6409 from svlandeg/feature/trf-docs	2020-12-08 06:32:10 +01:00
Ines Montani	c2b196c2c1	Merge pull request #6419 from svlandeg/feature/rel-docs	2020-12-08 06:30:41 +01:00
Ines Montani	82e88f0e3b	Merge pull request #6379 from svlandeg/fix/labels-constructor	2020-12-08 06:29:56 +01:00
Adriane Boyd	1442d2f213	Improve simple training example in v3 migration (#6438) * Create the examples once * Use the examples in the initialization * Provide the batch size * Fix `begin_training` migration example	2020-11-30 09:39:45 +08:00
Adriane Boyd	03ae77e603	Add SPACY as a Matcher attribute (#6463)	2020-11-30 09:34:50 +08:00
Ines Montani	d21d2c2e59	Don't multiply accuracy by 100	2020-11-27 15:15:51 +08:00
Adriane Boyd	724831b066	Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master * Update Macedonian for v3 * Update Turkish for v3	2020-11-25 11:49:34 +01:00
Jacob Bortell	fe9009911a	Update rule-based-matching.md (#6421) * Update rule-based-matching.md Clarified case-sensititivy of dictionary-referencing attributes (POS/TAG/DEP/etc). Clarified "Type" column header to "Value Type" * Update rule-based-matching.md Improved clarity of wording	2020-11-24 16:20:19 +01:00
Adriane Boyd	6f133877aa	Update source install instructions * Don't recommend an editable install in the default source instructions. * Use `pip install --no-build-isolation` for editable installs. * Remove reference to `virtualenv`.	2020-11-24 14:44:13 +01:00
Yusuke Mori	e3ac90b035	Avoid a SyntaxError in self-attentive-parser (#6428) * Avoid a SyntaxError in self-attentive-parser Fix a usage of quotation marks in the example of spaCy Universe self-attentive-parser * Create forest1988.md Fill in the spaCy contributor agreement	2020-11-22 21:59:37 +01:00
svlandeg	218abaa69a	typo	2020-11-20 22:36:49 +01:00
svlandeg	e861e928df	more small corrections	2020-11-20 22:29:58 +01:00
svlandeg	5ac0867427	final fixes	2020-11-20 22:18:53 +01:00
svlandeg	331ec83493	edits and updates to implementing REL component docs	2020-11-20 21:41:52 +01:00
svlandeg	4a3e611abc	small fixes and formatting	2020-11-20 15:55:05 +01:00
svlandeg	124f49feb6	update REL model code	2020-11-20 15:25:20 +01:00
svlandeg	636be3c791	Merge remote-tracking branch 'upstream/develop' into feature/trf-docs	2020-11-19 14:15:35 +01:00
Sofie Van Landeghem	165993d8e5	fix typo in transformer docs (#6404)	2020-11-19 14:11:38 +01:00
M. Revuelta Espinosa	51232ffb9e	Update universe.json (include PatternOmatic) (#6399) Request to include PatternOmatic in spaCy Universe Adds @revuel to contributors	2020-11-19 13:15:50 +01:00
Adriane Boyd	3cf6479467	Fix JSON in #6395	2020-11-17 15:25:41 +01:00
Sam Edwardes	78913a4f95	Added spaCyTextBlob to universe.json (#6395)	2020-11-17 14:38:34 +01:00
Adriane Boyd	96726ec1f6	Fix DocBin init in training example (#6396)	2020-11-17 14:36:44 +01:00
Adriane Boyd	ed32fa80cd	Update source install instructions * Use `pip install` instead of `python setup.py install` * For developers recommend: * `python setup.py build_ext --inplace -j N` * `python setup.py develop`	2020-11-16 10:13:51 +01:00
svlandeg	99d0412b6e	add link to REL project	2020-11-15 18:35:56 +01:00
svlandeg	73fc1ed963	remove labels from morphologizer constructor	2020-11-11 21:48:50 +01:00
svlandeg	fcd79e0655	remove set_morphology from docs	2020-11-11 21:32:34 +01:00
Ines Montani	3ca5c7082d	Use pip install . in quickstart [ci skip]	2020-11-10 17:27:49 +08:00
Ines Montani	de6453940e	Merge pull request #6305 from svlandeg/feature/score-docs [ci skip]	2020-11-10 02:52:11 +01:00
Ines Montani	4d337eedf2	Merge pull request #6322 from medspacy/master	2020-11-10 02:47:29 +01:00
Ines Montani	d7950c5ada	Merge pull request #6297 from adrianeboyd/docs/nightly-conda-install [ci skip]	2020-11-10 02:45:52 +01:00
Ines Montani	448bfbdc30	Remove conda from nightly install widget [ci skip]	2020-11-10 09:44:52 +08:00
svlandeg	789fb3d124	add docs for upstream argument of TransformerListener	2020-11-09 21:42:58 +01:00
Ines Montani	363ac73c72	Update docs [ci skip]	2020-11-09 12:43:26 +08:00
Adriane Boyd	8644ee3e3f	Update TIGER link and tag description (#6344)	2020-11-05 09:33:00 +01:00
Sofie Van Landeghem	8ef056cf98	fix embed_size in Entity Linker architecture (#6343)	2020-11-04 22:20:13 +01:00
Ines Montani	019a1dd5e8	Fix v3 overview [ci skip]	2020-11-03 18:10:06 +01:00
Adriane Boyd	a4b32b9552	Handle missing reference values in scorer (#6286) * Handle missing reference values in scorer Handle missing values in reference doc during scoring where it is possible to detect an unset state for the attribute. If no reference docs contain annotation, `None` is returned instead of a score. `spacy evaluate` displays `-` for missing scores and the missing scores are saved as `None`/`null` in the metrics. Attributes without unset states: * `token.head`: relies on `token.dep` to recognize unset values * `doc.cats`: unable to handle missing annotation Additional changes: * add optional `has_annotation` check to `score_scans` to replace `doc.sents` hack * update `score_token_attr_per_feat` to handle missing and empty morph representations * fix bug in `Doc.has_annotation` for normalization of `IS_SENT_START` vs. `SENT_START` * Fix import * Update return types	2020-11-03 15:47:18 +01:00
Alec Chapman	204c7c8a00	fix thumbnail link to be github raw url	2020-11-01 07:53:48 -07:00
Alec Chapman	73d22d96ff	add medspacy to universe and fix example w/ cov-bsv	2020-10-29 07:53:56 -06:00
Adriane Boyd	8cc5ed6771	Add Macedonian to website languages	2020-10-29 08:49:56 +01:00
Adriane Boyd	dc816bba9d	Fix node name typo in dependency matcher example (#6311)	2020-10-28 16:32:46 +01:00
Adriane Boyd	4dd86306e9	Add Nepali to supported languages on website (#6315)	2020-10-28 16:32:07 +01:00
svlandeg	77688b0072	fix config	2020-10-26 11:14:34 +01:00
svlandeg	5878ff6bcd	cleanup	2020-10-26 11:13:02 +01:00
svlandeg	e95d9caa87	small edits	2020-10-26 11:09:25 +01:00
svlandeg	a664994a81	adding score method to explanation of new component	2020-10-26 10:52:47 +01:00
Adriane Boyd	253480353c	Remove zh from quickstart extras	2020-10-23 11:39:25 +02:00
Adriane Boyd	af26886fff	Fix formatting	2020-10-23 11:38:14 +02:00
Adriane Boyd	c0b76f4c19	Add install step to "Compile from source"	2020-10-23 11:36:36 +02:00
Adriane Boyd	8fe7ede667	Add install step to source install quickstart	2020-10-23 11:34:43 +02:00
Adriane Boyd	4299a7f654	Setup / install / quickstart updates * Add `cuda110` to setup.cfg and quickstart dropdown * Switch to `pip` for pip-only packages in conda quickstart instructions * Update zh pkuseg install message with version range and conda * Remove `zh` from `extras_require` because the default doesn't require additional packages	2020-10-23 11:27:54 +02:00
Kunal Sharma	01aec7a313	Adding MindMeld to Universe JSON (#6275) * Adding Mindmeld to Universe JSON Mindmeld is a conversational AI platform for deep-domain voice interfaces and chatbots. https://www.mindmeld.com/ * Signing contribution agreement. Co-authored-by: kunshar2 <kunshar2@cisco.com>	2020-10-21 18:42:11 +02:00
Ines Montani	6523f2daac	Merge pull request #6273 from adrianeboyd/bugfix/detailed-scores-in-evaluate2	2020-10-20 10:03:09 +02:00
Adriane Boyd	fbe65b257b	Convert accuracy numbers on website models page	2020-10-19 18:55:55 +02:00
Ines Montani	b6b1c1e23c	Merge pull request #6271 from walterhenry/develop-proof [ci skip]	2020-10-19 16:31:43 +02:00
walterhenry	db24dc5614	Proofread remarks I think these may the last remarks for the nightly docs. Only two minor things actually.	2020-10-19 11:11:32 +02:00
Sofie Van Landeghem	75a202ce65	TextCat updates and fixes (#6263) * small fix in example imports * throw error when train_corpus or dev_corpus is not a string * small fix in custom logger example * limit macro_auc to labels with 2 annotations * fix typo * also create parents of output_dir if need be * update documentation of textcat scores * refactor TextCatEnsemble * fix tests for new AUC definition * bump to 3.0.0a42 * update docs * rename to spacy.TextCatEnsemble.v2 * spacy.TextCatEnsemble.v1 in legacy * cleanup * small fix * update to 3.0.0rc2 * fix import that got lost in merge * cursed IDE * fix two typos	2020-10-18 14:50:41 +02:00
Ines Montani	e2f3c4e12d	Fix robots [ci skip]	2020-10-16 17:44:13 +02:00
Adriane Boyd	e896803792	Add and update website license links	2020-10-16 17:01:52 +02:00
Ines Montani	c655742b8b	Remove docs references to starters for now (see #6262) [ci skip]	2020-10-16 15:46:34 +02:00
Ines Montani	3851300e80	Update landing [ci skip]	2020-10-16 11:46:33 +02:00
Ines Montani	c968d1560f	Fix docs example [ci skip]	2020-10-16 11:33:20 +02:00
Ines Montani	ba1e004049	Fix typo [ci skip]	2020-10-15 23:39:04 +02:00
Ines Montani	32dc4f4796	Sort models sidebar alphabetically [ci skip]	2020-10-15 22:47:16 +02:00
Ines Montani	20f80587d6	Merge pull request #6257 from walterhenry/develop-proof A few tiny typo fixes to push through with release of nightly	2020-10-15 18:17:30 +02:00
walterhenry	75b7f86383	Three small typos Some little typos since v3.0 is out.	2020-10-15 18:06:37 +02:00
Ines Montani	09dbbe75d7	Update docs [ci skip]	2020-10-15 17:27:24 +02:00
Ines Montani	7f05ccc170	Update docs [ci skip]	2020-10-15 12:35:30 +02:00
Ines Montani	4fa869e6f7	Update docs [ci skip]	2020-10-15 11:16:06 +02:00
Ines Montani	178760855f	Merge branch 'develop' into master-tmp	2020-10-15 09:06:03 +02:00
Ines Montani	abeafcbc08	Update docs [ci skip]	2020-10-15 08:58:30 +02:00
Ines Montani	050aa1e0e2	Update languages.json [ci skip]	2020-10-14 20:51:50 +02:00
Ines Montani	a966c271f7	Update models docs [ci skip]	2020-10-14 20:50:23 +02:00
Ines Montani	a2d4aaee70	Apply suggestions from code review	2020-10-14 19:51:36 +02:00
Ines Montani	d94e241fce	Merge branch 'develop' into pr/6253	2020-10-14 16:55:46 +02:00
Ines Montani	cb47f25cda	Merge pull request #6252 from svlandeg/fix/docs	2020-10-14 16:43:12 +02:00
walterhenry	6af585dba5	New batch of proofs Just tiny fixes to the docs as a proofreader	2020-10-14 16:37:57 +02:00
svlandeg	478a14a619	fix few typos	2020-10-14 15:01:19 +02:00
Ines Montani	1aa8e8f2af	Update docs [ci skip]	2020-10-14 14:58:45 +02:00
Ines Montani	4d99d2b94a	Update docs [ci skip]	2020-10-13 11:38:52 +02:00
svlandeg	40276fd3be	update NEL docs after latest refactor	2020-10-12 11:41:27 +02:00
svlandeg	08cb085f6c	Merge remote-tracking branch 'upstream/develop' into fix/various	2020-10-09 17:01:27 +02:00
Ines Montani	97ff090e49	Fix docs example [ci skip]	2020-10-09 16:03:57 +02:00

2537 Commits