spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-07-16 03:02:41 +03:00

Author	SHA1	Message	Date
Matt Upson	9a1d3b63fb	Add missing default to .set_extension (#2297) Failing to set a default, method, or getter results in a ValueError: ValueError: [E083] Error setting extension: only one of `default`, `method`, or `getter` (plus optional `setter`) is allowed. Got: 0	2018-05-04 18:47:01 +02:00
Matthew Honnibal	2c4a6d66fa	Merge master into develop. Big merge, many conflicts -- need to review	2018-04-29 14:49:26 +02:00
Ines Montani	49cee4af92	💫 Interactive code examples, spaCy Universe and various docs improvements (#2274) * Integrate Python kernel via Binder * Add live model test for languages with examples * Update docs and code examples * Adjust margin (if not bootstrapped) * Add binder version to global config * Update terminal and executable code mixins * Pass attributes through infobox and section * Hide v-cloak * Fix example * Take out model comparison for now * Add meta text for compat * Remove chart.js dependency * Tidy up and simplify JS and port big components over to Vue * Remove chartjs example * Add Twitter icon * Add purple stylesheet option * Add utility for hand cursor (special cases only) * Add transition classes * Add small option for section * Add thumb object for small round thumbnail images * Allow unset code block language via "none" value (workaround to still allow unset language to default to DEFAULT_SYNTAX) * Pass through attributes * Add syntax highlighting definitions for Julia, R and Docker * Add website icon * Remove user survey from navigation * Don't hide GitHub icon on small screens * Make top navigation scrollable on small screens * Remove old resources page and references to it * Add Universe * Add helper functions for better page URL and title * Update site description * Increment versions * Update preview images * Update mentions of resources * Fix image * Fix social images * Fix problem with cover sizing and floats * Add divider and move badges into heading * Add docstrings * Reference converting section * Add section on converting word vectors * Move converting section to custom section and fix formatting * Remove old fastText example * Move extensions content to own section Keep weird ID to not break permalinks for now (we don't want to rewrite URLs if not absolutely necessary) * Use better component example and add factories section * Add note on larger model * Use better example for non-vector * Remove similarity in context section Only works via small models with tensors so has always been kind of confusing * Add note on init-model command * Fix lightning tour examples and make executable if possible * Add spacy train CLI section to train * Fix formatting and add video * Fix formatting * Fix textcat example description (resolves #2246) * Add dummy file to try resolve conflict * Delete dummy file * Tidy up [ci skip] * Ensure sufficient height of loading container * Add loading animation to universe * Update Thebelab build and use better startup message * Fix asset versioning * Fix typo [ci skip] * Add note on project idea label	2018-04-29 02:06:46 +02:00
Matthew Honnibal	cca7e7ad11	Merge branch 'master' of https://github.com/explosion/spaCy	2018-03-29 20:27:06 +02:00
Matthew Honnibal	68ad366935	Improve train_new_entity_type example	2018-03-29 20:26:41 +02:00
ines	07b8c255a5	Update example with note to install requests	2018-03-28 12:46:27 +02:00
Matthew Honnibal	1f7229f40f	Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop" This reverts commit `c9ba3d3c2d`, reversing changes made to `92c26a35d4`.	2018-03-27 19:23:02 +02:00
Justin DuJardin	4eeb178856	Add example using TensorBoard standalone projector - the tensorboard standalone project expects a different set of files than the plugin to TensorFlow.	2018-03-25 21:50:13 -07:00
ines	4ec2809eb5	Port over TensorBoard example	2018-03-24 17:15:48 +01:00
Matthew Honnibal	00557c5fdd	Add example of NER multitask objective	2018-01-21 19:46:37 +01:00
avinash	b379c9d7d3	typos corrected	2018-01-03 16:54:22 +05:30
mpuels	1e8147aec7	fix: Add missing period in train data	2017-12-13 10:51:05 +01:00
mpuels	ee4d6fdd40	Fix typo in comment	2017-12-09 13:14:57 +01:00
ines	726fb2d0b5	Use fewer iterations by default to avoid overfitting on blank model (resolves #1632)	2017-11-23 15:27:12 +01:00
ines	ec08996000	Add note on tags matching tokenization (see #1613)	2017-11-20 15:12:47 +01:00
ines	1a38575de3	Make example Python 2 compatible (see #1617)	2017-11-20 13:57:51 +01:00
ines	7d5afadf5e	Update vectors_loc description	2017-11-17 14:57:11 +01:00
ines	c57e05bec1	Make sure nr_dim is an int In some languages (e.g. Dutch), the nr_dim is extracted as a byte string, causing an error down the line.	2017-11-17 14:56:27 +01:00
yogendrasoni	334ed433b2	rstrip line before rsplit loading english fast text giving error because line contains new line at the end and rsplit is splitting it incorrectly	2017-11-15 13:55:08 +05:30
Matthew Honnibal	f0e28e8ae5	Make fasttext reader accommodate whitespace	2017-11-12 12:07:13 +01:00
ines	f36fab39b0	Don't rename component in intent parser example (resolves #1551) Otherwise, the default saved model won't know that it's supposed to create spaCy's 'parser'.	2017-11-10 23:35:38 +01:00
Ines Montani	1a23a0f87e	Remove broken link (resolves #1541)	2017-11-10 12:28:39 +01:00
ines	3597a29c24	Update fastText vectors example (see #1525) Add option to specify language, and add note on "lang" being required to save out model	2017-11-09 14:54:39 +01:00
ines	33b84f4c39	Change clear_vectors to reset_vectors (resolves #1516)	2017-11-08 18:11:23 +01:00
ines	89bd40b821	Fix print statement in textcat training example (resolves #1515)	2017-11-08 17:17:40 +01:00
ines	a09c096d3c	Get docs ready for v2.0.0	2017-11-07 12:00:43 +01:00
ines	173b1551af	Update examples	2017-11-07 01:22:30 +01:00
ines	1b1c9105b4	Update example compatibility statements	2017-11-07 01:11:45 +01:00
ines	8fb48b9b91	Update and document new util functions	2017-11-07 00:22:43 +01:00
Matthew Honnibal	d7016d4050	Update intent parser example	2017-11-06 23:31:11 +01:00
ines	fe498b3d5e	Update training examples to use "simple style"	2017-11-06 23:14:04 +01:00
ines	c646365e2f	Port over changes and add note on compat (see #1445)	2017-11-06 13:58:34 +01:00
ines	2dca9e71a1	Add notes on catastrophic forgetting (see #1496)	2017-11-06 13:17:02 +01:00
Matthew Honnibal	717e8124fb	Update Keras sentiment analysis example	2017-11-05 17:11:00 +01:00
Matthew Honnibal	cfb83c231c	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-11-04 23:08:19 +01:00
Matthew Honnibal	ba0201de07	Update multiprocessing example	2017-11-04 23:07:57 +01:00
ines	70a9504560	Add inbetween print statement	2017-11-04 23:06:55 +01:00
Matthew Honnibal	e033162a1d	Update tagger training example	2017-11-01 21:49:08 +01:00
ines	8f1d3fc3ee	Update textcat example	2017-11-01 17:09:22 +01:00
Matthew Honnibal	dad8f09fba	Fix print statements in text classifier example	2017-11-01 16:34:31 +01:00
ines	bfe17b7df1	Fix begin_training if get_gold_tuples is None	2017-11-01 13:14:31 +01:00
ines	0ca152a015	Fix syntax error	2017-11-01 00:43:28 +01:00
ines	4b196fdf7f	Fix formatting	2017-11-01 00:43:22 +01:00
ines	33af6ac69a	Use even smaller example size 100 was still too much, so try 20 instead	2017-10-30 19:46:45 +01:00
ines	f02b0af821	Fix path and use smaller example size 500 was too large and caused laggy rendering	2017-10-30 19:44:35 +01:00
ines	18dde7869a	Update training data docs and add vocab JSONL	2017-10-30 19:40:05 +01:00
ines	b5643d8575	Update intent parser docs and add to usage docs	2017-10-27 04:49:05 +02:00
ines	9dfca0f2f8	Add example for custom intent parser	2017-10-27 03:55:11 +02:00
ines	4d272e25ee	Fix examples	2017-10-27 03:55:04 +02:00
ines	44f83b35bc	Update pipeline component examples to use plac	2017-10-27 02:58:14 +02:00
ines	af28ca1ba0	Move example to pipeline directory	2017-10-27 02:00:01 +02:00
ines	1d69a46cd4	Update multi-processing example and add to docs	2017-10-27 01:58:55 +02:00
ines	4eabaafd66	Update docstring and example	2017-10-27 01:50:44 +02:00
ines	ed69bd69f4	Update parallel tagging example	2017-10-27 01:48:52 +02:00
ines	096a80170d	Remove old example files	2017-10-27 01:48:39 +02:00
ines	a7b9074b4c	Update textcat training example and docs	2017-10-27 00:48:45 +02:00
ines	b61866a2e4	Update textcat example	2017-10-27 00:32:19 +02:00
ines	f81cc0bd1c	Fix usage of disable_pipes	2017-10-27 00:31:30 +02:00
ines	b7b285971f	Update examples README	2017-10-26 18:47:11 +02:00
ines	cc2917c9e8	Update fastText example and add to examples in docs	2017-10-26 18:47:02 +02:00
ines	db843735d3	Remove outdated examples	2017-10-26 18:46:25 +02:00
ines	daed7ff8fe	Update information extraction examples	2017-10-26 18:46:11 +02:00
ines	bca5372fb1	Clean up examples	2017-10-26 17:32:59 +02:00
ines	f57043e6fe	Update docstring	2017-10-26 16:29:08 +02:00
ines	b90e958975	Update tagger and parser examples and add to docs	2017-10-26 16:27:42 +02:00
ines	f1529463a8	Update tagger training example	2017-10-26 16:19:02 +02:00
ines	e44bbb5361	Remove old example	2017-10-26 16:12:41 +02:00
ines	421c3837e8	Fix formatting	2017-10-26 16:11:25 +02:00
ines	4d896171ae	Use plac annotations for arguments	2017-10-26 16:11:20 +02:00
ines	c3b681e5fb	Use plac annotations for arguments and add n_iter	2017-10-26 16:11:05 +02:00
ines	bc2c92f22d	Use plac annotations for arguments	2017-10-26 16:10:56 +02:00
ines	b5c74dbb34	Update parser training example	2017-10-26 15:15:37 +02:00
ines	586b9047fd	Use create_pipe instead of importing the entity recognizer	2017-10-26 15:15:26 +02:00
ines	d425ede7e9	Fix example	2017-10-26 15:15:08 +02:00
ines	9d58673aaf	Update train_ner example for spaCy v2.0	2017-10-26 14:24:12 +02:00
ines	e904075f35	Remove stray print statements	2017-10-26 14:24:00 +02:00
ines	c30258c3a2	Remove old example	2017-10-26 14:23:52 +02:00
ines	615c315d70	Update train_new_entity_type example to use disable_pipes	2017-10-25 14:56:53 +02:00
ines	2b8e7c45e0	Use better training data JSON example	2017-10-24 16:00:56 +02:00
ines	9bf5751064	Pretty-print JSON	2017-10-24 12:22:17 +02:00
ines	6675755005	Add training data JSON example	2017-10-24 12:05:10 +02:00
Jeroen Bobbeldijk	84c6c20d1c	Fix #1444: fix pipeline logic and wrong parameter in update call	2017-10-22 15:18:36 +02:00
Jeffrey Gerard	5ba970b495	minor cleanup	2017-10-12 12:34:46 -07:00
Jeffrey Gerard	39d3cbfdba	Bugfix example script train_ner_standalone.py, fails after training	2017-10-12 11:39:12 -07:00
ines	f4ae6763b9	Fix consistency of imports from spacy.tokens in examples	2017-10-11 02:30:40 +02:00
Matthew Honnibal	e0a9b02b67	Merge Span._ and Span.as_doc methods	2017-10-09 22:00:15 -05:00
ines	6679117000	Add pipeline component examples	2017-10-10 04:26:06 +02:00
Matthew Honnibal	e79fc41ff8	Merge pull request #1391 from explosion/feature/multilabel-textcat 💫 Fix multi-label support for text classification	2017-10-09 04:22:31 +02:00
Matthew Honnibal	563f46f026	Fix multi-label support for text classification The TextCategorizer class is supposed to support multi-label text classification, and allow training data to contain missing values. For this to work, the gradient of the loss should be 0 when labels are missing. Instead, there was no way to actually denote "missing" in the GoldParse class, and so the TextCategorizer class treated the label set within gold.cats as complete. To fix this, we change GoldParse.cats to be a dict instead of a list. The GoldParse.cats dict should map to floats, with 1. denoting 'present' and 0. denoting 'absent'. Gradients are zeroed for categories absent from the gold.cats dict. A nice bonus is that you can also set values between 0 and 1 for partial membership. You can also set numeric values, if you're using a text classification model that uses an appropriate loss function. Unfortunately this is a breaking change; although the functionality was only recently introduced and hasn't been properly documented yet. I've updated the example script accordingly.	2017-10-05 18:43:02 -05:00
Matthew Honnibal	056b08c0df	Delete obsolete nn_text_class example	2017-10-05 18:27:10 +02:00
Matthew Honnibal	f1b86dff8c	Update textcat example	2017-10-04 15:12:28 +02:00
Matthew Honnibal	79a94bc166	Update textcat example	2017-10-04 14:55:30 +02:00
Matthew Honnibal	cbb1fbef80	Update train_ner_standalone example	2017-10-03 18:49:38 +02:00
Matthew Honnibal	38286b6f07	Add example loading Fast Text vectors	2017-10-01 23:40:02 +02:00
Matthew Honnibal	f92ab03dc8	Rename phrase matcher example	2017-09-20 22:51:58 +02:00
Matthew Honnibal	01858e9b59	Fix PhraseMatcher example	2017-09-20 22:51:41 +02:00
Matthew Honnibal	027a5d8b75	Update train_ner_standalone example	2017-09-15 10:36:46 +02:00
Matthew Honnibal	683d81bb49	Update example for adding entity type	2017-09-14 16:15:59 +02:00
Matthew Honnibal	c16ef0a85c	Clarify train textcat example	2017-07-29 21:59:27 +02:00
Matthew Honnibal	54a539a113	Finish text classifier example	2017-07-23 00:34:12 +02:00
Matthew Honnibal	2bc7d87c70	Add example for training text classifier	2017-07-22 20:15:32 +02:00
ines	992559bf9a	Fix formatting and remove unused imports	2017-06-01 12:47:18 +02:00
Matthew Honnibal	5c30466c95	Update NER training example	2017-05-31 13:42:12 +02:00
akYoung	c158cdb1da	Corrections for model test example The sentences of test data in sentence entailment example should be generated with integers limited to vocab_size.	2017-05-03 22:41:23 +08:00
Matthew Honnibal	2da16adcc2	Add dropout option for parser and NER Dropout can now be specified in the `Parser.update()` method via the `drop` keyword argument, e.g. nlp.entity.update(doc, gold, drop=0.4) This will randomly drop 40% of features, and multiply the value of the others by 1. / 0.4. This may be useful for generalising from small data sets. This commit also patches the examples/training/train_new_entity_type.py example, to use dropout and fix the output (previously it did not output the learned entity).	2017-04-27 13:18:39 +02:00
Matthew Honnibal	0605b95f2e	Merge branch 'master' of https://github.com/explosion/spaCy	2017-04-18 13:48:00 +02:00
Matthew Honnibal	2f84626417	Fix train_new_entity_type example	2017-04-18 13:47:36 +02:00
Ines Montani	e7ae3b7cc2	Fix formatting and typo (closes #967)	2017-04-16 23:56:12 +02:00
Ines Montani	734b0a4e4a	Update train_new_entity_type.py	2017-04-16 23:42:16 +02:00
ines	264af6cd17	Add documentation	2017-04-16 20:37:46 +02:00
ines	c7adca58a9	Tidy up example and only save/test if output_directory is not None	2017-04-16 16:55:01 +02:00
Matthew Honnibal	40e3024241	Move standalone NER training script into examples directory	2017-04-15 16:13:42 +02:00
Matthew Honnibal	b9c26aae11	Remove neptune refs from new train example	2017-04-15 16:13:17 +02:00
Matthew Honnibal	c729d72fc6	Add new example for training new entity types	2017-04-15 16:11:06 +02:00
Matthew Honnibal	a7626bd7fd	Tmp commit to example	2017-04-15 15:43:14 +02:00
Matthew Honnibal	97b83c74dc	WIP on training example	2017-04-14 23:54:27 +02:00
Kumaran Rajendhiran	3f55d6afae	Update README	2017-04-05 16:59:52 +05:30
Kumaran Rajendhiran	47d7137c83	Set max_length to 100 for demo and evaluate	2017-04-05 16:48:35 +05:30
Kumaran Rajendhiran	10e8dcdfdb	Remove not needed parameters from function	2017-04-05 16:20:47 +05:30
Matthew Honnibal	07726cf0a6	Add example of standalone NER training	2017-03-19 15:01:38 +01:00
Matthew Honnibal	f028f8ad28	Remove unfinished examples	2017-02-18 11:04:41 +01:00
Matthew Honnibal	c031c677cc	Remove unused model_dir option As noted in #845, the `model_dir` argument was not being used. I've removed it for now, although it would be good to have this option restored and working.	2017-02-18 10:38:22 +01:00
Matthew Honnibal	16ce7409e4	Merge branch 'master' of https://github.com/explosion/spaCy	2017-01-31 13:27:34 -06:00
Matthew Honnibal	80aa4e114b	Fix x keras deep learning example	2017-01-31 13:27:13 -06:00
Matthew Honnibal	ab70f6e18d	Update NER training example	2017-01-27 12:27:10 +01:00
Ines Montani	853130bcf8	Update installation instructions (see #727)	2017-01-14 22:12:42 +01:00
Matthew Honnibal	5a319060b9	Merge branch 'master' of https://github.com/explosion/spaCy	2016-12-20 16:26:57 -06:00
Matthew Honnibal	7793e2ad82	Fix use of dropout in sentiment analysis LSTM example	2016-12-20 16:26:38 -06:00
Christos Savvopoulos	c19b83f6ae	use model_dir inside of load_model	2016-12-12 20:23:24 +00:00
Christos Savvopoulos	93cf4af701	actually commit load_ner.py	2016-12-12 20:13:33 +00:00
Christos Savvopoulos	ad54a929f8	train_ner should save vocab; add load_ner example	2016-12-12 20:09:49 +00:00
Matthew Honnibal	d0c999e0ad	Add config.py for paddle example	2016-11-20 23:24:51 +01:00
Matthew Honnibal	d75fe7c19a	Update paddle example	2016-11-20 21:45:08 +01:00
Matthew Honnibal	1ef541ddff	Add train.sh for paddle	2016-11-20 21:44:33 +01:00
Matthew Honnibal	001abe2b9d	Update config.py	2016-11-20 03:45:51 +01:00
Matthew Honnibal	409a18bd42	Add paddle sentiment example	2016-11-20 03:35:23 +01:00
Matthew Honnibal	e7eac08819	Work on paddle example	2016-11-20 03:29:36 +01:00
Matthew Honnibal	1ed40682a3	Set vectors in chainer example	2016-11-19 18:42:58 -06:00
Matthew Honnibal	b701a08249	Fix embedding in chainer sentiment example	2016-11-19 19:05:37 +01:00
Matthew Honnibal	8a2de46fcb	Fix GPU usage in chainer example	2016-11-19 10:58:00 -06:00
Matthew Honnibal	4c84aae571	Merge branch 'master' of https://github.com/explosion/spaCy	2016-11-19 02:41:17 -06:00
Matthew Honnibal	3195c52741	Add WIP Chainer sentiment analysis code.	2016-11-19 09:27:59 +01:00
Matthew Honnibal	ff5ab75f5e	Add partial embedding updates to Parikh model, fix dropout, other corrections.	2016-11-18 06:32:12 -06:00
Matthew Honnibal	718e66a7b9	Minibatch the forward pass. The output argmax is incorrect...	2016-11-16 06:15:28 -06:00
Matthew Honnibal	8f053fd943	Add flag to toggle GPU to DyNet code	2016-11-16 05:51:00 -06:00
Matthew Honnibal	3a31c3a961	Merge branch 'master' of https://github.com/explosion/spaCy	2016-11-16 05:49:42 -06:00
Kyle P. Johnson	d105771a07	Add setup directions for data dir This script's data needs are not intuitive. I have added a note explaining that (a) it expects pos/neg polarity data, (b) the structure of the data dir (train/test), and (c) a standard resource for such polarity data.	2016-11-13 10:08:16 -08:00
Kyle P. Johnson	c8d3694e2d	Ch lex.repvec to lex.vector For preventing the AttributeError: `File "spacy/lexeme.pyx", line 159, in spacy.lexeme.Lexeme.repvec.__get__ (spacy/lexeme.cpp:5016) AttributeError: lex.repvec has been renamed to lex.vector`	2016-11-13 09:54:42 -08:00
Matthew Honnibal	389e8b700e	Fix conflict	2016-11-13 08:52:20 -06:00
Matthew Honnibal	12a7b05360	Merge branch 'master' of https://github.com/explosion/spaCy	2016-11-13 08:49:07 -06:00

318 Commits