spaCy

mirror of https://github.com/explosion/spaCy.git synced 2024-09-21 11:29:13 +03:00

Author	SHA1	Message	Date
pmbaumgartner	040bb061fd	Merge branch 'master' of github.com:pmbaumgartner/spaCy	2019-07-14 20:25:37 -04:00
pmbaumgartner	9a86d95ea2	fix custom attribute links	2019-07-14 20:23:54 -04:00
Ines Montani	02e12b0852	Update landing with IRL videos [ci skip]	2019-07-12 13:36:47 +02:00
Ines Montani	40cd03fc35	Improve EntityRuler serialization	2019-07-10 12:25:45 +02:00
Ines Montani	4ebb4865fe	Update languages.json	2019-07-10 11:19:48 +02:00
Ines Montani	8721849423	Update Scorer.ents_per_type	2019-07-10 11:19:28 +02:00
Ines Montani	ebe58e7fa1	Document gold.docs_to_json [ci skip]	2019-07-10 10:27:33 +02:00
Ines Montani	881f5bc401	Auto-format	2019-07-10 10:27:29 +02:00
Björn Böing	205c73a589	Update tokenizer and doc init example (#3939 ) * Fix Doc.to_json hyperlink * Update tokenizer and doc init examples * Change "matchin rules" to "punctuation rules" * Auto-format	2019-07-10 10:16:48 +02:00
cedar101	58f06e6180	Korean support (#3901 ) * start lang/ko * add test codes * using natto-py * add test_ko_tokenizer_full_tags() * spaCy contributor agreement * external dependency for ko * collections.namedtuple for python version < 3.5 * case fix * tuple unpacking * add jongseong(final consonant) * apply mecab option * Remove Pipfile for now Co-authored-by: Ines Montani <ines@ines.io>	2019-07-09 22:23:16 +02:00
Björn Böing	04982ccc40	Update pretrain to prevent unintended overwriting of weight fil… (#3902 ) * Update pretrain to prevent unintended overwriting of weight files for #3859 * Add '--epoch-start' to pretrain docs * Add missing pretrain arguments to bash example * Update doc tag for v2.1.5	2019-07-09 21:48:30 +02:00
Joshua Smith	2eb925bd05	Added an argument to `EntityRuler` constructor to pass attrs to… (#3919 ) * Preserve flags in EntityRuler The EntityRuler (explosion/spaCy#3526) does not preserve overwrite flags (or `ent_id_sep`) when serialized. This commit adds support for serialization/deserialization preserving overwrite and ent_id_sep flags. * add signed contributor agreement * flake8 cleanup mostly blank line issues. * mark test from the issue as needing a model The test from the issue needs some language model for serialization but the test wasn't originally marked correctly. * Adds `phrase_matcher_attr` to allow args to PhraseMatcher This is an added arg to pass to the `PhraseMatcher`. For example, this allows creation of a case insensitive phrase matcher when the `EntityRuler` is created. References explosion/spaCy#3822 * remove unneeded model loading The model didn't need to be loaded, and I replaced it with a change that doesn't require it (using existing fixtures) * updated docstring for new argument * updated docs to reflect new argument to the EntityRuler constructor * change tempdir handling to be compatible with python 2.7 * return conflicted code to entityruler Some stuff got cut out because of merge conflicts, this returns that code for the phrase_matcher_attr. * fixed typo in the code added back after conflicts * flake8 compliance When I deconflicted the branch there were some flake8 issues introduced. This resolves the spacing problems. * test changes: attempts to fix flaky test in python3.5 These tests seem to be a little flaky in 3.5 so I changed the check to avoid the comparisons that seem to fail sometimes.	2019-07-09 20:09:17 +02:00
Ines Montani	4f1dae1c6b	Update languages and examples (see #1107 )	2019-06-26 16:19:17 +02:00
Ines Montani	d361e380b8	Fix matcher callback example (closes #3862 )	2019-06-26 14:47:26 +02:00
Guillaume Claret	d7a519a922	Typo (#3865 ) * Typo * Add contributor agreement	2019-06-20 10:31:19 +02:00
Björn Böing	ebf5a04d6c	Update pretrain docs and add unsupported loss_func error (#3860 ) * Add error to `get_vectors_loss` for unsupported loss function of `pretrain` * Add missing "--loss-func" argument to pretrain docs. Update pretrain plac annotations to match docs. * Add missing quotation marks	2019-06-20 10:30:44 +02:00
Alejandro Alcalde	4866a7ee9e	Changed learning rate by its param name. (#3855 ) * Changed learning rate by its param name. I've been searching for a while for how the learning rate parameter is named. With `beta1` and `beta2` it's easy, as they are marked as code, but learning rate wasn't. I think writing the actual parameter name would be helpful. * Signing SCA	2019-06-20 10:29:20 +02:00
Ines Montani	81c12640ab	Auto-format [ci skip]	2019-06-16 14:33:20 +02:00
Greg Werner	9041a72d7f	Update tokenizer.md for construction example (#3790 ) * Update tokenizer.md for construction example Self contained example. You should really say what nlp is so that the example will work as is * Update CONTRIBUTOR_AGREEMENT.md * Restore contributor agreement * Adjust construction examples	2019-06-16 14:32:56 +02:00
BreakBB	d8573ee715	Update error raising for CLI pretrain to fix #3840 (#3843 ) * Add check for empty input file to CLI pretrain * Raise error if JSONL is not a dict or contains neither `tokens` nor `text` key * Skip empty values for correct pretrain keys and log a counter as warning * Add tests for CLI pretrain core function make_docs. * Add a short hint for the `tokens` key to the CLI pretrain docs * Add success message to CLI pretrain * Update model loading to fix the tests * Skip empty values and do not create docs out of it	2019-06-16 13:22:57 +02:00
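The input checks listed in the commit above can be sketched in plain Python. This is an illustration under stated assumptions, not spaCy's `make_docs` code: the helper name `usable_records`, its return shape, and the exception types are hypothetical.

```python
import json


def usable_records(lines):
    """Return (records, skipped) for an iterable of JSONL lines.

    Hypothetical helper, not part of spaCy: it only mirrors the rules
    named in the commit message.
    """
    records = []
    skipped = 0
    for line in lines:
        record = json.loads(line)
        if not isinstance(record, dict):
            raise TypeError("each JSONL line must be a JSON object")
        if "tokens" not in record and "text" not in record:
            raise ValueError("record needs a 'tokens' or a 'text' key")
        # Assumption: use 'tokens' when present, otherwise fall back to 'text'.
        value = record["tokens"] if "tokens" in record else record["text"]
        if not value:
            # Empty values are counted and skipped, not turned into docs.
            skipped += 1
            continue
        records.append(record)
    return records, skipped
```

The skipped count is returned rather than printed so that the caller can decide how to report it, for example as the warning the commit mentions.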
Motoki Wu	9c064e6ad9	Add resume logic to spacy pretrain (#3652 ) * Added ability to resume training * Add to readme * Remove duplicate entry	2019-06-12 13:29:23 +02:00
Ines Montani	511977ae5e	Update universe [ci skip]	2019-06-04 11:15:51 +02:00
Ramanan Balakrishnan	eb12703d10	minor fix to broken link in documentation (#3819 ) [ci skip]	2019-06-04 11:15:35 +02:00
Ines Montani	62ebc65c62	Update universe [ci skip]	2019-06-03 12:19:13 +02:00
Ines Montani	e703301129	Update universe [ci skip]	2019-06-02 13:55:55 +02:00
Ines Montani	892e72451f	Update universe [ci skip]	2019-06-02 12:58:12 +02:00
Ines Montani	42de5be90c	Tidy up universe [ci skip]	2019-06-02 12:38:48 +02:00
Nirant	638caba9b5	Add multiple packages to universe.json (#3809 ) [ci skip] * Add multiple packages to universe.json Added following packages: NLPArchitect, NLPRe, Chatterbot, alibi, NeuroNER * Auto-format * Update slogan (probably just copy-paste mistake) * Adjust formatting * Update tags / categories	2019-06-02 12:35:52 +02:00
Nirant	d4d1eab5e1	Add Baderlab/saber to universe.json (#3806 )	2019-06-01 17:36:40 +02:00
Ines Montani	6be7d07315	Update UNIVERSE.md	2019-06-01 16:37:06 +02:00
Ines Montani	0c74506c9c	Fix typos in docs (closes #3802 ) [ci skip]	2019-06-01 11:35:01 +02:00
Nipun Sadvilkar	1f13005751	Incorrect Token attribute ent_iob_ description (#3800 ) * Incorrect Token attribute ent_iob_ description * Add spaCy contributor agreement	2019-05-31 16:50:45 +02:00
Ramanan Balakrishnan	26c37c5a4d	fix all references to BILUO annotation format (#3797 )	2019-05-31 12:19:19 +02:00
mak	89379a7fa4	Corrected example model URL in requirements.txt (#3786 ) The URL used to show how to add a model to the requirements.txt had the old release path (excl. explosion).	2019-05-29 10:51:55 +02:00
Ines Montani	7634812172	Document Language.evaluate	2019-05-24 14:06:36 +02:00
Ines Montani	45e6855550	Update Language.update docs	2019-05-24 14:06:26 +02:00
Ines Montani	b78a8dc1d2	Update Scorer and add API docs	2019-05-24 14:06:04 +02:00
Ines Montani	321c9f5acc	Fix lex_id docs (closes #3743 )	2019-05-16 23:15:58 +02:00
Ines Montani	f96af8526a	Merge branch 'spacy.io' [ci skip]	2019-05-11 23:03:56 +02:00
Ines Montani	7534f7cb44	Fix return value of Language.update (closes #3692 )	2019-05-11 18:40:19 +02:00
Ines Montani	503b8c85f1	Add TWiML podcast to universe [ci skip]	2019-05-11 17:48:22 +02:00
Ines Montani	0daf2422a3	Auto-format	2019-05-11 17:48:07 +02:00
devforfu	21af12eb53	Make "text" key in JSONL format optional when "tokens" key is provided (#3721 ) * Fix issue with forcing text key when it is not required * Extending the docs to reflect the new behavior	2019-05-11 15:41:29 +02:00
Ines Montani	6cfa1e1f47	Fix DependencyParser.predict docs (resolves #3561 )	2019-05-11 15:37:54 +02:00
Ines Montani	25f5592d57	Improve Token.prob and Lexeme.prob docs (resolves #3701 )	2019-05-11 15:23:41 +02:00
Aaron Kub	719a15f23d	fixing regex matcher examples (#3708 ) (#3719 )	2019-05-10 14:23:52 +02:00
Ines Montani	65b55f1aaa	Add version tag to `--base-model` argument (closes #3720 )	2019-05-10 14:06:47 +02:00
richardpaulhudson	a1e07f0d14	Request to include Holmes in spaCy Universe (#3685 ) * Request to add Holmes to spaCy Universe Dear spaCy team, I would be grateful if you would consider my Python library Holmes for inclusion in the spaCy Universe. Holmes transforms the syntactic structures delivered by spaCy into semantic structures that, together with various other techniques including ontological matching and word embeddings, serve as the basis for information extraction. Holmes supports several use cases including chatbot, structured search, topic matching and supervised document classification. I had the basic idea for Holmes around 15 years ago and now spaCy has made it possible to build an implementation that is stable and fast enough to actually be of use - thank you! At present Holmes supports English and German (I am based in Munich) but could easily be extended to support any other language with a spaCy model. * Added	2019-05-08 02:42:03 +02:00
Ines Montani	505c9e0e19	Add util.filter_spans helper (#3686 )	2019-05-08 02:33:40 +02:00
Bram Vanroy	8e6f8deaf6	Re-added Universe readme (#3688 ) (closes #3680 )	2019-05-06 21:08:01 +02:00
Ines Montani	b4d142e3c4	Adjust wording and formatting [ci skip]	2019-05-03 12:00:31 +02:00
d5555	ba4bcbf285	Update universe.json (#3653 ) [ci skip] * Update universe.json * Update universe.json	2019-05-03 11:50:12 +02:00
张晓飞	ba1ff00370	update response after calling add_pipe (#3661 ) * update response after calling add_pipe component: print_info is appended last, so it needs to be shown at the end of the pipeline * Create henry860916.md	2019-05-01 12:02:18 +02:00
Ramiro Gómez	8ee4100f8f	Remove dangling M (#3657 ) I assume this is a typo. Sorry if it has a meaning that I'm not aware of.	2019-04-29 19:44:43 +02:00
Amit Chaudhary	167d63af31	Fix broken link to Dive Into Python 3 website (#3656 ) * Fix broken link to Dive Into Python 3 website * Sign spaCy Contributor Agreement	2019-04-29 19:44:00 +02:00
Brad Jascob	6fcafcc564	Doc changes for local website setup (#3651 )	2019-04-27 13:28:23 +02:00
Ivan Tham	fa94f83697	Improve redundant variable name (#3643 ) * Improve redundant variable name * Apply suggestions from code review Co-Authored-By: pickfire <pickfire@riseup.net>	2019-04-26 16:50:14 +02:00
Ines Montani	dc87fb805d	Merge branch 'master' of https://github.com/explosion/spaCy	2019-04-26 13:17:57 +02:00
Ines Montani	62060ae9c6	Merge branch 'spacy.io'	2019-04-26 13:17:52 +02:00
Brad Jascob	9afa0d6723	Update Universe Website for pyInflect (#3641 )	2019-04-26 13:17:36 +02:00
Ines Montani	db7c0dbfd6	Update seo.js	2019-04-23 18:39:30 +02:00
Ines Montani	ec0d840ab5	Document early stopping	2019-04-22 14:31:32 +02:00
Ines Montani	1d567913f9	Update spacy evaluate example	2019-04-22 14:28:42 +02:00
Ines Montani	7917ce2f73	Make flag shortcut consistent and document	2019-04-22 14:23:44 +02:00
Ines Montani	52658c80d5	Allow jupyter=False to override Jupyter mode (closes #3598 )	2019-04-22 14:18:32 +02:00
Motoki Wu	8e2cef49f3	Add save after `--save-every` batches for `spacy pretrain` (#3510 ) When using `spacy pretrain`, the model is saved only after every epoch. But each epoch can be very big since `pretrain` is used for language modeling tasks. So I added a `--save-every` option in the CLI to save after every `--save-every` batches. ## Description To test... Save this file to `sample_sents.jsonl` ``` {"text": "hello there."} {"text": "hello there."} {"text": "hello there."} {"text": "hello there."} {"text": "hello there."} {"text": "hello there."} {"text": "hello there."} {"text": "hello there."} {"text": "hello there."} {"text": "hello there."} {"text": "hello there."} {"text": "hello there."} {"text": "hello there."} {"text": "hello there."} {"text": "hello there."} {"text": "hello there."} ``` Then run `--save-every 2` when pretraining. ```bash spacy pretrain sample_sents.jsonl en_core_web_md here -nw 1 -bs 1 -i 10 --save-every 2 ``` And it should save the model to the `here/` folder after every 2 batches. The models that are saved during an epoch will have a `.temp` appended to the save name. At the end of training, you should see these files (`ls here/`): ```bash config.json model2.bin model5.bin model8.bin log.jsonl model2.temp.bin model5.temp.bin model8.temp.bin model0.bin model3.bin model6.bin model9.bin model0.temp.bin model3.temp.bin model6.temp.bin model9.temp.bin model1.bin model4.bin model7.bin model1.temp.bin model4.temp.bin model7.temp.bin ``` ### Types of change This is a new feature to `spacy pretrain`. 🌵 Unfortunately, I haven't been able to test this because compiling from source is not working (cythonize error). ``` Processing matcher.pyx [Errno 2] No such file or directory: '/Users/mwu/github/spaCy/spacy/matcher.pyx' Traceback (most recent call last): File "/Users/mwu/github/spaCy/bin/cythonize.py", line 169, in <module> run(args.root) File "/Users/mwu/github/spaCy/bin/cythonize.py", line 158, in run process(base, filename, db) File "/Users/mwu/github/spaCy/bin/cythonize.py", line 124, in process preserve_cwd(base, process_pyx, root + ".pyx", root + ".cpp") File "/Users/mwu/github/spaCy/bin/cythonize.py", line 87, in preserve_cwd func(args) File "/Users/mwu/github/spaCy/bin/cythonize.py", line 63, in process_pyx raise Exception("Cython failed") Exception: Cython failed Traceback (most recent call last): File "setup.py", line 276, in <module> setup_package() File "setup.py", line 209, in setup_package generate_cython(root, "spacy") File "setup.py", line 132, in generate_cython raise RuntimeError("Running cythonize failed") RuntimeError: Running cythonize failed ``` Edit: Fixed! after deleting all `.cpp` files: `find spacy -name "*.cpp" | xargs rm` ## Checklist - [x] I have submitted the spaCy Contributor Agreement. - [x] I ran the tests, and all new and existing tests passed. - [x] My changes don't require a change to the documentation, or if they do, I've added all required information.	2019-04-22 14:10:16 +02:00
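The sample file from the commit above can also be generated instead of typed by hand. A minimal sketch, assuming the repeated `{"text": "hello there."}` record from that message; the helper name `write_sample` and the default of 16 records are illustrative choices, not part of spaCy.

```python
import json
import os
import tempfile


def write_sample(path, n_records=16):
    """Write n_records identical JSONL records, one JSON object per line."""
    with open(path, "w", encoding="utf8") as f:
        for _ in range(n_records):
            f.write(json.dumps({"text": "hello there."}) + "\n")


# Write into a fresh temporary directory so nothing existing is overwritten.
sample_path = os.path.join(tempfile.mkdtemp(), "sample_sents.jsonl")
write_sample(sample_path)
```

The resulting file can then be passed to the `spacy pretrain` command quoted in the commit message.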
Ines Montani	7937109ed9	Update link [ci skip]	2019-04-19 16:01:41 +02:00
Ines Montani	0dce4585b1	Add course to 101	2019-04-19 15:59:51 +02:00
Ines Montani	2efc87c382	Remove unused image	2019-04-19 15:48:12 +02:00
Ines Montani	38395d9518	Merge branch 'spacy.io'	2019-04-19 15:26:20 +02:00
Ines Montani	7ac5bb0a7b	Update landing and feature overview	2019-04-19 15:23:08 +02:00
fizban99	f2f2df6e78	entity types for colors should be in uppercase (#3599 ) although the text indicates the entity types should be in lowercase, the sample code shows uppercase, which is the correct format.	2019-04-17 11:22:56 +02:00
Ines Montani	5289dd1356	Fix formatting	2019-04-13 17:58:26 +02:00
Ines Montani	9e7deeaf48	Remove Datacamp	2019-04-13 17:46:32 +02:00
oterrier	2854724e69	Added project gracyql to Universe (#3570 ) (resolves #3568 ) As discussed with Ines in https://github.com/explosion/spaCy/issues/3568 , adding a new project proposal for the community to the spaCy Universe website: GracyQL, a tiny GraphQL wrapper around spaCy using graphene and starlette. ## Description Change only in universe.json file to add a new project ### Types of change New project reference in Universe ## Checklist - [x] I have submitted the spaCy Contributor Agreement. - [x] I ran the tests, and all new and existing tests passed. - [x] My changes don't require a change to the documentation, or if they do, I've added all required information.	2019-04-10 17:54:42 +02:00
Santiago Castro	86e4b68aa9	Fix website docs for Vectors.from_glove (#3565 ) * Fix website docs for Vectors.from_glove * Add myself as a contributor	2019-04-10 15:23:27 +02:00
Bharat Raghunathan	72820896d4	Fix typo in web docs cli.md (#3559 )	2019-04-09 11:40:03 +02:00
pierremonico	0d26bfe677	Removes duplicate in table (#3550 ) * Removes duplicate in table Just fixing typos. * Remove newline Co-authored-by: Ines Montani <ines@ines.io>	2019-04-08 10:30:42 +02:00
Piero Molino	5198aa4ae6	Added Ludwig among the projects (#3548 ) [ci skip] * Added Ludwig among the projects * Create w4nderlust.md * Add Uber to logo wall	2019-04-07 13:01:26 +02:00
Ines Montani	2f0f439c54	Remove non-existent example (closes #3533 )	2019-04-03 09:59:17 +02:00
Ines Montani	b070e0caf7	Update landing.js	2019-03-30 22:26:46 +01:00
Ines Montani	9d1221943b	Merge branch 'master' into spacy.io	2019-03-30 20:32:14 +01:00
Ines Montani	037ffdfd3f	Add spaCy IRL to landing [ci skip]	2019-03-30 20:32:03 +01:00
Ines Montani	730f759b4f	Merge branch 'master' into spacy.io	2019-03-28 15:26:17 +01:00
Ines Montani	7d033a7b89	Fix meta description in universe projects [ci skip]	2019-03-28 15:26:01 +01:00
Ines Montani	fe2cb642ac	Merge branch 'master' into spacy.io	2019-03-28 15:13:39 +01:00
David	74e738dd4d	adds textpipe to universe (#3500 ) [ci skip] * Adds textpipe to universe * signed contributor agreement * Adjust formatting, code style and use "standalone" category	2019-03-28 15:13:19 +01:00
Ines Montani	04a9fb1a02	Merge branch 'master' into spacy.io	2019-03-28 13:34:46 +01:00
Samuel Kane	06a1846379	fix(util): fix decaying function output (#3495 ) * fix(util): fix decaying function output * fix(util): better test and adhere to code standards * fix(util): correct variable name, pytestify test, update website text	2019-03-28 13:24:47 +01:00
Bharat Raghunathan	1db3e47509	DOC: Update tokenizer docs to include default value for batch_size in pipe (#3492 )	2019-03-28 12:48:02 +01:00
Ines Montani	2ed16d82bf	Fix social image	2019-03-26 18:27:40 +01:00
Ines Montani	9e14b2b69f	Add Estonian to docs [ci skip] (closes #3482 )	2019-03-25 18:01:54 +01:00
Ines Montani	21ade53ef7	Merge branch 'master' into spacy.io	2019-03-25 13:05:00 +01:00
Ines Montani	db938ab0e3	Update favicon (closes #3475 ) [ci skip]	2019-03-25 13:04:47 +01:00
Ines Montani	c8c1baaea8	Update binderVersion	2019-03-25 12:17:03 +01:00
Ines Montani	200d8bdb3c	Merge branch 'spacy.io' [ci skip]	2019-03-23 16:46:34 +01:00
Ines Montani	1e5b917d75	Fix formatting [ci skip]	2019-03-23 16:45:50 +01:00
Matthew Honnibal	6c783f8045	Bug fixes and options for TextCategorizer (#3472 ) * Fix code for bag-of-words feature extraction The _ml.py module had a redundant copy of a function to extract unigram bag-of-words features, except one had a bug that set values to 0. Another function allowed extraction of bigram features. Replace all three with a new function that supports arbitrary ngram sizes and also allows control of which attribute is used (e.g. ORTH, LOWER, etc). * Support 'bow' architecture for TextCategorizer This allows efficient ngram bag-of-words models, which are better when the classifier needs to run quickly, especially when the texts are long. Pass architecture="bow" to use it. The extra arguments ngram_size and attr are also available, e.g. ngram_size=2 means unigram and bigram features will be extracted. * Fix size limits in train_textcat example * Explain architectures better in docs	2019-03-23 16:44:44 +01:00
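To make the `ngram_size` remark in the commit above concrete, here is a plain-Python sketch that counts every n-gram up to a given size in a token list. It is an illustration only: `ngram_counts` is a hypothetical name, the tokens are plain strings, and spaCy's own feature extraction is not reproduced here.

```python
from collections import Counter


def ngram_counts(tokens, ngram_size=1):
    """Count every n-gram of length 1..ngram_size in a list of tokens."""
    counts = Counter()
    for n in range(1, ngram_size + 1):
        # Slide a window of width n over the token list.
        for start in range(len(tokens) - n + 1):
            counts[tuple(tokens[start:start + n])] += 1
    return counts


# With ngram_size=2, both unigrams and bigrams are counted.
features = ngram_counts(["the", "cat", "sat"], ngram_size=2)
```

Lowercasing the tokens before counting would be a rough stand-in for selecting a different attribute such as LOWER instead of ORTH.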
Ines Montani	5944cf10c7	Add blog post to v2.1 page	2019-03-23 16:34:23 +01:00
Ines Montani	ffebdad08d	Add cheat sheet to spaCy 101	2019-03-23 16:32:55 +01:00
Ines Montani	06bf130890	💫 Add better and serializable sentencizer (#3471 ) * Add better serializable sentencizer component * Replace default factory * Add tests * Tidy up * Pass test * Update docs	2019-03-23 15:45:02 +01:00
Ines Montani	dcd6e06c47	Improve landing example [ci skip]	2019-03-22 19:02:15 +01:00
Ines Montani	a841324034	Update landing example [ci skip]	2019-03-22 18:50:00 +01:00
Ines Montani	b532386a60	Fix typo [ci skip]	2019-03-22 18:36:17 +01:00
Ines Montani	d8533f0149	Update Binder [ci skip]	2019-03-22 18:16:46 +01:00
Christos Aridas	9cee3f702a	Add missing space in landing page (#3462 ) [ci skip]	2019-03-22 15:17:35 +01:00
Ines Montani	5073ce63fd	Merge branch 'spacy.io' [ci skip]	2019-03-22 15:17:11 +01:00
Ines Montani	0712efc6b3	Update version requirements [ci skip]	2019-03-21 10:23:54 +01:00
Ines Montani	764359c952	Merge branch 'master' into spacy.io	2019-03-20 17:24:28 +01:00
Ines Montani	dac8f8ff99	Update Span.__init__ docs (see #3445 ) [ci skip]	2019-03-20 17:24:17 +01:00
Ines Montani	f7b5ff7907	Move netlify.toml to root	2019-03-19 14:40:14 +01:00
Ines Montani	c6ee030721	Fix docsearch	2019-03-19 14:38:49 +01:00
Ines Montani	0155083e01	Update netlify.toml	2019-03-19 14:07:00 +01:00
Ines Montani	d4eed4a84f	Add note on unicode build to troubleshooting guide (see #3421 ) [ci skip]	2019-03-19 10:27:02 +01:00
Ines Montani	42d4b818e4	Redirect Netlify URL	2019-03-19 10:17:56 +01:00
Ines Montani	1ee97bc282	Add page title fallback, just in case	2019-03-18 18:58:55 +01:00
Ines Montani	728ae7651b	Fix universe page titles if no separate title is set	2019-03-18 18:58:46 +01:00
Ines Montani	a20d3772fd	Fix responsive landing	2019-03-18 16:24:52 +01:00
Ines Montani	08284f3a11	💫 v2.1.0 launch updates (only merge on launch!) (#3414 ) * Update README.md * Use production docsearch [ci skip] * Add option to exclude pages from search	2019-03-18 16:07:26 +01:00
Ines Montani	a611b32fbf	Update model docs [ci skip]	2019-03-17 11:48:18 +01:00
Matthew Honnibal	62afa64a8d	Expose batch size and length caps on CLI for pretrain (#3417 ) Add and document CLI options for batch size, max doc length, min doc length for `spacy pretrain`. Also improve CLI output. Closes #3216 ## Checklist - [x] I have submitted the spaCy Contributor Agreement. - [x] I ran the tests, and all new and existing tests passed. - [x] My changes don't require a change to the documentation, or if they do, I've added all required information.	2019-03-16 21:38:45 +01:00
Ines Montani	2c5dd4d602	Update Vectors.find docs [ci skip]	2019-03-16 17:10:57 +01:00
Ines Montani	fa0f501165	Use dev DocSearch index	2019-03-15 14:48:38 +01:00
Ines Montani	8af7d01382	Fix general-purpose IDs	2019-03-15 14:48:26 +01:00
Ines Montani	cbcba699dd	Fix missing ids	2019-03-14 17:56:53 +01:00
Ines Montani	cffe63ea24	Fix :target padding for ids	2019-03-14 17:41:02 +01:00
Ines Montani	51b7b88acf	Generate active sidebar heading (h0) at compile time	2019-03-14 17:20:51 +01:00
Ines Montani	4ab1871a75	Add search-exclude classes	2019-03-14 16:51:29 +01:00
Ines Montani	59bbf85986	Add id to body	2019-03-14 16:51:18 +01:00
Ines Montani	6e07750dd8	Fix class name	2019-03-14 11:52:31 +01:00
Ines Montani	a0813b93e0	Server-side render is-active for crawler	2019-03-14 11:46:27 +01:00
Ines Montani	39ace04b55	Fix active style	2019-03-14 11:46:13 +01:00
Ines Montani	4cfe4aa224	Fix small issues in the docs [ci skip]	2019-03-12 22:57:15 +01:00
Ines Montani	ba7eb2d131	Update section [ci skip]	2019-03-12 16:18:34 +01:00
Ines Montani	cecc31b765	Don't auto-slugify accordion links [ci skip]	2019-03-12 15:30:49 +01:00
Ines Montani	d842d5698e	Tidy up website and add eslint config [ci skip]	2019-03-12 15:21:58 +01:00
Ines Montani	72fb324d95	Add vector training script to bin [ci skip]	2019-03-12 12:07:56 +01:00
Ines Montani	3abf0e6b9f	Replace dev-resources links with real examples	2019-03-12 12:07:40 +01:00
Ines Montani	59c0620487	Auto-format	2019-03-12 12:07:11 +01:00
Ines Montani	1664d1fa62	Update universe [ci skip]	2019-03-12 11:13:03 +01:00
Ines Montani	cdd418b93e	Auto-format [ci skip]	2019-03-11 17:10:50 +01:00
Matthew Honnibal	b0b990e405	Fix token.conjuncts (closes #795 ) (#3392 ) * Implement conjuncts method * Add span.conjuncts property * Un-xfail token.conjuncts tests * Update docs for token.conjuncts and span.conjuncts * Fix merge error in token.conjuncts	2019-03-11 17:05:45 +01:00
Ines Montani	25cb764e64	Document new API [ci skip]	2019-03-11 15:23:53 +01:00
Ines Montani	ebcf2bb1c3	Add Doc.lang and Doc.lang_	2019-03-11 14:21:40 +01:00
Ines Montani	7c05ca01e8	💫 Support mutable default values for extension attributes (#3389 ) * Support mutable default values in extensions * Update documentation	2019-03-11 12:50:44 +01:00
Matthew Honnibal	98acf5ffe4	💫 Allow passing of config parameters to specific pipeline components (#3386 ) * Add component_cfg kwarg to begin_training * Document component_cfg arg to begin_training * Update docs and auto-format * Support component_cfg across Language * Format * Update docs and docstrings [ci skip] * Fix begin_training	2019-03-10 23:36:47 +01:00
Ines Montani	8dbf1e9037	Also fix #3387 on develop	2019-03-10 23:36:28 +01:00
Ines Montani	7ba3a5d95c	💫 Make serialization methods consistent (#3385 ) * Make serialization methods consistent exclude keyword argument instead of random named keyword arguments and deprecation handling * Update docs and add section on serialization fields	2019-03-10 19:16:45 +01:00
Ines Montani	9a8f169e5c	Update v2-1.md	2019-03-10 18:58:51 +01:00
Ines Montani	0426689db8	💫 Improve Doc.to_json and add Doc.is_nered (#3381 ) * Use default return instead of else * Add Doc.is_nered to indicate if entities have been set * Add properties in Doc.to_json if they were set, not if they're available This way, if a processed Doc exports "pos": None, it means that the tag was explicitly unset. If it exports "ents": [], it means that entity annotations are available but that this document doesn't contain any entities. Before, this would have been unclear and problematic for training.	2019-03-10 15:24:34 +01:00
Ines Montani	76764fcf59	💫 Improve converters and training data file formats (#3374 ) * Populate converter argument info automatically * Add conversion option for msgpack * Update docs * Allow reading training data from JSONL	2019-03-08 23:15:23 +01:00
Ines Montani	296446a1c8	Tidy up and improve docs and docstrings (#3370 ) ## Description * tidy up and adjust Cython code to code style * improve docstrings and make calling `help()` nicer * add URLs to new docs pages to docstrings wherever possible, mostly to user-facing objects * fix various typos and inconsistencies in docs ### Types of change enhancement, docs ## Checklist - [x] I have submitted the spaCy Contributor Agreement. - [x] I ran the tests, and all new and existing tests passed. - [x] My changes don't require a change to the documentation, or if they do, I've added all required information.	2019-03-08 11:42:26 +01:00
Ines Montani	fa7314b221	Clarify train_path and dev_path format (see #3366 ) [ci skip]	2019-03-07 12:23:27 +01:00
Ines Montani	e9babd9973	Update hyperparameters section (see #3352 )	2019-03-06 14:40:30 +01:00
Ines Montani	48a206a95f	Fix displaCy visualizations in docs (closes #3357 ) [ci skip]	2019-03-06 13:20:44 +01:00
Ines Montani	5eadf61327	Update pretraining docs on file format (closes #3354 )	2019-03-04 16:30:13 +00:00
Ines Montani	1d4ba7678f	Auto-format [ci skip]	2019-02-27 12:07:35 +01:00
Matthew Honnibal	f1d77eb140	💫 Improve handling of missing NER tags (closes #2603 ) (#3341 ) * Improve handling of missing NER tags GoldParse can accept missing NER tags, if entities is provided in BILUO format (rather than as spans). Missing tags can be provided as None values. Fix bug that occurred when first tag was a None value. Closes #2603. * Document specification of missing NER tags.	2019-02-27 12:06:32 +01:00
Ines Montani	c478a2ccb6	Update backwards incompat [ci skip]	2019-02-27 11:56:56 +01:00
Ines Montani	d7217513c9	Merge branch 'spacy.io' into develop [ci skip]	2019-02-27 11:42:10 +01:00
Matthew Honnibal	4a3371acd5	Make doc[0].is_sent_start == True (closes #2869 ) (#3340 ) * Make doc[0] have sent_start True. Closes #2869 * Document that doc[0].is_sent_start defaults True.	2019-02-27 11:17:17 +01:00
Ines Montani	cb481aa1fe	Merge branch 'spacy.io' into develop [ci skip]	2019-02-26 16:51:22 +01:00
Ines Montani	2579ecbb63	Merge branch 'spacy.io' into develop [ci skip]	2019-02-25 21:41:51 +01:00
Ines Montani	3379ebcaa4	Fix default prop [ci skip]	2019-02-25 20:29:11 +01:00
Ines Montani	e711969e3b	Add more human-readable class names [ci skip]	2019-02-25 20:22:40 +01:00
Ines Montani	162bd4d75b	💫 Add Algolia DocSearch (#3332 ) * Add Algolia DocSearch * Add human-readable selector for teaser	2019-02-25 20:11:11 +01:00
Ines Montani	1b6238101a	Add table explaining training metrics [closes #2644 ]	2019-02-25 10:03:43 +01:00
Ines Montani	1981b194cc	Fix recomputing of :target [ci skip] Prevents additional history entry	2019-02-25 10:03:20 +01:00
Ines Montani	d0b3af9222	Fix remaining inaccuracies in API docs (closes #2329 )	2019-02-24 22:21:25 +01:00
Ines Montani	49d0938038	Update version [ci skip]	2019-02-24 22:01:47 +01:00
Ines Montani	62b558ab72	💫 Support lexical attributes in retokenizer attrs (closes #2390 ) (#3325 ) * Fix formatting and whitespace * Add support for lexical attributes (closes #2390) * Document lexical attribute setting during retokenization * Assign variable outside of nested loop	2019-02-24 21:13:51 +01:00
Ines Montani	aa52305461	Improve pipeline model and meta example [ci skip]	2019-02-24 18:45:39 +01:00
Ines Montani	df19e2bff6	💫 Allow setting of custom attributes during retokenization (closes #3314 ) (#3324 ) ## Description This PR adds the ability to override custom extension attributes during merging. This will only work for attributes that are writable, i.e. attributes registered with a default value like `default=False` or attributes that have both a getter and a setter implemented. ```python Token.set_extension('is_musician', default=False) doc = nlp("I like David Bowie.") with doc.retokenize() as retokenizer: attrs = {"LEMMA": "David Bowie", "_": {"is_musician": True}} retokenizer.merge(doc[2:4], attrs=attrs) assert doc[2].text == "David Bowie" assert doc[2].lemma_ == "David Bowie" assert doc[2]._.is_musician ``` ### Types of change enhancement ## Checklist - [x] I have submitted the spaCy Contributor Agreement. - [x] I ran the tests, and all new and existing tests passed. - [x] My changes don't require a change to the documentation, or if they do, I've added all required information.	2019-02-24 18:38:47 +01:00
Ines Montani	403b9cd58b	Add docs on adding to existing tokenizer rules [ci skip]	2019-02-24 18:35:19 +01:00
Ines Montani	1ea1bc98e7	Document regex utilities [ci skip]	2019-02-24 18:34:10 +01:00
Ines Montani	09bf08b3c3	Update redirects [ci skip]	2019-02-24 13:37:50 +01:00
Ines Montani	dceca3264d	Tidy up package.json [ci skip]	2019-02-24 13:37:41 +01:00
Ines Montani	46ec5cdccc	Update TextCategorizer docs	2019-02-24 13:11:57 +01:00
Ines Montani	c03cb1cc63	Improve built-in component API docs	2019-02-24 13:11:49 +01:00
Ines Montani	383e2e1f12	Update Python versions [ci skip]	2019-02-24 11:49:45 +01:00
Ines Montani	b624cb4b89	Update v2-1.md	2019-02-24 11:49:27 +01:00
Ines Montani	250e88ef55	Fix docs example (see #2728 )	2019-02-21 14:22:06 +01:00
Ines Montani	0fc908d7a5	Add note on merging speed in v2.1 (see #3300 ) [ci skip]	2019-02-21 12:34:18 +01:00
Ines Montani	236aa94ded	Update v2-1.md	2019-02-21 12:33:56 +01:00
Sofie	9a478b6db8	Clean up of char classes, few tokenizer fixes and faster default French tokenizer (#3293 ) * splitting up latin unicode interval * removing hyphen as infix for French * adding failing test for issue 1235 * test for issue #3002 which now works * partial fix for issue #2070 * keep the hyphen as infix for French (as it was) * restore french expressions with hyphen as infix (as it was) * added succeeding unit test for Issue #2656 * Fix issue #2822 with custom Italian exception * Fix issue #2926 by allowing numbers right before infix / * remove duplicate * remove xfail for Issue #2179 fixed by Matt * adjust documentation and remove reference to regex lib	2019-02-20 22:10:13 +01:00
Ines Montani	f73d01aa32	Update netlify.toml [ci skip]	2019-02-20 14:33:32 +01:00
Ines Montani	da5edbe434	Tidy up	2019-02-20 14:33:23 +01:00
Ines Montani	57ae71ea95	Add docs on serializing the pipeline (see #3289 ) [ci skip]	2019-02-18 14:13:29 +01:00
Ines Montani	38e4422c0d	Improve matcher example (resolves #3287 )	2019-02-18 13:26:37 +01:00
Ines Montani	660cfe44c5	Fix formatting	2019-02-18 13:26:22 +01:00
Ines Montani	c5476bd75b	Update languages.json	2019-02-18 10:03:35 +01:00
Ines Montani	212ff359ef	Fix links [ci skip]	2019-02-17 22:25:50 +01:00
Ines Montani	04b4df0ec9	Remove n_threads	2019-02-17 22:25:42 +01:00
Ines Montani	4c7ab7620a	Update README.md	2019-02-17 22:16:17 +01:00
Ines Montani	8a8523d8c1	Update README.md	2019-02-17 21:59:52 +01:00
Ines Montani	e597110d31	💫 Update website (#3285 ) ## Description The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in straightforward Markdown without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on. This PR also includes various new docs pages and content. Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837. ### Types of change enhancement ## Checklist - [x] I have submitted the spaCy Contributor Agreement. - [x] I ran the tests, and all new and existing tests passed. - [x] My changes don't require a change to the documentation, or if they do, I've added all required information.	2019-02-17 19:31:19 +01:00
Ines Montani	0184a95340	Merge branch 'master' into develop	2019-02-12 18:29:24 +01:00
Ines Montani	5dd39d8697	Update universe.json	2019-02-12 18:05:51 +01:00
Abhijit Balaji	75a40f56fc	added spacy-langdetect to universe.json (#3266 )	2019-02-12 18:04:38 +01:00
Ines Montani	8ad15a2377	Fix typo [ci skip]	2019-02-08 17:29:53 +01:00
Ines Montani	7a985cba24	Fix typo (closes #3232 ) [ci skip]	2019-02-08 17:29:18 +01:00
Ines Montani	5d0b60999d	Merge branch 'master' into develop	2019-02-07 20:54:07 +01:00
PierreMonico	114d64c4b5	Fix typo (#3223 )	2019-02-04 11:37:29 +01:00
adrianeboyd	03d58f9feb	Update TIGER/German dependency relations in documentation (#3204 ) * Add missing dependency relations for TIGER/German * Contributor agreement for adrianeboyd	2019-01-30 14:23:12 +01:00
Bram Vanroy	11cee62644	Updated spacy_conll information (#3158 )	2019-01-16 13:46:16 +01:00
Álvaro Abella Bascarán	1cd8f9823f	Correct docs of `Token.subtree` and `Span.subtree` (issue #3122 ) (#3124 ) * solve inconsistency between docs and Span.subtree (issue #3122) * solve inconsistency between docs and Token.subtree (issue #3122)	2019-01-09 03:11:15 +01:00
Mathieu Morey	f07b577fbd	Support CUDA 10 (#3126 ) * ENH support CUDA 10 * Update _instructions.jade	2019-01-09 03:10:45 +01:00
alvations	f43338a4c5	Joblib site has moved. (#3118 )	2019-01-05 13:10:54 +01:00
Matthew Honnibal	63b7accd74	💫 Make span.as_doc() return a copy, not a view. Closes #1537 (#3107 ) Initially span.as_doc() was designed to return a view of the span's contents, as a Doc object. This was a nice idea, but it fails due to the token.idx property, which refers to the character offset within the string. In a span, the idx of the first token might not be 0. Because this data is different, we can't have a view --- it'll be inconsistent. This patch changes span.as_doc() to instead return a copy. The docs are updated accordingly. Closes #1537 * Update test for span.as_doc() * Make span.as_doc() return a copy. Closes #1537 * Document change to Span.as_doc()	2018-12-30 15:17:46 +01:00
Sofie	b7916fffcf	Fixing few typos in the documentation (#3103 ) * few typos / small grammatical errors corrected in documentation * one more typo * one last typo	2018-12-28 15:52:26 +01:00
Ines Montani	2dc6c52ccc	Update displayed Binder version (see #3077 ) [ci skip]	2018-12-20 17:36:19 +01:00
Ines Montani	ca244f5f84	Small fixes to displaCy (#3076 ) ## Description - [x] fix auto-detection of Jupyter notebooks (even if `jupyter=True` isn't set) - [x] add `displacy.set_render_wrapper` method to define a custom function called around the HTML markup generated in all calls to `displacy.render` (can be used to allow custom integrations, callbacks and page formatting) - [x] add option to customise host for web server - [x] show warning if `displacy.serve` is called from within Jupyter notebooks - [x] move error message to `spacy.errors.Errors`. ### Types of change enhancement ## Checklist - [x] I have submitted the spaCy Contributor Agreement. - [x] I ran the tests, and all new and existing tests passed. - [x] My changes don't require a change to the documentation, or if they do, I've added all required information.	2018-12-20 17:32:04 +01:00
Ines Montani	61d09c481b	Merge branch 'master' into develop	2018-12-18 13:48:10 +01:00
Ines Montani	8c0f0f50bc	Use nlp.make_doc instead of nlp for patterns [ci skip]	2018-12-08 11:56:01 +01:00
Aki Ariga	7fcd6419ff	Update the document for Unidic link with latest version URL (#3022 ) * Update Unidic link for latest version in document This patch improves #3017 . The link for Unidic pointed to an old version, so it now points to the latest version. * Add contributor agreement * Use more specific link for unidic-cwj	2018-12-07 17:24:48 +01:00
Ines Montani	27905a7b14	Remove reference to cuda10 in docs (closes #2894 ) [ci skip]	2018-12-06 16:05:37 +01:00
Gavriel Loria	9c8c4287bf	Accept iob2 and allow generic whitespace (#2999 ) * accept non-pipe whitespace as delimiter; allow iob2 filename * added small documentation note for IOB2 allowance * added contributor agreement	2018-12-06 15:50:25 +01:00
Paul O'Leary McCann	b36f6eabfb	Add note that Unidic is required for Japanese (#3017 ) This addresses #3001. -POLM	2018-12-06 15:14:10 +01:00
Ines Montani	f37863093a	💫 Replace ujson, msgpack and dill/pickle/cloudpickle with srsly (#3003 ) Remove hacks and wrappers, keep code in sync across our libraries and move spaCy a few steps closer to only depending on packages with binary wheels 🎉 See here: https://github.com/explosion/srsly Serialization is hard, especially across Python versions and multiple platforms. After dealing with many subtle bugs over the years (encodings, locales, large files) our libraries like spaCy and Prodigy have steadily grown a number of utility functions to wrap the multiple serialization formats we need to support (especially json, msgpack and pickle). These wrapping functions ended up duplicated across our codebases, so we wanted to put them in one place. At the same time, we noticed that having a lot of small dependencies was making maintenance harder, and making installation slower. To solve this, we've made srsly standalone, by including the component packages directly within it. This way we can provide all the serialization utilities we need in a single binary wheel. srsly currently includes forks of the following packages: ujson, msgpack, msgpack-numpy, cloudpickle. * WIP: replace json/ujson with srsly * Replace ujson in examples Use regular json instead of srsly to make code easier to read and follow * Update requirements * Fix imports * Fix typos * Replace msgpack with srsly * Fix warning	2018-12-03 01:28:22 +01:00
Gavriel Loria	919729d38c	replace user-facing references to "sbd" with "sentencizer" (#2985 ) ## Description Fixes #2693 Previously, the tokens `sbd` and `sentencizer` would create the same nlp pipe. Internally, both would be called `sbd`. This setup became problematic because it was hard for a user relying on the `sentencizer` pipe name to realize that their pipe's name would be `sbd` for all functions other than creating a pipe. This PR intends to change the API and API documentation to fully support `sentencizer` and drop any user-facing references to `sbd`. ### Types of change end-user API bug ## Checklist - [x] I have submitted the spaCy Contributor Agreement. - [x] I ran the tests, and all new and existing tests passed. - [x] My changes don't require a change to the documentation, or if they do, I've added all required information.	2018-11-30 21:22:40 +01:00
Ines Montani	add6469225	Add "new in v2.0.12" note to Span.ents (closes #2986 )	2018-11-30 20:50:55 +01:00
Ines Montani	37c7c85a86	💫 New JSON helpers, training data internals & CLI rewrite (#2932 ) * Support nowrap setting in util.prints * Tidy up and fix whitespace * Simplify script and use read_jsonl helper * Add JSON schemas (see #2928) * Deprecate Doc.print_tree Will be replaced with Doc.to_json, which will produce a unified format * Add Doc.to_json() method (see #2928) Converts Doc objects to JSON using the same unified format as the training data. Method also supports serializing selected custom attributes in the doc._. space. * Remove outdated test * Add write_json and write_jsonl helpers * WIP: Update spacy train * Tidy up spacy train * WIP: Use wasabi for formatting * Add GoldParse helpers for JSON format * WIP: add debug-data command * Fix typo * Add missing import * Update wasabi pin * Add missing import * 💫 Refactor CLI (#2943) To be merged into #2932. ## Description - [x] refactor CLI to use [`wasabi`](https://github.com/ines/wasabi) - [x] use [`black`](https://github.com/ambv/black) for auto-formatting - [x] add `flake8` config - [x] move all messy UD-related scripts to `cli.ud` - [x] make converters functions that take the opened file and return the converted data (instead of having them handle the IO) ### Types of change enhancement ## Checklist - [x] I have submitted the spaCy Contributor Agreement. - [x] I ran the tests, and all new and existing tests passed. - [x] My changes don't require a change to the documentation, or if they do, I've added all required information. 
* Update wasabi pin * Delete old test * Update errors * Fix typo * Tidy up and format remaining code * Fix formatting * Improve formatting of messages * Auto-format remaining code * Add tok2vec stuff to spacy.train * Fix typo * Update wasabi pin * Fix path checks for when train() is called as function * Reformat and tidy up pretrain script * Update argument annotations * Raise error if model language doesn't match lang * Document new train command	2018-11-30 20:16:14 +01:00
wxv	06820ef6e7	Fix is_ascii documentation and create contributor file (#2988 ) Proposed in #2933	2018-11-30 15:57:58 +01:00
Ben Batorsky	658f7e0dc8	OntoNotes url fix (#2981 ) The website for OntoNotes 5 is: https://catalog.ldc.upenn.edu/LDC2013T19, currently the named entity section has it as https://catalog.ldc.upenn.edu/ldc2013T19.	2018-11-29 19:34:30 +01:00
Ines Montani	d33953037e	💫 Port master changes over to develop (#2979 ) * Create aryaprabhudesai.md (#2681) * Update _install.jade (#2688) Typo fix: "models" -> "model" * Add FAC to spacy.explain (resolves #2706) * Remove docstrings for deprecated arguments (see #2703) * When calling getoption() in conftest.py, pass a default option (#2709) * When calling getoption() in conftest.py, pass a default option This is necessary to allow testing an installed spacy by running: pytest --pyargs spacy * Add contributor agreement * update bengali token rules for hyphen and digits (#2731) * Less norm computations in token similarity (#2730) * Less norm computations in token similarity * Contributor agreement * Remove ')' for clarity (#2737) Sorry, don't mean to be nitpicky, I just noticed this when going through the CLI and thought it was a quick fix. That said, if this was intention than please let me know. * added contributor agreement for mbkupfer (#2738) * Basic support for Telugu language (#2751) * Lex _attrs for polish language (#2750) * Signed spaCy contributor agreement * Added polish version of english lex_attrs * Introduces a bulk merge function, in order to solve issue #653 (#2696) * Fix comment * Introduce bulk merge to increase performance on many span merges * Sign contributor agreement * Implement pull request suggestions * Describe converters more explicitly (see #2643) * Add multi-threading note to Language.pipe (resolves #2582) [ci skip] * Fix formatting * Fix dependency scheme docs (closes #2705) [ci skip] * Don't set stop word in example (closes #2657) [ci skip] * Add words to portuguese language _num_words (#2759) * Add words to portuguese language _num_words * Add words to portuguese language _num_words * Update Indonesian model (#2752) * adding e-KTP in tokenizer exceptions list * add exception token * removing lines with containing space as it won't matter since we use .split() method in the end, added new tokens in exception * add tokenizer exceptions 
list * combining base_norms with norm_exceptions * adding norm_exception * fix double key in lemmatizer * remove unused import on punctuation.py * reformat stop_words to reduce number of lines, improve readibility * updating tokenizer exception * implement is_currency for lang/id * adding orth_first_upper in tokenizer_exceptions * update the norm_exception list * remove bunch of abbreviations * adding contributors file * Fixed spaCy+Keras example (#2763) * bug fixes in keras example * created contributor agreement * Adding French hyphenated first name (#2786) * Fix typo (closes #2784) * Fix typo (#2795) [ci skip] Fixed typo on line 6 "regcognizer --> recognizer" * Adding basic support for Sinhala language. (#2788) * adding Sinhala language package, stop words, examples and lex_attrs. * Adding contributor agreement * Updating contributor agreement * Also include lowercase norm exceptions * Fix error (#2802) * Fix error ValueError: cannot resize an array that references or is referenced by another array in this way. Use the resize function * added spaCy Contributor Agreement * Add charlax's contributor agreement (#2805) * agreement of contributor, may I introduce a tiny pl languge contribution (#2799) * Contributors agreement * Contributors agreement * Contributors agreement * Add jupyter=True to displacy.render in documentation (#2806) * Revert "Also include lowercase norm exceptions" This reverts commit `70f4e8adf3`. * Remove deprecated encoding argument to msgpack * Set up dependency tree pattern matching skeleton (#2732) * Fix bug when too many entity types. 
Fixes #2800 * Fix Python 2 test failure * Require older msgpack-numpy * Restore encoding arg on msgpack-numpy * Try to fix version pin for msgpack-numpy * Update Portuguese Language (#2790) * Add words to portuguese language _num_words * Add words to portuguese language _num_words * Portuguese - Add/remove stopwords, fix tokenizer, add currency symbols * Extended punctuation and norm_exceptions in the Portuguese language * Correct error in spacy universe docs concerning spacy-lookup (#2814) * Update Keras Example for (Parikh et al, 2016) implementation (#2803) * bug fixes in keras example * created contributor agreement * baseline for Parikh model * initial version of parikh 2016 implemented * tested asymmetric models * fixed grevious error in normalization * use standard SNLI test file * begin to rework parikh example * initial version of running example * start to document the new version * start to document the new version * Update Decompositional Attention.ipynb * fixed calls to similarity * updated the README * import sys package duh * simplified indexing on mapping word to IDs * stupid python indent error * added code from https://github.com/tensorflow/tensorflow/issues/3388 for tf bug workaround * Fix typo (closes #2815) [ci skip] * Update regex version dependency * Set version to 2.0.13.dev3 * Skip seemingly problematic test * Remove problematic test * Try previous version of regex * Revert "Remove problematic test" This reverts commit `bdebbef455`. * Unskip test * Try older version of regex * 💫 Update training examples and use minibatching (#2830) <!--- Provide a general summary of your changes in the title. --> ## Description Update the training examples in `/examples/training` to show usage of spaCy's `minibatch` and `compounding` helpers ([see here](https://spacy.io/usage/training#tips-batch-size) for details). 
The lack of batching in the examples has caused some confusion in the past, especially for beginners who would copy-paste the examples, update them with large training sets and experienced slow and unsatisfying results. ### Types of change enhancements ## Checklist <!--- Before you submit the PR, go over this checklist and make sure you can tick off all the boxes. [] -> [x] --> - [x] I have submitted the spaCy Contributor Agreement. - [x] I ran the tests, and all new and existing tests passed. - [x] My changes don't require a change to the documentation, or if they do, I've added all required information. * Visual C++ link updated (#2842) (closes #2841) [ci skip] * New landing page * Add contribution agreement * Correcting lang/ru/examples.py (#2845) * Correct some grammatical inaccuracies in lang\ru\examples.py; filled Contributor Agreement * Correct some grammatical inaccuracies in lang\ru\examples.py * Move contributor agreement to separate file * Set version to 2.0.13.dev4 * Add Persian(Farsi) language support (#2797) * Also include lowercase norm exceptions * Remove in favour of https://github.com/explosion/spaCy/graphs/contributors * Rule-based French Lemmatizer (#2818) <!--- Provide a general summary of your changes in the title. --> ## Description <!--- Use this section to describe your changes. If your changes required testing, include information about the testing environment and the tests you ran. If your test fixes a bug reported in an issue, don't forget to include the issue number. If your PR is still a work in progress, that's totally fine – just include a note to let us know. --> Add a rule-based French Lemmatizer following the english one and the excellent PR for [greek language optimizations](https://github.com/explosion/spaCy/pull/2558) to adapt the Lemmatizer class. ### Types of change <!-- What type of change does your PR cover? Is it a bug fix, an enhancement or new feature, or a change to the documentation? 
--> - Lemma dictionary used can be found [here](http://infolingu.univ-mlv.fr/DonneesLinguistiques/Dictionnaires/telechargement.html), I used the XML version. - Add several files containing exhaustive list of words for each part of speech - Add some lemma rules - Add POS that are not checked in the standard Lemmatizer, i.e PRON, DET, ADV and AUX - Modify the Lemmatizer class to check in lookup table as a last resort if POS not mentionned - Modify the lemmatize function to check in lookup table as a last resort - Init files are updated so the model can support all the functionalities mentioned above - Add words to tokenizer_exceptions_list.py in respect to regex used in tokenizer_exceptions.py ## Checklist <!--- Before you submit the PR, go over this checklist and make sure you can tick off all the boxes. [] -> [x] --> - [X] I have submitted the spaCy Contributor Agreement. - [X] I ran the tests, and all new and existing tests passed. - [X] My changes don't require a change to the documentation, or if they do, I've added all required information. 
* Set version to 2.0.13 * Fix formatting and consistency * Update docs for new version [ci skip] * Increment version [ci skip] * Add info on wheels [ci skip] * Adding "This is a sentence" example to Sinhala (#2846) * Add wheels badge * Update badge [ci skip] * Update README.rst [ci skip] * Update murmurhash pin * Increment version to 2.0.14.dev0 * Update GPU docs for v2.0.14 * Add wheel to setup_requires * Import prefer_gpu and require_gpu functions from Thinc * Add tests for prefer_gpu() and require_gpu() * Update requirements and setup.py * Workaround bug in thinc require_gpu * Set version to v2.0.14 * Update push-tag script * Unhack prefer_gpu * Require thinc 6.10.6 * Update prefer_gpu and require_gpu docs [ci skip] * Fix specifiers for GPU * Set version to 2.0.14.dev1 * Set version to 2.0.14 * Update Thinc version pin * Increment version * Fix msgpack-numpy version pin * Increment version * Update version to 2.0.16 * Update version [ci skip] * Redundant ')' in the Stop words' example (#2856) <!--- Provide a general summary of your changes in the title. --> ## Description <!--- Use this section to describe your changes. If your changes required testing, include information about the testing environment and the tests you ran. If your test fixes a bug reported in an issue, don't forget to include the issue number. If your PR is still a work in progress, that's totally fine – just include a note to let us know. --> ### Types of change <!-- What type of change does your PR cover? Is it a bug fix, an enhancement or new feature, or a change to the documentation? --> ## Checklist <!--- Before you submit the PR, go over this checklist and make sure you can tick off all the boxes. [] -> [x] --> - [ ] I have submitted the spaCy Contributor Agreement. - [ ] I ran the tests, and all new and existing tests passed. - [ ] My changes don't require a change to the documentation, or if they do, I've added all required information. 
* Documentation improvement regarding joblib and SO (#2867) Some documentation improvements ## Description 1. Fixed the dead URL to joblib 2. Fixed Stack Overflow brand name (with space) ### Types of change Documentation ## Checklist - [x] I have submitted the spaCy Contributor Agreement. - [x] I ran the tests, and all new and existing tests passed. - [x] My changes don't require a change to the documentation, or if they do, I've added all required information. * raise error when setting overlapping entities as doc.ents (#2880) * Fix out-of-bounds access in NER training The helper method state.B(1) gets the index of the first token of the buffer, or -1 if no such token exists. Normally this is safe because we pass this to functions like state.safe_get(), which returns an empty token. Here we used it directly as an array index, which is not okay! This error may have been the cause of out-of-bounds access errors during training. Similar errors may still be around, so must be hunted down. Hunting this one down took a long time... I printed out values across training runs and diffed, looking for points of divergence between runs, when no randomness should be allowed. * Change PyThaiNLP Url (#2876) * Fix missing comma * Add example showing a fix-up rule for space entities * Set version to 2.0.17.dev0 * Update regex version * Revert "Update regex version" This reverts commit `62358dd867`. 
* Try setting older regex version, to align with conda * Set version to 2.0.17 * Add spacy-js to universe [ci-skip] * Add spacy-raspberry to universe (closes #2889) * Add script to validate universe json [ci skip] * Removed space in docs + added contributor indo (#2909) * - removed unneeded space in documentation * - added contributor info * Allow input text of length up to max_length, inclusive (#2922) * Include universe spec for spacy-wordnet component (#2919) * feat: include universe spec for spacy-wordnet component * chore: include spaCy contributor agreement * Minor formatting changes [ci skip] * Fix image [ci skip] Twitter URL doesn't work on live site * Check if the word is in one of the regular lists specific to each POS (#2886) * 💫 Create random IDs for SVGs to prevent ID clashes (#2927) Resolves #2924. ## Description Fixes problem where multiple visualizations in Jupyter notebooks would have clashing arc IDs, resulting in weirdly positioned arc labels. Generating a random ID prefix so even identical parses won't receive the same IDs for consistency (even if effect of ID clash isn't noticable here.) ### Types of change bug fix ## Checklist <!--- Before you submit the PR, go over this checklist and make sure you can tick off all the boxes. [] -> [x] --> - [x] I have submitted the spaCy Contributor Agreement. - [x] I ran the tests, and all new and existing tests passed. - [x] My changes don't require a change to the documentation, or if they do, I've added all required information. 
* Fix typo [ci skip] * fixes symbolic link on py3 and windows (#2949) * fixes symbolic link on py3 and windows during setup of spacy using command python -m spacy link en_core_web_sm en closes #2948 * Update spacy/compat.py Co-Authored-By: cicorias <cicorias@users.noreply.github.com> * Fix formatting * Update universe [ci skip] * Catalan Language Support (#2940) * Catalan language Support * Ddding Catalan to documentation * Sort languages alphabetically [ci skip] * Update tests for pytest 4.x (#2965) <!--- Provide a general summary of your changes in the title. --> ## Description - [x] Replace marks in params for pytest 4.0 compat ([see here](https://docs.pytest.org/en/latest/deprecations.html#marks-in-pytest-mark-parametrize)) - [x] Un-xfail passing tests (some fixes in a recent update resolved a bunch of issues, but tests were apparently never updated here) ### Types of change <!-- What type of change does your PR cover? Is it a bug fix, an enhancement or new feature, or a change to the documentation? --> ## Checklist <!--- Before you submit the PR, go over this checklist and make sure you can tick off all the boxes. [] -> [x] --> - [x] I have submitted the spaCy Contributor Agreement. - [x] I ran the tests, and all new and existing tests passed. - [x] My changes don't require a change to the documentation, or if they do, I've added all required information. * Fix regex pin to harmonize with conda (#2964) * Update README.rst * Fix bug where Vocab.prune_vector did not use 'batch_size' (#2977) Fixes #2976 * Fix typo * Fix typo * Remove duplicate file * Require thinc 7.0.0.dev2 Fixes bug in gpu_ops that would use cupy instead of numpy on CPU * Add missing import * Fix error IDs * Fix tests	2018-11-29 16:30:29 +01:00
Ines Montani	c80c20e1ec	Sort languages alphabetically [ci skip]	2018-11-26 15:37:53 +01:00
Marc Puig	98fe1ab259	Catalan Language Support (#2940 ) * Catalan language Support * Adding Catalan to documentation	2018-11-26 15:25:47 +01:00
Ines Montani	1844bc238a	Update universe [ci skip]	2018-11-26 14:16:22 +01:00
Ines Montani	696acb0f92	Fix typo [ci skip]	2018-11-24 15:20:57 +01:00
Ines Montani	dfcc8f02af	Fix image [ci skip] Twitter URL doesn't work on live site	2018-11-14 01:01:33 +01:00
Ines Montani	1aa91e926f	Minor formatting changes [ci skip]	2018-11-13 23:59:59 +01:00
Francisco Aranda	be99f1cac5	Include universe spec for spacy-wordnet component (#2919 ) * feat: include universe spec for spacy-wordnet component * chore: include spaCy contributor agreement	2018-11-13 23:54:46 +01:00
mikelibg	75e7d503b7	Removed space in docs + added contributor info (#2909 ) * - removed unneeded space in documentation * - added contributor info	2018-11-08 14:18:25 +01:00
Ines Montani	11db4d2f27	Add script to validate universe json [ci skip]	2018-11-06 12:50:41 +01:00
Ines Montani	a9fda638a9	Add spacy-raspberry to universe (closes #2889 )	2018-11-06 12:45:50 +01:00
Ines Montani	c235ddf44f	Add spacy-js to universe [ci-skip]	2018-11-06 12:45:03 +01:00
Bram Vanroy	071789467e	Documentation improvement regarding joblib and SO (#2867 ) Some documentation improvements ## Description 1. Fixed the dead URL to joblib 2. Fixed Stack Overflow brand name (with space) ### Types of change Documentation ## Checklist - [x] I have submitted the spaCy Contributor Agreement. - [x] I ran the tests, and all new and existing tests passed. - [x] My changes don't require a change to the documentation, or if they do, I've added all required information.	2018-10-24 15:19:17 +02:00
Roman	5766d09a5b	Redundant ')' in the Stop words' example (#2856 )	2018-10-18 10:21:16 +02:00
Ines Montani	c6a320cad4	Update version [ci skip]	2018-10-15 16:42:35 +02:00
Ines Montani	f02bb08f39	Update prefer_gpu and require_gpu docs [ci skip]	2018-10-14 23:30:44 +02:00
Ines Montani	5a4c5b78a8	Update GPU docs for v2.0.14	2018-10-14 16:38:12 +02:00
Ines Montani	ac4cadd31d	Add info on wheels [ci skip]	2018-10-14 00:04:37 +02:00
Ines Montani	30aa7f8b20	Increment version [ci skip]	2018-10-13 23:55:50 +02:00
Ines Montani	23d5b4ff5b	Update docs for new version [ci skip]	2018-10-13 23:53:33 +02:00
Ines Montani	f0e7da6478	Fix formatting and consistency	2018-10-13 23:53:26 +02:00
Jacopo Farina	42c42376a3	Visual C++ link updated (#2842 ) (closes #2841 ) [ci skip] * New landing page * Add contribution agreement	2018-10-12 14:59:45 +02:00
Ines Montani	7806deceb4	Fix typo (closes #2815 ) [ci skip]	2018-10-01 10:49:29 +02:00
Ioannis Daras	405a826436	Correct error in spacy universe docs concerning spacy-lookup (#2814 )	2018-10-01 10:24:50 +02:00
Charles-Axel Dein	014dd47c70	Add jupyter=True to displacy.render in documentation (#2806 )	2018-09-27 12:28:04 +02:00
Pranshu Jethmalani	9fd27d777e	Fix typo (#2795 ) [ci skip] Fixed typo on line 6 "regcognizer --> recognizer"	2018-09-25 12:12:40 +02:00
Ines Montani	3c4e3ade30	Fix typo (closes #2784 )	2018-09-21 10:45:11 +02:00
Ines Montani	5001d31be6	Don't set stop word in example (closes #2657 ) [ci skip]	2018-09-12 15:36:51 +02:00
Ines Montani	4e89cfaae1	Fix dependency scheme docs (closes #2705 ) [ci skip]	2018-09-12 15:32:26 +02:00
Ines Montani	0729d1edca	Fix formatting	2018-09-12 15:32:08 +02:00
Ines Montani	907df53904	Add multi-threading note to Language.pipe (resolves #2582 ) [ci skip]	2018-09-12 15:03:30 +02:00
Ines Montani	885691a7ab	Describe converters more explicitly (see #2643 )	2018-09-12 14:53:03 +02:00
Steve Sharp	ca747f58a4	Update _install.jade (#2688 ) Typo fix: "models" -> "model"	2018-08-22 13:16:04 +02:00
Ines Montani	aeb49eb625	Update version [ci skip]	2018-08-16 16:56:02 +02:00
Ines Montani	a0eacd3293	Merge branch 'master' into develop	2018-08-16 16:55:05 +02:00
Ines Montani	c0fa9903f4	Update model directory JS [ci skip] Prevent the default release URL from being overwritten and add license type	2018-08-16 16:54:50 +02:00
Ines Montani	03f661fefb	Add Greek to models directory [ci skip]	2018-08-16 16:51:56 +02:00
Ines Montani	fd9d175a53	Update live code [ci skip]	2018-08-15 15:28:48 +02:00
Matthew Honnibal	4336397ecb	Update develop from master	2018-08-14 03:04:28 +02:00
Wojciech Łukasiewicz	3953e967a0	User correct variable name in the examples (#2664 ) * correct naming * add contributor agreement	2018-08-13 22:21:24 +02:00
Ines Montani	71723cece1	Add note on visualizing long texts ans sentences (see #2636 ) [ci skip]	2018-08-08 15:28:21 +02:00
Ines Montani	6147bd3eb4	Fix link target (closes #2645 ) [ci skip]	2018-08-08 15:03:52 +02:00
Ines Montani	8c47da1f19	Update Language serialization docs (see #2628 ) [ci skip] Add note on using from_disk and from_bytes via subclasses and add example	2018-08-07 14:17:57 +02:00
Matthew Honnibal	664cfc29bc	Merge branch 'master' of https://github.com/explosion/spaCy	2018-08-07 10:49:39 +02:00
Matthew Honnibal	2278c9734e	Fix spelling error #2640	2018-08-07 10:49:21 +02:00
Xiaoquan Kong	f0c9652ed1	New Feature: display more detail when Error E067 (#2639 ) * Fix off-by-one error * Add verbose option * Update verbose option * Update documents for verbose option	2018-08-07 10:45:29 +02:00
Ines Montani	6a4360e425	Update universe [ci skip]	2018-08-02 17:33:08 +02:00
Sami	dbc993f5b3	Updating description and code snippet spacy-lefff (#2623 ) * updating description and code snippet spacy-lefff * contributors agreement	2018-08-02 17:25:27 +02:00
Vikas Kumar Yadav	d3e21aad64	Update _benchmarks.jade (#2618 )	2018-08-02 00:28:28 +02:00
Brian Phillips	8227de0099	Update language.jade (#2616 )	2018-07-31 12:34:42 +02:00
Ioannis Daras	055cc0de44	Bug fix to pseudocode for tokenizer customization (#2604 )	2018-07-27 11:04:12 +02:00
Andriy Mulyar	e9ef51137d	Fixed typo (#2596 ) Changed 'The index of the first character after the span.' to The index of the last character after the span' in description of doc.char_span	2018-07-25 22:17:15 +02:00
Ines Montani	75f3234404	💫 Refactor test suite (#2568 ) ## Description Related issues: #2379 (should be fixed by separating model tests) * total execution time down from > 300 seconds to under 60 seconds 🎉 * removed all model-specific tests that could only really be run manually anyway – those will now live in a separate test suite in the [`spacy-models`](https://github.com/explosion/spacy-models) repository and are already integrated into our new model training infrastructure * changed all relative imports to absolute imports to prepare for moving the test suite from `/spacy/tests` to `/tests` (it'll now always test against the installed version) * merged old regression tests into collections, e.g. `test_issue1001-1500.py` (about 90% of the regression tests are very short anyways) * tidied up and rewrote existing tests wherever possible ### Todo - [ ] move tests to `/tests` and adjust CI commands accordingly - [x] move model test suite from internal repo to `spacy-models` - [x] ~~investigate why `pipeline/test_textcat.py` is flakey~~ - [x] review old regression tests (leftover files) and see if they can be merged, simplified or deleted - [ ] update documentation on how to run tests ### Types of change enhancement, tests ## Checklist - [x] I have submitted the spaCy Contributor Agreement. - [x] I ran the tests, and all new and existing tests passed. - [ ] My changes don't require a change to the documentation, or if they do, I've added all required information.	2018-07-24 23:38:44 +02:00
kororo	b1ec827ee0	Fix typo (#2579 ) Update slogan, desc and code snippet to latest version	2018-07-24 22:47:33 +02:00
ines	cd687091fb	Remove nl examples from widget for now [ci skip] Restore for next spaCy version when path to example sentences is fixed	2018-07-24 22:41:20 +02:00
ines	2d8ffb8bcd	Fix formatting	2018-07-24 22:40:49 +02:00
ines	1b3da8d2ae	Update website for v2.0.12 [ci skip]	2018-07-24 21:04:22 +02:00
ines	ae5ed2d698	Update docs for v2.0.12 [ci skip]	2018-07-21 15:51:44 +02:00
ines	d517dd4297	Document remove_extension methods	2018-07-21 15:51:28 +02:00
ines	153f41a5cc	Use better examples for Doc extension methods	2018-07-21 15:51:11 +02:00
ines	3c30d1763c	Merge branch 'master' into develop	2018-07-21 15:34:18 +02:00
kororo	2784babef9	Add ExcelCy into Universe list (#2572 ) Hi guys, This is my first spaCy extension. I am excited to able to do this. Please do let me know if there is any suggestions or modifications I need to do. Feel free to use/contribute the repo that I made. ## Description ExcelCy is a SpaCy toolkit to help improve the data training experiences. It provides easy annotation using Excel file format. It has helper to pre-train entity annotation with phrase and regex matcher pipe. ### Types of change Update to Universe list in website. ## Checklist - [x] I have submitted the spaCy Contributor Agreement. - [x] I ran the tests, and all new and existing tests passed. - [x] My changes don't require a change to the documentation, or if they do, I've added all required information.	2018-07-19 19:28:33 +02:00
ines	80e7485630	Merge branch 'master' into develop	2018-07-18 17:28:47 +02:00
Xiang Ji	19a5ef1c58	Fix venv command examples (#2560 ) [ci skip] * Fix venv command examples The documentation refers to `venv`, which is native to Python3. However, the command examples are as if they were still `virtualenv`, which is a package independent of `venv`: - It doesn't need to be installed via `pip`. In fact `pip install venv` would return an error. - The correct way to invoke `venv` is `python3 -m venv`, not `venv`, which would return command not found. See https://docs.python.org/3/library/venv.html I suspect the documentation simply replaced all occurrences of `virtualenv` with `venv`. However they are different modules and are used differently. * Update comment [ci skip]	2018-07-18 10:31:24 +02:00
ines	50c367ee96	Update meta [ci skip]	2018-07-10 13:51:45 +02:00
ines	3a321e79ac	Merge branch 'master' into develop	2018-07-10 13:49:08 +02:00
ines	71bfc92913	Exclude models for non-stable versions [ci skip]	2018-07-10 13:44:55 +02:00
ines	b5200962c0	Adjust formatting [ci skip]	2018-07-09 18:35:46 +02:00
Alex Villarreal	bd35bf7f09	Guidance to handle binary files in git in Windows (#2526 ) Adds guidance on what to do if users encounter the error described in [1634](https://github.com/explosion/spaCy/issues/1634), which probably only happens in Windows environments.	2018-07-09 18:31:37 +02:00
ines	f575b01595	Update language and license meta [ci skip]	2018-07-04 15:09:36 +02:00
ines	63666af328	Merge branch 'master' into develop	2018-07-04 14:52:25 +02:00
Matthew Honnibal	a85620a731	Note CoreNLP tokenizer correction on website	2018-07-02 11:35:31 +02:00
ines	06c6dc6fbc	Update Juniper [ci skip]	2018-06-28 11:48:17 +02:00
Nipun Sadvilkar	741ba80bd5	Train model command n_iteration 20 -> 30 (#2454 ) In source code `train.py` default Number of iterations is 30	2018-06-18 11:57:08 +02:00
ines	53a2bc8c8d	Only scroll sidebar item into view if needed [ci skip]	2018-06-12 10:58:50 +02:00
ines	65713a6593	Increment versions [ci skip]	2018-06-12 10:49:50 +02:00
Ines Montani	968f6f0bda	💫 Document Cython API (#2433 ) ## Description This PR adds the most relevant documentation of spaCy's Cython API. (Todo for when we publish this: rewrite `/api/#section-cython` and `/api/#cython` to `/api/cython#conventions`.) ### Types of change docs ## Checklist - [x] I have submitted the spaCy Contributor Agreement. - [x] I ran the tests, and all new and existing tests passed. - [x] My changes don't require a change to the documentation, or if they do, I've added all required information.	2018-06-11 17:47:46 +02:00
GolanLevy	72d7e80f94	adding a missing apostrophe (#2436 )	2018-06-11 17:47:24 +02:00
ines	778e5f4da3	Merge branch 'master' into develop	2018-06-11 00:38:04 +02:00
himkt	57311d5d47	replace janome with mecab in the documentation and the test (#2415 ) * Add links to Reddit data (see #2401) * replace janome with mecab in the documentation and the test * add the assignment	2018-06-11 00:33:13 +02:00
ines	effb55d591	Adjust formatting [ci skip]	2018-06-11 00:29:13 +02:00
Nathan Breit	ba6d2cf393	Add EpiTator to Universe (#2429 )	2018-06-11 00:24:13 +02:00
himkt	1a568f2e08	fix wrong documentations (#2423 )	2018-06-11 00:21:06 +02:00
Bohdan Moskalevskyi	d66292f767	fix UD data file extensions (#2425 ) * fix UD data files extension * add contributor agreement for msklvsk	2018-06-08 14:26:11 +02:00
ines	a0017e4909	Merge branch 'master' into develop	2018-05-30 14:10:47 +02:00
ines	0baaf836cf	Update formatting [ci skip]	2018-05-30 13:32:49 +02:00
ines	3913e18201	Add self-attentive-parser to universe (see #59 )	2018-05-30 13:31:28 +02:00
ines	4a62486340	Merge branch 'master' into develop	2018-05-30 13:01:01 +02:00
ines	605c663a4c	Fix HTML merger examples (see #2390 )	2018-05-30 12:22:32 +02:00
ines	d0b16aa014	Update list of languages	2018-05-26 18:56:26 +02:00
Samuel Pouyt	5f988b8e9c	Update _custom.jade (#2372 ) It seems based on the doc and trying out that the `en` or `[lang]` is missing from the `spacy model-init`	2018-05-26 18:17:12 +02:00
ines	d84a830d79	Merge branch 'master' of https://github.com/explosion/spaCy	2018-05-26 17:57:05 +02:00
ines	fb923b31ea	Fix bad HTML example (see #2376 ) and turn it into section on matcher + components Avoid problems caused by merging while matching (e.g. index errors). Creating a Matcher component also better reflects the recommended best practices.	2018-05-26 17:57:02 +02:00
Shantam Raj	592834183a	corrected spelling (#2359 ) changed interpretted to interpreted	2018-05-24 13:29:52 +02:00
ines	8adb967e0c	Fix from source quickstart instructions for Windows See: https://stackoverflow.com/a/50478036/6400719	2018-05-24 12:42:16 +02:00
Shantam Raj	1a4682dd0b	Update _training.jade (#2340 ) * Update _training.jade Correcting grammar. Replacing "The" with "To". * Create armsp.md * Update armsp.md	2018-05-21 11:09:33 +02:00
ines	ff1082d8e4	Add version tag in CLI docs [ci skip]	2018-05-21 01:17:49 +02:00
Ines Montani	d4cc736b7c	💫 Improve model downloads: check for existing install, customise pip and use requests library again (#2346 ) * Go back to using requests instead of urllib (closes #2320) Fewer dependencies are good, but this one was simply causing too many other problems around SSL verification and Python 2/3 compatibility. requests is a popular enough package that it's okay for spaCy to depend on it – and this will hopefully make model downloads less flakey. * Only download model if not installed (see #1456) Use #egg=model==version to allow pip to check for existing installations. The download is only started if no installation matching the package/version is found. Fixes a long-standing inconvenience. * Pass additional options to pip when installing model (resolves #1456) Treat all additional arguments passed to the download command as pip options to allow user to customise the command. For example: python -m spacy download en --user * Add CLI option to enable installing model package dependencies * Revert "Add CLI option to enable installing model package dependencies" This reverts commit `9336ffe695`. * Update documentation	2018-05-20 20:26:56 +02:00
vishnumenon	ae3719ece5	Fix the code for FACILITIY entities (#2324 ) * Fix the code for FACILITIY entities As far as I can tell, the default models all use "FAC" rather than "FACILITY" * Added my Contributor Agreement * Rename vishnumenon to vishnumenon.md	2018-05-12 15:19:17 +02:00
ines	ac25bc4016	Add docs section on sentence segmentation [ci skip]	2018-05-07 21:25:20 +02:00
ines	14148cd147	Fix formatting and wording	2018-05-07 21:24:35 +02:00
ines	f803da609f	Add scattertext [ci skip]	2018-05-07 19:10:23 +02:00
ines	c9547b7b8b	Update Juniper (see #2293 )	2018-05-03 15:36:02 +02:00
Alex Villarreal	647f2544c5	Fix code sample for span.set_extension (#2286 )	2018-05-03 00:39:22 +02:00
Alex Villarreal	13d562e1a4	Fix code sample for Doc.set_extension (#2282 ) * Fix code sample for `set_extension` The previous sample code for `set_extension` fails the assertion at the end, because `city_getter` it checked if the whole document text matches any of the city names. Now it checks if any of the city names is contained in the document text. * Contributor agreement	2018-05-02 10:16:05 +02:00
Shirish Kadam	d98a90440f	Added Adam project to spaCy Universe (#2275 ) * Added 5hirish to contributors * Added Adam Qas Project to spaCy Universe * Remove $ from code example	2018-04-30 22:25:01 +02:00
ines	56e7faf16b	Fix spacing	2018-04-30 22:24:40 +02:00
ines	6efb4cdf88	Use Juniper and tidy up	2018-04-30 18:48:35 +02:00
ines	45bb8d75a5	Fix overflow issues on small screens [ci skip]	2018-04-29 03:17:36 +02:00
Ines Montani	49cee4af92	💫 Interactive code examples, spaCy Universe and various docs improvements (#2274 ) * Integrate Python kernel via Binder * Add live model test for languages with examples * Update docs and code examples * Adjust margin (if not bootstrapped) * Add binder version to global config * Update terminal and executable code mixins * Pass attributes through infobox and section * Hide v-cloak * Fix example * Take out model comparison for now * Add meta text for compat * Remove chart.js dependency * Tidy up and simplify JS and port big components over to Vue * Remove chartjs example * Add Twitter icon * Add purple stylesheet option * Add utility for hand cursor (special cases only) * Add transition classes * Add small option for section * Add thumb object for small round thumbnail images * Allow unset code block language via "none" value (workaround to still allow unset language to default to DEFAULT_SYNTAX) * Pass through attributes * Add syntax highlighting definitions for Julia, R and Docker * Add website icon * Remove user survey from navigation * Don't hide GitHub icon on small screens * Make top navigation scrollable on small screens * Remove old resources page and references to it * Add Universe * Add helper functions for better page URL and title * Update site description * Increment versions * Update preview images * Update mentions of resources * Fix image * Fix social images * Fix problem with cover sizing and floats * Add divider and move badges into heading * Add docstrings * Reference converting section * Add section on converting word vectors * Move converting section to custom section and fix formatting * Remove old fastText example * Move extensions content to own section Keep weird ID to not break permalinks for now (we don't want to rewrite URLs if not absolutely necessary) * Use better component example and add factories section * Add note on larger model * Use better example for non-vector * Remove similarity in context section Only 
works via small models with tensors so has always been kind of confusing * Add note on init-model command * Fix lightning tour examples and make excutable if possible * Add spacy train CLI section to train * Fix formatting and add video * Fix formatting * Fix textcat example description (resolves #2246) * Add dummy file to try resolve conflict * Delete dummy file * Tidy up [ci skip] * Ensure sufficient height of loading container * Add loading animation to universe * Update Thebelab build and use better startup message * Fix asset versioning * Fix typo [ci skip] * Add note on project idea label	2018-04-29 02:06:46 +02:00
ines	a512fa60ef	Remove upcoming option from docs for now	2018-04-28 23:32:18 +02:00
ines	6fb6371670	Add collapse_phrases option to displacy (closes #2266 )	2018-04-28 23:06:50 +02:00
Matt Upson	87cc6b3599	Add missing comma to NN example in docs (#2255 ) Also add a completed contributor agreement.	2018-04-28 14:56:00 +02:00
ines	4a3bea00c7	Update resources [ci skip]	2018-04-26 22:10:34 +02:00
Pradeep Kumar Tippa	df389e5b74	spacy-101 vocab doc giving valid variable names (#2236 )	2018-04-18 14:54:26 -07:00
ines	ce63f8997b	Update init-model docs	2018-04-10 21:42:54 +02:00
ines	0e847d7fe5	Fix typo	2018-04-09 14:51:14 +02:00
ines	de137fba84	Add TensorBoard examples to examples overview [ci skip]	2018-04-03 16:01:52 +02:00
ines	6d87b28f15	Add Vietnamese to language overview [ci skip]	2018-04-03 16:01:36 +02:00
ines	9615ed5ed7	Update emoji/hashtag matcher example (resolves #2156 ) [ci skip]	2018-03-28 18:41:28 +02:00
ines	ce6071ca89	Remove ftfy dependency and update docs	2018-03-28 12:09:42 +02:00
ines	5ecc60cf3b	Add book to resources [ci skip]	2018-03-24 17:12:56 +01:00
ines	53680642af	Port over docs changes [ci skip]	2018-03-24 17:12:48 +01:00
Matthew Honnibal	f9f46e5a07	Revert matcher fixes from GregDubbin	2018-02-18 10:59:28 +01:00
ines	612c79a4f5	Update first matcher example and match_id (resolves #1989 )	2018-02-17 11:57:38 +01:00
ines	ca56fb53d1	Add user survey to navigation [ci skip]	2018-02-15 12:14:30 +01:00
ines	cab5b775e7	Document ENT_TYPE matcher attribute [ci skip]	2018-02-15 12:14:19 +01:00
Pradeep Kumar Tippa	416cd021ce	Added TAG from spacy symbols which used below	2018-02-09 19:16:59 +05:30
Pradeep Kumar Tippa	01cc9cd9c0	assert statement syntax fix in doc	2018-02-09 19:16:25 +05:30
Pradeep Kumar Tippa	a78062e466	Merge remote-tracking branch 'upstream/master' into web-doc-patches	2018-02-09 19:13:19 +05:30
ines	ab33e274f5	Add more details on symlink error & Windows solution (resolves #1941 ) [ci skip]	2018-02-09 10:43:33 +01:00
ines	8eaa934382	Merge branch 'master' of https://github.com/explosion/spaCy	2018-02-09 10:23:36 +01:00
ines	e9f67be04d	Fix regex flag matcher example (resolves #1950 )	2018-02-09 10:23:33 +01:00
ines	fc4ae04c55	Document LENGTH attribute in matcher	2018-02-09 10:23:03 +01:00
Pradeep Kumar Tippa	8a7467b26e	Merge remote-tracking branch 'upstream/master' into web-doc-patches	2018-02-09 13:54:26 +05:30
Orion Montoya	24af6375db	update link to Honnibal and Johnson 2015 aclweb.org is throwing a gateway timeout on the link as `https`+`aclweb.org`, but is fine with `https`+`www.aclweb.org` (also with `http`+`aclweb.org`, but let's keep it in `https`, shall we?	2018-02-08 10:49:09 -08:00
Pradeep Kumar Tippa	03113d6779	Fixing navigating parse tree doc under dependency parse	2018-02-08 19:34:15 +05:30
ines	a3b965b29d	Remove UPPER from Matcher attributes docs (resolves #1949 )	2018-02-08 11:29:27 +01:00
ines	696ae87b47	Fix whitespace	2018-02-08 11:28:54 +01:00
ines	26bc75134d	Fix typo	2018-02-08 11:28:44 +01:00
Pradeep Kumar Tippa	da9d687e75	Fixing typo from taining to training	2018-02-07 16:49:25 +05:30
Pradeep Kumar Tippa	ed7d268e93	Fixing vocab doc Replacing "like" with "love", coffee suffix should be "fee" but not "ffe"	2018-02-07 14:55:12 +05:30
ines	f377c483e4	Add note on manual entity order in displaCy [ci skip]	2018-02-07 01:08:42 +01:00
ines	58eb178667	Update Doc.char_span docs [ci skip]	2018-02-07 01:08:30 +01:00
sayf eddine hammemi	86e7727855	Fix typo in the word build.	2018-02-04 20:48:45 +01:00
ines	901bc0e85f	Add Persian to list of languages [ci skip]	2018-02-01 04:47:34 +01:00
Hassan Shamim	a0b912c528	fix broken link to test suite models	2018-01-30 15:01:01 -08:00
greg	daefed0a34	Correct documentation of '+' and '*' ops	2018-01-22 15:55:44 -05:00
ines	67ba73351d	Fix typo and use better serialization example (resolves #1851 ) [ci skip]	2018-01-16 18:42:03 +01:00
ines	7943a8e90c	Add spacy-lookup by @mpuig [ci skip]	2018-01-16 00:28:46 +01:00
ines	5684206154	Add LanguageCrunch by @artpar [ci skip]	2018-01-15 16:14:26 +01:00
Mateusz Tatusko	dda0e58c11	Update _pos-tags.jade really small changes to English tags description, but might help some people while working on projects 1) -PRB- should be -RRB- instead 2) space gets tagged as _SP, and not SP	2018-01-15 12:01:51 +09:00
ines	0536e91564	Add note on Tagger.tag_names vs. Tagger.labels (see #1666 ) [ci skip]	2018-01-14 14:37:19 +01:00
ines	bbee48080d	Clarify hyperparameters and alias usage in spacy train (resolves #1838 ) [ci skip]	2018-01-14 14:32:50 +01:00
ines	4daba3abda	Add regex section to rule-based matching docs (see #1567 , #1833 ) [ci skip]	2018-01-14 14:22:13 +01:00
Ines Montani	36f426fe0a	Merge pull request #1808 from fucking-signup/master Fix issue #1769	2018-01-12 21:12:02 +00:00
ines	cfac5b955f	Fix aligment issues with newsletter signup form	2018-01-12 22:06:44 +01:00
ines	65babd9e2e	Fix typo, formatting and operator descriptions (resolves #1820 )	2018-01-12 22:06:27 +01:00
Matthew Honnibal	a2a06dce24	Merge pull request #1792 from explosion/feature-improve-model-download 💫 Improve model downloading and linking	2018-01-11 20:02:08 +01:00
Ines Montani	11676b47f2	Merge pull request #1828 from wrathagom/patch-1 Small Grammar Fix to _basics.jade	2018-01-11 17:27:23 +00:00
pbnsilva	4cfd848bc3	Fixes typo in PhraseMatcher API docs	2018-01-11 17:35:59 +01:00
Caleb M. Keller	e68f6bf890	Small Grammar Fix to _basics.jade Fixed an incorrect word order.	2018-01-11 09:26:47 -05:00
Matthew Honnibal	7ca49c2061	Merge branch 'master' into feature-improve-model-download	2018-01-10 18:21:55 +01:00
Kit	db6e4ba72e	Update code example according to new changes	2018-01-08 03:45:56 +01:00
ines	ef210c73dd	Update cli.download and cli.validate docs	2018-01-03 21:34:03 +01:00
ines	cc9df10e69	Document util.set_lang_class (see #1737 )	2018-01-03 20:13:25 +01:00
Ines Montani	874f174ab1	Merge pull request #1790 from nirdesh37/patch-1 Update goldparse.jade	2018-01-03 18:37:07 +00:00
ines	1fa6ba8130	Fix Doc.from_array example to make it work (see #1527 )	2018-01-03 16:59:38 +01:00
ines	49635350f0	Add .from_disk() to pipeline component init example (resolves #1728 )	2018-01-03 16:50:24 +01:00
ines	95063ba26b	Update tests documentation (resolves #1781 )	2018-01-03 16:42:26 +01:00
nirdesh37	67fdceed6a	Update goldparse.jade	2018-01-03 17:25:21 +05:30
Martin Andrews	e4355dade2	Documentation example fix : token.head needs '==' rather than 'is' (similar change to #1689, it seems).	2017-12-18 18:12:10 +08:00
Kristofer Berggren	1cb8c997fb	Fix typo Span -> Token on Token API page Change Span.vector_norm to Token.vector_norm.	2017-12-17 20:32:19 +08:00
Ines Montani	4befd8bd44	Merge pull request #1724 from mpuels/patch-7 doc: Fix minor mistakes	2017-12-17 12:09:17 +00:00
ines	21482b391b	Fix head	2017-12-16 13:48:19 +01:00
mpuels	b3df2a2ffd	doc: Fix minor mistakes	2017-12-14 20:55:59 +01:00


1712 Commits