spaCy

mirror of https://github.com/explosion/spaCy.git synced 2024-12-26 01:46:28 +03:00

Author	SHA1	Message	Date
Ines Montani	57ae71ea95	Add docs on serializing the pipeline (see #3289) [ci skip]	2019-02-18 14:13:29 +01:00
Ines Montani	38e4422c0d	Improve matcher example (resolves #3287)	2019-02-18 13:26:37 +01:00
Ines Montani	660cfe44c5	Fix formatting	2019-02-18 13:26:22 +01:00
Ines Montani	c5476bd75b	Update languages.json	2019-02-18 10:03:35 +01:00
Ines Montani	212ff359ef	Fix links [ci skip]	2019-02-17 22:25:50 +01:00
Ines Montani	04b4df0ec9	Remove n_threads	2019-02-17 22:25:42 +01:00
Ines Montani	4c7ab7620a	Update README.md	2019-02-17 22:16:17 +01:00
Ines Montani	8a8523d8c1	Update README.md	2019-02-17 21:59:52 +01:00
Ines Montani	e597110d31	💫 Update website (#3285) ## Description The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in straightforward Markdown without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on. This PR also includes various new docs pages and content. Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837. ### Types of change enhancement ## Checklist - [x] I have submitted the spaCy Contributor Agreement. - [x] I ran the tests, and all new and existing tests passed. - [x] My changes don't require a change to the documentation, or if they do, I've added all required information.	2019-02-17 19:31:19 +01:00
Ines Montani	0184a95340	Merge branch 'master' into develop	2019-02-12 18:29:24 +01:00
Ines Montani	5dd39d8697	Update universe.json	2019-02-12 18:05:51 +01:00
Abhijit Balaji	75a40f56fc	added spacy-langdetect to universe.json (#3266)	2019-02-12 18:04:38 +01:00
Ines Montani	8ad15a2377	Fix typo [ci skip]	2019-02-08 17:29:53 +01:00
Ines Montani	7a985cba24	Fix typo (closes #3232) [ci skip]	2019-02-08 17:29:18 +01:00
Ines Montani	5d0b60999d	Merge branch 'master' into develop	2019-02-07 20:54:07 +01:00
PierreMonico	114d64c4b5	Fix typo (#3223)	2019-02-04 11:37:29 +01:00
adrianeboyd	03d58f9feb	Update TIGER/German dependency relations in documentation (#3204) * Add missing dependency relations for TIGER/German * Contributor agreement for adrianeboyd	2019-01-30 14:23:12 +01:00
Bram Vanroy	11cee62644	Updated spacy_conll information (#3158)	2019-01-16 13:46:16 +01:00
Álvaro Abella Bascarán	1cd8f9823f	Correct docs of `Token.subtree` and `Span.subtree` (issue #3122) (#3124) * solve inconsistency between docs and Span.subtree (issue #3122) * solve inconsistency between docs and Token.subtree (issue #3122)	2019-01-09 03:11:15 +01:00
Mathieu Morey	f07b577fbd	Support CUDA 10 (#3126) * ENH support CUDA 10 * Update _instructions.jade	2019-01-09 03:10:45 +01:00
alvations	f43338a4c5	Joblib site has moved. (#3118)	2019-01-05 13:10:54 +01:00
Matthew Honnibal	63b7accd74	💫 Make span.as_doc() return a copy, not a view. Closes #1537 (#3107) Initially span.as_doc() was designed to return a view of the span's contents, as a Doc object. This was a nice idea, but it fails due to the token.idx property, which refers to the character offset within the string. In a span, the idx of the first token might not be 0. Because this data is different, we can't have a view --- it'll be inconsistent. This patch changes span.as_doc() to instead return a copy. The docs are updated accordingly. Closes #1537 * Update test for span.as_doc() * Make span.as_doc() return a copy. Closes #1537 * Document change to Span.as_doc()	2018-12-30 15:17:46 +01:00
Sofie	b7916fffcf	Fixing few typos in the documentation (#3103) * few typos / small grammatical errors corrected in documentation * one more typo * one last typo	2018-12-28 15:52:26 +01:00
Ines Montani	2dc6c52ccc	Update displayed Binder version (see #3077) [ci skip]	2018-12-20 17:36:19 +01:00
Ines Montani	ca244f5f84	Small fixes to displaCy (#3076) ## Description - [x] fix auto-detection of Jupyter notebooks (even if `jupyter=True` isn't set) - [x] add `displacy.set_render_wrapper` method to define a custom function called around the HTML markup generated in all calls to `displacy.render` (can be used to allow custom integrations, callbacks and page formatting) - [x] add option to customise host for web server - [x] show warning if `displacy.serve` is called from within Jupyter notebooks - [x] move error message to `spacy.errors.Errors`. ### Types of change enhancement ## Checklist - [x] I have submitted the spaCy Contributor Agreement. - [x] I ran the tests, and all new and existing tests passed. - [x] My changes don't require a change to the documentation, or if they do, I've added all required information.	2018-12-20 17:32:04 +01:00
Ines Montani	61d09c481b	Merge branch 'master' into develop	2018-12-18 13:48:10 +01:00
Ines Montani	8c0f0f50bc	Use nlp.make_doc instead of nlp for patterns [ci skip]	2018-12-08 11:56:01 +01:00
Aki Ariga	7fcd6419ff	Upadate the document for Unidic link with latest version URL (#3022) * Upadate Unidic link for latest version in document This patch improves #3017. The link for Unidic was old version one, so will the lates version. * Add contributor agreement * Use more specific link for unidic-cwj	2018-12-07 17:24:48 +01:00
Ines Montani	27905a7b14	Remove reference to cuda10 in docs (closes #2894) [ci skip]	2018-12-06 16:05:37 +01:00
Gavriel Loria	9c8c4287bf	Accept iob2 and allow generic whitespace (#2999) * accept non-pipe whitespace as delimiter; allow iob2 filename * added small documentation note for IOB2 allowance * added contributor agreement	2018-12-06 15:50:25 +01:00
Paul O'Leary McCann	b36f6eabfb	Add note that Unidic is required for Japanese (#3017) This addresses #3001. -POLM	2018-12-06 15:14:10 +01:00
Ines Montani	f37863093a	💫 Replace ujson, msgpack and dill/pickle/cloudpickle with srsly (#3003) Remove hacks and wrappers, keep code in sync across our libraries and move spaCy a few steps closer to only depending on packages with binary wheels 🎉 See here: https://github.com/explosion/srsly Serialization is hard, especially across Python versions and multiple platforms. After dealing with many subtle bugs over the years (encodings, locales, large files) our libraries like spaCy and Prodigy have steadily grown a number of utility functions to wrap the multiple serialization formats we need to support (especially json, msgpack and pickle). These wrapping functions ended up duplicated across our codebases, so we wanted to put them in one place. At the same time, we noticed that having a lot of small dependencies was making maintainence harder, and making installation slower. To solve this, we've made srsly standalone, by including the component packages directly within it. This way we can provide all the serialization utilities we need in a single binary wheel. srsly currently includes forks of the following packages: ujson msgpack msgpack-numpy cloudpickle * WIP: replace json/ujson with srsly * Replace ujson in examples Use regular json instead of srsly to make code easier to read and follow * Update requirements * Fix imports * Fix typos * Replace msgpack with srsly * Fix warning	2018-12-03 01:28:22 +01:00
Gavriel Loria	919729d38c	replace user-facing references to "sbd" with "sentencizer" (#2985) ## Description Fixes #2693 Previously, the tokens `sbd` and `sentencizer` would create the same nlp pipe. Internally, both would be called `sbd`. This setup became problematic because it was hard for a user relying on the `sentencizer` pipe name to realize that their pipe's name would be `sbd` for all functions other than creating a pipe. This PR intends to change the API and API documentation to fully support `sentencizer` and drop any user-facing references to `sbd`. ### Types of change end-user API bug ## Checklist - [x] I have submitted the spaCy Contributor Agreement. - [x] I ran the tests, and all new and existing tests passed. - [x] My changes don't require a change to the documentation, or if they do, I've added all required information.	2018-11-30 21:22:40 +01:00
Ines Montani	add6469225	Add "new in v2.0.12" note to Span.ents (closes #2986)	2018-11-30 20:50:55 +01:00
Ines Montani	37c7c85a86	💫 New JSON helpers, training data internals & CLI rewrite (#2932) * Support nowrap setting in util.prints * Tidy up and fix whitespace * Simplify script and use read_jsonl helper * Add JSON schemas (see #2928) * Deprecate Doc.print_tree Will be replaced with Doc.to_json, which will produce a unified format * Add Doc.to_json() method (see #2928) Converts Doc objects to JSON using the same unified format as the training data. Method also supports serializing selected custom attributes in the doc._. space. * Remove outdated test * Add write_json and write_jsonl helpers * WIP: Update spacy train * Tidy up spacy train * WIP: Use wasabi for formatting * Add GoldParse helpers for JSON format * WIP: add debug-data command * Fix typo * Add missing import * Update wasabi pin * Add missing import * 💫 Refactor CLI (#2943) To be merged into #2932. ## Description - [x] refactor CLI To use [`wasabi`](https://github.com/ines/wasabi) - [x] use [`black`](https://github.com/ambv/black) for auto-formatting - [x] add `flake8` config - [x] move all messy UD-related scripts to `cli.ud` - [x] make converters function that take the opened file and return the converted data (instead of having them handle the IO) ### Types of change enhancement ## Checklist - [x] I have submitted the spaCy Contributor Agreement. - [x] I ran the tests, and all new and existing tests passed. - [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
* Update wasabi pin * Delete old test * Update errors * Fix typo * Tidy up and format remaining code * Fix formatting * Improve formatting of messages * Auto-format remaining code * Add tok2vec stuff to spacy.train * Fix typo * Update wasabi pin * Fix path checks for when train() is called as function * Reformat and tidy up pretrain script * Update argument annotations * Raise error if model language doesn't match lang * Document new train command	2018-11-30 20:16:14 +01:00
wxv	06820ef6e7	Fix is_ascii documentation and create contributor file (#2988) Proposed in #2933	2018-11-30 15:57:58 +01:00
Ben Batorsky	658f7e0dc8	OntoNotes url fix (#2981) The website for OntoNotes 5 is: https://catalog.ldc.upenn.edu/LDC2013T19, currently the named entity section has it as https://catalog.ldc.upenn.edu/ldc2013T19.	2018-11-29 19:34:30 +01:00
Ines Montani	d33953037e	💫 Port master changes over to develop (#2979) * Create aryaprabhudesai.md (#2681) * Update _install.jade (#2688) Typo fix: "models" -> "model" * Add FAC to spacy.explain (resolves #2706) * Remove docstrings for deprecated arguments (see #2703) * When calling getoption() in conftest.py, pass a default option (#2709) * When calling getoption() in conftest.py, pass a default option This is necessary to allow testing an installed spacy by running: pytest --pyargs spacy * Add contributor agreement * update bengali token rules for hyphen and digits (#2731) * Less norm computations in token similarity (#2730) * Less norm computations in token similarity * Contributor agreement * Remove ')' for clarity (#2737) Sorry, don't mean to be nitpicky, I just noticed this when going through the CLI and thought it was a quick fix. That said, if this was intention than please let me know. * added contributor agreement for mbkupfer (#2738) * Basic support for Telugu language (#2751) * Lex _attrs for polish language (#2750) * Signed spaCy contributor agreement * Added polish version of english lex_attrs * Introduces a bulk merge function, in order to solve issue #653 (#2696) * Fix comment * Introduce bulk merge to increase performance on many span merges * Sign contributor agreement * Implement pull request suggestions * Describe converters more explicitly (see #2643) * Add multi-threading note to Language.pipe (resolves #2582) [ci skip] * Fix formatting * Fix dependency scheme docs (closes #2705) [ci skip] * Don't set stop word in example (closes #2657) [ci skip] * Add words to portuguese language _num_words (#2759) * Add words to portuguese language _num_words * Add words to portuguese language _num_words * Update Indonesian model (#2752) * adding e-KTP in tokenizer exceptions list * add exception token * removing lines with containing space as it won't matter since we use .split() method in the end, added new tokens in exception * add tokenizer exceptions 
list * combining base_norms with norm_exceptions * adding norm_exception * fix double key in lemmatizer * remove unused import on punctuation.py * reformat stop_words to reduce number of lines, improve readibility * updating tokenizer exception * implement is_currency for lang/id * adding orth_first_upper in tokenizer_exceptions * update the norm_exception list * remove bunch of abbreviations * adding contributors file * Fixed spaCy+Keras example (#2763) * bug fixes in keras example * created contributor agreement * Adding French hyphenated first name (#2786) * Fix typo (closes #2784) * Fix typo (#2795) [ci skip] Fixed typo on line 6 "regcognizer --> recognizer" * Adding basic support for Sinhala language. (#2788) * adding Sinhala language package, stop words, examples and lex_attrs. * Adding contributor agreement * Updating contributor agreement * Also include lowercase norm exceptions * Fix error (#2802) * Fix error ValueError: cannot resize an array that references or is referenced by another array in this way. Use the resize function * added spaCy Contributor Agreement * Add charlax's contributor agreement (#2805) * agreement of contributor, may I introduce a tiny pl languge contribution (#2799) * Contributors agreement * Contributors agreement * Contributors agreement * Add jupyter=True to displacy.render in documentation (#2806) * Revert "Also include lowercase norm exceptions" This reverts commit `70f4e8adf3`. * Remove deprecated encoding argument to msgpack * Set up dependency tree pattern matching skeleton (#2732) * Fix bug when too many entity types. 
Fixes #2800 * Fix Python 2 test failure * Require older msgpack-numpy * Restore encoding arg on msgpack-numpy * Try to fix version pin for msgpack-numpy * Update Portuguese Language (#2790) * Add words to portuguese language _num_words * Add words to portuguese language _num_words * Portuguese - Add/remove stopwords, fix tokenizer, add currency symbols * Extended punctuation and norm_exceptions in the Portuguese language * Correct error in spacy universe docs concerning spacy-lookup (#2814) * Update Keras Example for (Parikh et al, 2016) implementation (#2803) * bug fixes in keras example * created contributor agreement * baseline for Parikh model * initial version of parikh 2016 implemented * tested asymmetric models * fixed grevious error in normalization * use standard SNLI test file * begin to rework parikh example * initial version of running example * start to document the new version * start to document the new version * Update Decompositional Attention.ipynb * fixed calls to similarity * updated the README * import sys package duh * simplified indexing on mapping word to IDs * stupid python indent error * added code from https://github.com/tensorflow/tensorflow/issues/3388 for tf bug workaround * Fix typo (closes #2815) [ci skip] * Update regex version dependency * Set version to 2.0.13.dev3 * Skip seemingly problematic test * Remove problematic test * Try previous version of regex * Revert "Remove problematic test" This reverts commit `bdebbef455`. * Unskip test * Try older version of regex * 💫 Update training examples and use minibatching (#2830) ## Description Update the training examples in `/examples/training` to show usage of spaCy's `minibatch` and `compounding` helpers ([see here](https://spacy.io/usage/training#tips-batch-size) for details). 
The lack of batching in the examples has caused some confusion in the past, especially for beginners who would copy-paste the examples, update them with large training sets and experienced slow and unsatisfying results. ### Types of change enhancements ## Checklist - [x] I have submitted the spaCy Contributor Agreement. - [x] I ran the tests, and all new and existing tests passed. - [x] My changes don't require a change to the documentation, or if they do, I've added all required information. * Visual C++ link updated (#2842) (closes #2841) [ci skip] * New landing page * Add contribution agreement * Correcting lang/ru/examples.py (#2845) * Correct some grammatical inaccuracies in lang\ru\examples.py; filled Contributor Agreement * Correct some grammatical inaccuracies in lang\ru\examples.py * Move contributor agreement to separate file * Set version to 2.0.13.dev4 * Add Persian(Farsi) language support (#2797) * Also include lowercase norm exceptions * Remove in favour of https://github.com/explosion/spaCy/graphs/contributors * Rule-based French Lemmatizer (#2818) ## Description Add a rule-based French Lemmatizer following the english one and the excellent PR for [greek language optimizations](https://github.com/explosion/spaCy/pull/2558) to adapt the Lemmatizer class. ### Types of change
- Lemma dictionary used can be found [here](http://infolingu.univ-mlv.fr/DonneesLinguistiques/Dictionnaires/telechargement.html), I used the XML version. - Add several files containing exhaustive list of words for each part of speech - Add some lemma rules - Add POS that are not checked in the standard Lemmatizer, i.e PRON, DET, ADV and AUX - Modify the Lemmatizer class to check in lookup table as a last resort if POS not mentionned - Modify the lemmatize function to check in lookup table as a last resort - Init files are updated so the model can support all the functionalities mentioned above - Add words to tokenizer_exceptions_list.py in respect to regex used in tokenizer_exceptions.py ## Checklist - [X] I have submitted the spaCy Contributor Agreement. - [X] I ran the tests, and all new and existing tests passed. - [X] My changes don't require a change to the documentation, or if they do, I've added all required information. 
* Set version to 2.0.13 * Fix formatting and consistency * Update docs for new version [ci skip] * Increment version [ci skip] * Add info on wheels [ci skip] * Adding "This is a sentence" example to Sinhala (#2846) * Add wheels badge * Update badge [ci skip] * Update README.rst [ci skip] * Update murmurhash pin * Increment version to 2.0.14.dev0 * Update GPU docs for v2.0.14 * Add wheel to setup_requires * Import prefer_gpu and require_gpu functions from Thinc * Add tests for prefer_gpu() and require_gpu() * Update requirements and setup.py * Workaround bug in thinc require_gpu * Set version to v2.0.14 * Update push-tag script * Unhack prefer_gpu * Require thinc 6.10.6 * Update prefer_gpu and require_gpu docs [ci skip] * Fix specifiers for GPU * Set version to 2.0.14.dev1 * Set version to 2.0.14 * Update Thinc version pin * Increment version * Fix msgpack-numpy version pin * Increment version * Update version to 2.0.16 * Update version [ci skip] * Redundant ')' in the Stop words' example (#2856) ## Description ### Types of change ## Checklist - [ ] I have submitted the spaCy Contributor Agreement. - [ ] I ran the tests, and all new and existing tests passed. - [ ] My changes don't require a change to the documentation, or if they do, I've added all required information. 
* Documentation improvement regarding joblib and SO (#2867) Some documentation improvements ## Description 1. Fixed the dead URL to joblib 2. Fixed Stack Overflow brand name (with space) ### Types of change Documentation ## Checklist - [x] I have submitted the spaCy Contributor Agreement. - [x] I ran the tests, and all new and existing tests passed. - [x] My changes don't require a change to the documentation, or if they do, I've added all required information. * raise error when setting overlapping entities as doc.ents (#2880) * Fix out-of-bounds access in NER training The helper method state.B(1) gets the index of the first token of the buffer, or -1 if no such token exists. Normally this is safe because we pass this to functions like state.safe_get(), which returns an empty token. Here we used it directly as an array index, which is not okay! This error may have been the cause of out-of-bounds access errors during training. Similar errors may still be around, so much be hunted down. Hunting this one down took a long time...I printed out values across training runs and diffed, looking for points of divergence between runs, when no randomness should be allowed. * Change PyThaiNLP Url (#2876) * Fix missing comma * Add example showing a fix-up rule for space entities * Set version to 2.0.17.dev0 * Update regex version * Revert "Update regex version" This reverts commit `62358dd867`. 
* Try setting older regex version, to align with conda * Set version to 2.0.17 * Add spacy-js to universe [ci-skip] * Add spacy-raspberry to universe (closes #2889) * Add script to validate universe json [ci skip] * Removed space in docs + added contributor indo (#2909) * - removed unneeded space in documentation * - added contributor info * Allow input text of length up to max_length, inclusive (#2922) * Include universe spec for spacy-wordnet component (#2919) * feat: include universe spec for spacy-wordnet component * chore: include spaCy contributor agreement * Minor formatting changes [ci skip] * Fix image [ci skip] Twitter URL doesn't work on live site * Check if the word is in one of the regular lists specific to each POS (#2886) * 💫 Create random IDs for SVGs to prevent ID clashes (#2927) Resolves #2924. ## Description Fixes problem where multiple visualizations in Jupyter notebooks would have clashing arc IDs, resulting in weirdly positioned arc labels. Generating a random ID prefix so even identical parses won't receive the same IDs for consistency (even if effect of ID clash isn't noticable here.) ### Types of change bug fix ## Checklist - [x] I have submitted the spaCy Contributor Agreement. - [x] I ran the tests, and all new and existing tests passed. - [x] My changes don't require a change to the documentation, or if they do, I've added all required information. 
* Fix typo [ci skip] * fixes symbolic link on py3 and windows (#2949) * fixes symbolic link on py3 and windows during setup of spacy using command python -m spacy link en_core_web_sm en closes #2948 * Update spacy/compat.py Co-Authored-By: cicorias <cicorias@users.noreply.github.com> * Fix formatting * Update universe [ci skip] * Catalan Language Support (#2940) * Catalan language Support * Ddding Catalan to documentation * Sort languages alphabetically [ci skip] * Update tests for pytest 4.x (#2965) ## Description - [x] Replace marks in params for pytest 4.0 compat ([see here](https://docs.pytest.org/en/latest/deprecations.html#marks-in-pytest-mark-parametrize)) - [x] Un-xfail passing tests (some fixes in a recent update resolved a bunch of issues, but tests were apparently never updated here) ### Types of change ## Checklist - [x] I have submitted the spaCy Contributor Agreement. - [x] I ran the tests, and all new and existing tests passed. - [x] My changes don't require a change to the documentation, or if they do, I've added all required information. * Fix regex pin to harmonize with conda (#2964) * Update README.rst * Fix bug where Vocab.prune_vector did not use 'batch_size' (#2977) Fixes #2976 * Fix typo * Fix typo * Remove duplicate file * Require thinc 7.0.0.dev2 Fixes bug in gpu_ops that would use cupy instead of numpy on CPU * Add missing import * Fix error IDs * Fix tests	2018-11-29 16:30:29 +01:00
Ines Montani	c80c20e1ec	Sort languages alphabetically [ci skip]	2018-11-26 15:37:53 +01:00
Marc Puig	98fe1ab259	Catalan Language Support (#2940) * Catalan language Support * Ddding Catalan to documentation	2018-11-26 15:25:47 +01:00
Ines Montani	1844bc238a	Update universe [ci skip]	2018-11-26 14:16:22 +01:00
Ines Montani	696acb0f92	Fix typo [ci skip]	2018-11-24 15:20:57 +01:00
Ines Montani	dfcc8f02af	Fix image [ci skip] Twitter URL doesn't work on live site	2018-11-14 01:01:33 +01:00
Ines Montani	1aa91e926f	Minor formatting changes [ci skip]	2018-11-13 23:59:59 +01:00
Francisco Aranda	be99f1cac5	Include universe spec for spacy-wordnet component (#2919) * feat: include universe spec for spacy-wordnet component * chore: include spaCy contributor agreement	2018-11-13 23:54:46 +01:00
mikelibg	75e7d503b7	Removed space in docs + added contributor indo (#2909) * - removed unneeded space in documentation * - added contributor info	2018-11-08 14:18:25 +01:00
Ines Montani	11db4d2f27	Add script to validate universe json [ci skip]	2018-11-06 12:50:41 +01:00
Ines Montani	a9fda638a9	Add spacy-raspberry to universe (closes #2889)	2018-11-06 12:45:50 +01:00
Ines Montani	c235ddf44f	Add spacy-js to universe [ci-skip]	2018-11-06 12:45:03 +01:00
Bram Vanroy	071789467e	Documentation improvement regarding joblib and SO (#2867) Some documentation improvements ## Description 1. Fixed the dead URL to joblib 2. Fixed Stack Overflow brand name (with space) ### Types of change Documentation ## Checklist - [x] I have submitted the spaCy Contributor Agreement. - [x] I ran the tests, and all new and existing tests passed. - [x] My changes don't require a change to the documentation, or if they do, I've added all required information.	2018-10-24 15:19:17 +02:00
Roman	5766d09a5b	Redundant ')' in the Stop words' example (#2856) ## Description ### Types of change ## Checklist - [ ] I have submitted the spaCy Contributor Agreement. - [ ] I ran the tests, and all new and existing tests passed. - [ ] My changes don't require a change to the documentation, or if they do, I've added all required information.	2018-10-18 10:21:16 +02:00
Ines Montani	c6a320cad4	Update version [ci skip]	2018-10-15 16:42:35 +02:00
Ines Montani	f02bb08f39	Update prefer_gpu and require_gpu docs [ci skip]	2018-10-14 23:30:44 +02:00
Ines Montani	5a4c5b78a8	Update GPU docs for v2.0.14	2018-10-14 16:38:12 +02:00
Ines Montani	ac4cadd31d	Add info on wheels [ci skip]	2018-10-14 00:04:37 +02:00
Ines Montani	30aa7f8b20	Increment version [ci skip]	2018-10-13 23:55:50 +02:00
Ines Montani	23d5b4ff5b	Update docs for new version [ci skip]	2018-10-13 23:53:33 +02:00
Ines Montani	f0e7da6478	Fix formatting and consistency	2018-10-13 23:53:26 +02:00
Jacopo Farina	42c42376a3	Visual C++ link updated (#2842) (closes #2841) [ci skip] * New landing page * Add contribution agreement	2018-10-12 14:59:45 +02:00
Ines Montani	7806deceb4	Fix typo (closes #2815) [ci skip]	2018-10-01 10:49:29 +02:00
Ioannis Daras	405a826436	Correct error in spacy universe docs concerning spacy-lookup (#2814)	2018-10-01 10:24:50 +02:00
Charles-Axel Dein	014dd47c70	Add jupyter=True to displacy.render in documentation (#2806)	2018-09-27 12:28:04 +02:00
Pranshu Jethmalani	9fd27d777e	Fix typo (#2795) [ci skip] Fixed typo on line 6 "regcognizer --> recognizer"	2018-09-25 12:12:40 +02:00
Ines Montani	3c4e3ade30	Fix typo (closes #2784)	2018-09-21 10:45:11 +02:00
Ines Montani	5001d31be6	Don't set stop word in example (closes #2657) [ci skip]	2018-09-12 15:36:51 +02:00
Ines Montani	4e89cfaae1	Fix dependency scheme docs (closes #2705) [ci skip]	2018-09-12 15:32:26 +02:00
Ines Montani	0729d1edca	Fix formatting	2018-09-12 15:32:08 +02:00
Ines Montani	907df53904	Add multi-threading note to Language.pipe (resolves #2582) [ci skip]	2018-09-12 15:03:30 +02:00
Ines Montani	885691a7ab	Describe converters more explicitly (see #2643)	2018-09-12 14:53:03 +02:00
Steve Sharp	ca747f58a4	Update _install.jade (#2688) Typo fix: "models" -> "model"	2018-08-22 13:16:04 +02:00
Ines Montani	aeb49eb625	Update version [ci skip]	2018-08-16 16:56:02 +02:00
Ines Montani	a0eacd3293	Merge branch 'master' into develop	2018-08-16 16:55:05 +02:00
Ines Montani	c0fa9903f4	Update model directory JS [ci skip] Prevent the default release URL from being overwritten and add license type	2018-08-16 16:54:50 +02:00
Ines Montani	03f661fefb	Add Greek to models directory [ci skip]	2018-08-16 16:51:56 +02:00
Ines Montani	fd9d175a53	Update live code [ci skip]	2018-08-15 15:28:48 +02:00
Matthew Honnibal	4336397ecb	Update develop from master	2018-08-14 03:04:28 +02:00
Wojciech Łukasiewicz	3953e967a0	Use correct variable name in the examples (#2664 ) * correct naming * add contributor agreement	2018-08-13 22:21:24 +02:00
Ines Montani	71723cece1	Add note on visualizing long texts and sentences (see #2636 ) [ci skip]	2018-08-08 15:28:21 +02:00
Ines Montani	6147bd3eb4	Fix link target (closes #2645 ) [ci skip]	2018-08-08 15:03:52 +02:00
Ines Montani	8c47da1f19	Update Language serialization docs (see #2628 ) [ci skip] Add note on using from_disk and from_bytes via subclasses and add example	2018-08-07 14:17:57 +02:00
Matthew Honnibal	664cfc29bc	Merge branch 'master' of https://github.com/explosion/spaCy	2018-08-07 10:49:39 +02:00
Matthew Honnibal	2278c9734e	Fix spelling error #2640	2018-08-07 10:49:21 +02:00
Xiaoquan Kong	f0c9652ed1	New Feature: display more detail when Error E067 (#2639 ) * Fix off-by-one error * Add verbose option * Update verbose option * Update documents for verbose option	2018-08-07 10:45:29 +02:00
Ines Montani	6a4360e425	Update universe [ci skip]	2018-08-02 17:33:08 +02:00
Sami	dbc993f5b3	Updating description and code snippet spacy-lefff (#2623 ) * updating description and code snippet spacy-lefff * contributors agreement	2018-08-02 17:25:27 +02:00
Vikas Kumar Yadav	d3e21aad64	Update _benchmarks.jade (#2618 )	2018-08-02 00:28:28 +02:00
Brian Phillips	8227de0099	Update language.jade (#2616 )	2018-07-31 12:34:42 +02:00
Ioannis Daras	055cc0de44	Bug fix to pseudocode for tokenizer customization (#2604 )	2018-07-27 11:04:12 +02:00
Andriy Mulyar	e9ef51137d	Fixed typo (#2596 ) Changed 'The index of the first character after the span.' to 'The index of the last character after the span' in description of doc.char_span	2018-07-25 22:17:15 +02:00
Ines Montani	75f3234404	💫 Refactor test suite (#2568 ) ## Description Related issues: #2379 (should be fixed by separating model tests) * total execution time down from > 300 seconds to under 60 seconds 🎉 * removed all model-specific tests that could only really be run manually anyway – those will now live in a separate test suite in the [`spacy-models`](https://github.com/explosion/spacy-models) repository and are already integrated into our new model training infrastructure * changed all relative imports to absolute imports to prepare for moving the test suite from `/spacy/tests` to `/tests` (it'll now always test against the installed version) * merged old regression tests into collections, e.g. `test_issue1001-1500.py` (about 90% of the regression tests are very short anyways) * tidied up and rewrote existing tests wherever possible ### Todo - [ ] move tests to `/tests` and adjust CI commands accordingly - [x] move model test suite from internal repo to `spacy-models` - [x] ~~investigate why `pipeline/test_textcat.py` is flakey~~ - [x] review old regression tests (leftover files) and see if they can be merged, simplified or deleted - [ ] update documentation on how to run tests ### Types of change enhancement, tests ## Checklist <!--- Before you submit the PR, go over this checklist and make sure you can tick off all the boxes. [] -> [x] --> - [x] I have submitted the spaCy Contributor Agreement. - [x] I ran the tests, and all new and existing tests passed. - [ ] My changes don't require a change to the documentation, or if they do, I've added all required information.	2018-07-24 23:38:44 +02:00
kororo	b1ec827ee0	Fix typo (#2579 ) Update slogan, desc and code snippet to latest version	2018-07-24 22:47:33 +02:00
ines	cd687091fb	Remove nl examples from widget for now [ci skip] Restore for next spaCy version when path to example sentences is fixed	2018-07-24 22:41:20 +02:00
ines	2d8ffb8bcd	Fix formatting	2018-07-24 22:40:49 +02:00
ines	1b3da8d2ae	Update website for v2.0.12 [ci skip]	2018-07-24 21:04:22 +02:00
ines	ae5ed2d698	Update docs for v2.0.12 [ci skip]	2018-07-21 15:51:44 +02:00
ines	d517dd4297	Document remove_extension methods	2018-07-21 15:51:28 +02:00
ines	153f41a5cc	Use better examples for Doc extension methods	2018-07-21 15:51:11 +02:00
ines	3c30d1763c	Merge branch 'master' into develop	2018-07-21 15:34:18 +02:00
kororo	2784babef9	Add ExcelCy into Universe list (#2572 ) Hi guys, This is my first spaCy extension. I am excited to able to do this. Please do let me know if there is any suggestions or modifications I need to do. Feel free to use/contribute the repo that I made. ## Description ExcelCy is a SpaCy toolkit to help improve the data training experiences. It provides easy annotation using Excel file format. It has helper to pre-train entity annotation with phrase and regex matcher pipe. ### Types of change Update to Universe list in website. ## Checklist - [x] I have submitted the spaCy Contributor Agreement. - [x] I ran the tests, and all new and existing tests passed. - [x] My changes don't require a change to the documentation, or if they do, I've added all required information.	2018-07-19 19:28:33 +02:00
ines	80e7485630	Merge branch 'master' into develop	2018-07-18 17:28:47 +02:00
Xiang Ji	19a5ef1c58	Fix venv command examples (#2560 ) [ci skip] * Fix venv command examples The documentation refers to `venv`, which is native to Python3. However, the command examples are as if they were still `virtualenv`, which is a package independent of `venv`: - It doesn't need to be installed via `pip`. In fact `pip install venv` would return an error. - The correct way to invoke `venv` is `python3 -m venv`, not `venv`, which would return command not found. See https://docs.python.org/3/library/venv.html I suspect the documentation simply replaced all occurrences of `virtualenv` with `venv`. However they are different modules and are used differently. * Update comment [ci skip]	2018-07-18 10:31:24 +02:00
ines	50c367ee96	Update meta [ci skip]	2018-07-10 13:51:45 +02:00
ines	3a321e79ac	Merge branch 'master' into develop	2018-07-10 13:49:08 +02:00
ines	71bfc92913	Exclude models for non-stable versions [ci skip]	2018-07-10 13:44:55 +02:00
ines	b5200962c0	Adjust formatting [ci skip]	2018-07-09 18:35:46 +02:00
Alex Villarreal	bd35bf7f09	Guidance to handle binary files in git in Windows (#2526 ) Adds guidance on what to do if users encounter the error described in [1634](https://github.com/explosion/spaCy/issues/1634), which probably only happens in Windows environments.	2018-07-09 18:31:37 +02:00
ines	f575b01595	Update language and license meta [ci skip]	2018-07-04 15:09:36 +02:00
ines	63666af328	Merge branch 'master' into develop	2018-07-04 14:52:25 +02:00
Matthew Honnibal	a85620a731	Note CoreNLP tokenizer correction on website	2018-07-02 11:35:31 +02:00
ines	06c6dc6fbc	Update Juniper [ci skip]	2018-06-28 11:48:17 +02:00
Nipun Sadvilkar	741ba80bd5	Train model command n_iteration 20 -> 30 (#2454 ) In source code `train.py` default Number of iterations is 30	2018-06-18 11:57:08 +02:00
ines	53a2bc8c8d	Only scroll sidebar item into view if needed [ci skip]	2018-06-12 10:58:50 +02:00
ines	65713a6593	Increment versions [ci skip]	2018-06-12 10:49:50 +02:00
Ines Montani	968f6f0bda	💫 Document Cython API (#2433 ) ## Description This PR adds the most relevant documentation of spaCy's Cython API. (Todo for when we publish this: rewrite `/api/#section-cython` and `/api/#cython` to `/api/cython#conventions`.) ### Types of change docs ## Checklist <!--- Before you submit the PR, go over this checklist and make sure you can tick off all the boxes. [] -> [x] --> - [x] I have submitted the spaCy Contributor Agreement. - [x] I ran the tests, and all new and existing tests passed. - [x] My changes don't require a change to the documentation, or if they do, I've added all required information.	2018-06-11 17:47:46 +02:00
GolanLevy	72d7e80f94	adding a missing apostrophe (#2436 )	2018-06-11 17:47:24 +02:00
ines	778e5f4da3	Merge branch 'master' into develop	2018-06-11 00:38:04 +02:00
himkt	57311d5d47	replace janome with mecab in the documentation and the test (#2415 ) * Add links to Reddit data (see #2401) * replace janome with mecab in the documentation and the test * add the assignment	2018-06-11 00:33:13 +02:00
ines	effb55d591	Adjust formatting [ci skip]	2018-06-11 00:29:13 +02:00
Nathan Breit	ba6d2cf393	Add EpiTator to Universe (#2429 )	2018-06-11 00:24:13 +02:00
himkt	1a568f2e08	fix wrong documentations (#2423 )	2018-06-11 00:21:06 +02:00
Bohdan Moskalevskyi	d66292f767	fix UD data file extensions (#2425 ) * fix UD data files extension * add contributor agreement for msklvsk	2018-06-08 14:26:11 +02:00
ines	a0017e4909	Merge branch 'master' into develop	2018-05-30 14:10:47 +02:00
ines	0baaf836cf	Update formatting [ci skip]	2018-05-30 13:32:49 +02:00
ines	3913e18201	Add self-attentive-parser to universe (see #59 )	2018-05-30 13:31:28 +02:00
ines	4a62486340	Merge branch 'master' into develop	2018-05-30 13:01:01 +02:00
ines	605c663a4c	Fix HTML merger examples (see #2390 )	2018-05-30 12:22:32 +02:00
ines	d0b16aa014	Update list of languages	2018-05-26 18:56:26 +02:00
Samuel Pouyt	5f988b8e9c	Update _custom.jade (#2372 ) It seems based on the doc and trying out that the `en` or `[lang]` is missing from the `spacy model-init`	2018-05-26 18:17:12 +02:00
ines	d84a830d79	Merge branch 'master' of https://github.com/explosion/spaCy	2018-05-26 17:57:05 +02:00
ines	fb923b31ea	Fix bad HTML example (see #2376 ) and turn it into section on matcher + components Avoid problems caused by merging while matching (e.g. index errors). Creating a Matcher component also better reflects the recommended best practices.	2018-05-26 17:57:02 +02:00
Shantam Raj	592834183a	corrected spelling (#2359 ) changed interpretted to interpreted	2018-05-24 13:29:52 +02:00
ines	8adb967e0c	Fix from source quickstart instructions for Windows See: https://stackoverflow.com/a/50478036/6400719	2018-05-24 12:42:16 +02:00
Shantam Raj	1a4682dd0b	Update _training.jade (#2340 ) * Update _training.jade Correcting grammar. Replacing "The" with "To". * Create armsp.md * Update armsp.md	2018-05-21 11:09:33 +02:00
ines	ff1082d8e4	Add version tag in CLI docs [ci skip]	2018-05-21 01:17:49 +02:00
Ines Montani	d4cc736b7c	💫 Improve model downloads: check for existing install, customise pip and use requests library again (#2346 ) * Go back to using requests instead of urllib (closes #2320) Fewer dependencies are good, but this one was simply causing too many other problems around SSL verification and Python 2/3 compatibility. requests is a popular enough package that it's okay for spaCy to depend on it – and this will hopefully make model downloads less flakey. * Only download model if not installed (see #1456) Use #egg=model==version to allow pip to check for existing installations. The download is only started if no installation matching the package/version is found. Fixes a long-standing inconvenience. * Pass additional options to pip when installing model (resolves #1456) Treat all additional arguments passed to the download command as pip options to allow user to customise the command. For example: python -m spacy download en --user * Add CLI option to enable installing model package dependencies * Revert "Add CLI option to enable installing model package dependencies" This reverts commit `9336ffe695`. * Update documentation	2018-05-20 20:26:56 +02:00
vishnumenon	ae3719ece5	Fix the code for FACILITY entities (#2324 ) * Fix the code for FACILITY entities As far as I can tell, the default models all use "FAC" rather than "FACILITY" * Added my Contributor Agreement * Rename vishnumenon to vishnumenon.md	2018-05-12 15:19:17 +02:00
ines	ac25bc4016	Add docs section on sentence segmentation [ci skip]	2018-05-07 21:25:20 +02:00
ines	14148cd147	Fix formatting and wording	2018-05-07 21:24:35 +02:00
ines	f803da609f	Add scattertext [ci skip]	2018-05-07 19:10:23 +02:00
ines	c9547b7b8b	Update Juniper (see #2293 )	2018-05-03 15:36:02 +02:00
Alex Villarreal	647f2544c5	Fix code sample for span.set_extension (#2286 )	2018-05-03 00:39:22 +02:00
Alex Villarreal	13d562e1a4	Fix code sample for Doc.set_extension (#2282 ) * Fix code sample for `set_extension` The previous sample code for `set_extension` fails the assertion at the end, because `city_getter` it checked if the whole document text matches any of the city names. Now it checks if any of the city names is contained in the document text. * Contributor agreement	2018-05-02 10:16:05 +02:00
Shirish Kadam	d98a90440f	Added Adam project to spaCy Universe (#2275 ) * Added 5hirish to contributors * Added Adam Qas Project to spaCy Universe * Remove $ from code example	2018-04-30 22:25:01 +02:00
ines	56e7faf16b	Fix spacing	2018-04-30 22:24:40 +02:00
ines	6efb4cdf88	Use Juniper and tidy up	2018-04-30 18:48:35 +02:00
ines	45bb8d75a5	Fix overflow issues on small screens [ci skip]	2018-04-29 03:17:36 +02:00
Ines Montani	49cee4af92	💫 Interactive code examples, spaCy Universe and various docs improvements (#2274 ) * Integrate Python kernel via Binder * Add live model test for languages with examples * Update docs and code examples * Adjust margin (if not bootstrapped) * Add binder version to global config * Update terminal and executable code mixins * Pass attributes through infobox and section * Hide v-cloak * Fix example * Take out model comparison for now * Add meta text for compat * Remove chart.js dependency * Tidy up and simplify JS and port big components over to Vue * Remove chartjs example * Add Twitter icon * Add purple stylesheet option * Add utility for hand cursor (special cases only) * Add transition classes * Add small option for section * Add thumb object for small round thumbnail images * Allow unset code block language via "none" value (workaround to still allow unset language to default to DEFAULT_SYNTAX) * Pass through attributes * Add syntax highlighting definitions for Julia, R and Docker * Add website icon * Remove user survey from navigation * Don't hide GitHub icon on small screens * Make top navigation scrollable on small screens * Remove old resources page and references to it * Add Universe * Add helper functions for better page URL and title * Update site description * Increment versions * Update preview images * Update mentions of resources * Fix image * Fix social images * Fix problem with cover sizing and floats * Add divider and move badges into heading * Add docstrings * Reference converting section * Add section on converting word vectors * Move converting section to custom section and fix formatting * Remove old fastText example * Move extensions content to own section Keep weird ID to not break permalinks for now (we don't want to rewrite URLs if not absolutely necessary) * Use better component example and add factories section * Add note on larger model * Use better example for non-vector * Remove similarity in context section Only works via small models with tensors so has always been kind of confusing * Add note on init-model command * Fix lightning tour examples and make executable if possible * Add spacy train CLI section to train * Fix formatting and add video * Fix formatting * Fix textcat example description (resolves #2246) * Add dummy file to try resolve conflict * Delete dummy file * Tidy up [ci skip] * Ensure sufficient height of loading container * Add loading animation to universe * Update Thebelab build and use better startup message * Fix asset versioning * Fix typo [ci skip] * Add note on project idea label	2018-04-29 02:06:46 +02:00
ines	a512fa60ef	Remove upcoming option from docs for now	2018-04-28 23:32:18 +02:00
ines	6fb6371670	Add collapse_phrases option to displacy (closes #2266 )	2018-04-28 23:06:50 +02:00
Matt Upson	87cc6b3599	Add missing comma to NN example in docs (#2255 ) Also add a completed contributor agreement.	2018-04-28 14:56:00 +02:00

1275 Commits