Some documentation improvements
## Description
1. Fixed the dead URL to joblib
2. Fixed Stack Overflow brand name (with space)
### Types of change
Documentation
## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
<!--- Provide a general summary of your changes in the title. -->
## Description
<!--- Use this section to describe your changes. If your changes required
testing, include information about the testing environment and the tests you
ran. If your test fixes a bug reported in an issue, don't forget to include the
issue number. If your PR is still a work in progress, that's totally fine – just
include a note to let us know. -->
### Types of change
<!-- What type of change does your PR cover? Is it a bug fix, an enhancement
or new feature, or a change to the documentation? -->
## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [ ] I have submitted the spaCy Contributor Agreement.
- [ ] I ran the tests, and all new and existing tests passed.
- [ ] My changes don't require a change to the documentation, or if they do, I've added all required information.
## Description
Related issues: #2379 (should be fixed by separating model tests)
* **total execution time down from > 300 seconds to under 60 seconds** 🎉
* removed all model-specific tests that could only really be run manually anyway – those will now live in a separate test suite in the [`spacy-models`](https://github.com/explosion/spacy-models) repository and are already integrated into our new model training infrastructure
* changed all relative imports to absolute imports to prepare for moving the test suite from `/spacy/tests` to `/tests` (it'll now always test against the installed version)
* merged old regression tests into collections, e.g. `test_issue1001-1500.py` (about 90% of the regression tests are very short anyways)
* tidied up and rewrote existing tests wherever possible
### Todo
- [ ] move tests to `/tests` and adjust CI commands accordingly
- [x] move model test suite from internal repo to `spacy-models`
- [x] ~~investigate why `pipeline/test_textcat.py` is flakey~~
- [x] review old regression tests (leftover files) and see if they can be merged, simplified or deleted
- [ ] update documentation on how to run tests
### Types of change
enhancement, tests
## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [ ] My changes don't require a change to the documentation, or if they do, I've added all required information.
Hi guys,
This is my first spaCy extension. I am excited to able to do this. Please do let me know if there is any suggestions or modifications I need to do. Feel free to use/contribute the repo that I made.
## Description
ExcelCy is a SpaCy toolkit to help improve the data training experiences. It provides easy annotation using Excel file format. It has helper to pre-train entity annotation with phrase and regex matcher pipe.
### Types of change
Update to Universe list in website.
## Checklist
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
* Fix venv command examples
The documentation refers to `venv`, which is native to Python3.
However, the command examples are as if they were still `virtualenv`,
which is a package independent of `venv`:
- It doesn't need to be installed via `pip`. In fact `pip install venv` would
return an error.
- The correct way to invoke `venv` is `python3 -m venv`, not `venv`, which would
return command not found.
See https://docs.python.org/3/library/venv.html
I suspect the documentation simply replaced all occurrences of `virtualenv` with
`venv`. However they are different modules and are used differently.
* Update comment [ci skip]
## Description
This PR adds the most relevant documentation of spaCy's Cython API.
(Todo for when we publish this: rewrite `/api/#section-cython` and `/api/#cython` to `/api/cython#conventions`.)
### Types of change
docs
## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
* Go back to using requests instead of urllib (closes#2320)
Fewer dependencies are good, but this one was simply causing too many other problems around SSL verification and Python 2/3 compatibility. requests is a popular enough package that it's okay for spaCy to depend on it – and this will hopefully make model downloads less flakey.
* Only download model if not installed (see #1456)
Use #egg=model==version to allow pip to check for existing installations. The download is only started if no installation matching the package/version is found. Fixes a long-standing inconvenience.
* Pass additional options to pip when installing model (resolves#1456)
Treat all additional arguments passed to the download command as pip options to allow user to customise the command. For example:
python -m spacy download en --user
* Add CLI option to enable installing model package dependencies
* Revert "Add CLI option to enable installing model package dependencies"
This reverts commit 9336ffe695.
* Update documentation
* Fix the code for FACILITIY entities
As far as I can tell, the default models all use "FAC" rather than "FACILITY"
* Added my Contributor Agreement
* Rename vishnumenon to vishnumenon.md
* Fix code sample for `set_extension`
The previous sample code for `set_extension` fails the assertion at the end, because `city_getter` it checked if the whole document text matches any of the city names. Now it checks if any of the city names is contained in the document text.
* Contributor agreement
* Integrate Python kernel via Binder
* Add live model test for languages with examples
* Update docs and code examples
* Adjust margin (if not bootstrapped)
* Add binder version to global config
* Update terminal and executable code mixins
* Pass attributes through infobox and section
* Hide v-cloak
* Fix example
* Take out model comparison for now
* Add meta text for compat
* Remove chart.js dependency
* Tidy up and simplify JS and port big components over to Vue
* Remove chartjs example
* Add Twitter icon
* Add purple stylesheet option
* Add utility for hand cursor (special cases only)
* Add transition classes
* Add small option for section
* Add thumb object for small round thumbnail images
* Allow unset code block language via "none" value
(workaround to still allow unset language to default to DEFAULT_SYNTAX)
* Pass through attributes
* Add syntax highlighting definitions for Julia, R and Docker
* Add website icon
* Remove user survey from navigation
* Don't hide GitHub icon on small screens
* Make top navigation scrollable on small screens
* Remove old resources page and references to it
* Add Universe
* Add helper functions for better page URL and title
* Update site description
* Increment versions
* Update preview images
* Update mentions of resources
* Fix image
* Fix social images
* Fix problem with cover sizing and floats
* Add divider and move badges into heading
* Add docstrings
* Reference converting section
* Add section on converting word vectors
* Move converting section to custom section and fix formatting
* Remove old fastText example
* Move extensions content to own section
Keep weird ID to not break permalinks for now (we don't want to rewrite URLs if not absolutely necessary)
* Use better component example and add factories section
* Add note on larger model
* Use better example for non-vector
* Remove similarity in context section
Only works via small models with tensors so has always been kind of confusing
* Add note on init-model command
* Fix lightning tour examples and make excutable if possible
* Add spacy train CLI section to train
* Fix formatting and add video
* Fix formatting
* Fix textcat example description (resolves#2246)
* Add dummy file to try resolve conflict
* Delete dummy file
* Tidy up [ci skip]
* Ensure sufficient height of loading container
* Add loading animation to universe
* Update Thebelab build and use better startup message
* Fix asset versioning
* Fix typo [ci skip]
* Add note on project idea label
aclweb.org is throwing a gateway timeout on the link as `https`+`aclweb.org`, but is fine with `https`+`www.aclweb.org` (also with `http`+`aclweb.org`, but let's keep it in `https`, shall we?
really small changes to English tags description, but might help some people while working on projects
1) -PRB- should be -RRB- instead
2) space gets tagged as _SP, and not SP
The function `dependency_labels_to_root(token)` defined in section *Get syntactic dependencies* does not terminate. Here is a complete example:
import spacy
nlp = spacy.load('en')
doc = nlp("Apple and banana are similar. Pasta and hippo aren't.")
def dependency_labels_to_root(token):
"""Walk up the syntactic tree, collecting the arc labels."""
dep_labels = []
while token.head is not token:
dep_labels.append(token.dep)
token = token.head
return dep_labels
dep_labels = dependency_labels_to_root(doc[1])
dep_labels
Replacing `is not` with `!=` solves the issue:
import spacy
nlp = spacy.load('en')
doc = nlp("Apple and banana are similar. Pasta and hippo aren't.")
def dependency_labels_to_root(token):
"""Walk up the syntactic tree, collecting the arc labels."""
dep_labels = []
while token.head != token:
dep_labels.append(token.dep)
token = token.head
return dep_labels
dep_labels = dependency_labels_to_root(doc[1])
dep_labels
The output is
['cc', 'nsubj']
Python 3 throws an error message on the original assert statement. Also, according to the Python documentation regarding the assert statement (https://docs.python.org/3/reference/simple_stmts.html#the-assert-statement), `assert` takes at least one argument and at most two. In the two-argument form the second argument is meant as an error message to be displayed when the assertion fails. I don't think this is intended in this case.
I'm using SpaCy version 2.0.3. If I don't use the *-operator in the example, Python throws an error message. With the operator it works fine. Also according to the documentation of the function `nlp.disable_pipes()`, it expects one or more strings as arguments and not one argument being a list of strings.
User can select two model and their meta is fetched from GitHub. Features, accuracy figures and speed benchmarks are displayed in a table, with an additional chart comparing the accuracy scores if available. Main use case: demonstrating and visualising trade-offs between larger and smaller models of the same type.
With speed benchmarks, charts ended up taking up too much space – and they were mostly data porn and not particularly useful anyways. Instead, we might add a "Compare" page that fetches all models and lets the user compare two or more models in terms of accuracy, speed etc.
Always add tooltip text as hidden label. Use different tooltip icons for tags and inline help icons. Add labels to old/new code blocks and add option to customise label text.