## Description
Related issues: #2379 (should be fixed by separating model tests)
* **total execution time down from > 300 seconds to under 60 seconds** 🎉
* removed all model-specific tests that could only really be run manually anyway – those will now live in a separate test suite in the [`spacy-models`](https://github.com/explosion/spacy-models) repository and are already integrated into our new model training infrastructure
* changed all relative imports to absolute imports to prepare for moving the test suite from `/spacy/tests` to `/tests` (it'll now always test against the installed version)
* merged old regression tests into collections, e.g. `test_issue1001-1500.py` (about 90% of the regression tests are very short anyway)
* tidied up and rewrote existing tests wherever possible
### Todo
- [ ] move tests to `/tests` and adjust CI commands accordingly
- [x] move model test suite from internal repo to `spacy-models`
- [x] ~~investigate why `pipeline/test_textcat.py` is flakey~~
- [x] review old regression tests (leftover files) and see if they can be merged, simplified or deleted
- [ ] update documentation on how to run tests
### Types of change
enhancement, tests
## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [ ] My changes don't require a change to the documentation, or if they do, I've added all required information.
Hi guys,
This is my first spaCy extension, and I am excited to be able to contribute it. Please let me know if you have any suggestions or if there are modifications I need to make. Feel free to use or contribute to the repo that I made.
## Description
ExcelCy is a spaCy toolkit that helps improve the experience of preparing training data. It provides easy annotation using the Excel file format, and it has a helper to pre-train entity annotations with a phrase and regex matcher pipe.
### Types of change
Update to the Universe list on the website.
## Checklist
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
* Fix venv command examples
The documentation refers to `venv`, which is native to Python 3.
However, the command examples are as if they were still `virtualenv`,
which is a package independent of `venv`:
- It doesn't need to be installed via `pip`. In fact `pip install venv` would
return an error.
- The correct way to invoke `venv` is `python3 -m venv`, not `venv`, which would
return command not found.
See https://docs.python.org/3/library/venv.html
I suspect the documentation simply replaced all occurrences of `virtualenv` with
`venv`. However they are different modules and are used differently.
* Update comment [ci skip]
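
The point of the `venv` fix above can also be checked from Python itself. This is a minimal sketch, assuming a standard CPython 3 installation; the directory name `demo-env` is only a placeholder, and `with_pip=False` is chosen here just to keep the example quick.

```python
import importlib.util
import tempfile
import venv
from pathlib import Path

# venv ships with the standard library, so there is nothing to `pip install`.
assert importlib.util.find_spec("venv") is not None

with tempfile.TemporaryDirectory() as tmp:
    env_dir = Path(tmp) / "demo-env"  # placeholder name
    # Roughly the programmatic counterpart of
    # `python3 -m venv --without-pip demo-env`.
    venv.create(env_dir, with_pip=False)
    # Every environment created by venv contains a pyvenv.cfg file.
    created = (env_dir / "pyvenv.cfg").exists()

print(created)
```

On the command line, the equivalent is `python3 -m venv <dir>`, as described in the commit message.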
## Description
This PR adds the most relevant documentation of spaCy's Cython API.
(Todo for when we publish this: rewrite `/api/#section-cython` and `/api/#cython` to `/api/cython#conventions`.)
### Types of change
docs
## Checklist
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
* Go back to using requests instead of urllib (closes #2320)
Fewer dependencies are good, but this one was simply causing too many other problems around SSL verification and Python 2/3 compatibility. requests is a popular enough package that it's okay for spaCy to depend on it – and this will hopefully make model downloads less flakey.
* Only download model if not installed (see #1456)
Use #egg=model==version to allow pip to check for existing installations. The download is only started if no installation matching the package/version is found. Fixes a long-standing inconvenience.
* Pass additional options to pip when installing model (resolves #1456)
Treat all additional arguments passed to the download command as pip options to allow the user to customise the command. For example:

    python -m spacy download en --user
* Add CLI option to enable installing model package dependencies
* Revert "Add CLI option to enable installing model package dependencies"
This reverts commit 9336ffe695.
* Update documentation
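
The download changes above (the `#egg=model==version` fragment and the pass-through of extra pip options) can be sketched as follows. This is only an illustration under assumptions: `BASE_URL`, the archive layout, the model name and version, and the helper `build_pip_command` are all hypothetical and not taken from spaCy's code, and how pip treats `#egg=` fragments may differ between pip versions.

```python
import sys

# Hypothetical base URL, used only for illustration.
BASE_URL = "https://example.com/models"


def build_pip_command(name, version, *pip_args):
    """Build a `pip install` command for a model archive (hypothetical helper)."""
    # The "#egg=name==version" fragment names the package and version behind
    # the URL, which the commit relies on so pip can check for an existing
    # installation before downloading.
    url = "{base}/{n}-{v}/{n}-{v}.tar.gz#egg={n}=={v}".format(
        base=BASE_URL, n=name, v=version
    )
    # Any additional arguments are forwarded to pip unchanged.
    return [sys.executable, "-m", "pip", "install", *pip_args, url]


cmd = build_pip_command("en_core_web_sm", "2.0.0", "--user")
print(" ".join(cmd[1:]))
```

Forwarding the extra arguments verbatim keeps the download command thin: options such as `--user` do not need to be re-declared by the CLI.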
* Fix the code for FACILITY entities
As far as I can tell, the default models all use "FAC" rather than "FACILITY"
* Added my Contributor Agreement
* Rename vishnumenon to vishnumenon.md
* Fix code sample for `set_extension`
The previous sample code for `set_extension` fails the assertion at the end, because `city_getter` checked whether the whole document text matched one of the city names. Now it checks whether any of the city names is contained in the document text.
* Contributor agreement
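
The difference behind the `set_extension` fix can be shown with plain strings. This is a sketch only: the city list and both function names are made up for illustration, and a string stands in for the document text.

```python
CITIES = ("New York", "Paris", "Berlin")  # illustrative list


def mentions_city_exact(text):
    # Behaviour as described for the old sample: the whole text must be
    # identical to one of the city names.
    return text in CITIES


def mentions_city(text):
    # Behaviour after the fix: a city name may appear anywhere in the text.
    return any(city in text for city in CITIES)


text = "I like New York"
print(mentions_city_exact(text), mentions_city(text))
```

With a tuple, `text in CITIES` is a membership test on whole items, which is why the first version only succeeds when the text is exactly a city name.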
* Integrate Python kernel via Binder
* Add live model test for languages with examples
* Update docs and code examples
* Adjust margin (if not bootstrapped)
* Add binder version to global config
* Update terminal and executable code mixins
* Pass attributes through infobox and section
* Hide v-cloak
* Fix example
* Take out model comparison for now
* Add meta text for compat
* Remove chart.js dependency
* Tidy up and simplify JS and port big components over to Vue
* Remove chartjs example
* Add Twitter icon
* Add purple stylesheet option
* Add utility for hand cursor (special cases only)
* Add transition classes
* Add small option for section
* Add thumb object for small round thumbnail images
* Allow unset code block language via "none" value
(workaround to still allow unset language to default to DEFAULT_SYNTAX)
* Pass through attributes
* Add syntax highlighting definitions for Julia, R and Docker
* Add website icon
* Remove user survey from navigation
* Don't hide GitHub icon on small screens
* Make top navigation scrollable on small screens
* Remove old resources page and references to it
* Add Universe
* Add helper functions for better page URL and title
* Update site description
* Increment versions
* Update preview images
* Update mentions of resources
* Fix image
* Fix social images
* Fix problem with cover sizing and floats
* Add divider and move badges into heading
* Add docstrings
* Reference converting section
* Add section on converting word vectors
* Move converting section to custom section and fix formatting
* Remove old fastText example
* Move extensions content to own section
Keep weird ID to not break permalinks for now (we don't want to rewrite URLs if not absolutely necessary)
* Use better component example and add factories section
* Add note on larger model
* Use better example for non-vector
* Remove similarity in context section
Only works via small models with tensors so has always been kind of confusing
* Add note on init-model command
* Fix lightning tour examples and make executable if possible
* Add spacy train CLI section to train
* Fix formatting and add video
* Fix formatting
* Fix textcat example description (resolves #2246)
* Add dummy file to try resolve conflict
* Delete dummy file
* Tidy up [ci skip]
* Ensure sufficient height of loading container
* Add loading animation to universe
* Update Thebelab build and use better startup message
* Fix asset versioning
* Fix typo [ci skip]
* Add note on project idea label
aclweb.org is throwing a gateway timeout on the link as `https`+`aclweb.org`, but is fine with `https`+`www.aclweb.org` (also with `http`+`aclweb.org`, but let's keep it in `https`, shall we?).
Really small changes to the English tag descriptions, but they might help some people while working on projects:
1) `-PRB-` should be `-RRB-` instead
2) space gets tagged as `_SP`, and not `SP`
The function `dependency_labels_to_root(token)` defined in section *Get syntactic dependencies* does not terminate. Here is a complete example:

    import spacy

    nlp = spacy.load('en')
    doc = nlp("Apple and banana are similar. Pasta and hippo aren't.")

    def dependency_labels_to_root(token):
        """Walk up the syntactic tree, collecting the arc labels."""
        dep_labels = []
        while token.head is not token:
            dep_labels.append(token.dep)
            token = token.head
        return dep_labels

    dep_labels = dependency_labels_to_root(doc[1])
    dep_labels

Replacing `is not` with `!=` solves the issue:

    import spacy

    nlp = spacy.load('en')
    doc = nlp("Apple and banana are similar. Pasta and hippo aren't.")

    def dependency_labels_to_root(token):
        """Walk up the syntactic tree, collecting the arc labels."""
        dep_labels = []
        while token.head != token:
            dep_labels.append(token.dep)
            token = token.head
        return dep_labels

    dep_labels = dependency_labels_to_root(doc[1])
    dep_labels

The output is:

    ['cc', 'nsubj']

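
Why `is not` can loop forever while `!=` terminates is easiest to see without spaCy. The sketch below rests on an assumption that I have not verified against the spaCy source: that accessing `head` hands back a new Python object each time, so an identity check never matches even at the root. `Node` and `steps_to_root` are hypothetical stand-ins, not spaCy classes.

```python
class Node:
    """Hypothetical stand-in for a token: a position plus a table of heads."""

    def __init__(self, heads, i):
        self.heads = heads
        self.i = i

    @property
    def head(self):
        # Assumption: every access builds a fresh wrapper object.
        return Node(self.heads, self.heads[self.i])

    def __eq__(self, other):
        # Two wrappers are equal when they point at the same position.
        return isinstance(other, Node) and self.i == other.i

    def __hash__(self):
        return hash(self.i)


def steps_to_root(node, limit=10):
    """Count arcs up to the root, using equality rather than identity."""
    steps = 0
    while node.head != node and steps < limit:
        node = node.head
        steps += 1
    return steps


# Position 2 is the root because it is its own head.
heads = [2, 0, 2]
root = Node(heads, 2)

print(root.head is not root)  # True: a different object on every access
print(root.head != root)      # False: same position, so the loop can stop
print(steps_to_root(Node(heads, 1)))
```

The `limit` argument is only a safety net for the sketch; with an equality check the loop ends at the root on its own.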
Python 3 throws an error message on the original assert statement. Also, according to the Python documentation regarding the assert statement (https://docs.python.org/3/reference/simple_stmts.html#the-assert-statement), `assert` takes at least one argument and at most two. In the two-argument form the second argument is meant as an error message to be displayed when the assertion fails. I don't think this is intended in this case.
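
For reference, the two-argument form of `assert` behaves as sketched below. The function name and message are made up for illustration and are unrelated to the example being fixed.

```python
def check_positive(value):
    # Two-argument form: the expression after the comma is only evaluated
    # when the condition fails, and becomes the AssertionError's message.
    assert value > 0, "value must be positive, got {}".format(value)
    return value


try:
    check_positive(-1)
except AssertionError as err:
    message = str(err)

print(message)
```

Anything placed after the comma is therefore treated as an error message rather than as a second condition to test.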
I'm using spaCy version 2.0.3. If I don't use the `*` operator in the example, Python throws an error. With the operator it works fine. Also, according to the documentation of `nlp.disable_pipes()`, the function expects one or more strings as separate arguments, not a single argument that is a list of strings.
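
The effect of the `*` operator can be illustrated with a plain function that takes a variable number of string arguments. `disable` below is a hypothetical stand-in, not spaCy's implementation, and the `TypeError` it raises is my own choice for the sketch; the error spaCy actually reports may differ.

```python
def disable(*names):
    """Hypothetical stand-in that accepts one or more names as separate arguments."""
    for name in names:
        if not isinstance(name, str):
            raise TypeError("expected a string, got {}".format(type(name).__name__))
    return list(names)


pipes = ["tagger", "parser"]

# Unpacking is equivalent to calling disable("tagger", "parser").
unpacked = disable(*pipes)

try:
    disable(pipes)  # passes the list itself as one single argument
    list_accepted = True
except TypeError:
    list_accepted = False

print(unpacked, list_accepted)
```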
Users can select two models, and their meta is fetched from GitHub. Features, accuracy figures and speed benchmarks are displayed in a table, with an additional chart comparing the accuracy scores if available. Main use case: demonstrating and visualising trade-offs between larger and smaller models of the same type.
With speed benchmarks, charts ended up taking up too much space – and they were mostly data porn and not particularly useful anyway. Instead, we might add a "Compare" page that fetches all models and lets the user compare two or more models in terms of accuracy, speed etc.