* Fix surprises when asking for the root of a git repo
For the first asset I wanted to fetch from git, the data I wanted was
the entire repository. I tried leaving `path` blank, which gave an
unhelpful error, and then I tried `path: "/"`, which started copying my
entire filesystem into the project. The path I should have used was
`""`.
I've made two changes to make this smoother for others:
- The 'path' within a git clone defaults to ""
- If the path points outside of the tmpdir that the git clone goes
into, we fail with an error
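
The second check can be sketched roughly as follows. This is a minimal illustration, not the code from this change: the helper name `resolve_inside` and its arguments are hypothetical, and it assumes the clone lives in a temporary directory that the asset path is joined onto.

```python
from pathlib import Path


def resolve_inside(base_dir, relative_path):
    """Join relative_path onto base_dir and refuse results outside base_dir.

    Hypothetical helper for illustration only.
    """
    base = Path(base_dir).resolve()
    # Joining an absolute path such as "/" discards the base entirely,
    # which is how `path: "/"` ends up pointing at the filesystem root.
    target = (base / relative_path).resolve()
    if target != base and base not in target.parents:
        raise ValueError(
            f"Path {relative_path!r} points outside of the clone directory"
        )
    return target
```

With this sketch, `""` resolves to the clone directory itself, while `"/"` and `"../elsewhere"` raise a `ValueError` instead of copying unrelated files.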
Signed-off-by: Elia Robyn Speer <elia@explosion.ai>
* use a descriptive error instead of a default
plus some minor fixes from PR review
Signed-off-by: Elia Robyn Speer <elia@explosion.ai>
* check for None values in assets
Signed-off-by: Elia Robyn Speer <elia@explosion.ai>
Co-authored-by: Elia Robyn Speer <elia@explosion.ai>
* Add textcat docs
* Add NER docs
* Add Entity Linker docs
* Add assigned fields docs for the tagger
This also adds a preamble, since there wasn't one.
* Add morphologizer docs
* Add dependency parser docs
* Update entityrecognizer docs
This is a little weird because `Doc.ents` is the only thing assigned to,
but it's actually a bidirectional property.
* Add token fields for entityrecognizer
* Fix section name
* Add entity ruler docs
* Add lemmatizer docs
* Add sentencizer/sentence recognizer docs
* Update website/docs/api/entityrecognizer.md
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Update website/docs/api/entityruler.md
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Update website/docs/api/tagger.md
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Update website/docs/api/entityruler.md
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Update type for Doc.ents
This was `Tuple[Span, ...]` everywhere but `Tuple[Span]` seems to be
correct.
* Run prettier
* Apply suggestions from code review
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Run prettier
* Add transformers section
This basically just moves and renames the "custom attributes" section
from the bottom of the page to be consistent with "assigned attributes"
on other pages.
I looked at moving the paragraph just above the section into the
section, but it includes the unrelated registry additions, so it seemed
better to leave it unchanged.
* Make table header consistent
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* test for error after Doc has been garbage collected
* warn about using a SpanGroup when the Doc has been garbage collected
* add warning to the docs
* rephrase slightly
* raise error instead of warning
* update
* move warning to doc property
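
The behaviour described in the commits above can be sketched like this. It is a simplified stand-in, assuming the group keeps only a weak reference to its document; the classes `DocSketch` and `SpanGroupSketch` are hypothetical and are not spaCy's actual classes.

```python
import weakref


class DocSketch:
    """Hypothetical stand-in for a document object."""

    def __init__(self, words):
        self.words = list(words)


class SpanGroupSketch:
    """Hypothetical group that holds only a weak reference to its doc."""

    def __init__(self, doc, name):
        self._doc_ref = weakref.ref(doc)
        self.name = name

    @property
    def doc(self):
        # The weak reference returns None once the doc has been collected.
        doc = self._doc_ref()
        if doc is None:
            raise RuntimeError(
                "The Doc for this group has been garbage collected; "
                "keep a reference to the Doc while using the group."
            )
        return doc
```

Raising from the property means every access path that needs the document hits the same check, rather than each method testing for a dead reference on its own.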
* Add training data section
Not entirely sure this is in the right location on the page - maybe it
should be after quickstart?
* Add pointer from binary format to training data section
* Minor cleanup
* Add to ToC, fix filename
* Update website/docs/usage/training.md
Co-authored-by: Ines Montani <ines@ines.io>
* Update website/docs/usage/training.md
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Update website/docs/usage/training.md
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Move the training data section further down the page
* Update website/docs/usage/training.md
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Update website/docs/usage/training.md
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Run prettier
Co-authored-by: Ines Montani <ines@ines.io>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Allow passing in array vars for speedup
This fixes #8845. Not sure about the docstring changes here...
* Update docs
Types maybe need more detail? Maybe not?
* Run prettier on docs
* Update spacy/tokens/span.pyx
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Add scores to output in spancat
This exposes the scores as an attribute on the SpanGroup. Includes a
basic test.
* Add basic doc note
* Vectorize score calcs
* Add "annotation format" section
* Update website/docs/api/spancategorizer.md
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Clean up doc section
* Ran prettier on docs
* Get arrays off the gpu before iterating over them
* Remove int() calls
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Support list values and IS_INTERSECT in Matcher
* Support list values as token attributes for set operators, not just as
pattern values.
* Add `IS_INTERSECT` operator.
* Fix incorrect `ISSUBSET` and `ISSUPERSET` in schema and docs.
* Rename IS_INTERSECT to INTERSECTS
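
As a rough illustration of the set semantics involved, the sketch below compares a token value (a single value or a list) against a list of pattern values. It is plain Python and not the Matcher implementation; the function name `set_predicate` is hypothetical, and the operator spellings `IS_SUBSET`, `IS_SUPERSET` and `INTERSECTS` are assumed to be the corrected names referred to above.

```python
def _as_set(value):
    """Treat a list-like token value as a set, and a single value as a 1-set."""
    if isinstance(value, (list, tuple, set, frozenset)):
        return set(value)
    return {value}


def set_predicate(operator, token_value, pattern_values):
    """Evaluate a set operator between a token value and pattern values.

    Hypothetical helper for illustration only.
    """
    token_set = _as_set(token_value)
    pattern_set = set(pattern_values)
    if operator == "IS_SUBSET":
        return token_set <= pattern_set
    if operator == "IS_SUPERSET":
        return token_set >= pattern_set
    if operator == "INTERSECTS":
        return bool(token_set & pattern_set)
    raise ValueError(f"Unsupported operator: {operator!r}")
```

For example, a token value of `["a", "b"]` intersects the pattern values `["b", "c"]`, but is neither a subset nor a superset of them.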