spaCy/spacy/tokens/span.pyi
Adriane Boyd a322d6d5f2
Add SpanRuler component (#9880)
* Add SpanRuler component

Add a `SpanRuler` component similar to `EntityRuler` that saves a list
of matched spans to `Doc.spans[spans_key]`. The matches from the token
and phrase matchers are deduplicated and sorted before assignment but
are not otherwise filtered.

* Update spacy/pipeline/span_ruler.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Fix cast

* Add self.key property

* Use number of patterns as length

* Remove patterns kwarg from init

* Update spacy/tests/pipeline/test_span_ruler.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Add options for spans filter and setting to ents

* Add `spans_filter` option as a registered function'
* Make `spans_key` optional and if `None`, set to `doc.ents` instead of
`doc.spans[spans_key]`.

* Update and generalize tests

* Add test for setting doc.ents, fix key property type

* Fix typing

* Allow independent doc.spans and doc.ents

* If `spans_key` is set, set `doc.spans` with `spans_filter`.
* If `annotate_ents` is set, set `doc.ents` with `ents_fitler`.
  * Use `util.filter_spans` by default as `ents_filter`.
  * Use a custom warning if the filter does not work for `doc.ents`.

* Enable use of SpanC.id in Span

* Support id in SpanRuler as Span.id

* Update types

* `id` can only be provided as string (already by `PatternType`
definition)

* Update all uses of Span.id/ent_id in Doc

* Rename Span id kwarg to span_id

* Update types and docs

* Add ents filter to mimic EntityRuler overwrite_ents

* Refactor `ents_filter` to take `entities, spans` args for more
  filtering options
* Give registered filters more descriptive names
* Allow registered `filter_spans` filter
  (`spacy.first_longest_spans_filter.v1`) to take any number of
  `Iterable[Span]` objects as args so it can be used for spans filter
  or ents filter

* Implement future entity ruler as span ruler

Implement a compatible `entity_ruler` as `future_entity_ruler` using
`SpanRuler` as the underlying component:
* Add `sort_key` and `sort_reverse` to allow the sorting behavior to be
  customized. (Necessary for the same sorting/filtering as in
  `EntityRuler`.)
* Implement `overwrite_overlapping_ents_filter` and
  `preserve_existing_ents_filter` to support
  `EntityRuler.overwrite_ents` settings.
* Add `remove_by_id` to support `EntityRuler.remove` functionality.
* Refactor `entity_ruler` tests to parametrize all tests to test both
  `entity_ruler` and `future_entity_ruler`
* Implement `SpanRuler.token_patterns` and `SpanRuler.phrase_patterns`
  properties.

Additional changes:

* Move all config settings to top-level attributes to avoid duplicating
  settings in the config vs. `span_ruler/cfg`. (Also avoids a lot of
  casting.)

* Format

* Fix filter make method name

* Refactor to use same error for removing by label or ID

* Also provide existing spans to spans filter

* Support ids property

* Remove token_patterns and phrase_patterns

* Update docstrings

* Add span ruler docs

* Fix types

* Apply suggestions from code review

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Move sorting into filters

* Check for all tokens in seen tokens in entity ruler filters

* Remove registered sort key

* Set Token.ent_id in a backwards-compatible way in Doc.set_ents

* Remove sort options from API docs

* Update docstrings

* Rename entity ruler filters

* Fix and parameterize scoring

* Add id to Span API docs

* Fix typo in API docs

* Include explicit labeled=True for scorer

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2022-06-02 13:12:53 +02:00

128 lines
3.5 KiB
Python

from typing import Callable, Protocol, Iterator, Optional, Union, Tuple, Any, overload
from thinc.types import Floats1d, Ints2d, FloatsXd
from .doc import Doc
from .token import Token
from .underscore import Underscore
from ..lexeme import Lexeme
from ..vocab import Vocab
class SpanMethod(Protocol):
def __call__(self: Span, *args: Any, **kwargs: Any) -> Any: ... # type: ignore[misc]
class Span:
@classmethod
def set_extension(
cls,
name: str,
default: Optional[Any] = ...,
getter: Optional[Callable[[Span], Any]] = ...,
setter: Optional[Callable[[Span, Any], None]] = ...,
method: Optional[SpanMethod] = ...,
force: bool = ...,
) -> None: ...
@classmethod
def get_extension(
cls, name: str
) -> Tuple[
Optional[Any],
Optional[SpanMethod],
Optional[Callable[[Span], Any]],
Optional[Callable[[Span, Any], None]],
]: ...
@classmethod
def has_extension(cls, name: str) -> bool: ...
@classmethod
def remove_extension(
cls, name: str
) -> Tuple[
Optional[Any],
Optional[SpanMethod],
Optional[Callable[[Span], Any]],
Optional[Callable[[Span, Any], None]],
]: ...
def __init__(
self,
doc: Doc,
start: int,
end: int,
label: Union[str, int] = ...,
vector: Optional[Floats1d] = ...,
vector_norm: Optional[float] = ...,
kb_id: Union[str, int] = ...,
span_id: Union[str, int] = ...,
) -> None: ...
def __richcmp__(self, other: Span, op: int) -> bool: ...
def __hash__(self) -> int: ...
def __len__(self) -> int: ...
def __repr__(self) -> str: ...
@overload
def __getitem__(self, i: int) -> Token: ...
@overload
def __getitem__(self, i: slice) -> Span: ...
def __iter__(self) -> Iterator[Token]: ...
@property
def _(self) -> Underscore: ...
def as_doc(self, *, copy_user_data: bool = ...) -> Doc: ...
def get_lca_matrix(self) -> Ints2d: ...
def similarity(self, other: Union[Doc, Span, Token, Lexeme]) -> float: ...
@property
def doc(self) -> Doc: ...
@property
def vocab(self) -> Vocab: ...
@property
def sent(self) -> Span: ...
@property
def ents(self) -> Tuple[Span]: ...
@property
def has_vector(self) -> bool: ...
@property
def vector(self) -> Floats1d: ...
@property
def vector_norm(self) -> float: ...
@property
def tensor(self) -> FloatsXd: ...
@property
def sentiment(self) -> float: ...
@property
def text(self) -> str: ...
@property
def text_with_ws(self) -> str: ...
@property
def noun_chunks(self) -> Iterator[Span]: ...
@property
def root(self) -> Token: ...
def char_span(
self,
start_idx: int,
end_idx: int,
label: int = ...,
kb_id: int = ...,
vector: Optional[Floats1d] = ...,
) -> Span: ...
@property
def conjuncts(self) -> Tuple[Token]: ...
@property
def lefts(self) -> Iterator[Token]: ...
@property
def rights(self) -> Iterator[Token]: ...
@property
def n_lefts(self) -> int: ...
@property
def n_rights(self) -> int: ...
@property
def subtree(self) -> Iterator[Token]: ...
start: int
end: int
start_char: int
end_char: int
label: int
kb_id: int
ent_id: int
ent_id_: str
@property
def orth_(self) -> str: ...
@property
def lemma_(self) -> str: ...
label_: str
kb_id_: str