Mirror of https://github.com/explosion/spaCy.git (synced 2025-01-12 10:16:27 +03:00)
Correct alignment example and documentation (#11491)
* Correct example and documentation
* Added altered example.md
* Changes based on review + apply prettier
* Remote unnecessary 'the'

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
parent 6be6913ba5
commit 3f0c3ad7d3
@@ -286,10 +286,14 @@ Calculate alignment tables between two tokenizations.
 
 ### Alignment attributes {#alignment-attributes"}
 
+Alignment attributes are managed using `AlignmentArray`, which is a
+simplified version of Thinc's [Ragged](https://thinc.ai/docs/api-types#ragged)
+type that only supports the `data` and `length` attributes.
+
 | Name | Description |
-| ----- | --------------------------------------------------------------------- |
-| `x2y` | The `Ragged` object holding the alignment from `x` to `y`. ~~Ragged~~ |
-| `y2x` | The `Ragged` object holding the alignment from `y` to `x`. ~~Ragged~~ |
+| ----- | ------------------------------------------------------------------------------------- |
+| `x2y` | The `AlignmentArray` object holding the alignment from `x` to `y`. ~~AlignmentArray~~ |
+| `y2x` | The `AlignmentArray` object holding the alignment from `y` to `x`. ~~AlignmentArray~~ |
 
 <Infobox title="Important note" variant="warning">
 
@@ -309,10 +313,10 @@ tokenizations add up to the same string. For example, you'll be able to align
 > spacy_tokens = ["obama", "'s", "podcast"]
 > alignment = Alignment.from_strings(bert_tokens, spacy_tokens)
 > a2b = alignment.x2y
-> assert list(a2b.dataXd) == [0, 1, 1, 2]
+> assert list(a2b.data) == [0, 1, 1, 2]
 > ```
 >
-> If `a2b.dataXd[1] == a2b.dataXd[2] == 1`, that means that `A[1]` (`"'"`) and
+> If `a2b.data[1] == a2b.data[2] == 1`, that means that `A[1]` (`"'"`) and
 > `A[2]` (`"s"`) both align to `B[1]` (`"'s"`).
 
 ### Alignment.from_strings {#classmethod tag="function"}
 
@@ -1422,9 +1422,9 @@ other_tokens = ["i", "listened", "to", "obama", "'", "s", "podcasts", "."]
 spacy_tokens = ["i", "listened", "to", "obama", "'s", "podcasts", "."]
 align = Alignment.from_strings(other_tokens, spacy_tokens)
 print(f"a -> b, lengths: {align.x2y.lengths}") # array([1, 1, 1, 1, 1, 1, 1, 1])
-print(f"a -> b, mapping: {align.x2y.dataXd}") # array([0, 1, 2, 3, 4, 4, 5, 6]) : two tokens both refer to "'s"
+print(f"a -> b, mapping: {align.x2y.data}") # array([0, 1, 2, 3, 4, 4, 5, 6]) : two tokens both refer to "'s"
 print(f"b -> a, lengths: {align.y2x.lengths}") # array([1, 1, 1, 1, 2, 1, 1]) : the token "'s" refers to two tokens
-print(f"b -> a, mappings: {align.y2x.dataXd}") # array([0, 1, 2, 3, 4, 5, 6, 7])
+print(f"b -> a, mappings: {align.y2x.data}") # array([0, 1, 2, 3, 4, 5, 6, 7])
 ```
 
 Here are some insights from the alignment information generated in the example
@@ -1433,10 +1433,10 @@ above:
 - The one-to-one mappings for the first four tokens are identical, which means
   they map to each other. This makes sense because they're also identical in the
   input: `"i"`, `"listened"`, `"to"` and `"obama"`.
-- The value of `x2y.dataXd[6]` is `5`, which means that `other_tokens[6]`
+- The value of `x2y.data[6]` is `5`, which means that `other_tokens[6]`
   (`"podcasts"`) aligns to `spacy_tokens[5]` (also `"podcasts"`).
-- `x2y.dataXd[4]` and `x2y.dataXd[5]` are both `4`, which means that both tokens
-  4 and 5 of `other_tokens` (`"'"` and `"s"`) align to token 4 of `spacy_tokens`
+- `x2y.data[4]` and `x2y.data[5]` are both `4`, which means that both tokens 4
+  and 5 of `other_tokens` (`"'"` and `"s"`) align to token 4 of `spacy_tokens`
   (`"'s"`).
 
 <Infobox title="Important note" variant="warning">
 
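Note on reading the `lengths` and `data` values printed in the usage example of this diff: the following is a minimal pure-Python sketch of how a flat `data` array and its `lengths` array can be unpacked into one list of aligned indices per token. It does not call spaCy. The helper name `ragged_rows` is hypothetical, and the arrays are copied from the comments in the example rather than computed, so treat it as an illustration of the layout and not as the library's implementation.

```python
from itertools import accumulate


def ragged_rows(data, lengths):
    """Split the flat `data` list into one row per token, using `lengths`.

    Hypothetical helper for illustration only; not part of spaCy.
    """
    ends = list(accumulate(lengths))  # exclusive end offset of each row
    starts = [0] + ends[:-1]          # start offset of each row
    return [list(data[s:e]) for s, e in zip(starts, ends)]


# Values copied from the comments in the usage example (assumed, not computed here).
x2y_data = [0, 1, 2, 3, 4, 4, 5, 6]
x2y_lengths = [1, 1, 1, 1, 1, 1, 1, 1]
y2x_data = [0, 1, 2, 3, 4, 5, 6, 7]
y2x_lengths = [1, 1, 1, 1, 2, 1, 1]

x2y_rows = ragged_rows(x2y_data, x2y_lengths)
y2x_rows = ragged_rows(y2x_data, y2x_lengths)

# other_tokens[4] ("'") and other_tokens[5] ("s") both map to spacy_tokens[4].
print(x2y_rows[4], x2y_rows[5])
# spacy_tokens[4] ("'s") maps back to two tokens in other_tokens.
print(y2x_rows[4])
```

Each entry of `lengths` says how many consecutive entries of `data` belong to that token, which is why `y2x.lengths` contains a `2` at the position of `"'s"`.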