From 50e3b624b4f5491a08ef95eacff55ca238890934 Mon Sep 17 00:00:00 2001 From: Jacob Bortell Date: Tue, 24 Nov 2020 10:16:06 -0500 Subject: [PATCH] Update rule-based-matching.md (#6421) * Update rule-based-matching.md Clarified case-sensititivy of dictionary-referencing attributes (POS/TAG/DEP/etc). Clarified "Type" column header to "Value Type" * Update rule-based-matching.md Improved clarity of wording --- website/docs/usage/rule-based-matching.md | 28 +++++++++++------------ 1 file changed, 14 insertions(+), 14 deletions(-) diff --git a/website/docs/usage/rule-based-matching.md b/website/docs/usage/rule-based-matching.md index 7749dab59..7ad402df1 100644 --- a/website/docs/usage/rule-based-matching.md +++ b/website/docs/usage/rule-based-matching.md @@ -157,20 +157,20 @@ The available token pattern keys correspond to a number of [`Token` attributes](/api/token#attributes). The supported attributes for rule-based matching are: -| Attribute | Type |  Description | -| -------------------------------------- | ------- | ------------------------------------------------------------------------------------------------------ | -| `ORTH` | unicode | The exact verbatim text of a token. | -| `TEXT` 2.1 | unicode | The exact verbatim text of a token. | -| `LOWER` | unicode | The lowercase form of the token text. | -|  `LENGTH` | int | The length of the token text. | -|  `IS_ALPHA`, `IS_ASCII`, `IS_DIGIT` | bool | Token text consists of alphabetic characters, ASCII characters, digits. | -|  `IS_LOWER`, `IS_UPPER`, `IS_TITLE` | bool | Token text is in lowercase, uppercase, titlecase. | -|  `IS_PUNCT`, `IS_SPACE`, `IS_STOP` | bool | Token is punctuation, whitespace, stop word. | -|  `IS_SENT_START` | bool | Token is start of sentence. | -|  `LIKE_NUM`, `LIKE_URL`, `LIKE_EMAIL` | bool | Token text resembles a number, URL, email. | -|  `POS`, `TAG`, `DEP`, `LEMMA`, `SHAPE` | unicode | The token's simple and extended part-of-speech tag, dependency label, lemma, shape. | -| `ENT_TYPE` | unicode | The token's entity label. | -| `_` 2.1 | dict | Properties in [custom extension attributes](/usage/processing-pipelines#custom-components-attributes). | +| Attribute | Value Type |  Description | +| -------------------------------------- | ------------- | ------------------------------------------------------------------------------------------------------ | +| `ORTH` | unicode | The exact verbatim text of a token. | +| `TEXT` 2.1 | unicode | The exact verbatim text of a token. | +| `LOWER` | unicode | The lowercase form of the token text. | +| `LENGTH` | int | The length of the token text. | +| `IS_ALPHA`, `IS_ASCII`, `IS_DIGIT` | bool | Token text consists of alphabetic characters, ASCII characters, digits. | +| `IS_LOWER`, `IS_UPPER`, `IS_TITLE` | bool | Token text is in lowercase, uppercase, titlecase. | +| `IS_PUNCT`, `IS_SPACE`, `IS_STOP` | bool | Token is punctuation, whitespace, stop word. | +| `IS_SENT_START` | bool | Token is start of sentence. | +| `LIKE_NUM`, `LIKE_URL`, `LIKE_EMAIL` | bool | Token text resembles a number, URL, email. | +| `POS`, `TAG`, `DEP`, `LEMMA`, `SHAPE` | unicode | The token's simple and extended part-of-speech tag, dependency label, lemma, shape. Note that the values of these attributes are case-sensitive. For a list of available part-of-speech tags and dependency labels, see the [Annotation Specifications](/api/annotation).| +| `ENT_TYPE` | unicode | The token's entity label. | +| `_` 2.1 | dict | Properties in [custom extension attributes](/usage/processing-pipelines#custom-components-attributes). |