
.. automodule:: spacy.en

Tokenizer API

.. automethod:: spacy.en.EN.tokenize

.. automethod:: spacy.en.EN.lookup

Lexeme Features Flag IDs

A number of boolean features are computed for each English lexeme. To access a
feature, pass its flag ID to the :py:meth:`spacy.word.Lexeme.check_flag` method.
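
As a rough mental model, a flag ID can be thought of as a bit position in an
integer stored on each lexeme. The sketch below is not spaCy's implementation:
``ToyLexeme`` and the numeric ID values are hypothetical, and only the
``check_flag`` name is taken from the text above.

.. code-block:: python

    # Hypothetical flag IDs: each one names a bit position.
    IS_ALPHA = 0
    IS_DIGIT = 1
    IS_UPPER = 2


    class ToyLexeme:
        """Illustrative stand-in for a lexeme; not spacy.word.Lexeme."""

        def __init__(self, string):
            self.string = string
            self.flags = 0
            # Compute each boolean feature once and pack it into one integer.
            if string.isalpha():
                self.flags |= 1 << IS_ALPHA
            if string.isdigit():
                self.flags |= 1 << IS_DIGIT
            if string.isupper():
                self.flags |= 1 << IS_UPPER

        def check_flag(self, flag_id):
            """Return True if the bit for flag_id is set."""
            return bool(self.flags & (1 << flag_id))


    word = ToyLexeme("NASA")
    number = ToyLexeme("42")
    print(word.check_flag(IS_UPPER), number.check_flag(IS_DIGIT))

Packing the features this way keeps a lookup down to a single bitwise test,
which is one plausible reason to expose features through integer IDs.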

Orthographic Features

These features describe the *orthographic* (lettering) type of the word. The
function used to compute the value is listed along with the flag.

.. data:: IS_ALPHA


.. data:: IS_DIGIT


.. data:: IS_PUNCT


.. data:: IS_SPACE


.. data:: IS_ASCII


.. data:: IS_TITLE


.. data:: IS_LOWER


.. data:: IS_UPPER
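
The exact functions behind these flags are not reproduced here. As an
approximation only, the sketch below maps each flag name to a Python string
predicate. The ``ORTH_CHECKS`` table and the ``is_punct``, ``is_ascii`` and
``orth_features`` helpers are hypothetical, and the library's own definitions
may differ, particularly for punctuation.

.. code-block:: python

    import unicodedata


    def is_punct(string):
        # True if every character is in a Unicode punctuation category (P*).
        return bool(string) and all(
            unicodedata.category(char).startswith("P") for char in string
        )


    def is_ascii(string):
        # True if every character is in the 7-bit ASCII range.
        return all(ord(char) < 128 for char in string)


    # Hypothetical mapping from flag name to an approximating predicate.
    ORTH_CHECKS = {
        "IS_ALPHA": str.isalpha,
        "IS_DIGIT": str.isdigit,
        "IS_PUNCT": is_punct,
        "IS_SPACE": str.isspace,
        "IS_ASCII": is_ascii,
        "IS_TITLE": str.istitle,
        "IS_LOWER": str.islower,
        "IS_UPPER": str.isupper,
    }


    def orth_features(string):
        """Return the names of all flags whose predicate holds for string."""
        return {name for name, check in ORTH_CHECKS.items() if check(string)}


    print(sorted(orth_features("Hello")))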


Distributional Orthographic Features

These features describe how often the lower-cased form of the word appears
in various case styles in a large sample of English text. See :py:func:`spacy.orth.oft_case`.

.. data:: OFT_UPPER
.. data:: OFT_LOWER
.. data:: OFT_TITLE
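
One way such features could be derived is to count, for each lower-cased form,
how often it occurs in each case style, and to set a flag when the share
exceeds a cut-off. The sketch below illustrates that idea and is not the
``spacy.orth.oft_case`` implementation; the helper names and the ``threshold``
value of 0.5 are assumptions.

.. code-block:: python

    from collections import Counter, defaultdict


    def case_style(token):
        """Classify a token as 'upper', 'title', 'lower' or 'other'."""
        if token.isupper():
            return "upper"
        if token.istitle():
            return "title"
        if token.islower():
            return "lower"
        return "other"


    def oft_case_flags(tokens, threshold=0.5):
        """Map each lower-cased form to its OFT_* flags (hypothetical cut-off)."""
        counts = defaultdict(Counter)
        for token in tokens:
            counts[token.lower()][case_style(token)] += 1

        flags = {}
        for lower, styles in counts.items():
            total = sum(styles.values())
            flags[lower] = {
                "OFT_UPPER": styles["upper"] / total > threshold,
                "OFT_LOWER": styles["lower"] / total > threshold,
                "OFT_TITLE": styles["title"] / total > threshold,
            }
        return flags


    sample = ["Paris", "Paris", "paris", "the", "The", "the", "the", "NASA"]
    flags = oft_case_flags(sample)
    print(flags["paris"])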

Tag Dictionary Features

These features describe whether the word commonly occurs with a given
part-of-speech, in a large text corpus, using a part-of-speech tagger designed
to reduce the tag-dictionary bias of its training corpus. See

.. data:: CAN_PUNCT
.. data:: CAN_CONJ
.. data:: CAN_NUM
.. data:: CAN_DET
.. data:: CAN_ADP
.. data:: CAN_ADJ
.. data:: CAN_ADV
.. data:: CAN_VERB
.. data:: CAN_NOUN
.. data:: CAN_PDT
.. data:: CAN_POS
.. data:: CAN_PRON
.. data:: CAN_PRT
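
To illustrate the idea of a tag dictionary, the sketch below builds ``CAN_*``
style flags from a list of (word, tag) pairs, keeping a tag for a word only if
it accounts for a minimum share of that word's occurrences. This is an
assumption about how such features might be computed, not the library's
procedure; ``build_can_flags`` and the ``min_share`` value are hypothetical.

.. code-block:: python

    from collections import Counter, defaultdict


    def build_can_flags(tagged_tokens, min_share=0.1):
        """Map each lower-cased word to the set of CAN_* flags it receives."""
        tag_counts = defaultdict(Counter)
        for word, tag in tagged_tokens:
            tag_counts[word.lower()][tag] += 1

        can = {}
        for word, tags in tag_counts.items():
            total = sum(tags.values())
            # Keep only tags that are reasonably common for this word.
            can[word] = {
                "CAN_" + tag for tag, n in tags.items() if n / total >= min_share
            }
        return can


    tagged = [("run", "VERB"), ("Run", "VERB"), ("run", "NOUN"), ("the", "DET")]
    can_flags = build_can_flags(tagged)
    print(sorted(can_flags["run"]))

The share cut-off is there to discard rare tags, which in real tagger output
are often errors rather than genuine uses of the word.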