spaCy/docs/source/reference/lookup.rst

Lexical Lookup
--------------

Where possible, spaCy computes information over lexical *types*, rather than
*tokens*.  If you process a large batch of text, the number of unique types
you will see will grow exponentially slower than the number of tokens --- so
it's much more efficient to compute over types.  And, in small samples, we generally
want to know about the distribution of a word in the language at large ---
which again, is type-based information.

You can access the lexical features via the Token object, but you can also look them
up in the vocabulary directly:

    >>> from spacy.en import English
    >>> nlp = English()
    >>> lexeme = nlp.vocab[u'Amazon']

.. py:class:: vocab.Vocab(self, data_dir=None, lex_props_getter=None)

  .. py:method:: __len__(self) --> int

  .. py:method:: __getitem__(self, id: int) --> unicode

  .. py:method:: __getitem__(self, string: unicode) --> int

  .. py:method:: __setitem__(self, py_str: unicode, props: Dict[str, int[float]) --> None

  .. py:method:: dump(self, loc: unicode) --> None

  .. py:method:: load_lexemes(self, loc: unicode) --> None

  .. py:method:: load_vectors(self, loc: unicode) --> None


.. py:class:: strings.StringStore(self)

  .. py:method:: __len__(self) --> int

  .. py:method:: __getitem__(self, id: int) --> unicode

  .. py:method:: __getitem__(self, string: bytes) --> id

  .. py:method:: __getitem__(self, string: unicode) --> id

  .. py:method:: dump(self, loc: unicode) --> None

  .. py:method:: load(self, loc: unicode) --> None
* Add docs for vocab and string store 2015-07-08 19:00:33 +03:00			`Lexical Lookup`
			`--------------`

			`Where possible, spaCy computes information over lexical types, rather than`
			`tokens. If you process a large batch of text, the number of unique types`
			`you will see will grow exponentially slower than the number of tokens --- so`
			`it's much more efficient to compute over types. And, in small samples, we generally`
			`want to know about the distribution of a word in the language at large ---`
			`which again, is type-based information.`

			`You can access the lexical features via the Token object, but you can also look them`
			`up in the vocabulary directly:`

			`>>> from spacy.en import English`
			`>>> nlp = English()`
			`>>> lexeme = nlp.vocab[u'Amazon']`

			`.. py:class:: vocab.Vocab(self, data_dir=None, lex_props_getter=None)`

			`.. py:method:: __len__(self) --> int`

			`.. py:method:: __getitem__(self, id: int) --> unicode`

			`.. py:method:: __getitem__(self, string: unicode) --> int`

			`.. py:method:: __setitem__(self, py_str: unicode, props: Dict[str, int[float]) --> None`

			`.. py:method:: dump(self, loc: unicode) --> None`

			`.. py:method:: load_lexemes(self, loc: unicode) --> None`

			`.. py:method:: load_vectors(self, loc: unicode) --> None`


			`.. py:class:: strings.StringStore(self)`

			`.. py:method:: __len__(self) --> int`

			`.. py:method:: __getitem__(self, id: int) --> unicode`

			`.. py:method:: __getitem__(self, string: bytes) --> id`

			`.. py:method:: __getitem__(self, string: unicode) --> id`

			`.. py:method:: dump(self, loc: unicode) --> None`

			`.. py:method:: load(self, loc: unicode) --> None`