spaCy/spacy/pipeline/_parser_internals/stateclass.pyx

# cython: infer_types=True
import numpy

from ...tokens.doc cimport Doc


cdef class StateClass:
    def __init__(self, Doc doc=None, int offset=0):
        cdef Pool mem = Pool()
        self.mem = mem
        self._borrowed = 0
        if doc is not None:
            self.c = new StateC(doc.c, doc.length)
            self.c.offset = offset

    def __dealloc__(self):
        # Free the C++ state only if this object owns it; a borrowed
        # state is released by its owner.
        if self._borrowed != 1:
            del self.c

    @property
    def stack(self):
        """Token indices currently on the stack, as an unordered set."""
        return {self.S(i) for i in range(self.c._s_i)}

    @property
    def queue(self):
        """Token indices currently in the buffer, as an unordered set."""
        return {self.B(i) for i in range(self.c.buffer_length())}

    @property
    def token_vector_lenth(self):
        return self.doc.tensor.shape[1]

    @property
    def history(self):
        """History entries 1 through 8, as a NumPy int array."""
        hist = numpy.ndarray((8,), dtype='i')
        for i in range(8):
            hist[i] = self.c.get_hist(i+1)
        return hist

    def is_final(self):
        return self.c.is_final()

    def copy(self):
        cdef StateClass new_state = StateClass.init(self.c._sent, self.c.length)
        new_state.c.clone(self.c)
        return new_state

    def print_state(self, words):
        # Pad with '_' so that an index of -1 (no such token) prints as '_'.
        words = list(words) + ['_']
        top = f"{words[self.S(0)]}_{self.S_(0).head}"
        second = f"{words[self.S(1)]}_{self.S_(1).head}"
        third = f"{words[self.S(2)]}_{self.S_(2).head}"
        n0 = words[self.B(0)]
        n1 = words[self.B(1)]
        return ' '.join((third, second, top, '|', n0, n1))