mirror of
https://github.com/explosion/spaCy.git
synced 2024-11-14 13:47:13 +03:00
0f01f46e02
* Replace all basestring references with unicode

  `basestring` was a compatibility type introduced by Cython to make dealing with utf-8 strings in Python2 easier. In Python3 it is equivalent to the unicode (or str) type. I replaced all references to basestring with unicode, since that was used elsewhere, but we could also just replace them with str, which should also be equivalent. All tests pass locally.

* Replace all references to unicode type with str

  Since we only support python3 this is simpler.

* Remove all references to unicode type

  This removes all references to the unicode type across the codebase and replaces them with `str`, which makes it more drastic than the prior commits. To make this work, the `unicode_literals` import had to be removed, and one explicit unicode literal also had to be removed (it is unclear why this is necessary in Cython with language level 3, but without it there were errors about implicit conversion). Where `unicode` was used as a type in comments, it was also changed to `str`. Additionally, `coding: utf8` headers were removed from a few files.
30 lines
699 B
Cython
from libc.stdint cimport int64_t
from libcpp.vector cimport vector
from libcpp.set cimport set
from cymem.cymem cimport Pool
from preshed.maps cimport PreshMap
from murmurhash.mrmr cimport hash64

from .typedefs cimport attr_t, hash_t


cpdef hash_t hash_string(str string) except 0
cdef hash_t hash_utf8(char* utf8_string, int length) nogil

cdef str decode_Utf8Str(const Utf8Str* string)


ctypedef union Utf8Str:
    unsigned char[8] s
    unsigned char* p


cdef class StringStore:
    cdef Pool mem

    cdef vector[hash_t] keys
    cdef public PreshMap _map

    cdef const Utf8Str* intern_unicode(self, str py_string)
    cdef const Utf8Str* _intern_utf8(self, char* utf8_string, int length)
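
The header above only declares signatures and fields; the method bodies live elsewhere and are not shown here. As a rough illustration of how a hash-keyed string table with a `keys` vector and a `_map` could fit together, here is a minimal pure-Python sketch. It is not spaCy's implementation. `ToyStringStore` and its `decode` helper are hypothetical names, BLAKE2b stands in for the MurmurHash `hash64` function, a `dict` stands in for `PreshMap`, and the methods return the hash key rather than a `Utf8Str*` pointer.

```python
import hashlib


def hash_utf8(utf8_string: bytes) -> int:
    """Return an unsigned 64-bit hash of the given UTF-8 bytes.

    Assumption: BLAKE2b is a stand-in here, not the MurmurHash
    function that the header cimports.
    """
    digest = hashlib.blake2b(utf8_string, digest_size=8).digest()
    return int.from_bytes(digest, "little")


def hash_string(string: str) -> int:
    """Hash a Python str by encoding it to UTF-8 first."""
    return hash_utf8(string.encode("utf8"))


class ToyStringStore:
    """Hypothetical analogue of the fields declared on StringStore."""

    def __init__(self):
        self.keys = []   # hashes in insertion order (cf. `vector[hash_t] keys`)
        self._map = {}   # hash -> UTF-8 bytes (cf. `PreshMap _map`)

    def intern_unicode(self, py_string: str) -> int:
        # Encode once, then delegate to the bytes-level method.
        return self._intern_utf8(py_string.encode("utf8"))

    def _intern_utf8(self, utf8_string: bytes) -> int:
        key = hash_utf8(utf8_string)
        if key not in self._map:
            # First time this string is seen: store it and record its key.
            self._map[key] = utf8_string
            self.keys.append(key)
        return key

    def decode(self, key: int) -> str:
        # Look up the stored bytes and turn them back into a str.
        return self._map[key].decode("utf8")


store = ToyStringStore()
key = store.intern_unicode("apple")
same_key = store.intern_unicode("apple")  # interning again adds nothing new
```

Splitting the work into a `str`-level entry point and a bytes-level helper mirrors the `intern_unicode` / `_intern_utf8` pair in the declarations, so the encoding step happens in exactly one place.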