+| Documentation | |
+| ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| ⭐️ **[spaCy 101]** | New to spaCy? Here's everything you need to know! |
+| 📚 **[Usage Guides]** | How to use spaCy and its features. |
+| 🚀 **[New in v3.0]** | New features, backwards incompatibilities and migration guide. |
+| 🪐 **[Project Templates]** | End-to-end workflows you can clone, modify and run. |
+| 🎛 **[API Reference]** | The detailed reference for spaCy's API. |
+| 📦 **[Models]** | Download trained pipelines for spaCy. |
+| 🌌 **[Universe]** | Plugins, extensions, demos and books from the spaCy ecosystem. |
+| ⚙️ **[spaCy VS Code Extension]** | Additional tooling and features for working with spaCy's config files. |
+| 👩🏫 **[Online Course]** | Learn spaCy in this free and interactive online course. |
+| 📺 **[Videos]** | Our YouTube channel with video tutorials, talks and more. |
+| 🛠 **[Changelog]** | Changes and version history. |
+| 💝 **[Contribute]** | How to contribute to the spaCy project and code base. |
+|
| Get a custom spaCy pipeline, tailor-made for your NLP problem by spaCy's core developers. Streamlined, production-ready, predictable and maintainable. Start by completing our 5-minute questionnaire to tell us what you need and we'll be in touch! **[Learn more →](https://explosion.ai/spacy-tailored-pipelines)** |
+|
| Bespoke advice for problem solving, strategy and analysis for applied NLP projects. Services include data strategy, code reviews, pipeline design and annotation coaching. Curious? Fill in our 5-minute questionnaire to tell us what you need and we'll be in touch! **[Learn more →](https://explosion.ai/spacy-tailored-analysis)** |
[spacy 101]: https://spacy.io/usage/spacy-101
[new in v3.0]: https://spacy.io/usage/v3
@@ -58,7 +55,7 @@ open-source software, released under the [MIT license](https://github.com/explos
[api reference]: https://spacy.io/api/
[models]: https://spacy.io/models
[universe]: https://spacy.io/universe
-[spaCy VS Code Extension]: https://github.com/explosion/spacy-vscode
+[spacy vs code extension]: https://github.com/explosion/spacy-vscode
[videos]: https://www.youtube.com/c/ExplosionAI
[online course]: https://course.spacy.io
[project templates]: https://github.com/explosion/projects
@@ -92,7 +89,9 @@ more people can benefit from it.
- State-of-the-art speed
- Production-ready **training system**
- Linguistically-motivated **tokenization**
-- Components for named **entity recognition**, part-of-speech-tagging, dependency parsing, sentence segmentation, **text classification**, lemmatization, morphological analysis, entity linking and more
+- Components for named **entity recognition**, part-of-speech tagging,
+ dependency parsing, sentence segmentation, **text classification**,
+ lemmatization, morphological analysis, entity linking and more
- Easily extensible with **custom components** and attributes
- Support for custom models in **PyTorch**, **TensorFlow** and other frameworks
- Built in **visualizers** for syntax and NER
@@ -118,8 +117,8 @@ For detailed installation instructions, see the
### pip
Using pip, spaCy releases are available as source packages and binary wheels.
-Before you install spaCy and its dependencies, make sure that
-your `pip`, `setuptools` and `wheel` are up to date.
+Before you install spaCy and its dependencies, make sure that your `pip`,
+`setuptools` and `wheel` are up to date.
```bash
pip install -U pip setuptools wheel
```
@@ -174,9 +173,9 @@ with the new version.
## 📦 Download model packages
-Trained pipelines for spaCy can be installed as **Python packages**. This
-means that they're a component of your application, just like any other module.
-Models can be installed using spaCy's [`download`](https://spacy.io/api/cli#download)
+Trained pipelines for spaCy can be installed as **Python packages**. This means
+that they're a component of your application, just like any other module. Models
+can be installed using spaCy's [`download`](https://spacy.io/api/cli#download)
command, or manually by pointing pip to a path or URL.
| Documentation | |
@@ -242,8 +241,7 @@ do that depends on your system.
| **Mac** | Install a recent version of [XCode](https://developer.apple.com/xcode/), including the so-called "Command Line Tools". macOS and OS X ship with Python and git preinstalled. |
| **Windows** | Install a version of the [Visual C++ Build Tools](https://visualstudio.microsoft.com/visual-cpp-build-tools/) or [Visual Studio Express](https://visualstudio.microsoft.com/vs/express/) that matches the version that was used to compile your Python interpreter. |
-For more details
-and instructions, see the documentation on
+For more details and instructions, see the documentation on
[compiling spaCy from source](https://spacy.io/usage#source) and the
[quickstart widget](https://spacy.io/usage#section-quickstart) to get the right
commands for your platform and Python version.
diff --git a/requirements.txt b/requirements.txt
index a007f495e..4a131d18c 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -38,4 +38,5 @@ types-setuptools>=57.0.0
types-requests
types-setuptools>=57.0.0
black==22.3.0
+cython-lint>=0.15.0; python_version >= "3.7"
isort>=5.0,<6.0
diff --git a/setup.py b/setup.py
index 243554c7a..3b6fae37b 100755
--- a/setup.py
+++ b/setup.py
@@ -1,10 +1,9 @@
#!/usr/bin/env python
from setuptools import Extension, setup, find_packages
import sys
-import platform
import numpy
-from distutils.command.build_ext import build_ext
-from distutils.sysconfig import get_python_inc
+from setuptools.command.build_ext import build_ext
+from sysconfig import get_path
from pathlib import Path
import shutil
from Cython.Build import cythonize
@@ -88,30 +87,6 @@ COPY_FILES = {
}
-def is_new_osx():
- """Check whether we're on OSX >= 10.7"""
- if sys.platform != "darwin":
- return False
- mac_ver = platform.mac_ver()[0]
- if mac_ver.startswith("10"):
- minor_version = int(mac_ver.split(".")[1])
- if minor_version >= 7:
- return True
- else:
- return False
- return False
-
-
-if is_new_osx():
- # On Mac, use libc++ because Apple deprecated use of
- # libstdc
- COMPILE_OPTIONS["other"].append("-stdlib=libc++")
- LINK_OPTIONS["other"].append("-lc++")
- # g++ (used by unix compiler on mac) links to libstdc++ as a default lib.
- # See: https://stackoverflow.com/questions/1653047/avoid-linking-to-libstdc
- LINK_OPTIONS["other"].append("-nodefaultlibs")
-
-
# By subclassing build_extensions we have the actual compiler that will be used which is really known only after finalize_options
# http://stackoverflow.com/questions/724664/python-distutils-how-to-get-a-compiler-that-is-going-to-be-used
class build_ext_options:
@@ -204,7 +179,7 @@ def setup_package():
include_dirs = [
numpy.get_include(),
- get_python_inc(plat_specific=True),
+ get_path("include"),
]
ext_modules = []
ext_modules.append(
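For context on the `is_new_osx()` helper deleted above: it parsed `platform.mac_ver()` by hand and only ever recognized `"10.x"`-style version strings, so on macOS 11 and later it silently returned `False`. A toy reimplementation (not the build code, just a sketch of the deleted logic) makes that gap visible:

```python
def is_new_osx(platform_name: str, mac_ver: str) -> bool:
    # Toy copy of the deleted helper: only "10.x" version strings can
    # return True, so macOS 11.0 and later fall through to False.
    if platform_name != "darwin":
        return False
    if mac_ver.startswith("10"):
        return int(mac_ver.split(".")[1]) >= 7
    return False
```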
diff --git a/spacy/attrs.pxd b/spacy/attrs.pxd
index 6dc9ecaee..fbbac0ec2 100644
--- a/spacy/attrs.pxd
+++ b/spacy/attrs.pxd
@@ -96,4 +96,4 @@ cdef enum attr_id_t:
ENT_ID = symbols.ENT_ID
IDX
- SENT_END
\ No newline at end of file
+ SENT_END
diff --git a/spacy/attrs.pyx b/spacy/attrs.pyx
index dc8eed7c3..97b5d5e36 100644
--- a/spacy/attrs.pyx
+++ b/spacy/attrs.pyx
@@ -117,7 +117,7 @@ def intify_attrs(stringy_attrs, strings_map=None, _do_deprecated=False):
if "pos" in stringy_attrs:
stringy_attrs["TAG"] = stringy_attrs.pop("pos")
if "morph" in stringy_attrs:
- morphs = stringy_attrs.pop("morph")
+ morphs = stringy_attrs.pop("morph") # no-cython-lint
if "number" in stringy_attrs:
stringy_attrs.pop("number")
if "tenspect" in stringy_attrs:
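The hunk above only tags one line for `cython-lint`, but the surrounding normalization logic is easy to miss in diff form. A plain-Python sketch of what this stretch of `intify_attrs` does with the legacy keys shown (illustrative only; the real function handles many more cases):

```python
def normalize_legacy_attrs(stringy_attrs: dict) -> dict:
    # Mirrors the hunk: "pos" is renamed to "TAG", "morph" is popped,
    # and legacy bookkeeping keys like "number"/"tenspect" are discarded.
    attrs = dict(stringy_attrs)
    if "pos" in attrs:
        attrs["TAG"] = attrs.pop("pos")
    attrs.pop("morph", None)
    for junk in ("number", "tenspect"):
        attrs.pop(junk, None)
    return attrs
```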
diff --git a/spacy/kb/candidate.pxd b/spacy/kb/candidate.pxd
index 9fc4c4e9d..80fcbc459 100644
--- a/spacy/kb/candidate.pxd
+++ b/spacy/kb/candidate.pxd
@@ -4,7 +4,8 @@ from ..typedefs cimport hash_t
from .kb cimport KnowledgeBase
-# Object used by the Entity Linker that summarizes one entity-alias candidate combination.
+# Object used by the Entity Linker that summarizes one entity-alias candidate
+# combination.
cdef class Candidate:
cdef readonly KnowledgeBase kb
cdef hash_t entity_hash
diff --git a/spacy/kb/candidate.pyx b/spacy/kb/candidate.pyx
index 4cd734f43..53fc9b036 100644
--- a/spacy/kb/candidate.pyx
+++ b/spacy/kb/candidate.pyx
@@ -8,15 +8,24 @@ from ..tokens import Span
cdef class Candidate:
- """A `Candidate` object refers to a textual mention (`alias`) that may or may not be resolved
- to a specific `entity` from a Knowledge Base. This will be used as input for the entity linking
- algorithm which will disambiguate the various candidates to the correct one.
+ """A `Candidate` object refers to a textual mention (`alias`) that may or
+ may not be resolved to a specific `entity` from a Knowledge Base. This
+ will be used as input for the entity linking algorithm which will
+ disambiguate the various candidates to the correct one.
Each candidate (alias, entity) pair is assigned a certain prior probability.
DOCS: https://spacy.io/api/kb/#candidate-init
"""
- def __init__(self, KnowledgeBase kb, entity_hash, entity_freq, entity_vector, alias_hash, prior_prob):
+ def __init__(
+ self,
+ KnowledgeBase kb,
+ entity_hash,
+ entity_freq,
+ entity_vector,
+ alias_hash,
+ prior_prob
+ ):
self.kb = kb
self.entity_hash = entity_hash
self.entity_freq = entity_freq
@@ -59,7 +68,8 @@ cdef class Candidate:
def get_candidates(kb: KnowledgeBase, mention: Span) -> Iterable[Candidate]:
"""
- Return candidate entities for a given mention and fetching appropriate entries from the index.
+    Return candidate entities for a given mention by fetching appropriate
+    entries from the index.
kb (KnowledgeBase): Knowledge base to query.
mention (Span): Entity mention for which to identify candidates.
RETURNS (Iterable[Candidate]): Identified candidates.
@@ -67,9 +77,12 @@ def get_candidates(kb: KnowledgeBase, mention: Span) -> Iterable[Candidate]:
return kb.get_candidates(mention)
-def get_candidates_batch(kb: KnowledgeBase, mentions: Iterable[Span]) -> Iterable[Iterable[Candidate]]:
+def get_candidates_batch(
+ kb: KnowledgeBase, mentions: Iterable[Span]
+) -> Iterable[Iterable[Candidate]]:
"""
- Return candidate entities for the given mentions and fetching appropriate entries from the index.
+    Return candidate entities for the given mentions by fetching appropriate
+    entries from the index.
kb (KnowledgeBase): Knowledge base to query.
mention (Iterable[Span]): Entity mentions for which to identify candidates.
RETURNS (Iterable[Iterable[Candidate]]): Identified candidates.
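As the reflowed docstrings describe, the batched helper is just the single-mention lookup mapped over the batch. A minimal stand-alone sketch (hypothetical toy KB as a dict, not spaCy's `KnowledgeBase` API):

```python
from typing import Dict, Iterable, List

def get_candidates(kb: Dict[str, List[str]], mention: str) -> List[str]:
    # Single-mention lookup: empty list when the alias is unknown.
    return kb.get(mention, [])

def get_candidates_batch(
    kb: Dict[str, List[str]], mentions: Iterable[str]
) -> List[List[str]]:
    # Batch variant: one candidate list per mention, in input order.
    return [get_candidates(kb, m) for m in mentions]
```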
diff --git a/spacy/kb/kb.pyx b/spacy/kb/kb.pyx
index a88e18e1f..6ad4c3564 100644
--- a/spacy/kb/kb.pyx
+++ b/spacy/kb/kb.pyx
@@ -12,8 +12,9 @@ from .candidate import Candidate
cdef class KnowledgeBase:
- """A `KnowledgeBase` instance stores unique identifiers for entities and their textual aliases,
- to support entity linking of named entities to real-world concepts.
+ """A `KnowledgeBase` instance stores unique identifiers for entities and
+ their textual aliases, to support entity linking of named entities to
+ real-world concepts.
This is an abstract class and requires its operations to be implemented.
DOCS: https://spacy.io/api/kb
@@ -31,10 +32,13 @@ cdef class KnowledgeBase:
self.entity_vector_length = entity_vector_length
self.mem = Pool()
- def get_candidates_batch(self, mentions: Iterable[Span]) -> Iterable[Iterable[Candidate]]:
+ def get_candidates_batch(
+ self, mentions: Iterable[Span]
+ ) -> Iterable[Iterable[Candidate]]:
"""
- Return candidate entities for specified texts. Each candidate defines the entity, the original alias,
- and the prior probability of that alias resolving to that entity.
+ Return candidate entities for specified texts. Each candidate defines
+ the entity, the original alias, and the prior probability of that
+ alias resolving to that entity.
If no candidate is found for a given text, an empty list is returned.
mentions (Iterable[Span]): Mentions for which to get candidates.
RETURNS (Iterable[Iterable[Candidate]]): Identified candidates.
@@ -43,14 +47,17 @@ cdef class KnowledgeBase:
def get_candidates(self, mention: Span) -> Iterable[Candidate]:
"""
- Return candidate entities for specified text. Each candidate defines the entity, the original alias,
+ Return candidate entities for specified text. Each candidate defines
+ the entity, the original alias,
and the prior probability of that alias resolving to that entity.
        If no candidate is found for a given text, an empty list is returned.
mention (Span): Mention for which to get candidates.
RETURNS (Iterable[Candidate]): Identified candidates.
"""
raise NotImplementedError(
- Errors.E1045.format(parent="KnowledgeBase", method="get_candidates", name=self.__name__)
+ Errors.E1045.format(
+ parent="KnowledgeBase", method="get_candidates", name=self.__name__
+ )
)
def get_vectors(self, entities: Iterable[str]) -> Iterable[Iterable[float]]:
@@ -68,7 +75,9 @@ cdef class KnowledgeBase:
RETURNS (Iterable[float]): Vector for specified entity.
"""
raise NotImplementedError(
- Errors.E1045.format(parent="KnowledgeBase", method="get_vector", name=self.__name__)
+ Errors.E1045.format(
+ parent="KnowledgeBase", method="get_vector", name=self.__name__
+ )
)
def to_bytes(self, **kwargs) -> bytes:
@@ -76,7 +85,9 @@ cdef class KnowledgeBase:
RETURNS (bytes): Current state as binary string.
"""
raise NotImplementedError(
- Errors.E1045.format(parent="KnowledgeBase", method="to_bytes", name=self.__name__)
+ Errors.E1045.format(
+ parent="KnowledgeBase", method="to_bytes", name=self.__name__
+ )
)
def from_bytes(self, bytes_data: bytes, *, exclude: Tuple[str] = tuple()):
@@ -85,25 +96,35 @@ cdef class KnowledgeBase:
exclude (Tuple[str]): Properties to exclude when restoring KB.
"""
raise NotImplementedError(
- Errors.E1045.format(parent="KnowledgeBase", method="from_bytes", name=self.__name__)
+ Errors.E1045.format(
+ parent="KnowledgeBase", method="from_bytes", name=self.__name__
+ )
)
- def to_disk(self, path: Union[str, Path], exclude: Iterable[str] = SimpleFrozenList()) -> None:
+ def to_disk(
+ self, path: Union[str, Path], exclude: Iterable[str] = SimpleFrozenList()
+ ) -> None:
"""
Write KnowledgeBase content to disk.
path (Union[str, Path]): Target file path.
exclude (Iterable[str]): List of components to exclude.
"""
raise NotImplementedError(
- Errors.E1045.format(parent="KnowledgeBase", method="to_disk", name=self.__name__)
+ Errors.E1045.format(
+ parent="KnowledgeBase", method="to_disk", name=self.__name__
+ )
)
- def from_disk(self, path: Union[str, Path], exclude: Iterable[str] = SimpleFrozenList()) -> None:
+ def from_disk(
+ self, path: Union[str, Path], exclude: Iterable[str] = SimpleFrozenList()
+ ) -> None:
"""
Load KnowledgeBase content from disk.
path (Union[str, Path]): Target file path.
exclude (Iterable[str]): List of components to exclude.
"""
raise NotImplementedError(
- Errors.E1045.format(parent="KnowledgeBase", method="from_disk", name=self.__name__)
+ Errors.E1045.format(
+ parent="KnowledgeBase", method="from_disk", name=self.__name__
+ )
)
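Every stub being rewrapped in this file follows the same shape: an abstract method that raises a formatted `NotImplementedError` naming the parent class and method. A generic sketch of the pattern (error text is illustrative, not spaCy's actual `Errors.E1045` wording):

```python
class AbstractKB:
    def get_candidates(self, mention):
        # Subclasses (e.g. an in-memory lookup table) must override this.
        raise NotImplementedError(
            "{parent}.{method} must be implemented by a subclass".format(
                parent=type(self).__name__, method="get_candidates"
            )
        )
```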
diff --git a/spacy/kb/kb_in_memory.pxd b/spacy/kb/kb_in_memory.pxd
index 08ec6b2a3..e0e33301a 100644
--- a/spacy/kb/kb_in_memory.pxd
+++ b/spacy/kb/kb_in_memory.pxd
@@ -55,23 +55,28 @@ cdef class InMemoryLookupKB(KnowledgeBase):
# optional data, we can let users configure a DB as the backend for this.
cdef object _features_table
-
cdef inline int64_t c_add_vector(self, vector[float] entity_vector) nogil:
"""Add an entity vector to the vectors table."""
cdef int64_t new_index = self._vectors_table.size()
self._vectors_table.push_back(entity_vector)
return new_index
-
- cdef inline int64_t c_add_entity(self, hash_t entity_hash, float freq,
- int32_t vector_index, int feats_row) nogil:
+ cdef inline int64_t c_add_entity(
+ self,
+ hash_t entity_hash,
+ float freq,
+ int32_t vector_index,
+ int feats_row
+ ) nogil:
"""Add an entry to the vector of entries.
- After calling this method, make sure to update also the _entry_index using the return value"""
+        After calling this method, make sure to also update the _entry_index
+        using the return value"""
# This is what we'll map the entity hash key to. It's where the entry will sit
# in the vector of entries, so we can get it later.
cdef int64_t new_index = self._entries.size()
- # Avoid struct initializer to enable nogil, cf https://github.com/cython/cython/issues/1642
+ # Avoid struct initializer to enable nogil, cf.
+ # https://github.com/cython/cython/issues/1642
cdef KBEntryC entry
entry.entity_hash = entity_hash
entry.vector_index = vector_index
@@ -81,11 +86,17 @@ cdef class InMemoryLookupKB(KnowledgeBase):
self._entries.push_back(entry)
return new_index
- cdef inline int64_t c_add_aliases(self, hash_t alias_hash, vector[int64_t] entry_indices, vector[float] probs) nogil:
- """Connect a mention to a list of potential entities with their prior probabilities .
- After calling this method, make sure to update also the _alias_index using the return value"""
- # This is what we'll map the alias hash key to. It's where the alias will be defined
- # in the vector of aliases.
+ cdef inline int64_t c_add_aliases(
+ self,
+ hash_t alias_hash,
+ vector[int64_t] entry_indices,
+ vector[float] probs
+ ) nogil:
+ """Connect a mention to a list of potential entities with their prior
+        probabilities. After calling this method, make sure to also update the
+        _alias_index using the return value"""
+ # This is what we'll map the alias hash key to. It's where the alias will be
+ # defined in the vector of aliases.
cdef int64_t new_index = self._aliases_table.size()
# Avoid struct initializer to enable nogil
@@ -98,8 +109,9 @@ cdef class InMemoryLookupKB(KnowledgeBase):
cdef inline void _create_empty_vectors(self, hash_t dummy_hash) nogil:
"""
- Initializing the vectors and making sure the first element of each vector is a dummy,
- because the PreshMap maps pointing to indices in these vectors can not contain 0 as value
+        Initialize the vectors, making sure the first element of each vector is
+        a dummy, because the PreshMap maps pointing to indices in these vectors
+        cannot contain 0 as a value.
cf. https://github.com/explosion/preshed/issues/17
"""
cdef int32_t dummy_value = 0
@@ -130,12 +142,18 @@ cdef class InMemoryLookupKB(KnowledgeBase):
cdef class Writer:
cdef FILE* _fp
- cdef int write_header(self, int64_t nr_entries, int64_t entity_vector_length) except -1
+ cdef int write_header(
+ self, int64_t nr_entries, int64_t entity_vector_length
+ ) except -1
cdef int write_vector_element(self, float element) except -1
- cdef int write_entry(self, hash_t entry_hash, float entry_freq, int32_t vector_index) except -1
+ cdef int write_entry(
+ self, hash_t entry_hash, float entry_freq, int32_t vector_index
+ ) except -1
cdef int write_alias_length(self, int64_t alias_length) except -1
- cdef int write_alias_header(self, hash_t alias_hash, int64_t candidate_length) except -1
+ cdef int write_alias_header(
+ self, hash_t alias_hash, int64_t candidate_length
+ ) except -1
cdef int write_alias(self, int64_t entry_index, float prob) except -1
cdef int _write(self, void* value, size_t size) except -1
@@ -143,12 +161,18 @@ cdef class Writer:
cdef class Reader:
cdef FILE* _fp
- cdef int read_header(self, int64_t* nr_entries, int64_t* entity_vector_length) except -1
+ cdef int read_header(
+ self, int64_t* nr_entries, int64_t* entity_vector_length
+ ) except -1
cdef int read_vector_element(self, float* element) except -1
- cdef int read_entry(self, hash_t* entity_hash, float* freq, int32_t* vector_index) except -1
+ cdef int read_entry(
+ self, hash_t* entity_hash, float* freq, int32_t* vector_index
+ ) except -1
cdef int read_alias_length(self, int64_t* alias_length) except -1
- cdef int read_alias_header(self, hash_t* alias_hash, int64_t* candidate_length) except -1
+ cdef int read_alias_header(
+ self, hash_t* alias_hash, int64_t* candidate_length
+ ) except -1
cdef int read_alias(self, int64_t* entry_index, float* prob) except -1
cdef int _read(self, void* value, size_t size) except -1
diff --git a/spacy/kb/kb_in_memory.pyx b/spacy/kb/kb_in_memory.pyx
index e991f7720..02773cbae 100644
--- a/spacy/kb/kb_in_memory.pyx
+++ b/spacy/kb/kb_in_memory.pyx
@@ -1,5 +1,5 @@
# cython: infer_types=True, profile=True
-from typing import Any, Callable, Dict, Iterable, Union
+from typing import Any, Callable, Dict, Iterable
import srsly
@@ -27,8 +27,9 @@ from .candidate import Candidate as Candidate
cdef class InMemoryLookupKB(KnowledgeBase):
- """An `InMemoryLookupKB` instance stores unique identifiers for entities and their textual aliases,
- to support entity linking of named entities to real-world concepts.
+ """An `InMemoryLookupKB` instance stores unique identifiers for entities
+ and their textual aliases, to support entity linking of named entities to
+ real-world concepts.
DOCS: https://spacy.io/api/inmemorylookupkb
"""
@@ -71,7 +72,8 @@ cdef class InMemoryLookupKB(KnowledgeBase):
def add_entity(self, str entity, float freq, vector[float] entity_vector):
"""
- Add an entity to the KB, optionally specifying its log probability based on corpus frequency
+ Add an entity to the KB, optionally specifying its log probability
+ based on corpus frequency.
Return the hash of the entity ID/name at the end.
"""
cdef hash_t entity_hash = self.vocab.strings.add(entity)
@@ -83,14 +85,20 @@ cdef class InMemoryLookupKB(KnowledgeBase):
# Raise an error if the provided entity vector is not of the correct length
if len(entity_vector) != self.entity_vector_length:
- raise ValueError(Errors.E141.format(found=len(entity_vector), required=self.entity_vector_length))
+ raise ValueError(
+ Errors.E141.format(
+ found=len(entity_vector), required=self.entity_vector_length
+ )
+ )
vector_index = self.c_add_vector(entity_vector=entity_vector)
- new_index = self.c_add_entity(entity_hash=entity_hash,
- freq=freq,
- vector_index=vector_index,
- feats_row=-1) # Features table currently not implemented
+ new_index = self.c_add_entity(
+ entity_hash=entity_hash,
+ freq=freq,
+ vector_index=vector_index,
+ feats_row=-1
+ ) # Features table currently not implemented
self._entry_index[entity_hash] = new_index
return entity_hash
@@ -115,7 +123,12 @@ cdef class InMemoryLookupKB(KnowledgeBase):
else:
entity_vector = vector_list[i]
if len(entity_vector) != self.entity_vector_length:
- raise ValueError(Errors.E141.format(found=len(entity_vector), required=self.entity_vector_length))
+ raise ValueError(
+ Errors.E141.format(
+ found=len(entity_vector),
+ required=self.entity_vector_length
+ )
+ )
entry.entity_hash = entity_hash
entry.freq = freq_list[i]
@@ -149,11 +162,15 @@ cdef class InMemoryLookupKB(KnowledgeBase):
previous_alias_nr = self.get_size_aliases()
        # Throw an error if the lengths of entities and probabilities are not the same
if not len(entities) == len(probabilities):
- raise ValueError(Errors.E132.format(alias=alias,
- entities_length=len(entities),
- probabilities_length=len(probabilities)))
+ raise ValueError(
+ Errors.E132.format(
+ alias=alias,
+ entities_length=len(entities),
+ probabilities_length=len(probabilities))
+ )
- # Throw an error if the probabilities sum up to more than 1 (allow for some rounding errors)
+ # Throw an error if the probabilities sum up to more than 1 (allow for
+ # some rounding errors)
prob_sum = sum(probabilities)
if prob_sum > 1.00001:
raise ValueError(Errors.E133.format(alias=alias, sum=prob_sum))
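The two checks being reformatted here guard the alias inputs: the entity and probability lists must have the same length, and the priors may not sum to more than 1 (with a small rounding tolerance). A stand-alone sketch of just the validation (toy function with illustrative error messages, not the KB method):

```python
def validate_alias_priors(entities, probabilities):
    # Lengths must match: one prior probability per candidate entity.
    if len(entities) != len(probabilities):
        raise ValueError("entities and probabilities differ in length")
    # Priors must not sum above 1; 1.00001 allows for rounding error.
    if sum(probabilities) > 1.00001:
        raise ValueError("prior probabilities sum to more than 1")
```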
@@ -170,40 +187,47 @@ cdef class InMemoryLookupKB(KnowledgeBase):
for entity, prob in zip(entities, probabilities):
entity_hash = self.vocab.strings[entity]
- if not entity_hash in self._entry_index:
+ if entity_hash not in self._entry_index:
raise ValueError(Errors.E134.format(entity=entity))
entry_index =