spaCy

mirror of https://github.com/explosion/spaCy.git synced 2026-02-23 23:51:09 +03:00

Ines Montani ad2a514cdf Show warning if phrase pattern Doc was overprocessed (#3255)	2019-02-13 01:45:31 +11:00

In most cases, the PhraseMatcher will match on the verbatim token text or, as of v2.1, sometimes the lowercase text. This means that we only need a tokenized Doc, without any other attributes.

If phrase patterns are created by processing large terminology lists with the full `nlp` object, this can easily make things a lot slower, because all components will be applied, even if we don't actually need the attributes they set (like part-of-speech tags, dependency labels). The warning message also includes a suggestion to use `nlp.make_doc` or `nlp.tokenizer.pipe` for even faster processing.

For now, the validation has to be enabled explicitly by setting `validate=True`.
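The reasoning in the commit message can be illustrated with a small, self-contained sketch. This is a toy stand-in written in plain Python, not spaCy's implementation: `tokenize` and `ToyPhraseMatcher` are hypothetical names, and whitespace splitting is assumed in place of a real tokenizer. It only shows that matching on verbatim or lowercase token text needs nothing beyond tokenization, so running taggers or parsers over the patterns would be wasted work.

```python
def tokenize(text):
    # Whitespace splitting: a deliberately simple stand-in for a real tokenizer.
    return text.split()


class ToyPhraseMatcher:
    """Hypothetical matcher that compares token text only (assumption, not spaCy's API)."""

    def __init__(self, lowercase=False):
        self.lowercase = lowercase
        self.patterns = {}  # tuple of token keys -> label

    def _keys(self, tokens):
        # The only "attribute" used is the token text, optionally lowercased.
        return tuple(t.lower() if self.lowercase else t for t in tokens)

    def add(self, label, phrases):
        # Patterns are tokenized only; no tags or dependencies are computed.
        for phrase in phrases:
            self.patterns[self._keys(tokenize(phrase))] = label

    def __call__(self, text):
        keys = self._keys(tokenize(text))
        matches = []
        for start in range(len(keys)):
            for pattern, label in self.patterns.items():
                end = start + len(pattern)
                if keys[start:end] == pattern:
                    matches.append((label, start, end))
        return matches


matcher = ToyPhraseMatcher(lowercase=True)
matcher.add("ORG", ["Explosion AI", "spaCy"])
matches = matcher("We use spacy from explosion ai daily")
print(matches)
```

In spaCy itself, the equivalent choice is the one the commit message suggests: build pattern Docs with `nlp.make_doc` or `nlp.tokenizer.pipe` rather than calling the full `nlp` pipeline on every term.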
cli	Tidy up and fix small bugs and typos	2019-02-08 14:14:49 +01:00
data	Make spacy/data a package	2017-03-18 20:04:22 +01:00
displacy	Tidy up and fix small bugs and typos	2019-02-08 14:14:49 +01:00
lang	Tidy up and fix small bugs and typos	2019-02-08 14:14:49 +01:00
matcher	Show warning if phrase pattern Doc was overprocessed (#3255)	2019-02-13 01:45:31 +11:00
pipeline	💫 Break up large pipeline.pyx (#3246)	2019-02-10 12:14:51 +01:00
syntax	💫 Prevent parser from predicting unseen classes (#3075)	2018-12-20 16:12:22 +01:00
tests	Show warning if phrase pattern Doc was overprocessed (#3255)	2019-02-13 01:45:31 +11:00
tokens	Only run noun chunks iterator in Span if available (closes #3199)	2019-02-08 18:33:16 +01:00
__init__.pxd	* Seems to be working after refactor. Need to wire up more POS tag features, and wire up save/load of POS tags.	2014-10-24 02:23:42 +11:00
__init__.py	Tidy up and format remaining files	2018-11-30 17:43:08 +01:00
__main__.py	💫 New JSON helpers, training data internals & CLI rewrite (#2932)	2018-11-30 20:16:14 +01:00
_align.pyx	Improve alignment around quotes	2018-08-16 01:04:34 +02:00
_ml.py	💫 Better support for semi-supervised learning (#3035)	2018-12-10 16:25:33 +01:00
about.py	Set version to v2.1.0a7.dev1	2019-02-08 01:54:01 +11:00
attrs.pxd	Fix LANG symbol	2018-02-17 18:10:50 +01:00
attrs.pyx	Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop"	2018-03-27 19:23:02 +02:00
compat.py	💫 Replace ujson, msgpack and dill/pickle/cloudpickle with srsly (#3003)	2018-12-03 01:28:22 +01:00
errors.py	Show warning if phrase pattern Doc was overprocessed (#3255)	2019-02-13 01:45:31 +11:00
glossary.py	💫 Tidy up and auto-format .py files (#2983)	2018-11-30 17:03:03 +01:00
gold.pxd	Add support for sent_start to GoldParse	2017-08-25 20:03:14 -05:00
gold.pyx	Add gold.spans_from_biluo_tags helper (#3227)	2019-02-06 21:50:26 +11:00
language.py	Improve entry points and allow custom language classes via entry points (#3080)	2018-12-20 23:58:43 +01:00
lemmatizer.py	💫 Tidy up and auto-format .py files (#2983)	2018-11-30 17:03:03 +01:00
lexeme.pxd	WIP on stringstore change. 27 failures	2017-05-28 14:06:40 +02:00
lexeme.pyx	💫 Add .similarity warnings for no vectors and option to exclude warnings (#2197)	2018-05-21 01:22:38 +02:00
morphology.pxd	Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop"	2018-03-27 19:23:02 +02:00
morphology.pyx	Fix lemmatization	2018-07-05 13:56:02 +02:00
parts_of_speech.pxd	Add support for Universal Dependencies v2.0	2017-03-03 13:17:34 +01:00
parts_of_speech.pyx	Tidy up rest	2017-10-27 21:07:59 +02:00
scorer.py	💫 Tidy up and auto-format .py files (#2983)	2018-11-30 17:03:03 +01:00
strings.pxd	Try to fix StringStore clean up (see #1506)	2017-11-11 03:11:27 +03:00
strings.pyx	Add get_string_id helper to spacy.strings	2018-12-10 16:09:26 +01:00
structs.pxd	Make NORM a token attribute (#3029)	2018-12-08 10:49:10 +01:00
symbols.pxd	Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop"	2018-03-27 19:23:02 +02:00
symbols.pyx	Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop"	2018-03-27 19:23:02 +02:00
tokenizer.pxd	Disable tokenizer cache for special-cases. Fixes #1250	2017-10-24 16:08:05 +02:00
tokenizer.pyx	Replacing regex library with re to increase tokenization speed (#3218)	2019-02-01 18:05:22 +11:00
typedefs.pxd	Work on changing StringStore to return hashes.	2017-05-28 12:36:27 +02:00
typedefs.pyx	Tidy up rest	2017-10-27 21:07:59 +02:00
util.py	Auto-format	2019-02-07 21:00:04 +01:00
vectors.pyx	Fix KeyError in Vectors.most_similar. Fixes #2648	2018-12-10 16:19:18 +01:00
vocab.pxd	💫 Small efficiency fixes to tokenizer (#2587)	2018-07-24 23:35:54 +02:00
vocab.pyx	Prevent exceptions from setting POS but not TAG. Closes #1773	2018-12-30 13:16:05 +01:00