spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-09-17 09:32:42 +03:00

Author	SHA1	Message	Date
Matthew Honnibal	07617b6b7f	Set version to v2.1.0a7.dev12	2019-02-16 17:30:29 +01:00
Matthew Honnibal	808ae7521b	Require thinc 7.0.1	2019-02-16 17:29:57 +01:00
Matthew Honnibal	1dc314bada	Set version to v2.1.0a7.dev11	2019-02-16 17:02:49 +01:00
Matthew Honnibal	eea3001b98	Depend on thinc 7.0.1.dev2	2019-02-16 17:02:30 +01:00
Matthew Honnibal	2ef227c313	Set version to v2.1.0a7.dev1	2019-02-16 16:22:46 +01:00
Matthew Honnibal	f456b673d4	Require thinc 7.0.1.dev1	2019-02-16 16:22:26 +01:00
Matthew Honnibal	22923b9cb1	Set version to v2.1.0a7.dev9	2019-02-16 15:47:19 +01:00
Matthew Honnibal	11e826ac3b	Require thinc v7.0.1.dev0	2019-02-16 15:47:02 +01:00
Matthew Honnibal	e0c91a4c8d	Set version to 2.1.0a7	2019-02-16 14:43:38 +01:00
Matthew Honnibal	92b6bd2977	Refinements to retokenize.split() function (#3282) * Change retokenize.split() API for heads * Pass lists as values for attrs in split * Fix test_doc_split filename * Add error for mismatched tokens after split * Raise error if new tokens don't match text * Fix doc test * Fix error * Move deps under attrs * Fix split tests * Fix retokenize.split	2019-02-15 17:32:31 +01:00
Matthew Honnibal	2dbc61bc26	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2019-02-15 14:03:54 +01:00
Ines Montani	1aa57690dc	Add xfailing test for orth mismatch in retokenizer.split	2019-02-15 13:55:04 +01:00
Ines Montani	819768483f	Add xfailing test for out-of-bounds heads	2019-02-15 13:09:07 +01:00
Ines Montani	d8051e89ca	Tidy up tests	2019-02-15 12:56:51 +01:00
Matthew Honnibal	58aac58631	Set version to v2.1.0a7.dev8	2019-02-15 12:39:26 +01:00
Matthew Honnibal	4c49f5f7b0	Update Thinc dependency	2019-02-15 12:39:08 +01:00
Matthew Honnibal	5f1abe2cc7	Set version to v2.1.0a7.dev7	2019-02-15 10:30:53 +01:00
Matthew Honnibal	a66e8e0c8a	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2019-02-15 10:30:22 +01:00
Ines Montani	c31a9dabd5	💫 Add en/em dash to prefixes and suffixes (#3281) * Auto-format * Add en/em dash to prefixes and suffixes	2019-02-15 10:29:59 +01:00
Ines Montani	5651a0d052	💫 Replace {Doc,Span}.merge with Doc.retokenize (#3280) * Add deprecation warning to Doc.merge and Span.merge * Replace {Doc,Span}.merge with Doc.retokenize	2019-02-15 10:29:44 +01:00
Matthew Honnibal	dcf79c5ef3	Set version to v2.1.0a7.dev6	2019-02-14 20:12:02 +01:00
Matthew Honnibal	0371ac23e7	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2019-02-14 20:09:10 +01:00
Ines Montani	f146121092	💫 Make handling of [Pipe].labels consistent (#3273) * Make handling of [Pipe].labels consistent * Un-xfail passing test * Update spacy/pipeline/pipes.pyx Co-Authored-By: ines <ines@ines.io> * Update spacy/pipeline/pipes.pyx Co-Authored-By: ines <ines@ines.io> * Update spacy/tests/pipeline/test_pipe_methods.py Co-Authored-By: ines <ines@ines.io> * Update spacy/pipeline/pipes.pyx Co-Authored-By: ines <ines@ines.io> * Move error message to spacy.errors * Fix textcat labels and test * Make EntityRuler.labels return tuple as well	2019-02-15 06:03:19 +11:00
Ines Montani	3d577b77c6	Auto-formatting	2019-02-14 19:56:38 +01:00
Ines Montani	2569339a98	Formatting and whitespace [ci skip]	2019-02-14 18:05:07 +01:00
Matthew Honnibal	aebf71bc72	Set version to v2.1.0a7.dev5	2019-02-14 15:51:42 +01:00
Matthew Honnibal	6ccd67c682	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2019-02-14 15:51:12 +01:00
Ines Montani	e104e47c21	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2019-02-14 15:35:34 +01:00
Ines Montani	0cd01a8c5e	Merge branch 'master' into develop	2019-02-14 15:35:20 +01:00
Ines Montani	2e31921d0a	💫 Add base Language classes for more languages (#3276) * Add base classes for more languages * Add test for language class initialization. Make sure the language can be initialized; otherwise, it's difficult to catch serious errors in the test suite, because languages are lazy-loaded	2019-02-15 01:31:19 +11:00
Grivaz	39815513e2	Add split one token into several (resolves #2838) (#3253) * Add split one token into several (resolves #2838) * Improve error message for token splitting * Make retokenizer.split() tests use a Token object. Change retokenizer.split() to use a Token object instead of an index. * Pass Token into retokenize.split(). Tweak retokenize.split() API so that we pass the `Token` object, not the index. * Fix token.idx in retokenize.split() * Test that token.idx is correct after split * Fix token.idx for split tokens * Fix retokenize.split() * Fix retokenize.split * Fix retokenize.split() test	2019-02-15 01:27:13 +11:00
Ines Montani	743ecf728c	Tidy up conftest	2019-02-14 13:27:13 +01:00
Ines Montani	106d95b01a	Fix typo	2019-02-14 12:26:56 +01:00
Ines Montani	11d6b874db	Update stop_words.py	2019-02-14 12:25:19 +01:00
Ines Montani	60c2a3bb65	Also raise original error message in util.get_lang_class. Otherwise, the true error that happens within a Language subclass is swallowed, because if it's imported lazily like that, it'll always be an ImportError	2019-02-13 16:52:25 +01:00
Ines Montani	4d2438f985	Tidy up and auto-format	2019-02-13 15:29:08 +01:00
Ines Montani	fbf9f1edf1	Also raise error in Span.__reduce__	2019-02-13 13:22:05 +01:00
Matthew Honnibal	1831e1423d	Set version to v2.1.0a7.dev4	2019-02-13 23:08:40 +11:00
Matthew Honnibal	bed956c698	Drop regex dependency	2019-02-13 23:08:22 +11:00
Matthew Honnibal	63dc4234a3	Set version to v2.1.0a7.dev3	2019-02-13 22:53:10 +11:00
Matthew Honnibal	b7ea39564f	Set version to v2.1.0a7.dev2	2019-02-13 22:52:43 +11:00
Ines Montani	2d0c3c73f4	Raise better error if token is pickled (resolves #2833) (#3267)	2019-02-13 11:27:04 +01:00
Ines Montani	2f45bd94c0	Auto-formatting	2019-02-12 18:30:11 +01:00
Ines Montani	0184a95340	Merge branch 'master' into develop	2019-02-12 18:29:24 +01:00
Akhilesh	a78db10941	Add Kannada support (#3264) * Add Kannada support * Add a few more stop words * Add support for Kannada language	2019-02-12 18:28:39 +01:00
Ines Montani	b589b945db	Fix PhraseMatcher pickling and length (resolves #3248) (#3252)	2019-02-12 18:27:54 +01:00
Ines Montani	5dd39d8697	Update universe.json	2019-02-12 18:05:51 +01:00
Abhijit Balaji	75a40f56fc	Added spacy-langdetect to universe.json (#3266)	2019-02-12 18:04:38 +01:00
Ines Montani	483dddc9bc	💫 Add token match pattern validation via JSON schemas (#3244) * Add custom MatchPatternError * Improve validators and add validation option to Matcher * Adjust formatting * Never validate in Matcher within PhraseMatcher. If we do decide to make validate default to True, the PhraseMatcher's Matcher shouldn't ever validate. Here, we create the patterns automatically anyway (and it's currently unclear whether the validation has performance impacts at a very large scale).	2019-02-13 01:47:26 +11:00
Ines Montani	ad2a514cdf	Show warning if phrase pattern Doc was overprocessed (#3255) In most cases, the PhraseMatcher will match on the verbatim token text or, as of v2.1, sometimes the lowercase text. This means that we only need a tokenized Doc, without any other attributes. If phrase patterns are created by processing large terminology lists with the full `nlp` object, this can easily make things a lot slower, because all components will be applied, even if we don't actually need the attributes they set (like part-of-speech tags and dependency labels). The warning message also suggests using nlp.make_doc or nlp.tokenizer.pipe for even faster processing. For now, the validation has to be enabled explicitly by setting validate=True.	2019-02-13 01:45:31 +11:00

10042 Commits