spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-09-23 04:26:46 +03:00

Author	SHA1	Message	Date
Adriane Boyd	09d442f5ad	Merge remote-tracking branch 'upstream/master' into feature/ud-tokenization-da	2020-03-25 09:41:52 +01:00
Adriane Boyd	cba2d1d972	Disable failing abbreviation test UD_Danish-DDT has (as far as I can tell) hallucinated periods after abbreviations, so the changes are an artifact of the corpus and not due to anything meaningful about Danish tokenization.	2020-03-25 09:39:26 +01:00
Adriane Boyd	79737adb90	Improved tokenization for UD_Norwegian-Bokmaal	2020-03-25 08:54:02 +01:00
Ines Montani	5f2afa0479	Merge pull request #5185 from adrianeboyd/bugfix/de-punctuation-style Improve German tokenizer settings style	2020-03-24 16:38:32 +01:00
Ines Montani	3fc2309c48	Merge pull request #5174 from Baciccin/master Add Ligurian language	2020-03-24 16:33:59 +01:00
Ines Montani	f434d6aaa9	Merge pull request #5190 from guerda/patch-1 Remove max_length parameter in PhraseMatcher example	2020-03-24 16:32:12 +01:00
Philip Gillißen	128acb9ee1	Update guerda.md	2020-03-24 10:42:30 +01:00
Philip Gillißen	5d067bcc5e	Add SCA for guerda	2020-03-24 10:42:10 +01:00
Philip Gillißen	f8b4407a29	Remove max_length parameter The parameter max_length is deprecated in PhraseMatcher, as stated here: https://spacy.io/api/phrasematcher#init	2020-03-24 10:22:12 +01:00
Ines Montani	494ec23adb	Merge pull request #5187 from adrianeboyd/update/azure-images Update from macOS-10.13 to macOS-10.14	2020-03-23 20:47:49 +01:00
Adriane Boyd	30d862d4d8	Update from macOS-10.13 to macOS-10.14	2020-03-23 19:52:57 +01:00
Adriane Boyd	2897a73559	Improve German tokenizer settings style	2020-03-23 19:23:47 +01:00
nlptechbook	b52e1ab677	Update universe.json A bot powered by Clarifai Predict API and spaCy. Can be found in Telegram messenger at @pic2phrase_bot	2020-03-21 11:39:15 -04:00
Baciccin	3b53617a69	Add Ligurian language	2020-03-19 21:37:01 -07:00
Ines Montani	80e7e1347e	Update universe.json [ci skip]	2020-03-17 22:21:34 +01:00
Ines Montani	eda6eff8b1	Update universe.json [ci skip]	2020-03-17 22:19:29 +01:00
Ines Montani	16e7301d34	Merge pull request #5161 from pmbaumgartner/master add gobbli to spacy-universe 🥳	2020-03-17 22:18:30 +01:00
Peter B	b04057c204	add mentions of spaCy use	2020-03-17 15:03:43 -04:00
Ines Montani	b2b01a5c8b	Update universe.json [ci skip]	2020-03-17 19:53:31 +01:00
Peter B	d2ffb406ad	add gobbli to spacy-universe 🥳	2020-03-17 08:30:29 -04:00
Ines Montani	17bd9ed84f	Merge pull request #5153 from pinealan/fix/website-docs Fix website typos and weird sentences	2020-03-16 15:03:01 +01:00
Ines Montani	2044216bd5	Merge pull request #5150 from sloev/master add spacy_syllables to universe	2020-03-16 15:02:12 +01:00
Ines Montani	fb74679559	Merge pull request #5147 from mabraham/master Fix broken link in docs	2020-03-16 14:59:52 +01:00
Ines Montani	c68f20b398	Merge pull request #5146 from adrianeboyd/bugfix/assert-docs-equal-sents Fix sents comparison in test util	2020-03-16 14:59:32 +01:00
Alan Chan	1ae01684cf	Fill in contributor agreement	2020-03-15 03:45:20 +08:00
Alan Chan	2124be100d	Tweak run-on sentence	2020-03-15 03:45:20 +08:00
Alan Chan	7c3a4ce933	Missing word in api/cli doc	2020-03-15 03:45:20 +08:00
Alan Chan	36e3532475	Remove unfinished sentence	2020-03-15 03:45:17 +08:00
nihil	9cde7eb08c	add spacy_syllables to universe + sign contributor agreement	2020-03-13 18:09:42 +01:00
Mark Abraham	a0ffa346c0	Fix broken link in docs	2020-03-13 14:07:26 +01:00
Adriane Boyd	423849f94a	Fix sents comparison in test util Due to changes to `Span` (#5005), spans from different documents are now never equal. Check `Token.is_sent_start` values instead.	2020-03-13 09:25:23 +01:00
Matthew Honnibal	26a90f011b	Set version to v2.2.4	2020-03-12 11:30:41 +01:00
Ines Montani	c669435c62	Merge pull request #5125 from renaud/patch-1 small typo in code sample	2020-03-12 11:19:12 +01:00
Ines Montani	4130fef4ec	Merge pull request #5127 from svlandeg/docs/empty-doc is_XXX is True if doc is empty	2020-03-12 11:18:10 +01:00
Ines Montani	3497b2973d	Merge pull request #5130 from merrcury/patch-1 DOC : Update LICENSE Year	2020-03-12 11:17:38 +01:00
Himanshu Garg	27d1300bdb	Create merrcury.md	2020-03-10 15:11:07 +05:30
Himanshu Garg	ba47d5a5cb	Update LICENSE Year	2020-03-10 15:03:29 +05:30
svlandeg	c4d030dbf6	remove accidental commit	2020-03-09 18:10:54 +01:00
svlandeg	1724a4f75b	additional information if doc is empty	2020-03-09 18:08:18 +01:00
Renaud Richardet	eccf6b1686	small typo in code sample	2020-03-09 14:49:11 +01:00
Adriane Boyd	0c31f03ec5	Update docs [ci skip]	2020-03-09 13:41:17 +01:00
Adriane Boyd	1139247532	Revert changes to token_match priority from #4374 * Revert changes to priority of `token_match` so that it has priority over all other tokenizer patterns * Add lookahead and potentially slow lookbehind back to the default URL pattern * Expand character classes in URL pattern to improve matching around lookaheads and lookbehinds related to #4882 * Revert changes to Hungarian tokenizer * Revert (xfail) several URL tests to their status before #4374 * Update `tokenizer.explain()` and docs accordingly	2020-03-09 12:09:41 +01:00
Ines Montani	1d6aec805d	Fix formatting and update docs for v2.2.4	2020-03-09 11:17:20 +01:00
Ines Montani	5f68004264	Port over gitignore changes from develop Prevents stale files when switching branches	2020-03-09 11:05:00 +01:00
Mark Abraham	0345135167	Tokenizer to_disk and from_disk now ensure paths (#5116) * Tokenizer to_disk and from_disk now ensure strings are converted to paths Fixes #5115 * Sign contributor agreement	2020-03-08 13:25:56 +01:00
Yohei Tamura	31755630a7	fix typ (#5106)	2020-03-08 13:24:38 +01:00
adrianeboyd	9dd98a4b27	Improve Makefile (#5105) * Explicitly upgrade pip * Include spacy-lookups-data in pex	2020-03-08 13:24:19 +01:00
adrianeboyd	993758c58f	Remove unnecessary iterator in Language.pipe (#5101) Remove iterator over `raw_texts` with `iterator.tee()` in `Language.pipe` that is never consumed and consumes memory unnecessarily.	2020-03-08 13:22:25 +01:00
Ines Montani	cd79c7bd26	Merge pull request #5110 from dhpollack/dhp/fix-minor-svg-error fix typo in svg file - caused documentation build error	2020-03-06 15:32:43 +01:00
Sofie Van Landeghem	1a2b8fc264	set vector of merged entity (#5085) * merge_entities sets the vector in the vocab for the merged token * add unit test * import unicode_literals * move code to _merge function * only set vector if vocab has non-zero vectors	2020-03-06 14:45:28 +01:00


11518 Commits