spaCy

mirror of https://github.com/explosion/spaCy.git synced 2024-11-15 14:17:58 +03:00

Author	SHA1	Message	Date
Matthew Honnibal	ee56c6a4e1	Implement character-based pretraining objective	2019-10-19 11:42:38 +02:00
Matthew Honnibal	36de9bf72a	Add more spacy pretrain options	2019-10-18 17:24:13 +02:00
Matthew Honnibal	f3e2aaea1e	Fix GPU selection in spacy train	2019-10-18 17:23:55 +02:00
Matthew Honnibal	49c0adc706	Add character-based bilstm tok2vec	2019-10-18 17:23:37 +02:00
Matthew Honnibal	727ede6599	Make character copy non-blocking	2019-10-18 17:23:19 +02:00
Matthew Honnibal	4da1c1c211	Try to make cuda call non-blocking	2019-10-18 17:22:16 +02:00
Matthew Honnibal	b2e8f37965	Make cuda streams non-blocking by default	2019-10-18 17:21:57 +02:00
Matthew Honnibal	ca0759b325	Pass config better in nn_parser	2019-10-17 21:10:56 +02:00
Matthew Honnibal	e737750a02	Fix bilstm_depth default in pretrain command	2019-10-17 21:10:08 +02:00
Matthew Honnibal	6aa1c53b1b	Call resume_training for base model in train CLI	2019-10-17 21:09:41 +02:00
Matthew Honnibal	3f26c50a4d	Refactor some of tok2vec	2019-10-17 17:58:00 +02:00
Matthew Honnibal	e63f28079a	Try 3 NER features	2019-10-07 16:51:03 +02:00
Matthew Honnibal	2d55ccdd27	Support option of three NER features	2019-10-07 16:50:44 +02:00
Matthew Honnibal	c8857181f8	Fix get labels for textcat	2019-10-07 16:50:15 +02:00
Matthew Honnibal	a6a2ff217f	Fix char_embed for gpu	2019-10-07 16:49:32 +02:00
Matthew Honnibal	f4040a98f0	Fix passing of cats in gold.pyx	2019-10-07 16:49:00 +02:00
Matthew Honnibal	a132da1558	Fix gold-preproc training mode	2019-10-07 02:07:03 +02:00
Matthew Honnibal	63ff233ba2	Enable GPU in pytorch in use_gpu function	2019-10-06 19:24:21 +02:00
Matthew Honnibal	9dbaea1ab4	Use cosine loss in Cloze multitask	2019-10-06 19:23:46 +02:00
Matthew Honnibal	157d3d769b	Support bilstm_depth arg in spacy pretrain	2019-10-06 19:22:26 +02:00
Matthew Honnibal	615ebe584f	Add option to ignore zero vectors in get_cossim_loss	2019-10-06 19:20:54 +02:00
adrianeboyd	cbc2cee2c8	Improve URL_PATTERN and handling in tokenizer (#4374) * Move prefix and suffix detection for URL_PATTERN Move prefix and suffix detection for `URL_PATTERN` into the tokenizer. Remove associated lookahead and lookbehind from `URL_PATTERN`. Fix tokenization for Hungarian given new modified handling of prefixes and suffixes. * Match a wider range of URI schemes	2019-10-05 13:00:09 +02:00
Ines Montani	fec9433044	Make PhraseMatcher.vocab consistent with Matcher.vocab (closes #4373)	2019-10-04 12:18:41 +02:00
Matthew Honnibal	37ef874d8b	Set version to v2.2.1	2019-10-03 14:50:39 +02:00
Sofie Van Landeghem	4e7259c6cf	Bugfix initializing DocBin with attributes (#4368) * docbin init fix + documentation fix + unit tests * newline * try with zlib instead of gzip (python 2 incompatibilities)	2019-10-03 14:48:45 +02:00
Ben Taylor	1db79a33cb	most_similar() return the k most similar vectors (#4364) * most_similar return n-most similar vectors * updated most_similar comment * add bintay contributor agreement * sign bintay contributor agreement * fix most_similar documentation typo * fixed error in prune_vectors * updated prune_vectors test	2019-10-03 14:09:44 +02:00
Matthew Honnibal	2eb31012e7	Set version to v2.2.0	2019-10-02 14:40:06 +02:00
Matthew Honnibal	796072e560	Set version to v2.2.0.dev19	2019-10-02 12:51:29 +02:00
Sofie Van Landeghem	9d3ce7cba2	Ensure training doesn't crash with empty batches (#4360) * unit test for previously resolved unflatten issue * prevent batch of empty docs to cause problems	2019-10-02 12:50:47 +02:00
adrianeboyd	dda86118bd	Update Ukrainian lemmatizer with new lookups (#4359) * Update Ukrainian lemmatizer with new lookups * Add missing import Co-authored-by: Ines Montani <ines@ines.io>	2019-10-02 12:04:06 +02:00
Ines Montani	b6670bf0c2	Use consistent spelling	2019-10-02 10:37:39 +02:00
Matthew Honnibal	38b6e69389	Merge branch 'master' of https://github.com/explosion/spaCy	2019-10-01 22:28:25 +02:00
Matthew Honnibal	d4b63bb6dd	Set version to v2.2.0	2019-10-01 22:28:13 +02:00
Ines Montani	475e3188ce	Add docs on filtering overlapping spans for merging (resolves #4352) [ci skip]	2019-10-01 21:59:50 +02:00
Matthew Honnibal	64a9577d43	Set version to v2.2.0.dev17	2019-10-01 21:36:59 +02:00
Ines Montani	cf65a80f36	Refactor lemmatizer and data table integration (#4353) * Move test * Allow default in Lookups.get_table * Start with blank tables in Lookups.from_bytes * Refactor lemmatizer to hold instance of Lookups * Get lookups table within the lemmatization methods to make sure it references the correct table (even if the table was replaced or modified, e.g. when loading a model from disk) * Deprecate other arguments on Lemmatizer.__init__ and expect Lookups for consistency * Remove old and unsupported Lemmatizer.load classmethod * Refactor language-specific lemmatizers to inherit as much as possible from base class and override only what they need * Update tests and docs * Fix more tests * Fix lemmatizer * Upgrade pytest to try and fix weird CI errors * Try pytest 4.6.5	2019-10-01 21:36:03 +02:00
Ines Montani	3297a19545	Warn in Tagger.begin_training if no lemma tables are available (#4351)	2019-10-01 15:13:55 +02:00
Matthew Honnibal	2fb05482dd	Set version to v2.2.0	2019-10-01 03:50:13 +02:00
Matthew Honnibal	dc22ec0aad	Set version to v2.2.0.dev17	2019-10-01 03:26:53 +02:00
Matthew Honnibal	aedfba867a	Set version to v2.2.0.dev16	2019-10-01 00:31:00 +02:00
Ines Montani	e0cf4796a5	Move lookup tables out of the core library (#4346) * Add default to util.get_entry_point * Tidy up entry points * Read lookups from entry points * Remove lookup tables and related tests * Add lookups install option * Remove lemmatizer tests * Remove logic to process language data files * Update setup.cfg	2019-10-01 00:01:27 +02:00
Rahul Soni	ed620daa5c	Fix example sentences in Hindi for grammatical errors (#4343) * Fix grammar for hindi * Fix grammar for hindi * Submit contributor agreement	2019-09-30 23:32:49 +02:00
Ines Montani	ba186299e1	Tidy up and modernize setup and config (#4344) * Tidy up and modernize setup and config * Update setup.cfg * Re-add pyproject.toml * Delete .flake8 * Move static meta from about to setup.cfg * Update setup.cfg Co-Authored-By: Matthew Honnibal <honnibal+gh@gmail.com>	2019-09-30 20:10:55 +02:00
Ines Montani	4f905ac9e6	Add test for ASCII filenames (#4345)	2019-09-30 18:45:30 +02:00
Matthew Honnibal	b5c775dd42	Set version to v2.2.0	2019-09-30 12:47:08 +02:00
Ines Montani	f7d1736241	Skip duplicate spans in Doc.retokenize (#4339)	2019-09-30 12:43:48 +02:00
Ines Montani	0226b3bf0e	Fix test imports	2019-09-29 17:34:56 +02:00
Ines Montani	3d8fd4b461	Revert #4334	2019-09-29 17:32:12 +02:00
adrianeboyd	ba5595c764	Fix PhraseMatcher to remember attr on pickling (#4336) * Fix PhraseMatcher to remember attr on pickling * Check for attr as int or long	2019-09-29 17:12:33 +02:00
Ines Montani	75514b5970	Fix Korean	2019-09-29 17:10:56 +02:00

6473 Commits