spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-07-17 19:52:18 +03:00

Author	SHA1	Message	Date
Ines Montani	0729d1edca	Fix formatting	2018-09-12 15:32:08 +02:00
Ines Montani	907df53904	Add multi-threading note to Language.pipe (resolves #2582) [ci skip]	2018-09-12 15:03:30 +02:00
Ines Montani	885691a7ab	Describe converters more explicitly (see #2643)	2018-09-12 14:53:03 +02:00
Grivaz	aeba99ab0d	Introduces a bulk merge function, in order to solve issue #653 (#2696) * Fix comment * Introduce bulk merge to increase performance on many span merges * Sign contributor agreement * Implement pull request suggestions	2018-09-10 16:41:42 +02:00
tyburam	476472d181	lex_attrs for Polish language (#2750) * Signed spaCy contributor agreement * Added Polish version of English lex_attrs	2018-09-10 11:53:57 +02:00
Sainath Adapa	77139bc03c	Basic support for Telugu language (#2751)	2018-09-10 11:53:18 +02:00
Maxim Kupfer	97e2874225	added contributor agreement for mbkupfer (#2738)	2018-09-10 11:32:03 +02:00
Maxim Kupfer	cebe50b5b8	Remove ')' for clarity (#2737) Sorry, don't mean to be nitpicky, I just noticed this when going through the CLI and thought it was a quick fix. That said, if this was intentional, then please let me know.	2018-09-10 11:31:49 +02:00
Matthew Honnibal	b2cb1fc67d	Merge matcher tests	2018-09-06 01:39:53 +02:00
Suraj Krishnan Rajan	356af7b0a1	Fix tests	2018-09-06 01:39:36 +02:00
Piotr Żelasko	bdb2165bd1	Less norm computations in token similarity (#2730) * Less norm computations in token similarity * Contributor agreement	2018-09-05 21:50:23 +02:00
Aniruddha Adhikary	4530ddcc51	update bengali token rules for hyphen and digits (#2731)	2018-09-05 21:49:00 +02:00
Nathaniel J. Smith	26849874ad	When calling getoption() in conftest.py, pass a default option (#2709) * When calling getoption() in conftest.py, pass a default option This is necessary to allow testing an installed spacy by running: pytest --pyargs spacy * Add contributor agreement	2018-09-03 09:57:52 +02:00
Matthew Honnibal	4d2d7d5866	Fix new feature flags	2018-08-27 02:12:39 +02:00
Matthew Honnibal	598dbf1ce0	Fix character-based tokenization for Japanese	2018-08-27 01:51:38 +02:00
Matthew Honnibal	3763e20afc	Pass subword_features and conv_depth params	2018-08-27 01:51:15 +02:00
Matthew Honnibal	8051136d70	Support subword_features and conv_depth params in Tok2Vec	2018-08-27 01:50:48 +02:00
Matthew Honnibal	9c33d4d1df	Add more hyper-parameters to spacy ud-train * subword_features: Controls whether subword features are used in the word embeddings. True by default (specifically, prefix, suffix and word shape). Should be set to False for languages like Chinese and Japanese. * conv_depth: Depth of the convolutional layers. Defaults to 4.	2018-08-27 01:48:46 +02:00
Ines Montani	e9022f7b33	Remove docstrings for deprecated arguments (see #2703)	2018-08-26 14:23:13 +02:00
Ines Montani	559f4139e3	Add FAC to spacy.explain (resolves #2706)	2018-08-26 14:13:50 +02:00
Steve Sharp	ca747f58a4	Update _install.jade (#2688) Typo fix: "models" -> "model"	2018-08-22 13:16:04 +02:00
Matthew Honnibal	51a9efbf3b	Add draft Binder class	2018-08-22 13:12:51 +02:00
Arya Prabhudesai	db2c2b286c	Create aryaprabhudesai.md (#2681)	2018-08-20 18:56:14 +02:00
Matthew Honnibal	f0e6be689a	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2018-08-16 17:18:19 +02:00
Matthew Honnibal	5ce459d2ee	Fix error in vocab	2018-08-16 17:18:09 +02:00
Ines Montani	aeb49eb625	Update version [ci skip]	2018-08-16 16:56:02 +02:00
Ines Montani	a0eacd3293	Merge branch 'master' into develop	2018-08-16 16:55:05 +02:00
Ines Montani	c0fa9903f4	Update model directory JS [ci skip] Prevent the default release URL from being overwritten and add license type	2018-08-16 16:54:50 +02:00
Ines Montani	03f661fefb	Add Greek to models directory [ci skip]	2018-08-16 16:51:56 +02:00
Matthew Honnibal	00febda2e3	Improve alignment around quotes	2018-08-16 01:04:34 +02:00
Matthew Honnibal	66a3f2ba21	Lower-case text before alignment	2018-08-16 00:42:36 +02:00
Matthew Honnibal	595c893791	Expose noise_level option in train CLI	2018-08-16 00:41:44 +02:00
Matthew Honnibal	8365226bf3	Fix lookup of symbols in vocab.	2018-08-15 23:43:34 +02:00
Matthew Honnibal	b9f0588580	Set version to v2.1.0a1	2018-08-15 17:22:39 +02:00
Matthew Honnibal	e968016417	Note link between issues #2671 and #2675	2018-08-15 17:18:28 +02:00
Matthew Honnibal	63bdc734ba	Skip flakey test	2018-08-15 16:56:55 +02:00
Matthew Honnibal	ce512e1d47	Fix #2671: Incorrect match ID on some patterns	2018-08-15 16:19:08 +02:00
Matthew Honnibal	f12b9190f6	Xfail test for issue #2671	2018-08-15 15:55:31 +02:00
Matthew Honnibal	7cfa665ce6	Add failing test for issue 2671: Incorrect rule ID returned from matcher	2018-08-15 15:54:33 +02:00
Matthew Honnibal	1b2a5869ab	Set version to v2.1.0a2.dev0	2018-08-15 15:38:52 +02:00
Matthew Honnibal	5080760288	Add extra comment on 'add label' in parser	2018-08-15 15:37:24 +02:00
Matthew Honnibal	6e749d3c70	Skip flakey parser test	2018-08-15 15:37:04 +02:00
Ines Montani	fd9d175a53	Update live code [ci skip]	2018-08-15 15:28:48 +02:00
Matthew Honnibal	48ed1ca29d	Add branch option to push-tag script	2018-08-15 03:16:43 +02:00
Matthew Honnibal	6ea981c839	Add converter for jsonl NER data	2018-08-14 14:04:32 +02:00
Matthew Honnibal	a9fb6d5511	Fix docs2jsonl function	2018-08-14 14:03:48 +02:00
Matthew Honnibal	ea2edd1e2c	Merge branch 'feature/docs_to_json' into develop	2018-08-14 13:23:42 +02:00
Matthew Honnibal	6ec236ab08	Fix label-clobber bug in parser.begin_training() The parser.begin_training() method was rewritten in v2.1. The rewrite introduced a regression, where if you added labels prior to begin_training(), these labels were discarded. This patch fixes that.	2018-08-14 13:20:19 +02:00
Matthew Honnibal	02c5c114d0	Fix usage of deprecated freqs.txt in init-model	2018-08-14 13:19:15 +02:00
Matthew Honnibal	2a5a61683e	Add function to get train format from Doc objects Our JSON training format is annoying to work with, and we've wanted to retire it for some time. In the meantime, we can at least add some missing functions to make it easier to live with. This patch adds a function that generates the JSON format from a list of Doc objects, one per paragraph. This should be a convenient way to handle a lot of data conversions: whatever format you have the source information in, you can use it to set up a Doc object. This approach should offer better future-proofing as well. Hopefully, we can steadily rewrite code that is sensitive to the current data format, so that it instead goes through this function. Then when we change the data format, we won't have such a problem.	2018-08-14 13:13:10 +02:00
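
The idea in commit 2a5a61683e (build the JSON training format from already-annotated documents, one per paragraph) can be illustrated with a minimal sketch. This is not the function added by the patch: `Tok` and `paragraphs_to_train_json` are hypothetical names, `Tok` is a plain stand-in for a spaCy token, and the output field names (`id`, `paragraphs`, `raw`, `sentences`, `tokens`, `orth`, `tag`, `dep`, `head`) are an assumption about the layout of the v2 JSON training format.

```python
import json
from typing import Dict, List, NamedTuple


class Tok(NamedTuple):
    """Hypothetical stand-in for an annotated token; not spaCy's Token class."""
    text: str
    tag: str
    dep: str
    head: int         # absolute index of the syntactic head in the paragraph
    sent_start: bool  # True if this token begins a sentence


def paragraphs_to_train_json(doc_id: int, paragraphs: List[List[Tok]]) -> Dict:
    """Build one training entry from a list of token lists, one per paragraph.

    The field names are assumed, not taken from the patch itself.
    """
    entry = {"id": doc_id, "paragraphs": []}
    for toks in paragraphs:
        para = {"raw": " ".join(t.text for t in toks), "sentences": []}
        for i, t in enumerate(toks):
            # Open a new sentence at each sentence start (and for the first token).
            if t.sent_start or not para["sentences"]:
                para["sentences"].append({"tokens": []})
            para["sentences"][-1]["tokens"].append({
                "id": i,
                "orth": t.text,
                "tag": t.tag,
                "dep": t.dep,
                "head": t.head - i,  # stored as an offset relative to the token
            })
        entry["paragraphs"].append(para)
    return entry


example = paragraphs_to_train_json(0, [[
    Tok("Dogs", "NNS", "nsubj", 1, True),
    Tok("bark", "VBP", "ROOT", 1, False),
]])
print(json.dumps(example, indent=2))
```

Routing every conversion through one function like this means that a later change to the data format only requires updating that function, which is the future-proofing argument the commit message makes.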


9450 Commits