spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-08-23 05:24:56 +03:00

Author	SHA1	Message	Date
svlandeg	6fea5fa4bd	attempt to fix cases with weird spaces	2020-06-16 11:52:29 +02:00
svlandeg	0702a1d3fb	fix test for misaligned	2020-06-15 23:10:47 +02:00
svlandeg	a28f8f369e	Fix many-to-one IOB codes	2020-06-15 23:06:22 +02:00
svlandeg	12886b787b	fixing NER one-to-many alignment	2020-06-15 22:44:17 +02:00
Matthew Honnibal	7ff447c5a0	Set version to v2.3.0	2020-06-15 18:22:25 +02:00
Matthew Honnibal	a0bf73a5dd	Merge branch 'whatif/arrow' of https://github.com/explosion/spaCy into whatif/arrow	2020-06-15 18:16:01 +02:00
Matthew Honnibal	c66f93299e	Remove TokenAnnotation code from nonproj	2020-06-15 18:14:47 +02:00
Matthew Honnibal	c95494739c	Fix import	2020-06-15 18:11:10 +02:00
Matthew Honnibal	8f978f2031	Fix import	2020-06-15 18:10:47 +02:00
Matthew Honnibal	95de7efaad	Draft create_gold_state for arc_eager oracle	2020-06-15 18:10:19 +02:00
svlandeg	68986a252e	additional tests for new get_aligned function	2020-06-15 17:42:40 +02:00
svlandeg	41d29983a7	start testing get_aligned	2020-06-15 17:16:01 +02:00
svlandeg	fd5f199feb	fixing language and scoring tests	2020-06-15 15:02:05 +02:00
Adriane Boyd	0d8405aafa	Updates to docstrings (#5589)	2020-06-15 14:58:36 +02:00
Adriane Boyd	e867e9fa8f	Fix and add warnings related to spacy-lookups-data (#5588) * Fix warning message for lemmatization tables * Add a warning when the `lexeme_norm` table is empty. (Given the relatively lang-specific loading for `Lookups`, it seemed like too much overhead to dynamically extract the list of languages, so for now it's hard-coded.)	2020-06-15 14:58:29 +02:00
Arvind Srinivasan	f698007907	Added Tamil Example Sentences (#5583) * Added Examples for Tamil Sentences #### Description This PR add example sentences for the Tamil language which were missing as per issue #1107 #### Type of Change This is an enhancement. * Accepting spaCy Contributor Agreement * Signed on my behalf as an individual	2020-06-15 14:58:21 +02:00
Adriane Boyd	c94f7d0e75	Updates to docstrings (#5589)	2020-06-15 14:56:51 +02:00
Adriane Boyd	c482f20778	Fix and add warnings related to spacy-lookups-data (#5588) * Fix warning message for lemmatization tables * Add a warning when the `lexeme_norm` table is empty. (Given the relatively lang-specific loading for `Lookups`, it seemed like too much overhead to dynamically extract the list of languages, so for now it's hard-coded.)	2020-06-15 14:56:04 +02:00
svlandeg	b4d914ec77	fix error catching	2020-06-15 12:56:32 +02:00
svlandeg	b9c9cbb2cd	informative error when calling to_array with wrong field	2020-06-15 11:53:31 +02:00
svlandeg	ff231e1cdd	fix merge conflict	2020-06-15 09:04:19 +02:00
svlandeg	a48553c1ed	fix error numbers	2020-06-15 08:51:31 +02:00
Matthew Honnibal	3c0fc10dc4	Remove beam for now (maybe) Remove beam_utils Update setup.py Remove beam	2020-06-14 19:53:29 +02:00
Matthew Honnibal	98ca14f577	Remove GoldParse WIP on removing goldparse Get ArcEager compiling after GoldParse excise Update setup.py Get spacy.syntax compiling after removing GoldParse Rename NewExample -> Example and clean up Clean html files Start updating tests Update Morphologizer	2020-06-14 19:53:30 +02:00
Matthew Honnibal	d53723aa4f	Merge from whatif/arrow	2020-06-14 17:43:59 +02:00
Matthew Honnibal	380cce9d8b	Update errors	2020-06-14 17:40:05 +02:00
Matthew Honnibal	706e652820	Merge from develop	2020-06-14 17:35:01 +02:00
Matthew Honnibal	9296d71a54	More GoldParse excise	2020-06-14 17:26:54 +02:00
Matthew Honnibal	60d4e5a9e0	WIP on updating transition-system	2020-06-14 17:22:14 +02:00
Matthew Honnibal	7d65615625	WIP start excising GoldParse	2020-06-14 17:11:41 +02:00
Matthew Honnibal	4362ec7084	Hack Language.evaluate	2020-06-13 23:37:42 +02:00
Matthew Honnibal	7de997c0a5	Update test	2020-06-13 23:11:45 +02:00
Matthew Honnibal	8f941ef527	Update GoldParse	2020-06-13 23:11:29 +02:00
Matthew Honnibal	3a0bbcfb4c	Add biluo_tags_from_doc function	2020-06-13 23:10:54 +02:00
Matthew Honnibal	caa7508725	Draft missing NewExample stuff	2020-06-13 23:10:21 +02:00
Matthew Honnibal	3eb8f3867e	Update test	2020-06-13 23:05:16 +02:00
Arvind Srinivasan	aa5b40fa64	Added Tamil Example Sentences (#5583) * Added Examples for Tamil Sentences #### Description This PR add example sentences for the Tamil language which were missing as per issue #1107 #### Type of Change This is an enhancement. * Accepting spaCy Contributor Agreement * Signed on my behalf as an individual	2020-06-13 15:56:26 +02:00
Matthew Honnibal	5564314d32	Suggest approach for GoldParse	2020-06-13 15:43:35 +02:00
Matthew Honnibal	b078b05ecd	Handle various data better in NewExample	2020-06-13 15:30:12 +02:00
svlandeg	face0de74f	fix MORPH conversion + enable unit test	2020-06-12 16:29:09 +02:00
svlandeg	a5ee082da1	cats bugfix	2020-06-12 15:49:38 +02:00
svlandeg	880dccf93e	entities on doc_annotation, parse links and check their offsets against the entities. unit test works	2020-06-12 15:47:20 +02:00
theudas	3f5e2f9d99	Added Parameter to NEL to take n sentences into account (#5548) * added setting for neighbour sentence in NEL * added spaCy contributor agreement * added multi sentence also for training * made the try-except block smaller	2020-06-12 15:15:03 +02:00
adrianeboyd	4724fa4cf4	Expand Japanese requirements warning (#5572) Include explicit install instructions in Japanese requirements warning.	2020-06-12 15:14:55 +02:00
adrianeboyd	44967a3f9c	Update pytest conf for sudachipy with Japanese (#5574)	2020-06-12 15:14:47 +02:00
svlandeg	3aed177a35	fix ENT_IOB conversion and enable unit test	2020-06-12 11:30:24 +02:00
Matthew Honnibal	a1c5b694be	Small fixes to train defaults	2020-06-12 02:22:13 +02:00
theudas	fa46e0bef2	Added Parameter to NEL to take n sentences into account (#5548) * added setting for neighbour sentence in NEL * added spaCy contributor agreement * added multi sentence also for training * made the try-except block smaller	2020-06-12 02:03:23 +02:00
Sofie Van Landeghem	c0f4a1e43b	train is from-config by default (#5575) * verbose and tag_map options * adding init_tok2vec option and only changing the tok2vec that is specified * adding omit_extra_lookups and verifying textcat config * wip * pretrain bugfix * add replace and resume options * train_textcat fix * raw text functionality * improve UX when KeyError or when input data can't be parsed * avoid unnecessary access to goldparse in TextCat pipe * save performance information in nlp.meta * add noise_level to config * move nn_parser's defaults to config file * multitask in config - doesn't work yet * scorer offering both F and AUC options, need to be specified in config * add textcat verification code from old train script * small fixes to config files * clean up * set default config for ner/parser to allow create_pipe to work as before * two more test fixes * small fixes * cleanup * fix NER pickling + additional unit test * create_pipe as before	2020-06-12 02:02:07 +02:00
svlandeg	6a67a11682	adding tests for new example class (some still failing - WIP)	2020-06-11 17:43:40 +02:00

12012 Commits