spaCy

mirror of https://github.com/explosion/spaCy.git synced 2024-12-27 10:26:35 +03:00

Author	SHA1	Message	Date
Ines Montani	7946fd84bb	Merge pull request #6200 from adrianeboyd/bugfix/vocab-disk-lookups-vectors Always serialize lookups and vectors to disk	2020-10-05 15:15:25 +02:00
Ines Montani	8171e28b20	Remove logging [ci skip] This would be fired on each example, which is wrong	2020-10-05 15:09:52 +02:00
svlandeg	251b3eb4e5	add initialize method for entity_ruler	2020-10-05 14:59:13 +02:00
Sofie Van Landeghem	f4f49f5877	update blis (#6198) * allow higher blis version * fix typo * bump to 3.0.0a34 * fix pins in other files	2020-10-05 14:58:56 +02:00
Adriane Boyd	5d19dfc9d3	Update Chinese tokenizer for spacy-pkuseg fork	2020-10-05 14:21:53 +02:00
Matthew Honnibal	6a9d14e35a	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2020-10-05 14:17:41 +02:00
Matthew Honnibal	d2b9aafb8c	Fix augmenter	2020-10-05 14:14:49 +02:00
Ines Montani	6260fa3c10	Merge pull request #6201 from svlandeg/fix/error_nr	2020-10-05 14:00:57 +02:00
Ines Montani	6958510bda	Include spaCy version check in project CLI	2020-10-05 13:53:07 +02:00
Ines Montani	20f2a17a09	Merge test_misc and test_util	2020-10-05 13:45:57 +02:00
svlandeg	fd2d48556c	fix E902 and E903 numbering	2020-10-05 13:43:32 +02:00
Ines Montani	1c641e41c3	Remove unused import [ci skip]	2020-10-05 11:50:11 +02:00
Adriane Boyd	03cfb2d2f4	Always serialize lookups and vectors to disk	2020-10-05 09:40:20 +02:00
Adriane Boyd	b0b93854cb	Update ru/uk lemmatizers for new nlp.initialize	2020-10-05 09:27:16 +02:00
Ines Montani	549758f67d	Adjust test for now	2020-10-04 23:16:09 +02:00
Ines Montani	4b15ff7504	Increment version [ci skip]	2020-10-04 22:47:04 +02:00
Ines Montani	f1d1f78636	Make warning debug log [ci skip]	2020-10-04 22:44:21 +02:00
Ines Montani	3c36a57e84	Update data augmenters (#6196) * Draft lower-case augmenter * Make warning a debug log * Update lowercase augmenter, docs and tests Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>	2020-10-04 17:46:29 +02:00
Ines Montani	d38dc466c5	Adjust error [ci skip]	2020-10-04 15:26:01 +02:00
Ines Montani	496228771d	Merge pull request #6194 from explosion/master-tmp	2020-10-04 15:25:41 +02:00
Ines Montani	0307a228c8	Merge pull request #6193 from explosion/fix/adjust-pipe-init Adjust [initialize.components] on Language.remove_pipe and Language.rename_pipe	2020-10-04 15:20:54 +02:00
Ines Montani	59deeb7da6	Merge branch 'develop' into master-tmp	2020-10-04 14:52:20 +02:00
Ines Montani	43d7652635	Merge pull request #6192 from explosion/feature/init-attr-ruler	2020-10-04 14:46:37 +02:00
Ines Montani	8f018e47f8	Adjust [initialize.components] on Language.remove_pipe and Language.rename_pipe	2020-10-04 14:43:45 +02:00
Matthew Honnibal	84ae197dd6	Fix logger	2020-10-04 14:16:53 +02:00
Ines Montani	11347f34da	Tidy up, tests and docs	2020-10-04 13:54:05 +02:00
Matthew Honnibal	96b636c2d3	Update attribute ruler	2020-10-04 13:08:21 +02:00
Ines Montani	bcd52e5486	Tidy up errors and warnings	2020-10-04 11:16:31 +02:00
Ines Montani	ff914f4e6f	Lazy-load xx	2020-10-04 11:10:26 +02:00
Ines Montani	d3b3663942	Adjust error message and add test	2020-10-04 10:11:27 +02:00
Ines Montani	2110e8f86d	Auto-format	2020-10-04 10:06:49 +02:00
Ines Montani	cc08c88a89	Merge pull request #6187 from svlandeg/fix/begin_training_pipe	2020-10-04 10:01:02 +02:00
svlandeg	3f657ed3a1	implement warning in __init_subclass__ instead	2020-10-03 22:34:10 +02:00
Matthew Honnibal	3b2a78720c	Upd morphologizer	2020-10-03 19:35:19 +02:00
Matthew Honnibal	835070cedc	Upd test	2020-10-03 19:35:10 +02:00
Matthew Honnibal	70b9de8e58	Set version to v3.0.0a32	2020-10-03 19:26:52 +02:00
Matthew Honnibal	85ede32680	Format	2020-10-03 19:26:23 +02:00
Matthew Honnibal	b305f2ff5a	Fix loggers	2020-10-03 19:26:10 +02:00
Matthew Honnibal	4fccd2ceaf	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2020-10-03 19:13:55 +02:00
Matthew Honnibal	8ea8b7d940	Support loading labels in morphologizer	2020-10-03 19:13:42 +02:00
Ines Montani	c2401fca41	Add tests for Pipe.label_data	2020-10-03 19:12:46 +02:00
Ines Montani	80603f0fa5	Make SentenceRecognizer.label_data return None Overwrite the method from the base class (Tagger) but don't export anything in "init labels"	2020-10-03 18:54:09 +02:00
Ines Montani	d6c967401f	Increment version	2020-10-03 17:20:47 +02:00
Ines Montani	3bc3c05fcc	Tidy up and auto-format	2020-10-03 17:20:18 +02:00
Ines Montani	7c4ab7e82c	Fix Lemmatizer.get_lookups_config	2020-10-03 17:16:10 +02:00
Ines Montani	dd542ec6a4	Fix label initialization of textcat component (#6190)	2020-10-03 17:07:38 +02:00
Ines Montani	989a96308f	Tidy up, auto-format, types	2020-10-03 16:31:58 +02:00
Matthew Honnibal	7b127f307e	Set version to v3.0.0a30	2020-10-03 16:06:42 +02:00
Matthew Honnibal	db419f6b2f	Improve control of training progress and logging (#6184) * Make logging and progress easier to control * Update docs * Cleanup errors * Fix ConfigValidationError * Pass stdout/stderr, not wasabi.Printer * Fix type * Upd logging example * Fix logger example * Fix type	2020-10-03 14:57:46 +02:00
Ines Montani	ae15c9de79	Raise error from caught KeyError to preserve traceback	2020-10-03 11:43:56 +02:00
Ines Montani	f758804401	Save one line of code	2020-10-03 11:41:28 +02:00
Stanislav Schmidt	3589a64d44	Change type of texts argument in pipe to iterable (#6186) * Change type of texts argument in pipe to iterable * Add contributor agreement	2020-10-02 21:00:11 +02:00
svlandeg	02247cccaf	Merge remote-tracking branch 'upstream/develop' into feature/small-fixes	2020-10-02 20:48:11 +02:00
svlandeg	fb48de349c	bwd compat for pipe.begin_training	2020-10-02 20:31:14 +02:00
Matthew Honnibal	6965cdf16d	Fix comment	2020-10-02 17:26:21 +02:00
Ines Montani	3cf10a0729	Merge pull request #6183 from adrianeboyd/feature/quickstart-morphologizer Add morphologizer to quickstart template	2020-10-02 17:08:01 +02:00
Adriane Boyd	62ccd5c4df	Relax model meta performance schema (#6185) Allow more embedded per_x in `ModelMetaSchema`	2020-10-02 16:37:21 +02:00
Sofie Van Landeghem	09dcb75076	small UX fix for DocBin (#6167) * add informative warning when messing up store_user_data DocBin flags * add informative warning when messing up store_user_data DocBin flags * cleanup test * rename to patterns_path	2020-10-02 15:43:32 +02:00
Ines Montani	f0b30aedad	Make lemmatizers use initialize logic (#6182) * Make lemmatizer use initialize logic and tidy up * Fix typo * Raise for uninitialized tables	2020-10-02 15:42:36 +02:00
Adriane Boyd	22158dc24a	Add morphologizer to quickstart template	2020-10-02 15:06:16 +02:00
Ines Montani	d2aa662ab2	Merge pull request #6179 from adrianeboyd/feature/token-morph-refactor-2 [ci skip]	2020-10-02 12:10:27 +02:00
Ines Montani	c41a4332e4	Add test for custom data augmentation	2020-10-02 11:37:56 +02:00
svlandeg	acc391c2a8	remove redundant str() call	2020-10-02 11:05:59 +02:00
Ines Montani	3856048437	Merge pull request #6178 from explosion/feature/file-readers Integrate file readers via srsly, update orth_variants loading	2020-10-02 10:26:09 +02:00
Adriane Boyd	f83dfe62da	Fix test	2020-10-02 10:17:26 +02:00
Adriane Boyd	65dfaa4f4b	Also accept MorphAnalysis in set_morph	2020-10-02 08:33:43 +02:00
Adriane Boyd	77e08c398f	Switch reset value for set_morph to None	2020-10-02 08:25:15 +02:00
Ines Montani	568768643e	Increment version [ci skip]	2020-10-02 01:50:13 +02:00
Ines Montani	01c1538c72	Integrate file readers	2020-10-02 01:36:06 +02:00
Ines Montani	af282ae732	Fix import	2020-10-02 01:12:34 +02:00
Ines Montani	e59ecb12c0	Auto-format	2020-10-02 01:12:30 +02:00
Matthew Honnibal	75a1569908	Merge	2020-10-01 23:07:53 +02:00
Matthew Honnibal	300e5a9928	Avoid relying on NORM in default v3 models (#6176) * Allow CharacterEmbed to specify feature * Default to LOWER in character embed * Update tok2vec * Use LOWER, not NORM	2020-10-01 23:05:55 +02:00
Ines Montani	5762876dcc	Update default config [ci skip]	2020-10-01 22:27:37 +02:00
Adriane Boyd	86c3ec9c2b	Refactor Token morph setting (#6175) * Refactor Token morph setting * Remove `Token.morph_` * Add `Token.set_morph()` * `0` resets `token.c.morph` to unset * Any other values are passed to `Morphology.add` * Add token.morph setter to set from MorphAnalysis	2020-10-01 22:21:46 +02:00
Matthew Honnibal	b854bca15c	Default to LOWER in character embed	2020-10-01 22:17:58 +02:00
Matthew Honnibal	684a77870b	Allow CharacterEmbed to specify feature	2020-10-01 22:17:26 +02:00
Ines Montani	da30701cd1	Increment version [ci skip]	2020-10-01 21:58:11 +02:00
Ines Montani	d48ddd6c9a	Remove default initialize lookups	2020-10-01 21:54:33 +02:00
Ines Montani	1700c8541e	Increment version [ci skip]	2020-10-01 17:57:16 +02:00
Ines Montani	f2627157c8	Update docs [ci skip]	2020-10-01 17:38:17 +02:00
Ines Montani	7f68f4bd92	Hide jsonl_loc on init vectors and tidy up [ci skip]	2020-10-01 16:44:17 +02:00
Adriane Boyd	27cbffff1b	Minor edit to CoNLL-U converter (#6172) This doesn't make a difference given how the `merged_morph` values override the `morph` values for all the final docs, but could have led to unexpected bugs in the future if the converter is modified.	2020-10-01 16:23:42 +02:00
Sofie Van Landeghem	a22215f427	Add FeatureExtractor from Thinc (#6170) * move featureextractor from Thinc * Update website/docs/api/architectures.md Co-authored-by: Ines Montani <ines@ines.io> * Update website/docs/api/architectures.md Co-authored-by: Ines Montani <ines@ines.io> Co-authored-by: Ines Montani <ines@ines.io>	2020-10-01 16:22:48 +02:00
Adriane Boyd	73538782a0	Switch Doc.__init__(ents=) to IOB tags (#6173) * Switch Doc.__init__(ents=) to IOB tags * Fix check for "-" * Allow "" or None as missing IOB tag	2020-10-01 16:22:18 +02:00
Adriane Boyd	df98d3ef9f	Update import from collections.abc (#6174)	2020-10-01 16:21:49 +02:00
Yohei Tamura	3243ddac8f	Fix/span.sent (#6083) * add fail test * fix test * fix span.sent * Remove incorrect implicit check Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2020-10-01 14:01:52 +02:00
Ines Montani	0a8a124a6e	Update docs [ci skip]	2020-10-01 12:15:53 +02:00
Ines Montani	44160cd52f	Tidy up [ci skip]	2020-10-01 10:41:19 +02:00
Ines Montani	381258b75b	Merge pull request #6165 from explosion/feature/update-tokenizers-initialize	2020-10-01 09:49:47 +02:00
svlandeg	6787e56315	print debugging warning before raising error if model not properly initialized	2020-10-01 09:21:00 +02:00
svlandeg	5121972930	add types of Tok2Vec embedding layers	2020-10-01 09:20:09 +02:00
Ines Montani	4b6afd3611	Remove English [initialize] default block for now to get tests to pass	2020-09-30 23:49:29 +02:00
Ines Montani	6f29f68f69	Update errors and make Tokenizer.initialize args less strict	2020-09-30 23:48:47 +02:00
Ines Montani	a103ab5f1a	Update augmenter lookups and docs	2020-09-30 23:03:47 +02:00
Matthew Honnibal	5128298964	Add missing augmenter	2020-09-30 20:18:45 +02:00
Matthew Honnibal	59294e91aa	Restore the 'jsonl' arg for init vectors The lexemes.jsonl file is still used in our English vectors, and it may be required by users as well. I think it's worth supporting the option.	2020-09-30 19:06:50 +02:00
Matthew Honnibal	c379a4274a	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2020-09-30 16:52:42 +02:00
Matthew Honnibal	e58dca3028	Add read_labels	2020-09-30 16:52:27 +02:00
Ines Montani	23c63eefaf	Tidy up env vars [ci skip]	2020-09-30 15:15:11 +02:00

8142 Commits