spaCy

mirror of https://github.com/explosion/spaCy.git synced 2024-11-18 07:37:03 +03:00

Author	SHA1	Message	Date
Sofie Van Landeghem	492d1ec5de	Prevent alignment when texts don't match (#5867) * remove empty gold.pyx * add alignment unit test (to be used in docs) * ensure that Alignment is only used on equal texts * additional test using example.alignment * formatting Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>	2020-08-04 16:29:18 +02:00
Matthew Honnibal	ecb3c4e8f4	Create corpus iterator and batcher from registry during training (#5865) * Move batchers into their own module (and registry) * Update CLI * Update Corpus and batcher * Update tests * Update one config * Merge 'evaluation' block back under [training] * Import batchers in gold __init__ * Fix batchers * Update config * Update schema * Update util * Don't assume train and dev are actually paths * Update onto-joint config * Fix missing import * Format * Format * Update spacy/gold/corpus.py Co-authored-by: Ines Montani <ines@ines.io> * Fix name * Update default config * Fix get_length option in batchers * Update test * Add comment * Pass path into Corpus * Update docstring * Update schema and configs * Update config * Fix test * Fix paths * Fix print * Fix create_train_batches * [training.read_train] -> [training.train_corpus] * Update onto-joint config Co-authored-by: Ines Montani <ines@ines.io>	2020-08-04 15:09:37 +02:00
Sofie Van Landeghem	82347110f5	Default empty KB in EL component (#5872) * EL field documentation * documentation consistent with docs * default empty KB, initialize vocab separately * formatting * add test for changing the default entity vector length * update comment	2020-08-04 14:34:09 +02:00
Adriane Boyd	b7e3018d97	Recalculate alignment if tokenization differs (#5868) * Recalculate alignment if tokenization differs * Refactor cached alignment data	2020-08-04 14:31:32 +02:00
Adriane Boyd	c62fd878a3	Allow Doc.char_span to snap to token boundaries (#5849) * Allow Doc.char_span to snap to token boundaries Add a `mode` option to allow `Doc.char_span` to snap to token boundaries. The `mode` options: * `strict`: character offsets must match token boundaries (default, same as before) * `inside`: all tokens completely within the character span * `outside`: all tokens at least partially covered by the character span Add a new helper function `token_by_char` that returns the token corresponding to a character position in the text. Update `token_by_start` and `token_by_end` to use `token_by_char` for more efficient searching. * Remove unused import * Rename mode to alignment_mode Rename `mode` to `alignment_mode` with the options `strict`/`contract`/`expand`. Any unrecognized modes are silently converted to `strict`.	2020-08-04 13:36:32 +02:00
Adriane Boyd	b841248589	Add Span index boundary checks (#5861) * Add Span index boundary checks * Return Span-specific IndexError in all cases * Simplify and fix if/else	2020-08-04 13:35:25 +02:00
Adriane Boyd	cd59979ab4	Fix span boundary handling in Spanish noun_chunks (#5860)	2020-08-03 13:53:15 +02:00
Ines Montani	934447a611	Merge pull request #5855 from svlandeg/fix/cli-debug	2020-08-03 13:09:20 +02:00
Li Zhe	296f8b65b4	fix the wrong hash url in adding-languages.md file (#5810) * fix the wrong hash url in adding-languages.md file change the #101 url hash path to #language-data * filled in the spaCy Contributor Agreement filled in the spaCy Contributor Agreement	2020-08-02 23:15:56 +02:00
Ines Montani	4c055f0aa7	Add init CLI and init config (#5854) * Add init CLI and init config draft * Improve config validation * Auto-format * Don't export anything in debug config * Update docs	2020-08-02 15:18:30 +02:00
svlandeg	6f4e46ee93	Merge remote-tracking branch 'upstream/develop' into fix/cli-debug # Conflicts: # pyproject.toml # requirements.txt # setup.cfg	2020-08-01 18:38:59 +02:00
Ines Montani	e393ebd78b	Merge pull request #5851 from explosion/feature/better-pipe-analysis	2020-08-01 14:20:27 +02:00
Ines Montani	b40f44419b	Simplify pipe analysis - remove unused code - don't print by default - integrate attrs info into analysis output	2020-08-01 13:40:06 +02:00
Ines Montani	93144bde97	Update code block style [ci skip]	2020-07-31 18:55:55 +02:00
Ines Montani	98c6a85c8b	Update docs [ci skip]	2020-07-31 18:55:38 +02:00
Ines Montani	b68c53858c	Remove global	2020-07-31 18:37:58 +02:00
Ines Montani	30a76fcf6f	Integrate and simplify pipe analysis	2020-07-31 18:34:35 +02:00
svlandeg	9b719dfb1a	use divider inbetween steps	2020-07-31 18:06:48 +02:00
svlandeg	51ffc4a166	rename pipe_name to component	2020-07-31 17:58:55 +02:00
svlandeg	878327d38e	printing final predictions by default to False	2020-07-31 17:36:32 +02:00
Ines Montani	2d955fbf98	Fix linting [ci skip]	2020-07-31 17:05:28 +02:00
Ines Montani	e9e8fa2466	Update docs and types	2020-07-31 17:02:54 +02:00
Ines Montani	dab31426e1	Pin to latest Thinc	2020-07-31 17:00:14 +02:00
svlandeg	cc2f58a1b0	use data_validation context manager	2020-07-31 16:49:42 +02:00
Adriane Boyd	ac14ce7c30	Prefer earlier spans in EntityRuler (#5843) Similar to #4414, update the sorting in EntityRuler to prefer the first span in overlapping spans.	2020-07-31 16:09:32 +02:00
svlandeg	5fa3235d06	set DATA_VALIDATION to False for debug_model (upgrade thinc)	2020-07-31 15:21:01 +02:00
svlandeg	08d3c36c20	bugfix in train CLI	2020-07-31 15:03:43 +02:00
Ines Montani	6365837ca9	Merge pull request #5833 from explosion/feature/scorer-adjustments	2020-07-31 14:00:39 +02:00
Ines Montani	5a221f79c2	Revert "Remove keyword-only from Scorer API docs" [ci skip] This reverts commit `7a6ac47dc1`.	2020-07-31 14:00:21 +02:00
Ines Montani	160f1a5f94	Update docs [ci skip]	2020-07-31 13:26:39 +02:00
Adriane Boyd	9b509aa87f	Move Language.evaluate scorer config to new arg Move `Language.evaluate` scorer config from `component_cfg` to separate argument `scorer_cfg`.	2020-07-31 11:05:16 +02:00
Adriane Boyd	901801b33b	Fix default arguments in DependencyParser.score	2020-07-31 10:55:44 +02:00
Adriane Boyd	9d79916792	Merge branch 'develop' into feature/scorer-adjustments	2020-07-31 10:48:14 +02:00
Sofie Van Landeghem	ca491722ad	The Parser is now a Pipe (2) (#5844) * moving syntax folder to _parser_internals * moving nn_parser and transition_system * move nn_parser and transition_system out of internals folder * moving nn_parser code into transition_system file * rename transition_system to transition_parser * moving parser_model and _state to ml * move _state back to internals * The Parser now inherits from Pipe! * small code fixes * removing unnecessary imports * remove link_vectors_to_models * transition_system to internals folder * little bit more cleanup * newlines	2020-07-30 23:30:54 +02:00
svlandeg	0b23594953	pipe_name instead of section in debug_model	2020-07-30 20:06:28 +02:00
holubvl3	d16c0f2c3a	Create holubvl3 (#5845) * Create holubvl3 * Rename holubvl3 to holubvl3.md Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>	2020-07-30 17:40:31 +02:00
Rahul Gupta	f76fae0e8d	English: adds ordinal numbers (#5830)	2020-07-29 20:22:47 +02:00
Ines Montani	3449c45fd9	Update docs [ci skip]	2020-07-29 19:48:26 +02:00
Ines Montani	9c80cb673d	Update docs [ci skip]	2020-07-29 19:41:34 +02:00
Ines Montani	9f69afdd1e	Update docs [ci skip]	2020-07-29 19:09:44 +02:00
Ines Montani	7a21775cd0	Merge pull request #5834 from explosion/feature/vectors	2020-07-29 18:49:26 +02:00
Gustavo Zadrozny Leyendecker	90b958fd01	Fix on EntityRendered to support break lines (after last entity) (closes #5838)	2020-07-29 18:48:39 +02:00
Ines Montani	6a5c853edb	Fix docs [ci skip]	2020-07-29 18:45:12 +02:00
Ines Montani	158d8c1e48	Update docs [ci skip]	2020-07-29 18:44:10 +02:00
Matthew Honnibal	f7adc9d3b7	Start rewriting vectors docs	2020-07-29 17:10:06 +02:00
Ines Montani	b0f57a0cac	Update docs and consistency	2020-07-29 15:14:07 +02:00
Matthew Honnibal	a2d573c039	Merge branch 'feature/vectors' of https://github.com/explosion/spaCy into feature/vectors	2020-07-29 14:56:27 +02:00
Matthew Honnibal	ebdb3f5f04	Fix config	2020-07-29 14:56:11 +02:00
Matthew Honnibal	2af741d7e3	Fix train arg	2020-07-29 14:56:01 +02:00
Matthew Honnibal	c27309f839	Merge branch 'develop' into feature/vectors	2020-07-29 14:54:10 +02:00
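
The `alignment_mode` options described in commit c62fd878a3 (`strict`/`contract`/`expand`) can be illustrated with a minimal pure-Python sketch. This is not spaCy's implementation: the function name `snap_char_span`, the `(start_char, end_char)` token representation, and the `(first_token, end_token)` return value are assumptions made for illustration only. The fallback of unrecognized modes to `strict` mirrors what that commit message states.

```python
from typing import List, Optional, Tuple

# Hypothetical token representation: (start_char, end_char) per token,
# sorted by position and non-overlapping. Offsets for "I like New York".
TOKENS: List[Tuple[int, int]] = [(0, 1), (2, 6), (7, 10), (11, 15)]


def snap_char_span(
    token_offsets: List[Tuple[int, int]],
    start: int,
    end: int,
    alignment_mode: str = "strict",
) -> Optional[Tuple[int, int]]:
    """Map a character span to a token range (first index, end index exclusive).

    Returns None when no token range satisfies the chosen mode.
    """
    # Per the commit message, unrecognized modes fall back to "strict".
    if alignment_mode not in ("strict", "contract", "expand"):
        alignment_mode = "strict"

    if alignment_mode == "strict":
        # Both offsets must coincide exactly with token boundaries.
        starts = {s: i for i, (s, _) in enumerate(token_offsets)}
        ends = {e: i for i, (_, e) in enumerate(token_offsets)}
        if start not in starts or end not in ends:
            return None
        first, last = starts[start], ends[end]
        return (first, last + 1) if first <= last else None

    if alignment_mode == "contract":
        # Keep only tokens lying completely inside the character span.
        covered = [
            i for i, (s, e) in enumerate(token_offsets) if s >= start and e <= end
        ]
    else:
        # "expand": keep every token at least partially covered by the span.
        covered = [
            i for i, (s, e) in enumerate(token_offsets) if s < end and e > start
        ]
    return (covered[0], covered[-1] + 1) if covered else None


# Characters 5..15 cover "e New York": offset 5 falls inside the token "like".
print(snap_char_span(TOKENS, 5, 15, "strict"))    # None
print(snap_char_span(TOKENS, 5, 15, "contract"))  # (2, 4)
print(snap_char_span(TOKENS, 5, 15, "expand"))    # (1, 4)
```

In this sketch `contract` drops the partially covered token "like", while `expand` pulls it in whole; `strict` rejects the span because offset 5 is not a token boundary.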

15224 Commits