spaCy

mirror of https://github.com/explosion/spaCy.git synced 2026-02-19 13:41:00 +03:00

Author	SHA1	Message	Date
Matthew Honnibal	af0b3bc4d8	Inc version	2021-01-25 19:02:27 +11:00
Matthew Honnibal	5b2440a1fd	Try to use real histories, not oracle	2021-01-25 18:59:52 +11:00
Matthew Honnibal	c3c462e562	Inc version	2021-01-25 16:48:58 +11:00
Matthew Honnibal	bd04ea0b02	Fix transition has_gold	2021-01-25 16:48:45 +11:00
Matthew Honnibal	b2044d510e	Inc version	2021-01-25 16:21:54 +11:00
Matthew Honnibal	585ee4c81c	Inc version	2021-01-25 15:27:05 +11:00
Matthew Honnibal	38ad6c7b6a	Fix parser oracle	2021-01-25 15:26:43 +11:00
Matthew Honnibal	46b6197248	Inc version	2021-01-25 14:52:14 +11:00
Matthew Honnibal	19747d98d1	Fix	2021-01-25 14:51:46 +11:00
Matthew Honnibal	772248f84a	Inc version	2021-01-25 14:40:31 +11:00
Matthew Honnibal	456c881ae3	Try to fix parser training	2021-01-25 14:40:05 +11:00
Matthew Honnibal	3a6b93ae3a	Inc version	2021-01-25 13:29:08 +11:00
Matthew Honnibal	cef93d3ae7	Handle final states in get_oracle_sequence	2021-01-25 13:28:57 +11:00
Matthew Honnibal	a49975343e	Inc version	2021-01-25 13:06:27 +11:00
Matthew Honnibal	be155ead9b	Fix set_annotations during parser update	2021-01-25 11:56:36 +11:00
Matthew Honnibal	c631c355d1	Revert "Fix set_annotations in parser.update" This reverts commit `c6df0eafd0`.	2021-01-25 11:22:57 +11:00
Matthew Honnibal	65f2270d59	Revert "Fix parser set_annotations during update" This reverts commit `eb138c89ed`.	2021-01-25 11:22:43 +11:00
Matthew Honnibal	eb138c89ed	Fix parser set_annotations during update	2021-01-25 10:52:40 +11:00
Matthew Honnibal	c6df0eafd0	Fix set_annotations in parser.update	2021-01-25 09:50:48 +11:00
Matthew Honnibal	bb15d5b22f	Fix copying SpanGroups	2021-01-25 09:50:29 +11:00
Matthew Honnibal	8f07e6c901	Upd version	2021-01-25 01:22:06 +11:00
Matthew Honnibal	351ce600c5	Fix dict proxy copy	2021-01-25 01:21:47 +11:00
Matthew Honnibal	827fb51e6c	Fix set_annotations during Parser.update	2021-01-25 00:52:00 +11:00
Matthew Honnibal	492c948937	Add SpanGroups.copy method	2021-01-25 00:51:38 +11:00
Matthew Honnibal	8a22161b59	Change version	2021-01-25 00:23:43 +11:00
Matthew Honnibal	6117adcd6d	Make vocab always own lexemes	2021-01-25 00:23:02 +11:00
Matthew Honnibal	4048ca01eb	Set dev version	2021-01-25 00:08:49 +11:00
Matthew Honnibal	d5b1673790	Try to fix doc.copy	2021-01-24 23:54:36 +11:00
Matthew Honnibal	ffc371350a	Avoid assuming encode.get_dim('nO') is set in tok2vec (#6800)	2021-01-24 14:37:33 +11:00
KeshavG-lb	0a86d833d7	Spacy Cli info method causing backward compatibility issues (#6793) * Spacy Cli info method causing backward compatibility issues #6791 fix backward compatibility by setting default value to exclude in info method. * setting empty list as default argument is dangerous. so setting default to None and then setting it to emptylist, if None. Reference : https://nikos7am.com/posts/mutable-default-arguments/	2021-01-23 11:21:43 +01:00
Luigi Coniglio	e83c818a78	DependencyMatcher improvements (fix #6678) (#6744) * Adding contributor agreement for user werew * [DependencyMatcher] Comment and clean code * [DependencyMatcher] Use defaultdicts * [DependencyMatcher] Simplify _retrieve_tree method * [DependencyMatcher] Remove prepended underscores * [DependencyMatcher] Address TODO and move grouping of token's positions out of the loop * [DependencyMatcher] Remove _nodes attribute * [DependencyMatcher] Use enumerate in _retrieve_tree method * [DependencyMatcher] Clean unused vars and use camel_case naming * [DependencyMatcher] Memoize node+operator map * Add root property to Token * [DependencyMatcher] Groups matches by root * [DependencyMatcher] Remove unused _keys_to_token attribute * [DependencyMatcher] Use a list to map tokens to matcher's keys * [DependencyMatcher] Remove recursion * [DependencyMatcher] Use a generator to retrieve matches * [DependencyMatcher] Remove unused memory pool * [DependencyMatcher] Hide private methods and attributes * [DependencyMatcher] Improvements to the matches validation * Apply suggestions from code review Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com> * [DependencyMatcher] Fix keys_to_position_maps * Remove Token.root property * [DependencyMatcher] Remove functools' lru_cache Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>	2021-01-22 11:20:08 +11:00
Sofie Van Landeghem	d93cd3b7c0	remove artificially duplicated test [ci skip]	2021-01-21 10:53:16 +01:00
Sofie Van Landeghem	e680efc7cc	Set annotations in update (#6767) * bump to 3.0.0rc4 * do set_annotations in component update calls * update docs and remove set_annotations flag * fix EL test	2021-01-20 11:49:25 +11:00
Sofie Van Landeghem	57640aa838	warn when frozen components break listener pattern (#6766) * warn when frozen components break listener pattern * few notes in the documentation * update arg name * formatting * cleanup * specify listeners return type	2021-01-20 11:12:35 +11:00
Matthew Honnibal	88acbfc050	Copy the Example objects (and their predicted Doc) in nlp.evaluate() and nlp.update() (#6765) * Make copy of examples in nlp.update and nlp.evaluate * Avoid circular import * Fix evaluate	2021-01-19 16:47:44 +01:00
Sofie Van Landeghem	bfc212e68f	fix duplicate from merge [ci skip]	2021-01-19 12:14:35 +01:00
Sofie Van Landeghem	c8761b0e6e	rewrite Maxout layer as separate layers to avoid shape inference trouble (#6760)	2021-01-19 07:37:17 +08:00
Adriane Boyd	26c34ab8b0	Fix parser resizing for cupy (#6758)	2021-01-18 20:43:15 +01:00
Matthew Honnibal	c2a18e4fa3	Update textcat ensemble model	2021-01-19 02:53:02 +11:00
Ines Montani	e697609fef	Update docstrings and types [ci skip]	2021-01-18 22:31:26 +11:00
Ines Montani	f4d547b73c	Fix error code	2021-01-18 11:43:45 +11:00
Ines Montani	1090d3d675	Merge branch 'develop' into feature/spacy-legacy	2021-01-18 11:43:39 +11:00
Sofie Van Landeghem	fed8f48965	raise NotImplementedError when noun_chunks iterator is not implemented (#6711) * raise NotImplementedError when noun_chunks iterator is not implemented * bring back, fix and document span.noun_chunks * formatting Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>	2021-01-17 19:56:05 +08:00
Adriane Boyd	bf0cdae8d4	Add token_splitter component (#6726) * Add long_token_splitter component Add a `long_token_splitter` component for use with transformer pipelines. This component splits up long tokens like URLs into smaller tokens. This is particularly relevant for pretrained pipelines with `strided_spans`, since the user can't change the length of the span `window` and may not wish to preprocess the input texts. The `long_token_splitter` splits tokens that are at least `long_token_length` tokens long into smaller tokens of `split_length` size. Notes: * Since this is intended for use as the first component in a pipeline, the token splitter does not try to preserve any token annotation. * API docs to come when the API is stable. * Adjust API, add test * Fix name in factory	2021-01-17 19:54:41 +08:00
Adriane Boyd	185fc62f4d	Remove unused is_base_form for mk lemmatizer (#6743) Remove unimplemented/incorrect is_base_form for Macedonian lemmatizer.	2021-01-17 09:41:35 +01:00
Adriane Boyd	43a752a2a0	Fix assertion in default get oracle sequence usage (#6738) Remove assertion for default debug value in `get_oracle_sequence_from_state`.	2021-01-16 16:07:39 +01:00
Ines Montani	a552db2819	Include available registry names in error	2021-01-16 14:35:03 +11:00
Matthew Honnibal	f0c696b4aa	Fix failed merge of #6694 patch	2021-01-16 13:44:11 +11:00
Ines Montani	d12be459f6	Raise RegistryError	2021-01-16 12:57:13 +11:00
Adriane Boyd	c8b4370865	Add all strings from source models (#6736) Add all strings from the source model when adding a pipe from a source model. Minor: * Skip `disable=["vocab", "tokenizer"]` when loading a source model from the config, since this doesn't do anything and is misleading.	2021-01-16 12:26:15 +11:00
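
Note on commit 0a86d833d7: its message describes replacing an empty-list default for `exclude` with `None`. The sketch below illustrates that general Python pitfall only; it is not spaCy's code. The function names `info_shared_default` and `info_none_default` and the `key` parameter are hypothetical, chosen for illustration.

```python
from typing import List, Optional


def info_shared_default(key: str, exclude: List[str] = []) -> List[str]:
    # Hypothetical example: the default list is created once, when the
    # function is defined, so every call without `exclude` mutates it.
    exclude.append(key)
    return exclude


def info_none_default(key: str, exclude: Optional[List[str]] = None) -> List[str]:
    # Hypothetical example: None is the sentinel, and a fresh list is
    # created on each call that does not pass `exclude`.
    if exclude is None:
        exclude = []
    exclude.append(key)
    return exclude


first = info_shared_default("a")
second = info_shared_default("b")
# `first` and `second` are the same list object, which now holds both keys.

third = info_none_default("a")
fourth = info_none_default("b")
# `third` and `fourth` are independent lists.
```

With the `None` sentinel, state no longer leaks between calls, which is the behaviour the commit message says it wants for the default.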

8427 Commits