spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-01-10 09:16:31 +03:00

Author	SHA1	Message	Date
svlandeg	a31648d28b	further code cleanup	2019-06-19 09:15:43 +02:00
svlandeg	478305cd3f	small tweaks and documentation	2019-06-18 18:38:09 +02:00
svlandeg	0d177c1146	clean up code, remove old code, move to bin	2019-06-18 13:20:40 +02:00
svlandeg	ffae7d3555	sentence encoder only (removing article/mention encoder)	2019-06-18 00:05:47 +02:00
Kabir Khan	1e19f34e29	Add optional `id` property to EntityRuler patterns (#3591) * Adding support for entity_id in EntityRuler pipeline component * Adding Spacy Contributor agreement * Updating EntityRuler to use string.format instead of f strings * Update Entity Ruler to support an 'id' attribute per pattern that explicitly identifies an entity. * Fixing tests * Remove custom extension entity_id and use built in ent_id token attribute. * Changing entity_id to ent_id for consistent naming * entity_ids => ent_ids * Removing kb, cleaning up tests, making util functions private, use rsplit instead of split	2019-06-16 13:29:04 +02:00
svlandeg	b312f2d0e7	redo training data to be independent of KB and entity-level instead of doc-level	2019-06-14 15:55:26 +02:00
svlandeg	78dd3e11da	write entity linking pipe to file and keep vocab consistent between kb and nlp	2019-06-13 16:25:39 +02:00
svlandeg	b12001f368	small fixes	2019-06-12 22:05:53 +02:00
svlandeg	6521cfa132	speeding up training	2019-06-12 13:37:05 +02:00
svlandeg	fe1ed432ef	eval on dev set, varying combos of prior and context scores	2019-06-11 11:40:58 +02:00
svlandeg	83dc7b46fd	first tests with EL pipe	2019-06-10 21:25:26 +02:00
Matthew Honnibal	a931d72459	Add merge_subtokens as parser post-process. Re #3830	2019-06-07 20:40:41 +02:00
svlandeg	7de1ee69b8	training loop in proper pipe format	2019-06-07 15:55:10 +02:00
svlandeg	0486ccabfd	introduce goldparse.links	2019-06-07 13:54:45 +02:00
svlandeg	a5c061f506	storing NEL training data in GoldParse objects	2019-06-07 12:58:42 +02:00
svlandeg	61f0e2af65	code cleanup	2019-06-06 20:22:14 +02:00
svlandeg	5c723c32c3	entity vectors in the KB + serialization of them	2019-06-05 18:29:18 +02:00
svlandeg	9abbd0899f	separate entity encoder to get 64D descriptions	2019-06-05 00:09:46 +02:00
svlandeg	fb37cdb2d3	implementing el pipe in pipes.pyx (not tested yet)	2019-06-03 21:32:54 +02:00
svlandeg	dd691d0053	debugging	2019-05-17 17:44:11 +02:00
Sofie	a4a6bfa4e1	Merge branch 'master' into feature/el-framework	2019-03-26 11:00:02 +01:00
svlandeg	8814b9010d	entity as one field instead of both ID and name	2019-03-25 18:10:41 +01:00
Matthew Honnibal	6c783f8045	Bug fixes and options for TextCategorizer (#3472) * Fix code for bag-of-words feature extraction The _ml.py module had a redundant copy of a function to extract unigram bag-of-words features, except one had a bug that set values to 0. Another function allowed extraction of bigram features. Replace all three with a new function that supports arbitrary ngram sizes and also allows control of which attribute is used (e.g. ORTH, LOWER, etc). * Support 'bow' architecture for TextCategorizer This allows efficient ngram bag-of-words models, which are better when the classifier needs to run quickly, especially when the texts are long. Pass architecture="bow" to use it. The extra arguments ngram_size and attr are also available, e.g. ngram_size=2 means unigram and bigram features will be extracted. * Fix size limits in train_textcat example * Explain architectures better in docs	2019-03-23 16:44:44 +01:00
Ines Montani	06bf130890	💫 Add better and serializable sentencizer (#3471) * Add better serializable sentencizer component * Replace default factory * Add tests * Tidy up * Pass test * Update docs	2019-03-23 15:45:02 +01:00
svlandeg	5318ce88fa	'entity_linker' instead of 'el'	2019-03-22 13:55:10 +01:00
svlandeg	1ee0e78fd7	select candidate with highest prior probability	2019-03-22 11:36:45 +01:00
svlandeg	c593607ce2	minimal EL pipe	2019-03-22 11:36:45 +01:00
svlandeg	735fc2a735	annotate kb_id through ents in doc	2019-03-22 11:36:44 +01:00
svlandeg	d849eb2455	adding kb_id as field to token, el as nlp pipeline component	2019-03-22 11:34:46 +01:00
Ines Montani	278e9d2eb0	Merge branch 'master' into feature/lemmatizer	2019-03-16 13:44:22 +01:00
Ines Montani	cb5dbfa63a	Tidy up references to n_threads and fix default	2019-03-15 16:24:26 +01:00
Ines Montani	7ba3a5d95c	💫 Make serialization methods consistent (#3385) * Make serialization methods consistent exclude keyword argument instead of random named keyword arguments and deprecation handling * Update docs and add section on serialization fields	2019-03-10 19:16:45 +01:00
Matthew Honnibal	0f12082465	Refactor morphologizer	2019-03-09 22:54:59 +00:00
Matthew Honnibal	41a3016019	Refactor morphologizer class map	2019-03-09 20:55:33 +01:00
Matthew Honnibal	f742900f83	Set pos attribute in morphologizer	2019-03-09 11:51:11 +00:00
Matthew Honnibal	b6d60d0041	Merge branch 'feature/lemmatizer' of https://github.com/explosion/spaCy into feature/lemmatizer	2019-03-09 00:41:53 +00:00
Matthew Honnibal	42bc3ad73b	Fix class mapping for morphologizer	2019-03-09 00:20:29 +00:00
Matthew Honnibal	cc2b2dba14	Neaten set_morphology option on Tagger	2019-03-08 19:16:02 +01:00
Matthew Honnibal	afa227e25b	Fix setter	2019-03-08 19:10:01 +01:00
Matthew Honnibal	b27bd42613	Fix compile error	2019-03-08 19:06:02 +01:00
Matthew Honnibal	c91577db02	Add set_morphology cfg option for Tagger	2019-03-08 19:03:17 +01:00
Matthew Honnibal	49cf002ac4	Add missing import	2019-03-08 18:59:25 +01:00
Matthew Honnibal	d7ec1d62cb	Fix Morphologizer	2019-03-08 18:54:25 +01:00
Matthew Honnibal	3908911da4	Fix import	2019-03-08 17:04:14 +01:00
Matthew Honnibal	8a9181d95a	Merge __init__	2019-03-08 16:58:42 +01:00
Matthew Honnibal	4cf897e8e1	Update from develop	2019-03-08 16:56:54 +01:00
Ines Montani	d260aa17fd	Merge branch 'develop' into feature/lemmatizer	2019-03-08 13:25:00 +01:00
Ines Montani	296446a1c8	Tidy up and improve docs and docstrings (#3370) ## Description * tidy up and adjust Cython code to code style * improve docstrings and make calling `help()` nicer * add URLs to new docs pages to docstrings wherever possible, mostly to user-facing objects * fix various typos and inconsistencies in docs ### Types of change enhancement, docs ## Checklist - [x] I have submitted the spaCy Contributor Agreement. - [x] I ran the tests, and all new and existing tests passed. - [x] My changes don't require a change to the documentation, or if they do, I've added all required information.	2019-03-08 11:42:26 +01:00
Matthew Honnibal	fed0371db7	Remove enums from morphology	2019-03-07 17:14:57 +01:00
Matthew Honnibal	8805966460	Fix moved Morphologizer class	2019-03-07 10:46:27 +01:00
Matthew Honnibal	fc1cc4c529	Move morphologizer under spacy/pipes	2019-03-07 01:36:26 +01:00
Matthew Honnibal	bfa52d9d8a	Move morphologizer within spacy/pipes	2019-03-07 01:34:32 +01:00
Matthew Honnibal	6b0008afc6	Clean up TextCategorizer slightly	2019-02-23 12:28:06 +01:00
Matthew Honnibal	ce1e4eace2	Default to former TextCategorizer model * Keep TextCategorizer default model same as v2.0 * Add option 'architecture' that allows "simple_cnn" to switch to simpler model. * Add option exclusive_classes, defaulting to False. If set to True, the model treats classes as mutually exclusive, i.e. only one class can be true per instance.	2019-02-23 11:55:16 +01:00
Matthew Honnibal	a137e8b418	Fix Pipe.to_bytes() when model uninitialized Closes #3289	2019-02-21 09:42:02 +01:00
Ines Montani	5651a0d052	💫 Replace {Doc,Span}.merge with Doc.retokenize (#3280) * Add deprecation warning to Doc.merge and Span.merge * Replace {Doc,Span}.merge with Doc.retokenize	2019-02-15 10:29:44 +01:00
Ines Montani	f146121092	💫 Make handling of [Pipe].labels consistent (#3273) * Make handling of [Pipe].labels consistent * Un-xfail passing test * Update spacy/pipeline/pipes.pyx Co-Authored-By: ines <ines@ines.io> * Update spacy/pipeline/pipes.pyx Co-Authored-By: ines <ines@ines.io> * Update spacy/tests/pipeline/test_pipe_methods.py Co-Authored-By: ines <ines@ines.io> * Update spacy/pipeline/pipes.pyx Co-Authored-By: ines <ines@ines.io> * Move error message to spacy.errors * Fix textcat labels and test * Make EntityRuler.labels return tuple as well	2019-02-15 06:03:19 +11:00
Ines Montani	a9f8d17632	💫 Break up large pipeline.pyx (#3246) * Break up large pipeline.pyx * Merge some components back together * Fix typo	2019-02-10 12:14:51 +01:00


158 Commits