spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-11-12 13:55:48 +03:00

Author	SHA1	Message	Date
richardpaulhudson	a1e07f0d14	Request to include Holmes in spaCy Universe (#3685) * Request to add Holmes to spaCy Universe Dear spaCy team, I would be grateful if you would consider my Python library Holmes for inclusion in the spaCy Universe. Holmes transforms the syntactic structures delivered by spaCy into semantic structures that, together with various other techniques including ontological matching and word embeddings, serve as the basis for information extraction. Holmes supports several use cases including chatbot, structured search, topic matching and supervised document classification. I had the basic idea for Holmes around 15 years ago and now spaCy has made it possible to build an implementation that is stable and fast enough to actually be of use - thank you! At present Holmes supports English and German (I am based in Munich) but could easily be extended to support any other language with a spaCy model. * Added	2019-05-08 02:42:03 +02:00
Ines Montani	505c9e0e19	Add util.filter_spans helper (#3686)	2019-05-08 02:33:40 +02:00
svlandeg	9f33732b96	using entity descriptions and article texts as input embedding vectors for training	2019-05-07 16:03:42 +02:00
F0rge1cE	dd1e6b0bc6	Fix offset bug in loading pre-trained word2vec. (#3689) * Fix offset bug in loading pre-trained word2vec. * add contributor agreement	2019-05-06 23:00:38 +02:00
Bram Vanroy	4762f56062	Re-added Universe readme (#3688) (closes #3680)	2019-05-06 21:10:58 +02:00
Bram Vanroy	8e6f8deaf6	Re-added Universe readme (#3688) (closes #3680)	2019-05-06 21:08:01 +02:00
Ines Montani	78cb807a9a	Auto-format [ci skip]	2019-05-06 16:58:29 +02:00
svlandeg	7e348d7f7f	baseline evaluation using highest-freq candidate	2019-05-06 15:13:50 +02:00
Ines Montani	dd153b2b33	Simplify helper (see #3681) [ci skip]	2019-05-06 15:13:10 +02:00
Ines Montani	f8fce6c03c	Fix typo (see #3681)	2019-05-06 15:02:11 +02:00
Ines Montani	f2a56c1b56	Rewrite example to use Retokenizer (resolves #3681) Also add helper to filter spans	2019-05-06 14:51:18 +02:00
svlandeg	6961215578	refactor code to separate functionality into different files	2019-05-06 10:56:56 +02:00
Brad Jascob	955b95cb8b	Fix inconsistant lemmatizer issue #3484 (#3646) * Fix inconsistant lemmatizer issue #3484 * Remove test case	2019-05-04 18:16:03 +02:00
svlandeg	f5190267e7	run only 100M of WP data as training dataset (9%)	2019-05-03 18:09:09 +02:00
svlandeg	4e929600e5	fix WP id parsing, speed up processing and remove ambiguous strings in one doc (for now)	2019-05-03 17:37:47 +02:00
svlandeg	34600c92bd	try catch per article to ensure the pipeline goes on	2019-05-03 15:10:09 +02:00
Ines Montani	b4d142e3c4	Adjust wording and formatting [ci skip]	2019-05-03 12:00:31 +02:00
Ines Montani	04658ebbb2	Relax jsonschema pin (closes #3628)	2019-05-03 11:58:58 +02:00
d5555	ba4bcbf285	Update universe.json (#3653) [ci skip] * Update universe.json * Update universe.json	2019-05-03 11:50:12 +02:00
svlandeg	bbcb9da466	creating training data with clean WP texts and QID entities true/false	2019-05-03 10:44:29 +02:00
svlandeg	cba9680d13	run NER on clean WP text and link to gold-standard entity IDs	2019-05-02 17:24:52 +02:00
svlandeg	581dc9742d	parsing clean text from WP articles to use as input data for NER and NEL	2019-05-02 17:09:56 +02:00
svlandeg	8353552191	cleanup	2019-05-01 23:26:16 +02:00
svlandeg	1ae41daaa9	allow small rounding errors	2019-05-01 23:05:40 +02:00
Dobita21	f95ecedd83	Add Thai lex_attrs (#3655) * test sPacy commit to git fri 04052019 10:54 * change Data format from my format to master format * ทัทั้งนี้ ---> ทั้งนี้ * delete stop_word translate from Eng * Adjust formatting and readability * add Thai norm_exception * Add Dobita21 SCA * editรึ : หรือ, * Update Dobita21.md * Auto-format * Integrate norms into language defaults * add acronym and some norm exception words * add lex_attrs * Add lexical attribute getters into the language defaults * fix LEX_ATTRS Co-authored-by: Donut <dobita21@gmail.com> Co-authored-by: Ines Montani <ines@ines.io>	2019-05-01 12:03:14 +02:00
张晓飞	ba1ff00370	update response after calling add_pipe (#3661) * update response after calling add_pipe component:print_info is appened in the last, so need show it at the end of pipeline * Create henry860916.md	2019-05-01 12:02:18 +02:00
BreakBB	8952004dfc	Update French example sents and add two German stop words (#3662) * Update french example sentences * Add 'anderem' and 'ihren' to German stop words	2019-05-01 12:01:35 +02:00
svlandeg	3629a52ede	reading all persons in wikidata	2019-05-01 01:00:59 +02:00
svlandeg	60b54ae8ce	bulk entity writing and experiment with regex wikidata reader to speed up processing	2019-05-01 00:00:38 +02:00
svlandeg	653b7d9c87	calculate entity raw counts offline to speed up KB construction	2019-04-30 11:39:42 +02:00
Ramiro Gómez	8ee4100f8f	Remove dangling M (#3657) I assume this is a typo. Sorry if it has a meaning that I'm not aware of.	2019-04-29 19:44:43 +02:00
Amit Chaudhary	167d63af31	Fix broken link to Dive Into Python 3 website (#3656) * Fix broken link to Dive Into Python 3 website * Sign spaCy Contributor Agreement	2019-04-29 19:44:00 +02:00
Ramiro Gómez	e7e5999ddc	Create yaph.md so I can contribute (#3658)	2019-04-29 19:43:06 +02:00
svlandeg	19e8f339cb	deduce entity freq from WP corpus and serialize vocab in WP test	2019-04-29 17:37:29 +02:00
svlandeg	387263d618	simplify chains	2019-04-29 13:58:07 +02:00
Brad Jascob	6fcafcc564	Doc changes for local website setup (#3651)	2019-04-27 13:28:23 +02:00
Ivan Tham	fa94f83697	Improve redundant variable name (#3643) * Improve redundant variable name * Apply suggestions from code review Co-Authored-By: pickfire <pickfire@riseup.net>	2019-04-26 16:50:14 +02:00
Ines Montani	bf92625ede	Update from master	2019-04-26 13:19:50 +02:00
Ines Montani	dc87fb805d	Merge branch 'master' of https://github.com/explosion/spaCy	2019-04-26 13:17:57 +02:00
Ines Montani	62060ae9c6	Merge branch 'spacy.io'	2019-04-26 13:17:52 +02:00
Brad Jascob	9afa0d6723	Update Universe Website for pyInflect (#3641)	2019-04-26 13:17:36 +02:00
svlandeg	54d0cea062	unit test for KB serialization	2019-04-24 23:52:34 +02:00
svlandeg	3e0cb69065	KB aliases to and from file	2019-04-24 20:24:24 +02:00
svlandeg	ad6c5e581c	writing and reading number of entries to/from header	2019-04-24 15:31:44 +02:00
svlandeg	6e3223f234	bulk loading in proper order of entity indices	2019-04-24 11:26:38 +02:00
Ines Montani	db7c0dbfd6	Update seo.js	2019-04-23 18:39:30 +02:00
svlandeg	694fea597a	dumping all entryC entries + (inefficient) reading back in	2019-04-23 18:36:50 +02:00
svlandeg	8e70a564f1	custom reader and writer for _EntryC fields (first stab at it - not complete)	2019-04-23 16:33:40 +02:00
Dobita21	721e1fc86c	update norm_exceptions (#3627) * test sPacy commit to git fri 04052019 10:54 * change Data format from my format to master format * ทัทั้งนี้ ---> ทั้งนี้ * delete stop_word translate from Eng * Adjust formatting and readability * add Thai norm_exception * Add Dobita21 SCA * editรึ : หรือ, * Update Dobita21.md * Auto-format * Integrate norms into language defaults * add acronym and some norm exception words	2019-04-23 12:48:03 +02:00
Ines Montani	ec0d840ab5	Document early stopping	2019-04-22 14:31:32 +02:00


10451 Commits