spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-09-19 18:42:37 +03:00

Author	SHA1	Message	Date
Kamolsit Mongkolsrisawat	dcc67f3f51	Update Thai tokenizer_exception list (#3529) * add tokenizer_exceptions word (ก-น) from https://goo.gl/JpJ2qq * update tokenizer_exceptions word list * add contributor file	2019-04-03 09:13:36 +02:00
ivigamberdiev	5e5641616d	Update links and http -> https (#3532) * update links and http -> https * SCA	2019-04-02 17:36:22 +02:00
svlandeg	85b4319f33	specify encoding in files	2019-04-02 15:05:31 +02:00
svlandeg	673c81bbb4	unicode string for python 2.7	2019-04-02 13:52:07 +02:00
svlandeg	eca9cc5417	fixing Issue #3521 by adding all hyphen variants for each stopword	2019-04-02 13:24:59 +02:00
svlandeg	e7062cf699	failing test for Issue #3521	2019-04-02 13:15:35 +02:00
svlandeg	1424b12b09	failing test for Issue #3449	2019-04-02 13:06:37 +02:00
Ines Montani	24cecdb44f	Update compatibility [ci skip]	2019-04-01 16:25:16 +02:00
jeannefukumaru	6cdb7b2e04	added tag_map for indonesian (#3515) * added tag_map for indonesian * changed tag map from .py to .txt to see if tests pass * added symbols import * added utf8 encoding flag * added missing SCONJ symbol * Auto-format * Remove unused imports * Make tag map available in Indonesian defaults	2019-04-01 12:27:48 +02:00
Ines Montani	c23e234d65	Auto-format	2019-04-01 12:11:27 +02:00
Ines Montani	5821b020d5	Merge branch 'spacy.io'	2019-04-01 11:47:59 +02:00
Ines Montani	0a0b1087b0	Make tag map available in Indonesian defaults	2019-04-01 11:46:51 +02:00
Ines Montani	5d9212c44c	Remove unused imports	2019-04-01 11:46:25 +02:00
Ines Montani	8d6b544632	Auto-format	2019-04-01 11:45:43 +02:00
jeannefukumaru	6567f27849	added missing SCONJ symbol	2019-04-01 17:02:53 +08:00
jeannefukumaru	082a0a2232	added utf8 encoding flag	2019-04-01 16:37:11 +08:00
jeannefukumaru	a741bed7a7	added symbols import	2019-04-01 16:21:06 +08:00
jeannefukumaru	745cf0c914	changed tag map from .py to .txt to see if tests pass	2019-04-01 07:04:50 +08:00
jeannefukumaru	3cc897102f	added tag_map for indonesian	2019-04-01 00:00:08 +08:00
Matthew Honnibal	e64b241f9c	Merge branch 'master' of https://github.com/explosion/spaCy	2019-03-31 13:58:38 +02:00
Ines Montani	b070e0caf7	Update landing.js	2019-03-30 22:26:46 +01:00
Ines Montani	9d1221943b	Merge branch 'master' into spacy.io	2019-03-30 20:32:14 +01:00
Ines Montani	037ffdfd3f	Add spaCy IRL to landing [ci skip]	2019-03-30 20:32:03 +01:00
Ines Montani	68900066e0	Merge pull request #3459 from svlandeg/feature/el-framework Basic framework and APIs for entity linker	2019-03-29 14:02:22 +01:00
Hiromu Hota	914b9ff3d2	Tags are joined with a comma and padded with asterisks (#3491) ## Description Fix a bug in the test of JapaneseTokenizer. This PR may require @polm's review. ### Types of change Bug fix ## Checklist - [x] I have submitted the spaCy Contributor Agreement. - [x] I ran the tests, and all new and existing tests passed. - [x] My changes don't require a change to the documentation, or if they do, I've added all required information.	2019-03-28 16:17:31 +01:00
Ines Montani	730f759b4f	Merge branch 'master' into spacy.io	2019-03-28 15:26:17 +01:00
Ines Montani	7d033a7b89	Fix meta description in universe projects [ci skip]	2019-03-28 15:26:01 +01:00
Ines Montani	fe2cb642ac	Merge branch 'master' into spacy.io	2019-03-28 15:13:39 +01:00
David	74e738dd4d	adds textpipe to universe (#3500) [ci skip] * Adds textpipe to universe * signed contributor agreement * Adjust formatting, code style and use "standalone" category	2019-03-28 15:13:19 +01:00
Ines Montani	04a9fb1a02	Merge branch 'master' into spacy.io	2019-03-28 13:34:46 +01:00
Samuel Kane	06a1846379	fix(util): fix decaying function output (#3495) * fix(util): fix decaying function output * fix(util): better test and adhere to code standards * fix(util): correct variable name, pytestify test, update website text	2019-03-28 13:24:47 +01:00
Duygu Altinok	5a7bc6b39d	Fix/irreg adverbs extension (#3499) * extended list of irreg adverbs * added test to exceptions * fixed typo	2019-03-28 13:23:33 +01:00
Bharat Raghunathan	1db3e47509	DOC: Update tokenizer docs to include default value for batch_size in pipe (#3492)	2019-03-28 12:48:02 +01:00
Ines Montani	2ed16d82bf	Fix social image	2019-03-26 18:27:40 +01:00
Matthew Honnibal	f77bf2bdb1	Fix GPU training for textcat. Closes #3473	2019-03-26 13:36:11 +01:00
Sofie	a4a6bfa4e1	Merge branch 'master' into feature/el-framework	2019-03-26 11:00:02 +01:00
svlandeg	8814b9010d	entity as one field instead of both ID and name	2019-03-25 18:10:41 +01:00
Ines Montani	9e14b2b69f	Add Estonian to docs [ci skip] (closes #3482)	2019-03-25 18:01:54 +01:00
Wannaphong Phatthiyaphaibun	297a051992	Update Thai tag map (#3480) * Update Thai tag map Update Thai tag map * Create wannaphongcom.md	2019-03-25 16:53:26 +01:00
Ines Montani	21ade53ef7	Merge branch 'master' into spacy.io	2019-03-25 13:05:00 +01:00
Ines Montani	db938ab0e3	Update favicon (closes #3475) [ci skip]	2019-03-25 13:04:47 +01:00
Ines Montani	c8c1baaea8	Update binderVersion	2019-03-25 12:17:03 +01:00
Matthew Honnibal	85dcd9477e	Set version to v2.1.3	2019-03-23 16:47:57 +01:00
Matthew Honnibal	f436efd8a4	Small tweak to ensemble textcat model	2019-03-23 16:47:26 +01:00
Ines Montani	200d8bdb3c	Merge branch 'spacy.io' [ci skip]	2019-03-23 16:46:34 +01:00
Ines Montani	1e5b917d75	Fix formatting [ci skip]	2019-03-23 16:45:50 +01:00
Matthew Honnibal	6c783f8045	Bug fixes and options for TextCategorizer (#3472) * Fix code for bag-of-words feature extraction The _ml.py module had a redundant copy of a function to extract unigram bag-of-words features, except one had a bug that set values to 0. Another function allowed extraction of bigram features. Replace all three with a new function that supports arbitrary ngram sizes and also allows control of which attribute is used (e.g. ORTH, LOWER, etc). * Support 'bow' architecture for TextCategorizer This allows efficient ngram bag-of-words models, which are better when the classifier needs to run quickly, especially when the texts are long. Pass architecture="bow" to use it. The extra arguments ngram_size and attr are also available, e.g. ngram_size=2 means unigram and bigram features will be extracted. * Fix size limits in train_textcat example * Explain architectures better in docs	2019-03-23 16:44:44 +01:00
Ines Montani	5944cf10c7	Add blog post to v2.1 page	2019-03-23 16:34:23 +01:00
Ines Montani	ffebdad08d	Add cheat sheet to spaCy 101	2019-03-23 16:32:55 +01:00
Ines Montani	06bf130890	💫 Add better and serializable sentencizer (#3471) * Add better serializable sentencizer component * Replace default factory * Add tests * Tidy up * Pass test * Update docs	2019-03-23 15:45:02 +01:00

10091 Commits