spaCy

mirror of https://github.com/explosion/spaCy.git synced 2026-01-21 15:55:05 +03:00

Author	SHA1	Message	Date
Matthew Honnibal	af945ea8e2	Merge branch 'master' of https://github.com/explosion/spaCy	2017-07-22 15:09:59 +02:00
Matthew Honnibal	4b2e5e59ed	Add flush_cache method to tokenizer, to fix #1061. The tokenizer caches output for common chunks, for efficiency. This cache must be invalidated when the tokenizer rules change, e.g. when a new special-case rule is introduced. That's what was causing #1061. When the cache is flushed, we free the intermediate token chunks. I think this is safe, but if we start getting segfaults, this patch is to blame. The resolution would be to simply not free those bits of memory. They'll be freed when the tokenizer exits anyway.	2017-07-22 15:06:50 +02:00
Ines Montani	96df9c7154	Update CONTRIBUTORS.md	2017-07-22 15:05:46 +02:00
ines	b22b18a019	Add notes on spacy.explain() to annotation docs	2017-07-22 15:02:15 +02:00
ines	e3f23f9d91	Use latest available version in examples	2017-07-22 14:57:51 +02:00
Matthew Honnibal	23a55b40ca	Default to English noun chunks iterator if no lang set	2017-07-22 14:15:25 +02:00
Matthew Honnibal	9750a0128c	Fix Span.noun_chunks. Closes #1207	2017-07-22 14:14:57 +02:00
Matthew Honnibal	d9b85675d7	Rename regression test	2017-07-22 14:14:35 +02:00
Matthew Honnibal	dfbc7e49de	Add test for Issue #1207	2017-07-22 14:14:01 +02:00
Matthew Honnibal	0ae3807d7d	Fix gaps in Lexeme API. Closes #1031	2017-07-22 13:53:48 +02:00
Matthew Honnibal	83e1b5f1e3	Merge branch 'master' of https://github.com/explosion/spaCy	2017-07-22 13:45:35 +02:00
Matthew Honnibal	45f6961ae0	Add __version__ symbol in __init__.py	2017-07-22 13:45:21 +02:00
Matthew Honnibal	8b9c4c5e1c	Add missing SP symbol to tag map, re #1052	2017-07-22 13:44:17 +02:00
Ines Montani	69396dcfd3	Update CONTRIBUTORS.md	2017-07-22 13:43:15 +02:00
Ines Montani	9af04ea11f	Merge pull request #1161 from AlexisEidelman/patch-1 French NUM_WORDS and ORDINAL_WORDS	2017-07-22 13:40:46 +02:00
Matthew Honnibal	8b581fdac5	Remove unused example	2017-07-22 13:36:54 +02:00
Matthew Honnibal	44dd247e73	Merge branch 'master' of https://github.com/explosion/spaCy	2017-07-22 13:35:30 +02:00
Matthew Honnibal	94267ec50f	Fix merge conflict in printer	2017-07-22 13:35:15 +02:00
Ines Montani	c7708dc736	Merge pull request #1177 from swierh/master Dutch NUM_WORDS and ORDINAL_WORDS	2017-07-22 13:35:08 +02:00
Matthew Honnibal	5916d46ba8	Avoid use of deepcopy in printer	2017-07-22 13:34:01 +02:00
Matthew Honnibal	a405660068	Add commit to tagger example	2017-07-22 13:32:48 +02:00
Matthew Honnibal	3fef5f642b	Rename tagger training example	2017-07-22 13:29:15 +02:00
Matthew Honnibal	8bb443be4f	Add standalone tagger training example	2017-07-22 13:28:51 +02:00
Ines Montani	7c66691790	Merge pull request #1197 from jsparedes/patch-1 Fix url broken	2017-07-21 14:05:26 +02:00
Jorge Paredes	fadacd0d47	Fix url broken. The related URL to custom named entities was broken.	2017-07-16 10:06:32 -05:00
Ines Montani	2d22b63e09	Merge pull request #1186 from lgenerknol/master .../cli/#foo is 404	2017-07-13 17:33:55 +02:00
lgenerknol	2b219caf0d	.../cli/#foo is 404. https://spacy.io/docs/usage/cli/#package is a 404. Changed to https://spacy.io/docs/usage/cli#package. A larger fix to deal with trailing slashes is definitely possible.	2017-07-12 13:12:24 -04:00
Ines Montani	d79fa8743a	Merge pull request #1185 from lgenerknol/master Missing markup char	2017-07-12 17:27:42 +02:00
lgenerknol	6cf2690943	Missing markup char. Frontend displayed: ``` If start_idx and do not mark[...] ```. Note the missing "end_idx" after 'and'.	2017-07-12 11:06:16 -04:00
Ines Montani	9eca6503c1	Merge pull request #1157 from polm/master Add basic Japanese Tokenizer Test	2017-07-10 13:07:11 +02:00
Paul O'Leary McCann	bc87b815cc	Add comment clarifying what LANGUAGES does	2017-07-09 16:28:55 +09:00
Paul O'Leary McCann	04e6a65188	Remove Japanese from LANGUAGES. LANGUAGES is a list of languages whose tokenizers get run through a variety of generic tests. Since the generic tests don't check the JA fixture, it blows up when it can't find janome. -POLM	2017-07-09 16:23:26 +09:00
Ines Montani	2b9411bb54	Merge pull request #1181 from val314159/patch-1 make this work in python2.7	2017-07-08 00:15:47 +02:00
val314159	19d4706f69	make this work in python2.7	2017-07-07 13:18:17 -07:00
Swier	29720150f9	fix import of stop words in language data	2017-07-05 14:08:04 +02:00
Swier	f377c9c952	Rename stop_words.py to word_sets.py	2017-07-05 14:06:28 +02:00
Swier	5357874bf7	add Dutch numbers and ordinals	2017-07-05 14:03:30 +02:00
Ines Montani	84eb9d6bd3	Merge pull request #1167 from callumkift/fix/docs-ner-training Fixed error training NER documentation and example	2017-07-01 11:46:31 +02:00
Ines Montani	0c7f5af5ee	Merge pull request #1168 from gispk47/master Update zh language error	2017-07-01 11:43:12 +02:00
gispk47	669bd14213	Update __init__.py. Remove the empty strings returned by jieba.cut, which otherwise cause an assertion error because the list of tokens can't be pushed.	2017-07-01 13:12:00 +08:00
Callum Kift	dfaeee1f37	fixed bug in training ner documentation and example	2017-06-30 09:56:33 +02:00
Paul O'Leary McCann	c336193392	Parametrize and extend Japanese tokenizer tests	2017-06-29 00:09:40 +09:00
Paul O'Leary McCann	30a34ebb6e	Add importorskip for janome	2017-06-29 00:09:20 +09:00
Alexis	1b3a5d87ba	French NUM_WORDS and ORDINAL_WORDS	2017-06-28 14:11:20 +02:00
Paul O'Leary McCann	e56fea14eb	Add basic Japanese tokenizer test	2017-06-28 01:24:25 +09:00
Paul O'Leary McCann	84041a2bb5	Make create_tokenizer work with Japanese	2017-06-28 01:18:05 +09:00
Ines Montani	f69ff15089	Update CONTRIBUTORS.md	2017-06-27 14:49:02 +02:00
Ines Montani	d6e08f2bf6	Merge pull request #1142 from garfieldnate/patch-1 fix confusing typo	2017-06-26 10:41:47 +02:00
Nathan Glenn	81166c3d56	Fix confusing typo. This document describes the `Vocab` class, not the `Span` class.	2017-06-21 19:22:30 +02:00
Ines Montani	9335736c20	Merge pull request #1127 from bartbroere/master Fixed a minor typo in the documentation	2017-06-13 13:15:20 +02:00
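
Note on commit 4b2e5e59ed: its message describes a tokenizer that caches output per chunk and has to drop that cache when a special-case rule is added. The following is a minimal Python sketch of that idea only, not spaCy's actual implementation. The class `ToyTokenizer`, its attributes, and the `add_special_case` helper are hypothetical names chosen for illustration; only the name `flush_cache` is taken from the commit message.

```python
class ToyTokenizer:
    """Hypothetical whitespace tokenizer with special cases and a chunk cache."""

    def __init__(self):
        self.special_cases = {}  # chunk -> list of tokens (assumed rule format)
        self._cache = {}         # chunk -> previously computed tokens

    def add_special_case(self, chunk, tokens):
        self.special_cases[chunk] = list(tokens)
        # Cached output may predate the new rule, so it must be dropped.
        self.flush_cache()

    def flush_cache(self):
        self._cache.clear()

    def _tokenize_chunk(self, chunk):
        # Apply a special-case rule if one exists, else keep the chunk whole.
        if chunk in self.special_cases:
            return list(self.special_cases[chunk])
        return [chunk]

    def __call__(self, text):
        tokens = []
        for chunk in text.split():
            if chunk not in self._cache:
                self._cache[chunk] = self._tokenize_chunk(chunk)
            tokens.extend(self._cache[chunk])
        return tokens


tokenizer = ToyTokenizer()
before = tokenizer("I dont know")                 # fills the cache for "dont"
tokenizer.add_special_case("dont", ["do", "nt"])  # rule change flushes the cache
after = tokenizer("I dont know")
```

If `add_special_case` did not call `flush_cache`, the second call would still return the stale cached entry for "dont", which is the kind of behaviour the commit message attributes to #1061.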

5153 Commits