spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-07-10 00:02:19 +03:00

Author	SHA1	Message	Date
Matthew Honnibal	331d338b8b	Merge pull request #1246 from polm/ja-pos-tagger [wip] Sample implementation of Japanese Tagger (ref #1214)	2017-10-09 04:00:53 +02:00
Wannaphong Phatthiyaphaibun	1abf472068	add th test	2017-09-21 12:56:58 +07:00
Paul O'Leary McCann	95050201ce	Add importorskip for Japanese fixture	2017-08-22 21:30:59 +09:00
Paul O'Leary McCann	6e9e686568	Sample implementation of Japanese Tagger (ref #1214). This is far from complete but it should be enough to check some things. 1. Mecab transition. Janome doesn't support Unidic, only IPAdic, but UD tag mappings are based on Unidic. This switches from Janome to Mecab to get around that. 2. Raw tag extension. A simple tag map can't meet the specifications for UD tag mappings, so this adds an extra field to ambiguous cases. For this demo it just deals with the simplest case, which only needs to look at the literal token. (In reality it may be necessary to look at the whole sentence, but that's another issue.) 3. General code structure. Seems nobody else has implemented a custom Tagger yet, so still not sure this is the correct way to pass the vocabulary around, for example. Any feedback would be greatly appreciated. -POLM	2017-08-08 01:27:15 +09:00
Paul O'Leary McCann	bc87b815cc	Add comment clarifying what LANGUAGES does	2017-07-09 16:28:55 +09:00
Paul O'Leary McCann	04e6a65188	Remove Japanese from LANGUAGES. LANGUAGES is a list of languages whose tokenizers get run through a variety of generic tests. Since the generic tests don't check the JA fixture, it blows up when it can't find janome. -POLM	2017-07-09 16:23:26 +09:00
Paul O'Leary McCann	30a34ebb6e	Add importorskip for janome	2017-06-29 00:09:20 +09:00
Paul O'Leary McCann	e56fea14eb	Add basic Japanese tokenizer test	2017-06-28 01:24:25 +09:00
luvogels	d12a0b6431	Hooked up tokenizer tests	2017-04-26 23:21:41 +02:00
oeg	c693d40791	feature(model): Add support for creating the Spanish model, including rich tagset, configuration, and basic tests	2017-04-06 18:48:45 +02:00
Ines Montani	97cb4d5e3c	Merge branch 'master' into master	2017-03-25 10:03:47 +01:00
Iddo Berger	da135bd823	add hebrew tokenizer	2017-03-24 18:27:44 +03:00
Matthew Honnibal	a630726b13	Fix typo in tests	2017-03-16 20:50:36 -05:00
Matthew Honnibal	f98b30583f	Fix tests	2017-03-16 19:48:00 -05:00
Matthew Honnibal	db51abf685	Fix tests	2017-03-16 18:53:47 -05:00
Aniruddha Adhikary	696215a3fb	add tests for Bengali	2017-03-05 11:25:12 +06:00
ines	21f09d10d7	Revert "Revert "Merge pull request #818 from raphael0202/tokenizer_exceptions"" This reverts commit `f02a2f9322`.	2017-02-10 13:17:05 +01:00
ines	f02a2f9322	Revert "Merge pull request #818 from raphael0202/tokenizer_exceptions" This reverts commit `b95afdf39c`, reversing changes made to `b0ccf32378`.	2017-02-09 17:07:21 +01:00
Raphaël Bournhonesque	309da78bf0	Merge branch 'master' into tokenizer_exceptions	2017-02-09 16:32:12 +01:00
Michael Wallin	35100c8bdd	[issue 805] Add regression test and the required fixture	2017-02-04 16:21:34 +02:00
Michael Wallin	1a1952afa5	[finnish] Add initial tests for tokenizer	2017-02-04 13:54:10 +02:00
Raphaël Bournhonesque	85f951ca99	Add tokenizer exceptions for French	2017-02-02 08:36:16 +01:00
Raphaël Bournhonesque	1be9c0e724	Add fr tokenization unit tests	2017-01-24 10:57:37 +01:00
Ines Montani	4bb5b89ee4	Add text_file_b fixture using BytesIO	2017-01-13 02:23:50 +01:00
Ines Montani	09acfbca01	Add Lemmatizer fixture	2017-01-12 23:38:55 +01:00
Ines Montani	514bfa2597	Add path fixture for spaCy data path	2017-01-12 23:38:47 +01:00
Ines Montani	d5d774413a	Update comments on EN and DE fixtures	2017-01-12 22:03:07 +01:00
Ines Montani	eac3f700fb	Add fixture for entity recognizer	2017-01-12 21:56:32 +01:00
Ines Montani	aeb747e10c	Adjust formatting	2017-01-12 16:51:12 +01:00
Ines Montani	9b6784bab5	Add fixture for StringStore	2017-01-12 15:05:40 +01:00
Ines Montani	09807addff	Add en_parser fixture	2017-01-11 21:29:59 +01:00
Ines Montani	928db7e419	Fix StringIO import for Python 3	2017-01-11 14:07:48 +01:00
Ines Montani	c682b8ca90	Merge conftests into one cohesive file	2017-01-11 13:56:32 +01:00
Matthew Honnibal	0cf4aff470	Set default path in EN/DE tests.	2016-10-17 01:52:49 +02:00
Matthew Honnibal	9cc9ce0f14	Load with default path=False in tests.	2016-10-15 14:13:23 +02:00
Matthew Honnibal	1318d0bc65	Test with the non-loaded versions of the English and German pipelines.	2016-10-12 19:13:31 +02:00
Matthew Honnibal	2debc4e0a2	Add .blank() method to Parser. Start housing default dep labels and entity types within the Defaults class.	2016-09-26 11:57:54 +02:00
Wolfgang Seeker	7b246c13cb	reformulate noun chunk tests for English	2016-05-03 14:24:35 +02:00
Wolfgang Seeker	1786331cd8	add model sanity test	2016-05-03 12:51:47 +02:00
Henning Peters	235f094534	untangle data_path/via	2016-01-16 12:23:45 +01:00
Matthew Honnibal	aec130af56	Use util.Package class for io. Previous Sputnik integration caused API change: Vocab, Tagger, etc. were loaded via a from_package classmethod, that required a sputnik.Package instance. This forced users to first create a sputnik.Sputnik() instance, in order to acquire a Package via sp.pool(). Instead I've created a small file-system shim, util.Package, which allows classes to have a .load() classmethod, that accepts either util.Package objects, or strings. We can later gut the internals of this and make it a proxy for Sputnik if we need more functionality that should live in the Sputnik library. Sputnik is now only used to download and install the data, in spacy.en.download.	2015-12-29 18:00:48 +01:00
Henning Peters	9027cef3bc	access model via sputnik	2015-12-07 06:01:28 +01:00
Matthew Honnibal	4e16f9e435	* Move tests underneath spacy/	2015-10-26 00:07:31 +11:00

43 Commits