spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-10-04 10:56:45 +03:00

Author	SHA1	Message	Date
Ines Montani	bc40dad7d9	Add entity rules	2016-12-18 15:36:53 +01:00
Ines Montani	eaa3b1319d	Fix formatting	2016-12-18 15:36:53 +01:00
Ines Montani	704c7442e0	Break language data components into their own files	2016-12-18 15:36:53 +01:00
Ines Montani	62655fd36f	Add ENT_ID constant	2016-12-18 15:36:53 +01:00
Matthew Honnibal	fa272fdf12	Merge branch 'organize-language-data' of ssh://github.com/explosion/spaCy into organize-language-data	2016-12-18 15:00:21 +01:00
Matthew Honnibal	57c4341453	Refactor loading of morphology exceptions, adding a method add_special_case.	2016-12-18 14:59:44 +01:00
Ines Montani	77cf2fb0f6	Remove unnecessary argument in test	2016-12-18 14:06:27 +01:00
Ines Montani	121c310566	Remove trailing whitespace	2016-12-18 14:06:27 +01:00
Ines Montani	0fc4e45cb3	Fix tag map for German	2016-12-18 13:30:03 +01:00
Ines Montani	28326649f3	Fix typo	2016-12-18 13:30:03 +01:00
Matthew Honnibal	0595cc0635	Change test595 to mock data, instead of requiring model.	2016-12-18 13:28:51 +01:00
Matthew Honnibal	a4eb5c2bff	Check POS key in lemmatizer, to update it for new data format	2016-12-18 13:28:20 +01:00
Matthew Honnibal	28d63ec58e	Restore missing '' character in tokenizer exceptions.	2016-12-18 05:34:51 +01:00
Ines Montani	a9421652c9	Remove duplicates in tag map	2016-12-17 22:44:31 +01:00
Ines Montani	69baf1c9a8	Fix tag map	2016-12-17 22:44:22 +01:00
Ines Montani	577adad945	Fix formatting	2016-12-17 14:00:52 +01:00
Ines Montani	fc4ad17136	Fix typo	2016-12-17 14:00:47 +01:00
Ines Montani	bb94e784dc	Fix typo	2016-12-17 13:59:30 +01:00
Ines Montani	afda532595	Use symbols in tag map	2016-12-17 13:56:24 +01:00
Ines Montani	07249145c9	Fix formatting	2016-12-17 13:34:46 +01:00
Ines Montani	dd55d085b6	Reformat dutch language data to match new style	2016-12-17 13:26:01 +01:00
Ines Montani	f2c48ef504	Resolve stopwords conflict to merge Dutch	2016-12-17 13:08:16 +01:00
Matthew Honnibal	ff03ade08f	Merge pull request #688 from nlesc-sherlock/dutch: Support for Dutch in SpaCy	2016-12-17 22:44:58 +11:00
Ines Montani	a22322187f	Add missing lemmas to tokenizer exceptions (fixes #674)	2016-12-17 12:42:41 +01:00
Ines Montani	5445074cbd	Expand tokenizer exceptions with unicode apostrophe (fixes #685)	2016-12-17 12:34:08 +01:00
Ines Montani	e0a7b5c612	Fix formatting	2016-12-17 12:33:09 +01:00
Ines Montani	08162dce67	Move shared functions and constants to global language data	2016-12-17 12:32:48 +01:00
Ines Montani	6a60a61086	Move update_exc to global language data utils	2016-12-17 12:29:02 +01:00
Ines Montani	f324311249	Add global language data utils	2016-12-17 12:27:41 +01:00
Ines Montani	487ce1e20a	Add encoding declaration	2016-12-17 12:25:44 +01:00
Ines Montani	d8d50a0334	Add tokenizer exception for "gonna" (fixes #691)	2016-12-17 11:59:28 +01:00
Ines Montani	c69b77d8aa	Revert "Add exception for "gonna"" This reverts commit `280c03f67b`.	2016-12-17 11:56:44 +01:00
Ines Montani	280c03f67b	Add exception for "gonna"	2016-12-17 11:54:59 +01:00
Ines Montani	5031a015e2	Fix typo in stopwords (fixes #689)	2016-12-15 17:57:06 +01:00
Janneke van der Zwaan	4a3fdcce8a	Merge github.com:explosion/spaCy into dutch	2016-12-13 09:25:23 +01:00
Matthew Honnibal	5965d3c2a7	Revert "Add acl to symbols.pyx"	2016-12-12 10:10:28 +11:00
Matthew Honnibal	6dee76dfed	Update symbols.pxd	2016-12-12 10:09:58 +11:00
Pokey Rule	18a15c0777	Add acl to symbols.pyx	2016-12-11 20:00:07 +00:00
Ines Montani	63024466a9	Add Portuguese stopwords	2016-12-08 20:45:07 +01:00
Ines Montani	7bfe2d4abc	Update Portuguese language data	2016-12-08 20:41:41 +01:00
Ines Montani	c0c5f31950	Remove unused data and download script	2016-12-08 20:39:49 +01:00
Ines Montani	0a6d529104	Remove unused data	2016-12-08 20:36:56 +01:00
Ines Montani	1b3b043660	Add French stopwords	2016-12-08 20:12:43 +01:00
Ines Montani	8863e504eb	Update French language data	2016-12-08 20:07:14 +01:00
Ines Montani	7cb9f51be6	Add Italian stopwords	2016-12-08 20:05:25 +01:00
Ines Montani	470a0e0bea	Update Italian language data	2016-12-08 19:52:18 +01:00
Ines Montani	1a284d342e	Add Spanish language data	2016-12-08 19:47:03 +01:00
Ines Montani	0c39654786	Remove unused import	2016-12-08 19:46:53 +01:00
Ines Montani	e47ee94761	Split punctuation into its own file	2016-12-08 19:46:43 +01:00
Ines Montani	70b51ed7c8	Remove time from German language data	2016-12-08 19:45:50 +01:00
Ines Montani	e8ae588be9	Add emoticons	2016-12-08 19:45:18 +01:00
Ines Montani	5908c0ed9f	Fix formatting	2016-12-08 19:45:11 +01:00
Ines Montani	311b30ab35	Reorganize exceptions for English and German	2016-12-08 13:58:32 +01:00
Ines Montani	66c7348cda	Add update_exc util function	2016-12-08 13:58:12 +01:00
Ines Montani	1256232fad	Fix formatting	2016-12-08 13:56:40 +01:00
Ines Montani	8e977cc71c	Fix formatting	2016-12-08 13:56:17 +01:00
Ines Montani	0176b99004	Fix formatting	2016-12-08 12:48:02 +01:00
Ines Montani	877f09218b	Add more custom rules for abbreviations	2016-12-08 12:47:01 +01:00
Ines Montani	bfaa42636c	Update language data for German	2016-12-08 12:01:09 +01:00
Ines Montani	ec44bee321	Fix capitalization on morphological features	2016-12-08 12:00:54 +01:00
Ines Montani	ce979553df	Resolve conflict	2016-12-07 21:16:52 +01:00
Ines Montani	8350d65695	Change morphology and lemmatizer API: Take morphology features as object instead of keyword arguments	2016-12-07 21:12:49 +01:00
Ines Montani	52e7d634df	Remove trailing whitespace	2016-12-07 21:12:19 +01:00
Ines Montani	0d07d7fc80	Apply emoticon exceptions to tokenizer	2016-12-07 21:11:59 +01:00
Ines Montani	71f0f34cb3	Fix formatting	2016-12-07 21:11:29 +01:00
Ines Montani	9413bcd9ee	Declare encoding and unicode literals	2016-12-07 21:10:34 +01:00
Ines Montani	a280ff2657	Fix __all__	2016-12-07 21:10:12 +01:00
Ines Montani	ba8721953c	Add missing emoticons	2016-12-07 21:09:44 +01:00
Ines Montani	1285c4ba93	Update English language data	2016-12-07 20:33:28 +01:00
Ines Montani	79dce0aabe	Add emoticons	2016-12-07 20:33:28 +01:00
Ines Montani	a662a95294	Add line breaks	2016-12-07 20:33:28 +01:00
Ines Montani	07f0efb102	Add test for tokenizer regular expressions	2016-12-07 20:33:28 +01:00
Ines Montani	e0712d1b32	Reformat language data	2016-12-07 20:33:28 +01:00
Matthew Honnibal	0c0f4c965d	Increment version	2016-12-03 11:16:52 +01:00
Matthew Honnibal	f6e356aada	Add (and test) Span.sentiment attribute. By default we average token.span, but can override with custom hook. Re Issue #667	2016-12-02 11:05:50 +01:00
Janneke van der Zwaan	88869e0e07	Merge github.com:explosion/spaCy into dutch	2016-11-30 17:13:39 +01:00
Janneke van der Zwaan	51ade86b86	Update language data with tag map from UD_Dutch	2016-11-30 14:41:23 +01:00
Janneke van der Zwaan	90f6ff12c9	Update Dutch language data - Use Dutch tag map - remove tokenizer exceptions	2016-11-30 11:59:39 +01:00
dafnevk	7b8f4c49f2	Added language Dutch to init file	2016-11-29 16:42:05 +01:00
Matthew Honnibal	296d33a4fc	Merge branch 'master' of ssh://github.com/explosion/spaCy	2016-11-26 12:36:18 +01:00
Matthew Honnibal	1f6c37c6f5	Fix create_tokenizer when nlp is None	2016-11-26 12:36:04 +01:00
Matthew Honnibal	c7889492f9	Fix model saving error for Python 3	2016-11-25 18:04:30 -06:00
Matthew Honnibal	bc0a202c9c	Fix unicode problem in nonproj module	2016-11-25 17:29:17 -06:00
Matthew Honnibal	6dd3b94fa6	Filter out deprecated attributes when reading special-case tokenization rules.	2016-11-25 09:57:18 -06:00
Matthew Honnibal	e879c79b8c	Merge branch 'master' of https://github.com/explosion/spaCy	2016-11-25 09:18:28 -06:00
Matthew Honnibal	a335c6dcc2	Exclude morphs from deprecated token attributes for now	2016-11-25 16:17:32 +01:00
Matthew Honnibal	f799a07f25	Merge branch 'master' of https://github.com/explosion/spaCy	2016-11-25 09:16:43 -06:00
Matthew Honnibal	159e8c46e1	Merge old training fixes with newer state	2016-11-25 09:16:36 -06:00
Matthew Honnibal	846e80f2f4	Exclude morphs from deprecated token attributes for now	2016-11-25 16:14:54 +01:00
Matthew Honnibal	664f2dd1c0	Allow dep to be None in scorer, for missing labels.	2016-11-25 09:02:49 -06:00
Matthew Honnibal	39341598bb	Fix NER label calculation	2016-11-25 09:02:22 -06:00
Matthew Honnibal	ca773a1f53	Tweak arc_eager n_gold to deal with negative costs, and improve error message.	2016-11-25 09:01:52 -06:00
Matthew Honnibal	a2f55e7015	Pass cfg through loading, for training.	2016-11-25 09:01:20 -06:00
Matthew Honnibal	608d8f5421	Pass cfg through parser, and have is_valid default to 1, not 0 when resetting state	2016-11-25 09:00:21 -06:00
Matthew Honnibal	cc7e607a8a	Fix gold.pyx for 1.0	2016-11-25 08:57:59 -06:00
root	080d29e092	Fix train.py for 1.0	2016-11-25 08:55:33 -06:00
Matthew Honnibal	6652f2a135	Test #656, #624: special case rules for tokenizer with attributes.	2016-11-25 12:44:13 +01:00
Matthew Honnibal	1e0f566d95	Fix #656, #624: Support arbitrary token attributes when adding special-case rules.	2016-11-25 12:43:24 +01:00
Matthew Honnibal	87613edf8f	Add set_struct_attr staticmethod to token	2016-11-25 12:41:47 +01:00
Matthew Honnibal	fb69aa648f	Merge branch 'master' of ssh://github.com/explosion/spaCy	2016-11-25 11:35:44 +01:00

2073 Commits