spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-09-08 21:35:00 +03:00

Author	SHA1	Message	Date
Ines Montani	ea20b72c08	💫 Make like_num work for prefixed numbers (#2808) * Only split + prefix if not numbers * Make like_num work for prefixed numbers * Add test for like_num	2018-10-01 10:49:14 +02:00
John Stewart	9faea3ff10	Update Keras Example for (Parikh et al, 2016) implementation (#2803) * bug fixes in keras example * created contributor agreement * baseline for Parikh model * initial version of parikh 2016 implemented * tested asymmetric models * fixed grievous error in normalization * use standard SNLI test file * begin to rework parikh example * initial version of running example * start to document the new version * start to document the new version * Update Decompositional Attention.ipynb * fixed calls to similarity * updated the README * import sys package duh * simplified indexing on mapping word to IDs * stupid python indent error * added code from https://github.com/tensorflow/tensorflow/issues/3388 for tf bug workaround	2018-10-01 10:28:45 +02:00
Ioannis Daras	405a826436	Correct error in spacy universe docs concerning spacy-lookup (#2814)	2018-10-01 10:24:50 +02:00
Filipe Caixeta	6c498f9ff4	Update Portuguese Language (#2790) * Add words to portuguese language _num_words * Add words to portuguese language _num_words * Portuguese - Add/remove stopwords, fix tokenizer, add currency symbols * Extended punctuation and norm_exceptions in the Portuguese language	2018-09-29 09:51:45 +02:00
Matthew Honnibal	b39810d692	Fix copy_reg compatibility on _serialize module	2018-09-28 15:23:14 +02:00
Matthew Honnibal	f82f8ba5dd	Fix serialization when empty parser model. Closes #2482	2018-09-28 15:18:52 +02:00
Matthew Honnibal	d5a6c63b62	Add regression test for #2482	2018-09-28 15:18:30 +02:00
Matthew Honnibal	e3e9fe18d4	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2018-09-28 14:27:35 +02:00
Matthew Honnibal	0323f5be0c	Fix _serialize module	2018-09-28 14:27:24 +02:00
Matthew Honnibal	05b6103a0c	Try to fix version pin for msgpack-numpy	2018-09-28 14:07:00 +02:00
Ines Montani	5d56eb70d7	Tidy up tests	2018-09-27 16:41:57 +02:00
Ines Montani	1f1bab9264	Remove unused import	2018-09-27 16:41:37 +02:00
Matthew Honnibal	6430b1fe64	Restore encoding arg on msgpack-numpy	2018-09-27 15:58:21 +02:00
Matthew Honnibal	276aa83d1a	Require older msgpack-numpy	2018-09-27 15:34:24 +02:00
Matthew Honnibal	2ac69facc6	Fix Python 2 test failure	2018-09-27 15:34:16 +02:00
Matthew Honnibal	72778375fb	Merge branch 'master' of https://github.com/explosion/spaCy	2018-09-27 13:54:49 +02:00
Matthew Honnibal	96fe314d8d	Fix bug when too many entity types. Fixes #2800	2018-09-27 13:54:34 +02:00
Suraj Rajan	bbdc6456c6	Set up dependency tree pattern matching skeleton (#2732)	2018-09-27 13:27:18 +02:00
Matthew Honnibal	8809dc4514	Remove deprecated encoding argument to msgpack	2018-09-27 12:56:23 +02:00
Matthew Honnibal	bae6b3e2b3	Merge branch 'master' of https://github.com/explosion/spaCy	2018-09-27 12:50:31 +02:00
Ines Montani	71cdbeada7	Revert "Also include lowercase norm exceptions" This reverts commit `70f4e8adf3`.	2018-09-27 12:29:25 +02:00
Charles-Axel Dein	014dd47c70	Add jupyter=True to displacy.render in documentation (#2806)	2018-09-27 12:28:04 +02:00
Przemysław Hojnacki	966b583d5e	agreement of contributor, may I introduce a tiny pl language contribution (#2799) * Contributors agreement * Contributors agreement * Contributors agreement	2018-09-27 12:25:22 +02:00
Charles-Axel Dein	94ad3c55f1	Add charlax's contributor agreement (#2805)	2018-09-27 12:24:42 +02:00
darindf	8227566805	Fix error (#2802) * Fix error ValueError: cannot resize an array that references or is referenced by another array in this way. Use the resize function * added spaCy Contributor Agreement	2018-09-26 21:31:03 +02:00
Ines Montani	5e0dfb34fa	Merge branch 'master' of https://github.com/explosion/spaCy	2018-09-26 11:13:58 +02:00
Ines Montani	70f4e8adf3	Also include lowercase norm exceptions	2018-09-25 12:22:02 +02:00
Keshan	9a016d17c2	Adding basic support for Sinhala language. (#2788) * adding Sinhala language package, stop words, examples and lex_attrs. * Adding contributor agreement * Updating contributor agreement	2018-09-25 12:18:25 +02:00
Pranshu Jethmalani	9fd27d777e	Fix typo (#2795) [ci skip] Fixed typo on line 6 "regcognizer --> recognizer"	2018-09-25 12:12:40 +02:00
Matthew Honnibal	b42c123e5d	Fix regression introduced by `1759abf1e`	2018-09-25 11:08:58 +02:00
Matthew Honnibal	500898907b	Fix regression in parser.begin_training()	2018-09-25 11:08:31 +02:00
Ines Montani	3c4e3ade30	Fix typo (closes #2784)	2018-09-21 10:45:11 +02:00
mauryaland	68b3c544d5	Adding French hyphenated first name (#2786)	2018-09-21 10:38:13 +02:00
Matthew Honnibal	1759abf1e5	Fix bug in sentence starts for non-projective parses The set_children_from_heads function assumed parse trees were projective. However, non-projective parses may be passed in during deserialization, or after deprojectivising. This caused incorrect sentence boundaries to be set for non-projective parses. Close #2772.	2018-09-19 14:50:06 +02:00
Matthew Honnibal	48fd36bf05	Fix test for issue 2772	2018-09-19 14:47:27 +02:00
Matthew Honnibal	6cd920e088	Add xfail test for deprojectivization SBD bug	2018-09-19 14:00:31 +02:00
John Stewart	2d15859d2a	Fixed spaCy+Keras example (#2763) * bug fixes in keras example * created contributor agreement	2018-09-15 13:06:39 +02:00
Matthew Honnibal	99a6011580	Avoid adding empty layer in model, to keep models backwards compatible	2018-09-14 22:51:58 +02:00
Matthew Honnibal	c046392317	Trigger on_data hooks in parser model	2018-09-14 20:51:21 +02:00
Matthew Honnibal	5afd98dff5	Add a stepping function, for changing batch sizes or learning rates	2018-09-14 18:37:16 +02:00
Matthew Honnibal	27c00f4f22	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2018-09-14 12:30:57 +02:00
Andrew Ongko	81564cc4e8	Update Indonesian model (#2752) * adding e-KTP in tokenizer exceptions list * add exception token * removing lines with containing space as it won't matter since we use .split() method in the end, added new tokens in exception * add tokenizer exceptions list * combining base_norms with norm_exceptions * adding norm_exception * fix double key in lemmatizer * remove unused import on punctuation.py * reformat stop_words to reduce number of lines, improve readability * updating tokenizer exception * implement is_currency for lang/id * adding orth_first_upper in tokenizer_exceptions * update the norm_exception list * remove bunch of abbreviations * adding contributors file	2018-09-14 12:30:32 +02:00
Filipe Caixeta	fe515085f3	Add words to portuguese language _num_words (#2759) * Add words to portuguese language _num_words * Add words to portuguese language _num_words	2018-09-14 12:30:16 +02:00
Matthew Honnibal	f32b52e611	Fix bug that caused deprojectivisation to run multiple times	2018-09-14 12:12:54 +02:00
Matthew Honnibal	8f2a6367e9	Fix usage of PyTorch BiLSTM in ud_train	2018-09-13 22:54:59 +00:00
Matthew Honnibal	afeddfff26	Fix PyTorch BiLSTM	2018-09-13 22:54:34 +00:00
Matthew Honnibal	a26fe8e7bb	Small hack in Language.update to make torch work	2018-09-13 22:51:52 +00:00
Matthew Honnibal	445b81ce3f	Support bilstm_depth argument in ud-train	2018-09-13 19:30:22 +02:00
Matthew Honnibal	b43643a953	Support bilstm_depth option in parser	2018-09-13 19:29:49 +02:00
Matthew Honnibal	45032fe9e1	Support option of BiLSTM in Tok2Vec (requires pytorch)	2018-09-13 19:28:35 +02:00

9609 Commits