Commit Graph

9609 Commits

Author SHA1 Message Date
Ines Montani
ea20b72c08 💫 Make like_num work for prefixed numbers (#2808)
* Only split + prefix if not numbers

* Make like_num work for prefixed numbers

* Add test for like_num
2018-10-01 10:49:14 +02:00
John Stewart
9faea3ff10 Update Keras Example for (Parikh et al, 2016) implementation (#2803)
* bug fixes in keras example

* created contributor agreement

* baseline for Parikh model

* initial version of parikh 2016 implemented

* tested asymmetric models

* fixed grevious error in normalization

* use standard SNLI test file

* begin to rework parikh example

* initial version of running example

* start to document the new version

* start to document the new version

* Update Decompositional Attention.ipynb

* fixed calls to similarity

* updated the README

* import sys package duh

* simplified indexing on mapping word to IDs

* stupid python indent error

* added code from https://github.com/tensorflow/tensorflow/issues/3388 for tf bug workaround
2018-10-01 10:28:45 +02:00
Ioannis Daras
405a826436 Correct error in spacy universe docs concerning spacy-lookup (#2814) 2018-10-01 10:24:50 +02:00
Filipe Caixeta
6c498f9ff4 Update Portuguese Language (#2790)
* Add words to portuguese language _num_words

* Add words to portuguese language _num_words

* Portuguese - Add/remove stopwords, fix tokenizer, add currency symbols

* Extended punctuation and norm_exceptions in the Portuguese language
2018-09-29 09:51:45 +02:00
Matthew Honnibal
b39810d692 Fix copy_reg compatibility on _serialize module 2018-09-28 15:23:14 +02:00
Matthew Honnibal
f82f8ba5dd Fix serialization when empty parser model. Closes #2482 2018-09-28 15:18:52 +02:00
Matthew Honnibal
d5a6c63b62 Add regression test for #2482 2018-09-28 15:18:30 +02:00
Matthew Honnibal
e3e9fe18d4 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2018-09-28 14:27:35 +02:00
Matthew Honnibal
0323f5be0c Fix _serialize module 2018-09-28 14:27:24 +02:00
Matthew Honnibal
05b6103a0c Try to fix version pin for msgpack-numpy 2018-09-28 14:07:00 +02:00
Ines Montani
5d56eb70d7 Tidy up tests 2018-09-27 16:41:57 +02:00
Ines Montani
1f1bab9264 Remove unused import 2018-09-27 16:41:37 +02:00
Matthew Honnibal
6430b1fe64 Restore encoding arg on msgpack-numpy 2018-09-27 15:58:21 +02:00
Matthew Honnibal
276aa83d1a Require older msgpack-numpy 2018-09-27 15:34:24 +02:00
Matthew Honnibal
2ac69facc6 Fix Python 2 test failure 2018-09-27 15:34:16 +02:00
Matthew Honnibal
72778375fb Merge branch 'master' of https://github.com/explosion/spaCy 2018-09-27 13:54:49 +02:00
Matthew Honnibal
96fe314d8d Fix bug when too many entity types. Fixes #2800 2018-09-27 13:54:34 +02:00
Suraj Rajan
bbdc6456c6 Set up dependency tree pattern matching skeleton (#2732) 2018-09-27 13:27:18 +02:00
Matthew Honnibal
8809dc4514 Remove deprecated encoding argument to msgpack 2018-09-27 12:56:23 +02:00
Matthew Honnibal
bae6b3e2b3 Merge branch 'master' of https://github.com/explosion/spaCy 2018-09-27 12:50:31 +02:00
Ines Montani
71cdbeada7 Revert "Also include lowercase norm exceptions"
This reverts commit 70f4e8adf3.
2018-09-27 12:29:25 +02:00
Charles-Axel Dein
014dd47c70 Add jupyter=True to displacy.render in documentation (#2806) 2018-09-27 12:28:04 +02:00
Przemysław Hojnacki
966b583d5e agreement of contributor, may I introduce a tiny pl languge contribution (#2799)
* Contributors agreement

* Contributors agreement

* Contributors agreement
2018-09-27 12:25:22 +02:00
Charles-Axel Dein
94ad3c55f1 Add charlax's contributor agreement (#2805) 2018-09-27 12:24:42 +02:00
darindf
8227566805 Fix error (#2802)
* Fix error
ValueError: cannot resize an array that references or is referenced
by another array in this way.  Use the resize function

* added spaCy Contributor Agreement
2018-09-26 21:31:03 +02:00
Ines Montani
5e0dfb34fa Merge branch 'master' of https://github.com/explosion/spaCy 2018-09-26 11:13:58 +02:00
Ines Montani
70f4e8adf3 Also include lowercase norm exceptions 2018-09-25 12:22:02 +02:00
Keshan
9a016d17c2 Adding basic support for Sinhala language. (#2788)
* adding Sinhala language package, stop words, examples and lex_attrs.

* Adding contributor agreement

* Updating contributor agreement
2018-09-25 12:18:25 +02:00
Pranshu Jethmalani
9fd27d777e Fix typo (#2795) [ci skip]
Fixed typo on line 6 "regcognizer --> recognizer"
2018-09-25 12:12:40 +02:00
Matthew Honnibal
b42c123e5d Fix regression introduced by 1759abf1e 2018-09-25 11:08:58 +02:00
Matthew Honnibal
500898907b Fix regression in parser.begin_training() 2018-09-25 11:08:31 +02:00
Ines Montani
3c4e3ade30 Fix typo (closes #2784) 2018-09-21 10:45:11 +02:00
mauryaland
68b3c544d5 Adding French hyphenated first name (#2786) 2018-09-21 10:38:13 +02:00
Matthew Honnibal
1759abf1e5 Fix bug in sentence starts for non-projective parses
The set_children_from_heads function assumed parse trees were
projective. However, non-projective parses may be passed in during
deserialization, or after deprojectivising. This caused incorrect
sentence boundaries to be set for non-projective parses. Close #2772.
2018-09-19 14:50:06 +02:00
Matthew Honnibal
48fd36bf05 Fix test for issue 27772 2018-09-19 14:47:27 +02:00
Matthew Honnibal
6cd920e088 Add xfail test for deprojectivization SBD bug 2018-09-19 14:00:31 +02:00
John Stewart
2d15859d2a Fixed spaCy+Keras example (#2763)
* bug fixes in keras example

* created contributor agreement
2018-09-15 13:06:39 +02:00
Matthew Honnibal
99a6011580 Avoid adding empty layer in model, to keep models backwards compatible 2018-09-14 22:51:58 +02:00
Matthew Honnibal
c046392317 Trigger on_data hooks in parser model 2018-09-14 20:51:21 +02:00
Matthew Honnibal
5afd98dff5 Add a stepping function, for changing batch sizes or learning rates 2018-09-14 18:37:16 +02:00
Matthew Honnibal
27c00f4f22 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2018-09-14 12:30:57 +02:00
Andrew Ongko
81564cc4e8 Update Indonesian model (#2752)
* adding e-KTP in tokenizer exceptions list

* add exception token

* removing lines with containing space as it won't matter since we use .split() method in the end, added new tokens in exception

* add tokenizer exceptions list

* combining base_norms with norm_exceptions

* adding norm_exception

* fix double key in lemmatizer

* remove unused import on punctuation.py

* reformat stop_words to reduce number of lines, improve readibility

* updating tokenizer exception

* implement is_currency for lang/id

* adding orth_first_upper in tokenizer_exceptions

* update the norm_exception list

* remove bunch of abbreviations

* adding contributors file
2018-09-14 12:30:32 +02:00
Filipe Caixeta
fe515085f3 Add words to portuguese language _num_words (#2759)
* Add words to portuguese language _num_words

* Add words to portuguese language _num_words
2018-09-14 12:30:16 +02:00
Matthew Honnibal
f32b52e611 Fix bug that caused deprojectivisation to run multiple times 2018-09-14 12:12:54 +02:00
Matthew Honnibal
8f2a6367e9 Fix usage of PyTorch BiLSTM in ud_train 2018-09-13 22:54:59 +00:00
Matthew Honnibal
afeddfff26 Fix PyTorch BiLSTM 2018-09-13 22:54:34 +00:00
Matthew Honnibal
a26fe8e7bb Small hack in Language.update to make torch work 2018-09-13 22:51:52 +00:00
Matthew Honnibal
445b81ce3f Support bilstm_depth argument in ud-train 2018-09-13 19:30:22 +02:00
Matthew Honnibal
b43643a953 Support bilstm_depth option in parser 2018-09-13 19:29:49 +02:00
Matthew Honnibal
45032fe9e1 Support option of BiLSTM in Tok2Vec (requires pytorch) 2018-09-13 19:28:35 +02:00