Matthew Honnibal
a0b8a26655
Fix missing regex requirement
2018-05-16 23:19:01 +02:00
Matthew Honnibal
74d5c625b3
Use rising beam update prob
2018-05-16 20:11:59 +02:00
Matthew Honnibal
544ae7f1db
Merge branch 'develop' into feature/refactor-parser
2018-05-16 02:06:49 +02:00
Matthew Honnibal
d1b27fe5aa
Revert "Improve dynamic oracle when values are missing in parse"
...
This reverts commit f56bd4736b.
2018-05-16 00:31:52 +02:00
Matthew Honnibal
f3790bdeec
Fix appveyor for Windows
2018-05-15 21:16:39 +02:00
Matthew Honnibal
83acaa0358
Add missing name attribute for parser
2018-05-15 19:01:53 +02:00
Matthew Honnibal
f328c195ca
Fix size limits in training data
2018-05-15 19:01:41 +02:00
Matthew Honnibal
8446b35ce0
Fix parser model loading
2018-05-15 18:43:46 +02:00
Matthew Honnibal
dc1a479fbd
Merge branch 'develop' into feature/refactor-parser
2018-05-15 18:39:21 +02:00
Matthew Honnibal
13faf4e1ea
Update thinc requirement
2018-05-15 18:35:11 +02:00
Matthew Honnibal
546dd99cdf
Merge master into develop -- mostly Arabic and website
2018-05-15 18:14:28 +02:00
Matthew Honnibal
e3fdfba164
Revert hacks to travis.yml
2018-05-15 18:00:24 +02:00
Matthew Honnibal
5664ab7e6c
Revert hacks to tests
2018-05-15 18:00:09 +02:00
Matthew Honnibal
4dd1fb3c7b
Require thinc 6.11.1.dev16
2018-05-15 17:56:07 +02:00
Matthew Honnibal
7b9195657b
Restore beam_density argument for parser beam
2018-05-15 17:55:11 +02:00
Matthew Honnibal
581d318971
Fix conftest
2018-05-15 00:54:45 +02:00
Tahar Zanouda
00417794d3
Add Arabic language ( #2314 )
...
* added support for Arabic lang
* added Arabic language support
* updated conftest
2018-05-15 00:27:19 +02:00
Jani Monoses
0e08e49e87
Lemmatizer ro ( #2319 )
...
* Add Romanian lemmatizer lookup table.
Adapted from http://www.lexiconista.com/datasets/lemmatization/
by replacing cedillas with commas (ș and ț).
The original dataset is licensed under the Open Database License.
* Fix one blatant issue in the Romanian lemmatizer
* Romanian examples file
* Add ro_tokenizer in conftest
* Add Romanian lemmatizer test
2018-05-12 15:20:04 +02:00
vishnumenon
ae3719ece5
Fix the code for FACILITY entities ( #2324 )
...
* Fix the code for FACILITY entities
As far as I can tell, the default models all use "FAC" rather than "FACILITY"
* Added my Contributor Agreement
* Rename vishnumenon to vishnumenon.md
2018-05-12 15:19:17 +02:00
Matthew Honnibal
625ee6c464
Unhack travis.sh
2018-05-10 18:16:11 +02:00
Matthew Honnibal
299621b747
Try running sudo=true for travis
2018-05-10 18:11:11 +02:00
Matthew Honnibal
603907926f
Point thinc to libblas on Travis
2018-05-10 18:06:37 +02:00
Matthew Honnibal
1b294f4798
Tell Thinc to link against system blas on Travis
2018-05-10 18:03:44 +02:00
Matthew Honnibal
c261b5b996
Add some diagnostics to travis.yml to try to figure out why build fails
2018-05-10 17:10:44 +02:00
Matthew Honnibal
887631ca25
Disable some tests to figure out why CI fails
2018-05-10 16:42:01 +02:00
Matthew Honnibal
902a172cb7
Disable some tests to figure out why CI fails
2018-05-10 16:30:07 +02:00
Matthew Honnibal
614d45ea58
Set a more aggressive threshold on the max violation update
2018-05-10 15:38:24 +02:00
Matthew Honnibal
8e8724b55b
Default to beam_update_prob 1
2018-05-10 15:38:02 +02:00
Jani Monoses
42b34832e4
Update Romanian stopword list ( #2316 )
...
* Contributor agreement for janimo
* Update Romanian stopword list
Include the correct spellings of all the words already in the repo
that are using cedillas (ş and ţ) instead of commas (ș and ț).
Add another unrelated spelling fix.
See https://github.com/stopwords-iso/stopwords-ro/pull/1 and
https://github.com/stopwords-iso/stopwords-ro/pull/2
2018-05-10 12:16:56 +02:00
Lucas Abbade
18af53014f
Adding my contributor agreement ( #2315 )
...
* Create LRAbbade.md
* Update LRAbbade.md
2018-05-09 21:25:05 +02:00
Lucas Abbade
be7fdc59d1
Update lex_attrs.py ( #2307 )
...
* Update lex_attrs.py
Fixed spelling mistakes of some numbers (according to Brazilian Portuguese).
* Update lex_attrs.py
As requested, I've included the correct spelling for both Brazilian Portuguese and Portuguese Portuguese.
I would advise, however, that the two be separated in the future. Brazilian Portuguese is very different from the original language: although most of the writing is unified, the way people talk in the two countries is radically different. Keeping both as one language may lead to bigger issues in the future, especially when it comes to spell checking.
2018-05-09 20:49:31 +02:00
mauryaland
5368ba028a
Update stop_words.py for French language ( #2310 )
...
* Add contraction forms of some common stopwords
All the stopwords added contain the apostrophe " ' " or " ’ ".
* Adds contributor agreement mauryaland
* Update mauryaland.md
2018-05-09 12:04:38 +02:00
Matthew Honnibal
a61fd60681
Fix error in beam gradient calculation
2018-05-09 02:44:09 +02:00
Matthew Honnibal
a6ae1ee6f7
Don't modify Token in global scope
2018-05-09 00:43:00 +02:00
Matthew Honnibal
f94f721f40
Avoid importing fused token symbol in ud-run-test, until that's added
2018-05-09 00:28:03 +02:00
Matthew Honnibal
659ec5b975
Avoid importing fused token symbol in ud-run-test, until that's added
2018-05-08 19:40:33 +02:00
Matthew Honnibal
4cb0494bef
Bug fixes to beam search after refactor
2018-05-08 13:48:50 +02:00
Matthew Honnibal
5ed71973b3
Add a keyword argument sink to GoldParse
2018-05-08 13:48:32 +02:00
Matthew Honnibal
8cfe326f87
Avoid relying on final gold check in beam search
2018-05-08 13:48:19 +02:00
Matthew Honnibal
fc4dd49b77
Support oracle segmentation in ud-train CLI command
2018-05-08 13:47:45 +02:00
Matthew Honnibal
c49e44349a
Fix beam parsing
2018-05-08 02:53:24 +02:00
Matthew Honnibal
99649d114d
Fix parser
2018-05-08 00:27:26 +02:00
Matthew Honnibal
8a82367a9d
Fix beam search after refactor
2018-05-08 00:20:33 +02:00
Matthew Honnibal
5a0f26be0c
Re-add beam search after refactor
2018-05-08 00:19:52 +02:00
ines
7a3599c21a
Fix formatting and consistency
2018-05-07 23:02:11 +02:00
ines
37facf9b4d
Add config for no-response [ci skip]
2018-05-07 22:04:54 +02:00
ines
ac25bc4016
Add docs section on sentence segmentation [ci skip]
2018-05-07 21:25:20 +02:00
ines
14148cd147
Fix formatting and wording
2018-05-07 21:24:35 +02:00
ines
f803da609f
Add scattertext [ci skip]
2018-05-07 19:10:23 +02:00
ines
a685fff875
Merge branch 'master' of https://github.com/explosion/spaCy
2018-05-07 18:58:57 +02:00