Commit Graph

10303 Commits

Author SHA1 Message Date
Ines Montani
a8416c46f7 Use string name in setup.py
Hopefully this will trick GitHub's parser into recognising it as a Python package and show us the dependents / "used by" statistics 🤞
2019-05-28 17:11:39 +02:00
svlandeg
992fa92b66 refactor again to clusters of entities and cosine similarity 2019-05-28 00:05:22 +02:00
svlandeg
8c4aa076bc small fixes 2019-05-27 14:29:38 +02:00
Ujwal Narayan
ed7be3f64c Update norm_exceptions.py (#3778)
* Update norm_exceptions.py

Extended the Currency set to include Franc, Indian Rupee, Bangladeshi Taka, Korean Won, Mexican Dollar, and Egyptian Pound

* Fix formatting [ci skip]
2019-05-27 11:52:52 +02:00
svlandeg
cfc27d7ff9 using Tok2Vec instead 2019-05-26 23:39:46 +02:00
svlandeg
abf9af81c9 learn rate en epochs 2019-05-24 22:04:25 +02:00
estr4ng7d
604acb6ace Marathi Language Support (#3767)
* Adding Marathi language details and folder to it

* Adding few changes and running tests

* Adding few changes and running tests

* Update __init__.py

mh -> mr

* Rename spacy/lang/mh/__init__.py to spacy/lang/mr/__init__.py

* mh -> mr
2019-05-24 14:29:42 +02:00
Ines Montani
7634812172 Document Language.evaluate 2019-05-24 14:06:36 +02:00
Ines Montani
45e6855550 Update Language.update docs 2019-05-24 14:06:26 +02:00
Ines Montani
b78a8dc1d2 Update Scorer and add API docs 2019-05-24 14:06:04 +02:00
svlandeg
86ed771e0b adding local sentence encoder 2019-05-23 16:59:11 +02:00
svlandeg
4392c01b7b obtain sentence for each mention 2019-05-23 15:37:05 +02:00
svlandeg
97241a3ed7 upsampling and batch processing 2019-05-22 23:40:10 +02:00
svlandeg
1a16490d20 update per entity 2019-05-22 12:46:40 +02:00
svlandeg
eb08bdb11f hidden with for encoders 2019-05-21 23:42:46 +02:00
svlandeg
7b13e3d56f undersampling negatives 2019-05-21 18:35:10 +02:00
svlandeg
2fa3fac851 fix concat bp and more efficient batch calls 2019-05-21 13:43:59 +02:00
svlandeg
0a15ee4541 fix in bp call 2019-05-20 23:54:55 +02:00
svlandeg
89e322a637 small fixes 2019-05-20 17:20:39 +02:00
Ujwal Narayan
4d550a3055 Enhancing Kannada language Resources (#3755)
* Updated stop_words.py

Added more stopwords

* Create ujwal-narayan.md

Enhancing Kannada language resources
2019-05-20 12:56:10 +02:00
svlandeg
7edb2e1711 fix convolution layer 2019-05-20 11:58:48 +02:00
svlandeg
dd691d0053 debugging 2019-05-17 17:44:11 +02:00
svlandeg
400b19353d simplify architecture and larger-scale test runs 2019-05-17 01:51:18 +02:00
Ines Montani
321c9f5acc Fix lex_id docs (closes #3743) 2019-05-16 23:15:58 +02:00
svlandeg
d51bffe63b clean up code 2019-05-16 18:36:15 +02:00
svlandeg
b5470f3d75 various tests, architectures and experiments 2019-05-16 18:25:34 +02:00
svlandeg
9ffe5437ae calculate gradient for entity encoding 2019-05-15 02:23:08 +02:00
svlandeg
2713abc651 implement loss function using dot product and prob estimate per candidate cluster 2019-05-14 22:55:56 +02:00
BreakBB
ed18a6efbd Add check for callable to 'Language.replace_pipe' to fix #3737 (#3741) 2019-05-14 16:59:31 +02:00
svlandeg
09ed446b20 different architecture / settings 2019-05-14 08:37:52 +02:00
svlandeg
4142e8dd1b train and predict per article (saving time for doc encoding) 2019-05-13 17:02:34 +02:00
svlandeg
3b81b00954 evaluating on dev set during training 2019-05-13 14:26:04 +02:00
Ines Montani
8baff1c7c0
💫 Improve introspection of custom extension attributes (#3729)
* Add custom __dir__ to Underscore (see #3707)

* Make sure custom extension methods keep their docstrings (see #3707)

* Improve tests

* Prepend note on partial to docstring (see #3707)

* Remove print statement

* Handle cases where docstring is None
2019-05-12 00:53:11 +02:00
Ines Montani
f96af8526a Merge branch 'spacy.io' [ci skip] 2019-05-11 23:03:56 +02:00
Matthew Honnibal
3aceeeaaeb Set version to v2.1.4 2019-05-11 22:57:53 +02:00
Ines Montani
aea1c93a05 Replace cytoolz.partition_all with util.minibatch 2019-05-11 21:12:09 +02:00
Ines Montani
0bf6441863 Fix .iob converter (closes #3620) 2019-05-11 19:15:26 +02:00
Matthew Honnibal
f6e9394aa5 Fix push-tag script 2019-05-11 19:04:35 +02:00
Matthew Honnibal
a5159ddcf5 Set version to v2.1.4.dev1 2019-05-11 19:03:51 +02:00
Ines Montani
7534f7cb44 Fix return value of Language.update (closes #3692) 2019-05-11 18:40:19 +02:00
Ines Montani
503b8c85f1 Add TWiML podcast to universe [ci skip] 2019-05-11 17:48:22 +02:00
Ines Montani
0daf2422a3 Auto-format 2019-05-11 17:48:07 +02:00
Ines Montani
6b3a79ac96 Call rmtree and copytree with strings (closes #3713) 2019-05-11 15:48:35 +02:00
devforfu
21af12eb53 Make "text" key in JSONL format optional when "tokens" key is provided (#3721)
* Fix issue with forcing text key when it is not required

* Extending the docs to reflect the new behavior
2019-05-11 15:41:29 +02:00
Ines Montani
6cfa1e1f47 Fix DependencyParser.predict docs (resolves #3561) 2019-05-11 15:37:54 +02:00
Ines Montani
25f5592d57 Improve Token.prob and Lexeme.prob docs (resolves #3701) 2019-05-11 15:23:41 +02:00
Aaron Kub
719a15f23d fixing regex matcher examples (#3708) (#3719) 2019-05-10 14:23:52 +02:00
Luca Dorigo
82d034f976 Update glossary.py to match information found in documentation (#3704) (closes ##3679)
* Update glossary.py to match information found in documentation

I used regexes to add any dependency tag that was in the documentation but not in the glossary. Solves #3679 👍

* Adds forgotten colon
2019-05-10 14:23:20 +02:00
Wannaphong Phatthiyaphaibun
5a14a13f64 fix thai bug (#3693)
fix tokenize for pythainlp
2019-05-10 14:21:34 +02:00
Luca Dorigo
2663f4133c Submit contributor agreement (#3705) 2019-05-10 14:19:18 +02:00