svlandeg
b12001f368
small fixes
2019-06-12 22:05:53 +02:00
svlandeg
6521cfa132
speeding up training
2019-06-12 13:37:05 +02:00
svlandeg
fe1ed432ef
eval on dev set, varying combos of prior and context scores
2019-06-11 11:40:58 +02:00
svlandeg
83dc7b46fd
first tests with EL pipe
2019-06-10 21:25:26 +02:00
svlandeg
7de1ee69b8
training loop in proper pipe format
2019-06-07 15:55:10 +02:00
svlandeg
0486ccabfd
introduce goldparse.links
2019-06-07 13:54:45 +02:00
svlandeg
a5c061f506
storing NEL training data in GoldParse objects
2019-06-07 12:58:42 +02:00
svlandeg
61f0e2af65
code cleanup
2019-06-06 20:22:14 +02:00
svlandeg
d8b435ceff
pretraining description vectors and storing them in the KB
2019-06-06 19:51:27 +02:00
svlandeg
5c723c32c3
entity vectors in the KB + serialization of them
2019-06-05 18:29:18 +02:00
svlandeg
9abbd0899f
separate entity encoder to get 64D descriptions
2019-06-05 00:09:46 +02:00
svlandeg
fb37cdb2d3
implementing el pipe in pipes.pyx (not tested yet)
2019-06-03 21:32:54 +02:00
svlandeg
d83a1e3052
Merge branch 'master' into feature/nel-wiki
2019-06-03 09:35:10 +02:00
Germán
86eb817b74
Overwrites default getter for like_num in Spanish by adding _num_words and like_num to lex_attrs.py ( #3810 ) ( closes #3803 )
* (#3803 ) Spanish like_num returning false for number-like token
* (#3803 ) Spanish like_num now returning True for number-like token
2019-06-02 12:22:57 +02:00
Ines Montani
09e78b52cf
Improve E024 text for incorrect GoldParse ( closes #3558 )
2019-06-01 14:37:27 +02:00
Ramanan Balakrishnan
26c37c5a4d
fix all references to BILUO annotation format ( #3797 )
2019-05-31 12:19:19 +02:00
Ines Montani
a7fd42d937
Make jsonschema dependency optional ( #3784 )
2019-05-30 14:34:58 +02:00
Ujwal Narayan
ed7be3f64c
Update norm_exceptions.py ( #3778 )
* Update norm_exceptions.py
Extended the Currency set to include Franc, Indian Rupee, Bangladeshi Taka, Korean Won, Mexican Dollar, and Egyptian Pound
* Fix formatting [ci skip]
2019-05-27 11:52:52 +02:00
estr4ng7d
604acb6ace
Marathi Language Support ( #3767 )
* Adding Marathi language details and folder to it
* Adding few changes and running tests
* Adding few changes and running tests
* Update __init__.py
mh -> mr
* Rename spacy/lang/mh/__init__.py to spacy/lang/mr/__init__.py
* mh -> mr
2019-05-24 14:29:42 +02:00
Ines Montani
7634812172
Document Language.evaluate
2019-05-24 14:06:36 +02:00
Ines Montani
45e6855550
Update Language.update docs
2019-05-24 14:06:26 +02:00
Ines Montani
b78a8dc1d2
Update Scorer and add API docs
2019-05-24 14:06:04 +02:00
Ujwal Narayan
4d550a3055
Enhancing Kannada language Resources ( #3755 )
* Updated stop_words.py
Added more stopwords
* Create ujwal-narayan.md
Enhancing Kannada language resources
2019-05-20 12:56:10 +02:00
svlandeg
dd691d0053
debugging
2019-05-17 17:44:11 +02:00
BreakBB
ed18a6efbd
Add check for callable to 'Language.replace_pipe' to fix #3737 ( #3741 )
2019-05-14 16:59:31 +02:00
Ines Montani
8baff1c7c0
💫 Improve introspection of custom extension attributes ( #3729 )
* Add custom __dir__ to Underscore (see #3707 )
* Make sure custom extension methods keep their docstrings (see #3707 )
* Improve tests
* Prepend note on partial to docstring (see #3707 )
* Remove print statement
* Handle cases where docstring is None
2019-05-12 00:53:11 +02:00
Matthew Honnibal
3aceeeaaeb
Set version to v2.1.4
2019-05-11 22:57:53 +02:00
Ines Montani
aea1c93a05
Replace cytoolz.partition_all with util.minibatch
2019-05-11 21:12:09 +02:00
Ines Montani
0bf6441863
Fix .iob converter ( closes #3620 )
2019-05-11 19:15:26 +02:00
Matthew Honnibal
a5159ddcf5
Set version to v2.1.4.dev1
2019-05-11 19:03:51 +02:00
Ines Montani
6b3a79ac96
Call rmtree and copytree with strings ( closes #3713 )
2019-05-11 15:48:35 +02:00
devforfu
21af12eb53
Make "text" key in JSONL format optional when "tokens" key is provided ( #3721 )
* Fix issue with forcing text key when it is not required
* Extending the docs to reflect the new behavior
2019-05-11 15:41:29 +02:00
Luca Dorigo
82d034f976
Update glossary.py to match information found in documentation ( #3704 ) ( closes #3679 )
* Update glossary.py to match information found in documentation
I used regexes to add any dependency tag that was in the documentation but not in the glossary. Solves #3679 👍
* Adds forgotten colon
2019-05-10 14:23:20 +02:00
Wannaphong Phatthiyaphaibun
5a14a13f64
fix thai bug ( #3693 )
fix tokenize for pythainlp
2019-05-10 14:21:34 +02:00
Ines Montani
505c9e0e19
Add util.filter_spans helper ( #3686 )
2019-05-08 02:33:40 +02:00
F0rge1cE
dd1e6b0bc6
Fix offset bug in loading pre-trained word2vec. ( #3689 )
* Fix offset bug in loading pre-trained word2vec.
* add contributor agreement
2019-05-06 23:00:38 +02:00
Ines Montani
78cb807a9a
Auto-format [ci skip]
2019-05-06 16:58:29 +02:00
Brad Jascob
955b95cb8b
Fix inconsistent lemmatizer issue #3484 ( #3646 )
* Fix inconsistent lemmatizer issue #3484
* Remove test case
2019-05-04 18:16:03 +02:00
svlandeg
1ae41daaa9
allow small rounding errors
2019-05-01 23:05:40 +02:00
Dobita21
f95ecedd83
Add Thai lex_attrs ( #3655 )
* test spaCy commit to git fri 04052019 10:54
* change Data format from my format to master format
* ทัทั้งนี้ ---> ทั้งนี้
* delete stop_word translate from Eng
* Adjust formatting and readability
* add Thai norm_exception
* Add Dobita21 SCA
* editรึ : หรือ,
* Update Dobita21.md
* Auto-format
* Integrate norms into language defaults
* add acronym and some norm exception words
* add lex_attrs
* Add lexical attribute getters into the language defaults
* fix LEX_ATTRS
Co-authored-by: Donut <dobita21@gmail.com>
Co-authored-by: Ines Montani <ines@ines.io>
2019-05-01 12:03:14 +02:00
BreakBB
8952004dfc
Update French example sents and add two German stop words ( #3662 )
* Update french example sentences
* Add 'anderem' and 'ihren' to German stop words
2019-05-01 12:01:35 +02:00
svlandeg
60b54ae8ce
bulk entity writing and experiment with regex wikidata reader to speed up processing
2019-05-01 00:00:38 +02:00
svlandeg
19e8f339cb
deduce entity freq from WP corpus and serialize vocab in WP test
2019-04-29 17:37:29 +02:00
svlandeg
387263d618
simplify chains
2019-04-29 13:58:07 +02:00
svlandeg
54d0cea062
unit test for KB serialization
2019-04-24 23:52:34 +02:00
svlandeg
3e0cb69065
KB aliases to and from file
2019-04-24 20:24:24 +02:00
svlandeg
ad6c5e581c
writing and reading number of entries to/from header
2019-04-24 15:31:44 +02:00
svlandeg
6e3223f234
bulk loading in proper order of entity indices
2019-04-24 11:26:38 +02:00
svlandeg
694fea597a
dumping all entryC entries + (inefficient) reading back in
2019-04-23 18:36:50 +02:00
svlandeg
8e70a564f1
custom reader and writer for _EntryC fields (first stab at it - not complete)
2019-04-23 16:33:40 +02:00