Commit Graph

9879 Commits

Author SHA1 Message Date
svlandeg
a48241e9a2 use nlp's vocab for stringstore 2019-03-22 11:36:45 +01:00
svlandeg
1ee0e78fd7 select candidate with highest prior probability 2019-03-22 11:36:45 +01:00
svlandeg
7b708ab8a4 name per entity 2019-03-22 11:36:45 +01:00
svlandeg
c593607ce2 minimal EL pipe 2019-03-22 11:36:45 +01:00
svlandeg
c71123dd0c ensure no candidates are returned for unknown aliases 2019-03-22 11:36:45 +01:00
svlandeg
b6c3255a9f Entity class 2019-03-22 11:36:45 +01:00
svlandeg
1289cd6e8f property getters and keep track of KB internally 2019-03-22 11:36:45 +01:00
svlandeg
98ae77a682 unit test on number of candidates generated 2019-03-22 11:36:45 +01:00
svlandeg
9a46c431c3 store entity hash instead of pointer 2019-03-22 11:36:45 +01:00
svlandeg
9819dca80e create candidate object from entry pointer (not fully functional yet) 2019-03-22 11:36:45 +01:00
svlandeg
a9074e0886 check the length of entities and probabilities vector + unit test 2019-03-22 11:36:45 +01:00
svlandeg
d133ffaff9 correct size, not counting dummy elements in the vector 2019-03-22 11:36:45 +01:00
svlandeg
33f8a0fe2e check and unit test in case prior probs exceed 1 2019-03-22 11:36:45 +01:00
svlandeg
b55baaa1dc avoid value 0 in preshmap and helpful user warnings 2019-03-22 11:36:45 +01:00
svlandeg
20a7b7b1c0 raising error when adding alias for unknown entity + unit test 2019-03-22 11:36:45 +01:00
svlandeg
8843f9279c use StringStore 2019-03-22 11:36:45 +01:00
svlandeg
51560bf0ed bugfix adding aliases 2019-03-22 11:36:45 +01:00
svlandeg
c4ba942765 get candidates by alias 2019-03-22 11:36:45 +01:00
svlandeg
151b855cc8 adding and retrieving aliases 2019-03-22 11:36:45 +01:00
svlandeg
cf34113250 very minimal KB functionality working 2019-03-22 11:36:44 +01:00
svlandeg
af281c5466 adding aliases per entity in the KB 2019-03-22 11:36:44 +01:00
svlandeg
f77b99c103 fix compile errors 2019-03-22 11:36:44 +01:00
svlandeg
27483f9080 add pyx and separate method to add aliases 2019-03-22 11:36:44 +01:00
svlandeg
feb71e15fd hash the entity name 2019-03-22 11:36:44 +01:00
svlandeg
839dafa104 documented some comments and todos 2019-03-22 11:36:44 +01:00
svlandeg
7f37737878 kb snippet, draft by Matt (wip) 2019-03-22 11:36:44 +01:00
svlandeg
735fc2a735 annotate kb_id through ents in doc 2019-03-22 11:36:44 +01:00
svlandeg
d849eb2455 adding kb_id as field to token, el as nlp pipeline component 2019-03-22 11:34:46 +01:00
Matthew Honnibal
d811c97da1 Fix test that caused pytest to choke on Python3 2019-03-22 10:28:51 +01:00
Matthew Honnibal
a2ad9832e5 Add failing test for #3356 2019-03-22 02:42:37 +01:00
Matthew Honnibal
7ec64a36fd Merge pull request #3455 from explosion/bugfix/fix-en-tag-map
💫 Bring English tag_map in line with UD Treebank
2019-03-21 21:19:30 +01:00
Matthew Honnibal
c66bd61e88 Fix lemmas 2019-03-21 14:22:12 +01:00
Matthew Honnibal
04395ffa49 Bring English tag_map in line with UD Treebank
I wrote a small script to read the UD English training data and check
that our tag map and morph rules were resulting in the best POS map.
This hadn't been done for some time, and there have been various changes
to the UD schema since it was last done. After these changes we should
see much better agreement between our POS assignments and the UD POS
tags.
2019-03-21 13:53:44 +01:00
Ines Montani
0c82a5ddb2 Merge branch 'master' of https://github.com/explosion/spaCy 2019-03-21 10:23:56 +01:00
Ines Montani
0712efc6b3 Update version requirements [ci skip] 2019-03-21 10:23:54 +01:00
Matthew Honnibal
4e3ed2ea88 Add -t2v argument to train_textcat script 2019-03-20 23:05:42 +01:00
Ines Montani
764359c952 Merge branch 'master' into spacy.io 2019-03-20 17:24:28 +01:00
Ines Montani
dac8f8ff99 Update Span.__init__ docs (see #3445) [ci skip] 2019-03-20 17:24:17 +01:00
Matthew Honnibal
c7f26abe5f Merge pull request #3434 from Bharat123rox/narrow-unicode
Raise Error for a narrow unicode build of Python
2019-03-20 12:19:52 +01:00
Matthew Honnibal
1c8ff59185 Merge pull request #3441 from explosion/fix/cli-ud-scripts
💫 Move UD scripts to bin
2019-03-20 12:19:15 +01:00
Matthew Honnibal
72889a16d5 Fix similarity calculation if vectors are on GPU (#3440) 2019-03-20 12:09:59 +01:00
Matthew Honnibal
1612990e88 Implement cosine loss for spacy pretrain. Make default 2019-03-20 11:06:58 +00:00
Ines Montani
ae5b4d0e84 Fix formatting (hopefully also restarts build properly) 2019-03-20 09:55:45 +01:00
Ines Montani
6abc1ddb26 Update __main__.py 2019-03-20 09:43:26 +01:00
Bharat123Rox
f2547f02d6 Made changes suggested by @ines 2019-03-20 07:43:19 +05:30
Ines Montani
7400c7f8a7 Move UD scripts to bin 2019-03-20 01:19:34 +01:00
Ines Montani
685fff40cf Revert "Add --always-link flag to cli.download (see #3435)"
This reverts commit 583a566843.
2019-03-20 01:03:40 +01:00
Matthew Honnibal
6cfbb2d34e Merge branch 'master' of https://github.com/explosion/spaCy 2019-03-20 00:59:54 +01:00
Matthew Honnibal
5a53e9358a Set version to 2.1.1 2019-03-20 00:59:45 +01:00
Matthew Honnibal
02d7b41893 Fix GPU installation. Closes #3437 2019-03-20 00:59:27 +01:00