Commit Graph

37 Commits

Author SHA1 Message Date
Sofie Van Landeghem
358cbb21e3
Define candidate generator in EL config (#5876)
* candidate generator as separate part of EL config

* update comment

* ent instead of str as input for candidate generation

* Span instead of str: correct type indication

* fix types

* unit test to create new candidate generator

* fix replace_pipe argument passing

* move error message, general cleanup

* add vocab back to KB constructor

* provide KB as callable from Vocab arg

* rename to kb_loader, fix KB serialization as part of the EL pipe

* fix typo

* reformatting

* cleanup

* fix comment

* fix wrongly duplicated code from merge conflict

* rename dump to to_disk

* from_disk instead of load_bulk

* update test after recent removal of set_morphology in tagger

* remove old doc
2020-08-18 16:10:36 +02:00
Ines Montani
648f61d077
Tidy up compiler flags and imports (#5071) 2020-03-02 11:48:10 +01:00
Ines Montani
e3f40a6a0f Tidy up and auto-format 2020-02-18 15:38:18 +01:00
svlandeg
dae8a21282 rename entity frequency 2019-07-19 17:40:28 +02:00
svlandeg
dbc53b9870 rename to KBEntryC 2019-06-26 15:55:26 +02:00
svlandeg
a31648d28b further code cleanup 2019-06-19 09:15:43 +02:00
svlandeg
5c723c32c3 entity vectors in the KB + serialization of them 2019-06-05 18:29:18 +02:00
svlandeg
60b54ae8ce bulk entity writing and experiment with regex wikidata reader to speed up processing 2019-05-01 00:00:38 +02:00
svlandeg
54d0cea062 unit test for KB serialization 2019-04-24 23:52:34 +02:00
svlandeg
3e0cb69065 KB aliases to and from file 2019-04-24 20:24:24 +02:00
svlandeg
ad6c5e581c writing and reading number of entries to/from header 2019-04-24 15:31:44 +02:00
svlandeg
6e3223f234 bulk loading in proper order of entity indices 2019-04-24 11:26:38 +02:00
svlandeg
694fea597a dumping all entryC entries + (inefficient) reading back in 2019-04-23 18:36:50 +02:00
svlandeg
8e70a564f1 custom reader and writer for _EntryC fields (first stab at it - not complete) 2019-04-23 16:33:40 +02:00
svlandeg
9a7d534b1b enable nogil for cython functions in kb.pxd 2019-04-10 17:25:10 +02:00
svlandeg
61a33f55d2 little fixes 2019-04-10 16:06:09 +02:00
svlandeg
8814b9010d entity as one field instead of both ID and name 2019-03-25 18:10:41 +01:00
svlandeg
a48241e9a2 use nlp's vocab for stringstore 2019-03-22 11:36:45 +01:00
svlandeg
1ee0e78fd7 select candidate with highest prior probabiity 2019-03-22 11:36:45 +01:00
svlandeg
7b708ab8a4 name per entity 2019-03-22 11:36:45 +01:00
svlandeg
c593607ce2 minimal EL pipe 2019-03-22 11:36:45 +01:00
svlandeg
b6c3255a9f Entity class 2019-03-22 11:36:45 +01:00
svlandeg
1289cd6e8f property getters and keep track of KB internally 2019-03-22 11:36:45 +01:00
svlandeg
9a46c431c3 store entity hash instead of pointer 2019-03-22 11:36:45 +01:00
svlandeg
9819dca80e create candidate object from entry pointer (not fully functional yet) 2019-03-22 11:36:45 +01:00
svlandeg
b55baaa1dc avoid value 0 in preshmap and helpful user warnings 2019-03-22 11:36:45 +01:00
svlandeg
8843f9279c use StringStore 2019-03-22 11:36:45 +01:00
svlandeg
51560bf0ed bugfix adding aliases 2019-03-22 11:36:45 +01:00
svlandeg
c4ba942765 get candidates by alias 2019-03-22 11:36:45 +01:00
svlandeg
151b855cc8 adding and retrieving aliases 2019-03-22 11:36:45 +01:00
svlandeg
cf34113250 very minimal KB functionality working 2019-03-22 11:36:44 +01:00
svlandeg
af281c5466 adding aliases per entity in the KB 2019-03-22 11:36:44 +01:00
svlandeg
f77b99c103 fix compile errors 2019-03-22 11:36:44 +01:00
svlandeg
27483f9080 add pyx and separate method to add aliases 2019-03-22 11:36:44 +01:00
svlandeg
feb71e15fd hash the entity name 2019-03-22 11:36:44 +01:00
svlandeg
839dafa104 documented some comments and todos 2019-03-22 11:36:44 +01:00
svlandeg
7f37737878 kb snippet, draft by Matt (wip) 2019-03-22 11:36:44 +01:00