Sofie Van Landeghem
358cbb21e3
Define candidate generator in EL config (#5876)
* candidate generator as separate part of EL config
* update comment
* ent instead of str as input for candidate generation
* Span instead of str: correct type indication
* fix types
* unit test to create new candidate generator
* fix replace_pipe argument passing
* move error message, general cleanup
* add vocab back to KB constructor
* provide KB as callable from Vocab arg
* rename to kb_loader, fix KB serialization as part of the EL pipe
* fix typo
* reformatting
* cleanup
* fix comment
* fix wrongly duplicated code from merge conflict
* rename dump to to_disk
* from_disk instead of load_bulk
* update test after recent removal of set_morphology in tagger
* remove old doc
2020-08-18 16:10:36 +02:00
Ines Montani
648f61d077
Tidy up compiler flags and imports (#5071)
2020-03-02 11:48:10 +01:00
Ines Montani
e3f40a6a0f
Tidy up and auto-format
2020-02-18 15:38:18 +01:00
svlandeg
dae8a21282
rename entity frequency
2019-07-19 17:40:28 +02:00
svlandeg
dbc53b9870
rename to KBEntryC
2019-06-26 15:55:26 +02:00
svlandeg
a31648d28b
further code cleanup
2019-06-19 09:15:43 +02:00
svlandeg
5c723c32c3
entity vectors in the KB + serialization of them
2019-06-05 18:29:18 +02:00
svlandeg
60b54ae8ce
bulk entity writing and experiment with regex wikidata reader to speed up processing
2019-05-01 00:00:38 +02:00
svlandeg
54d0cea062
unit test for KB serialization
2019-04-24 23:52:34 +02:00
svlandeg
3e0cb69065
KB aliases to and from file
2019-04-24 20:24:24 +02:00
svlandeg
ad6c5e581c
writing and reading number of entries to/from header
2019-04-24 15:31:44 +02:00
svlandeg
6e3223f234
bulk loading in proper order of entity indices
2019-04-24 11:26:38 +02:00
svlandeg
694fea597a
dumping all entryC entries + (inefficient) reading back in
2019-04-23 18:36:50 +02:00
svlandeg
8e70a564f1
custom reader and writer for _EntryC fields (first stab at it - not complete)
2019-04-23 16:33:40 +02:00
svlandeg
9a7d534b1b
enable nogil for cython functions in kb.pxd
2019-04-10 17:25:10 +02:00
svlandeg
61a33f55d2
little fixes
2019-04-10 16:06:09 +02:00
svlandeg
8814b9010d
entity as one field instead of both ID and name
2019-03-25 18:10:41 +01:00
svlandeg
a48241e9a2
use nlp's vocab for stringstore
2019-03-22 11:36:45 +01:00
svlandeg
1ee0e78fd7
select candidate with highest prior probability
2019-03-22 11:36:45 +01:00
svlandeg
7b708ab8a4
name per entity
2019-03-22 11:36:45 +01:00
svlandeg
c593607ce2
minimal EL pipe
2019-03-22 11:36:45 +01:00
svlandeg
b6c3255a9f
Entity class
2019-03-22 11:36:45 +01:00
svlandeg
1289cd6e8f
property getters and keep track of KB internally
2019-03-22 11:36:45 +01:00
svlandeg
9a46c431c3
store entity hash instead of pointer
2019-03-22 11:36:45 +01:00
svlandeg
9819dca80e
create candidate object from entry pointer (not fully functional yet)
2019-03-22 11:36:45 +01:00
svlandeg
b55baaa1dc
avoid value 0 in preshmap and helpful user warnings
2019-03-22 11:36:45 +01:00
svlandeg
8843f9279c
use StringStore
2019-03-22 11:36:45 +01:00
svlandeg
51560bf0ed
bugfix adding aliases
2019-03-22 11:36:45 +01:00
svlandeg
c4ba942765
get candidates by alias
2019-03-22 11:36:45 +01:00
svlandeg
151b855cc8
adding and retrieving aliases
2019-03-22 11:36:45 +01:00
svlandeg
cf34113250
very minimal KB functionality working
2019-03-22 11:36:44 +01:00
svlandeg
af281c5466
adding aliases per entity in the KB
2019-03-22 11:36:44 +01:00
svlandeg
f77b99c103
fix compile errors
2019-03-22 11:36:44 +01:00
svlandeg
27483f9080
add pyx and separate method to add aliases
2019-03-22 11:36:44 +01:00
svlandeg
feb71e15fd
hash the entity name
2019-03-22 11:36:44 +01:00
svlandeg
839dafa104
documented some comments and todos
2019-03-22 11:36:44 +01:00
svlandeg
7f37737878
kb snippet, draft by Matt (wip)
2019-03-22 11:36:44 +01:00