Use char index instead of token index

Changed the offsets used to add the label from token indices to character indices, because `displacy.render` apparently expects character offsets
atomobianco 2017-11-26 23:55:25 +01:00 committed by GitHub
parent c132c1a143
commit f6a82da907

@@ -354,7 +354,8 @@ p
     # append mock entity for match in displaCy style to matched_sents
     # get the match span by ofsetting the start and end of the span with the
     # start and end of the sentence in the doc
-    match_ents = [{'start': span.start-sent.start, 'end': span.end-sent.start,
+    match_ents = [{'start': span.start_char - sent.start_char,
+                   'end': span.end_char - sent.start_char,
                    'label': 'MATCH'}]
     matched_sents.append({'text': sent.text, 'ents': match_ents })
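
The difference between the two offset schemes can be sketched without spaCy. The snippet below is only an illustration: the whitespace tokenizer, the sample text, and the names `doc_text`, `tokens`, `sent_tok_start` and `span_tok_start` are hypothetical stand-ins for a `Doc`, a sentence span and a match span. It assumes, as the commit message suggests, that the `start` and `end` values in displaCy's manual entity format are character offsets into `text`.

```python
# Hypothetical stand-in for a spaCy Doc: whitespace tokens with char offsets.
doc_text = "I saw it. The quick brown fox jumps."
tokens = []  # (text, start_char, end_char)
pos = 0
for word in doc_text.split():
    start = doc_text.index(word, pos)
    tokens.append((word, start, start + len(word)))
    pos = start + len(word)

# Assumed sentence: tokens 3..8 (end exclusive), i.e. "The quick brown fox jumps."
sent_tok_start, sent_tok_end = 3, 8
sent_start_char = tokens[sent_tok_start][1]
sent_end_char = tokens[sent_tok_end - 1][2]
sent_text = doc_text[sent_start_char:sent_end_char]

# Assumed match: tokens 4..6 (end exclusive), i.e. "quick brown"
span_tok_start, span_tok_end = 4, 6
span_start_char = tokens[span_tok_start][1]
span_end_char = tokens[span_tok_end - 1][2]

# Before the commit: token offsets relative to the sentence.
old_ent = {'start': span_tok_start - sent_tok_start,
           'end': span_tok_end - sent_tok_start,
           'label': 'MATCH'}

# After the commit: character offsets relative to the sentence.
new_ent = {'start': span_start_char - sent_start_char,
           'end': span_end_char - sent_start_char,
           'label': 'MATCH'}

# Slicing the sentence text shows which characters each entity would cover.
old_highlight = sent_text[old_ent['start']:old_ent['end']]
new_highlight = sent_text[new_ent['start']:new_ent['end']]

matched_sent = {'text': sent_text, 'ents': [new_ent]}
print(repr(old_highlight), repr(new_highlight))
```

With token offsets the highlighted slice lands on unrelated characters near the start of the sentence, while the character offsets cover the matched words. A dict shaped like `matched_sent` is what the patched code appends to `matched_sents`.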