Matthew Honnibal
3da1063b36
Add beam decoding to parser, to allow NER uncertainties
2017-07-20 15:02:55 +02:00
Matthew Honnibal
0ca5832427
Improve negative example handling in NER oracle
2017-07-20 00:18:49 +02:00
Matthew Honnibal
7996d21717
Fixes for new StringStore
2017-05-28 11:09:27 -05:00
Matthew Honnibal
84e66ca6d4
WIP on stringstore change. 27 failures
2017-05-28 14:06:40 +02:00
Matthew Honnibal
99316fa631
Use ordered dict to specify actions
2017-05-27 15:50:21 -05:00
Matthew Honnibal
3d5a536eaa
Improve efficiency of parser batching
2017-05-26 11:31:23 -05:00
Matthew Honnibal
e2136232f9
Exclude states with no matching gold annotations from parsing
2017-05-22 10:30:12 -05:00
Matthew Honnibal
8b04b0af9f
Remove freqs from transition_system
2017-05-20 02:20:48 -05:00
ines
0739ae7b76
Tidy up and fix formatting and imports
2017-04-15 13:05:15 +02:00
Matthew Honnibal
354458484c
WIP on add_label bug during NER training
...
Currently when a new label is introduced to NER during training,
it causes the labels to be read in in an unexpected order. This
invalidates the model.
2017-04-14 23:52:17 +02:00
Matthew Honnibal
2611ac2a89
Fix scorer bug for NER, related to ambiguity between missing annotations and misaligned tokens
2017-03-16 09:38:28 -05:00
Matthew Honnibal
931feb3360
Allow beam parsing for NER
2017-03-11 11:12:01 -06:00
Matthew Honnibal
159e8c46e1
Merge old training fixes with newer state
2016-11-25 09:16:36 -06:00
Matthew Honnibal
39341598bb
Fix NER label calculation
2016-11-25 09:02:22 -06:00
Matthew Honnibal
301f3cc898
Fix Issue #429 . Add an initialize_state method to the named entity recogniser that adds missing entity types. This is a messy place to add this, because it's strange to have the method mutate state. A better home for this logic could be found.
2016-10-27 18:01:55 +02:00
Matthew Honnibal
f787cd29fe
Refactor the pipeline classes to make them more consistent, and remove the redundant blank() constructor.
2016-10-16 21:34:57 +02:00
Matthew Honnibal
9e09b39b9f
Revert "Changes to transition systems for new StringStore scheme"
...
This reverts commit 0442e0ab1e
.
2016-09-30 20:11:49 +02:00
Matthew Honnibal
0442e0ab1e
Changes to transition systems for new StringStore scheme
2016-09-30 19:58:51 +02:00
Matthew Honnibal
a47f00901b
* Pass a StateC pointer into the transition and validation methods in the parser, so that the GIL can be released over a batch of documents
2016-02-01 02:58:14 +01:00
Matthew Honnibal
daaad66448
* Now fully proxied
2016-02-01 02:37:08 +01:00
Matthew Honnibal
10877a7791
* Update for thinc 5.0, including changing cost from int to weight_t, and updating the tagger and parser
2016-01-30 14:31:36 +01:00
Matthew Honnibal
c8e0011ebc
* Add iterators to the NER and parser transition systems, to get the action types
2016-01-19 19:07:43 +01:00
Matthew Honnibal
5623242b3e
* Adjust NER rules, so that U entries in gazetteer don't become B moves to the model
2015-11-12 04:48:23 +11:00
Matthew Honnibal
44fbdc7260
* Fix bug in NER transition system, that sometimes left no valid moves
2015-11-08 16:19:12 +01:00
Matthew Honnibal
e92371bb54
* Fix rule that made Last action invalid if there was a preset of O, since if the entity is already open, that ship has sailed.
2015-11-08 22:17:51 +11:00
Matthew Honnibal
af70dc166a
* Fix Last restriction, that was supposed to prevent conflicts with presets, but was incorrect.
2015-11-07 09:52:00 +11:00
Matthew Honnibal
d24b8509e4
* Correct screw ups from the previous commits
2015-11-07 06:51:41 +11:00
Matthew Honnibal
5efad178b5
* Set ent tag when close entity
2015-11-07 06:09:25 +11:00
Matthew Honnibal
01ab464383
* Prevent Begin and In moves from applying in NER if we're at the last token of a sentence, as this would mean the entity would span over a sentence boundary. Re Issue #169
2015-11-07 05:30:44 +11:00
Matthew Honnibal
fe43f8cf39
* Whitespace
2015-08-09 02:31:53 +02:00
Matthew Honnibal
59c3bf60a6
* Ensure entity recognizer doesn't over-write preset types
2015-08-06 16:09:08 +02:00
Matthew Honnibal
9c1724ecae
* Gazetteer stuff working, now need to wire up to API
2015-08-06 00:35:40 +02:00
Matthew Honnibal
d5255aad77
* Update freqs for missing tags in ner, for serializer
2015-07-23 01:17:11 +02:00
Matthew Honnibal
317cbbc015
* Serialization round trip now working with decent API, but with rough spots in the organisation and requiring vocabulary to be fixed ahead of time.
2015-07-19 15:18:17 +02:00
Matthew Honnibal
75aeccc064
* Rejig parser interface to use new thinc.api.Example class, in prep of theano model. Comment out beam search
2015-06-28 11:02:34 +02:00
Matthew Honnibal
579735a095
* Remove import of _state module
2015-06-23 17:25:08 +02:00
Matthew Honnibal
15e177d7a1
* Fixes to unshift/fast-forward strategy. Getting 91.55 greedy on NW dev, gold preproc
2015-06-12 01:50:23 +02:00
Matthew Honnibal
e2f9a80713
* Remove old _state imports
2015-06-10 07:09:17 +02:00
Matthew Honnibal
18cc326dc0
* Bug fixes to ner.pyx
2015-06-10 06:57:41 +02:00
Matthew Honnibal
d68c686ec1
* Move StateClass into interface of transition functions
2015-06-10 01:35:28 +02:00
Matthew Honnibal
4b98b3e9c8
* Cost functions now take StateClass argument, instead of State*.
2015-06-10 00:40:43 +02:00
Matthew Honnibal
e0cf61f591
* Move StateClass into the interface for is_valid
2015-06-09 23:23:28 +02:00
Matthew Honnibal
1fee7ade61
* Tweak to ner
2015-06-05 23:48:43 +02:00
Matthew Honnibal
33e70b167f
* Remove dead code from ner.pyx
2015-06-05 17:12:47 +02:00
Matthew Honnibal
0114e7600d
* Fix NER oracle
2015-06-05 17:11:26 +02:00
Matthew Honnibal
6bf35cecc3
* Refactor transition system to use classes with staticmethods.
2015-06-05 02:27:17 +02:00
Matthew Honnibal
a513ec500f
* Have oracle functions take a struct instead of a Python object
2015-06-02 20:01:06 +02:00
Matthew Honnibal
0786d9b3c7
* Refactor TransitionSystem, adding set_valid method
2015-06-02 18:38:07 +02:00
Matthew Honnibal
c7876aa8b6
* Add get_valid method
2015-06-01 23:06:00 +02:00
Matthew Honnibal
76300bbb1b
* Use updated JSON format, with sentences below paragraphs. Allows use of gold preprocessing flag.
2015-05-30 01:25:46 +02:00