* Add some stream of consciousness about NER

2025-08-01 10:59:55 +03:00 · 2016-01-23 13:41:01 +01:00 · 2016-01-23 13:41:01 +01:00 · af332f5095
commit af332f5095
parent 3af84cfd6e
1 changed files with 1 additions and 1 deletions
--- a/website/src/jade/tutorials/bootstrap-ner-word2vec/index.jade
+++ b/website/src/jade/tutorials/bootstrap-ner-word2vec/index.jade
@@ -31,7 +31,7 @@ include ./meta.jade

    p The BILOU system introduces some redundancy, to make the decision boundary between the classes a little sharper. The idea is that this might make them a bit more linearly separable. If an entity of type T is currently open, the relevant decision is In(T) vs Last(T). If no entity is open, the decision is Begin(x) vs. Unitary(x) vs Out. The accuracy advantage of this over the IOB scheme is well observed.

-    p A well set up tagging model will be equivalent to our parsing model. It's not very difficult to write your tagger such that invalid tag sequences are given zero probability. But to do that, you usually have to break the abstraction provided by the tagging set up. The linear chain CRF or HMM model invites you to consider the tags you're assigning as atomic. They're not --- they're structured. Similarly, unless you break abstraction, you'll probably write suboptimal features. The standard approach invites an Nth-order Markov assumption: you only get to ask questions about the last N tags. A typical N is 2 or 3. This means you can't ask a simple question, like "What's the first word of the current entity?". This is a natural question to ask, and a model that stops you from asking it is nonsense.
+    p A well set up tagging model will be equivalent to our parsing model. It's not very difficult to write your tagger such that invalid tag sequences are given zero probability. But to do that, you have to break the abstraction you've just adopted. The linear chain CRF or HMM model invites you to consider the tags you're assigning as atomic. They're not --- they're structured. Similarly, unless you break abstraction, you'll probably write suboptimal features. The standard approach invites an Nth-order Markov assumption: you only get to ask questions about the last N tags. A typical N is 2 or 3. This means you can't ask a simple question, like "What's the first word of the current entity?". This is a natural question to ask, and a model that stops you from asking it is nonsense.
    
   p So: spaCy's entity recognizer is an instance of class #[code spacy.parser.Parser]. It holds an instance of #[code spacy.parser.ner.BiluoPushDown], which owns an array of #[code Transition] structs. Each struct holds three function pointers, that are used to apply the action, determine whether the action is valid given the current state, and determine what its cost would be with respect to a gold-standard analysis.
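
Note on the paragraph above: the idea of a transition that bundles three functions (apply, validity check, cost against gold) can be sketched in plain Python. This is only an illustration under stated assumptions, not spaCy's implementation: the real code uses Cython structs with function pointers, and the names `State`, `Transition`, `build_transitions` and `follow_gold` are hypothetical. The cost function is deliberately simplified to 0 when the action matches the gold BILOU tag and 1 otherwise.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class State:
    """Hypothetical parser state: a word pointer plus the currently open entity."""
    words: List[str]
    i: int = 0                          # index of the next word to tag
    open_label: Optional[str] = None    # type T of the open entity, if any
    open_start: Optional[int] = None    # index of the open entity's first word
    entities: List[Tuple[int, int, str]] = field(default_factory=list)

    def first_word_of_open_entity(self) -> Optional[str]:
        # The "natural question" an Nth-order Markov tagger cannot ask.
        return None if self.open_start is None else self.words[self.open_start]


@dataclass(frozen=True)
class Transition:
    """One action: a tag prefix, an optional label, and three functions."""
    name: str
    label: Optional[str]
    is_valid: Callable[[State, Optional[str]], bool]
    apply: Callable[[State, Optional[str]], None]
    cost: Callable[[State, Optional[str], List[str]], int]


def _has_next(s: State) -> bool:
    return s.i < len(s.words) - 1


# Validity: Begin/In must leave room for a closing Last; In/Last need a matching open entity.
def begin_valid(s, label): return s.open_label is None and _has_next(s)
def in_valid(s, label): return s.open_label is not None and s.open_label == label and _has_next(s)
def last_valid(s, label): return s.open_label is not None and s.open_label == label
def closed_valid(s, label): return s.open_label is None  # shared by Unit and Out


def begin_apply(s, label):
    s.open_label, s.open_start = label, s.i
    s.i += 1

def in_apply(s, label):
    s.i += 1

def last_apply(s, label):
    s.entities.append((s.open_start, s.i + 1, label))  # end index is exclusive
    s.open_label, s.open_start = None, None
    s.i += 1

def unit_apply(s, label):
    s.entities.append((s.i, s.i + 1, label))
    s.i += 1

def out_apply(s, label):
    s.i += 1


def _cost_for(prefix: str):
    # Simplified cost (an assumption): 0 if the action equals the gold tag, else 1.
    def cost(s, label, gold):
        tag = "O" if label is None else f"{prefix}-{label}"
        return 0 if gold[s.i] == tag else 1
    return cost


def build_transitions(labels: List[str]) -> List[Transition]:
    ts = [Transition("O", None, closed_valid, out_apply, _cost_for("O"))]
    for label in labels:
        ts.append(Transition("B", label, begin_valid, begin_apply, _cost_for("B")))
        ts.append(Transition("I", label, in_valid, in_apply, _cost_for("I")))
        ts.append(Transition("L", label, last_valid, last_apply, _cost_for("L")))
        ts.append(Transition("U", label, closed_valid, unit_apply, _cost_for("U")))
    return ts


def follow_gold(words: List[str], gold: List[str], transitions: List[Transition]):
    """Repeatedly take the cheapest action among those valid in the current state."""
    state = State(words)
    while state.i < len(words):
        valid = [t for t in transitions if t.is_valid(state, t.label)]
        best = min(valid, key=lambda t: t.cost(state, t.label, gold))
        best.apply(state, best.label)
    return state.entities


words = ["Barack", "Obama", "visited", "Paris"]
gold = ["B-PER", "L-PER", "O", "U-LOC"]
print(follow_gold(words, gold, build_transitions(["PER", "LOC"])))
```

Because invalid actions are filtered out before scoring, an ill-formed sequence such as Begin(PER) followed by In(LOC) can never be produced, and the state exposes the open entity's first word directly to any feature function.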