spaCy

mirror of https://github.com/explosion/spaCy.git synced 2026-01-11 19:21:15 +03:00


Matthew Honnibal 3d182fbc43 Represent fused tokens in GoldParse	2018-04-01 17:18:18 +02:00

Entries in GoldParse.{words, heads, tags, deps, ner} can now be lists instead of single values, so that the gold analysis can be stored for fused tokens. For instance, say we have a token like "hows", while the gold standard has two tokens, ["how", "s"]. We need to store the gold data for each of the two subtokens. Example gold.words: [["how", "s"], "it", "going"]

Things get more complicated for heads, as we need to address particular subtokens. Say the gold heads for ["how", "s", "it", "going"] are [1, 1, 3, 1], i.e. the root, "s", is the second subtoken of the fused token "hows". The gold.heads list would be: [[(0, 1), (0, 1)], 2, (0, 1)]

The tuples indicate token 0, subtoken 1. A helper method, _flatten_fused_heads, is available that unpacks the above to [1, 1, 3, 1].
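The commit message describes how nested entries in gold.heads map back to a flat list of subtoken heads. Below is a minimal Python sketch of that unpacking. The function name `flatten_fused_heads` and its signature are hypothetical: this is not spaCy's `_flatten_fused_heads`, only an illustration of the index arithmetic. It also assumes that a bare integer head pointing at a fused token refers to that token's first subtoken, which the commit message does not specify.

```python
from typing import List, Tuple, Union

# A head is either a plain token index or a (token, subtoken) pair.
Head = Union[int, Tuple[int, int]]


def flatten_fused_heads(words: List[Union[str, List[str]]],
                        heads: List[Union[Head, List[Head]]]) -> List[int]:
    """Unpack per-token (possibly fused) heads into flat subtoken indices.

    Hypothetical helper modelled on the commit description above;
    not spaCy's actual implementation.
    """
    # offsets[i] is the flat index of the first subtoken of token i.
    offsets = []
    n = 0
    for word in words:
        offsets.append(n)
        n += len(word) if isinstance(word, list) else 1

    def resolve(head: Head) -> int:
        if isinstance(head, tuple):
            token, subtoken = head
            return offsets[token] + subtoken
        # Assumption: a bare int points at the first subtoken of that token.
        return offsets[head]

    flat = []
    for head in heads:
        if isinstance(head, list):  # fused token: one head per subtoken
            flat.extend(resolve(h) for h in head)
        else:
            flat.append(resolve(head))
    return flat


words = [["how", "s"], "it", "going"]
heads = [[(0, 1), (0, 1)], 2, (0, 1)]
print(flatten_fused_heads(words, heads))  # [1, 1, 3, 1]
```

Computing each token's starting offset once up front means every head, whether a tuple or a bare index, resolves with a single addition.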
cli	Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop"	2018-03-27 19:23:02 +02:00
data	Make spacy/data a package	2017-03-18 20:04:22 +01:00
displacy	Don't use deprecated Doc.merge call in displaCy	2018-01-27 11:25:05 +01:00
lang	Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop"	2018-03-27 19:23:02 +02:00
syntax	Go back to letting Break work with deeper stacks	2018-04-01 14:32:15 +02:00
tests	Add test for one-to-many alignment	2018-04-01 14:53:49 +02:00
tokens	WIP on adding split-token actions to parser	2018-03-31 20:05:27 +02:00
__init__.pxd	* Seems to be working after refactor. Need to wire up more POS tag features, and wire up save/load of POS tags.	2014-10-24 02:23:42 +11:00
__init__.py	Remove dummy variable from function calls	2018-01-05 09:37:05 +01:00
__main__.py	Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop"	2018-03-27 19:23:02 +02:00
_align.pyx	Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop"	2018-03-27 19:23:02 +02:00
_matcher2_notes.py	Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop"	2018-03-27 19:23:02 +02:00
_ml.py	Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop"	2018-03-27 19:23:02 +02:00
about.py	Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop"	2018-03-27 19:23:02 +02:00
attrs.pxd	Fix LANG symbol	2018-02-17 18:10:50 +01:00
attrs.pyx	Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop"	2018-03-27 19:23:02 +02:00
compat.py	Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop"	2018-03-27 19:23:02 +02:00
glossary.py	Fix typo in glossary (resolves #1964 )	2018-02-10 11:58:41 +01:00
gold.pxd	Allocate fused tokens array in GoldParseC	2018-04-01 13:43:56 +02:00
gold.pyx	Represent fused tokens in GoldParse	2018-04-01 17:18:18 +02:00
language.py	Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop"	2018-03-27 19:23:02 +02:00
lemmatizer.py	Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop"	2018-03-27 19:23:02 +02:00
lexeme.pxd	WIP on stringstore change. 27 failures	2017-05-28 14:06:40 +02:00
lexeme.pyx	added new lexical feat to lexeme	2018-02-11 18:51:48 +01:00
matcher.pyx	Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop"	2018-03-27 19:23:02 +02:00
morphology.pxd	Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop"	2018-03-27 19:23:02 +02:00
morphology.pyx	Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop"	2018-03-27 19:23:02 +02:00
parts_of_speech.pxd	Add support for Universal Dependencies v2.0	2017-03-03 13:17:34 +01:00
parts_of_speech.pyx	Tidy up rest	2017-10-27 21:07:59 +02:00
pipeline.pxd	Fix names of pipeline components	2017-10-26 12:38:23 +02:00
pipeline.pyx	Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop"	2018-03-27 19:23:02 +02:00
scorer.py	Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop"	2018-03-27 19:23:02 +02:00
strings.pxd	Try to fix StringStore clean up (see #1506 )	2017-11-11 03:11:27 +03:00
strings.pyx	Use safer method to get string without hit	2017-11-14 22:58:46 +03:00
structs.pxd	Make TokenC.sent_start an int, to allow ternary value	2017-10-08 19:58:54 +02:00
symbols.pxd	Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop"	2018-03-27 19:23:02 +02:00
symbols.pyx	Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop"	2018-03-27 19:23:02 +02:00
tokenizer.pxd	Disable tokenizer cache for special-cases. Fixes #1250	2017-10-24 16:08:05 +02:00
tokenizer.pyx	Merge pull request #1611 from fsonntag/master	2017-11-29 23:11:23 +01:00
typedefs.pxd	Work on changing StringStore to return hashes.	2017-05-28 12:36:27 +02:00
typedefs.pyx	Tidy up rest	2017-10-27 21:07:59 +02:00
util.py	Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop"	2018-03-27 19:23:02 +02:00
vectors.pyx	Fix Vectors pickling	2018-03-14 16:59:37 +01:00
vocab.pxd	Add Vocab.cfg attr, to hold stuff like oov probs	2017-10-30 16:08:50 +01:00
vocab.pyx	Make Vocab.__contains__ work with ints. Fixes #1868	2018-01-23 23:26:47 +01:00