spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-07-11 00:32:40 +03:00

Author	SHA1	Message	Date
Matthew Honnibal	5cac951a16	Move new parser to nn_parser.pyx, and restore old parser, to make tests pass.	2017-05-14 00:55:01 +02:00
Matthew Honnibal	613ba79e2e	Fiddle with sizings for parser	2017-05-13 17:20:23 -05:00
Matthew Honnibal	827b5af697	Update draft of parser neural network model Model is good, but code is messy. Currently requires Chainer, which may cause the build to fail on machines without a GPU. Outline of the model: We first predict context-sensitive vectors for each word in the input: (embed_lower \| embed_prefix \| embed_suffix \| embed_shape) >> Maxout(token_width) >> convolution ** 4 This convolutional layer is shared between the tagger and the parser. This prevents the parser from needing tag features. To boost the representation, we make a "super tag" with POS, morphology and dependency label. The tagger predicts this by adding a softmax layer onto the convolutional layer --- so, we're teaching the convolutional layer to give us a representation that's one affine transform from this informative lexical information. This is obviously good for the parser (which backprops to the convolutions too). The parser model makes a state vector by concatenating the vector representations for its context tokens. Current results suggest few context tokens works well. Maybe this is a bug. The current context tokens: * S0, S1, S2: Top three words on the stack * B0, B1: First two words of the buffer * S0L1, S0L2: Leftmost and second leftmost children of S0 * S0R1, S0R2: Rightmost and second rightmost children of S0 * S1L1, S1L2, S1R2, S1R, B0L1, B0L2: Likewise for S1 and B0 This makes the state vector quite long: 13T, where T is the token vector width (128 is working well). Fortunately, there's a way to structure the computation to save some expense (and make it more GPU friendly). The parser typically visits 2N states for a sentence of length N (although it may visit more, if it back-tracks with a non-monotonic transition). A naive implementation would require 2N (B, 13T) @ (13T, H) matrix multiplications for a batch of size B. We can instead perform one (BN, T) @ (T, 13*H) multiplication, to pre-compute the hidden weights for each positional feature wrt the words in the batch. (Note that our token vectors come from the CNN -- so we can't play this trick over the vocabulary. That's how Stanford's NN parser works --- and why its model is so big.) This pre-computation strategy allows a nice compromise between GPU-friendliness and implementation simplicity. The CNN and the wide lower layer are computed on the GPU, and then the precomputed hidden weights are moved to the CPU, before we start the transition-based parsing process. This makes a lot of things much easier. We don't have to worry about variable-length batch sizes, and we don't have to implement the dynamic oracle in CUDA to train. Currently the parser's loss function is multilabel log loss, as the dynamic oracle allows multiple states to be 0 cost. This is defined as: (exp(score) / Z) - (exp(score) / gZ) Where gZ is the sum of the scores assigned to gold classes. I'm very interested in regressing on the cost directly, but so far this isn't working well. Machinery is in place for beam-search, which has been working well for the linear model. Beam search should benefit greatly from the pre-computation trick.	2017-05-12 16:09:15 -05:00
Matthew Honnibal	b16ae75824	Remove serializer hacks from pipeline classes	2017-05-09 18:16:40 +02:00
Matthew Honnibal	bef89ef23d	Mergery	2017-05-08 08:29:36 -05:00
Matthew Honnibal	94e86ae00a	Predict tags with encoder	2017-05-08 07:53:45 -05:00
Matthew Honnibal	a66a4a4d0f	Replace einsums	2017-05-08 14:46:50 +02:00
Matthew Honnibal	6782eedf9b	Tmp GPU code	2017-05-07 11:04:24 -05:00
Matthew Honnibal	f99f5b75dc	working residual net	2017-05-07 03:57:26 +02:00
Matthew Honnibal	7e04260d38	Data running through, likely errors in model	2017-05-06 14:22:20 +02:00
ines	d24589aa72	Clean up imports, unused code, whitespace, docstrings	2017-04-15 12:05:47 +02:00
ines	561f2a3eb4	Use consistent formatting for docstrings	2017-04-15 11:59:21 +02:00
Matthew Honnibal	354458484c	WIP on add_label bug during NER training Currently when a new label is introduced to NER during training, it causes the labels to be read in in an unexpected order. This invalidates the model.	2017-04-14 23:52:17 +02:00
Matthew Honnibal	2f63806ddb	Update config when adding label. Re #910	2017-03-25 22:35:44 +01:00
Raphaël Bournhonesque	f332bf05be	Remove unused import statements	2017-03-21 21:08:54 +01:00
Matthew Honnibal	7769bc31e3	Add beam-search classes	2017-03-15 09:27:41 -05:00
Matthew Honnibal	fa23278ee3	Add classes for beam parser and beam NER	2017-03-11 12:45:37 -06:00
Matthew Honnibal	f77a5bb60a	Switch back to greedy parser	2017-03-11 11:11:30 -06:00
Matthew Honnibal	dcce9ca3f3	Use beam parser	2017-03-11 07:00:20 -06:00
Matthew Honnibal	b86f8af0c1	Fix doc strings	2016-11-01 12:25:36 +01:00
Matthew Honnibal	3e688e6d4b	Fix issue #514 -- serializer fails when new entity type has been added. The fix here is quite ugly. It's best to add the entities ASAP after loading the NLP pipeline, to mitigate the brittleness.	2016-10-23 17:45:44 +02:00
Matthew Honnibal	f787cd29fe	Refactor the pipeline classes to make them more consistent, and remove the redundant blank() constructor.	2016-10-16 21:34:57 +02:00
Matthew Honnibal	4bb73b1a93	Fix parser labels in pipeline	2016-10-16 17:03:22 +02:00
Matthew Honnibal	a079677984	Fix omission of O action when creating blank entity recognizer	2016-10-16 11:43:25 +02:00
Matthew Honnibal	509b30834f	Add a pipeline module, to collect and wrap processes for annotation	2016-10-16 01:47:12 +02:00

1 2 3

125 Commits