spaCy

mirror of https://github.com/explosion/spaCy.git synced 2026-02-06 15:29:47 +03:00

Author	SHA1	Message	Date
Matthew Honnibal	f85c8464f7	Draft support of regression loss in parser	2017-05-13 17:17:27 -05:00
Matthew Honnibal	827b5af697	Update draft of parser neural network model Model is good, but code is messy. Currently requires Chainer, which may cause the build to fail on machines without a GPU. Outline of the model: We first predict context-sensitive vectors for each word in the input: (embed_lower | embed_prefix | embed_suffix | embed_shape) >> Maxout(token_width) >> convolution ** 4 This convolutional layer is shared between the tagger and the parser. This prevents the parser from needing tag features. To boost the representation, we make a "super tag" with POS, morphology and dependency label. The tagger predicts this by adding a softmax layer onto the convolutional layer --- so, we're teaching the convolutional layer to give us a representation that's one affine transform from this informative lexical information. This is obviously good for the parser (which backprops to the convolutions too). The parser model makes a state vector by concatenating the vector representations for its context tokens. Current results suggest few context tokens works well. Maybe this is a bug. The current context tokens: * S0, S1, S2: Top three words on the stack * B0, B1: First two words of the buffer * S0L1, S0L2: Leftmost and second leftmost children of S0 * S0R1, S0R2: Rightmost and second rightmost children of S0 * S1L1, S1L2, S1R2, S1R, B0L1, B0L2: Likewise for S1 and B0 This makes the state vector quite long: 13T, where T is the token vector width (128 is working well). Fortunately, there's a way to structure the computation to save some expense (and make it more GPU friendly). The parser typically visits 2N states for a sentence of length N (although it may visit more, if it back-tracks with a non-monotonic transition). A naive implementation would require 2N (B, 13T) @ (13T, H) matrix multiplications for a batch of size B.
We can instead perform one (BN, T) @ (T, 13*H) multiplication, to pre-compute the hidden weights for each positional feature wrt the words in the batch. (Note that our token vectors come from the CNN -- so we can't play this trick over the vocabulary. That's how Stanford's NN parser works --- and why its model is so big.) This pre-computation strategy allows a nice compromise between GPU-friendliness and implementation simplicity. The CNN and the wide lower layer are computed on the GPU, and then the precomputed hidden weights are moved to the CPU, before we start the transition-based parsing process. This makes a lot of things much easier. We don't have to worry about variable-length batch sizes, and we don't have to implement the dynamic oracle in CUDA to train. Currently the parser's loss function is multilabel log loss, as the dynamic oracle allows multiple states to be 0 cost. This is defined as: (exp(score) / Z) - (exp(score) / gZ) Where gZ is the sum of the scores assigned to gold classes. I'm very interested in regressing on the cost directly, but so far this isn't working well. Machinery is in place for beam-search, which has been working well for the linear model. Beam search should benefit greatly from the pre-computation trick.	2017-05-12 16:09:15 -05:00
Matthew Honnibal	b44f7e259c	Clean up unused parser code	2017-05-08 15:42:04 +02:00
Matthew Honnibal	17efb1c001	Change width	2017-05-08 08:40:13 -05:00
Matthew Honnibal	5dffb85184	Don't use gpu	2017-05-08 08:39:59 -05:00
Matthew Honnibal	bef89ef23d	Mergery	2017-05-08 08:29:36 -05:00
Matthew Honnibal	245372973d	Don't use tagger to predict tags	2017-05-08 07:55:34 -05:00
Matthew Honnibal	50ddc9fc45	Fix infinite loop bug	2017-05-08 07:54:26 -05:00
Matthew Honnibal	94e86ae00a	Predict tags with encoder	2017-05-08 07:53:45 -05:00
Matthew Honnibal	56073a11ef	Don't use tags when calculating token vectors	2017-05-08 07:52:24 -05:00
Matthew Honnibal	7a33f1e2b7	Add dep to supertag.	2017-05-08 07:50:01 -05:00
Matthew Honnibal	66252f3e71	Change vector width	2017-05-08 14:47:11 +02:00
Matthew Honnibal	a66a4a4d0f	Replace einsums	2017-05-08 14:46:50 +02:00
Matthew Honnibal	8d2eab74da	Use PretrainableMaxouts	2017-05-08 14:24:55 +02:00
Matthew Honnibal	807cb2e370	Add PretrainableMaxouts	2017-05-08 14:24:43 +02:00
Matthew Honnibal	2e2268a442	Precomputable hidden now working	2017-05-08 11:36:37 +02:00
Matthew Honnibal	10682d35ab	Get pre-computed version working	2017-05-08 00:38:35 +02:00
Matthew Honnibal	35458987e8	Checkpoint -- nearly finished reimpl	2017-05-07 23:05:01 +02:00
Matthew Honnibal	4441866f55	Checkpoint -- nearly finished reimpl	2017-05-07 22:47:06 +02:00
Matthew Honnibal	6782eedf9b	Tmp GPU code	2017-05-07 11:04:24 -05:00
Matthew Honnibal	e420e5a809	Tmp	2017-05-07 07:31:09 -05:00
Matthew Honnibal	12039e80ca	Switch to single matmul for state layer	2017-05-07 14:26:34 +02:00
Matthew Honnibal	700979fb3c	CPU/GPU compat	2017-05-07 04:01:11 +02:00
Matthew Honnibal	f99f5b75dc	working residual net	2017-05-07 03:57:26 +02:00
Matthew Honnibal	bdf2dba9fb	WIP on refactor, with hidde pre-computing	2017-05-07 02:02:43 +02:00
Matthew Honnibal	b439e04f8d	Learning smoothly	2017-05-06 20:38:12 +02:00
Matthew Honnibal	08bee76790	Learns things	2017-05-06 18:24:38 +02:00
Matthew Honnibal	04ae1c01f1	Learns things	2017-05-06 18:21:02 +02:00
Matthew Honnibal	bcf4cd0a5f	Learns things	2017-05-06 17:37:36 +02:00
Matthew Honnibal	8e48b58cd6	Gradients look correct	2017-05-06 16:47:15 +02:00
Matthew Honnibal	7e04260d38	Data running through, likely errors in model	2017-05-06 14:22:20 +02:00
Matthew Honnibal	fa7c1990b6	Restore tok2vec function	2017-05-05 20:12:03 +02:00
Matthew Honnibal	efe9630e1c	Bug fixes	2017-05-05 20:09:50 +02:00
Matthew Honnibal	ef4fa594aa	Draft of NN parser, to be tested	2017-05-05 19:20:39 +02:00
Matthew Honnibal	7d1df50aec	Draft up Parser model	2017-05-04 13:31:40 +02:00
Matthew Honnibal	ccaf26206b	Pseudocode for parser	2017-05-04 12:17:59 +02:00
Ines Montani	6e1fad92a1	Update CONTRIBUTORS.md	2017-05-03 10:01:40 +02:00
ines	e2380d8789	Update README.rst	2017-05-03 10:00:04 +02:00
ines	f9384b0fbd	Update alpha languages and add aside for tokenizer dependencies	2017-05-03 09:58:31 +02:00
Ines Montani	f0d7a87e18	Merge pull request #1035 from uetchy/japanese-support Japanese support	2017-05-03 09:44:54 +02:00
Ines Montani	3ea23a3f4d	Fix formatting	2017-05-03 09:44:38 +02:00
Ines Montani	d730eb0c0d	Raise custom ImportError if importing janome fails	2017-05-03 09:43:29 +02:00
Ines Montani	949ad6594b	Add newline	2017-05-03 09:38:43 +02:00
Ines Montani	d12ca587ea	Add newline	2017-05-03 09:38:29 +02:00
Ines Montani	8676cd0135	Add newline	2017-05-03 09:38:07 +02:00
Yasuaki Uechi	0e7a9b9fac	Add Japanese to 'Alpha support’ section	2017-05-03 13:56:45 +09:00
Yasuaki Uechi	c8f83aeb87	Add basic japanese support	2017-05-03 13:56:21 +09:00
Ines Montani	f26a3b5a50	Merge pull request #1025 from Ferdous-Al-Imran/master	2017-04-27 14:36:37 +02:00
Ines Montani	fb96f88b59	Update info on CoNLL format and include link	2017-04-27 14:36:08 +02:00
Matthew Honnibal	31ec9e1371	Merge branch 'master' of https://github.com/explosion/spaCy	2017-04-27 13:21:39 +02:00
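
The pre-computation trick described in commit 827b5af697 can be illustrated with a small NumPy sketch. This is not the code from that commit. The names `tokens`, `W`, `precomputed`, `hidden_naive` and `hidden_precomputed` are hypothetical, the sizes are toy values, and the sketch assumes every one of the 13 context slots is filled (handling of missing context tokens is left out). It only checks that one `(N, T) @ (T, 13*H)` multiplication per sentence, followed by a sum of 13 looked-up rows per state, gives the same hidden activations as multiplying the concatenated `13T` state vector by a `(13T, H)` matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, H, F = 6, 8, 5, 13  # words, token width, hidden width, context slots (toy sizes)

tokens = rng.standard_normal((N, T))   # stand-in for the CNN token vectors
W = rng.standard_normal((F * T, H))    # weights of the naive (13T, H) layer

# Rearrange W so that one (N, T) @ (T, F*H) product yields, for every word,
# its contribution to the hidden layer in each of the F context slots.
W_pre = W.reshape(F, T, H).transpose(1, 0, 2).reshape(T, F * H)
precomputed = (tokens @ W_pre).reshape(N, F, H)

def hidden_naive(context_ids):
    """Concatenate the context token vectors, then multiply by W."""
    state = np.concatenate([tokens[i] for i in context_ids])  # shape (F*T,)
    return state @ W

def hidden_precomputed(context_ids):
    """Sum the pre-computed rows: slot f reads the row of word context_ids[f]."""
    return precomputed[context_ids, np.arange(F)].sum(axis=0)

context_ids = rng.integers(0, N, size=F)  # hypothetical word index per slot
naive = hidden_naive(context_ids)
fast = hidden_precomputed(context_ids)
```

In this form the per-state work during parsing is a gather and a sum, which is cheap enough to run on the CPU once the pre-computed values have been copied over.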


5054 Commits
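
The multilabel log loss mentioned in commit 827b5af697 can be sketched as follows. This is one reading of the formula in the commit message, not the commit's own code: it assumes `Z` is the sum of the exponentiated scores over all classes, `gZ` is the same sum restricted to the zero-cost (gold) classes, and the second term applies only to gold classes. The function name `multilabel_log_loss_grad` and the sample values are hypothetical.

```python
import numpy as np

def multilabel_log_loss_grad(scores, is_gold):
    """Gradient of -log(gZ / Z) with respect to the scores.

    scores:  1-D float array, one score per transition.
    is_gold: 1-D 0/1 array marking the zero-cost transitions (at least one).
    """
    shifted = scores - scores.max()        # shift for numerical stability
    exp_scores = np.exp(shifted)
    Z = exp_scores.sum()                   # normaliser over all classes
    gZ = (exp_scores * is_gold).sum()      # normaliser over gold classes only
    return exp_scores / Z - (exp_scores * is_gold) / gZ

scores = np.array([2.0, 0.5, -1.0, 1.0])
is_gold = np.array([1.0, 0.0, 0.0, 1.0])  # two transitions have zero cost
grad = multilabel_log_loss_grad(scores, is_gold)
```

Under these assumptions the gradient sums to zero, is positive for non-gold classes and non-positive for gold classes, so a gradient step moves probability mass toward the set of zero-cost transitions without forcing a choice among them.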