Matthew Honnibal
e3db39dd21
* Fix compiler warning about signed/unsigned comparison
2016-02-01 09:08:07 +01:00
Matthew Honnibal
b3802562d6
Merge branch 'rethinc2' of https://github.com/honnibal/spaCy into rethinc2
2016-02-01 08:59:24 +01:00
Matthew Honnibal
4b08a3fafd
* Fix merge conflict
2016-02-01 08:58:18 +01:00
Matthew Honnibal
5188f6d9d8
* Fix parseC function
2016-02-01 08:48:48 +01:00
Matthew Honnibal
bcf8f7ba40
* Add a parse_batch method to Parser, that releases the GIL around a batch of documents.
2016-02-01 08:34:55 +01:00
Matthew Honnibal
d5579cd0d8
Merge branch 'rethinc2' of https://github.com/honnibal/spaCy into rethinc2
2016-02-01 03:08:49 +01:00
Matthew Honnibal
490ba65398
* Use openmp in parser
2016-02-01 03:08:42 +01:00
Matthew Honnibal
cb78d91ec5
* Fix ArcEager.set_valid
2016-02-01 03:07:37 +01:00
Matthew Honnibal
28e5ad62bc
* Pass a StateC pointer into the transition and validation methods in the parser, so that the GIL can be released over a batch of documents
2016-02-01 03:00:15 +01:00
Matthew Honnibal
a47f00901b
* Pass a StateC pointer into the transition and validation methods in the parser, so that the GIL can be released over a batch of documents
2016-02-01 02:58:14 +01:00
Matthew Honnibal
daaad66448
* Now fully proxied
2016-02-01 02:37:08 +01:00
Matthew Honnibal
7a0e3bb9c1
* Continue proxying. Some problem currently
2016-02-01 02:22:21 +01:00
Matthew Honnibal
2169bbb7ea
* Shadow StateClass with StateC, to start proxying
2016-02-01 01:16:14 +01:00
Matthew Honnibal
2fa228458e
* Add _state file, which StateClass will proxy to
2016-02-01 01:09:21 +01:00
Matthew Honnibal
9410e74c92
* Switch parser to use nogil functions
2016-01-30 20:27:07 +01:00
Matthew Honnibal
10877a7791
* Update for thinc 5.0, including changing cost from int to weight_t, and updating the tagger and parser
2016-01-30 14:31:36 +01:00
Matthew Honnibal
84c5dfbfc3
* Clean up debugging python list
2016-01-19 20:10:32 +01:00
Matthew Honnibal
04d0686b26
* Make TransitionSystem.add_action idempotent, i.e. ignore duplicate added actions.
2016-01-19 20:10:04 +01:00
Matthew Honnibal
65c5bc4988
* Add add_label method, to allow users to register new entity types and dependency labels.
2016-01-19 19:11:02 +01:00
Matthew Honnibal
151aa0b0e2
* Allow users to add_label, in order to extend the entity recogniser to new classes. Does not by itself add a class to the model
2016-01-19 19:09:33 +01:00
Matthew Honnibal
c8e0011ebc
* Add iterators to the NER and parser transition systems, to get the action types
2016-01-19 19:07:43 +01:00
Matthew Honnibal
04177debd0
* Unwind limit to sentence boundary detection that prevents it from inserting boundaries on whitespace. Replace it with a check for whitespace in StateClass.fast_forward, so that whitespace is LeftArced when it's on the stack. This should prevent the previous problem of whitespace-only sentences. Should fix Issue #184 , but may cause further problems. Needs testing.
2016-01-19 02:54:15 +01:00
Matthew Honnibal
3dc398b727
* Fix merge conflict in requirements.txt
2016-01-16 16:20:49 +01:00
Matthew Honnibal
c025a0c64b
* Check for KeyboardInerrupt in parser.__call__
2016-01-16 16:18:44 +01:00
Matthew Honnibal
aec130af56
Use util.Package class for io
...
Previous Sputnik integration caused API change: Vocab, Tagger, etc
were loaded via a from_package classmethod, that required a
sputnik.Package instance. This forced users to first create a
sputnik.Sputnik() instance, in order to acquire a Package via
sp.pool().
Instead I've created a small file-system shim, util.Package, which
allows classes to have a .load() classmethod, that accepts either
util.Package objects, or strings. We can later gut the internals
of this and make it a proxy for Sputnik if we need more functionality
that should live in the Sputnik library.
Sputnik is now only used to download and install the data, in
spacy.en.download
2015-12-29 18:00:48 +01:00
Matthew Honnibal
5623242b3e
* Adjust NER rules, so that U entries in gazetteer don't become B moves to the model
2015-11-12 04:48:23 +11:00
Matthew Honnibal
44fbdc7260
* Fix bug in NER transition system, that sometimes left no valid moves
2015-11-08 16:19:12 +01:00
Matthew Honnibal
e92371bb54
* Fix rule that made Last action invalid if there was a preset of O, since if the entity is already open, that ship has sailed.
2015-11-08 22:17:51 +11:00
Matthew Honnibal
6f47074214
* Make constructor of ParserModel and TaggerModel the same as AveragedPerceptron, for each pickling.
2015-11-07 18:25:17 +11:00
Matthew Honnibal
1cfa20fb17
* Fix sentence-final whitespace issue
2015-11-07 17:34:46 +11:00
Matthew Honnibal
888c05a7fa
* Fix variable naming in StepwiseState, for thinc 4.0
2015-11-07 11:02:44 +11:00
Matthew Honnibal
fc2185bfe3
* Fix variable naming in StepwiseState, for thinc 4.0
2015-11-07 10:48:31 +11:00
Matthew Honnibal
954442a807
* Fix variable naming in StepwiseState, for thinc 4.0
2015-11-07 10:30:45 +11:00
Matthew Honnibal
af70dc166a
* Fix Last restriction, that was supposed to prevent conflicts with presets, but was incorrect.
2015-11-07 09:52:00 +11:00
Matthew Honnibal
a06e3c8963
* Fix bone-headed mistake in StateClass.E
2015-11-07 07:35:28 +11:00
Matthew Honnibal
d24b8509e4
* Correct screw ups from the previous commits
2015-11-07 06:51:41 +11:00
Matthew Honnibal
5efad178b5
* Set ent tag when close entity
2015-11-07 06:09:25 +11:00
Matthew Honnibal
9285f01d26
* Fix broken StateClass.E tracking
2015-11-07 06:06:39 +11:00
Matthew Honnibal
19136b0e7d
* Add better debug message for illegal move
2015-11-07 05:34:37 +11:00
Matthew Honnibal
2733816b7b
* Fix whitespace
2015-11-07 05:31:06 +11:00
Matthew Honnibal
01ab464383
* Prevent Begin and In moves from applying in NER if we're at the last token of a sentence, as this would mean the entity would span over a sentence boundary. Re Issue #169
2015-11-07 05:30:44 +11:00
Matthew Honnibal
b65633f270
* Fix function that returns nth entity in StateClass. Was only returning the first.
2015-11-07 05:29:11 +11:00
Matthew Honnibal
3c162dcac3
* Refactor away from the _ml module, to use thinc 4.0. Still some work needs to be done, e.g. to add __reduce__ to the models, more testing, etc.
2015-11-07 03:24:30 +11:00
Matthew Honnibal
b9991fbd20
* Update to use thinc 3.0
2015-11-06 00:25:59 +11:00
Matthew Honnibal
68f479e821
* Rename Doc.data to Doc.c
2015-11-04 00:15:14 +11:00
Matthew Honnibal
329ae57520
* Fix whitespace attachment thing
2015-10-13 09:46:38 +02:00
Matthew Honnibal
37919eac82
* Fix whitespace attachment in simpler way. Leaves problem with setting left/right children.
2015-10-13 18:23:24 +11:00
Matthew Honnibal
c70eb776ae
* Fix whitespace attachment, so that left/right children are consistent with head.
2015-10-13 15:58:22 +11:00
Matthew Honnibal
20fd36a0f7
* Very scrappy, likely buggy first-cut pickle implementation, to work on Issue #125 : allow pickle for Apache Spark. The current implementation sends stuff to temp files, and does almost nothing to ensure all modifiable state is actually preserved. The Language() instance is a deep tree of extension objects, and if pickling during training, some of the C-data state is hard to preserve.
2015-10-13 13:44:41 +11:00
Matthew Honnibal
9dd2f25c74
* Fix Issue #131 : Force whitespace characters to attach syntactically to previous token, and ensure they cannot serve as stand-alone 'sentence' units.
2015-10-10 15:53:30 +11:00