Henning Peters
bc229790ac
integrate with sputnik
2016-01-13 19:46:17 +01:00
Matthew Honnibal
eaf2ad59f1
* Fix use of mock Package object
2015-12-31 04:13:15 +01:00
Matthew Honnibal
029136a007
* Fix resource loading for Matcher
2015-12-31 02:45:12 +01:00
Matthew Honnibal
a6ba43ecaf
* Fix errors in packaging revision
2015-12-29 18:37:26 +01:00
Matthew Honnibal
aec130af56
Use util.Package class for io
The previous Sputnik integration caused an API change: Vocab, Tagger, etc.
were loaded via a from_package classmethod that required a
sputnik.Package instance. This forced users to first create a
sputnik.Sputnik() instance in order to acquire a Package via
sp.pool().
Instead, I've created a small file-system shim, util.Package, which
allows classes to have a .load() classmethod that accepts either
util.Package objects or strings. We can later gut the internals
of this and make it a proxy for Sputnik if we need more functionality
that should live in the Sputnik library.
Sputnik is now only used to download and install the data, in
spacy.en.download.
2015-12-29 18:00:48 +01:00
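A minimal sketch of the shim pattern described in the commit message above (aec130af56): a .load() classmethod that accepts either a package object or a plain string path. This is an illustration only, not the code from the commit. The method names coerce, file_path and has_file, the stand-in Vocab class, and the vocab/words.txt layout are all hypothetical.

```python
import os
import tempfile


class Package(object):
    """Hypothetical file-system shim: resolves paths under a data directory."""

    def __init__(self, data_path):
        self.data_path = data_path

    @classmethod
    def coerce(cls, pkg_or_str):
        # Pass an existing Package through; wrap a string path in a new one.
        if isinstance(pkg_or_str, cls):
            return pkg_or_str
        return cls(pkg_or_str)

    def file_path(self, *parts):
        return os.path.join(self.data_path, *parts)

    def has_file(self, *parts):
        return os.path.isfile(self.file_path(*parts))


class Vocab(object):
    """Stand-in for a resource-loading class (not the real spaCy Vocab)."""

    def __init__(self, words):
        self.words = words

    @classmethod
    def load(cls, pkg_or_str):
        # Callers may hand over a Package or just a directory string.
        package = Package.coerce(pkg_or_str)
        words = []
        if package.has_file("vocab", "words.txt"):
            with open(package.file_path("vocab", "words.txt")) as f:
                words = [line.strip() for line in f if line.strip()]
        return cls(words)


if __name__ == "__main__":
    # Build a throwaway data directory (assumed layout) and load it both ways.
    data_dir = tempfile.mkdtemp()
    os.makedirs(os.path.join(data_dir, "vocab"))
    with open(os.path.join(data_dir, "vocab", "words.txt"), "w") as f:
        f.write("hello\nworld\n")
    print(Vocab.load(data_dir).words)
    print(Vocab.load(Package(data_dir)).words)
```

Because the coercion lives in one place, the internals of Package could later be swapped for a proxy to another backend without changing any .load() call sites, which is the flexibility the commit message asks for.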
Henning Peters
8359bd4d93
strip data/ from package, friendlier Language invocation, make data_dir backward/forward-compatible
2015-12-18 09:52:55 +01:00
Henning Peters
9027cef3bc
access model via sputnik
2015-12-07 06:01:28 +01:00
Matthew Honnibal
68f479e821
* Rename Doc.data to Doc.c
2015-11-04 00:15:14 +11:00
Matthew Honnibal
6727a46bb5
* Fix Issue #118 : Matcher behaves unpredictably when matches overlap.
2015-10-19 16:45:32 +11:00
Matthew Honnibal
c99285b8b9
* Clean up C++ usage in spacy/matcher.pyx
2015-10-18 17:20:50 +11:00
Matthew Honnibal
20fd36a0f7
* Very scrappy, likely buggy first-cut pickle implementation, to work on Issue #125 : allow pickle for Apache Spark. The current implementation sends stuff to temp files, and does almost nothing to ensure all modifiable state is actually preserved. The Language() instance is a deep tree of extension objects, and if pickling during training, some of the C-data state is hard to preserve.
2015-10-13 13:44:41 +11:00
Matthew Honnibal
85ce36ab11
* Refactor symbols, so that frequency rank can be derived from the orth id of a word.
2015-10-13 13:44:39 +11:00
Matthew Honnibal
801d55a6d9
* Fix phrase matcher
2015-10-09 02:00:45 +11:00
Matthew Honnibal
1def5a6cbe
* Fix print statements in matcher
2015-09-08 15:38:19 +02:00
Matthew Honnibal
86c888667f
* Merge in changes from de branch
2015-09-06 19:49:28 +02:00
Matthew Honnibal
d2fc104a26
* Begin merge of Gazetteer and DE branches
2015-09-06 19:45:15 +02:00
Matthew Honnibal
6427a3fcac
* Temporarily import flag attributes in matcher
2015-09-06 17:53:12 +02:00
Matthew Honnibal
430affc347
* Fix missing n_patterns property in Matcher class. Fix from_dir method
2015-08-26 19:17:02 +02:00
Matthew Honnibal
6f1743692a
* Work on language-independent refactoring
2015-08-23 20:49:18 +02:00
Matthew Honnibal
cad0cca4e3
* Tmp
2015-08-22 22:04:34 +02:00
Matthew Honnibal
9f65879991
* Fix shape attr bug, and fix handling of false positive matches
2015-08-06 17:28:14 +02:00
Matthew Honnibal
383dfabd67
* Fix matcher setting of entities
2015-08-06 16:27:01 +02:00
Matthew Honnibal
cd7d1682cd
* Fix loading of gazetteer.json file
2015-08-06 16:08:25 +02:00
Matthew Honnibal
5737115e1e
* Work on gazetteer matching
2015-08-06 14:33:21 +02:00
Matthew Honnibal
9c1724ecae
* Gazetteer stuff working, now need to wire up to API
2015-08-06 00:35:40 +02:00
Matthew Honnibal
5bc0e83f9a
* Reimplement matching in Cython, instead of Python.
2015-08-05 01:05:54 +02:00
Matthew Honnibal
4c87a696b3
* Add draft dfa matcher, in Python. Passing tests.
2015-08-04 15:55:28 +02:00