Henning Peters
846fa49b2a
distinct load() and from_package() methods
2016-01-16 10:00:57 +01:00
Henning Peters
211913d689
add about.py, adapt setup.py
2016-01-15 18:57:01 +01:00
Henning Peters
f8a8f97d25
cleanup
2016-01-15 18:13:37 +01:00
Henning Peters
780cb847c9
add default_model to about
2016-01-15 18:07:15 +01:00
Henning Peters
788f734513
refactored data_dir->via, add zip_safe, add spacy.load()
2016-01-15 18:01:02 +01:00
Matthew Honnibal
478a79a3d5
* Add test for Issue #220 : Whitespace being tagged as noun
2016-01-15 16:17:07 +01:00
Henning Peters
d9471f684f
fix typo
2016-01-14 12:14:12 +01:00
Henning Peters
9b75d872b0
fix model download
2016-01-14 12:02:56 +01:00
Henning Peters
bc229790ac
integrate with sputnik
2016-01-13 19:46:17 +01:00
Matthew Honnibal
3fbfba575a
* xfail the contractions test
2015-12-31 13:16:28 +01:00
Matthew Honnibal
3bd910ccad
* Merge therell test
2015-12-31 11:55:18 +01:00
Matthew Honnibal
eaf2ad59f1
* Fix use of mock Package object
2015-12-31 04:13:15 +01:00
Matthew Honnibal
029136a007
* Fix resource loading for Matcher
2015-12-31 02:45:12 +01:00
Matthew Honnibal
55bcdf8bdd
* Fix errors
2015-12-29 22:32:03 +01:00
Matthew Honnibal
a6ba43ecaf
* Fix errors in packaging revision
2015-12-29 18:37:26 +01:00
Matthew Honnibal
4b4eec8b47
* Fix Issue #201 : Tokenization of there'll
2015-12-29 18:09:09 +01:00
Matthew Honnibal
86ee9d046d
* Remove test that belongs to a change for master
2015-12-29 18:07:23 +01:00
Matthew Honnibal
a2dfdec85d
* Clean up spacy.util
2015-12-29 18:06:09 +01:00
Matthew Honnibal
aec130af56
Use util.Package class for io
The previous Sputnik integration caused an API change: Vocab, Tagger,
etc. were loaded via a from_package classmethod that required a
sputnik.Package instance. This forced users to first create a
sputnik.Sputnik() instance in order to acquire a Package via
sp.pool().
Instead, I've created a small file-system shim, util.Package, which
allows classes to have a .load() classmethod that accepts either
util.Package objects or strings. We can later gut the internals
of this and make it a proxy for Sputnik if we need more functionality
that should live in the Sputnik library.
Sputnik is now only used to download and install the data, in
spacy.en.download.
2015-12-29 18:00:48 +01:00
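The commit body above describes a file-system shim plus a `.load()` classmethod that accepts either a package object or a plain path string. The following is a minimal sketch of that pattern, not spaCy's actual `util.Package`: the class names, the `has_file`/`file_path`/`open` methods, and the `vocab/words.txt` layout are all hypothetical, chosen only to show the "Package or str" dispatch.

```python
import os
import tempfile


class Package:
    """Hypothetical file-system shim: resolves paths under one data directory.
    Its internals could later be swapped for a proxy to another library."""

    def __init__(self, data_path):
        self.data_path = os.path.abspath(data_path)

    def file_path(self, *path_parts):
        return os.path.join(self.data_path, *path_parts)

    def has_file(self, *path_parts):
        return os.path.exists(self.file_path(*path_parts))

    def open(self, path_parts, mode='r'):
        return open(self.file_path(*path_parts), mode)


class Vocab:
    """Hypothetical component that is built through a load() classmethod."""

    def __init__(self, words):
        self.words = words

    @classmethod
    def load(cls, pkg_or_str):
        # Accept either a Package instance or a plain directory path string.
        if isinstance(pkg_or_str, str):
            pkg = Package(pkg_or_str)
        else:
            pkg = pkg_or_str
        # Hypothetical layout: one word per line in vocab/words.txt.
        with pkg.open(('vocab', 'words.txt')) as f:
            words = [line.strip() for line in f if line.strip()]
        return cls(words)
```

Callers can pass `Vocab.load('/path/to/data')` or `Vocab.load(Package('/path/to/data'))` and get the same result, so users no longer need to construct a third-party object just to obtain a package handle.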
Matthew Honnibal
0e2498da00
* Replace from_package with load() classmethod in Vocab
2015-12-29 16:56:51 +01:00
Matthew Honnibal
c5902f2b4b
* Upd Lemmatizer to use MockPackage. Replace from_package with load() classmethod
2015-12-29 16:56:02 +01:00
Matthew Honnibal
4131e45543
* Add MockPackage class, to see whether we can proxy for Sputnik in a lightweight way
2015-12-29 16:55:03 +01:00
Matthew Honnibal
f5dea1406d
* Fix silly mistake in Language.__init__
2015-12-28 18:48:57 +01:00
Matthew Honnibal
187960606f
* Fix pickle problems
2015-12-28 16:54:03 +01:00
Matthew Honnibal
8c7e149ec9
* Replace kwargs argument of Language.__init__ with explicit arguments, to fix pickle bug
2015-12-28 15:56:27 +01:00
Henning Peters
32d655b6e1
bump version
2015-12-28 09:34:39 +01:00
Matthew Honnibal
8b61d45ed0
* Fix merge conflicts for headers branch
2015-12-27 17:46:25 +01:00
Matthew Honnibal
6bb9c7f311
Merge pull request #202 from henningpeters/sputnik
access model via sputnik
2015-12-28 03:29:53 +11:00
Henning Peters
0e321a7105
get mingw32 to work
2015-12-22 23:25:38 +01:00
Henning Peters
d8d348bb55
allow specifying a version constraint within the model name
2015-12-18 19:12:08 +01:00
Henning Peters
7f7299cafb
Merge branch 'tmpdir' into headers
2015-12-18 12:25:25 +01:00
Henning Peters
cfa187aaf0
fix tests
2015-12-18 10:58:02 +01:00
Henning Peters
8359bd4d93
strip data/ from package, friendlier Language invocation, make data_dir backward/forward-compatible
2015-12-18 09:52:55 +01:00
Henning Peters
970278a3d6
no need to link data dir anymore
2015-12-18 09:49:45 +01:00
Henning Peters
4f3efb8eaf
avoid writing to /tmp (not cross-platform compatible)
2015-12-16 19:56:40 +01:00
Henning Peters
4ada39f472
avoid writing to /tmp (not cross-platform compatible)
2015-12-16 19:53:06 +01:00
Henning Peters
2d4efe40f9
fix sputnik call
2015-12-13 14:46:08 +01:00
Henning Peters
ac318b568c
new approach to dependency headers
2015-12-13 11:49:17 +01:00
Henning Peters
345dda6f53
small fixes, add package build step
2015-12-07 06:50:26 +01:00
Henning Peters
9027cef3bc
access model via sputnik
2015-12-07 06:01:28 +01:00
Henning Peters
73e5650be5
change index server
2015-11-18 18:09:46 +01:00
Henning Peters
50d15ea5d2
fix
2015-11-18 17:35:21 +01:00
Henning Peters
02a1dcec76
add data dir
2015-11-18 11:48:55 +01:00
Henning Peters
919a4f0b04
change data path, add repository
2015-11-18 11:40:46 +01:00
Henning Peters
12de895e60
fix version
2015-11-15 16:38:16 +01:00
Henning Peters
03d2f98cd5
add sputnik
2015-11-15 15:58:21 +01:00
Matthew Honnibal
ec7d36c3a4
* Add test for matcher end-point problem
2015-11-12 05:00:40 +11:00
Matthew Honnibal
d309622a27
* Add test for matcher end-point problem
2015-11-12 04:59:11 +11:00
Matthew Honnibal
56ea20a886
* Add test for matcher end-point problem
2015-11-12 04:58:53 +11:00
Matthew Honnibal
cfa4062147
* Add test for matcher end-point problem
2015-11-12 04:56:07 +11:00
Matthew Honnibal
5623242b3e
* Adjust NER rules, so that U entries in gazetteer don't become B moves to the model
2015-11-12 04:48:23 +11:00
Matthew Honnibal
d67d7d5a86
* Add test for NER inconsistency bug
2015-11-08 16:19:33 +01:00
Matthew Honnibal
44fbdc7260
* Fix bug in NER transition system, that sometimes left no valid moves
2015-11-08 16:19:12 +01:00
Matthew Honnibal
ab5aac5b2f
* Add .rank property to Token and Lexeme, for frequency rank
2015-11-08 16:18:25 +01:00
Matthew Honnibal
fde9a22ec2
* Add new test for ner
2015-11-08 13:57:15 +01:00
Matthew Honnibal
e92371bb54
* Fix rule that made Last action invalid if there was a preset of O, since if the entity is already open, that ship has sailed.
2015-11-08 22:17:51 +11:00
Matthew Honnibal
3b74739c3e
* Download updated data
2015-11-08 21:24:25 +11:00
Matthew Honnibal
31da42eb27
* Mark tests that require models
2015-11-07 19:27:38 +11:00
Matthew Honnibal
8e26a28616
* Mark tests that require models
2015-11-07 19:10:56 +11:00
Matthew Honnibal
15eab7354f
* Remove extraneous test files
2015-11-07 18:45:13 +11:00
Matthew Honnibal
6f47074214
* Make constructor of ParserModel and TaggerModel the same as AveragedPerceptron, for easy pickling.
2015-11-07 18:25:17 +11:00
Matthew Honnibal
1cfa20fb17
* Fix sentence-final whitespace issue
2015-11-07 17:34:46 +11:00
Matthew Honnibal
7663970d5f
* Removed unused i variable from Span, and set attributes to read-only
2015-11-07 17:06:15 +11:00
Matthew Honnibal
4b3c96d76d
* Fix zero-length spans
2015-11-07 17:05:16 +11:00
Matthew Honnibal
888c05a7fa
* Fix variable naming in StepwiseState, for thinc 4.0
2015-11-07 11:02:44 +11:00
Matthew Honnibal
fc2185bfe3
* Fix variable naming in StepwiseState, for thinc 4.0
2015-11-07 10:48:31 +11:00
Matthew Honnibal
954442a807
* Fix variable naming in StepwiseState, for thinc 4.0
2015-11-07 10:30:45 +11:00
Matthew Honnibal
06f26d258e
* Fix test_basic_create
2015-11-07 10:04:37 +11:00
Matthew Honnibal
1d3884c46d
* Fix test_basic_create
2015-11-07 10:03:56 +11:00
Matthew Honnibal
cc8febcbe1
* Fix Span comparison
2015-11-07 09:54:14 +11:00
Matthew Honnibal
af70dc166a
* Fix Last restriction, that was supposed to prevent conflicts with presets, but was incorrect.
2015-11-07 09:52:00 +11:00
Matthew Honnibal
a9b612abdf
* Rework the Span-merge patch, to avoid extending the interface of Doc, and avoid virtualizing the Span.start and Span.end indices, to keep Span usage efficient
2015-11-07 09:01:12 +11:00
Matthew Honnibal
56499d89ef
* Rework the Span-merge patch, to avoid extending the interface of Doc, and avoid virtualizing the Span.start and Span.end indices, to keep Span usage efficient
2015-11-07 08:55:34 +11:00
Andreas Grivas
83ca4e0b93
* use old merge tests - add more
2015-11-07 07:57:04 +11:00
Andreas Grivas
4be7fda453
* span start, end -> properties. autoupdate after merge
2015-11-07 07:57:04 +11:00
Andreas Grivas
562db6d2d0
* merge add lex last - add index finder funcs
2015-11-07 07:57:04 +11:00
Matthew Honnibal
a06e3c8963
* Fix bone-headed mistake in StateClass.E
2015-11-07 07:35:28 +11:00
Matthew Honnibal
d24b8509e4
* Correct screw-ups from the previous commits
2015-11-07 06:51:41 +11:00
Matthew Honnibal
5efad178b5
* Set ent tag when closing entity
2015-11-07 06:09:25 +11:00
Matthew Honnibal
9285f01d26
* Fix broken StateClass.E tracking
2015-11-07 06:06:39 +11:00
Matthew Honnibal
19136b0e7d
* Add better debug message for illegal move
2015-11-07 05:34:37 +11:00
Matthew Honnibal
2733816b7b
* Fix whitespace
2015-11-07 05:31:06 +11:00
Matthew Honnibal
01ab464383
* Prevent Begin and In moves from applying in NER if we're at the last token of a sentence, as this would mean the entity would span over a sentence boundary. Re Issue #169
2015-11-07 05:30:44 +11:00
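The commit above forbids Begin and In moves at the last token of a sentence, because either move would leave an entity open across the sentence boundary. The following is a minimal sketch of such a validity check for a BILUO-style transition system. It is not spaCy's transition code: the `Move` enum, the `is_valid` function and its parameters are hypothetical, and the rules for Last, Unit and Out are the conventional BILUO constraints, added here as an assumption.

```python
from enum import Enum


class Move(Enum):
    BEGIN = 'B'
    IN = 'I'
    LAST = 'L'
    UNIT = 'U'
    OUT = 'O'


def is_valid(move, i, sent_end, entity_open):
    """Hypothetical BILUO validity check.

    i: index of the current token
    sent_end: index of the last token of the current sentence
    entity_open: True if an entity has been begun but not yet closed
    """
    at_last_token = i == sent_end
    if move is Move.BEGIN:
        # B starts a multi-token entity, so a later token must exist in this sentence.
        return not entity_open and not at_last_token
    if move is Move.IN:
        # I continues an open entity, which must also close on a later token.
        return entity_open and not at_last_token
    if move is Move.LAST:
        # L closes the currently open entity.
        return entity_open
    # Assumed: U and O are only valid when no entity is open.
    return not entity_open
```

At the sentence-final token, only Last (if an entity is open) or Unit/Out (if none is) remain valid, so no entity can span the boundary.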
Matthew Honnibal
b65633f270
* Fix function that returns nth entity in StateClass. Was only returning the first.
2015-11-07 05:29:11 +11:00
Matthew Honnibal
410b6f9ec1
* Remove deprecated _ml.pyx. We now use the nicer APIs provided by thinc 4.0, and subclass the AveragedPerceptron class.
2015-11-07 05:13:10 +11:00
Matthew Honnibal
3c162dcac3
* Refactor away from the _ml module, to use thinc 4.0. Still some work needs to be done, e.g. to add __reduce__ to the models, more testing, etc.
2015-11-07 03:24:30 +11:00
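Several commits in this log tie pickling to constructor signatures: replacing kwargs in Language.__init__ with explicit arguments, aligning the ParserModel/TaggerModel constructors with AveragedPerceptron, and the note above about adding __reduce__ to the models. The following is a minimal sketch of the standard-library `__reduce__` protocol those changes rely on. The `Model` class and its `n_classes`/`weights` arguments are hypothetical and do not reflect thinc's or spaCy's actual classes.

```python
import pickle


class Model:
    """Hypothetical model whose full state is captured by explicit
    constructor arguments, so pickling can simply re-call the constructor."""

    def __init__(self, n_classes, weights=None):
        self.n_classes = n_classes
        self.weights = dict(weights) if weights is not None else {}

    def __reduce__(self):
        # Ask pickle to rebuild the object as Model(n_classes, weights).
        return (self.__class__, (self.n_classes, self.weights))


if __name__ == '__main__':
    m = Model(3, {'feat': 0.5})
    restored = pickle.loads(pickle.dumps(m))
    print(restored.n_classes, restored.weights)
```

Because `__reduce__` returns a callable plus a positional-argument tuple, every piece of state must be expressible as a constructor argument. That is one reason explicit, stable constructor signatures make pickling simpler.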
Matthew Honnibal
9d1b2a103a
* Fix capitalization in lemmatizer
2015-11-06 05:44:35 +11:00
Matthew Honnibal
6ed3aedf79
* Merge vocab changes
2015-11-06 00:48:08 +11:00
Matthew Honnibal
72abbb43fb
* Add type declarations in strings.pyx
2015-11-06 00:47:26 +11:00
Matthew Honnibal
5b2af4864f
* When lemmatizing non-noun, non-verb, non-adj words, output lower-case
2015-11-06 00:45:09 +11:00
Matthew Honnibal
754bf04162
* Remove declaration of Model.update
2015-11-06 00:31:15 +11:00
Matthew Honnibal
e18bdff23a
Merge branch 'master' of ssh://github.com/honnibal/spaCy
2015-11-06 00:26:15 +11:00
Matthew Honnibal
b9991fbd20
* Update to use thinc 3.0
2015-11-06 00:25:59 +11:00
Matthew Honnibal
864a8f45d8
* Use unicode in StringStore.intern, instead of unreliably casting to bytes.
2015-11-05 11:32:19 +00:00
Matthew Honnibal
b18204cd52
* Fix StringStore._realloc, re Issue #155
2015-11-05 11:28:26 +00:00
Matthew Honnibal
f8004c5f65
* Begin upgrading to improved thinc API
2015-11-05 03:53:03 +11:00
Matthew Honnibal
adc7bbd6cf
* Fix name of like_num in default_lex_attrs
2015-11-04 22:02:47 +11:00
Matthew Honnibal
e96faf29e7
* Rename like_number to like_num, to fix inconsistency re Issue #166
2015-11-04 22:01:44 +11:00
Matthew Honnibal
65934b7cd4
* Enforce import of ujson in strings.pyx, because otherwise it's too slow
2015-11-04 00:32:02 +11:00
Matthew Honnibal
1ce5d5602d
* Rename Doc.data to Doc.c
2015-11-04 00:17:13 +11:00