Matthew Honnibal
bede11b67c
Improve label management in parser and NER (#2108)
...
This patch does a few smallish things that tighten up the training workflow a little, and allow memory use during training to be reduced by letting the GoldCorpus stream data properly.
Previously, the parser and entity recognizer read and saved labels as lists, with extra labels noted separately. Lists were used because ordering is very important: it ensures that the label-to-class mapping is stable.
We now manage labels as nested dictionaries, first keyed by the action, and then keyed by the label. Values are frequencies. The trick is, how do we save new labels? We need to make sure we iterate over these in the same order they're added. Otherwise, we'll get different class IDs, and the model's predictions won't make sense.
To allow stable sorting, we map the new labels to negative values. If we have two new labels, they'll be noted as having "frequency" -1 and -2. The next new label will then have "frequency" -3. When we sort by (frequency, label), we then get a stable sort.
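The negative-frequency trick can be sketched roughly as follows. This is a minimal illustration, not the code from the patch: the helper names `add_label` and `ordered_labels` are hypothetical, and the sort direction (highest frequency first, ties broken by label) is an assumption, chosen so that labels seen in training come before labels added later.

```python
def add_label(labels, action, label):
    """Record a label added after training with the next negative "frequency"."""
    by_label = labels.setdefault(action, {})
    if label in by_label:
        return False
    # New labels are numbered -1, -2, -3, ... in the order they are added.
    n_new = sum(1 for freq in by_label.values() if freq < 0)
    by_label[label] = -(n_new + 1)
    return True


def ordered_labels(labels, action):
    """Return labels in a reproducible order: highest frequency first, ties by label."""
    items = labels.get(action, {}).items()
    return [label for label, freq in sorted(items, key=lambda kv: (-kv[1], kv[0]))]


# Nested dict: keyed by action, then by label; values are frequencies.
labels = {"RIGHT": {"nsubj": 120, "dobj": 80}}
add_label(labels, "RIGHT", "new_a")  # stored with "frequency" -1
add_label(labels, "RIGHT", "new_b")  # stored with "frequency" -2
print(ordered_labels(labels, "RIGHT"))
```

Because each new label gets a strictly smaller value than the previous one, the order no longer depends on dictionary iteration order, so the class IDs come out the same after a save and reload.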
Storing frequencies then allows us to make the next nice improvement. Previously we had to iterate over the whole training set, to pre-process it for the deprojectivisation. This led to storing the whole training set in memory. This was most of the required memory during training.
To prevent this, we now store the frequencies as we stream in the data, and deprojectivize as we go. Once we've built the frequencies, we can then apply a frequency cut-off when we decide how many classes to make.
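A rough sketch of counting label frequencies while streaming, then applying a cut-off. The function names and the `min_freq` parameter are hypothetical, and the deprojectivisation step is left out; the point is only that the counter is the sole state kept between documents.

```python
from collections import Counter


def count_labels(doc_stream):
    """Accumulate label frequencies one document at a time."""
    freqs = Counter()
    for doc_labels in doc_stream:
        # Each document's labels are counted and can then be discarded.
        freqs.update(doc_labels)
    return freqs


def labels_above_cutoff(freqs, min_freq):
    """Keep only labels seen at least `min_freq` times (hypothetical cut-off)."""
    return sorted(label for label, freq in freqs.items() if freq >= min_freq)


def fake_stream():
    # Stand-in for a streamed corpus: one list of dependency labels per document.
    yield ["nsubj", "dobj", "nsubj"]
    yield ["nsubj", "amod"]
    yield ["dobj"]


freqs = count_labels(fake_stream())
print(labels_above_cutoff(freqs, min_freq=2))
```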
Finally, to allow proper data streaming, we also have to have some way of shuffling the iterator. This is awkward if the training files have multiple documents in them. To solve this, the GoldCorpus class now writes the training data to disk in msgpack files, one per document. We can then shuffle the data by shuffling the paths.
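The shuffle-by-path idea might look something like this. It is a sketch under assumptions, not the GoldCorpus implementation: `write_docs` and `stream_shuffled` are hypothetical names, and `json` from the standard library stands in for msgpack so the example runs without extra dependencies.

```python
import json
import random
import tempfile
from pathlib import Path


def write_docs(docs, directory):
    """Write one file per document and return the list of paths."""
    paths = []
    for i, doc in enumerate(docs):
        path = Path(directory) / f"{i}.json"
        path.write_text(json.dumps(doc), encoding="utf8")
        paths.append(path)
    return paths


def stream_shuffled(paths, seed=None):
    """Shuffle the paths, then load documents lazily, one at a time."""
    paths = list(paths)
    random.Random(seed).shuffle(paths)
    for path in paths:
        yield json.loads(path.read_text(encoding="utf8"))


with tempfile.TemporaryDirectory() as tmp_dir:
    docs = [{"id": i, "text": f"document {i}"} for i in range(5)]
    paths = write_docs(docs, tmp_dir)
    shuffled_ids = [doc["id"] for doc in stream_shuffled(paths, seed=0)]
    print(shuffled_ids)
```

Only the list of paths is held in memory and shuffled; each document is read from disk when the iterator reaches it.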
This is a squash merge, as I made a lot of very small commits. Individual commit messages below.
* Simplify label management for TransitionSystem and its subclasses
* Fix serialization for new label handling format in parser
* Simplify and improve GoldCorpus class. Reduce memory use, write to temp dir
* Set actions in transition system
* Require thinc 6.11.1.dev4
* Fix error in parser init
* Add unicode declaration
* Fix unicode declaration
* Update textcat test
* Try to get model training on less memory
* Print json loc for now
* Try rapidjson to reduce memory use
* Remove rapidjson requirement
* Try rapidjson for reduced mem usage
* Handle None heads when projectivising
* Stream json docs
* Fix train script
* Handle projectivity in GoldParse
* Fix projectivity handling
* Add minibatch_by_words util from ud_train
* Minibatch by number of words in spacy.cli.train
* Move minibatch_by_words util to spacy.util
* Fix label handling
* More hacking at label management in parser
* Fix encoding in msgpack serialization in GoldParse
* Adjust batch sizes in parser training
* Fix minibatch_by_words
* Add merge_subtokens function to pipeline.pyx
* Register merge_subtokens factory
* Restore use of msgpack tmp directory
* Use minibatch-by-words in train
* Handle retokenization in scorer
* Change back-off approach for missing labels. Use 'dep' label
* Update NER for new label management
* Set NER tags for over-segmented words
* Fix label alignment in gold
* Fix label back-off for infrequent labels
* Fix int type in labels dict key
* Fix int type in labels dict key
* Update feature definition for 8 feature set
* Update ud-train script for new label stuff
* Fix json streamer
* Print the line number if conll eval fails
* Update children and sentence boundaries after deprojectivisation
* Export set_children_from_heads from doc.pxd
* Render parses during UD training
* Remove print statement
* Require thinc 6.11.1.dev6. Try adding wheel as install_requires
* Set different dev version, to flush pip cache
* Update thinc version
* Update GoldCorpus docs
* Remove print statements
* Fix formatting and links [ci skip]
2018-03-19 02:58:08 +01:00
Matthew Honnibal
318c23d318
Increment thinc
2018-03-16 13:12:53 +01:00
Matthew Honnibal
39c50225e8
Update thinc
2018-03-16 03:57:47 +01:00
Matthew Honnibal
7be561c8be
Fix thinc requirement
2018-03-16 03:34:12 +01:00
Matthew Honnibal
53df6d867b
Require new thinc
2018-03-16 03:20:01 +01:00
Matthew Honnibal
f2fa8481c4
Require thinc v6.11
2018-03-13 13:59:35 +01:00
ines
9c8a0f6eba
Version-lock msgpack-python (see #2015)
2018-02-22 19:42:03 +01:00
ines
f5f4de98d1
Version-lock msgpack-python (see #2015)
2018-02-22 16:02:32 +01:00
Matthew Honnibal
f46bf2a7e9
Build _align.pyx
2018-02-20 17:32:13 +01:00
ines
6bba1db4cc
Drop six and related hacks as a dependency
2018-02-18 13:29:56 +01:00
ines
002ee80ddf
Add html5lib to setup.py to fix six error (see #1924)
2018-02-02 20:32:08 +01:00
Matthew Honnibal
2e449c1fbf
Fix compiler flags, addressing #1591
2018-01-14 14:34:36 +01:00
Matthew Honnibal
04a92bd75e
Pin msgpack-numpy requirement
2017-12-06 03:24:24 +01:00
Hugo
aa898ab4e4
Drop support for EOL Python 2.6 and 3.3
2017-11-26 19:46:24 +02:00
Matthew Honnibal
716ccbb71e
Require thinc 6.10.1
2017-11-15 14:59:34 +01:00
Matthew Honnibal
314f5b9cdb
Require thinc 6.10.0
2017-10-28 18:20:10 +00:00
Matthew Honnibal
64e4ff7c4b
Merge 'tidy-up' changes into branch. Resolve conflicts
2017-10-28 13:16:06 +02:00
ines
7946464742
Remove spacy.tagger (now in pipeline)
2017-10-27 19:45:04 +02:00
Matthew Honnibal
531142a933
Merge remote-tracking branch 'origin/develop' into feature/better-parser
2017-10-27 12:34:48 +00:00
Matthew Honnibal
642eb28c16
Don't compile with OpenMP by default
2017-10-27 10:16:58 +00:00
Matthew Honnibal
90d1d9b230
Remove obsolete parser code
2017-10-26 13:22:45 +02:00
Matthew Honnibal
79fcf8576a
Compile with march=native
2017-10-18 21:46:34 +02:00
Matthew Honnibal
2eb0fe4957
Fix setup.py
2017-10-03 21:40:04 +02:00
Matthew Honnibal
b49cc8153a
Require correct thinc
2017-09-26 10:00:18 -05:00
ines
68f66aebf8
Use pkg_resources instead of pip for is_package (resolves #1293)
2017-09-16 20:27:59 +02:00
Matthew Honnibal
07cdbd1219
Require thinc 6.8.1, for Windows
2017-09-15 22:47:53 +02:00
Matthew Honnibal
96a4a9070b
Compile _beam_utils
2017-08-18 21:56:19 +02:00
Matthew Honnibal
f9ae86b01c
Fix requirement
2017-08-18 20:56:53 +02:00
Matthew Honnibal
69bcacdc09
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-08-18 20:47:13 +02:00
Matthew Honnibal
de7f3509d2
Compile CFile, for vector loading
2017-08-18 20:46:41 +02:00
Matthew Honnibal
426f84937f
Resolve conflicts when merging new beam parsing stuff
2017-08-18 13:38:32 -05:00
Matthew Honnibal
60d8111245
Require thinc 6.8.1
2017-08-15 03:12:26 -05:00
Matthew Honnibal
52c180ecf5
Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop"
...
This reverts commit ea8de11ad5, reversing changes made to 08e443e083.
2017-08-14 13:00:23 +02:00
Matthew Honnibal
b353e4d843
Work on parser beam training
2017-08-12 14:47:45 -05:00
ines
495e042429
Add entry point-style auto alias for "spacy"
...
Simplest way to run commands as spacy xxx instead of python -m spacy xxx, while avoiding environment conflicts
2017-08-09 12:17:30 +02:00
Matthew Honnibal
ff7418b0d9
Update requirements
2017-07-25 18:58:15 +02:00
Matthew Honnibal
b4cdd05466
Add vectors.pyx in setup
2017-06-05 12:45:29 +02:00
Matthew Honnibal
c811790095
Register vectors.pyx in setup
2017-06-05 12:32:22 +02:00
ines
152dc018a6
Remove syntax iterators from setup.py
2017-06-05 12:30:22 +02:00
Matthew Honnibal
a4dcc96c54
Require thinc bugfix
2017-06-05 04:02:52 -05:00
ines
71954d5fe7
Update Thinc version
2017-06-03 10:32:53 +02:00
ines
f45cd174bf
Update Thinc version
2017-06-02 18:48:16 +02:00
Matthew Honnibal
ae8010b526
Move weight serialization to Thinc
2017-06-01 02:56:12 -05:00
Matthew Honnibal
2e364f7ecd
Require msgpack
2017-05-29 13:47:29 +02:00
ines
3cc6fe1484
Add pip to requirements.txt and setup.py
2017-05-17 12:04:03 +02:00
Matthew Honnibal
48de4ed49f
Require thinc 6.6, and compile the nn_parser module
2017-05-14 01:20:28 +02:00
Matthew Honnibal
825c6403d8
Remove serializer
2017-05-09 17:28:30 +02:00
ines
564939391a
Remove spacy.orth
2017-05-09 01:21:47 +02:00
ines
229b8c3974
Tidy up
2017-05-07 18:36:35 +02:00
ines
a793174ae9
Use setuptools.find_packages()
2017-05-03 20:11:02 +02:00
Yasuaki Uechi
c8f83aeb87
Add basic japanese support
2017-05-03 13:56:21 +09:00
Ines Montani
7da9cefd25
Merge pull request #1022 from luvogels/master
...
Initial support for Norwegian Bokmål
2017-04-27 11:16:06 +02:00
Ines Montani
417f430d23
Relax version constraint
2017-04-20 15:39:24 +02:00
Gyorgy Orosz
4a06a2572c
Using ftfy for handling broken encoded strings.
2017-04-20 13:34:51 +02:00
luvogels
ff900ffd7c
Update setup.py
...
added nb
2017-04-19 21:02:26 +02:00
Matthew Honnibal
e482c369eb
Package converters module
2017-04-07 18:51:48 +02:00
Matthew Honnibal
cc24b6d8d5
Fix setup.py
2017-04-07 17:53:22 +02:00
Matthew Honnibal
eedafd8d82
Fix regex version pin
2017-04-07 17:47:11 +02:00
ines
c691caa9d3
Fix requests version
2017-04-07 17:35:35 +02:00
Matthew Honnibal
a001365c42
Require regex library
2017-04-07 15:43:34 +02:00
ines
7e4befec88
Add Hebrew to init and setup.py
2017-03-29 10:34:57 +02:00
Matthew Honnibal
9c17fb472f
Add tag for Python v3.6 compatibility
2017-03-19 01:40:24 +01:00
Matthew Honnibal
5941fb9e92
Make spacy/data a package
2017-03-18 20:04:22 +01:00
Matthew Honnibal
aa8ff9257f
Add spacy.en.lemmatizer to setup.py
2017-03-18 19:02:33 +01:00
Matthew Honnibal
afb94e5702
Add cli to setup.py
2017-03-18 19:00:39 +01:00
ines
387e34a3c5
Update plac version in requirements and setup
2017-03-18 15:14:02 +01:00
ines
4c53eed35a
Remove sputnik from dependencies and docs
2017-03-15 17:39:25 +01:00
ines
b62322d602
Add requests to requirements
2017-03-15 17:39:08 +01:00
Matthew Honnibal
cb39b6e337
Require recent thinc
2017-03-11 12:45:22 -06:00
Matthew Honnibal
93ab888d1d
Require recent preshed
2017-03-11 12:33:56 -06:00
Matthew Honnibal
0ed2afde89
Compile beam parser
2017-03-10 11:22:22 -06:00
ines
ffe0f0c6c4
Add dill to requirements
2017-03-08 14:11:54 +01:00
Aniruddha Adhikary
5a4fc09576
add basic Bengali support
2017-02-28 07:48:37 +06:00
Matthew Honnibal
c744ce4b6d
Fix bad change to cythonize.py script, re subprocess call
2017-02-16 19:01:25 +01:00
Matthew Honnibal
0836cbe064
Pass shell to cythonize.py. See Issue #791
2017-02-17 01:06:06 +11:00
Michael Wallin
73f66ec570
Add preliminary support for Finnish
2017-02-04 13:54:10 +02:00
Raphaël Bournhonesque
0c2e5539ce
Specify version number for ujson and plac
...
The required version was specified for plac in requirements.txt but not in setup.py, which could cause a conflicting version error.
Similarly, set the version of ujson in requirements.txt to be the same as in setup.py
2017-01-28 18:38:14 +01:00
Matthew Honnibal
48c712f1c1
Merge branch 'master' of ssh://github.com/explosion/spaCy
2017-01-16 13:18:06 +01:00
Matthew Honnibal
d4e6d4c1c4
Use new thinc
2017-01-16 13:17:14 +01:00
Ines Montani
a308703f47
Remove old tests
2017-01-13 01:34:48 +01:00
Ines Montani
f8803808ce
Remove old unused tests and conftest files
2017-01-12 15:09:05 +01:00
Ines Montani
26d018d874
Add tests for StringStore
2017-01-12 15:07:31 +01:00
Ines Montani
ffcaba9017
Remove old and/or redundant tests
2017-01-12 02:10:18 +01:00
Ines Montani
33800c9367
Rename "tokens" tests to "doc"
2017-01-11 18:59:01 +01:00
Matthew Honnibal
c9fdd9917c
Require older thinc
2017-01-09 10:12:41 -06:00
Matthew Honnibal
7108ad9d80
Require thinc 6.1
2017-01-09 14:37:00 +01:00
Matthew Honnibal
e4862d1dab
Merge branch 'develop'
2017-01-09 13:36:01 +01:00
Ines Montani
d87ca84028
Remove old website example tests from setup.py
2017-01-08 22:42:54 +01:00
Matthew Honnibal
af81ac8bb0
Use thinc 6.0
2016-12-29 11:58:42 +01:00
Gyorgy Orosz
35aa54765d
Hungarian module is exposed in spacy.
2016-12-21 20:45:36 +01:00
Magnus Burton
db5a077d2b
Initial commit for Swedish
2016-12-20 11:05:06 +01:00
Matthew Honnibal
0c7720e162
Remove unit and integration test packages
2016-12-19 00:26:56 +01:00
Matthew Honnibal
6c0c43c267
Add comment
2016-12-19 00:20:16 +01:00
Matthew Honnibal
b2cebdcca7
List more test packages in the setup.py
2016-12-19 00:15:11 +01:00
Matthew Honnibal
97521c95b3
List the language_data package in the setup.py
2016-12-19 00:14:09 +01:00
dafnevk
d8c7ac203a
Added nl module for dutch
2016-11-24 16:39:49 +01:00
Matthew Honnibal
36bcd46244
Integrate patch from @mikepb re building OpenMP-supporting wheels for macOS / OSX. I'm running blind on this, so this commit might not be 100%. Rollback if there are any problems. See Issue #267.
2016-11-06 11:58:50 +01:00
Matthew Honnibal
bc8d04abc0
Package alpha es, fr, it and pt directories.
2016-11-04 20:02:53 +01:00
Adam Ever Hadani
452b766d82
added ujson dependency to setup.py
2016-10-20 14:57:18 -07:00
Matthew Honnibal
b5a74f8ad2
Don't automatically include a data/ directory.
2016-10-20 20:50:32 +02:00
Matthew Honnibal
811dc4da75
Fix setup.py script
2016-10-19 00:27:57 +02:00
Matthew Honnibal
818dc83e26
Fix encoding error in setup.py
2016-10-19 00:05:53 +02:00
Matthew Honnibal
509b30834f
Add a pipeline module, to collect and wrap processes for annotation
2016-10-16 01:47:12 +02:00
Matthew Honnibal
53d5bd62ee
Add the data/ directory as package data
2016-10-15 14:34:33 +02:00
Matthew Honnibal
2f998f8ed0
Require pathlib
2016-10-13 14:19:57 +02:00
Matthew Honnibal
7c5fe84b80
Require older preshed, for thinc compatibility.
2016-10-09 12:25:53 +02:00
Matthew Honnibal
d61feffe24
Require new preshed
2016-09-30 18:41:01 +02:00
Matthew Honnibal
24337175df
* Register zh package in setup.py
2016-05-03 14:36:59 +02:00
Henning Peters
2bf34687ea
add stdint.h fallback (vs 2008)
2016-04-28 22:10:43 +02:00
Henning Peters
bb3238bcdd
pin numpy to >=1.7, ship headers
2016-04-19 19:50:42 +02:00
Henning Peters
6215272786
remove ujson as default non-dev dependency (still works as fallback if installed), because ujson doesn't ship wheels
2016-04-12 11:28:07 +02:00
Henning Peters
5f699883dd
make openmp on windows optional
2016-04-12 10:12:57 +02:00
SJ
91b3f1c12f
Enable OpenMP compiler option for MSVC
...
Enable OpenMP compiler option for MSVC to support Multi-Threading for nlp.pipe()
2016-04-09 15:22:17 -07:00
Henning Peters
29ad621825
add de
2016-04-08 14:52:29 +02:00
Matthew Honnibal
872695759d
Merge pull request #306 from wbwseeker/german_noun_chunks
...
add German noun chunk functionality
2016-04-08 00:54:24 +10:00
Wolfgang Seeker
5e2e8e951a
add baseclass DocIterator for iterators over documents
...
add classes for English and German noun chunks
the respective iterators are set for the document when created by the parser
as they depend on the annotation scheme of the parsing model
2016-03-16 15:53:35 +01:00
Henning Peters
54f3447b5f
cleanup
2016-03-14 01:46:33 +01:00
Henning Peters
1fe29c6919
cleanup
2016-03-13 18:12:32 +01:00
Henning Peters
49f499ca1c
cleanup
2016-03-12 14:30:24 +01:00
Henning Peters
5701686272
cleanup
2016-03-12 13:47:10 +01:00
Wolfgang Seeker
03fb498dbe
introduce lang field for LexemeC to hold language id
...
put noun_chunk logic into iterators.py for each language separately
2016-03-10 13:01:34 +01:00
Wolfgang Seeker
d9312bc9ea
add new files npchunks.{pyx,pxd} to hold noun phrase chunk generators
2016-03-09 16:18:48 +01:00
Henning Peters
5b3b3ebc8e
upgrade to latest sputnik
2016-03-08 15:30:17 +01:00
Matthew Honnibal
fcaa0ad7ce
Merge pull request #280 from wbwseeker/german_parser
...
German parser
2016-03-04 03:27:42 +11:00
Wolfgang Seeker
3448cb40a4
integrated pseudo-projective parsing into parser
...
- nonproj.pyx holds a class PseudoProjectivity which currently holds all functionality to implement Nivre & Nilsson 2005's pseudo-projective parsing using the HEAD decoration scheme
- changed lefts/rights in Token to account for possible non-projective structures
2016-03-01 10:09:08 +01:00
Henning Peters
12d58a7099
remove text-unidecode dependency
2016-02-24 08:01:59 +01:00
Henning Peters
9cc4f8d5b3
avoid shadowing __name__
2016-02-15 01:33:39 +01:00
Henning Peters
4c9e3c7911
upgrade sputnik, enforce data api via model version constraints
2016-02-14 16:03:17 +01:00
Henning Peters
3b5f1e753b
py26 compatibility
2016-02-10 14:32:54 +01:00
Henning Peters
c00dd43fe0
add sun data
2016-02-09 16:42:55 +01:00
Matthew Honnibal
860fd11e98
* Don't import include files --- use the repository
2016-02-06 23:59:47 +01:00
Matthew Honnibal
8bd16ce8f7
* Try to fix win32 compilation
2016-02-05 14:43:52 +01:00
Matthew Honnibal
add8f07f61
* Conditionally link against openmp, on not-darwin
2016-02-05 12:19:51 +01:00
Matthew Honnibal
c9aa91041d
* Don't expect openmp in options
2016-02-02 13:50:25 +01:00
Matthew Honnibal
490ba65398
* Use openmp in parser
2016-02-01 03:08:42 +01:00
Matthew Honnibal
9c34ca9e5d
* Add _stack to mod_names
2016-02-01 03:00:53 +01:00
Matthew Honnibal
bc0f0d284c
* Require different thinc version
2016-01-30 20:29:24 +01:00
Henning Peters
65aeac24cb
remove package version constraint
2016-01-21 17:40:51 +01:00
Henning Peters
211913d689
add about.py, adapt setup.py
2016-01-15 18:57:01 +01:00
Henning Peters
ccd87ad7fb
add default_model to about
2016-01-15 18:12:01 +01:00
Henning Peters
780cb847c9
add default_model to about
2016-01-15 18:07:15 +01:00
Henning Peters
788f734513
refactored data_dir->via, add zip_safe, add spacy.load()
2016-01-15 18:01:02 +01:00
Henning Peters
bc229790ac
integrate with sputnik
2016-01-13 19:46:17 +01:00
Matthew Honnibal
e38205a838
* Pin versions to ranges, to escape version lock
2015-12-31 02:09:55 +01:00
Henning Peters
1c4352c42e
bump version
2015-12-28 13:53:26 +01:00
Henning Peters
a404bfec38
bump preshed version
2015-12-22 22:38:25 +01:00
Henning Peters
46fe3a7327
bump thinc version
2015-12-22 13:21:24 +01:00
Henning Peters
1643e63c31
bump preshed version
2015-12-22 11:23:25 +01:00
Henning Peters
4a1d843682
bump murmurhash version
2015-12-21 21:59:11 +01:00
Henning Peters
74dc02a0e6
fix windows readme
2015-12-21 21:58:53 +01:00
Henning Peters
c17ce6c119
(re-)include cython sources, murmurhash header discovery
2015-12-21 12:40:44 +01:00
Henning Peters
b667020e81
refactor setup.py
2015-12-13 23:39:29 +01:00
Henning Peters
4f4b1d8f3d
refactor setup.py
2015-12-13 23:32:23 +01:00
Henning Peters
eaadca2bf2
get buildbot running
2015-12-13 14:13:46 +01:00
Henning Peters
73674a4afb
try using system-wide headers
2015-12-13 12:51:23 +01:00
Henning Peters
b2f66f7b8d
try using system-wide headers
2015-12-13 12:45:30 +01:00
Henning Peters
63d74ae8f3
try using system-wide headers
2015-12-13 12:41:46 +01:00
Henning Peters
92fabd0114
wrap virtualenv around cythonize
2015-12-13 12:32:22 +01:00
Henning Peters
ac318b568c
new approach to dependency headers
2015-12-13 11:49:17 +01:00
Matthew Honnibal
65413ad7b3
Merge pull request #186 from henningpeters/master
...
website build was broken for me, fixed it
2015-11-29 15:36:52 +11:00
Henning Peters
abe6162e7b
avoid redirect
2015-11-24 20:01:43 +01:00
Henning Peters
4e98ea4e41
bump version
2015-11-21 19:04:57 +01:00
Matthew Honnibal
d8c52560d1
Merge branch 'master' of ssh://github.com/honnibal/spaCy
2015-11-19 11:00:11 +01:00
Matthew Honnibal
44e563d4e5
* Pin version of murmurhash
2015-11-19 10:59:51 +01:00
Matthew Honnibal
73d47c3010
Merge pull request #185 from henningpeters/sputnik
...
integrate sputnik
2015-11-19 20:59:09 +11:00
Matthew Honnibal
1e166eb9cd
* Upgrade spacy version
2015-11-18 17:42:56 +01:00
Henning Peters
919a4f0b04
change data path, add repository
2015-11-18 11:40:46 +01:00
Henning Peters
12de895e60
fix version
2015-11-15 16:38:16 +01:00
Matthew Honnibal
6dd37c5ee4
* Fix requirement of preshed
2015-11-08 18:09:21 +01:00
Matthew Honnibal
f9d20b1318
* Require updated thinc
2015-11-08 21:32:21 +11:00
Matthew Honnibal
3c162dcac3
* Refactor away from the _ml module, to use thinc 4.0. Still some work needs to be done, e.g. to add __reduce__ to the models, more testing, etc.
2015-11-07 03:24:30 +11:00
Matthew Honnibal
c339783bbe
* Fix reference to tests.span in setup
2015-11-07 03:23:14 +11:00
Matthew Honnibal
802ad3d71a
* Avoid compiling theano module for now
2015-11-06 00:24:43 +11:00
Matthew Honnibal
3ddea19b2b
* Rename spans.pyx to span.pyx
2015-11-04 00:14:40 +11:00
Matthew Honnibal
9482d616bc
* Rename spans.pyx to span.pyx
2015-11-03 23:51:05 +11:00
Matthew Honnibal
f81389abe0
* Pin to specific cymem, preshed and thinc versions.
2015-11-03 23:12:13 +11:00
Matthew Honnibal
7adef3f831
* Increment version
2015-11-03 07:58:59 +01:00
Matthew Honnibal
64531d5a3a
* Define package_data in one place
2015-11-03 17:07:43 +11:00
Matthew Honnibal
5ca31e05fb
* Prune down package data, as models are distributed entirely within the data download.
2015-11-03 13:30:37 +11:00
Matthew Honnibal
f56209ef2e
* Update requirements
2015-11-03 02:40:01 +11:00
Matthew Honnibal
09e0b15629
* Package tests, for distribution on PyPI
2015-10-26 00:30:33 +11:00
Matthew Honnibal
b0ba534d4a
* Fix license descriptor in setup.py
2015-10-26 00:16:37 +11:00
Matthew Honnibal
9ee1ddab7e
* Increment version
2015-10-23 02:04:48 +02:00
Matthew Honnibal
108138366f
* Ensure .pxd files are packaged
2015-10-23 01:57:03 +02:00
Matthew Honnibal
2348a08481
* Load/dump strings with a json file, instead of the hacky strings file we were using.
2015-10-22 21:13:03 +11:00
Matthew Honnibal
579670e4c7
* Fix uget
2015-10-19 17:23:33 +11:00
Matthew Honnibal
984775e5e2
* Fix setup of uget
2015-10-19 17:19:05 +11:00
Matthew Honnibal
e25adce54d
Merge branch 'master' of ssh://github.com/honnibal/spaCy
2015-10-19 17:17:33 +11:00
Matthew Honnibal
382cbc8cab
* Add uget to setup.py
2015-10-19 17:15:40 +11:00
Matthew Honnibal
a43777cef8
* Inc version
2015-10-19 07:46:42 +02:00
Henning Peters
bfde91fa49
add custom download tool (uget), replace wget with uget
2015-10-18 12:35:04 +02:00
Matthew Honnibal
fc261195f7
* Fix compilation for OSX
2015-10-18 17:19:07 +11:00
Matthew Honnibal
710e8fb168
* Fix platform condition re Issue #138
2015-10-15 20:46:08 +11:00
maxirmx
1b8fd329b8
Merge remote-tracking branch 'refs/remotes/honnibal/master'
2015-10-13 11:28:17 +03:00
Matthew Honnibal
d74a1e51d7
* Add cloudpickle requirement
2015-10-13 19:05:20 +11:00
maxirmx
3dbec0902f
Merge remote-tracking branch 'refs/remotes/honnibal/master'
...
Conflicts -- pushing preshed 0.42
requirements.txt
setup.py
2015-10-13 10:16:16 +03:00
maxirmx
237db7f519
Appveyor build #5
...
Added Wordnet download
2015-10-13 10:11:56 +03:00
Matthew Honnibal
41cbbdefe3
Merge branch 'attrs'
2015-10-13 05:03:25 +02:00
Matthew Honnibal
1ca1beff4b
* Allow preshed v0.42 in setup.py
2015-10-13 13:55:50 +11:00
Matthew Honnibal
b866f1443e
Merge branch 'master' of https://github.com/honnibal/spaCy into attrs
2015-10-13 04:52:27 +02:00