ines
fe0ff00fe1
Fix spacing
2017-03-19 11:55:37 +01:00
ines
5712da6095
Add regression test for #891
2017-03-19 11:48:01 +01:00
Raphaël Bournhonesque
7f579ae834
Remove duplicate keys in [en|fi] data dicts
2017-03-19 11:40:29 +01:00
ines
8de5108af6
Exclude common cache directories from model list in cli.info
...
This means models called "cache" etc. won't show up in the list, but it
seems worth it.
2017-03-19 01:44:43 +01:00
Matthew Honnibal
6ee2ea1128
Increment version
2017-03-19 01:40:52 +01:00
Matthew Honnibal
797f286c38
Use import to find data package
2017-03-19 01:39:36 +01:00
Matthew Honnibal
5941fb9e92
Make spacy/data a package
2017-03-18 20:04:22 +01:00
Matthew Honnibal
bc10d06bc2
Merge branch 'master' of https://github.com/explosion/spaCy
2017-03-18 19:32:54 +01:00
Matthew Honnibal
583628c350
Import metadata into __init__
2017-03-18 19:30:03 +01:00
Matthew Honnibal
1754e0db9b
Call pip via subprocess, to make it use virtualenv
2017-03-18 19:29:36 +01:00
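A minimal sketch of the idea behind the commit above, not the code it actually introduced. Starting pip as a child process of the running interpreter (`sys.executable -m pip`) makes it install into whatever environment that interpreter belongs to, including an active virtualenv. The helper names `build_pip_command` and `run_pip` are hypothetical.

```python
import subprocess
import sys


def build_pip_command(pip_args):
    """Return an argv list that runs pip under the current interpreter.

    sys.executable is the path of the running Python binary, so inside a
    virtualenv this targets the virtualenv's own pip.
    """
    return [sys.executable, "-m", "pip"] + list(pip_args)


def run_pip(pip_args):
    """Run pip in a child process and return its exit code (hypothetical helper)."""
    return subprocess.call(build_pip_command(pip_args))
```

For example, `run_pip(["install", "some-package"])` would install into the environment of the interpreter that calls it (the package name here is a placeholder).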
ines
1277abcde2
Remove print statement
2017-03-18 19:14:58 +01:00
Matthew Honnibal
dcec104643
Remove unused import
2017-03-18 18:57:45 +01:00
Matthew Honnibal
703eb7bdbd
Fix link module
2017-03-18 18:57:31 +01:00
Matthew Honnibal
f6c6c89546
Add empty data directory
2017-03-18 18:32:29 +01:00
ines
7d33104180
Use distutils.sysconfig.get_python_lib
...
site.getsitepackages seems to not work as expected in Python 2
2017-03-18 18:20:40 +01:00
Matthew Honnibal
1a53fcc685
Fix CLI for Python 2
2017-03-18 18:14:03 +01:00
ines
aefb898e37
Add title-case version of morph rules (resolves #686)
2017-03-18 17:27:11 +01:00
ines
64ec17abc1
Pass xpassing tests and add xfails for failures
2017-03-18 17:20:46 +01:00
ines
d0b85faf69
Pass regression test for #401 (resolves #401)
...
Fixed in new English models.
2017-03-18 17:06:49 +01:00
ines
be9daefbdd
Remove actual model downloading from tests
2017-03-18 17:01:10 +01:00
ines
850650221a
Use correct command in deprecated download command message
2017-03-18 17:01:01 +01:00
ines
0dd7710556
Make sure paths are paths
2017-03-18 16:48:52 +01:00
Matthew Honnibal
de0e6385b4
Merge branch 'master' of https://github.com/explosion/spaCy
2017-03-18 16:17:28 +01:00
Matthew Honnibal
fe442cac53
Fix #717: Set correct lemma for contracted verbs
2017-03-18 16:16:10 +01:00
ines
ad934a9abd
Add regression test for #693
2017-03-18 16:12:30 +01:00
ines
f57c616830
Add regression test for #704 and test new model (resolves #704)
...
(using new English model)
2017-03-18 16:04:14 +01:00
Matthew Honnibal
413138de79
Fix #719: Lemmatizer can no longer output empty string
2017-03-18 16:02:06 +01:00
ines
ab1451f997
Don't mark compatibility test as slow
2017-03-18 15:17:39 +01:00
ines
ec3e810662
Add directory cli and set up command line interface
2017-03-18 15:14:48 +01:00
ines
cd94ea1095
Use info module for spacy.info()
2017-03-18 13:01:26 +01:00
ines
e3e25c0a33
Add spacy.info module
...
Print info about spaCy installation, local setup and models. Allow
export in Markdown format to copy-paste into GitHub issues.
2017-03-18 13:01:16 +01:00
ines
0eafc0f2c6
Add util functions to print data as table or markdown list
2017-03-18 13:00:14 +01:00
ines
6b9b444065
Fix imports
2017-03-18 12:59:41 +01:00
ines
a035ebd32a
Use pathlib.Path instead of os.path
2017-03-18 12:59:21 +01:00
ines
9605cf39cc
Handle default path in Language classes
2017-03-18 12:58:45 +01:00
Matthew Honnibal
ac4b88cce9
Fix auto-linking in download command
2017-03-17 21:36:13 +01:00
ines
8a34c3e666
Fix shortcut name
2017-03-17 20:07:34 +01:00
Matthew Honnibal
6420f86f02
Merge changes to __init__.py
2017-03-17 19:51:45 +01:00
ines
e01fbacf81
Update resolve_model_name
2017-03-17 19:26:28 +01:00
ines
aedefef49d
Add function to resolve model names and link them
2017-03-17 18:47:05 +01:00
Matthew Honnibal
d013aba7b5
Merge branch 'master' of https://github.com/explosion/spaCy
2017-03-17 18:30:53 +01:00
Matthew Honnibal
854cfce7cf
Make vocabs more compatible across versions
...
Previously, symbols were inserted into the string-store
before strings were loaded. This meant that adding a symbol
would invalidate saved models. We now make sure that strings
are loaded faithfully, so that compatibility is maintained.
2017-03-17 18:29:04 +01:00
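A toy illustration of the ordering problem described in the commit above. It is a sketch under stated assumptions and not spaCy's StringStore: the class `ToyStringStore`, both loader functions, and the symbol names (including `IS_EMOJI`) are hypothetical. The one assumption is that a string's ID is its insertion index.

```python
class ToyStringStore:
    """Toy interning table: a string's ID is its insertion index."""

    def __init__(self):
        self._ids = {}
        self._strings = []

    def add(self, string):
        # Only assign a new ID the first time a string is seen.
        if string not in self._ids:
            self._ids[string] = len(self._strings)
            self._strings.append(string)
        return self._ids[string]

    def __getitem__(self, string):
        return self._ids[string]


def load_symbols_first(symbols, saved_strings):
    """Old order: built-in symbols are inserted before the saved strings."""
    store = ToyStringStore()
    for s in list(symbols) + list(saved_strings):
        store.add(s)
    return store


def load_strings_first(symbols, saved_strings):
    """New order: saved strings are restored first, symbols fill in afterwards."""
    store = ToyStringStore()
    for s in list(saved_strings) + list(symbols):
        store.add(s)
    return store


# Strings as saved by a version that knew two symbols; "cat" was saved with ID 2.
saved = ["IS_ALPHA", "IS_DIGIT", "cat", "dog"]
# A later version adds one more (hypothetical) symbol.
new_symbols = ["IS_ALPHA", "IS_DIGIT", "IS_EMOJI"]

shifted = load_symbols_first(new_symbols, saved)
stable = load_strings_first(new_symbols, saved)
```

With symbols inserted first, the extra symbol takes ID 2 and pushes "cat" to a different ID than the saved model expects. Restoring the saved strings first keeps every saved ID unchanged, and the new symbol is appended at the end.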
Matthew Honnibal
1cc841e600
Merge branch 'master' of https://github.com/explosion/spaCy
2017-03-17 08:18:11 -05:00
Matthew Honnibal
4bfc55b532
Auto-add words to vocab when loading vectors
...
When calling vocab.load_vectors_from_bin_loc, ensure that missing
entries are added to the vocab. Otherwise, loading vectors into an
empty vocab object resulted in no vectors being added.
2017-03-17 08:15:59 -05:00
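A small sketch of the behaviour change described in the commit above, using plain dictionaries rather than spaCy's Vocab. The function `load_vectors` and its `add_missing` flag are hypothetical and exist only to contrast the two behaviours.

```python
def load_vectors(vocab, vectors, add_missing=True):
    """Attach vectors to vocab entries and return how many were attached.

    vocab:   dict mapping word -> entry dict (toy stand-in for a vocabulary)
    vectors: dict mapping word -> list of floats
    """
    attached = 0
    for word, vector in vectors.items():
        if word not in vocab:
            if not add_missing:
                # Earlier behaviour in this toy model: unknown words are skipped.
                continue
            # Newer behaviour: create the missing entry on the fly.
            vocab[word] = {}
        vocab[word]["vector"] = vector
        attached += 1
    return attached
```

Loading into an empty `vocab` with `add_missing=False` attaches nothing, which mirrors the problem the commit describes. With `add_missing=True`, every word in `vectors` gets an entry.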
ines
0e533ad0cc
Mark compatibility table test as slow (temporary)
...
Prevent Travis from running the test until the models repo is published
2017-03-17 13:11:36 +01:00
ines
279b1d1965
Update version
2017-03-17 12:43:08 +01:00
ines
8af4b9e4df
Fix compatibility.json link
2017-03-17 12:43:03 +01:00
Matthew Honnibal
a630726b13
Fix typo in tests
2017-03-16 20:50:36 -05:00
Matthew Honnibal
f98b30583f
Fix tests
2017-03-16 19:48:00 -05:00
Matthew Honnibal
db51abf685
Fix tests
2017-03-16 18:53:47 -05:00
Matthew Honnibal
adb0b7e43b
Fix loading when no package found
2017-03-16 18:30:23 -05:00
Matthew Honnibal
5c66cffafd
Add tag map for Spanish
2017-03-16 18:05:15 -05:00
Matthew Honnibal
c4351e1165
Update base-form check in lemmatizer, for UD 2.0 morphology
2017-03-16 17:59:31 -05:00
Matthew Honnibal
1e10383e1b
Merge branch 'master' of https://github.com/explosion/spaCy
2017-03-16 17:41:13 -05:00
Matthew Honnibal
859315863a
Merge branch 'master' of https://github.com/explosion/spaCy
2017-03-16 17:40:07 -05:00
Matthew Honnibal
fea9fe08af
Merge pull request #866 from juanmirocks/master
...
Fix lemmatization of OOV words
2017-03-16 23:37:36 +01:00
Matthew Honnibal
ffd4a19383
Increment version
2017-03-16 17:35:57 -05:00
Matthew Honnibal
28bb546939
Merge pull request #883 from ericzhao28/master
...
Add `lower_` and `upper_` properties to `Span` class
2017-03-16 23:35:47 +01:00
ines
fd60961825
Fix spacing
2017-03-16 23:23:26 +01:00
Matthew Honnibal
890747d8ff
Fix trailing whitespace on morphology features
2017-03-16 17:07:37 -05:00
Matthew Honnibal
af41a9790c
Merge remote-tracking branch 'origin/develop-downloads'
2017-03-16 20:41:37 +01:00
Matthew Honnibal
303a56f173
Get absolute path for linking
2017-03-16 20:41:23 +01:00
ines
3d484c3faf
Don't print in parse_package_meta and accept on_error callback instead
...
TODO: log warning for missing meta data in spacy.link, as this affects
the Language class returned by spacy.load()
2017-03-16 20:34:50 +01:00
ines
d8c984b65e
Don't exit if no model meta data is present
2017-03-16 20:33:33 +01:00
Matthew Honnibal
2524efc0ac
Merge remote-tracking branch 'origin/develop-downloads'
2017-03-16 20:20:41 +01:00
ines
8253581057
Link model automatically if not direct download
2017-03-16 19:54:51 +01:00
Matthew Honnibal
8843b84bd1
Merge remote-tracking branch 'origin/develop-downloads'
2017-03-16 12:00:42 -05:00
Matthew Honnibal
55f813bfbb
Don't reapply the model during training
2017-03-16 11:59:43 -05:00
Matthew Honnibal
c90dc7ac29
Clean up state initialisation in transition system
2017-03-16 11:59:11 -05:00
Matthew Honnibal
a46933a8fe
Clean up FTRL parsing stuff.
2017-03-16 11:58:20 -05:00
ines
618ce3b425
Add .meta to Language object
...
Allows getting the current model's meta data, e.g.:
nlp = spacy.load('my-model')
print(nlp.meta)
2017-03-16 17:14:56 +01:00
ines
e348d4434c
Add spacy.info(model_name) to show model meta
...
Allows "previewing" model before loading and making sure it's linked
correctly.
2017-03-16 17:13:40 +01:00
ines
eea3b35e3f
Update model loading to support links
...
Remove match_best_version check, fetch model language from meta instead
of directory name, and don't make too many assumptions – if model is
downloaded via downloader, version should match anyway. (Otherwise,
users should be free to add and load whichever models they want.)
2017-03-16 17:13:08 +01:00
ines
5f3f04bd0a
Add util function to load and parse package meta.json
2017-03-16 17:10:05 +01:00
ines
7f920c2f75
Don't break text when rendering print_msg
2017-03-16 17:09:50 +01:00
ines
16a63d9676
Add docstring
2017-03-16 17:09:11 +01:00
ines
68c04fa897
Move sys_exit() function to util
2017-03-16 17:08:58 +01:00
ines
ccd1a79988
Add spacy.link module to link model directories to shortcuts
2017-03-16 17:01:51 +01:00
Matthew Honnibal
2611ac2a89
Fix scorer bug for NER, related to ambiguity between missing annotations and misaligned tokens
2017-03-16 09:38:28 -05:00
ines
595d89698a
Add basestring
2017-03-16 10:01:14 +01:00
ines
7b2eca36e4
Revert "Fix formatting and remove unused code"
...
This reverts commit d7898d586f.
2017-03-16 09:58:41 +01:00
ines
2f0db1dd36
Use small English model as default
2017-03-16 09:54:40 +01:00
Matthew Honnibal
3d0833c3df
Fix off-by-1 in parse features fill_context
2017-03-15 19:55:35 -05:00
Matthew Honnibal
4ef68c413f
Approximate cost in Break transition, to speed things up a bit.
2017-03-15 16:40:27 -05:00
Matthew Honnibal
8543db8a5b
Use ftrl optimizer in parser
2017-03-15 11:56:37 -05:00
ines
4cfc8ffbd2
Reformat pickle tests
2017-03-15 17:39:54 +01:00
ines
2a0fcf1354
Add tests for new download module
2017-03-15 17:39:43 +01:00
ines
71956c94db
Handle deprecated language-specific model downloading
2017-03-15 17:37:55 +01:00
ines
58b884b6d4
Refactor download script and about.py to use new download method
2017-03-15 17:37:18 +01:00
ines
f5d1a39a5b
Add util functions for printing and wrapping messages
2017-03-15 17:35:57 +01:00
ines
d7898d586f
Fix formatting and remove unused code
2017-03-15 17:35:41 +01:00
ines
b672e95045
Fix formatting
2017-03-15 17:35:04 +01:00
ines
0474e706a0
Remove unused deprecated functions for sputnik
2017-03-15 17:34:54 +01:00
ines
b13e7f79b4
Fix formatting and remove unused imports
2017-03-15 17:33:57 +01:00
ines
1101fd3855
Fix formatting and remove unused imports
2017-03-15 17:33:39 +01:00
ines
842782c128
Move fix_deprecated_glove_vectors_loading to deprecated.py
2017-03-15 17:33:29 +01:00
Matthew Honnibal
4cab8ac136
Update morph exceptions test
2017-03-15 09:31:34 -05:00
Matthew Honnibal
d719f8e77e
Use nogil in parser, and set L1 to 0.0 by default
2017-03-15 09:31:01 -05:00
Matthew Honnibal
c61c501406
Update beam-parser to allow parser to maintain nogil
2017-03-15 09:30:22 -05:00
Matthew Honnibal
3d4e389d23
Whitespace
2017-03-15 09:29:42 -05:00
Matthew Honnibal
7769bc31e3
Add beam-search classes
2017-03-15 09:27:41 -05:00
Matthew Honnibal
c79b3129e3
Fix setting of empty lexeme in initial parse state
2017-03-15 09:26:53 -05:00
Matthew Honnibal
d864708072
Add more morphology names in attrs.pyx
2017-03-15 09:26:16 -05:00
Matthew Honnibal
b382dc902c
Add morph rules in Language
2017-03-15 09:24:40 -05:00
Matthew Honnibal
8dbff4f5f4
Wire up English lemma and morph rules.
2017-03-15 09:23:22 -05:00
Matthew Honnibal
f70be44746
Use lemmatizer in code, not from downloaded model.
2017-03-15 04:52:50 -05:00
ines
42ba740dde
Revert "Merge branch 'debug'"
...
This reverts commit 89b79d1178, reversing
changes made to 02bdf490a1.
2017-03-13 20:11:52 +01:00
ines
4c5f51e49e
Update regression test
2017-03-13 15:16:11 +01:00
ines
02bdf490a1
Remove regression test to see if it caused pytest Travis error
2017-03-13 13:00:22 +01:00
ines
17018750ac
Add regression test for #717
2017-03-13 12:58:22 +01:00
ines
2883ebfca2
Remove print statement
2017-03-13 12:30:42 +01:00
ines
98c13d8aa9
Add regression test for #401
2017-03-13 12:28:41 +01:00
ines
444d665f9d
Add regression test for #686
2017-03-13 12:23:35 +01:00
ines
46b17e5b51
Add regression test for #719
2017-03-13 12:17:35 +01:00
ines
c8ae682ff9
Add regression test for #636
2017-03-13 12:08:31 +01:00
ines
337f9601f2
Add missing unicode declaration
2017-03-13 12:08:19 +01:00
ines
d70386ec6e
Update docstring in #886 regression test
2017-03-13 12:00:38 +01:00
ines
51ba3ef0a8
Add regression test for #886
2017-03-13 11:44:58 +01:00
ines
eec3f21c50
Add WordNet license
2017-03-12 13:58:24 +01:00
ines
f9e603903b
Rename stop_words.py to word_sets.py and include more sets
...
NUM_WORDS and ORDINAL_WORDS are currently not used, but the hard-coded
list should be removed from orth.pyx and replaced to use
language-specific functions. This will later allow other languages to
use their own functions to set those flags. (In English, this is easier
because it only needs to be checked against a set – in German for
example, this requires a more complex function, as most number words
are one word.)
2017-03-12 13:58:22 +01:00
ines
f24f9b4b7b
Remove unused code
2017-03-12 13:58:22 +01:00
ines
1da29a7146
Use new Lemmatizer data and remove file import
...
Since there's currently only an English lemmatizer, the global
Lemmatizer imports from spacy.en. This is not ideal and still needs to be
fixed.
2017-03-12 13:58:22 +01:00
ines
0957737ee8
Add Python-formatted lemmatizer data and rules
2017-03-12 13:58:22 +01:00
ines
c89e30d1a3
Add test for English time exceptions ("1a.m." etc.)
2017-03-12 13:58:22 +01:00
ines
ce9568af84
Move English time exceptions ("1a.m." etc.) and refactor
2017-03-12 13:58:22 +01:00
ines
6b30541774
Fix formatting
2017-03-12 13:58:22 +01:00
Ines Montani
e97a30b99a
Merge pull request #885 from PySUST/master
...
[Bengali] Spell checked and add new stop words
2017-03-12 13:20:59 +01:00
ines
66c1f194f9
Use consistent unicode declarations
2017-03-12 13:07:28 +01:00
shuvanon
91cb4cdb2b
Sort stop_words
2017-03-12 17:55:51 +06:00
shuvanon
784f6cfa49
Update stop_words
2017-03-12 17:41:01 +06:00
shuvanon
73cc17078e
Merge branch 'master' of https://github.com/PySUST/spaCy
2017-03-12 14:52:17 +06:00
shuvanon
35ec7135bb
Spell checked and add new stop words
2017-03-12 14:51:34 +06:00
Em
9c809efc25
Removed mapStr
2017-03-11 16:23:26 -08:00
Matthew Honnibal
fa23278ee3
Add classes for beam parser and beam NER
2017-03-11 12:45:37 -06:00
Matthew Honnibal
6c4108c073
Add header for beam parser
2017-03-11 12:45:12 -06:00
Matthew Honnibal
4382f175b3
Squelch compiler warnings
2017-03-11 12:44:43 -06:00
Matthew Honnibal
ea2592879f
Merge branch 'master' of https://github.com/explosion/spaCy
2017-03-11 11:13:37 -06:00
Matthew Honnibal
1224c4d3c6
Improve output on trainer
2017-03-11 11:12:48 -06:00
Matthew Honnibal
b438dfd3f3
Add itn argument to tagger.update
2017-03-11 11:12:21 -06:00
Matthew Honnibal
931feb3360
Allow beam parsing for NER
2017-03-11 11:12:01 -06:00
Matthew Honnibal
f77a5bb60a
Switch back to greedy parser
2017-03-11 11:11:30 -06:00
Matthew Honnibal
ca9c8c57c0
Add iteration argument to parser.update
2017-03-11 07:00:47 -06:00
Matthew Honnibal
dcce9ca3f3
Use beam parser
2017-03-11 07:00:20 -06:00
Matthew Honnibal
e30ffdd003
Use ftrl optimizer in tagger
2017-03-11 06:59:13 -06:00
Matthew Honnibal
d59c6926c1
I think this fixes the segfault
2017-03-11 06:58:34 -06:00
Matthew Honnibal
318b9e32ff
WIP on beam parser. Currently segfaults.
2017-03-11 06:19:52 -06:00
Em
426d17167f
Added string manipulation for spans
2017-03-10 16:50:02 -08:00
Matthew Honnibal
b0d80dc9ae
Update name of 'train' function in BeamParser
2017-03-10 14:35:43 -06:00
Matthew Honnibal
d11f1a4ddf
Record negative costs in non-monotonic arc eager oracle
2017-03-10 11:22:04 -06:00
Matthew Honnibal
ecf91a2dbb
Support beam parser
2017-03-10 11:21:21 -06:00