2903 Commits

Author SHA1 Message Date
ines
e7f95c37ee Merge base tokenizer exceptions 2017-05-08 15:55:52 +02:00
ines
24606d364c Remove redundant language_data.py files in languages
Originally intended to collect all components of a language, but just
made things messy. Now each component is in charge of exporting itself
properly.
2017-05-08 15:55:29 +02:00
ines
a627d3e3b0 Reorganise Chinese language data 2017-05-08 15:54:36 +02:00
ines
7b86ee093a Reorganise Swedish language data 2017-05-08 15:54:29 +02:00
ines
50510fa947 Reorganise Portuguese language data 2017-05-08 15:52:01 +02:00
ines
279895ea83 Reorganise Dutch language data 2017-05-08 15:51:39 +02:00
ines
04ef5025bd Reorganise Norwegian language data 2017-05-08 15:51:22 +02:00
ines
5edbc725d8 Reorganise Japanese language data 2017-05-08 15:50:46 +02:00
ines
51a389d3bb Reorganise Italian language data 2017-05-08 15:50:17 +02:00
ines
1bbfa14436 Reorganise Hungarian language data 2017-05-08 15:49:56 +02:00
ines
a77c9fc60d Reorganise Hebrew language data 2017-05-08 15:49:28 +02:00
ines
7f05e977fa Reorganise French language data 2017-05-08 15:49:05 +02:00
ines
0207ffdd52 Reorganise Finnish language data 2017-05-08 15:48:31 +02:00
ines
8e483ec950 Reorganise Spanish language data 2017-05-08 15:48:04 +02:00
ines
c7c21b980f Reorganise English language data 2017-05-08 15:47:25 +02:00
ines
1bf9d5ec8b Reorganise German language data 2017-05-08 15:44:26 +02:00
ines
7b3a983f96 Reorganise Bengali language data 2017-05-08 15:43:50 +02:00
ines
607ba458e7 Fix whitespace 2017-05-08 15:42:31 +02:00
ines
60db497525 Add update_exc and expand_exc to util
Doesn't require separate language data util anymore
2017-05-08 15:42:12 +02:00
ines
6e5bd4f228 Remove unused functions from deprecated 2017-05-08 15:40:16 +02:00
ines
f68e420bc0 Add PRON_LEMMA and DET_LEMMA to deprecated
Will be replaced with proper values across the language data later.
2017-05-08 15:35:30 +02:00
ines
bd6a7cf4f6 Simplify deprecated model downloading
Only relevant for spaCy < v1.7.0.
2017-05-08 15:32:10 +02:00
ines
95edd9e896 Let parse_package_meta take full path 2017-05-08 15:30:48 +02:00
ines
326746eb15 Add util function to resolve arg to model path
1. check if in data dir or shortcut link
2. check if installed as a pip package
3. check if string is path to model
4. check if Path or Path-like object
2017-05-08 15:29:47 +02:00
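The four checks listed in this commit message could be sketched roughly as below. This is only an illustration: `resolve_model_path` and its `data_dir` parameter are hypothetical names, not the actual function added to spaCy's util module, and the pip-package check is approximated with the standard library's `importlib.util.find_spec`.

```python
from importlib import util as importlib_util
from pathlib import Path


def resolve_model_path(name, data_dir):
    """Hypothetical helper: return a Path for `name`, or None if nothing matches."""
    data_dir = Path(data_dir)
    if isinstance(name, str):
        # 1. Model directory or shortcut link inside the data directory.
        candidate = data_dir / name
        if candidate.exists():
            return candidate
        # 2. Installed as a pip package (only tried for plain module names).
        if name.isidentifier():
            try:
                spec = importlib_util.find_spec(name)
            except (ImportError, ValueError):
                spec = None
            if spec is not None and spec.origin:
                return Path(spec.origin).parent
        # 3. The string itself is a path to a model.
        if Path(name).exists():
            return Path(name)
        return None
    # 4. Path or Path-like object (duck-typed on `exists`).
    if hasattr(name, "exists"):
        return name if name.exists() else None
    return None
```

Returning `None` instead of raising is a choice made for this sketch; a caller would decide how to report a model that cannot be found.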
ines
a7801e7342 Update spacy.load()
path argument is now deprecated and name can either take a model name
or path. Implement lazy loading by importing module and read Language
class name off __all__.
2017-05-08 15:27:25 +02:00
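The lazy-loading idea described in this commit message (import the language module only when needed, then read the class name off `__all__`) might look something like the sketch below. `load_language_class` is a hypothetical name, and the sketch assumes the first entry of `__all__` is the Language class, which the message does not state explicitly.

```python
import importlib


def load_language_class(module_name):
    """Hypothetical helper: import a language module on demand and return its class."""
    # The import happens only when a model for this language is requested.
    module = importlib.import_module(module_name)
    # Assumption: the module exports its Language class as the first name in __all__.
    class_name = module.__all__[0]
    return getattr(module, class_name)
```

With this pattern, importing the top-level package does not pull in every language's data up front.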
ines
94697e9afc Fix typo 2017-05-08 02:00:37 +02:00
ines
0ee2a22b67 Merge branch 'pr/1024' into develop 2017-05-08 01:12:44 +02:00
ines
c4492d260a Fix kwargs 2017-05-08 01:05:24 +02:00
ines
b5a726c5cd Tidy up deprecated.py 2017-05-07 23:29:22 +02:00
ines
59c3b9d4dd Tidy up CLI and fix print functions 2017-05-07 23:25:29 +02:00
ines
311704674d Add path2str compat function 2017-05-07 23:24:56 +02:00
ines
e34069db9f Move is_package and get_model_package_path to util 2017-05-07 23:24:51 +02:00
ines
957ba676b4 Add model files base path to about.py 2017-05-07 23:22:35 +02:00
ines
8d8dd9ceb2 Don't set default value for model 2017-05-07 23:22:21 +02:00
ines
b1f22c5a10 Fix formatting 2017-05-03 20:11:02 +02:00
ines
a04b5be1b2 Add glossary for annotation scheme (closes #1034)
Can be imported as explain from spacy.glossary, or called as
spacy.explain(term)
2017-05-03 17:02:17 +02:00
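A glossary lookup of the kind this commit message describes can be sketched as a plain dictionary plus a small function. The two entries below are illustrative only and stand in for the real glossary data, which is not shown in this log.

```python
# Toy glossary: a stand-in for the real data, not a copy of it.
GLOSSARY = {
    "NN": "noun, singular or mass",
    "ADJ": "adjective",
}


def explain(term):
    """Return a description for a tag or label, or None if it is unknown."""
    return GLOSSARY.get(term)
```

For example, `explain("NN")` returns the description string, while an unknown term returns `None` rather than raising.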
Gregory Howard
929f2792a7 Renaming cls in module. cls is now a class 2017-05-03 15:41:07 +02:00
Gregory Howard
0e8c41ea4f Adding method lemmatizer for every class 2017-05-03 12:14:42 +02:00
Gregory Howard
32ca07989e Adding export for Japanese 2017-05-03 11:07:29 +02:00
Grégory Howard
f9d7144224 Merge branch 'master' into master 2017-05-03 11:04:51 +02:00
Gregory Howard
f2ab7d77b4 Lazy imports language 2017-05-03 11:01:42 +02:00
Ines Montani
3ea23a3f4d Fix formatting 2017-05-03 09:44:38 +02:00
Ines Montani
d730eb0c0d Raise custom ImportError if importing janome fails 2017-05-03 09:43:29 +02:00
Ines Montani
949ad6594b Add newline 2017-05-03 09:38:43 +02:00
Ines Montani
d12ca587ea Add newline 2017-05-03 09:38:29 +02:00
Ines Montani
8676cd0135 Add newline 2017-05-03 09:38:07 +02:00
Yasuaki Uechi
c8f83aeb87 Add basic Japanese support 2017-05-03 13:56:21 +09:00
Gregory Howard
c0afcd22bb Merge remote-tracking branch 'remotes/upstream/master' 2017-04-27 14:42:54 +02:00
Matthew Honnibal
31ec9e1371 Merge branch 'master' of https://github.com/explosion/spaCy 2017-04-27 13:21:39 +02:00
Matthew Honnibal
2da16adcc2 Add dropout option for parser and NER
Dropout can now be specified in the `Parser.update()` method via
the `drop` keyword argument, e.g.

    nlp.entity.update(doc, gold, drop=0.4)

This will randomly drop 40% of features, and multiply the value of the
others by 1. / 0.4. This may be useful for generalising from small data
sets.

This commit also patches the examples/training/train_new_entity_type.py
example, to use dropout and fix the output (previously it did not output
the learned entity).
2017-04-27 13:18:39 +02:00
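The feature dropout described in this commit message can be illustrated in plain Python. This is a sketch of the behaviour as the message states it (zero a feature with probability `drop`, multiply the survivors by `1. / drop`), not the parser's actual implementation; the `dropout` function and its `rng` parameter are hypothetical.

```python
import random


def dropout(values, drop, rng=random):
    """Zero each value with probability `drop`; rescale the rest by 1 / drop."""
    if drop <= 0 or drop >= 1:
        # Outside (0, 1) the sketch leaves the features untouched.
        return list(values)
    scale = 1.0 / drop  # scaling factor as given in the commit message
    return [v * scale if rng.random() >= drop else 0.0 for v in values]
```

Passing a seeded `random.Random` instance as `rng` makes the result reproducible, which helps when comparing training runs.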
Gregory Howard
92f368f83b Removing extra spaces 2017-04-27 12:02:14 +02:00
Gregory Howard
13b6957c8e Adding unit test for tokenization in French (with title) 2017-04-27 11:53:44 +02:00
Gregory Howard
8ff4682255 correcting tokenizer exception.
Adding tests for lemmatization
2017-04-27 11:52:14 +02:00
Ines Montani
7da9cefd25 Merge pull request #1022 from luvogels/master
Initial support for Norwegian Bokmål
2017-04-27 11:16:06 +02:00
Ines Montani
c9e592ae6c Add newline 2017-04-27 11:15:41 +02:00
Ines Montani
5942adccc2 Add newline 2017-04-27 11:15:19 +02:00
Ines Montani
4cd9269aef Add newline 2017-04-27 11:15:04 +02:00
Ines Montani
ccf13ecc21 Add newline 2017-04-27 11:14:42 +02:00
Ines Montani
03d2b0cc05 Add newline 2017-04-27 11:14:26 +02:00
Gregory Howard
44cb486849 Adding unit test for tokenization in French (with title) 2017-04-27 10:59:38 +02:00
Gregory Howard
ad8129cb45 Improvement of rules: now title-insensitive and using the same declaration format 2017-04-27 10:23:56 +02:00
luvogels
d12a0b6431 Hooked up tokenizer tests 2017-04-26 23:21:41 +02:00
Matthew Honnibal
f0e1606d27 Increment version 2017-04-26 20:25:41 +02:00
luvogels
b331929a7e Merge branch 'master' of https://github.com/luvogels/spaCy 2017-04-26 19:15:48 +02:00
luvogels
8de59ce3b9 Added tokenizer tests 2017-04-26 19:10:18 +02:00
Matthew Honnibal
4d98511db7 Make Span hashable. Closes #1019 2017-04-26 19:01:05 +02:00
Matthew Honnibal
24c4c51f13 Try to make test999 less flakey 2017-04-26 18:42:06 +02:00
Leif Uwe Vogelsang
460094bf09 Update __init__.py 2017-04-26 18:27:55 +02:00
ines
527d51ac9a Fetch shortcuts from GitHub and improve error handling 2017-04-26 18:00:28 +02:00
Gregory Howard
ed5f094451 Adding insensitive lemmatisation test 2017-04-25 18:07:02 +02:00
ghoward
26e31afc18 Renaming tests 2017-04-25 17:46:01 +02:00
ghoward
c085c2d391 Adding some unit tests 2017-04-25 17:44:16 +02:00
ghoward
55c6910f90 Lookup table for languages in spaCy.
Need to find another name for lemmatizerlookup. I was not inspired.
Trying to use the new files in the fr language.
2017-04-24 16:39:00 +02:00
Matthew Honnibal
c4be9c36fe Fix unicode header in tests 2017-04-24 10:09:01 +02:00
Matthew Honnibal
65f10b53e5 Fix test 2017-04-24 00:25:55 +02:00
Matthew Honnibal
70a43858e1 Fix flakey test 2017-04-24 00:06:30 +02:00
Matthew Honnibal
3973af2d15 Make training test less flakey 2017-04-23 22:59:34 +02:00
Matthew Honnibal
4f9657b42b Fix reporting if no dev data with train 2017-04-23 22:27:10 +02:00
Matthew Honnibal
df2ac8b843 Merge branch 'master' of https://github.com/explosion/spaCy 2017-04-23 21:25:07 +02:00
Matthew Honnibal
d0e19267e8 Create directory if missing in save_to_directory 2017-04-23 21:24:43 +02:00
ines
42305bc519 Remove unnecessary test 2017-04-23 21:21:41 +02:00
ines
012ea594d1 Add file for misc tests 2017-04-23 21:06:51 +02:00
ines
83f66947dc Rename test_download to test_cli 2017-04-23 21:06:50 +02:00
ines
401045433c Simplify compat.fix_text 2017-04-23 21:06:50 +02:00
Matthew Honnibal
e033c86a64 Increment version 2017-04-23 21:03:43 +02:00
Matthew Honnibal
d2436dc17b Update fix for Issue #999 2017-04-23 18:14:37 +02:00
Matthew Honnibal
874a3cbb07 Add test for Issue #955 2017-04-23 17:57:01 +02:00
Matthew Honnibal
60703cede5 Ensure noun chunks can't be nested. Closes #955 2017-04-23 17:56:39 +02:00
Matthew Honnibal
c9ec24b257 Merge branch 'master' of https://github.com/explosion/spaCy 2017-04-23 17:07:46 +02:00
Matthew Honnibal
5d8af40445 Add test for Issue #999 2017-04-23 17:06:30 +02:00
Matthew Honnibal
4d2a659c52 Fix json dump for Python3 2017-04-23 17:05:53 +02:00
Matthew Honnibal
040751ad17 Remove xfail on Test #910 2017-04-23 16:28:55 +02:00
ines
3a9710f356 Pass dev_scores to print_progress correctly (resolves #1008)
Only read scores attribute if command is used with dev_data, otherwise
default dev_scores to empty dict.
2017-04-23 15:58:40 +02:00
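The fix described in this commit message amounts to a guarded attribute read. A minimal sketch, assuming a scorer object that exposes a `scores` attribute as the message says; `get_dev_scores` is a hypothetical helper name.

```python
def get_dev_scores(scorer, dev_data):
    """Read scorer.scores only when dev data was supplied; otherwise use {}."""
    # Without dev data there is nothing to score, so the scorer is never touched.
    return scorer.scores if dev_data else {}
```

Defaulting to an empty dict lets the progress printer treat both cases uniformly.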
Matthew Honnibal
1b12f342e4 Merge branch 'master' of https://github.com/explosion/spaCy 2017-04-20 17:03:11 +02:00
Matthew Honnibal
4eef200bab Persist the actions within spacy.parser.cfg 2017-04-20 17:02:44 +02:00
ines
25c70b4cc5 Move fix_text to spacy.compat (see #1002) 2017-04-20 15:47:17 +02:00
Ines Montani
60b5243bee Merge pull request #1002 from oroszgy/model_cli_fix
Fixes for the `model` CLI
2017-04-20 15:41:03 +02:00
Gyorgy Orosz
4a06a2572c Using ftfy for handling broken encoded strings. 2017-04-20 13:34:51 +02:00
Ines Montani
3800b29046 Merge pull request #1001 from recognai/master
Add SPACE to es tag map
2017-04-20 12:16:34 +02:00
oeg
f0bcd0babb fix(model): Add SPACE to es tag_map. Fixing error in morphology.pyx when SP tag is missing 2017-04-20 11:36:24 +02:00