Matthew Honnibal
c0909afe22
Merge pull request #312 from wbwseeker/space_head_bug
add restrictions to L-arc and R-arc to prevent space heads
2016-04-15 20:36:03 +10:00
Wolfgang Seeker
289b10f441
remove some comments
2016-04-14 15:37:51 +02:00
Matthew Honnibal
fe9299a118
* Fix long-standing issue with coarse-grained tags: proper nouns weren't receiving the PROPN tag, and personal pronouns weren't receiving the PRON tag. This should fix Issue #191, and also Issue #325, which reported that proper nouns were being lemmatized using the common noun policies. This lemmatization is now prevented when the universal tag is PROPN rather than NOUN, as no lemmatization rules are loaded for the PROPN tag.
2016-04-14 12:46:43 +02:00
Matthew Honnibal
6f82065761
* Fix infixed commas in tokenizer, re Issue #326. Need to benchmark on empirical data, to make sure this doesn't break other cases.
2016-04-14 11:36:03 +02:00
Matthew Honnibal
0f957dd586
Merge branch 'master' of ssh://github.com/honnibal/spaCy
2016-04-14 10:37:56 +02:00
Matthew Honnibal
108aca0e50
* Make Matcher use attrs from the attrs.pyx file, rather than having an incomplete function doing the mapping.
2016-04-14 10:37:39 +02:00
Matthew Honnibal
61d20de35d
* Fix language.py docstring
2016-04-14 10:36:57 +02:00
Wolfgang Seeker
d99a9cbce9
different handling of space tokens
Space tokens are now always attached to the previous non-space token.
There are two exceptions:
leading space tokens are attached to the first following non-space token;
in input that consists exclusively of space tokens, the last space token is the head of all others.
2016-04-13 15:28:28 +02:00
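The attachment rule described in this commit message can be sketched in plain Python. This is only an illustration of the rule as stated, not the parser's actual implementation; the function name `space_heads` and the use of plain strings as tokens are assumptions made for the example.

```python
def space_heads(tokens):
    """Map the index of each whitespace token to the index of its head.

    Hypothetical helper: `tokens` is a list of strings, and a token counts
    as a space token when str.isspace() is true for it.
    """
    is_space = [tok.isspace() for tok in tokens]
    non_space = [i for i, flag in enumerate(is_space) if not flag]
    heads = {}

    if not non_space:
        # Exception 2: the input is all space tokens, so the last one
        # heads every other token (and has no entry itself).
        last = len(tokens) - 1
        for i in range(last):
            heads[i] = last
        return heads

    first = non_space[0]
    prev = None  # index of the most recent non-space token seen so far
    for i, flag in enumerate(is_space):
        if not flag:
            prev = i
        elif prev is None:
            # Exception 1: leading spaces attach to the first non-space token.
            heads[i] = first
        else:
            # Default: attach to the previous non-space token.
            heads[i] = prev
    return heads
```

For example, `space_heads(["Hello", " ", "world", "\n"])` attaches token 1 to token 0 and token 3 to token 2.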
Matthew Honnibal
04d0209be9
* Recognise multiple infixes in a token.
2016-04-13 18:38:26 +10:00
Henning Peters
a473d6e937
fix tests (use english model)
2016-04-12 16:41:57 +02:00
Henning Peters
f2d011c034
avoid polluting spacy namespace with lang classes
2016-04-12 16:31:16 +02:00
Henning Peters
ff690f76ba
fix loading non-german models
2016-04-12 16:00:56 +02:00
Henning Peters
6215272786
remove ujson as default non-dev dependency (still works as fallback if installed), because ujson doesn't ship wheels
2016-04-12 11:28:07 +02:00
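An optional dependency that "still works as fallback if installed" is commonly handled with a guarded import. The snippet below is a minimal sketch of that general pattern, not code taken from the repository; `parse_record` is a hypothetical helper added only to show the call site.

```python
try:
    # Use the faster third-party parser when it happens to be installed.
    import ujson as json
except ImportError:
    # Otherwise fall back to the standard library module.
    import json


def parse_record(text):
    """Parse a JSON string with whichever module was imported above."""
    return json.loads(text)
```

Callers only see the `json` name, so the rest of the code does not need to know which implementation is in use.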
Henning Peters
5f699883dd
make openmp on windows optional
2016-04-12 10:12:57 +02:00
Matthew Honnibal
6df3858dbc
* Fix Issue #323: Incorrect semantics of Token.__str__ built-in. Add flag to allow users to switch the old semantics back on, to ease transition.
2016-04-12 13:17:59 +10:00
Wolfgang Seeker
d328e0b4a8
Merge branch 'master' into space_head_bug
2016-04-11 12:11:01 +02:00
Henning Peters
13a6899fc6
Merge pull request #329 from sjjpo2002/patch-1
Enable OpenMP compiler option for MSVC
2016-04-10 09:45:08 +02:00
SJ
91b3f1c12f
Enable OpenMP compiler option for MSVC
Enable OpenMP compiler option for MSVC to support Multi-Threading for nlp.pipe()
2016-04-09 15:22:17 -07:00
Wolfgang Seeker
80bea62842
bugfix in unit test
2016-04-08 16:46:44 +02:00
Henning Peters
72e0de7330
Merge branch 'master' of github.com:spacy-io/spaCy
2016-04-08 14:52:38 +02:00
Henning Peters
29ad621825
add de
2016-04-08 14:52:29 +02:00
Henning Peters
7c4dde3a1b
Update package.json
2016-04-08 14:48:47 +02:00
Wolfgang Seeker
be4903a1b2
update version numbers
2016-04-08 13:54:05 +02:00
Wolfgang Seeker
f9150ccf2a
rename vectors.tgz to vectors.bz2 because it is compressed with bzip2, not gzip
2016-04-08 13:38:07 +02:00
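A mismatch between a file extension and its real compression format can be detected from the leading magic bytes. The following is a small sketch using only the standard library; `sniff_compression` is a hypothetical name, and nothing here is taken from the project's own tooling.

```python
import bz2
import gzip


def sniff_compression(data: bytes) -> str:
    """Guess the compression format from the first bytes of `data`."""
    if data[:2] == b"\x1f\x8b":  # gzip magic number
        return "gzip"
    if data[:3] == b"BZh":  # bzip2 stream header
        return "bzip2"
    return "unknown"


if __name__ == "__main__":
    print(sniff_compression(gzip.compress(b"vectors")))
    print(sniff_compression(bz2.compress(b"vectors")))
```

Reading the first three bytes of the file and passing them to this function is enough to tell the two formats apart.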
Wolfgang Seeker
1fe911cdb0
bugfix
2016-04-07 18:19:51 +02:00
Matthew Honnibal
872695759d
Merge pull request #306 from wbwseeker/german_noun_chunks
add German noun chunk functionality
2016-04-08 00:54:24 +10:00
Matthew Honnibal
c628908479
* Pin Cython to <0.24, until we fix for new version
2016-04-07 11:51:53 +10:00
Matthew Honnibal
85485f5c2b
Fix inconsistencies in generate_specials.py
Re Issue #321, fix inconsistencies in the script that generates specials.json. The result still isn't satisfying; we need to revise this as we move to parse more morphologically rich languages.
2016-04-07 11:21:52 +10:00
Henning Peters
357e2aaece
Merge branch 'master' of github.com:spacy-io/spaCy
2016-04-05 11:26:22 +02:00
Henning Peters
470cdf5bf9
remove deprecated LOCAL_DATA_DIR
2016-04-05 11:25:54 +02:00
Henning Peters
67a2bd2197
Update README.rst
2016-04-01 20:20:22 +02:00
Ines Montani
54ae410bed
Update GitHub links
2016-04-01 02:23:52 +11:00
Ines Montani
5f2654c30d
Fix main container flex properties
2016-04-01 02:23:42 +11:00
Ines Montani
53a45995c4
Update readme
2016-04-01 01:30:19 +11:00
Ines Montani
1f8309a862
Replace website with new version
2016-04-01 01:24:48 +11:00
Ines Montani
f321272bee
Update gitignore for website
2016-04-01 00:36:56 +11:00
Wolfgang Seeker
a8f4e49900
revert init_model.py to its previous (better) state
2016-03-29 16:12:13 +02:00
Matthew Honnibal
26622f0ffc
Merge branch 'master' of ssh://github.com/honnibal/spaCy
2016-03-29 14:31:52 +11:00
Matthew Honnibal
b1fe41b45d
* Extend infix test, commenting on limitation of tokenizer w.r.t. infixes at the moment.
2016-03-29 14:31:05 +11:00
Matthew Honnibal
9c73983bdd
* Add test for hyphenation problem in Issue #302
2016-03-29 14:27:13 +11:00
Matthew Honnibal
d249e2f7f3
* Improve error message in bin/parser/train.py
2016-03-29 13:04:33 +11:00
Matthew Honnibal
910a6c805f
* Add infix rule for double hyphens, re Issue #302
2016-03-29 13:03:44 +11:00
Matthew Honnibal
ad119c074f
* Fix incorrect whitespacing in Doc.text. This change is potentially breaking for anyone who was relying on the previous incorrect semantics.
2016-03-29 13:02:42 +11:00
Matthew Honnibal
8c7a1908ee
Merge pull request #307 from scoder/faster_string_store
remove internal redundancy and overhead from StringStore
2016-03-29 12:59:52 +11:00
Wolfgang Seeker
7195b6742d
add restrictions to L-arc and R-arc to prevent space heads
2016-03-28 10:40:52 +02:00
Matthew Honnibal
8c77a994c6
Merge pull request #305 from henningpeters/master
multiple langs in download script
2016-03-26 21:54:59 +11:00
Henning Peters
c90d4a6f17
relative imports in __init__.py
2016-03-26 11:44:53 +01:00
Henning Peters
db095a162c
fix
2016-03-25 18:59:47 +01:00
Henning Peters
b8f63071eb
add lang registration facility
2016-03-25 18:54:45 +01:00
Matthew Honnibal
9cd21ad5b5
Merge pull request #284 from olegzd/olegzd/example/inventoryCount
Added reloadable English() example for inventory counting
2016-03-25 09:48:47 +11:00