Ines Montani
885691a7ab
Describe converters more explicitly (see #2643 )
2018-09-12 14:53:03 +02:00
Grivaz
aeba99ab0d
Introduces a bulk merge function, in order to solve issue #653 ( #2696 )
...
* Fix comment
* Introduce bulk merge to increase performance on many span merges
* Sign contributor agreement
* Implement pull request suggestions
2018-09-10 16:41:42 +02:00
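The performance argument behind the bulk merge commit above (aeba99ab0d) can be sketched in plain Python. This is only an illustration of the idea, not spaCy's implementation: merging spans one at a time forces the token sequence to be rebuilt after every merge, while a bulk merge walks the sequence once. The function name `bulk_merge` and the `(start, end)` span format are hypothetical.

```python
def bulk_merge(tokens, spans):
    """Merge several non-overlapping [start, end) spans in one pass.

    `tokens` is a list of strings; `spans` is a list of (start, end) index
    pairs. Both are illustrative stand-ins, not spaCy objects.
    """
    merged = []
    pos = 0
    for start, end in sorted(spans):
        # Reject empty, overlapping, or out-of-range spans up front.
        if start < pos or start >= end or end > len(tokens):
            raise ValueError("invalid or overlapping span: (%d, %d)" % (start, end))
        merged.extend(tokens[pos:start])             # copy untouched tokens
        merged.append(" ".join(tokens[start:end]))   # collapse the span
        pos = end
    merged.extend(tokens[pos:])                      # copy the tail
    return merged


tokens = "I live in New York City near Los Angeles".split()
print(bulk_merge(tokens, [(3, 6), (7, 9)]))
```

Each token is copied once regardless of how many spans are merged, which is where the saving over repeated single merges would come from.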
tyburam
476472d181
lex_attrs for Polish language ( #2750 )
...
* Signed spaCy contributor agreement
* Added Polish version of English lex_attrs
2018-09-10 11:53:57 +02:00
Sainath Adapa
77139bc03c
Basic support for Telugu language ( #2751 )
2018-09-10 11:53:18 +02:00
Maxim Kupfer
97e2874225
added contributor agreement for mbkupfer ( #2738 )
2018-09-10 11:32:03 +02:00
Maxim Kupfer
cebe50b5b8
Remove ')' for clarity ( #2737 )
...
Sorry, I don't mean to be nitpicky; I just noticed this when going through the CLI and thought it was a quick fix. That said, if this was intentional, then please let me know.
2018-09-10 11:31:49 +02:00
Matthew Honnibal
b2cb1fc67d
Merge matcher tests
2018-09-06 01:39:53 +02:00
Suraj Krishnan Rajan
356af7b0a1
Fix tests
2018-09-06 01:39:36 +02:00
Piotr Żelasko
bdb2165bd1
Fewer norm computations in token similarity ( #2730 )
...
* Fewer norm computations in token similarity
* Contributor agreement
2018-09-05 21:50:23 +02:00
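One general way to cut down on norm computations in a cosine similarity, as the commit above (bdb2165bd1) aims to do, is to compute each vector's L2 norm once and reuse it. The sketch below is an assumption about the general technique, not the code from the pull request; the `Vec` class is hypothetical.

```python
import math


class Vec:
    """Hypothetical vector wrapper that caches its L2 norm."""

    def __init__(self, values):
        self.values = list(values)
        self._norm = None  # computed lazily, then reused

    @property
    def norm(self):
        if self._norm is None:
            self._norm = math.sqrt(sum(v * v for v in self.values))
        return self._norm

    def similarity(self, other):
        # Cosine similarity; each norm is computed at most once per vector.
        if self.norm == 0.0 or other.norm == 0.0:
            return 0.0
        dot = sum(a * b for a, b in zip(self.values, other.values))
        return dot / (self.norm * other.norm)


a, b = Vec([3.0, 4.0]), Vec([6.0, 8.0])
print(a.similarity(b))
```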
Aniruddha Adhikary
4530ddcc51
Update Bengali token rules for hyphen and digits ( #2731 )
2018-09-05 21:49:00 +02:00
Nathaniel J. Smith
26849874ad
When calling getoption() in conftest.py, pass a default option ( #2709 )
...
* When calling getoption() in conftest.py, pass a default option
This is necessary to allow testing an installed spacy by running:
pytest --pyargs spacy
* Add contributor agreement
2018-09-03 09:57:52 +02:00
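The `getoption()` change in the commit above (26849874ad) can be illustrated with a small `conftest.py` sketch. The `--slow` flag and the `option_enabled` helper are hypothetical and are not taken from spaCy's own conftest. The point is that pytest's `Config.getoption()` raises an error for an option that was never registered unless a `default` is supplied.

```python
# conftest.py (sketch; the --slow flag is a hypothetical example)

def pytest_addoption(parser):
    # Registers the flag when pytest loads this conftest at startup.
    parser.addoption("--slow", action="store_true", default=False,
                     help="include slow tests")


def option_enabled(config, name):
    # Passing default=False means an unregistered option reads as "off"
    # instead of raising an error.
    return config.getoption(name, default=False)


def pytest_runtest_setup(item):
    # Imported here so option_enabled() can be used without pytest installed.
    import pytest
    if "slow" in item.keywords and not option_enabled(item.config, "--slow"):
        pytest.skip("need --slow option to run")
```

With the default in place, a run such as `pytest --pyargs spacy` can proceed even if the option was never added to the parser.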
Matthew Honnibal
4d2d7d5866
Fix new feature flags
2018-08-27 02:12:39 +02:00
Matthew Honnibal
598dbf1ce0
Fix character-based tokenization for Japanese
2018-08-27 01:51:38 +02:00
Matthew Honnibal
3763e20afc
Pass subword_features and conv_depth params
2018-08-27 01:51:15 +02:00
Matthew Honnibal
8051136d70
Support subword_features and conv_depth params in Tok2Vec
2018-08-27 01:50:48 +02:00
Matthew Honnibal
9c33d4d1df
Add more hyper-parameters to spacy ud-train
...
* subword_features: Controls whether subword features are used in the
word embeddings. True by default (specifically, prefix, suffix and word
shape). Should be set to False for languages like Chinese and Japanese.
* conv_depth: Depth of the convolutional layers. Defaults to 4.
2018-08-27 01:48:46 +02:00
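The two hyper-parameters described in the commit above (9c33d4d1df) and their stated defaults can be summarised in a small settings object. This is a hypothetical sketch for illustration only: `Tok2VecSettings` and `settings_for` are not part of spaCy, and the language codes are assumed stand-ins for "languages like Chinese and Japanese".

```python
from dataclasses import dataclass


@dataclass
class Tok2VecSettings:
    """Hypothetical container for the two hyper-parameters."""
    subword_features: bool = True  # prefix, suffix and word shape features
    conv_depth: int = 4            # depth of the convolutional layers


# Assumed ISO codes for languages where subword features should be off.
NO_SUBWORD_LANGS = {"zh", "ja"}


def settings_for(lang, conv_depth=4):
    # Disable subword features for the languages listed above.
    return Tok2VecSettings(
        subword_features=lang not in NO_SUBWORD_LANGS,
        conv_depth=conv_depth,
    )


print(settings_for("en"))
print(settings_for("ja"))
```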
Ines Montani
e9022f7b33
Remove docstrings for deprecated arguments (see #2703 )
2018-08-26 14:23:13 +02:00
Ines Montani
559f4139e3
Add FAC to spacy.explain ( resolves #2706 )
2018-08-26 14:13:50 +02:00
Steve Sharp
ca747f58a4
Update _install.jade ( #2688 )
...
Typo fix: "models" -> "model"
2018-08-22 13:16:04 +02:00
Matthew Honnibal
51a9efbf3b
Add draft Binder class
2018-08-22 13:12:51 +02:00
Arya Prabhudesai
db2c2b286c
Create aryaprabhudesai.md ( #2681 )
2018-08-20 18:56:14 +02:00
Matthew Honnibal
f0e6be689a
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2018-08-16 17:18:19 +02:00
Matthew Honnibal
5ce459d2ee
Fix error in vocab
2018-08-16 17:18:09 +02:00
Ines Montani
aeb49eb625
Update version [ci skip]
2018-08-16 16:56:02 +02:00
Ines Montani
a0eacd3293
Merge branch 'master' into develop
2018-08-16 16:55:05 +02:00
Ines Montani
c0fa9903f4
Update model directory JS [ci skip]
...
Prevent the default release URL from being overwritten and add license type
2018-08-16 16:54:50 +02:00
Ines Montani
03f661fefb
Add Greek to models directory [ci skip]
2018-08-16 16:51:56 +02:00
Matthew Honnibal
00febda2e3
Improve alignment around quotes
2018-08-16 01:04:34 +02:00
Matthew Honnibal
66a3f2ba21
Lower-case text before alignment
2018-08-16 00:42:36 +02:00
Matthew Honnibal
595c893791
Expose noise_level option in train CLI
2018-08-16 00:41:44 +02:00
Matthew Honnibal
8365226bf3
Fix lookup of symbols in vocab.
2018-08-15 23:43:34 +02:00
Matthew Honnibal
b9f0588580
Set version to v2.1.0a1
2018-08-15 17:22:39 +02:00
Matthew Honnibal
e968016417
Note link between issues #2671 and #2675
2018-08-15 17:18:28 +02:00
Matthew Honnibal
63bdc734ba
Skip flakey test
2018-08-15 16:56:55 +02:00
Matthew Honnibal
ce512e1d47
Fix #2671 : Incorrect match ID on some patterns
2018-08-15 16:19:08 +02:00
Matthew Honnibal
f12b9190f6
Xfail test for issue #2671
2018-08-15 15:55:31 +02:00
Matthew Honnibal
7cfa665ce6
Add failing test for issue 2671: Incorrect rule ID returned from matcher
2018-08-15 15:54:33 +02:00
Matthew Honnibal
1b2a5869ab
Set version to v2.1.0a2.dev0
2018-08-15 15:38:52 +02:00
Matthew Honnibal
5080760288
Add extra comment on 'add label' in parser
2018-08-15 15:37:24 +02:00
Matthew Honnibal
6e749d3c70
Skip flakey parser test
2018-08-15 15:37:04 +02:00
Ines Montani
fd9d175a53
Update live code [ci skip]
2018-08-15 15:28:48 +02:00
Matthew Honnibal
48ed1ca29d
Add branch option to push-tag script
2018-08-15 03:16:43 +02:00
Matthew Honnibal
6ea981c839
Add converter for jsonl NER data
2018-08-14 14:04:32 +02:00
Matthew Honnibal
a9fb6d5511
Fix docs2jsonl function
2018-08-14 14:03:48 +02:00
Matthew Honnibal
ea2edd1e2c
Merge branch 'feature/docs_to_json' into develop
2018-08-14 13:23:42 +02:00
Matthew Honnibal
6ec236ab08
Fix label-clobber bug in parser.begin_training()
...
The parser.begin_training() method was rewritten in v2.1. The rewrite
introduced a regression, where if you added labels prior to
begin_training(), these labels were discarded. This patch fixes that.
2018-08-14 13:20:19 +02:00
Matthew Honnibal
02c5c114d0
Fix usage of deprecated freqs.txt in init-model
2018-08-14 13:19:15 +02:00
Matthew Honnibal
2a5a61683e
Add function to get train format from Doc objects
...
Our JSON training format is annoying to work with, and we've wanted to
retire it for some time. In the meantime, we can at least add some
missing functions to make it easier to live with.
This patch adds a function that generates the JSON format from a list
of Doc objects, one per paragraph. This should be a convenient way to handle
a lot of data conversions: whatever format you have the source
information in, you can use it to set up a Doc object. This approach
should offer better future-proofing as well. Hopefully, we can steadily
rewrite code that is sensitive to the current data-format, so that it
instead goes through this function. Then when we change the data format,
we won't have such a problem.
2018-08-14 13:13:10 +02:00
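The conversion described in the commit above (2a5a61683e) can be sketched without spaCy by using plain objects in place of `Doc`. Everything here is an assumption for illustration: `SimpleToken` and `sentences_to_json` are hypothetical, and the field names reflect one reading of the v2 JSON training layout (relative `head` offsets, one paragraph per document), which should be checked against the spaCy documentation before use.

```python
import json
from dataclasses import dataclass


@dataclass
class SimpleToken:
    """Hypothetical stand-in for a spaCy Token."""
    orth: str
    tag: str = ""
    head: int = 0   # absolute index of the head within the sentence
    dep: str = ""
    ner: str = "O"


def sentences_to_json(doc_id, paragraphs):
    """Build a JSON-style dict from paragraphs.

    `paragraphs` is a list in which each item stands in for one Doc:
    a list of sentences, each a list of SimpleToken.
    """
    json_paras = []
    for sentences in paragraphs:
        json_sents = []
        for sent in sentences:
            tokens = [
                {"id": i, "orth": t.orth, "tag": t.tag,
                 "head": t.head - i,  # stored as an offset relative to the token
                 "dep": t.dep, "ner": t.ner}
                for i, t in enumerate(sent)
            ]
            json_sents.append({"tokens": tokens, "brackets": []})
        raw = " ".join(t.orth for sent in sentences for t in sent)
        json_paras.append({"raw": raw, "sentences": json_sents})
    return {"id": doc_id, "paragraphs": json_paras}


sent = [SimpleToken("Hello", "UH", 0, "ROOT"), SimpleToken("world", "NN", 0, "npadvmod")]
print(json.dumps(sentences_to_json(0, [[sent]]), indent=2))
```

Routing every source format through one intermediate object, as the commit message suggests, means only this single function has to change if the output format changes.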
Matthew Honnibal
4336397ecb
Update develop from master
2018-08-14 03:04:28 +02:00
Matthew Honnibal
13fa550b36
Merge branch 'master' of https://github.com/explosion/spaCy
2018-08-14 02:32:01 +02:00