spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-09-11 14:42:37 +03:00

Author	SHA1	Message	Date
ines	effb55d591	Adjust formatting [ci skip]	2018-06-11 00:29:13 +02:00
Nathan Breit	ba6d2cf393	Add EpiTator to Universe (#2429 )	2018-06-11 00:24:13 +02:00
Daniel Ruf	d6d688914f	chore: cache dependencies (#2418 ) * chore: cache dependencies * chore: add CLA	2018-06-11 00:22:41 +02:00
himkt	1a568f2e08	fix wrong documentations (#2423 )	2018-06-11 00:21:06 +02:00
Bohdan Moskalevskyi	d66292f767	fix UD data file extensions (#2425 ) * fix UD data files extension * add contributor agreement for msklvsk	2018-06-08 14:26:11 +02:00
Nour Shalabi	a169b79092	Additions to Arabic stop words. (#2422 ) * Additions to Arabic stop words. * Create nourshalabi.md	2018-06-08 02:33:23 +02:00
Ines Montani	3f2e3cbd27	Add links to Reddit data (see #2401 )	2018-05-31 16:22:43 +02:00
ines	b8ef9c1000	Fix model names in conftest (see #2379 )	2018-05-30 14:10:20 +02:00
ines	0baaf836cf	Update formatting [ci skip]	2018-05-30 13:32:49 +02:00
ines	3913e18201	Add self-attentive-parser to universe (see #59 )	2018-05-30 13:31:28 +02:00
Maciej	c7d53348d7	Fix bug in CLI iob and ner converter (#2392 ) (fixes #2385 ) * issue_2385 add tests for iob_to_biluo converter function * issue_2385 fix and modify iob_to_biluo function to accept either iob or biluo tags in cli.converter * issue_2385 add test to fix b char bug * add contributor agreement * fill contributor agreement	2018-05-30 12:28:44 +02:00
ines	605c663a4c	Fix HTML merger examples (see #2390 )	2018-05-30 12:22:32 +02:00
ansgar-t	9732988951	escape html in displacy.render (#2378 ) (closes #2361 ) ## Description Fix for issue #2361: replace &, <, >, " with &amp;, &lt;, &gt;, &quot; before rendering svg ## Checklist <!--- Before you submit the PR, go over this checklist and make sure you can tick off all the boxes. [] -> [x] --> - [x] I have submitted the spaCy Contributor Agreement. - [ ] I ran the tests, and all new and existing tests passed. (As discussed in the comments to #2361) - [x] My changes don't require a change to the documentation, or if they do, I've added all required information.	2018-05-28 18:36:41 +02:00
Samuel Pouyt	d85494bfae	Added agreement (#2374 )	2018-05-26 18:19:08 +02:00
Samuel Pouyt	5f988b8e9c	Update _custom.jade (#2372 ) It seems based on the doc and trying out that the `en` or `[lang]` is missing from the `spacy model-init`	2018-05-26 18:17:12 +02:00
ines	d84a830d79	Merge branch 'master' of https://github.com/explosion/spaCy	2018-05-26 17:57:05 +02:00
ines	fb923b31ea	Fix bad HTML example (see #2376 ) and turn it into section on matcher + components Avoid problems caused by merging while matching (e.g. index errors). Creating a Matcher component also better reflects the recommended best practices.	2018-05-26 17:57:02 +02:00
James Messinger	4515e96e90	Better formatting for `spacy train` CLI (#2357 ) * Better formatting for `spacy train` CLI Changed to use fixed-spaces rather than tabs to align table headers and data. ### Before: ``` Itn. P.Loss N.Loss UAS NER P. NER R. NER F. Tag % Token % 0 4618.857 2910.004 76.172 79.645 67.987 88.732 88.261 100.000 4436.9 6376.4 1 4671.972 3764.812 74.481 78.046 62.374 82.680 88.377 100.000 4672.2 6227.1 2 4742.756 3673.473 71.994 77.380 63.966 84.494 90.620 100.000 4298.0 5983.9 ``` ### After: ``` Itn. Dep Loss NER Loss UAS NER P. NER R. NER F. Tag % Token % CPU WPS GPU WPS 0 4618.857 2910.004 76.172 79.645 67.987 88.732 88.261 100.000 4436.9 6376.4 1 4671.972 3764.812 74.481 78.046 62.374 82.680 88.377 100.000 4672.2 6227.1 2 4742.756 3673.473 71.994 77.380 63.966 84.494 90.620 100.000 4298.0 5983.9 ``` * Added contributor file	2018-05-25 13:08:45 +02:00
Shantam Raj	592834183a	corrected spelling (#2359 ) changed interpretted to interpreted	2018-05-24 13:29:52 +02:00
ines	8adb967e0c	Fix from source quickstart instructions for Windows See: https://stackoverflow.com/a/50478036/6400719	2018-05-24 12:42:16 +02:00
Aristo Rinjuang	432ede04af	adding more words and rephrasing (#2351 ) * adding more words and rephrasing * adding a contributor * tokenizer bugs solved	2018-05-24 11:40:57 +02:00
Jani Monoses	ec62cadf4c	Updates to Romanian support (#2354 ) * Add back Romanian in conftest * Romanian lex_attr * More tokenizer exceptions for Romanian * Add tests for some Romanian tokenizer exceptions	2018-05-24 11:40:00 +02:00
Shantam Raj	1a4682dd0b	Update _training.jade (#2340 ) * Update _training.jade Correcting grammar. Replacing "The" with "To". * Create armsp.md * Update armsp.md	2018-05-21 11:09:33 +02:00
cclauss	f7dcaa1f6b	Simplify is_config() and normalize_string_keys() (#2305 ) * Simplify is_config() and normalize_string_keys() * Use __in__ to avoid the nested _ands_ and _ors_. * Dict comprehension directly tracks with the doc string * Keep more basic loop in normalize_string_keys * Whitespace	2018-05-21 01:54:35 +02:00
ines	ff1082d8e4	Add version tag in CLI docs [ci skip]	2018-05-21 01:17:49 +02:00
Ines Montani	d4cc736b7c	💫 Improve model downloads: check for existing install, customise pip and use requests library again (#2346 ) * Go back to using requests instead of urllib (closes #2320) Fewer dependencies are good, but this one was simply causing too many other problems around SSL verification and Python 2/3 compatibility. requests is a popular enough package that it's okay for spaCy to depend on it – and this will hopefully make model downloads less flakey. * Only download model if not installed (see #1456) Use #egg=model==version to allow pip to check for existing installations. The download is only started if no installation matching the package/version is found. Fixes a long-standing inconvenience. * Pass additional options to pip when installing model (resolves #1456) Treat all additional arguments passed to the download command as pip options to allow user to customise the command. For example: python -m spacy download en --user * Add CLI option to enable installing model package dependencies * Revert "Add CLI option to enable installing model package dependencies" This reverts commit `9336ffe695`. * Update documentation	2018-05-20 20:26:56 +02:00
ines	b59e3b157f	Don't require attrs argument in Doc.retokenize and allow both ints and unicode (resolves #2304 )	2018-05-20 15:15:37 +02:00
ines	5768df4f09	Add SimpleFrozenDict util to use as default function argument	2018-05-20 15:13:37 +02:00
Matthew Honnibal	581d318971	Fix conftest	2018-05-15 00:54:45 +02:00
Tahar Zanouda	00417794d3	Add Arabic language (#2314 ) * added support for Arabic lang * added Arabic language support * updated conftest	2018-05-15 00:27:19 +02:00
Jani Monoses	0e08e49e87	Lemmatizer ro (#2319 ) * Add Romanian lemmatizer lookup table. Adapted from http://www.lexiconista.com/datasets/lemmatization/ by replacing cedillas with commas (ș and ț). The original dataset is licensed under the Open Database License. * Fix one blatant issue in the Romanian lemmatizer * Romanian examples file * Add ro_tokenizer in conftest * Add Romanian lemmatizer test	2018-05-12 15:20:04 +02:00
vishnumenon	ae3719ece5	Fix the code for FACILITY entities (#2324 ) * Fix the code for FACILITY entities As far as I can tell, the default models all use "FAC" rather than "FACILITY" * Added my Contributor Agreement * Rename vishnumenon to vishnumenon.md	2018-05-12 15:19:17 +02:00
Jani Monoses	42b34832e4	Update Romanian stopword list (#2316 ) * Contributor agreement for janimo * Update Romanian stopword list Include the correct spellings of all the words already in the repo that are using cedillas (ş and ţ) instead of commas (ș and ț). Add another unrelated spelling fix. See https://github.com/stopwords-iso/stopwords-ro/pull/1 and https://github.com/stopwords-iso/stopwords-ro/pull/2	2018-05-10 12:16:56 +02:00
Lucas Abbade	18af53014f	Adding my contributor agreement (#2315 ) * Create LRAbbade.md * Update LRAbbade.md	2018-05-09 21:25:05 +02:00
Lucas Abbade	be7fdc59d1	Update lex_attrs.py (#2307 ) * Update lex_attrs.py Fixed spelling mistakes of some numbers (according to Brazilian Portuguese). * Update lex_attrs.py As requested, I've included the correct spelling for both Brazilian Portuguese and Portuguese Portuguese. I will advise however, that the two are separated in the future. Brazilian Portuguese is a very different language from the original one, although most of the writing is unified, the way people talk in both countries is radically different. Keeping both languages as one may lead to bigger issues in the future, especially when it comes to spell checking.	2018-05-09 20:49:31 +02:00
mauryaland	5368ba028a	Update stop_words.py for French language (#2310 ) * Add contraction forms of some common stopwords All the stopwords added contain the apostrophe " ' " or " ’ ". * Adds contributor agreement mauryaland * Update mauryaland.md	2018-05-09 12:04:38 +02:00
ines	7a3599c21a	Fix formatting and consistency	2018-05-07 23:02:11 +02:00
ines	37facf9b4d	Add config for no-response [ci skip]	2018-05-07 22:04:54 +02:00
ines	ac25bc4016	Add docs section on sentence segmentation [ci skip]	2018-05-07 21:25:20 +02:00
ines	14148cd147	Fix formatting and wording	2018-05-07 21:24:35 +02:00
ines	f803da609f	Add scattertext [ci skip]	2018-05-07 19:10:23 +02:00
ines	a685fff875	Merge branch 'master' of https://github.com/explosion/spaCy	2018-05-07 18:58:57 +02:00
ines	e2241c797c	Add lock-threads configuration [ci skip]	2018-05-07 18:54:22 +02:00
B!	414f5270b3	B Cavello's signed Contributor Agreement v2 (#2302 ) This time hopefully created in the right spot. (Sorry about that!)	2018-05-07 17:48:54 +02:00
Matt Upson	9a1d3b63fb	Add missing default to .set_extension (#2297 ) Failing to set a default, method, or getter results in a ValueError: ValueError: [E083] Error setting extension: only one of `default`, `method`, or `getter` (plus optional `setter`) is allowed. Got: 0	2018-05-04 18:47:01 +02:00
ines	929a01139a	Order issue templates	2018-05-04 03:04:41 +02:00
Ines Montani	7f39c8896b	Update issue templates (#2295 ) * Update issue templates * Update templates	2018-05-04 03:02:26 +02:00
Douglas Knox	9b49a40f4e	Test and fix for Issue #2219 (#2272 ) Test and fix for Issue #2219: Token.similarity() failed if single letter	2018-05-03 18:40:46 +02:00
Paul O'Leary McCann	bd72fbf09c	Port Japanese mecab tokenizer from v1 (#2036 ) * Port Japanese mecab tokenizer from v1 This brings the Mecab-based Japanese tokenization introduced in #1246 to spaCy v2. There isn't a JapaneseTagger implementation yet, but POS tag information from Mecab is stored in a token extension. A tag map is also included. As a reminder, Mecab is required because Universal Dependencies are based on Unidic tags, and Janome doesn't support Unidic. Things to check: 1. Is this the right way to use a token extension? 2. What's the right way to implement a JapaneseTagger? The approach in #1246 relied on `tag_from_strings` which is just gone now. I guess the best thing is to just try training spaCy's default Tagger? -POLM * Add tagging/make_doc and tests	2018-05-03 18:38:26 +02:00
G.Pruvost	cc8e804648	#2211 - Support for ssl certs config on download command (#2212 ) * Add support for SSL/Certs customization on download CLI * Add a note on SSL options for the 'download' CLI in the README * Add contributor agreement	2018-05-03 18:37:02 +02:00
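
Commit 9732988951 above describes replacing &, <, >, and " with HTML entities before the SVG is rendered. A minimal sketch of that kind of escaping follows. The function name `escape_html` and its body are assumptions made for illustration here, not the code from the commit.

```python
def escape_html(text):
    """Replace &, <, >, and " with their HTML entities.

    Hypothetical helper for illustration only. The ampersand is
    replaced first so that the entities produced by the later
    replacements are not escaped a second time.
    """
    text = text.replace("&", "&amp;")
    text = text.replace("<", "&lt;")
    text = text.replace(">", "&gt;")
    text = text.replace('"', "&quot;")
    return text


if __name__ == "__main__":
    # Escape a string that would otherwise break SVG/HTML markup.
    print(escape_html('<b>R&D</b> is "fun"'))
```

The order of the replacements matters: escaping & last would turn an already produced `&lt;` into `&amp;lt;`.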


8751 Commits