mirror of
https://github.com/explosion/spaCy.git
synced 2024-12-27 10:26:35 +03:00
0226b3bf0e
10 Commits
Author | SHA1 | Message | Date | |
---|---|---|---|---|
Jaydeep Borkar
|
6a06a3fa6a |
Update stop_words.py and add name in contributors (#4325)
* Update stop_words.py and add name in contributors
* add jaydeepborkar.md in contributors directory
* Reset template [ci skip]

Co-authored-by: Ines Montani <ines@ines.io> |
||
Ines Montani
|
eddeb36c96
|
💫 Tidy up and auto-format .py files (#2983)
## Description

- [x] Use [`black`](https://github.com/ambv/black) to auto-format all `.py` files.
- [x] Update flake8 config to exclude very large files (lemmatization tables etc.)
- [x] Update code to be compatible with flake8 rules
- [x] Fix various small bugs, inconsistencies and messy stuff in the language data
- [x] Update docs to explain new code style (`black`, `flake8`, when to use `# fmt: off` and `# fmt: on` and what `# noqa` means)

Once #2932 is merged, which auto-formats and tidies up the CLI, we'll be able to run `flake8 spacy` and actually get meaningful results. At the moment, the code style and linting aren't applied automatically, but I'm hoping that the new [GitHub Actions](https://github.com/features/actions) will let us auto-format pull requests and post comments with relevant linting information.

### Types of change

enhancement, code style

## Checklist

- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information. |
||
Ines Montani
|
d33953037e
|
💫 Port master changes over to develop (#2979)
* Create aryaprabhudesai.md (#2681)
* Update _install.jade (#2688) Typo fix: "models" -> "model"
* Add FAC to spacy.explain (resolves #2706)
* Remove docstrings for deprecated arguments (see #2703)
* When calling getoption() in conftest.py, pass a default option (#2709)
* When calling getoption() in conftest.py, pass a default option This is necessary to allow testing an installed spacy by running: pytest --pyargs spacy
* Add contributor agreement
* update bengali token rules for hyphen and digits (#2731)
* Less norm computations in token similarity (#2730)
* Less norm computations in token similarity
* Contributor agreement
* Remove ')' for clarity (#2737) Sorry, don't mean to be nitpicky, I just noticed this when going through the CLI and thought it was a quick fix. That said, if this was intentional, then please let me know.
* added contributor agreement for mbkupfer (#2738)
* Basic support for Telugu language (#2751)
* Lex _attrs for polish language (#2750)
* Signed spaCy contributor agreement
* Added polish version of english lex_attrs
* Introduces a bulk merge function, in order to solve issue #653 (#2696)
* Fix comment
* Introduce bulk merge to increase performance on many span merges
* Sign contributor agreement
* Implement pull request suggestions
* Describe converters more explicitly (see #2643)
* Add multi-threading note to Language.pipe (resolves #2582) [ci skip]
* Fix formatting
* Fix dependency scheme docs (closes #2705) [ci skip]
* Don't set stop word in example (closes #2657) [ci skip]
* Add words to portuguese language _num_words (#2759)
* Add words to portuguese language _num_words
* Add words to portuguese language _num_words
* Update Indonesian model (#2752)
* adding e-KTP in tokenizer exceptions list
* add exception token
* removing lines with containing space as it won't matter since we use .split() method in the end, added new tokens in exception
* add tokenizer exceptions list
* combining base_norms with norm_exceptions
* adding norm_exception
* fix double key in lemmatizer
* remove unused import on punctuation.py
* reformat stop_words to reduce number of lines, improve readability
* updating tokenizer exception
* implement is_currency for lang/id
* adding orth_first_upper in tokenizer_exceptions
* update the norm_exception list
* remove bunch of abbreviations
* adding contributors file
* Fixed spaCy+Keras example (#2763)
* bug fixes in keras example
* created contributor agreement
* Adding French hyphenated first name (#2786)
* Fix typo (closes #2784)
* Fix typo (#2795) [ci skip] Fixed typo on line 6 "regcognizer --> recognizer"
* Adding basic support for Sinhala language. (#2788)
* adding Sinhala language package, stop words, examples and lex_attrs.
* Adding contributor agreement
* Updating contributor agreement
* Also include lowercase norm exceptions
* Fix error (#2802)
* Fix error ValueError: cannot resize an array that references or is referenced by another array in this way. Use the resize function
* added spaCy Contributor Agreement
* Add charlax's contributor agreement (#2805)
* agreement of contributor, may I introduce a tiny pl language contribution (#2799)
* Contributors agreement
* Contributors agreement
* Contributors agreement
* Add jupyter=True to displacy.render in documentation (#2806)
* Revert "Also include lowercase norm exceptions" This reverts commit |
||
Ines Montani
|
ea20b72c08 |
💫 Make like_num work for prefixed numbers (#2808)
* Only split + prefix if not numbers
* Make like_num work for prefixed numbers
* Add test for like_num |
||
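A minimal sketch of what a `like_num` check that tolerates prefixed numbers could look like. The prefix set, the `_num_words` subset and the function body are assumptions for illustration; they are not copied from the commit itself.

```python
# Illustrative subset of number words; a real language module would list more.
_num_words = {"zero", "one", "two", "three", "ten", "hundred", "thousand"}

# Assumed set of prefixes to ignore before testing for a number.
_prefixes = ("+", "-", "±", "~")


def like_num(text):
    """Return True if the token text looks like a number."""
    # Drop a single leading sign-like prefix, e.g. "+10" -> "10".
    if text.startswith(_prefixes):
        text = text[1:]
    # Ignore thousands separators and decimal points, e.g. "1,000.5".
    text = text.replace(",", "").replace(".", "")
    if text.isdigit():
        return True
    # Simple fractions such as "1/2".
    if text.count("/") == 1:
        num, denom = text.split("/")
        if num.isdigit() and denom.isdigit():
            return True
    # Spelled-out numbers.
    return text.lower() in _num_words
```

With this sketch, `like_num("+10")` and `like_num("-1,000")` are both true, while a bare `"+"` is not treated as a number.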
Aashish Gangwani
|
6eebfc7bf4 |
Added numbers to ../lang/hi/lex_attrs.py (#2629)
I have added numbers to the Hindi lex_attrs.py file according to the Indian numbering system (https://en.wikipedia.org/wiki/Indian_numbering_system). Here are their English translations:

'शून्य' => zero
'एक' => one
'दो' => two
'तीन' => three
'चार' => four
'पांच' => five
'छह' => six
'सात' => seven
'आठ' => eight
'नौ' => nine
'दस' => ten
'ग्यारह' => eleven
'बारह' => twelve
'तेरह' => thirteen
'चौदह' => fourteen
'पंद्रह' => fifteen
'सोलह' => sixteen
'सत्रह' => seventeen
'अठारह' => eighteen
'उन्नीस' => nineteen
'बीस' => twenty
'तीस' => thirty
'चालीस' => forty
'पचास' => fifty
'साठ' => sixty
'सत्तर' => seventy
'अस्सी' => eighty
'नब्बे' => ninety
'सौ' => hundred
'हज़ार' => thousand
'लाख' => hundred thousand
'करोड़' => ten million
'अरब' => billion
'खरब' => hundred billion

## Checklist

- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information. |
||
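To show how such a word list might be used, here is a small sketch that maps a few of the Hindi number words above to their values and checks tokens against them. The dictionary is a hand-picked subset built from the translations in the commit message. The names `HINDI_NUM_VALUES` and `hindi_like_num` are hypothetical and are not part of spaCy.

```python
# Subset of the Hindi number words listed in the commit message, with values
# taken from the English translations given there (Indian numbering system).
HINDI_NUM_VALUES = {
    "शून्य": 0,
    "एक": 1,
    "दस": 10,
    "सौ": 100,
    "हज़ार": 1_000,
    "लाख": 100_000,        # hundred thousand
    "करोड़": 10_000_000,    # ten million
}


def hindi_like_num(text):
    """Return True for digit strings or known Hindi number words (hypothetical helper)."""
    # Thousands separators are ignored before the digit check.
    if text.replace(",", "").isdigit():
        return True
    return text in HINDI_NUM_VALUES
```

A lookup table like this only covers exact matches; compound numbers would need additional handling that is outside the scope of this sketch.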
Benjamin Peterson
|
9452134cd1 | remove no-break spaces from Hindi example (fixes #1750) | ||
Abhinav Sharma
|
59f5740ede
|
improved upon the list of included stop_words | ||
Ines Montani
|
42b241ccd0
|
Update language code in usage example in comment | ||
Abhinav Sharma
|
84edade82d
|
Create examples.py
Populated the file with the translations of English example sentences |
||
ines
|
266e7180a7 | Add Language class, stop words and basic stemmer that sets NORM |