spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-09-13 07:32:34 +03:00

Author	SHA1	Message	Date
M. Revuelta Espinosa	51232ffb9e	Update universe.json (include PatternOmatic) (#6399) Request to include PatternOmatic in spaCy Universe Adds @revuel to contributors	2020-11-19 13:15:50 +01:00
Adriane Boyd	3cf6479467	Fix JSON in #6395	2020-11-17 15:25:41 +01:00
Adriane Boyd	c2eb0992ae	Fix JSON in #6395	2020-11-17 15:24:38 +01:00
Sam Edwardes	c3d9550f30	Added spaCyTextBlob to universe.json (#6395)	2020-11-17 14:38:59 +01:00
Sam Edwardes	78913a4f95	Added spaCyTextBlob to universe.json (#6395)	2020-11-17 14:38:34 +01:00
Adriane Boyd	96726ec1f6	Fix DocBin init in training example (#6396)	2020-11-17 14:36:44 +01:00
Adriane Boyd	6f014efb97	Install dev requirements before running tests	2020-11-16 10:59:50 +01:00
Adriane Boyd	53493b032a	Clean installed packages before CI sdist install	2020-11-16 10:46:39 +01:00
Adriane Boyd	fb2c3075fd	Remove wheel from setup_requires	2020-11-16 10:34:04 +01:00
Adriane Boyd	ed32fa80cd	Update source install instructions * Use `pip install` instead of `python setup.py install` * For developers recommend: * `python setup.py build_ext --inplace -j N` * `python setup.py develop`	2020-11-16 10:13:51 +01:00
svlandeg	99d0412b6e	add link to REL project	2020-11-15 18:35:56 +01:00
svlandeg	73fc1ed963	remove labels from morphologizer constructor	2020-11-11 21:48:50 +01:00
svlandeg	d5a920325f	remove labels from constructor	2020-11-11 21:34:12 +01:00
svlandeg	fcd79e0655	remove set_morphology from docs	2020-11-11 21:32:34 +01:00
Adriane Boyd	320a8b1481	Add ent_id_ to strings serialized with Doc (#6353)	2020-11-10 20:16:07 +08:00
Adriane Boyd	a7e7d6c6c9	Ignore misaligned in Morphologizer.get_loss (#6363) Fix bug where `Morphologizer.get_loss` treated misaligned annotation as `EMPTY_MORPH` rather than ignoring it. Remove unneeded default `EMPTY_MORPH` mappings.	2020-11-10 20:15:09 +08:00
Sofie Van Landeghem	a0c899a0ff	Fix textcat + transformer architecture (#6371) * add pooling to textcat TransformerListener * maybe_get_dim in case it's null	2020-11-10 20:14:47 +08:00
Ines Montani	3ca5c7082d	Use pip install . in quickstart [ci skip]	2020-11-10 17:27:49 +08:00
Ines Montani	de6453940e	Merge pull request #6305 from svlandeg/feature/score-docs [ci skip]	2020-11-10 02:52:11 +01:00
Ines Montani	d490428089	Update README.md [ci skip]	2020-11-10 09:51:20 +08:00
Alec Chapman	8b919d77c1	add medspacy to universe and fix example w/ cov-bsv	2020-11-10 09:49:39 +08:00
Ines Montani	4d337eedf2	Merge pull request #6322 from medspacy/master	2020-11-10 02:47:29 +01:00
Ines Montani	d7950c5ada	Merge pull request #6297 from adrianeboyd/docs/nightly-conda-install [ci skip]	2020-11-10 02:45:52 +01:00
Ines Montani	448bfbdc30	Remove conda from nightly install widget [ci skip]	2020-11-10 09:44:52 +08:00
svlandeg	789fb3d124	add docs for upstream argument of TransformerListener	2020-11-09 21:42:58 +01:00
Ines Montani	363ac73c72	Update docs [ci skip]	2020-11-09 12:43:26 +08:00
Adriane Boyd	90550552a0	CI updates for python 3.5 (#6354) * Update pip in CI * Use --prefer-binary * Use `--prefer-binary` * Delete all installed packages before testing source install * sdist install with --only-binary :all:	2020-11-06 13:35:51 +01:00
Daniel Vasic	20d72de986	Added Multext-East V5 tagset for Croatian language (#6248) * Added Multext-East V5 tagset for Croatian language * Create danielvasic.md * Update danielvasic.md * Update danielvasic.md * Add tag map to CroatianDefaults Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2020-11-05 12:19:22 +01:00
Robert Šípek	6069efe57d	Add tag map to cs language (#6284)	2020-11-05 10:13:11 +01:00
Adriane Boyd	e4c3d6748c	Update TIGER link and tag description (#6344)	2020-11-05 09:33:45 +01:00
Adriane Boyd	8644ee3e3f	Update TIGER link and tag description (#6344)	2020-11-05 09:33:00 +01:00
Vu Ha	6d465ec52c	add oprd to the list of accepted deps for noun chunking (#6302) * add oprd to the list of accepted deps for noun chunking * add SCA	2020-11-05 09:17:35 +01:00
Adriane Boyd	31de700b0f	Fix on_match callback and remove empty patterns (#6312) For the `DependencyMatcher`: * Fix on_match callback so that it is called once per matched pattern * Fix results so that patterns with empty match lists are not returned	2020-11-05 09:16:26 +01:00
Sofie Van Landeghem	8ef056cf98	fix embed_size in Entity Linker architecture (#6343)	2020-11-04 22:20:13 +01:00
Ines Montani	019a1dd5e8	Fix v3 overview [ci skip]	2020-11-03 18:10:06 +01:00
Adriane Boyd	b3ca183269	Add python 3.9 classifier	2020-11-03 17:31:09 +01:00
Adriane Boyd	244fcb815d	Add python 3.9 to CI, reenable python 3.7	2020-11-03 17:30:09 +01:00
Adriane Boyd	084fc575aa	Set version to v3.0.0rc3	2020-11-03 17:29:57 +01:00
Adriane Boyd	1c4df8fd09	Replace pytokenizations with internal alignment (#6293) * Replace pytokenizations with internal alignment Replace pytokenizations with internal alignment algorithm that is restricted to only allow differences in whitespace and capitalization. * Rename `spacy.training.align` to `spacy.training.alignment` to contain the `Alignment` dataclass * Implement `get_alignments` in `spacy.training.align` * Refactor trailing whitespace handling * Remove unnecessary exception for empty docs Allow a non-empty whitespace-only doc to be aligned with an empty doc * Remove empty docs exceptions completely	2020-11-03 16:24:38 +01:00
Adriane Boyd	a4b32b9552	Handle missing reference values in scorer (#6286) * Handle missing reference values in scorer Handle missing values in reference doc during scoring where it is possible to detect an unset state for the attribute. If no reference docs contain annotation, `None` is returned instead of a score. `spacy evaluate` displays `-` for missing scores and the missing scores are saved as `None`/`null` in the metrics. Attributes without unset states: * `token.head`: relies on `token.dep` to recognize unset values * `doc.cats`: unable to handle missing annotation Additional changes: * add optional `has_annotation` check to `score_spans` to replace `doc.sents` hack * update `score_token_attr_per_feat` to handle missing and empty morph representations * fix bug in `Doc.has_annotation` for normalization of `IS_SENT_START` vs. `SENT_START` * Fix import * Update return types	2020-11-03 15:47:18 +01:00
Alec Chapman	204c7c8a00	fix thumbnail link to be github raw url	2020-11-01 07:53:48 -07:00
Adriane Boyd	5d2cb86c34	Fix on_match callback for DependencyMatcher (#6313) Fix `DependencyMatcher` so that the callback is called only once per match.	2020-10-31 12:20:27 +01:00
Adriane Boyd	45c9a68828	Identify final Matcher pattern node by quantifier (#6317) Modify the internal pattern representation in `Matcher` patterns to identify the final ID state using a unique quantifier rather than a combination of other attributes. It was insufficient to identify the final ID node based on an uninitialized `quantifier` (coincidentally being the same as the `ZERO`) with `nr_attr` as 0. (In addition, it was potentially bug-prone that `nr_attr` was set to 0 even though attrs were allocated.) In the case of `{"OP": "!"}` (a valid, if pointless, pattern), `nr_attr` is 0 and the quantifier is ZERO, so the previous methods for incrementing to the ID node at the end of the pattern weren't able to distinguish the final ID node from the `{"OP": "!"}` pattern.	2020-10-31 12:18:48 +01:00
Sofie Van Landeghem	2918923541	fix resolving of dot notation (#6326)	2020-10-31 12:17:06 +01:00
Alec Chapman	73d22d96ff	add medspacy to universe and fix example w/ cov-bsv	2020-10-29 07:53:56 -06:00
Duygu Altinok	0e55f806dd	Turkish tokenization improvements (#6268) * added single and paired orth variants * added token match * added long text tokenization test * inverted init * normalized lemmas to lowercase * more abbrevs * tests for ordinals and abbrevs * separated period abbvrevs to another list * fiex typo * added ordinal and abbrev tests * added number tests for dates * minor refinement * added inflected abbrevs regex * added percentage and inflection * cosmetics * added token match * added url inflection tests * excluded url tokens from custom pattern * removed url match import	2020-10-29 09:43:17 +01:00
Adriane Boyd	58a7461cff	Add Macedonian to website languages	2020-10-29 08:51:26 +01:00
Adriane Boyd	94aa4c7410	Add Nepali to supported languages on website (#6315)	2020-10-29 08:51:15 +01:00
Kunal Sharma	1b8f1f6f1b	Adding MindMeld to Universe JSON (#6275) * Adding Mindmeld to Universe JSON Mindmeld is a conversational AI platform for deep-domain voice interfaces and chatbots. https://www.mindmeld.com/ * Signing contribution agreement. Co-authored-by: kunshar2 <kunshar2@cisco.com>	2020-10-29 08:50:59 +01:00
Adriane Boyd	8cc5ed6771	Add Macedonian to website languages	2020-10-29 08:49:56 +01:00
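
Commit a4b32b9552 in the list above describes a scoring convention: unset reference values are skipped, `None` is returned instead of a score when no reference docs contain annotation, and `spacy evaluate` displays `-` for a missing score. The following is a minimal sketch of that convention in plain Python. It is not spaCy's implementation; the function names `token_accuracy` and `format_score`, and the use of `None` to stand for an unset reference value, are assumptions made for illustration.

```python
from typing import Optional, Sequence, Tuple


def token_accuracy(pairs: Sequence[Tuple[Optional[str], str]]) -> Optional[float]:
    """Accuracy over (reference, predicted) pairs, skipping unset references.

    A reference of None stands for an unset annotation (an assumption of this
    sketch). Returns None when no reference value is annotated at all.
    """
    # Keep only the pairs whose reference side carries an annotation.
    annotated = [(ref, pred) for ref, pred in pairs if ref is not None]
    if not annotated:
        # Nothing to score against: report a missing score, not 0.0.
        return None
    correct = sum(1 for ref, pred in annotated if ref == pred)
    return correct / len(annotated)


def format_score(score: Optional[float]) -> str:
    """Render a score for display, using "-" for a missing score."""
    return "-" if score is None else f"{score:.2f}"
```

Returning `None` rather than `0.0` keeps "nothing to evaluate" distinguishable from "everything was wrong", which is the distinction the commit message draws.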

15700 Commits