spaCy/website/meta/universe.json

{
    "resources": [
        {
            "id": "spacy-vscode",
            "title": "spaCy Visual Studio Code Extension",
            "thumb": "https://raw.githubusercontent.com/explosion/spacy-vscode/main/icon.png",
            "slogan": "Work with spaCy's config files in VS Code",
            "description": "The spaCy VS Code Extension provides additional tooling and features for working with spaCy's config files. Version 1.0.0 is available as an installable extension and includes hover descriptions for registry functions, variables, and section names within the config.",
            "url": "https://marketplace.visualstudio.com/items?itemName=Explosion.spacy-extension",
            "github": "explosion/spacy-vscode",
            "code_language": "python",
            "author": "Explosion",
            "author_links": {
                "twitter": "@explosion_ai",
                "github": "explosion"
            },
            "category": ["extension"],
            "tags": []
        },
        {
            "id": "sayswho",
            "title": "SaysWho",
            "slogan": "Quote identification, attribution and resolution",
            "description": "A Python package for identifying and attributing quotes in text. It uses a combination of spaCy functionality, logic and grammar to find quotes and their speakers, then uses the spaCy coreference model to better clarify who is speaking. Currently English only.",
            "github": "afriedman412/sayswho",
            "pip": "sayswho",
            "code_language": "python",
            "author": "Andy Friedman",
            "author_links": {
                "twitter": "@steadynappin",
                "github": "afriedman412"
            },
            "code_example": [
                "from sayswho import SaysWho",
                "text = open(\"path/to/your/text_file.txt\").read()",
                "sw = SaysWho()",
                "sw.attribute(text)",
                "",
                "sw.expand_match() # see quote/cluster matches",
                "sw.render_to_html() # output your text, quotes and cluster matches to an HTML file called \"temp.html\""
            ],
            "category": ["standalone"],
            "tags": ["attribution", "coref", "text-processing"]
        },
        {
            "id": "parsigs",
            "title": "parsigs",
            "slogan": "Structuring prescription text made simple using spaCy",
            "description": "Parsigs is an open-source project that aims to extract the relevant dosage information from prescription text without compromising the patient's privacy.\n\nNote that you also need to install the model to use the package: `pip install https://huggingface.co/royashcenazi/en_parsigs/resolve/main/en_parsigs-any-py3-none-any.whl`",
            "github": "royashcenazi/parsigs",
            "pip": "parsigs",
            "code_language": "python",
            "author": "Roy Ashcenazi",
            "code_example": [
                "# You'll need to install the trained model, see instructions in the description section",
                "from parsigs.parse_sig_api import StructuredSig, SigParser",
                "sig_parser = SigParser()",
                "",
                "sig = 'Take 1 tablet of ibuprofen 200mg 3 times every day for 3 weeks'",
                "parsed_sig = sig_parser.parse(sig)"
            ],
            "author_links": {
                "github": "royashcenazi"
            },
            "category": ["models", "research", "biomedical"],
            "tags": ["sigs", "prescription", "pharma"]
        },
        {
            "id": "latincy",
            "title": "LatinCy",
            "thumb": "https://raw.githubusercontent.com/diyclassics/la_core_web_lg/main/latincy-logo.png",
            "slogan": "Synthetic trained spaCy pipelines for Latin NLP",
            "description": "A set of trained, general-purpose Latin-language 'core' pipelines for use with spaCy. The models are trained on a large amount of available Latin data, including all five of the Latin Universal Dependencies treebanks, which have been preprocessed to be compatible with each other.",
            "url": "https://huggingface.co/latincy",
            "code_example": [
                "# pip install https://huggingface.co/latincy/la_core_web_lg/resolve/main/la_core_web_lg-any-py3-none-any.whl",
                "import spacy",
                "nlp = spacy.load('la_core_web_lg')",
                "doc = nlp('Haec narrantur a poetis de Perseo')",
                "",
                "print(f'{doc[0].text}, {doc[0].norm_}, {doc[0].lemma_}, {doc[0].pos_}')",
                "",
                "# > Haec, haec, hic, DET"
            ],
            "code_language": "python",
            "author": "Patrick J. Burns",
            "author_links": {
                "twitter": "@diyclassics",
                "github": "diyclassics",
                "website": "https://diyclassics.github.io/"
            },
            "category": ["pipeline", "research"],
            "tags": ["latin"]
        },
        {
            "id": "odycy",
            "title": "OdyCy",
            "slogan": "General-purpose language pipelines for premodern Greek.",
            "description": "Academically validated modular NLP pipelines for premodern Greek. odyCy achieves state-of-the-art performance on multiple tasks on unseen test data from the Universal Dependencies Perseus treebank, and performs second best on even more tasks on the PROIEL treebank’s test set. In addition, its performance appears relatively stable across the two evaluation datasets compared with other NLP pipelines. OdyCy is being used at the Center for Humanities Computing for preprocessing and analyzing Ancient Greek corpora for New Testament research, meaning that you can expect consistent maintenance and improvements.",
            "github": "centre-for-humanities-computing/odyCy",
            "code_example": [
                "# To install the high-accuracy transformer-based pipeline",
                "# pip install https://huggingface.co/chcaa/grc_odycy_joint_trf/resolve/main/grc_odycy_joint_trf-any-py3-none-any.whl",
                "import spacy",
                "",
                "nlp = spacy.load('grc_odycy_joint_trf')",
                "",
                "doc = nlp('τὴν γοῦν Ἀττικὴν ἐκ τοῦ ἐπὶ πλεῖστον διὰ τὸ λεπτόγεων ἀστασίαστον οὖσαν ἄνθρωποι ᾤκουν οἱ αὐτοὶ αἰεί.')"
            ],
            "code_language": "python",
            "url": "https://centre-for-humanities-computing.github.io/odyCy/",
            "thumb": "https://raw.githubusercontent.com/centre-for-humanities-computing/odyCy/7b94fec60679d06272dca88a4dcfe0f329779aea/docs/_static/logo.svg",
            "image": "https://github.com/centre-for-humanities-computing/odyCy/raw/main/docs/_static/logo_with_text_below.svg",
            "author": "Jan Kostkan, Márton Kardos (Center for Humanities Computing, Aarhus University)",
            "author_links": {
                "github": "centre-for-humanities-computing",
                "website": "https://chc.au.dk/"
            },
            "category": ["pipeline", "standalone", "research"],
            "tags": ["ancient Greek"]
        },
        {
            "id": "spacy-wasm",
            "title": "spacy-wasm",
            "slogan": "spaCy in the browser using WebAssembly",
            "description": "Run spaCy directly in the browser with WebAssembly. Using Pyodide, the application loads a spaCy model and renders the entered text with displaCy.",
            "url": "https://spacy-wasm.vercel.app/",
            "github": "SyedAhkam/spacy-wasm",
            "code_language": "python",
            "author": "Syed Ahkam",
            "author_links": {
                "twitter": "@SyedAhkam1",
                "github": "SyedAhkam"
            },
            "category": ["visualizers"],
            "tags": ["visualization", "deployment"]
        },
        {
            "id": "spacysee",
            "title": "spaCysee",
            "slogan": "Visualize spaCy's dependency parsing, POS tagging, and morphological analysis",
            "description": "A project that helps you visualize your spaCy docs in Jupyter notebooks. All dependency tags, POS tags, and morphological features are clickable: clicking on a tag brings up the relevant documentation for that tag.",
            "github": "moxley01/spacysee",
            "pip": "spacysee",
            "code_example": [
                "import spacy",
                "from spacysee import render",
                "",
                "nlp = spacy.load('en_core_web_sm')",
                "doc = nlp('This is a neat way to visualize your spaCy docs')",
                "render(doc, width='500', height='500')"
            ],
            "code_language": "python",
            "thumb": "https://www.mattoxley.com/static/images/spacysee_logo.svg",
            "image": "https://www.mattoxley.com/static/images/spacysee_logo.svg",
            "author": "Matt Oxley",
            "author_links": {
                "twitter": "matt0xley",
                "github": "moxley01",
                "website": "https://mattoxley.com"
            },
            "category": ["visualizers"],
            "tags": ["visualization"]
        },
        {
            "id": "grecy",
            "title": "greCy",
            "slogan": "Ancient Greek pipelines for spaCy",
            "description": "greCy offers state-of-the-art pipelines for ancient Greek NLP. It installs language models available in various sizes, some of them containing either word vectors or the aristoBERTo transformer.",
            "github": "jmyerston/greCy",
            "pip": "grecy",
            "code_example": [
                "# python -m grecy install grc_proiel_trf",
                "",
                "# After installing grc_proiel_trf or any other model",
                "import spacy",
                "",
                "nlp = spacy.load('grc_proiel_trf')",
                "doc = nlp('δοκῶ μοι περὶ ὧν πυνθάνεσθε οὐκ ἀμελέτητος εἶναι')",
                "",
                "for token in doc:",
                "    print(f'{token.text}, lemma: {token.lemma_}, pos: {token.pos_}, dep: {token.dep_}')"
            ],
            "code_language": "python",
            "thumb": "https://jacobo-syntax.hf.space/media/03a5317fa660c142e41dd2870b4273ce4e668e6fcdee0a276891f563.png",
            "author": "Jacobo Myerston",
            "author_links": {
                "twitter": "@jcbmyrstn",
                "github": "jmyerston",
                "website": "https://huggingface.co/spaces/Jacobo/syntax"
            },
            "category": ["pipeline", "research","models"],
            "tags": ["ancient Greek"]
        },
        {
            "id": "spacy-cleaner",
            "title": "spacy-cleaner",
            "slogan": "Easily clean text with spaCy!",
            "description": "**spacy-cleaner** utilises spaCy `Language` models to replace, remove, and mutate spaCy tokens. Cleaning actions available are:\n\n* Remove/replace stopwords.\n* Remove/replace punctuation.\n* Remove/replace numbers.\n* Remove/replace emails.\n* Remove/replace URLs.\n* Perform lemmatisation.\n\nSee our [docs](https://ce11an.github.io/spacy-cleaner/) for more information.",
            "github": "Ce11an/spacy-cleaner",
            "pip": "spacy-cleaner",
            "code_example": [
                "import spacy",
                "import spacy_cleaner",
                "from spacy_cleaner.processing import removers, replacers, mutators",
                "",
                "model = spacy.load(\"en_core_web_sm\")",
                "pipeline = spacy_cleaner.Pipeline(",
                "    model,",
                "    removers.remove_stopword_token,",
                "    replacers.replace_punctuation_token,",
                "    mutators.mutate_lemma_token,",
                ")",
                "",
                "texts = [\"Hello, my name is Cellan! I love to swim!\"]",
                "",
                "pipeline.clean(texts)",
                "# ['hello _IS_PUNCT_ Cellan _IS_PUNCT_ love swim _IS_PUNCT_']"
            ],
            "code_language": "python",
            "url": "https://ce11an.github.io/spacy-cleaner/",
            "image": "https://raw.githubusercontent.com/Ce11an/spacy-cleaner/main/docs/assets/images/spacemen.png",
            "author": "Cellan Hall",
            "author_links": {
                "twitter": "Ce11an",
                "github": "Ce11an",
                "website": "https://www.linkedin.com/in/cellan-hall/"
            },
            "category": ["extension"],
            "tags": ["text-processing"]
        },
        {
            "id": "Zshot",
            "title": "Zshot",
            "slogan": "Zero- and few-shot named entity and relationship recognition",
            "github": "ibm/zshot",
            "pip": "zshot",
            "code_example": [
                "import spacy",
                "from zshot import PipelineConfig, displacy",
                "from zshot.linker import LinkerRegen",
                "from zshot.mentions_extractor import MentionsExtractorSpacy",
                "from zshot.utils.data_models import Entity",
                "",
                "nlp = spacy.load('en_core_web_sm')",
                "# zero shot definition of entities",
                "nlp_config = PipelineConfig(",
                "    mentions_extractor=MentionsExtractorSpacy(),",
                "    linker=LinkerRegen(),",
                "    entities=[",
                "        Entity(name='Paris',",
                "               description='Paris is located in northern central France, in a north-bending arc of the river Seine'),",
                "        Entity(name='IBM',",
                "               description='International Business Machines Corporation (IBM) is an American multinational technology corporation headquartered in Armonk, New York'),",
                "        Entity(name='New York', description='New York is a city in the U.S. state of New York'),",
                "        Entity(name='Florida', description='southeasternmost U.S. state'),",
                "        Entity(name='American',",
                "               description='American, something of, from, or related to the United States of America, commonly known as the United States or America'),",
                "        Entity(name='Chemical formula',",
                "               description='In chemistry, a chemical formula is a way of presenting information about the chemical proportions of atoms that constitute a particular chemical compound or molecule'),",
                "        Entity(name='Acetamide',",
                "               description='Acetamide (systematic name: ethanamide) is an organic compound with the formula CH3CONH2. It is the simplest amide derived from acetic acid. It finds some use as a plasticizer and as an industrial solvent.'),",
                "        Entity(name='Armonk',",
                "               description='Armonk is a hamlet and census-designated place (CDP) in the town of North Castle, located in Westchester County, New York, United States.'),",
                "        Entity(name='Acetic Acid',",
                "               description='Acetic acid, systematically named ethanoic acid, is an acidic, colourless liquid and organic compound with the chemical formula CH3COOH'),",
                "        Entity(name='Industrial solvent',",
                "               description='Acetamide (systematic name: ethanamide) is an organic compound with the formula CH3CONH2. It is the simplest amide derived from acetic acid. It finds some use as a plasticizer and as an industrial solvent.'),",
                "    ]",
                ")",
                "nlp.add_pipe('zshot', config=nlp_config, last=True)",
                "",
                "text = 'International Business Machines Corporation (IBM) is an American multinational technology corporation' \\",
                "        ' headquartered in Armonk, New York, with operations in over 171 countries.'",
                "",
                "doc = nlp(text)",
                "displacy.serve(doc, style='ent')"
            ],
            "thumb": "https://ibm.github.io/zshot/img/graph.png",
            "url": "https://ibm.github.io/zshot/",
            "author": "IBM Research",
            "author_links": {
                "github": "ibm",
                "twitter": "IBMResearch",
                "website": "https://research.ibm.com/labs/ireland/"
            },
            "category": ["scientific", "models", "research"]
        },
        {
            "id": "concepcy",
            "title": "concepCy",
            "slogan": "A multilingual knowledge graph in spaCy",
            "description": "A spaCy wrapper for ConceptNet, a freely-available semantic network designed to help computers understand the meaning of words.",
            "github": "JulesBelveze/concepcy",
            "pip": "concepcy",
            "code_example": [
                "import spacy",
                "import concepcy",
                "",
                "nlp = spacy.load('en_core_web_sm')",
                "# Using default concepCy configuration",
                "nlp.add_pipe('concepcy')",
                "",
                "doc = nlp('WHO is a lovely company')",
                "",
                "# Access all the 'RelatedTo' relations from the Doc",
                "for word, relations in doc._.relatedto.items():",
                "    print(f'Word: {word}\\n{relations}')",
                "",
                "# Access the 'RelatedTo' relations word by word",
                "for token in doc:",
                "    print(f'Word: {token}\\n{token._.relatedto}')"
            ],
            "category": ["pipeline"],
            "image": "https://github.com/JulesBelveze/concepcy/blob/main/figures/concepcy.png",
            "tags": ["semantic", "ConceptNet"],
            "author": "Jules Belveze",
            "author_links": {
                "github": "JulesBelveze",
                "website": "https://www.linkedin.com/in/jules-belveze/"
            }
        },
        {
            "id": "spacyfishing",
            "title": "spaCy fishing",
            "slogan": "Named entity disambiguation and linking on Wikidata in spaCy with Entity-Fishing.",
            "description": "A spaCy wrapper of Entity-Fishing for named entity disambiguation and linking against a Wikidata knowledge base.",
            "github": "Lucaterre/spacyfishing",
            "pip": "spacyfishing",
            "code_example": [
                "import spacy",
                "text = 'Victor Hugo and Honoré de Balzac are French writers who lived in Paris.'",
                "nlp = spacy.load('en_core_web_sm')",
                "nlp.add_pipe('entityfishing')",
                "doc = nlp(text)",
                "for ent in doc.ents:",
                "    print((ent.text, ent.label_, ent._.kb_qid, ent._.url_wikidata, ent._.nerd_score))",
                "# ('Victor Hugo', 'PERSON', 'Q535', 'https://www.wikidata.org/wiki/Q535', 0.972)",
                "# ('Honoré de Balzac', 'PERSON', 'Q9711', 'https://www.wikidata.org/wiki/Q9711', 0.9724)",
                "# ('French', 'NORP', 'Q121842', 'https://www.wikidata.org/wiki/Q121842', 0.3739)",
                "# ('Paris', 'GPE', 'Q90', 'https://www.wikidata.org/wiki/Q90', 0.5652)",
                "# Set the extra_info parameter to True to also access span._.description, span._.src_description, span._.normal_term and span._.other_ids"
            ],
            "category": ["models", "pipeline"],
            "image": "https://raw.githubusercontent.com/Lucaterre/spacyfishing/main/docs/spacyfishing-logo-resized.png",
            "tags": ["NER", "NEL"],
            "author": "Lucas Terriel",
            "author_links": {
                "twitter": "TerreLuca",
                "github": "Lucaterre"
            }
        },
        {
            "id": "aim-spacy",
            "title": "Aim-spaCy",
            "slogan": "Aim-spaCy is an Aim-based spaCy experiment tracker.",
            "description": "Aim-spaCy helps you easily collect, store and explore spaCy training logs, including hyperparameters, metrics and displaCy visualizations.",
            "github": "aimhubio/aim-spacy",
            "pip": "aim-spacy",
            "code_example": ["# See the examples at https://github.com/aimhubio/aim-spacy/tree/master/examples"],
            "code_language": "python",
            "url": "https://aimstack.io/spacy",
            "thumb": "https://user-images.githubusercontent.com/13848158/172912427-ee9327ea-3cd8-47fa-8427-6c0d36cd831f.png",
            "image": "https://user-images.githubusercontent.com/13848158/136364717-0939222c-55b6-44f0-ad32-d9ab749546e4.png",
            "author": "AimStack",
            "author_links": {
                "twitter": "aimstackio",
                "github": "aimhubio",
                "website": "https://aimstack.io"
            },
            "category": ["visualizers"],
            "tags": ["experiment-tracking", "visualization"]
        },
        {
            "id": "spacy-report",
            "title": "spacy-report",
            "slogan": "Generates interactive reports for spaCy models.",
            "description": "The goal of spacy-report is to offer static reports for spaCy models that help users make better decisions on how the models can be used.",
            "github": "koaning/spacy-report",
            "pip": "spacy-report",
            "thumb": "https://github.com/koaning/spacy-report/raw/main/icon.png",
            "image": "https://raw.githubusercontent.com/koaning/spacy-report/main/gif.gif",
            "code_example": [
                "python -m spacy report textcat training/model-best/ corpus/train.spacy corpus/dev.spacy"
            ],
            "category": ["visualizers", "research"],
            "author": "Vincent D. Warmerdam",
            "author_links": {
                "twitter": "fishnets88",
                "github": "koaning",
                "website": "https://koaning.io"
            }
        },
        {
            "id": "scrubadub_spacy",
            "title": "scrubadub_spacy",
            "category": ["pipeline"],
            "slogan": "Remove personally identifiable information from text using spaCy.",
            "description": "scrubadub removes personally identifiable information from text. scrubadub_spacy is an extension that uses spaCy NLP models to remove personal information from text.",
            "github": "LeapBeyond/scrubadub_spacy",
            "pip": "scrubadub-spacy",
            "url": "https://github.com/LeapBeyond/scrubadub_spacy",
            "code_language": "python",
            "author": "Leap Beyond",
            "author_links": {
                "github": "LeapBeyond",
                "website": "https://leapbeyond.ai"
            },
            "code_example": [
                "import scrubadub, scrubadub_spacy",
                "scrubber = scrubadub.Scrubber()",
                "scrubber.add_detector(scrubadub_spacy.detectors.SpacyEntityDetector)",
                "print(scrubber.clean(\"My name is Alex, I work at LifeGuard in London, and my eMail is alex@lifeguard.com btw. my super secret twitter login is username: alex_2000 password: g-dragon180888\"))",
                "# My name is {{NAME}}, I work at {{ORGANIZATION}} in {{LOCATION}}, and my eMail is {{EMAIL}} btw. my super secret twitter login is username: {{USERNAME}} password: {{PASSWORD}}"
            ]
        },
        {
            "id": "spacy-setfit-textcat",
            "title": "spacy-setfit-textcat",
            "category": ["research"],
            "tags": ["SetFit", "Few-Shot"],
            "slogan": "spaCy Project: Experiments with SetFit & Few-Shot Classification",
            "description": "This project is an experiment with spaCy and few-shot text classification using SetFit.",
            "github": "pmbaumgartner/spacy-setfit-textcat",
            "url": "https://github.com/pmbaumgartner/spacy-setfit-textcat",
            "code_language": "python",
            "author": "Peter Baumgartner",
            "author_links": {
                "twitter": "pmbaumgartner",
                "github": "pmbaumgartner",
                "website": "https://www.peterbaumgartner.com/"
            },
            "code_example": [
                "# Open the notebook in Google Colab: https://colab.research.google.com/drive/1CvGEZC0I9_v8gWrBxSJQ4Z8JGPJz-HYb?usp=sharing"
            ]
        },
        {
            "id": "spacy-experimental",
            "title": "spacy-experimental",
            "category": ["extension"],
            "slogan": "Cutting-edge experimental spaCy components and features",
            "description": "This package includes experimental components and features for spaCy v3.x, for example model architectures, pipeline components and utilities.",
            "github": "explosion/spacy-experimental",
            "pip": "spacy-experimental",
            "url": "https://github.com/explosion/spacy-experimental",
            "code_language": "python",
            "author": "Explosion",
            "author_links": {
                "twitter": "explosion_ai",
                "github": "explosion",
                "website": "https://explosion.ai/"
            },
            "code_example": [
                "python -m pip install -U pip setuptools wheel",
                "python -m pip install spacy-experimental"
            ]
        },
        {
            "id": "spacypdfreader",
            "title": "spacypdfreader",
            "category": ["pipeline"],
            "tags": ["PDF"],
            "slogan": "Easy PDF-to-spaCy text extraction in Python.",
            "description": "*spacypdfreader* is a Python library that allows you to convert PDF files directly into *spaCy* `Doc` objects. The library provides several built-in parsers, or you can bring your own parser. `Doc` objects are annotated with several custom attributes, including `token._.page_number`, `doc._.page_range`, `doc._.first_page`, `doc._.last_page`, `doc._.pdf_file_name`, and `doc._.page(int)`.",
            "github": "SamEdwardes/spacypdfreader",
            "pip": "spacypdfreader",
            "url": "https://samedwardes.github.io/spacypdfreader/",
            "code_language": "python",
            "author": "Sam Edwardes",
            "author_links": {
                "twitter": "TheReaLSamlam",
                "github": "SamEdwardes",
                "website": "https://samedwardes.com"
            },
            "code_example": [
                "import spacy",
                "from spacypdfreader.spacypdfreader import pdf_reader",
                "",
                "nlp = spacy.load('en_core_web_sm')",
                "doc = pdf_reader('tests/data/test_pdf_01.pdf', nlp)",
                "",
                "# Get the page number of any token.",
                "print(doc[0]._.page_number)  # 1",
                "print(doc[-1]._.page_number) # 4",
                "",
                "# Get page metadata about the PDF document.",
                "print(doc._.pdf_file_name)   # 'tests/data/test_pdf_01.pdf'",
                "print(doc._.page_range)      # (1, 4)",
                "print(doc._.first_page)      # 1",
                "print(doc._.last_page)       # 4",
                "",
                "# Get all of the text from a specific PDF page.",
                "print(doc._.page(4))         # 'able to display the destination page (unless...'"
            ]
        },
        {
            "id": "nlpcloud",
            "title": "NLPCloud.io",
            "slogan": "Production-ready API for spaCy models",
            "description": "A highly-available hosted API to easily deploy and use spaCy models in production. Supports NER, POS tagging, dependency parsing, and tokenization.",
            "github": "nlpcloud",
            "pip": "nlpcloud",
            "code_example": [
                "import nlpcloud",
                "",
                "client = nlpcloud.Client('en_core_web_lg', '4eC39HqLyjWDarjtT1zdp7dc')",
                "client.entities('John Doe is a Go Developer at Google')",
                "# [{'end': 8, 'start': 0, 'text': 'John Doe', 'type': 'PERSON'}, {'end': 25, 'start': 13, 'text': 'Go Developer', 'type': 'POSITION'}, {'end': 35,'start': 30, 'text': 'Google', 'type': 'ORG'}]"
            ],
            "thumb": "https://avatars.githubusercontent.com/u/77671902",
            "image": "https://nlpcloud.io/assets/images/logo.svg",
            "code_language": "python",
            "author": "NLPCloud.io",
            "author_links": {
                "github": "nlpcloud",
                "twitter": "cloud_nlp",
                "website": "https://nlpcloud.io"
            },
            "category": ["apis", "nonpython", "standalone"],
            "tags": ["api", "deploy", "production"]
        },
        {
            "id": "eMFDscore",
            "title": "eMFDscore: Extended Moral Foundation Dictionary Scoring for Python",
            "slogan": "Extended Moral Foundation Dictionary Scoring for Python",
            "description": "eMFDscore is a library for the fast and flexible extraction of various moral information metrics from textual input data. eMFDscore is built on spaCy for faster execution and performs minimal preprocessing consisting of tokenization, syntactic dependency parsing, lower-casing, and stopword/punctuation/whitespace removal. eMFDscore lets users score documents with multiple Moral Foundations Dictionaries, provides various metrics for analyzing moral information, and extracts moral patient, agent, and attribute words related to entities.",
            "github": "medianeuroscience/emfdscore",
            "code_example": [
                "from emfdscore.scoring import score_docs",
                "import pandas as pd",
                "template_input = pd.read_csv('emfdscore/template_input.csv', header=None)",
                "num_docs = len(template_input)",
                "DICT_TYPE = 'emfd'",
                "PROB_MAP = 'single'",
                "SCORE_METHOD = 'bow'",
                "OUT_METRICS = 'vice-virtue'",
                "OUT_CSV_PATH = 'single-vv.csv'",
                "df = score_docs(template_input, DICT_TYPE, PROB_MAP, SCORE_METHOD, OUT_METRICS, num_docs)",
                "df.to_csv(OUT_CSV_PATH, index=False)"
            ],
            "code_language": "python",
            "author": "Media Neuroscience Lab",
            "author_links": {
                "github": "medianeuroscience",
                "twitter": "medianeuro"
            },
            "category": ["research", "teaching"],
            "tags": ["morality", "dictionary", "sentiment"]
        },
        {
            "id": "skweak",
            "title": "skweak",
            "slogan": "Weak supervision for NLP",
            "description": "`skweak` brings the power of weak supervision to NLP tasks, and in particular sequence labelling and text classification. Instead of annotating documents by hand, `skweak` allows you to define *labelling functions* to automatically label your documents, and then aggregate their results using a statistical model that estimates the accuracy and confusions of each labelling function.",
            "github": "NorskRegnesentral/skweak",
            "pip": "skweak",
            "code_example": [
                "import spacy, re",
                "from skweak import heuristics, gazetteers, aggregation, utils",
                "",
                "# LF 1: heuristic to detect occurrences of MONEY entities",
                "def money_detector(doc):",
                "    for tok in doc[1:]:",
                "        if tok.text[0].isdigit() and tok.nbor(-1).is_currency:",
                "            yield tok.i - 1, tok.i + 1, 'MONEY'",
                "lf1 = heuristics.FunctionAnnotator('money', money_detector)",
                "",
                "# LF 2: detection of years with a regex",
                "lf2 = heuristics.TokenConstraintAnnotator('years', lambda tok: re.match(r'(19|20)\\d{2}$', tok.text), 'DATE')",
                "",
                "# LF 3: a gazetteer with a few names",
                "NAMES = [('Barack', 'Obama'), ('Donald', 'Trump'), ('Joe', 'Biden')]",
                "trie = gazetteers.Trie(NAMES)",
                "lf3 = gazetteers.GazetteerAnnotator('presidents', {'PERSON':trie})",
                "",
                "# We create a corpus (here with a single text)",
                "nlp = spacy.load('en_core_web_sm')",
                "doc = nlp('Donald Trump paid $750 in federal income taxes in 2016')",
                "",
                "# apply the labelling functions",
                "doc = lf3(lf2(lf1(doc)))",
                "",
                "# and aggregate them",
                "hmm = aggregation.HMM('hmm', ['PERSON', 'DATE', 'MONEY'])",
                "hmm.fit_and_aggregate([doc])",
                "",
                "# we can then visualise the final result (in Jupyter)",
                "utils.display_entities(doc, 'hmm')"
            ],
            "code_language": "python",
            "url": "https://github.com/NorskRegnesentral/skweak",
            "thumb": "https://raw.githubusercontent.com/NorskRegnesentral/skweak/main/data/skweak_logo_thumbnail.jpg",
            "image": "https://raw.githubusercontent.com/NorskRegnesentral/skweak/main/data/skweak_logo.jpg",
            "author": "Pierre Lison",
            "author_links": {
                "twitter": "plison2",
                "github": "plison",
                "website": "https://www.nr.no/~plison"
            },
            "category": ["pipeline", "standalone", "research", "training"],
            "tags": [],
            "spacy_version": 3
        },
        {
            "id": "numerizer",
            "title": "numerizer",
            "slogan": "Convert natural language numerics into ints and floats.",
            "description": "A spaCy extension for Docs, Spans and Tokens that converts numerical words and quantitative named entities into numeric strings.",
            "github": "jaidevd/numerizer",
            "pip": "numerizer",
            "code_example": [
                "from spacy import load",
                "import numerizer",
                "nlp = load('en_core_web_sm') # or any other model",
                "doc = nlp('The Hogwarts Express is at platform nine and three quarters')",
                "doc._.numerize()",
                "# {nine and three quarters: '9.75'}"
            ],
            "author": "Jaidev Deshpande",
            "author_links": {
                "github": "jaidevd",
                "twitter": "jaidevd"
            },
            "category": ["standalone"]
        },
        {
            "id": "spacy-dbpedia-spotlight",
            "title": "DBpedia Spotlight for spaCy",
            "slogan": "Use DBpedia Spotlight to link entities inside spaCy",
            "description": "This library links spaCy with [DBpedia Spotlight](https://www.dbpedia-spotlight.org/). You can easily get the DBpedia entities from your documents, using the public web service or your own instance of DBpedia Spotlight. The `doc.ents` are populated with the entities and all their details (URI, type, ...).",
            "github": "MartinoMensio/spacy-dbpedia-spotlight",
            "pip": "spacy-dbpedia-spotlight",
            "code_example": [
                "import spacy",
                "import spacy_dbpedia_spotlight",
                "# load your model as usual",
                "nlp = spacy.load('en_core_web_lg')",
                "# add the pipeline stage",
                "nlp.add_pipe('dbpedia_spotlight')",
                "# get the document",
                "doc = nlp('The president of USA is calling Boris Johnson to decide what to do about coronavirus')",
                "# see the entities",
                "print('Entities', [(ent.text, ent.label_, ent.kb_id_) for ent in doc.ents])",
                "# inspect the raw data from DBpedia spotlight",
                "print(doc.ents[0]._.dbpedia_raw_result)"
            ],
            "category": ["models", "pipeline"],
            "author": "Martino Mensio",
            "author_links": {
                "twitter": "MartinoMensio",
                "github": "MartinoMensio",
                "website": "https://martinomensio.github.io"
            }
        },
        {
            "id": "spacy-textblob",
            "title": "spacytextblob",
            "slogan": "A TextBlob sentiment analysis pipeline component for spaCy.",
            "thumb": "https://github.com/SamEdwardes/spacytextblob/raw/main/docs/static/img/logo-thumb-square-250x250.png",
            "description": "spacytextblob is a pipeline component that enables sentiment analysis using the [TextBlob](https://github.com/sloria/TextBlob) library. It adds the extension attribute `._.blob` to `Doc`, `Span`, and `Token` objects.",
            "github": "SamEdwardes/spacytextblob",
            "pip": "spacytextblob",
            "code_example": [
                "# the following installations are required",
                "# python -m textblob.download_corpora",
                "# python -m spacy download en_core_web_sm",
                "",
                "import spacy",
                "from spacytextblob.spacytextblob import SpacyTextBlob",
                "",
                "nlp = spacy.load('en_core_web_sm')",
                "nlp.add_pipe('spacytextblob')",
                "text = 'I had a really horrible day. It was the worst day ever! But every now and then I have a really good day that makes me happy.'",
                "doc = nlp(text)",
                "doc._.blob.polarity                            # Polarity: -0.125",
                "doc._.blob.subjectivity                        # Subjectivity: 0.9",
                "doc._.blob.sentiment_assessments.assessments   # Assessments: [(['really', 'horrible'], -1.0, 1.0, None), (['worst', '!'], -1.0, 1.0, None), (['really', 'good'], 0.7, 0.6000000000000001, None), (['happy'], 0.8, 1.0, None)]",
                "doc._.blob.ngrams()                            # [WordList(['I', 'had', 'a']), WordList(['had', 'a', 'really']), WordList(['a', 'really', 'horrible']), WordList(['really', 'horrible', 'day']), WordList(['horrible', 'day', 'It']), WordList(['day', 'It', 'was']), WordList(['It', 'was', 'the']), WordList(['was', 'the', 'worst']), WordList(['the', 'worst', 'day']), WordList(['worst', 'day', 'ever']), WordList(['day', 'ever', 'But']), WordList(['ever', 'But', 'every']), WordList(['But', 'every', 'now']), WordList(['every', 'now', 'and']), WordList(['now', 'and', 'then']), WordList(['and', 'then', 'I']), WordList(['then', 'I', 'have']), WordList(['I', 'have', 'a']), WordList(['have', 'a', 'really']), WordList(['a', 'really', 'good']), WordList(['really', 'good', 'day']), WordList(['good', 'day', 'that']), WordList(['day', 'that', 'makes']), WordList(['that', 'makes', 'me']), WordList(['makes', 'me', 'happy'])]"
            ],
            "code_language": "python",
            "url": "https://spacytextblob.netlify.app/",
            "author": "Sam Edwardes",
            "author_links": {
                "twitter": "TheReaLSamlam",
                "github": "SamEdwardes",
                "website": "https://samedwardes.com"
            },
            "category": ["pipeline"],
            "tags": ["sentiment", "textblob"],
            "spacy_version": 3
        },
        {
            "id": "spacy-sentence-bert",
            "title": "spaCy - sentence-transformers",
            "slogan": "Pipelines for pretrained sentence-transformers (BERT, RoBERTa, XLM-RoBERTa & Co.) directly within spaCy",
            "description": "This library lets you use the embeddings from [sentence-transformers](https://github.com/UKPLab/sentence-transformers) of Docs, Spans and Tokens directly from spaCy. Most models are for English, but three of them are multilingual.",
            "github": "MartinoMensio/spacy-sentence-bert",
            "pip": "spacy-sentence-bert",
            "code_example": [
                "import spacy_sentence_bert",
                "# load one of the models listed at https://github.com/MartinoMensio/spacy-sentence-bert/",
                "nlp = spacy_sentence_bert.load_model('en_roberta_large_nli_stsb_mean_tokens')",
                "# get two documents",
                "doc_1 = nlp('Hi there, how are you?')",
                "doc_2 = nlp('Hello there, how are you doing today?')",
                "# use the similarity method that is based on the vectors, on Doc, Span or Token",
                "print(doc_1.similarity(doc_2[0:7]))"
            ],
            "category": ["models", "pipeline"],
            "author": "Martino Mensio",
            "author_links": {
                "twitter": "MartinoMensio",
                "github": "MartinoMensio",
                "website": "https://martinomensio.github.io"
            }
        },
        {
            "id": "spacy-streamlit",
            "title": "spacy-streamlit",
            "slogan": "spaCy building blocks for Streamlit apps",
            "github": "explosion/spacy-streamlit",
            "description": "This package contains utilities for visualizing spaCy models and building interactive spaCy-powered apps with [Streamlit](https://streamlit.io). It includes various building blocks you can use in your own Streamlit app, like visualizers for **syntactic dependencies**, **named entities**, **text classification**, **semantic similarity** via word vectors, token attributes, and more.",
            "pip": "spacy-streamlit",
            "category": ["visualizers"],
            "thumb": "https://i.imgur.com/mhEjluE.jpg",
            "image": "https://user-images.githubusercontent.com/13643239/85388081-f2da8700-b545-11ea-9bd4-e303d3c5763c.png",
            "code_example": [
                "import spacy_streamlit",
                "",
                "models = [\"en_core_web_sm\", \"en_core_web_md\"]",
                "default_text = \"Sundar Pichai is the CEO of Google.\"",
                "spacy_streamlit.visualize(models, default_text)"
            ],
            "author": "Ines Montani",
            "author_links": {
                "twitter": "_inesmontani",
                "github": "ines",
                "website": "https://ines.io"
            }
        },
        {
            "id": "spaczz",
            "title": "spaczz",
            "slogan": "Fuzzy matching and more for spaCy.",
            "description": "Spaczz provides fuzzy matching and multi-token regex matching functionality for spaCy. Spaczz's components have APIs similar to their spaCy counterparts, and spaczz pipeline components integrate into spaCy pipelines, where they can be saved and loaded as models.",
            "github": "gandersen101/spaczz",
            "pip": "spaczz",
            "code_example": [
                "import spacy",
                "from spaczz.matcher import FuzzyMatcher",
                "",
                "nlp = spacy.blank(\"en\")",
                "text = \"\"\"Grint Anderson created spaczz in his home at 555 Fake St,",
                "Apt 5 in Nashv1le, TN 55555-1234 in the US.\"\"\"  # Spelling errors intentional.",
                "doc = nlp(text)",
                "",
                "matcher = FuzzyMatcher(nlp.vocab)",
                "matcher.add(\"NAME\", [nlp(\"Grant Andersen\")])",
                "matcher.add(\"GPE\", [nlp(\"Nashville\")])",
                "matches = matcher(doc)",
                "",
                "for match_id, start, end, ratio in matches:",
                "    print(match_id, doc[start:end], ratio)"
            ],
            "code_language": "python",
            "url": "https://spaczz.readthedocs.io/en/latest/",
            "author": "Grant Andersen",
            "author_links": {
                "twitter": "gandersen101",
                "github": "gandersen101"
            },
            "category": ["pipeline"],
            "tags": ["fuzzy-matching", "regex"]
        },
        {
            "id": "spacy-universal-sentence-encoder",
            "title": "spaCy - Universal Sentence Encoder",
            "slogan": "Make use of Google's Universal Sentence Encoder directly within spaCy",
            "description": "This library lets you use Universal Sentence Encoder embeddings of Docs, Spans and Tokens directly in spaCy, with the models loaded from TensorFlow Hub.",
            "github": "MartinoMensio/spacy-universal-sentence-encoder",
            "pip": "spacy-universal-sentence-encoder",
            "code_example": [
                "import spacy_universal_sentence_encoder",
                "# load one of the models: ['en_use_md', 'en_use_lg', 'xx_use_md', 'xx_use_lg']",
                "nlp = spacy_universal_sentence_encoder.load_model('en_use_lg')",
                "# get two documents",
                "doc_1 = nlp('Hi there, how are you?')",
                "doc_2 = nlp('Hello there, how are you doing today?')",
                "# use the similarity method that is based on the vectors, on Doc, Span or Token",
                "print(doc_1.similarity(doc_2[0:7]))"
            ],
            "category": ["models", "pipeline"],
            "author": "Martino Mensio",
            "author_links": {
                "twitter": "MartinoMensio",
                "github": "MartinoMensio",
                "website": "https://martinomensio.github.io"
            }
        },
        {
            "id": "whatlies",
            "title": "whatlies",
            "slogan": "Make interactive visualisations to figure out 'what lies' in word embeddings.",
            "description": "This small library offers tools that make it easier to visualise word embeddings and operations on them. It supports spaCy's prebuilt models as a first-class citizen and also supports sense2vec. It provides a convenient API for performing linear algebra on embeddings, as well as support for popular transformations such as PCA and UMAP.",
            "github": "koaning/whatlies",
            "pip": "whatlies",
            "thumb": "https://i.imgur.com/rOkOiLv.png",
            "image": "https://raw.githubusercontent.com/koaning/whatlies/master/docs/gif-two.gif",
            "code_example": [
                "from whatlies import EmbeddingSet",
                "from whatlies.language import SpacyLanguage",
                "",
                "lang = SpacyLanguage('en_core_web_md')",
                "words = ['cat', 'dog', 'fish', 'kitten', 'man', 'woman', 'king', 'queen', 'doctor', 'nurse']",
                "",
                "emb = lang[words]",
                "emb.plot_interactive(x_axis='man', y_axis='woman')"
            ],
            "category": ["visualizers", "research"],
            "author": "Vincent D. Warmerdam",
            "author_links": {
                "twitter": "fishnets88",
                "github": "koaning",
                "website": "https://koaning.io"
            }
        },
        {
            "id": "bertopic",
            "title": "BERTopic",
            "slogan": "Leveraging BERT and c-TF-IDF to create easily interpretable topics.",
            "description": "BERTopic is a topic modeling technique that leverages embedding models and c-TF-IDF to create dense clusters allowing for easily interpretable topics whilst keeping important words in the topic descriptions. BERTopic supports guided, (semi-) supervised, hierarchical, and dynamic topic modeling.",
            "github": "maartengr/bertopic",
            "pip": "bertopic",
            "thumb": "https://i.imgur.com/Rx2LfBm.png",
            "image": "https://raw.githubusercontent.com/MaartenGr/BERTopic/master/images/topic_visualization.gif",
            "code_example": [
                "import spacy",
                "from bertopic import BERTopic",
                "from sklearn.datasets import fetch_20newsgroups",
                "",
                "docs = fetch_20newsgroups(subset='all', remove=('headers', 'footers', 'quotes'))['data']",
                "nlp = spacy.load('en_core_web_md', exclude=['tagger', 'parser', 'ner', 'attribute_ruler', 'lemmatizer'])",
                "",
                "topic_model = BERTopic(embedding_model=nlp)",
                "topics, probs = topic_model.fit_transform(docs)",
                "",
                "fig = topic_model.visualize_topics()",
                "fig.show()"
            ],
            "category": ["visualizers", "training"],
            "author": "Maarten Grootendorst",
            "author_links": {
                "twitter": "maartengr",
                "github": "maartengr",
                "website": "https://maartengrootendorst.com"
            }
        },
        {
            "id": "tokenwiser",
            "title": "tokenwiser",
            "slogan": "Connect vowpal-wabbit & scikit-learn models to spaCy to run simple classification benchmarks. Comes with many utility functions for spaCy pipelines.",
            "github": "koaning/tokenwiser",
            "pip": "tokenwiser",
            "thumb": "https://koaning.github.io/tokenwiser/token.png",
            "image": "https://koaning.github.io/tokenwiser/logo-tokw.png",
            "code_example": [
                "import spacy",
                "",
                "from sklearn.pipeline import make_pipeline",
                "from sklearn.feature_extraction.text import CountVectorizer",
                "from sklearn.linear_model import LogisticRegression",
                "",
                "from tokenwiser.component import attach_sklearn_categoriser",
                "",
                "X = [",
                "    'i really like this post',",
                "    'thanks for that comment',",
                "    'i enjoy this friendly forum',",
                "    'this is a bad post',",
                "    'i dislike this article',",
                "    'this is not well written'",
                "]",
                "",
                "y = ['pos', 'pos', 'pos', 'neg', 'neg', 'neg']",
                "",
                "# Note that we're training a pipeline here via a single-batch `.fit()` method",
                "pipe = make_pipeline(CountVectorizer(), LogisticRegression()).fit(X, y)",
                "",
                "nlp = spacy.load('en_core_web_sm')",
                "# This is where we attach our pre-trained model as a pipeline step.",
                "attach_sklearn_categoriser(nlp, pipe_name='silly_sentiment', estimator=pipe)"
            ],
            "category": ["pipeline", "training"],
            "author": "Vincent D. Warmerdam",
            "author_links": {
                "twitter": "fishnets88",
                "github": "koaning",
                "website": "https://koaning.io"
            }
        },
        {
            "id": "Klayers",
            "title": "Klayers",
            "category": ["pipeline"],
            "tags": ["AWS"],
            "slogan": "spaCy as an AWS Lambda Layer",
            "description": "A collection of Python packages published as AWS Lambda (λ) layers",
            "github": "keithrozario/Klayers",
            "pip": "",
            "url": "https://github.com/keithrozario/Klayers",
            "code_language": "python",
            "author": "Keith Rozario",
            "author_links": {
                "twitter": "keithrozario",
                "github": "keithrozario",
                "website": "https://www.keithrozario.com"
            },
            "code_example": [
                "# SAM Template",
                "MyLambdaFunction:",
                "    Type: AWS::Serverless::Function",
                "    Properties:",
                "        Handler: 02_pipeline/spaCy.main",
                "        Description: Named Entity Extraction",
                "        Runtime: python3.7",
                "        Layers:",
                "            - !Sub arn:aws:lambda:${AWS::Region}:113088814899:layer:Klayers-python37-spacy:18"
            ]
        },
        {
            "type": "education",
            "id": "video-spacys-ner-model-alt",
            "title": "Named Entity Recognition (NER) using spaCy",
            "slogan": "",
            "description": "In this video, I show you how to do named entity recognition using the spaCy library for Python.",
            "youtube": "Gn_PjruUtrc",
            "author": "Applied Language Technology",
            "author_links": {
                "twitter": "HelsinkiNLP",
                "github": "Applied-Language-Technology",
                "website": "https://applied-language-technology.mooc.fi/"
            },
            "category": ["videos"]
        },
        {
            "id": "HuSpaCy",
            "title": "HuSpaCy",
            "category": ["models"],
            "tags": ["Hungarian"],
            "slogan": "HuSpaCy: industrial-strength Hungarian natural language processing",
            "description": "HuSpaCy is a spaCy model and a library providing industrial-strength Hungarian language processing facilities.",
            "github": "huspacy/huspacy",
            "pip": "huspacy",
            "url": "https://github.com/huspacy/huspacy",
            "code_language": "python",
            "author": "SzegedAI",
            "author_links": {
                "github": "https://szegedai.github.io/",
                "website": "https://u-szeged.hu/english"
            },
            "code_example": [
                "# Load the model using huspacy",
                "import huspacy",
                "",
                "nlp = huspacy.load()",
                "",
                "# Load the model using spacy.load()",
                "import spacy",
                "",
                "nlp = spacy.load(\"hu_core_news_lg\")",
                "",
                "# Load the model directly as a module",
                "import hu_core_news_lg",
                "",
                "nlp = hu_core_news_lg.load()",
                "",
                "# Either way you get the same model and can start processing texts.",
                "doc = nlp(\"Csiribiri csiribiri zabszalma - négy csillag közt alszom ma.\")"
            ]
        },
        {
            "id": "spacy-stanza",
            "title": "spacy-stanza",
            "slogan": "Use the latest Stanza (StanfordNLP) research models directly in spaCy",
            "description": "This package wraps the Stanza (formerly StanfordNLP) library, so you can use Stanford's models as a spaCy pipeline. Using this wrapper, you'll be able to use the following annotations, computed by your pretrained `stanza` model:\n\n- Statistical tokenization (reflected in the `Doc` and its tokens)\n- Lemmatization (`token.lemma` and `token.lemma_`)\n- Part-of-speech tagging (`token.tag`, `token.tag_`, `token.pos`, `token.pos_`)\n- Dependency parsing (`token.dep`, `token.dep_`, `token.head`)\n- Named entity recognition (`doc.ents`, `token.ent_type`, `token.ent_type_`, `token.ent_iob`, `token.ent_iob_`)\n- Sentence segmentation (`doc.sents`)",
            "github": "explosion/spacy-stanza",
            "pip": "spacy-stanza",
            "thumb": "https://i.imgur.com/myhLjMJ.png",
            "code_example": [
                "import stanza",
                "import spacy_stanza",
                "",
                "stanza.download(\"en\")",
                "nlp = spacy_stanza.load_pipeline(\"en\")",
                "",
                "doc = nlp(\"Barack Obama was born in Hawaii. He was elected president in 2008.\")",
                "for token in doc:",
                "    print(token.text, token.lemma_, token.pos_, token.dep_, token.ent_type_)",
                "print(doc.ents)"
            ],
            "category": ["pipeline", "standalone", "models", "research"],
            "author": "Explosion",
            "author_links": {
                "twitter": "explosion_ai",
                "github": "explosion",
                "website": "https://explosion.ai"
            }
        },
        {
            "id": "spacy-udpipe",
            "title": "spacy-udpipe",
            "slogan": "Use the latest UDPipe models directly in spaCy",
            "description": "This package wraps the fast and efficient UDPipe language-agnostic NLP pipeline (via its Python bindings), so you can use UDPipe pre-trained models as a spaCy pipeline for 50+ languages out-of-the-box. Inspired by spacy-stanza, this package offers slightly less accurate models that are in turn much faster.",
            "github": "TakeLab/spacy-udpipe",
            "pip": "spacy-udpipe",
            "code_example": [
                "import spacy_udpipe",
                "",
                "spacy_udpipe.download(\"en\") # download English model",
                "",
                "text = \"Wikipedia is a free online encyclopedia, created and edited by volunteers around the world.\"",
                "nlp = spacy_udpipe.load(\"en\")",
                "",
                "doc = nlp(text)",
                "for token in doc:",
                "    print(token.text, token.lemma_, token.pos_, token.dep_)"
            ],
            "category": ["pipeline", "standalone", "models", "research"],
            "author": "TakeLab",
            "author_links": {
                "github": "TakeLab",
                "website": "https://takelab.fer.hr/"
            }
        },
        {
            "id": "spacy-server",
            "title": "spaCy Server",
            "slogan": "\uD83E\uDD9C Containerized HTTP API for spaCy NLP",
            "description": "For developers who need programming language agnostic NLP, spaCy Server is a containerized HTTP API that provides industrial-strength natural language processing. Unlike other servers, our server is fast, idiomatic, and well documented.",
            "github": "neelkamath/spacy-server",
            "code_example": [
                "docker run --rm -dp 8080:8080 neelkamath/spacy-server",
                "curl http://localhost:8080/ner -H 'Content-Type: application/json' -d '{\"sections\": [\"My name is John Doe. I grew up in California.\"]}'"
            ],
            "code_language": "shell",
            "url": "https://hub.docker.com/r/neelkamath/spacy-server",
            "author": "Neel Kamath",
            "author_links": {
                "github": "neelkamath",
                "website": "https://neelkamath.com"
            },
            "category": ["apis"],
            "tags": ["docker"]
        },
        {
            "id": "nlp-architect",
            "title": "NLP Architect",
            "slogan": "Python lib for exploring Deep NLP & NLU by Intel AI",
            "github": "NervanaSystems/nlp-architect",
            "pip": "nlp-architect",
            "thumb": "https://i.imgur.com/vMideRx.png",
            "category": ["standalone", "research"],
            "tags": ["pytorch"]
        },
        {
            "id": "Chatterbot",
            "title": "Chatterbot",
            "slogan": "A machine-learning based conversational dialog engine for creating chat bots",
            "github": "gunthercox/ChatterBot",
            "pip": "chatterbot",
            "thumb": "https://i.imgur.com/eyAhwXk.jpg",
            "code_example": [
                "from chatterbot import ChatBot",
                "from chatterbot.trainers import ListTrainer",
                "# Create a new chat bot named Charlie",
                "chatbot = ChatBot('Charlie')",
                "trainer = ListTrainer(chatbot)",
                "trainer.train([",
                "    'Hi, can I help you?',",
                "    'Sure, I would like to book a flight to Iceland.',",
                "    'Your flight has been booked.'",
                "])",
                "",
                "response = chatbot.get_response('I would like to book a flight.')"
            ],
            "author": "Gunther Cox",
            "author_links": {
                "github": "gunthercox"
            },
            "category": ["conversational", "standalone"],
            "tags": ["chatbots"]
        },
        {
            "id": "alibi",
            "title": "alibi",
            "slogan": "Algorithms for monitoring and explaining machine learning models",
            "github": "SeldonIO/alibi",
            "pip": "alibi",
            "thumb": "https://i.imgur.com/YkzQHRp.png",
            "code_example": [
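                "from sklearn.datasets import load_iris",
                "from sklearn.ensemble import RandomForestClassifier",
                "from alibi.explainers import AnchorTabular",
                "data = load_iris()",
                "clf = RandomForestClassifier().fit(data.data, data.target)  # model to explain",
                "explainer = AnchorTabular(clf.predict, data.feature_names)",
                "explainer.fit(data.data)",
                "explanation = explainer.explain(data.data[0])"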
            ],
            "author": "Seldon",
            "category": ["standalone", "research"]
        },
        {
            "id": "spacymoji",
            "slogan": "Emoji handling and metadata as a spaCy pipeline component",
            "github": "ines/spacymoji",
            "description": "spaCy extension and pipeline component for adding emoji metadata to `Doc` objects. Detects emoji consisting of one or more Unicode characters, and can optionally merge multi-char emoji (combined pictures, emoji with skin tone modifiers) into one token. Human-readable emoji descriptions are added as a custom attribute, and an optional lookup table can be provided for your own descriptions. The extension sets the custom `Doc`, `Token` and `Span` attributes `._.is_emoji`, `._.emoji_desc`, `._.has_emoji` and `._.emoji`.",
            "pip": "spacymoji",
            "category": ["pipeline"],
            "tags": ["emoji", "unicode"],
            "thumb": "https://i.imgur.com/XOTYIgn.jpg",
            "code_example": [
                "import spacy",
                "from spacymoji import Emoji",
                "",
                "nlp = spacy.load(\"en_core_web_sm\")",
                "nlp.add_pipe(\"emoji\", first=True)",
                "doc = nlp(\"This is a test 😻 👍🏿\")",
                "",
                "assert doc._.has_emoji is True",
                "assert doc[2:5]._.has_emoji is True",
                "assert doc[0]._.is_emoji is False",
                "assert doc[4]._.is_emoji is True",
                "assert doc[5]._.emoji_desc == \"thumbs up dark skin tone\"",
                "assert len(doc._.emoji) == 2",
                "assert doc._.emoji[1] == (\"👍🏿\", 5, \"thumbs up dark skin tone\")"
            ],
            "author": "Ines Montani",
            "author_links": {
                "twitter": "_inesmontani",
                "github": "ines",
                "website": "https://ines.io"
            }
        },
        {
            "id": "spacyopentapioca",
            "title": "spaCyOpenTapioca",
            "slogan": "Named entity linking on Wikidata in spaCy via OpenTapioca",
            "description": "A spaCy wrapper of OpenTapioca for named entity linking on Wikidata.",
            "github": "UB-Mannheim/spacyopentapioca",
            "pip": "spacyopentapioca",
            "code_example": [
                "import spacy",
                "nlp = spacy.blank('en')",
                "nlp.add_pipe('opentapioca')",
                "doc = nlp('Christian Drosten works in Germany.')",
                "for span in doc.ents:",
                "    print((span.text, span.kb_id_, span.label_, span._.description, span._.score))",
                "# ('Christian Drosten', 'Q1079331', 'PERSON', 'German virologist and university teacher', 3.6533377082098895)",
                "# ('Germany', 'Q183', 'LOC', 'sovereign state in Central Europe', 2.1099332471902863)",
                "# Also available: span._.types, span._.aliases, span._.rank"
            ],
            "category": ["models", "pipeline"],
            "tags": ["NER", "NEL"],
            "author": "Renat Shigapov",
            "author_links": {
                "twitter": "_shigapov",
                "github": "shigapov"
            }
        },
        {
            "id": "spacy_readability",
            "slogan": "Add text readability meta data to Doc objects",
            "description": "spaCy v2.0 pipeline component for calculating readability scores of text. Provides scores for Flesch-Kincaid grade level, Flesch-Kincaid reading ease, and Dale-Chall.",
            "github": "mholtzscher/spacy_readability",
            "pip": "spacy-readability",
            "code_example": [
                "import spacy",
                "from spacy_readability import Readability",
                "",
                "nlp = spacy.load('en')",
                "read = Readability(nlp)",
                "nlp.add_pipe(read, last=True)",
                "doc = nlp(\"I am some really difficult text to read because I use obnoxiously large words.\")",
                "doc._.flesch_kincaid_grade_level",
                "doc._.flesch_kincaid_reading_ease",
                "doc._.dale_chall"
            ],
            "author": "Michael Holtzscher",
            "author_links": {
                "github": "mholtzscher"
            },
            "category": ["pipeline"]
        },
        {
            "id": "spacy_cld",
            "title": "spaCy-CLD",
            "slogan": "Add language detection to your spaCy pipeline using CLD2",
            "description": "spaCy-CLD operates on `Doc` and `Span` spaCy objects. When called on a `Doc` or `Span`, the object is given two attributes: `languages` (a list of up to 3 language codes) and `language_scores` (a dictionary mapping language codes to confidence scores between 0 and 1).\n\nspaCy-CLD is a small extension that wraps the [PYCLD2](https://github.com/aboSamoor/pycld2) Python library, which in turn wraps the [Compact Language Detector 2](https://github.com/CLD2Owners/cld2) C++ library originally built at Google for the Chromium project. CLD2 uses character n-grams as features and a Naive Bayes classifier to identify 80+ languages from Unicode text strings (or XML/HTML). It can detect up to 3 different languages in a given document and reports a confidence score with each language.",
            "github": "nickdavidhaynes/spacy-cld",
            "pip": "spacy_cld",
            "code_example": [
                "import spacy",
                "from spacy_cld import LanguageDetector",
                "",
                "nlp = spacy.load('en')",
                "language_detector = LanguageDetector()",
                "nlp.add_pipe(language_detector)",
                "doc = nlp('This is some English text.')",
                "",
                "doc._.languages  # ['en']",
                "doc._.language_scores['en']  # 0.96"
            ],
            "author": "Nicholas D Haynes",
            "author_links": {
                "github": "nickdavidhaynes"
            },
            "category": ["pipeline"]
        },
        {
            "id": "spacy-iwnlp",
            "slogan": "German lemmatization with IWNLP",
            "description": "This package uses the [spaCy 2.0 extensions](https://spacy.io/usage/processing-pipelines#extensions) to add [IWNLP-py](https://github.com/Liebeck/iwnlp-py) as a German lemmatizer directly into your spaCy pipeline.",
            "github": "Liebeck/spacy-iwnlp",
            "pip": "spacy-iwnlp",
            "code_example": [
                "import spacy",
                "from spacy_iwnlp import spaCyIWNLP",
                "",
                "nlp = spacy.load('de')",
                "iwnlp = spaCyIWNLP(lemmatizer_path='data/IWNLP.Lemmatizer_20170501.json')",
                "nlp.add_pipe(iwnlp)",
                "doc = nlp('Wir mögen Fußballspiele mit ausgedehnten Verlängerungen.')",
                "for token in doc:",
                "    print('POS: {}\\tIWNLP: {}'.format(token.pos_, token._.iwnlp_lemmas))"
            ],
            "author": "Matthias Liebeck",
            "author_links": {
                "github": "Liebeck"
            },
            "category": ["pipeline"],
            "tags": ["lemmatizer", "german"]
        },
        {
            "id": "spacy-sentiws",
            "slogan": "German sentiment scores with SentiWS",
            "description": "This package uses the [spaCy 2.0 extensions](https://spacy.io/usage/processing-pipelines#extensions) to add German sentiment scores from [SentiWS](http://wortschatz.uni-leipzig.de/en/download) directly into your spaCy pipeline.",
            "github": "Liebeck/spacy-sentiws",
            "pip": "spacy-sentiws",
            "code_example": [
                "import spacy",
                "from spacy_sentiws import spaCySentiWS",
                "",
                "nlp = spacy.load('de_core_news_sm')",
                "nlp.add_pipe('sentiws', config={'sentiws_path': 'data/sentiws'})",
                "doc = nlp('Die Dummheit der Unterwerfung blüht in hübschen Farben.')",
                "",
                "for token in doc:",
                "    print('{}, {}, {}'.format(token.text, token._.sentiws, token.pos_))"
            ],
            "author": "Matthias Liebeck",
            "author_links": {
                "github": "Liebeck"
            },
            "category": ["pipeline"],
            "tags": ["sentiment", "german"]
        },
        {
            "id": "spacy-lefff",
            "slogan": "POS and French lemmatization with Lefff",
            "description": "spaCy v2.0 extension and pipeline component for adding a French POS tagger and lemmatizer based on [Lefff](https://hal.inria.fr/inria-00521242/).",
            "github": "sammous/spacy-lefff",
            "pip": "spacy-lefff",
            "code_example": [
                "import spacy",
                "from spacy_lefff import LefffLemmatizer, POSTagger",
                "",
                "nlp = spacy.load('fr')",
                "pos = POSTagger()",
                "french_lemmatizer = LefffLemmatizer(after_melt=True)",
                "nlp.add_pipe(pos, name='pos', after='parser')",
                "nlp.add_pipe(french_lemmatizer, name='lefff', after='pos')",
                "doc = nlp(u\"Paris est une ville très chère.\")",
                "for d in doc:",
                "    print(d.text, d.pos_, d._.melt_tagger, d._.lefff_lemma, d.tag_, d.lemma_)"
            ],
            "author": "Sami Moustachir",
            "author_links": {
                "github": "sammous"
            },
            "category": ["pipeline"],
            "tags": ["pos", "lemmatizer", "french"]
        },
        {
            "id": "lemmy",
            "title": "Lemmy",
            "slogan": "A Danish lemmatizer",
            "description": "Lemmy is a lemmatizer for Danish 🇩🇰. It comes already trained on Dansk Sprognævns (DSN) word list (‘fuldformliste’) and the Danish Universal Dependencies and is ready for use. Lemmy also supports training on your own dataset. The model currently included in Lemmy was evaluated on the Danish Universal Dependencies dev dataset and scored an accuracy above 99%.\n\nYou can use Lemmy as a spaCy extension, more specifically a spaCy pipeline component. This is highly recommended and makes the lemmas easily accessible from the spaCy tokens. Lemmy makes use of POS tags to predict the lemmas. When wired up to the spaCy pipeline, Lemmy has the benefit of using spaCy’s built-in POS tagger.",
            "github": "sorenlind/lemmy",
            "pip": "lemmy",
            "code_example": [
                "import da_custom_model as da # name of your spaCy model",
                "import lemmy.pipe",
                "nlp = da.load()",
                "",
                "# create an instance of Lemmy's pipeline component for spaCy",
                "pipe = lemmy.pipe.load()",
                "",
                "# add the component to the spaCy pipeline",
                "nlp.add_pipe(pipe, after='tagger')",
                "",
                "# lemmas can now be accessed using the `._.lemma` attribute on the tokens",
                "nlp(\"akvariernes\")[0]._.lemma"
            ],
            "thumb": "https://i.imgur.com/RJVFRWm.jpg",
            "author": "Søren Lind Kristiansen",
            "author_links": {
                "github": "sorenlind"
            },
            "category": ["pipeline"],
            "tags": ["lemmatizer", "danish"]
        },
        {
            "id": "augmenty",
            "title": "Augmenty",
            "slogan": "The cherry on top of your NLP pipeline",
            "description": "Augmenty is an augmentation library based on spaCy for augmenting texts. Augmenty differs from other augmentation libraries in that it corrects (as far as possible) the token, sentence and document labels under the augmentation.",
            "github": "kennethenevoldsen/augmenty",
            "pip": "augmenty",
            "code_example": [
                "import spacy",
                "import augmenty",
                "",
                "nlp = spacy.load('en_core_web_md')",
                "",
                "docs = nlp.pipe(['Augmenty is a great tool for text augmentation'])",
                "",
                "ent_dict = {'ORG': [['spaCy'], ['spaCy', 'Universe']]}",
                "entity_augmenter = augmenty.load('ents_replace.v1',",
                "                                 ent_dict=ent_dict, level=1)",
                "",
                "for doc in augmenty.docs(docs, augmenter=entity_augmenter, nlp=nlp):",
                "    print(doc)"
            ],
            "thumb": "https://github.com/KennethEnevoldsen/augmenty/blob/master/img/icon.png?raw=true",
            "author": "Kenneth Enevoldsen",
            "author_links": {
                "github": "kennethenevoldsen",
                "website": "https://www.kennethenevoldsen.com"
            },
            "category": ["training", "research"],
            "tags": ["training", "research", "augmentation"]
        },
        {
            "id": "dacy",
            "title": "DaCy",
            "slogan": "An efficient pipeline for Danish NLP",
            "description": "DaCy is a Danish preprocessing pipeline trained in spaCy. It has achieved state-of-the-art performance on named entity recognition, part-of-speech tagging and dependency parsing for Danish. This repository contains material for using DaCy, reproducing the results, and guides on usage of the package. It also contains a series of behavioural tests for biases and robustness of Danish NLP pipelines.",
            "github": "centre-for-humanities-computing/DaCy",
            "pip": "dacy",
            "code_example": [
                "import dacy",
                "print(dacy.models()) # get a list of dacy models",
                "nlp = dacy.load('medium')  # load your spacy pipeline",
                "",
                "# DaCy also includes functionality for adding other Danish models to the pipeline",
                "# For instance you can add the BertTone model for classification of sentiment polarity to the pipeline:",
                "nlp = add_berttone_polarity(nlp)"
            ],
            "thumb": "https://github.com/centre-for-humanities-computing/DaCy/blob/main/img/icon_no_title.png?raw=true",
            "author": "Centre for Humanities Computing Aarhus",
            "author_links": {
                "github": "centre-for-humanities-computing",
                "website": "https://chcaa.io/#/"
            },
            "category": ["pipeline"],
            "tags": ["pipeline", "danish"]
        },
        {
            "id": "spacy-wrap",
            "title": "spaCy-wrap",
            "slogan": "For wrapping fine-tuned transformers in spaCy pipelines",
            "description": "spaCy-wrap is a wrapper library for spaCy for including fine-tuned transformers from Hugging Face in your spaCy pipeline, allowing existing models to be used within existing workflows.",
            "github": "kennethenevoldsen/spacy-wrap",
            "pip": "spacy_wrap",
            "code_example": [
                "import spacy",
                "import spacy_wrap",
                "",
                "nlp = spacy.blank('en')",
                "config = {",
                "    'doc_extension_trf_data': 'clf_trf_data',  # document extension for the forward pass",
                "    'doc_extension_prediction': 'sentiment',  # document extension for the prediction",
                "    'labels': ['negative', 'neutral', 'positive'],",
                "    'model': {",
                "        'name': 'cardiffnlp/twitter-roberta-base-sentiment',  # the model name or path of a Hugging Face model",
                "    },",
                "}",
                "",
                "transformer = nlp.add_pipe('classification_transformer', config=config)",
                "transformer.model.initialize()",
                "",
                "doc = nlp('spaCy is a wonderful tool')",
                "",
                "print(doc._.clf_trf_data)",
                "# TransformerData(wordpieces=...",
                "print(doc._.sentiment)",
                "# 'positive'",
                "print(doc._.sentiment_prob)",
                "# {'prob': array([0.004, 0.028, 0.969], dtype=float32), 'labels': ['negative', 'neutral', 'positive']}"
            ],
            "thumb": "https://raw.githubusercontent.com/KennethEnevoldsen/spacy-wrap/main/docs/_static/icon.png",
            "author": "Kenneth Enevoldsen",
            "author_links": {
                "github": "KennethEnevoldsen",
                "website": "https://www.kennethenevoldsen.com"
            },
            "category": ["pipeline", "models", "training"],
            "tags": ["pipeline", "models", "transformers"]
        },
        {
            "id": "asent",
            "title": "Asent",
            "slogan": "Fast, flexible and transparent sentiment analysis",
            "description": "Asent is a rule-based sentiment analysis library for Python made using spaCy. It is inspired by VADER, but uses a more modular ruleset that allows the user to change, e.g., the method for finding negations. Furthermore, it includes visualizers to display the model predictions, making the model easily interpretable.",
            "github": "kennethenevoldsen/asent",
            "pip": "asent",
            "code_example": [
                "import spacy",
                "import asent",
                "",
                "# load spacy pipeline",
                "nlp = spacy.blank('en')",
                "nlp.add_pipe('sentencizer')",
                "",
                "# add the rule-based sentiment model",
                "nlp.add_pipe('asent_en_v1')",
                "",
                "# try an example",
                "text = 'I am not very happy, but I am also not especially sad'",
                "doc = nlp(text)",
                "",
                "# print polarity of document, scaled to be between -1 and 1",
                "print(doc._.polarity)",
                "# neg=0.0 neu=0.631 pos=0.369 compound=0.7526",
                "",
                "# Naturally, a simple score can be quite unsatisfying, so Asent implements a series of visualizers to interpret the results:",
                "asent.visualize(doc, style='prediction')",
                "# or",
                "asent.visualize(doc[:5], style='analysis')"
            ],
            "thumb": "https://github.com/KennethEnevoldsen/asent/raw/main/docs/img/logo_black_font.png?raw=true",
            "author": "Kenneth Enevoldsen",
            "author_links": {
                "github": "KennethEnevoldsen",
                "website": "https://www.kennethenevoldsen.com"
            },
            "category": ["pipeline", "models"],
            "tags": ["pipeline", "models", "sentiment"]
        },
        {
            "id": "textdescriptives",
            "title": "TextDescriptives",
            "slogan": "Extraction of descriptive stats, readability, and syntactic complexity measures",
            "description": "Pipeline component for spaCy v3 that calculates descriptive statistics, readability metrics, and syntactic complexity (dependency distance).",
            "github": "HLasse/TextDescriptives",
            "pip": "textdescriptives",
            "code_example": [
                "import spacy",
                "import textdescriptives as td",
                "nlp = spacy.load('en_core_web_sm')",
                "nlp.add_pipe('textdescriptives')",
                "doc = nlp('This is a short test text')",
                "doc._.readability # access some of the values",
                "td.extract_df(doc) # extract all metrics to DataFrame"
            ],
            "author": "Lasse Hansen, Kenneth Enevoldsen, Ludvig Olsen",
            "author_links": {
                "github": "HLasse"
            },
            "category": ["pipeline"],
            "tags": ["pipeline", "readability", "syntactic complexity", "descriptive statistics"]
        },
        {
            "id": "neuralcoref",
            "slogan": "State-of-the-art coreference resolution based on neural nets and spaCy",
            "description": "This coreference resolution module is based on the superfast [spaCy](https://spacy.io/) parser and uses the neural net scoring model described in [Deep Reinforcement Learning for Mention-Ranking Coreference Models](http://cs.stanford.edu/people/kevclark/resources/clark-manning-emnlp2016-deep.pdf) by Kevin Clark and Christopher D. Manning, EMNLP 2016. Since ✨Neuralcoref v2.0, you can train the coreference resolution system on your own dataset (e.g., for a language other than English!), **provided you have an annotated dataset**. Note that to use neuralcoref with spaCy > 2.1.0, you'll have to install neuralcoref from source.",
            "github": "huggingface/neuralcoref",
            "thumb": "https://i.imgur.com/j6FO9O6.jpg",
            "code_example": [
                "import spacy",
                "import neuralcoref",
                "",
                "nlp = spacy.load('en')",
                "neuralcoref.add_to_pipe(nlp)",
                "doc1 = nlp('My sister has a dog. She loves him.')",
                "print(doc1._.coref_clusters)",
                "",
                "doc2 = nlp('Angela lives in Boston. She is quite happy in that city.')",
                "for ent in doc2.ents:",
                "    print(ent._.coref_cluster)"
            ],
            "author": "Hugging Face",
            "author_links": {
                "github": "huggingface"
            },
            "category": ["standalone", "conversational", "models"],
            "tags": ["coref"]
        },
        {
            "id": "neuralcoref-vizualizer",
            "title": "Neuralcoref Visualizer",
            "slogan": "State-of-the-art coreference resolution based on neural nets and spaCy",
            "description": "In short, coreference is the fact that two or more expressions in a text, like pronouns or nouns, link to the same person or thing. It is a classic natural language processing task that has seen a revival of interest in the past two years as several research groups applied cutting-edge deep-learning and reinforcement-learning techniques to it. It is also one of the key building blocks for conversational artificial intelligence.",
            "url": "https://huggingface.co/coref/",
            "image": "https://i.imgur.com/3yy4Qyf.png",
            "thumb": "https://i.imgur.com/j6FO9O6.jpg",
            "github": "huggingface/neuralcoref",
            "category": ["visualizers", "conversational"],
            "tags": ["coref", "chatbots"],
            "author": "Hugging Face",
            "author_links": {
                "github": "huggingface"
            }
        },
        {
            "id": "matcher-explorer",
            "title": "Rule-based Matcher Explorer",
            "slogan": "Test spaCy's rule-based Matcher by creating token patterns interactively",
            "description": "Test spaCy's rule-based `Matcher` by creating token patterns interactively and running them over your text. Each token can set multiple attributes like text value, part-of-speech tag or boolean flags. The token-based view lets you explore how spaCy processes your text – and why your pattern matches, or why it doesn't. For more details on rule-based matching, see the [documentation](https://spacy.io/usage/rule-based-matching).",
            "image": "https://explosion.ai/assets/img/demos/matcher.png",
            "thumb": "https://i.imgur.com/rPK4AGt.jpg",
            "url": "https://explosion.ai/demos/matcher",
            "author": "Ines Montani",
            "author_links": {
                "twitter": "_inesmontani",
                "github": "ines",
                "website": "https://ines.io"
            },
            "category": ["visualizers"]
        },
        {
            "id": "displacy",
            "title": "displaCy",
            "slogan": "A modern syntactic dependency visualizer",
            "description": "Visualize spaCy's guess at the syntactic structure of a sentence. Arrows point from heads to their children (dependents), and are labelled by their relation type.",
            "url": "https://explosion.ai/demos/displacy",
            "thumb": "https://i.imgur.com/nxDcHaL.jpg",
            "image": "https://explosion.ai/assets/img/demos/displacy.png",
            "author": "Ines Montani",
            "author_links": {
                "twitter": "_inesmontani",
                "github": "ines",
                "website": "https://ines.io"
            },
            "category": ["visualizers"]
        },
        {
            "id": "displacy-ent",
            "title": "displaCy ENT",
            "slogan": "A modern named entity visualizer",
            "description": "Visualize spaCy's guess at the named entities in the document. You can filter the displayed types to show only the annotations you're interested in.",
            "url": "https://explosion.ai/demos/displacy-ent",
            "thumb": "https://i.imgur.com/A77Ecbs.jpg",
            "image": "https://explosion.ai/assets/img/demos/displacy-ent.png",
            "author": "Ines Montani",
            "author_links": {
                "twitter": "_inesmontani",
                "github": "ines",
                "website": "https://ines.io"
            },
            "category": ["visualizers"]
        },
        {
            "id": "explacy",
            "slogan": "A small tool that explains spaCy parse results",
            "github": "tylerneylon/explacy",
            "thumb": "https://i.imgur.com/V1hCWmn.jpg",
            "image": "https://raw.githubusercontent.com/tylerneylon/explacy/master/img/screenshot.png",
            "code_example": [
                "import spacy",
                "import explacy",
                "",
                "nlp = spacy.load('en')",
                "explacy.print_parse_info(nlp, 'The salad was surprisingly tasty.')"
            ],
            "author": "Tyler Neylon",
            "author_links": {
                "github": "tylerneylon"
            },
            "category": ["visualizers"]
        },
        {
            "id": "deplacy",
            "slogan": "CUI-based Tree Visualizer for Universal Dependencies and Immediate Catena Analysis",
            "description": "Simple dependency visualizer for [spaCy](https://spacy.io/), [UniDic2UD](https://pypi.org/project/unidic2ud), [Stanza](https://stanfordnlp.github.io/stanza/), [NLP-Cube](https://github.com/Adobe/NLP-Cube), [Trankit](https://github.com/nlp-uoregon/trankit), etc.",
            "github": "KoichiYasuoka/deplacy",
            "image": "https://i.imgur.com/6uOI4Op.png",
            "code_example": [
                "import spacy",
                "import deplacy",
                "",
                "nlp=spacy.load('en_core_web_sm')",
                "doc=nlp('I saw a horse yesterday which had no name.')",
                "deplacy.render(doc)"
            ],
            "author": "Koichi Yasuoka",
            "author_links": {
                "github": "KoichiYasuoka"
            },
            "category": ["visualizers"]
        },
        {
            "id": "scattertext",
            "slogan": "Beautiful visualizations of how language differs among document types",
            "description": "A tool for finding distinguishing terms in small-to-medium-sized corpora, and presenting them in a sexy, interactive scatter plot with non-overlapping term labels. Exploratory data analysis just got more fun.",
            "github": "JasonKessler/scattertext",
            "image": "https://jasonkessler.github.io/2012conventions0.0.2.2.png",
            "code_example": [
                "import spacy",
                "",
                "from scattertext import SampleCorpora, produce_scattertext_explorer",
                "from scattertext import produce_scattertext_html",
                "from scattertext.CorpusFromPandas import CorpusFromPandas",
                "",
                "nlp = spacy.load('en_core_web_sm')",
                "convention_df = SampleCorpora.ConventionData2012.get_data()",
                "corpus = CorpusFromPandas(convention_df,",
                "                          category_col='party',",
                "                          text_col='text',",
                "                          nlp=nlp).build()",
                "",
                "html = produce_scattertext_html(corpus,",
                "                                category='democrat',",
                "                                category_name='Democratic',",
                "                                not_category_name='Republican',",
                "                                minimum_term_frequency=5,",
                "                                width_in_pixels=1000)",
                "open('./simple.html', 'wb').write(html.encode('utf-8'))",
                "print('Open ./simple.html in Chrome or Firefox.')"
            ],
            "author": "Jason Kessler",
            "author_links": {
                "github": "JasonKessler",
                "twitter": "jasonkessler"
            },
            "category": ["visualizers"]
        },
        {
            "id": "rasa",
            "title": "Rasa",
            "slogan": "Turn natural language into structured data",
            "description": "Machine learning tools for developers to build, improve, and deploy contextual chatbots and assistants. Powered by open source.",
            "github": "RasaHQ/rasa",
            "pip": "rasa",
            "thumb": "https://i.imgur.com/TyZnpwL.png",
            "url": "https://rasa.com/",
            "author": "Rasa",
            "author_links": {
                "github": "RasaHQ"
            },
            "category": ["conversational"],
            "tags": ["chatbots"]
        },
        {
            "id": "mindmeld",
            "title": "MindMeld - Conversational AI platform",
            "slogan": "Conversational AI platform for deep-domain voice interfaces and chatbots",
            "description": "The MindMeld Conversational AI platform is among the most advanced AI platforms for building production-quality conversational applications. It is a Python-based machine learning framework which encompasses all of the algorithms and utilities required for this purpose. (https://github.com/cisco/mindmeld)",
            "github": "cisco/mindmeld",
            "pip": "mindmeld",
            "thumb": "https://www.mindmeld.com/img/mindmeld-logo.png",
            "category": ["conversational", "ner"],
            "tags": ["chatbots"],
            "author": "Cisco",
            "author_links": {
                "github": "cisco/mindmeld",
                "website": "https://www.mindmeld.com/"
            }
        },
        {
            "id": "torchtext",
            "title": "torchtext",
            "slogan": "Data loaders and abstractions for text and NLP",
            "github": "pytorch/text",
            "pip": "torchtext",
            "thumb": "https://i.imgur.com/WFkxuPo.png",
            "code_example": [
                ">>> from torchtext import data",
                ">>> pos = data.TabularDataset(",
                "...    path='data/pos/pos_wsj_train.tsv', format='tsv',",
                "...    fields=[('text', data.Field()),",
                "...            ('labels', data.Field())])",
                "...",
                ">>> sentiment = data.TabularDataset(",
                "...    path='data/sentiment/train.json', format='json',",
                "...    fields={'sentence_tokenized': ('text', data.Field(sequential=True)),",
                "...            'sentiment_gold': ('labels', data.Field(sequential=False))})"
            ],
            "category": ["standalone", "research"],
            "tags": ["pytorch"]
        },
        {
            "id": "allennlp",
            "title": "AllenNLP",
            "slogan": "An open-source NLP research library, built on PyTorch and spaCy",
            "description": "AllenNLP is a new library designed to accelerate NLP research by providing a framework that supports modern deep learning workflows for cutting-edge language understanding problems. AllenNLP uses spaCy as a preprocessing component. You can also use AllenNLP to develop spaCy pipeline components that add annotations to the `Doc` object.",
            "github": "allenai/allennlp",
            "pip": "allennlp",
            "thumb": "https://i.imgur.com/U8opuDN.jpg",
            "url": "http://allennlp.org",
            "author": "Allen Institute for Artificial Intelligence",
            "author_links": {
                "github": "allenai",
                "twitter": "allenai_org",
                "website": "http://allenai.org"
            },
            "category": ["standalone", "research"]
        },
        {
            "id": "scispacy",
            "title": "scispaCy",
            "slogan": "A full spaCy pipeline and models for scientific/biomedical documents",
            "github": "allenai/scispacy",
            "pip": "scispacy",
            "thumb": "https://i.imgur.com/dJQSclW.png",
            "url": "https://allenai.github.io/scispacy/",
            "author": "Allen Institute for Artificial Intelligence",
            "author_links": {
                "github": "allenai",
                "twitter": "allenai_org",
                "website": "http://allenai.org"
            },
            "category": ["scientific", "models", "research", "biomedical"]
        },
        {
            "id": "textacy",
            "slogan": "NLP, before and after spaCy",
            "description": "`textacy` is a Python library for performing a variety of natural language processing (NLP) tasks, built on the high-performance `spacy` library. With the fundamentals – tokenization, part-of-speech tagging, dependency parsing, etc. – delegated to another library, `textacy` focuses on the tasks that come before and follow after.",
            "github": "chartbeat-labs/textacy",
            "pip": "textacy",
            "url": "https://github.com/chartbeat-labs/textacy",
            "author": "Burton DeWilde",
            "author_links": {
                "github": "bdewilde",
                "twitter": "bjdewilde"
            },
            "category": ["standalone"]
        },
        {
            "id": "textpipe",
            "slogan": "Clean and extract metadata from text",
            "description": "`textpipe` is a Python package for converting raw text into clean, readable text and extracting metadata from that text. Its functionalities include transforming raw text into readable text by removing HTML tags and extracting metadata such as the number of words and named entities from the text.",
            "github": "textpipe/textpipe",
            "pip": "textpipe",
            "author": "Textpipe Contributors",
            "author_links": {
                "github": "textpipe",
                "website": "https://github.com/textpipe/textpipe/blob/master/CONTRIBUTORS.md"
            },
            "category": ["standalone"],
            "tags": ["text-processing", "named-entity-recognition"],
            "thumb": "https://avatars0.githubusercontent.com/u/40492530",
            "code_example": [
                "from textpipe import doc, pipeline",
                "sample_text = 'Sample text! <!DOCTYPE>'",
                "document = doc.Doc(sample_text)",
                "print(document.clean)",
                "# 'Sample text!'",
                "print(document.language)",
                "# 'en'",
                "print(document.nwords)",
                "# 2",
                "",
                "pipe = pipeline.Pipeline(['CleanText', 'NWords'])",
                "print(pipe(sample_text))",
                "# {'CleanText': 'Sample text!', 'NWords': 2}"
            ]
        },
        {
            "id": "mordecai",
            "slogan": "Full text geoparsing using spaCy, Geonames and Keras",
            "description": "Extract the place names from a piece of text, resolve them to the correct place, and return their coordinates and structured geographic information.",
            "github": "openeventdata/mordecai",
            "pip": "mordecai",
            "thumb": "https://i.imgur.com/gPJ9upa.jpg",
            "code_example": [
                "from mordecai import Geoparser",
                "geo = Geoparser()",
                "geo.geoparse(\"I traveled from Oxford to Ottawa.\")"
            ],
            "author": "Andy Halterman",
            "author_links": {
                "github": "ahalterman",
                "twitter": "ahalterman"
            },
            "category": ["standalone", "scientific"]
        },
        {
            "id": "kindred",
            "title": "Kindred",
            "slogan": "Biomedical relation extraction using spaCy",
            "description": "Kindred is a package for relation extraction in biomedical texts. Given some training data, it can build a model to identify relations between entities (e.g. drugs, genes, etc.) in a sentence.",
            "github": "jakelever/kindred",
            "pip": "kindred",
            "code_example": [
                "import kindred",
                "",
                "trainCorpus = kindred.bionlpst.load('2016-BB3-event-train')",
                "devCorpus = kindred.bionlpst.load('2016-BB3-event-dev')",
                "predictionCorpus = devCorpus.clone()",
                "predictionCorpus.removeRelations()",
                "classifier = kindred.RelationClassifier()",
                "classifier.train(trainCorpus)",
                "classifier.predict(predictionCorpus)",
                "f1score = kindred.evaluate(devCorpus, predictionCorpus, metric='f1score')"
            ],
            "author": "Jake Lever",
            "author_links": {
                "github": "jakelever"
            },
            "category": ["standalone", "scientific"]
        },
        {
            "id": "sense2vec",
            "slogan": "Use NLP to go beyond vanilla word2vec",
            "description": "sense2vec ([Trask et al.](https://arxiv.org/abs/1511.06388), 2015) is a twist on [word2vec](https://en.wikipedia.org/wiki/Word2vec) that lets you learn more interesting, detailed and context-sensitive word vectors. For an interactive example of the technology, see our [sense2vec demo](https://explosion.ai/demos/sense2vec), which lets you explore semantic similarities across all Reddit comments of 2015.",
            "github": "explosion/sense2vec",
            "pip": "sense2vec==1.0.0a1",
            "thumb": "https://i.imgur.com/awfdhX6.jpg",
            "image": "https://explosion.ai/assets/img/demos/sense2vec.png",
            "url": "https://explosion.ai/demos/sense2vec",
            "code_example": [
                "import spacy",
                "",
                "nlp = spacy.load(\"en_core_web_sm\")",
                "s2v = nlp.add_pipe(\"sense2vec\")",
                "s2v.from_disk(\"/path/to/s2v_reddit_2015_md\")",
                "",
                "doc = nlp(\"A sentence about natural language processing.\")",
                "assert doc[3:6].text == \"natural language processing\"",
                "freq = doc[3:6]._.s2v_freq",
                "vector = doc[3:6]._.s2v_vec",
                "most_similar = doc[3:6]._.s2v_most_similar(3)",
                "# [(('machine learning', 'NOUN'), 0.8986967),",
                "#  (('computer vision', 'NOUN'), 0.8636297),",
                "#  (('deep learning', 'NOUN'), 0.8573361)]"
            ],
            "category": ["pipeline", "standalone", "visualizers"],
            "tags": ["vectors"],
            "author": "Explosion",
            "author_links": {
                "twitter": "explosion_ai",
                "github": "explosion",
                "website": "https://explosion.ai"
            }
        },
        {
            "id": "spacyr",
            "slogan": "An R wrapper for spaCy",
            "github": "quanteda/spacyr",
            "cran": "spacyr",
            "code_example": [
                "library(\"spacyr\")",
                "spacy_initialize()",
                "",
                "txt <- c(d1 = \"spaCy excels at large-scale information extraction tasks.\",",
                "         d2 = \"Mr. Smith goes to North Carolina.\")",
                "",
                "# process documents and obtain a data.table",
                "parsedtxt <- spacy_parse(txt)"
            ],
            "code_language": "r",
            "author": "Kenneth Benoit & Aki Matsuo",
            "category": ["nonpython"]
        },
        {
            "id": "cleannlp",
            "title": "CleanNLP",
            "slogan": "A tidy data model for NLP in R",
            "description": "The cleanNLP package is designed to make it as painless as possible to turn raw text into feature-rich data frames. The package offers four backends that can be used for parsing text: `tokenizers`, `udpipe`, `spacy` and `corenlp`.",
            "github": "statsmaths/cleanNLP",
            "cran": "cleanNLP",
            "author": "Taylor B. Arnold",
            "author_links": {
                "github": "statsmaths"
            },
            "category": ["nonpython"]
        },
        {
            "id": "spacy-cpp",
            "slogan": "C++ wrapper library for spaCy",
            "description": "The goal of spacy-cpp is to expose the functionality of spaCy to C++ applications, and to provide an API that is similar to that of spaCy, enabling rapid development in Python and simple porting to C++.",
            "github": "d99kris/spacy-cpp",
            "code_example": [
                "Spacy::Spacy spacy;",
                "auto nlp = spacy.load(\"en_core_web_sm\");",
                "auto doc = nlp.parse(\"This is a sentence.\");",
                "for (auto& token : doc.tokens())",
                "    std::cout << token.text() << \" [\" << token.pos_() << \"]\\n\";"
            ],
            "code_language": "cpp",
            "author": "Kristofer Berggren",
            "author_links": {
                "github": "d99kris"
            },
            "category": ["nonpython"]
        },
        {
            "id": "ruby-spacy",
            "title": "ruby-spacy",
            "slogan": "Wrapper module for using spaCy from Ruby via PyCall",
            "description": "ruby-spacy is a wrapper module for using spaCy from the Ruby programming language via PyCall. This module aims to make it easy and natural for Ruby programmers to use spaCy.",
            "github": "yohasebe/ruby-spacy",
            "code_example": [
                "require \"ruby-spacy\"",
                "require \"terminal-table\"",
                "nlp = Spacy::Language.new(\"en_core_web_sm\")",
                "doc = nlp.read(\"Apple is looking at buying U.K. startup for $1 billion\")",
                "headings = [\"text\", \"lemma\", \"pos\", \"tag\", \"dep\"]",
                "rows = []",
                "doc.each do |token|",
                "  rows << [token.text, token.lemma, token.pos, token.tag, token.dep]",
                "end",
                "table = Terminal::Table.new rows: rows, headings: headings",
                "puts table"
            ],
            "code_language": "ruby",
            "url": "https://rubygems.org/gems/ruby-spacy",
            "author": "Yoichiro Hasebe",
            "author_links": {
                "github": "yohasebe",
                "twitter": "yohasebe"
            },
            "category": ["nonpython"],
            "tags": ["ruby"]
        },
        {
            "id": "spacy_api",
            "slogan": "Server/client to load models in a separate, dedicated process",
            "github": "kootenpv/spacy_api",
            "pip": "spacy_api",
            "code_example": [
                "from spacy_api import Client",
                "",
                "spacy_client = Client() # default args host/port",
                "doc = spacy_client.single(\"How are you\")"
            ],
            "author": "Pascal van Kooten",
            "author_links": {
                "github": "kootenpv"
            },
            "category": ["apis"]
        },
        {
            "id": "spacy-api-docker",
            "slogan": "spaCy REST API, wrapped in a Docker container",
            "github": "jgontrum/spacy-api-docker",
            "url": "https://hub.docker.com/r/jgontrum/spacyapi/",
            "thumb": "https://i.imgur.com/NRnDKyj.jpg",
            "code_example": [
                "version: '2'",
                "",
                "services:",
                "  spacyapi:",
                "    image: jgontrum/spacyapi:en_v2",
                "    ports:",
                "      - \"127.0.0.1:8080:80\"",
                "    restart: always"
            ],
            "code_language": "docker",
            "author": "Johannes Gontrum",
            "author_links": {
                "github": "jgontrum"
            },
            "category": ["apis"]
        },
        {
            "id": "spacy-nlp",
            "slogan": "Expose spaCy NLP text parsing to Node.js (and other languages) via Socket.IO",
            "github": "kengz/spacy-nlp",
            "thumb": "https://i.imgur.com/w41VSr7.jpg",
            "code_example": [
                "const spacyNLP = require(\"spacy-nlp\")",
                "// default port 6466",
                "// start the server with the Python client that exposes spacyIO (or use an existing Socket.IO server at IOPORT)",
                "var serverPromise = spacyNLP.server({ port: process.env.IOPORT });",
                "// Loading spaCy may take up to 15s"
            ],
            "code_language": "javascript",
            "author": "Wah Loon Keng",
            "author_links": {
                "github": "kengz"
            },
            "category": ["apis", "nonpython"]
        },
        {
            "id": "prodigy",
            "title": "Prodigy",
            "slogan": "Radically efficient machine teaching, powered by active learning",
            "description": "Prodigy is an annotation tool so efficient that data scientists can do the annotation themselves, enabling a new level of rapid iteration. Whether you're working on entity recognition, intent detection or image classification, Prodigy can help you train and evaluate your models faster. Stream in your own examples or real-world data from live APIs, update your model in real-time and chain models together to build more complex systems.",
            "thumb": "https://i.imgur.com/UVRtP6g.jpg",
            "image": "https://i.imgur.com/Dt5vrY6.png",
            "url": "https://prodi.gy",
            "code_example": [
                "prodigy dataset ner_product \"Improve PRODUCT on Reddit data\"",
                "✨ Created dataset 'ner_product'.",
                "",
                "prodigy ner.teach ner_product en_core_web_sm ~/data.jsonl --label PRODUCT",
                "✨ Starting the web server on port 8080..."
            ],
            "code_language": "bash",
            "category": ["standalone", "training"],
            "author": "Explosion",
            "author_links": {
                "twitter": "explosion_ai",
                "github": "explosion",
                "website": "https://explosion.ai"
            }
        },
        {
            "id": "dragonfire",
            "title": "Dragonfire",
            "slogan": "An open-source virtual assistant for Ubuntu based Linux distributions",
            "github": "DragonComputer/Dragonfire",
            "thumb": "https://i.imgur.com/5fqguKS.jpg",
            "image": "https://raw.githubusercontent.com/DragonComputer/Dragonfire/master/docs/img/demo.gif",
            "author": "Dragon Computer",
            "author_links": {
                "github": "DragonComputer",
                "website": "http://dragon.computer"
            },
            "category": ["standalone"]
        },
        {
            "id": "prefect",
            "title": "Prefect",
            "slogan": "Workflow management system designed for modern infrastructure",
            "github": "PrefectHQ/prefect",
            "pip": "prefect",
            "thumb": "https://i.imgur.com/oLTwr0e.png",
            "code_example": [
                "from prefect import Flow",
                "from prefect.tasks.spacy.spacy_tasks import SpacyNLP",
                "import spacy",
                "",
                "nlp = spacy.load(\"en_core_web_sm\")",
                "",
                "with Flow(\"Natural Language Processing\") as flow:",
                "    doc = SpacyNLP(text=\"This is some text\", nlp=nlp)",
                "",
                "flow.run()"
            ],
            "author": "Prefect",
            "author_links": {
                "website": "https://prefect.io"
            },
            "category": ["standalone"]
        },
        {
            "id": "graphbrain",
            "title": "Graphbrain",
            "slogan": "Automated meaning extraction and text understanding",
            "description": "Graphbrain is an Artificial Intelligence open-source software library and scientific research tool. Its aim is to facilitate automated meaning extraction and text understanding, as well as the exploration and inference of knowledge.",
            "github": "graphbrain/graphbrain",
            "pip": "graphbrain",
            "thumb": "https://i.imgur.com/cct9W1E.png",
            "author": "Graphbrain",
            "category": ["standalone"]
        },
        {
            "type": "education",
            "id": "nostarch-nlp-python",
            "title": "Natural Language Processing Using Python",
            "slogan": "No Starch Press, 2020",
            "description": "Natural Language Processing Using Python is an introduction to natural language processing (NLP), the task of converting human language into data that a computer can process. The book uses spaCy, a leading Python library for NLP, to guide readers through common NLP tasks related to generating and understanding human language with code. It addresses problems like understanding a user's intent, continuing a conversation with a human, and maintaining the state of a conversation.",
            "cover": "https://i.imgur.com/w0iycjl.jpg",
            "url": "https://nostarch.com/NLPPython",
            "author": "Yuli Vasiliev",
            "category": ["books"]
        },
        {
            "type": "education",
            "id": "oreilly-python-ds",
            "title": "Introduction to Machine Learning with Python: A Guide for Data Scientists",
            "slogan": "O'Reilly, 2016",
            "description": "Machine learning has become an integral part of many commercial applications and research projects, but this field is not exclusive to large companies with extensive research teams. If you use Python, even as a beginner, this book will teach you practical ways to build your own machine learning solutions. With all the data available today, machine learning applications are limited only by your imagination.",
            "cover": "https://covers.oreillystatic.com/images/0636920030515/lrg.jpg",
            "url": "http://shop.oreilly.com/product/0636920030515.do",
            "author": "Andreas Müller, Sarah Guido",
            "category": ["books"]
        },
        {
            "type": "education",
            "id": "text-analytics-python",
            "title": "Text Analytics with Python",
            "slogan": "Apress / Springer, 2016",
            "description": "*Text Analytics with Python* teaches you the techniques related to natural language processing and text analytics, and you will gain the skills to know which technique is best suited to solve a particular problem. You will look at each technique and algorithm both with a bird's-eye view, to understand how it can be used, and with a microscopic view, to understand the mathematical concepts and implement them to solve your own problems.",
            "github": "dipanjanS/text-analytics-with-python",
            "cover": "https://i.imgur.com/AOmzZu8.png",
            "url": "https://www.amazon.com/Text-Analytics-Python-Real-World-Actionable/dp/148422387X",
            "author": "Dipanjan Sarkar",
            "category": ["books"]
        },
        {
            "type": "education",
            "id": "practical-ml-python",
            "title": "Practical Machine Learning with Python",
            "slogan": "Apress, 2017",
            "description": "Master the essential skills needed to recognize and solve complex problems with machine learning and deep learning. Using real-world examples that leverage the popular Python machine learning ecosystem, this book is your perfect companion for learning the art and science of machine learning to become a successful practitioner. The concepts, techniques, tools, frameworks, and methodologies used in this book will teach you how to think, design, build, and execute machine learning systems and projects successfully.",
            "github": "dipanjanS/practical-machine-learning-with-python",
            "cover": "https://i.imgur.com/5F4mkt7.jpg",
            "url": "https://www.amazon.com/Practical-Machine-Learning-Python-Problem-Solvers/dp/1484232062",
            "author": "Dipanjan Sarkar, Raghav Bali, Tushar Sharma",
            "category": ["books"]
        },
        {
            "type": "education",
            "id": "packt-nlp-computational-linguistics",
            "title": "Natural Language Processing and Computational Linguistics",
            "slogan": "Packt, 2018",
            "description": "This book shows you how to use natural language processing and computational linguistics algorithms to make inferences and gain insights about the data you have. These algorithms are based on statistical machine learning and artificial intelligence techniques. The tools to work with these algorithms are available to you right now, with Python and libraries like Gensim and spaCy.",
            "cover": "https://i.imgur.com/aleMf1Y.jpg",
            "url": "https://www.amazon.com/Natural-Language-Processing-Computational-Linguistics-ebook/dp/B07BWH779J",
            "author": "Bhargav Srinivasa-Desikan",
            "category": ["books"]
        },
        {
            "type": "education",
            "id": "mastering-spacy",
            "title": "Mastering spaCy",
            "slogan": "Packt, 2021",
            "description": "This is your ultimate spaCy book. Master the crucial skills to use spaCy components effectively to create real-world NLP applications with spaCy. Explaining linguistic concepts such as dependency parsing, POS tagging and named entity extraction with many examples, this book will help you to conquer computational linguistics with spaCy. The book further focuses on ML topics with Keras and TensorFlow. You'll cover popular topics, including intent recognition, sentiment analysis and context resolution, and use them on popular datasets and interpret the results. A special hands-on section on chatbot design is included.",
            "github": "PacktPublishing/Mastering-spaCy",
            "cover": "https://tinyimg.io/i/aWEm0dh.jpeg",
            "url": "https://www.amazon.com/Mastering-spaCy-end-end-implementing/dp/1800563353",
            "author": "Duygu Altinok",
            "author_links": {
                "github": "DuyguA",
                "website": "https://www.linkedin.com/in/duygu-altinok-4021389a"
            },
            "category": ["books"]
        },
        {
            "type": "education",
            "id": "applied-nlp-in-enterprise",
            "title": "Applied Natural Language Processing in the Enterprise: Teaching Machines to Read, Write, and Understand",
            "slogan": "O'Reilly, 2021",
            "description": "Natural language processing (NLP) is one of the hottest topics in AI today. Having lagged behind other deep learning fields such as computer vision for years, NLP only recently gained mainstream popularity. Even though Google, Facebook, and OpenAI have open-sourced large pretrained language models to make NLP easier, many organizations today still struggle with developing and productionizing NLP applications. This hands-on guide helps you learn the field quickly.",
            "github": "nlpbook/nlpbook",
            "cover": "https://i.imgur.com/6RxLBvf.jpg",
            "url": "https://www.amazon.com/dp/149206257X",
            "author": "Ankur A. Patel",
            "author_links": {
                "github": "aapatel09",
                "website": "https://www.ankurapatel.io"
            },
            "category": ["books"]
        },
        {
            "type": "education",
            "id": "introduction-into-spacy-3",
            "title": "Introduction to spaCy 3",
            "slogan": "A free course for beginners by Dr. W.J.B. Mattingly",
            "url": "http://spacy.pythonhumanities.com/",
            "thumb": "https://spacy.pythonhumanities.com/_static/freecodecamp_small.jpg",
            "author": "Dr. W.J.B. Mattingly",
            "category": ["courses"]
        },
        {
            "type": "education",
            "id": "spacy-course",
            "title": "Advanced NLP with spaCy",
            "slogan": "A free online course",
            "description": "In this free interactive course, you'll learn how to use spaCy to build advanced natural language understanding systems, using both rule-based and machine learning approaches.",
            "url": "https://course.spacy.io",
            "image": "https://i.imgur.com/JC00pHW.jpg",
            "thumb": "https://i.imgur.com/5RXLtrr.jpg",
            "author": "Ines Montani",
            "author_links": {
                "twitter": "_inesmontani",
                "github": "ines",
                "website": "https://ines.io"
            },
            "category": ["courses"]
        },
        {
            "type": "education",
            "id": "applt-course",
            "title": "Applied Language Technology",
            "slogan": "NLP for newcomers using spaCy and Stanza",
            "description": "These learning materials provide an introduction to applied language technology for audiences who are unfamiliar with language technology and programming. The learning materials assume no previous knowledge of the Python programming language.",
            "url": "https://applied-language-technology.mooc.fi",
            "image": "https://www.mv.helsinki.fi/home/thiippal/images/applt-preview.jpg",
            "thumb": "https://www.mv.helsinki.fi/home/thiippal/images/applt-logo.png",
            "author": "Tuomo Hiippala",
            "author_links": {
                "twitter": "tuomo_h",
                "github": "thiippal",
                "website": "https://www.mv.helsinki.fi/home/thiippal/"
            },
            "category": ["courses"]
        },
        {
            "type": "education",
            "id": "video-spacys-ner-model",
            "title": "spaCy's NER model",
            "slogan": "Incremental parsing with bloom embeddings and residual CNNs",
            "description": "spaCy v2.0's Named Entity Recognition system features a sophisticated word embedding strategy using subword features and \"Bloom\" embeddings, a deep convolutional neural network with residual connections, and a novel transition-based approach to named entity parsing. The system is designed to give a good balance of efficiency, accuracy and adaptability. In this talk, I sketch out the components of the system, explaining the intuition behind the various choices. I also give a brief introduction to the named entity recognition problem, with an overview of what else Explosion AI is working on, and why.",
            "youtube": "sqDHBH9IjRU",
            "author": "Matthew Honnibal",
            "author_links": {
                "twitter": "honnibal",
                "github": "honnibal",
                "website": "https://explosion.ai"
            },
            "category": ["videos"]
        },
        {
            "type": "education",
            "id": "video-new-nlp-solutions",
            "title": "Building new NLP solutions with spaCy and Prodigy",
            "slogan": "PyData Berlin 2018",
            "description": "In this talk, I will discuss how to address some of the most likely causes of failure for new Natural Language Processing (NLP) projects. My main recommendation is to take an iterative approach: don't assume you know what your pipeline should look like, let alone your annotation schemes or model architectures.",
            "author": "Matthew Honnibal",
            "author_links": {
                "twitter": "honnibal",
                "github": "honnibal",
                "website": "https://explosion.ai"
            },
            "youtube": "jpWqz85F_4Y",
            "category": ["videos"]
        },
        {
            "type": "education",
            "id": "video-modern-nlp-in-python",
            "title": "Modern NLP in Python",
            "slogan": "PyData DC 2016",
            "description": "Academic and industry research in Natural Language Processing (NLP) has progressed at an accelerating pace over the last several years. Members of the Python community have been hard at work moving cutting-edge research out of papers and into open source, \"batteries included\" software libraries that can be applied to practical problems. We'll explore some of these tools for modern NLP in Python.",
            "author": "Patrick Harrison",
            "youtube": "6zm9NC9uRkk",
            "category": ["videos"]
        },
        {
            "type": "education",
            "id": "video-spacy-course",
            "title": "Advanced NLP with spaCy · A free online course",
            "description": "spaCy is a modern Python library for industrial-strength Natural Language Processing. In this free and interactive online course, you'll learn how to use spaCy to build advanced natural language understanding systems, using both rule-based and machine learning approaches.",
            "url": "https://course.spacy.io/en",
            "author": "Ines Montani",
            "author_links": {
                "twitter": "_inesmontani",
                "github": "ines"
            },
            "youtube": "THduWAnG97k",
            "category": ["videos"]
        },
        {
            "type": "education",
            "id": "video-spacy-course-de",
            "title": "Modernes NLP mit spaCy · Ein Gratis-Onlinekurs",
            "description": "spaCy ist eine moderne Python-Bibliothek für industriestarkes Natural Language Processing. In diesem kostenlosen und interaktiven Onlinekurs lernst du, mithilfe von spaCy fortgeschrittene Systeme für die Analyse natürlicher Sprache zu entwickeln und dabei sowohl regelbasierte Verfahren, als auch moderne Machine-Learning-Technologie einzusetzen.",
            "url": "https://course.spacy.io/de",
            "author": "Ines Montani",
            "author_links": {
                "twitter": "_inesmontani",
                "github": "ines"
            },
            "youtube": "K1elwpgDdls",
            "category": ["videos"]
        },
        {
            "type": "education",
            "id": "video-spacy-course-es",
            "title": "NLP avanzado con spaCy · Un curso en línea gratis",
            "description": "spaCy es un paquete moderno de Python para hacer Procesamiento de Lenguaje Natural de potencia industrial. En este curso en línea, interactivo y gratuito, aprenderás a usar spaCy para construir sistemas avanzados de comprensión de lenguaje natural usando enfoques basados en reglas y en machine learning.",
            "url": "https://course.spacy.io/es",
            "author": "Camila Gutiérrez",
            "author_links": {
                "twitter": "Mariacamilagl30"
            },
            "youtube": "RNiLVCE5d4k",
            "category": ["videos"]
        },
        {
            "type": "education",
            "id": "video-intro-to-nlp-episode-1",
            "title": "Intro to NLP with spaCy (1)",
            "slogan": "Episode 1: Data exploration",
            "description": "In this new video series, data science instructor Vincent Warmerdam gets started with spaCy, an open-source library for Natural Language Processing in Python. His mission: building a system to automatically detect programming languages in large volumes of text. Follow his process from the first idea to a prototype all the way to data collection and training a statistical named entity recognition model from scratch.",
            "author": "Vincent Warmerdam",
            "author_links": {
                "twitter": "fishnets88",
                "github": "koaning"
            },
            "youtube": "WnGPv6HnBok",
            "category": ["videos"]
        },
        {
            "type": "education",
            "id": "video-intro-to-nlp-episode-2",
            "title": "Intro to NLP with spaCy (2)",
            "slogan": "Episode 2: Rule-based Matching",
            "description": "In this new video series, data science instructor Vincent Warmerdam gets started with spaCy, an open-source library for Natural Language Processing in Python. His mission: building a system to automatically detect programming languages in large volumes of text. Follow his process from the first idea to a prototype all the way to data collection and training a statistical named entity recognition model from scratch.",
            "author": "Vincent Warmerdam",
            "author_links": {
                "twitter": "fishnets88",
                "github": "koaning"
            },
            "youtube": "KL4-Mpgbahw",
            "category": ["videos"]
        },
        {
            "type": "education",
            "id": "video-intro-to-nlp-episode-3",
            "title": "Intro to NLP with spaCy (3)",
            "slogan": "Episode 3: Evaluation",
            "description": "In this new video series, data science instructor Vincent Warmerdam gets started with spaCy, an open-source library for Natural Language Processing in Python. His mission: building a system to automatically detect programming languages in large volumes of text. Follow his process from the first idea to a prototype all the way to data collection and training a statistical named entity recognition model from scratch.",
            "author": "Vincent Warmerdam",
            "author_links": {
                "twitter": "fishnets88",
                "github": "koaning"
            },
            "youtube": "4V0JDdohxAk",
            "category": ["videos"]
        },
        {
            "type": "education",
            "id": "video-intro-to-nlp-episode-4",
            "title": "Intro to NLP with spaCy (4)",
            "slogan": "Episode 4: Named Entity Recognition",
            "description": "In this new video series, data science instructor Vincent Warmerdam gets started with spaCy, an open-source library for Natural Language Processing in Python. His mission: building a system to automatically detect programming languages in large volumes of text. Follow his process from the first idea to a prototype all the way to data collection and training a statistical named entity recognition model from scratch.",
            "author": "Vincent Warmerdam",
            "author_links": {
                "twitter": "fishnets88",
                "github": "koaning"
            },
            "youtube": "IqOJU1-_Fi0",
            "category": ["videos"]
        },
        {
            "type": "education",
            "id": "video-intro-to-nlp-episode-5",
            "title": "Intro to NLP with spaCy (5)",
            "slogan": "Episode 5: Rules vs. Machine Learning",
            "description": "In this new video series, data science instructor Vincent Warmerdam gets started with spaCy, an open-source library for Natural Language Processing in Python. His mission: building a system to automatically detect programming languages in large volumes of text. Follow his process from the first idea to a prototype all the way to data collection and training a statistical named entity recognition model from scratch.",
            "author": "Vincent Warmerdam",
            "author_links": {
                "twitter": "fishnets88",
                "github": "koaning"
            },
            "youtube": "f4sqeLRzkPg",
            "category": ["videos"]
        },
        {
            "type": "education",
            "id": "video-intro-to-nlp-episode-6",
            "title": "Intro to NLP with spaCy (6)",
            "slogan": "Episode 6: Moving to spaCy v3",
            "description": "In this new video series, data science instructor Vincent Warmerdam gets started with spaCy, an open-source library for Natural Language Processing in Python. His mission: building a system to automatically detect programming languages in large volumes of text. Follow his process from the first idea to a prototype all the way to data collection and training a statistical named entity recognition model from scratch.",
            "author": "Vincent Warmerdam",
            "author_links": {
                "twitter": "fishnets88",
                "github": "koaning"
            },
            "youtube": "k77RrmMaKEI",
            "category": ["videos"]
        },
        {
            "type": "education",
            "id": "video-spacy-irl-entity-linking",
            "title": "Entity Linking functionality in spaCy",
            "slogan": "spaCy IRL 2019",
            "url": "https://www.youtube.com/playlist?list=PLBmcuObd5An4UC6jvK_-eSl6jCvP1gwXc",
            "author": "Sofie Van Landeghem",
            "author_links": {
                "twitter": "OxyKodit",
                "github": "svlandeg"
            },
            "youtube": "PW3RJM8tDGo",
            "category": ["videos"]
        },
        {
            "type": "education",
            "id": "video-spacy-irl-lemmatization",
            "title": "Rethinking rule-based lemmatization",
            "slogan": "spaCy IRL 2019",
            "url": "https://www.youtube.com/playlist?list=PLBmcuObd5An4UC6jvK_-eSl6jCvP1gwXc",
            "author": "Guadalupe Romero",
            "author_links": {
                "twitter": "_guadiromero",
                "github": "guadi1994"
            },
            "youtube": "88zcQODyuko",
            "category": ["videos"]
        },
        {
            "type": "education",
            "id": "video-spacy-irl-scispacy",
            "title": "ScispaCy: A spaCy pipeline & models for scientific & biomedical text",
            "slogan": "spaCy IRL 2019",
            "url": "https://www.youtube.com/playlist?list=PLBmcuObd5An4UC6jvK_-eSl6jCvP1gwXc",
            "author": "Mark Neumann",
            "author_links": {
                "twitter": "MarkNeumannnn",
                "github": "DeNeutoy"
            },
            "youtube": "2_HSKDALwuw",
            "category": ["videos"]
        },
        {
            "type": "education",
            "id": "podcast-nlp-highlights",
            "title": "NLP Highlights #78: Where do corpora come from?",
            "slogan": "January 2019",
            "description": "Most NLP projects rely crucially on the quality of annotations used for training and evaluating models. In this episode, Matt and Ines of Explosion AI tell us how Prodigy can improve data annotation and model development workflows. Prodigy is an annotation tool implemented as a Python library, and it comes with a web application and a command line interface. A developer can define input data streams and design simple annotation interfaces. Prodigy can help break down complex annotation decisions into a series of binary decisions, and it provides easy integration with spaCy models. Developers can specify how models should be updated as new annotations arrive, in an active learning framework.",
            "soundcloud": "559200912",
            "thumb": "https://i.imgur.com/hOBQEzc.jpg",
            "url": "https://soundcloud.com/nlp-highlights/78-where-do-corpora-come-from-with-matt-honnibal-and-ines-montani",
            "author": "Matt Gardner, Waleed Ammar (Allen AI)",
            "author_links": {
                "website": "https://soundcloud.com/nlp-highlights"
            },
            "category": ["podcasts"]
        },
        {
            "type": "education",
            "id": "podcast-init",
            "title": "Podcast.__init__ #87: spaCy with Matthew Honnibal",
            "slogan": "December 2017",
            "description": "As the amount of text available on the internet and in businesses continues to increase, the need for fast and accurate language analysis becomes more prominent. This week Matthew Honnibal, the creator of spaCy, talks about his experiences researching natural language processing and creating a library to make his findings accessible to industry.",
            "iframe": "https://www.pythonpodcast.com/wp-content/plugins/podlove-podcasting-plugin-for-wordpress/lib/modules/podlove_web_player/player_v4/dist/share.html?episode=https://www.pythonpodcast.com/?podlove_player4=176",
            "iframe_height": 200,
            "thumb": "https://i.imgur.com/rpo6BuY.png",
            "url": "https://www.podcastinit.com/episode-87-spacy-with-matthew-honnibal/",
            "author": "Tobias Macey",
            "author_links": {
                "website": "https://www.podcastinit.com"
            },
            "category": ["podcasts"]
        },
        {
            "type": "education",
            "id": "podcast-init2",
            "title": "Podcast.__init__ #256: An Open Source Toolchain For NLP From Explosion AI",
            "slogan": "March 2020",
            "description": "The state of the art in natural language processing is a constantly moving target. With the rise of deep learning, previously cutting-edge techniques have given way to robust language models. Through it all the team at Explosion AI have built a strong presence with the trifecta of spaCy, Thinc, and Prodigy to support fast and flexible data labeling to feed deep learning models and performant and scalable text processing. In this episode, founder and open source author Matthew Honnibal shares his experience growing a business around cutting-edge open source libraries for the machine learning development process.",
            "iframe": "https://cdn.podlove.org/web-player/share.html?episode=https%3A%2F%2Fwww.pythonpodcast.com%2F%3Fpodlove_player4%3D614",
            "iframe_height": 200,
            "thumb": "https://i.imgur.com/rpo6BuY.png",
            "url": "https://www.pythonpodcast.com/explosion-ai-natural-language-processing-episode-256/",
            "author": "Tobias Macey",
            "author_links": {
                "website": "https://www.podcastinit.com"
            },
            "category": ["podcasts"]
        },
        {
            "type": "education",
            "id": "talk-python-podcast",
            "title": "Talk Python #202: Building a software business",
            "slogan": "March 2019",
            "description": "One core question around open source is how do you fund it? Well, there is always that PayPal donate button. But that's been a tremendous failure for many projects. Often the go-to answer is consulting. But what if you don't want to trade time for money? You could take things up a notch and change the equation, exchanging value for money. That's what Ines Montani and her co-founder did when they started Explosion AI with spaCy as the foundation.",
            "thumb": "https://i.imgur.com/q1twuK8.png",
            "url": "https://talkpython.fm/episodes/show/202/building-a-software-business",
            "soundcloud": "588364857",
            "author": "Michael Kennedy",
            "author_links": {
                "website": "https://talkpython.fm/"
            },
            "category": ["podcasts"]
        },
        {
            "type": "education",
            "id": "twimlai-podcast",
            "title": "TWiML & AI: Practical NLP with spaCy and Prodigy",
            "slogan": "May 2019",
            "description": "\"Ines and I caught up to discuss her various projects, including the aforementioned spaCy, an open-source NLP library built with a focus on industry and production use cases. In our conversation, Ines gives us an overview of the spaCy Library, a look at some of the use cases that excite her, and the Spacy community and contributors. We also discuss her work with Prodigy, an annotation service tool that uses continuous active learning to train models, and finally, what other exciting projects she is working on.\"",
            "thumb": "https://i.imgur.com/ng2F5gK.png",
            "url": "https://twimlai.com/twiml-talk-262-practical-natural-language-processing-with-spacy-and-prodigy-w-ines-montani",
            "iframe": "https://html5-player.libsyn.com/embed/episode/id/9691514/height/90/theme/custom/thumbnail/no/preload/no/direction/backward/render-playlist/no/custom-color/3e85b1/",
            "iframe_height": 90,
            "author": "Sam Charrington",
            "author_links": {
                "website": "https://twimlai.com"
            },
            "category": ["podcasts"]
        },
        {
            "type": "education",
            "id": "analytics-vidhya",
            "title": "DataHack Radio #23: The Brains behind spaCy",
            "slogan": "June 2019",
            "description": "\"What would you do if you had the chance to pick the brains behind one of the most popular Natural Language Processing (NLP) libraries of our era? A library that has helped usher in the current boom in NLP applications and nurtured tons of NLP scientists? Well – you invite the creators on our popular DataHack Radio podcast and let them do the talking! We are delighted to welcome Ines Montani and Matt Honnibal, the developers of spaCy – a powerful and advanced library for NLP.\"",
            "thumb": "https://i.imgur.com/3zJKZ1P.jpg",
            "url": "https://www.analyticsvidhya.com/blog/2019/06/datahack-radio-ines-montani-matthew-honnibal-brains-behind-spacy/",
            "soundcloud": "630741825",
            "author": "Analytics Vidhya",
            "author_links": {
                "website": "https://www.analyticsvidhya.com",
                "twitter": "analyticsvidhya"
            },
            "category": ["podcasts"]
        },
        {
            "type": "education",
            "id": "practical-ai-podcast",
            "title": "Practical AI: Modern NLP with spaCy",
            "slogan": "December 2019",
            "description": "\"spaCy is awesome for NLP! It’s easy to use, has widespread adoption, is open source, and integrates the latest language models. Ines Montani and Matthew Honnibal (core developers of spaCy and co-founders of Explosion) join us to discuss the history of the project, its capabilities, and the latest trends in NLP. We also dig into the practicalities of taking NLP workflows to production. You don’t want to miss this episode!\"",
            "thumb": "https://i.imgur.com/jn8Bcdw.png",
            "url": "https://changelog.com/practicalai/68",
            "author": "Daniel Whitenack & Chris Benson",
            "author_links": {
                "website": "https://changelog.com/practicalai",
                "twitter": "PracticalAIFM"
            },
            "category": ["podcasts"]
        },
        {
            "type": "education",
            "id": "video-entity-linking",
            "title": "Training a custom entity linking model with spaCy",
            "author": "Sofie Van Landeghem",
            "author_links": {
                "twitter": "OxyKodit",
                "github": "svlandeg"
            },
            "youtube": "8u57WSXVpmw",
            "category": ["videos"]
        },
        {
            "id": "self-attentive-parser",
            "title": "Berkeley Neural Parser",
            "slogan": "Constituency Parsing with a Self-Attentive Encoder (ACL 2018)",
            "description": "A Python implementation of the parsers described in *\"Constituency Parsing with a Self-Attentive Encoder\"* from ACL 2018.",
            "url": "https://arxiv.org/abs/1805.01052",
            "github": "nikitakit/self-attentive-parser",
            "pip": "benepar",
            "code_example": [
                "import benepar, spacy",
                "nlp = spacy.load('en_core_web_md')",
                "nlp.add_pipe('benepar', config={'model': 'benepar_en3'})",
                "doc = nlp('The time for action is now. It is never too late to do something.')",
                "sent = list(doc.sents)[0]",
                "print(sent._.parse_string)",
                "# (S (NP (NP (DT The) (NN time)) (PP (IN for) (NP (NN action)))) (VP (VBZ is) (ADVP (RB now))) (. .))",
                "print(sent._.labels)",
                "# ('S',)",
                "print(list(sent._.children)[0])",
                "# The time for action"
            ],
            "author": "Nikita Kitaev",
            "author_links": {
                "github": "nikitakit",
                "website": "http://kitaev.io"
            },
            "category": ["research", "pipeline"]
        },
        {
            "id": "spacy-graphql",
            "title": "spacy-graphql",
            "slogan": "Query spaCy's linguistic annotations using GraphQL",
            "github": "ines/spacy-graphql",
            "description": "A very simple and experimental app that lets you query spaCy's linguistic annotations using [GraphQL](https://graphql.org/). The API currently supports most token attributes, named entities, sentences and text categories (if available as `doc.cats`, i.e. if you added a text classifier to a model). The `meta` field will return the model metadata. Models are only loaded once and kept in memory.",
            "url": "https://explosion.ai/demos/spacy-graphql",
            "category": ["apis"],
            "tags": ["graphql"],
            "thumb": "https://i.imgur.com/xC7zpTO.png",
            "code_example": [
                "{",
                "  nlp(text: \"Zuckerberg is the CEO of Facebook.\", model: \"en_core_web_sm\") {",
                "    meta {",
                "      lang",
                "      description",
                "    }",
                "    doc {",
                "      text",
                "      tokens {",
                "        text",
                "        pos_",
                "      }",
                "      ents {",
                "        text",
                "        label_",
                "      }",
                "    }",
                "  }",
                "}"
            ],
            "code_language": "json",
            "author": "Ines Montani",
            "author_links": {
                "twitter": "_inesmontani",
                "github": "ines",
                "website": "https://ines.io"
            }
        },
        {
            "id": "spacy-js",
            "title": "spacy-js",
            "slogan": "JavaScript API for spaCy with Python REST API",
            "github": "ines/spacy-js",
            "description": "JavaScript interface for accessing linguistic annotations provided by spaCy. This project is mostly experimental and was developed for fun to play around with different ways of mimicking spaCy's Python API.\n\nThe results will still be computed in Python and made available via a REST API. The JavaScript API resembles spaCy's Python API as closely as possible (with a few exceptions, as the values are all pre-computed and it's tricky to express complex recursive relationships).",
            "code_language": "javascript",
            "code_example": [
                "const spacy = require('spacy');",
                "",
                "(async function() {",
                "    const nlp = spacy.load('en_core_web_sm');",
                "    const doc = await nlp('This is a text about Facebook.');",
                "    for (let ent of doc.ents) {",
                "        console.log(ent.text, ent.label);",
                "    }",
                "    for (let token of doc) {",
                "        console.log(token.text, token.pos, token.head.text);",
                "    }",
                "})();"
            ],
            "author": "Ines Montani",
            "author_links": {
                "twitter": "_inesmontani",
                "github": "ines",
                "website": "https://ines.io"
            },
            "category": ["nonpython"],
            "tags": ["javascript"]
        },
        {
            "id": "spacy-wordnet",
            "title": "spacy-wordnet",
            "slogan": "WordNet meets spaCy",
            "description": "`spacy-wordnet` creates annotations that easily allow the use of WordNet and [WordNet Domains](http://wndomains.fbk.eu/) by using the [NLTK WordNet interface](http://www.nltk.org/howto/wordnet.html).",
            "github": "recognai/spacy-wordnet",
            "tags": ["wordnet", "synsets"],
            "thumb": "https://i.imgur.com/ud4C7cj.png",
            "code_example": [
                "import spacy",
                "from spacy_wordnet.wordnet_annotator import WordnetAnnotator ",
                "",
                "# Load a spaCy model (supported languages are \"es\" and \"en\") ",
                "nlp = spacy.load('en_core_web_sm')",
                "# spaCy 3.x",
                "nlp.add_pipe(\"spacy_wordnet\", after='tagger')",
                "# spaCy 2.x",
                "# nlp.add_pipe(WordnetAnnotator(nlp.lang), after='tagger')",
                "token = nlp('prices')[0]",
                "",
                "# WordNet object links spaCy token with NLTK WordNet interface by giving access to",
                "# synsets and lemmas ",
                "token._.wordnet.synsets()",
                "token._.wordnet.lemmas()",
                "",
                "# And automatically add info about WordNet domains",
                "token._.wordnet.wordnet_domains()"
            ],
            "author": "recognai",
            "author_links": {
                "github": "recognai",
                "twitter": "recogn_ai",
                "website": "https://recogn.ai"
            },
            "category": ["pipeline"]
        },
        {
            "id": "spacy-conll",
            "title": "spacy_conll",
            "slogan": "Parsing from and to CoNLL-U format with `spacy`, `spacy-stanza` and `spacy-udpipe`",
            "description": "This module allows you to parse text into CoNLL-U format or read CoNLL-U into a spaCy `Doc`. You can use it as a command line tool, or embed it in your own scripts by adding it as a custom pipeline component to a `spacy`, `spacy-stanza` or `spacy-udpipe` pipeline. It also provides an easy-to-use function to quickly initialize any spaCy-wrapped parser. CoNLL-related properties are added to `Doc` elements, `Span` sentences, and `Token` objects.",
            "code_example": [
                "from spacy_conll import init_parser",
                "",
                "",
                "# Initialise English parser, already including the ConllFormatter as a pipeline component.",
                "# Indicate that we want to get the CoNLL headers in the string output.",
                "# `use_gpu` and `verbose` are specific to stanza. These keyword arguments are passed on to its Pipeline() initialisation",
                "nlp = init_parser(\"en\",",
                "                  \"stanza\",",
                "                  parser_opts={\"use_gpu\": True, \"verbose\": False},",
                "                  include_headers=True)",
                "# Parse a given string",
                "doc = nlp(\"A cookie is a baked or cooked food that is typically small, flat and sweet. It usually contains flour, sugar and some type of oil or fat.\")",
                "",
                "# Get the CoNLL representation of the whole document, including headers",
                "conll = doc._.conll_str",
                "print(conll)"
            ],
            "code_language": "python",
            "author": "Bram Vanroy",
            "author_links": {
                "github": "BramVanroy",
                "twitter": "BramVanroy",
                "website": "http://bramvanroy.be"
            },
            "github": "BramVanroy/spacy_conll",
            "category": ["standalone", "pipeline"],
            "tags": ["linguistics", "computational linguistics", "conll", "conll-u"]
        },
        {
            "id": "ludwig",
            "title": "Ludwig",
            "slogan": "A code-free deep learning toolbox",
            "description": "Ludwig makes it easy to build deep learning models for many applications, including NLP ones. It uses spaCy for tokenizing text in different languages.",
            "pip": "ludwig",
            "github": "uber/ludwig",
            "thumb": "https://i.imgur.com/j1sORgD.png",
            "url": "http://ludwig.ai",
            "author": "Piero Molino @ Uber AI",
            "author_links": {
                "github": "w4nderlust",
                "twitter": "w4nderlus7",
                "website": "http://w4nderlu.st"
            },
            "category": ["standalone", "research"]
        },
        {
            "id": "pic2phrase_bot",
            "title": "pic2phrase_bot: Photo Description Generator",
            "slogan": "A bot that generates descriptions for submitted photos, in a human-like manner.",
            "description": "pic2phrase_bot runs inside Telegram messenger and can be used to generate a phrase describing a submitted photo, employing computer vision, web scraping, and syntactic dependency analysis powered by spaCy.",
            "thumb": "https://i.imgur.com/ggVI02O.jpg",
            "image": "https://i.imgur.com/z1yhWQR.jpg",
            "url": "https://telegram.me/pic2phrase_bot",
            "author": "Yuli Vasiliev",
            "author_links": {
                "twitter": "VasilievYuli"
            },
            "category": ["standalone", "conversational"]
        },
        {
            "id": "pyInflect",
            "slogan": "A Python module for word inflections",
            "description": "This package uses the [spaCy 2.0 extensions](https://spacy.io/usage/processing-pipelines#extensions) to add word inflections to the system.",
            "github": "bjascob/pyInflect",
            "pip": "pyinflect",
            "code_example": [
                "import spacy",
                "import pyinflect",
                "",
                "nlp = spacy.load('en_core_web_sm')",
                "doc = nlp('This is an example.')",
                "doc[3].tag_                # NN",
                "doc[3]._.inflect('NNS')    # examples"
            ],
            "author": "Brad Jascob",
            "author_links": {
                "github": "bjascob"
            },
            "category": ["pipeline"],
            "tags": ["inflection"]
        },
        {
            "id": "lemminflect",
            "slogan": "A Python module for English lemmatization and inflection",
            "description": "LemmInflect uses a dictionary approach to lemmatize English words and inflect them into forms specified by a user-supplied [Universal Dependencies](https://universaldependencies.org/u/pos/) or [Penn Treebank](https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html) tag. The library works with out-of-vocabulary (OOV) words by applying neural network techniques to classify word forms and choose the appropriate morphing rules. The system acts as a standalone module or as an extension to spaCy.",
            "github": "bjascob/LemmInflect",
            "pip": "lemminflect",
            "thumb": "https://raw.githubusercontent.com/bjascob/LemmInflect/master/docs/img/icons8-citrus-80.png",
            "code_example": [
                "import spacy",
                "import lemminflect",
                "",
                "nlp = spacy.load('en_core_web_sm')",
                "doc = nlp('I am testing this example.')",
                "doc[2]._.lemma()         # 'test'",
                "doc[4]._.inflect('NNS')  # 'examples'"
            ],
            "author": "Brad Jascob",
            "author_links": {
                "github": "bjascob"
            },
            "category": ["pipeline"],
            "tags": ["inflection", "lemmatizer"]
        },
        {
            "id": "amrlib",
            "slogan": "A Python library that makes AMR parsing, generation and visualization simple.",
            "description": "amrlib is a Python module and spaCy add-in for Abstract Meaning Representation (AMR). The system can parse sentences to AMR graphs or generate text from existing graphs. It includes a GUI for visualization and experimentation.",
            "github": "bjascob/amrlib",
            "pip": "amrlib",
            "code_example": [
                "import spacy",
                "import amrlib",
                "amrlib.setup_spacy_extension()",
                "nlp = spacy.load('en_core_web_sm')",
                "doc = nlp('This is a test of the spaCy extension. The test has multiple sentences.')",
                "graphs = doc._.to_amr()",
                "for graph in graphs:",
                "    print(graph)"
            ],
            "author": "Brad Jascob",
            "author_links": {
                "github": "bjascob"
            },
            "category": ["pipeline"]
        },
        {
            "id": "classyclassification",
            "title": "Classy Classification",
            "slogan": "Have you ever struggled with needing a spaCy TextCategorizer but didn't have the time to train one from scratch? Classy Classification is the way to go!",
            "description": "Have you ever struggled with needing a [spaCy TextCategorizer](https://spacy.io/api/textcategorizer) but didn't have the time to train one from scratch? Classy Classification is the way to go! For few-shot classification using [sentence-transformers](https://github.com/UKPLab/sentence-transformers) or [spaCy models](https://spacy.io/usage/models), provide a dictionary with labels and examples, or just provide a list of labels for zero-shot classification with [Huggingface zero-shot classifiers](https://huggingface.co/models?pipeline_tag=zero-shot-classification).",
            "github": "davidberenstein1957/classy-classification",
            "pip": "classy-classification",
            "thumb": "https://raw.githubusercontent.com/davidberenstein1957/classy-classification/master/logo.png",
            "code_example": [
                "import spacy",
                "",
                "data = {",
                "    \"furniture\": [\"This text is about chairs.\",",
                "               \"Couches, benches and televisions.\",",
                "               \"I really need to get a new sofa.\"],",
                "    \"kitchen\": [\"There also exist things like fridges.\",",
                "                \"I hope to be getting a new stove today.\",",
                "                \"Do you also have some ovens.\"]",
                "}",
                "",
                "# see the GitHub repo for examples with sentence-transformers and Huggingface",
                "nlp = spacy.load('en_core_web_md')",
                "nlp.add_pipe(\"classy_classification\", ",
                "    config={",
                "        \"data\": data,",
                "        \"model\": \"spacy\"",
                "    }",
                ")",
                "",
                "print(nlp(\"I am looking for kitchen appliances.\")._.cats)",
                "# Output:",
                "#",
                "# [{\"label\": \"furniture\", \"score\": 0.21}, {\"label\": \"kitchen\", \"score\": 0.79}]"
            ],
            "author": "David Berenstein",
            "author_links": {
                "github": "davidberenstein1957",
                "website": "https://www.linkedin.com/in/david-berenstein-1bab11105/"
            },
            "category": ["pipeline", "standalone"],
            "tags": [
                "classification",
                "zero-shot",
                "few-shot",
                "sentence-transformers",
                "huggingface"
            ],
            "spacy_version": 3
        },
        {
            "id": "conciseconcepts",
            "title": "Concise Concepts",
            "slogan": "Concise Concepts uses few-shot NER based on word embedding similarity to get you going with ease!",
            "description": "When wanting to apply NER to concise concepts, it is really easy to come up with examples, but it takes some effort to train an entire pipeline. Concise Concepts uses few-shot NER based on word embedding similarity to get you going with ease!",
            "github": "davidberenstein1957/concise-concepts",
            "pip": "concise-concepts",
            "thumb": "https://raw.githubusercontent.com/davidberenstein1957/concise-concepts/master/img/logo.png",
            "image": "https://raw.githubusercontent.com/davidberenstein1957/concise-concepts/master/img/example.png",
            "code_example": [
                "import spacy",
                "from spacy import displacy",
                "",
                "data = {",
                "    \"fruit\": [\"apple\", \"pear\", \"orange\"],",
                "    \"vegetable\": [\"broccoli\", \"spinach\", \"tomato\"],",
                "    \"meat\": [\"beef\", \"pork\", \"fish\", \"lamb\"]",
                "}",
                "",
                "text = \"\"\"",
                "    Heat the oil in a large pan and add the Onion, celery and carrots.",
                "    Then, cook over a medium–low heat for 10 minutes, or until softened.",
                "    Add the courgette, garlic, red peppers and oregano and cook for 2–3 minutes.",
                "    Later, add some oranges and chickens.\"\"\"",
                "",
                "# use any model that has internal spacy embeddings",
                "nlp = spacy.load('en_core_web_lg')",
                "nlp.add_pipe(\"concise_concepts\", ",
                "    config={\"data\": data}",
                ")",
                "doc = nlp(text)",
                "",
                "options = {\"colors\": {\"fruit\": \"darkorange\", \"vegetable\": \"limegreen\", \"meat\": \"salmon\"},",
                "           \"ents\": [\"fruit\", \"vegetable\", \"meat\"]}",
                "",
                "displacy.render(doc, style=\"ent\", options=options)"
            ],
            "author": "David Berenstein",
            "author_links": {
                "github": "davidberenstein1957",
                "website": "https://www.linkedin.com/in/david-berenstein-1bab11105/"
            },
            "category": ["pipeline"],
            "tags": ["ner", "few-shot", "gensim"],
            "spacy_version": 3
        },
        {
            "id": "crosslingualcoreference",
            "title": "Crosslingual Coreference",
            "slogan": "One multi-lingual coreference model to rule them all!",
            "description": "Coreference is amazing, but the data required for training a model is very scarce. In our case, the available training data for non-English languages also proved to be poorly annotated. Crosslingual Coreference therefore assumes that a model trained on English data with cross-lingual embeddings should work for other languages with a similar sentence structure. Verified to work quite well for at least EN, NL, DK, FR and DE.",
            "github": "davidberenstein1957/crosslingual-coreference",
            "pip": "crosslingual-coreference",
            "thumb": "https://raw.githubusercontent.com/davidberenstein1957/crosslingual-coreference/master/img/logo.png",
            "image": "https://raw.githubusercontent.com/davidberenstein1957/crosslingual-coreference/master/img/example_total.png",
            "code_example": [
                "import spacy",
                "import crosslingual_coreference",
                "",
                "text = \"\"\"",
                "    Do not forget about Momofuku Ando!",
                "    He created instant noodles in Osaka.",
                "    At that location, Nissin was founded.",
                "    Many students survived by eating these noodles, but they don't even know him.\"\"\"",
                "",
                "# load a spaCy pipeline and add the \"xx_coref\" component",
                "nlp = spacy.load('en_core_web_sm')",
                "nlp.add_pipe(",
                "    \"xx_coref\", config={\"chunk_size\": 2500, \"chunk_overlap\": 2, \"device\": 0}",
                ")",
                "",
                "doc = nlp(text)",
                "",
                "print(doc._.coref_clusters)",
                "# Output",
                "#",
                "# [[[4, 5], [7, 7], [27, 27], [36, 36]],",
                "# [[12, 12], [15, 16]],",
                "# [[9, 10], [27, 28]],",
                "# [[22, 23], [31, 31]]]",
                "print(doc._.resolved_text)",
                "# Output",
                "#",
                "# Do not forget about Momofuku Ando!",
                "# Momofuku Ando created instant noodles in Osaka.",
                "# At Osaka, Nissin was founded.",
                "# Many students survived by eating instant noodles,",
                "# but Many students don't even know Momofuku Ando."
            ],
            "author": "David Berenstein",
            "author_links": {
                "github": "davidberenstein1957",
                "website": "https://www.linkedin.com/in/david-berenstein-1bab11105/"
            },
            "category": ["pipeline", "standalone"],
            "tags": ["coreference", "multi-lingual", "cross-lingual", "allennlp"],
            "spacy_version": 3
        },
        {
            "id": "adeptaugmentations",
            "title": "Adept Augmentations",
            "slogan": "A Python library aimed at dissecting and augmenting NER training data for few-shot scenarios.",
            "description": "EntitySwapAugmenter takes either a `datasets.Dataset` or a `spacy.tokens.DocBin`. Optionally, you can also provide a set of labels. It first builds a knowledge base of entities grouped by label. When running `augmenter.augment()` for N runs, it then creates N new sentences by randomly swapping the original entities with entities of the same label from the knowledge base.\n\nFor example, assume that we have a knowledge base for `PERSONS`, `LOCATIONS` and `PRODUCTS`. We can then create additional data for the sentence \"Momofuku Ando created instant noodles in Osaka.\" using `augmenter.augment(N=2)`, resulting in \"David created instant noodles in Madrid.\" or \"Tom created Adept Augmentations in the Netherlands.\"",
            "github": "argilla-io/adept-augmentations",
            "pip": "adept-augmentations",
            "thumb": "https://raw.githubusercontent.com/argilla-io/adept-augmentations/main/logo.png",
            "code_example": [
                "from adept_augmentations import EntitySwapAugmenter",
                "import spacy",
                "from spacy.tokens import Doc, DocBin",
                "nlp = spacy.blank(\"en\")",
                "",
                "# Create some example golden data",
                "example_data = [",
                "    (\"Apple is looking at buying U.K. startup for $1 billion\", [(0, 5, \"ORG\"), (27, 31, \"LOC\"), (44, 54, \"MONEY\")]),",
                "    (\"Microsoft acquires GitHub for $7.5 billion\", [(0, 9, \"ORG\"), (19, 25, \"ORG\"), (30, 42, \"MONEY\")]),",
                "]",
                "",
                "# Create a new DocBin",
                "docs = []",
                "for entry in example_data:",
                "    doc = Doc(nlp.vocab, words=entry[0].split())",
                "    doc.ents = [doc.char_span(ent[0], ent[1], label=ent[2]) for ent in entry[1]]",
                "    docs.append(doc)",
                "golden_dataset = DocBin(docs=docs)",
                "",
                "# Augment Data",
                "augmented_dataset = EntitySwapAugmenter(golden_dataset).augment(4)",
                "for doc in augmented_dataset.get_docs(nlp.vocab):",
                "    print(doc.text)",
                "",
                "# GitHub is looking at buying U.K. startup for $ 7.5 billion",
                "# Microsoft is looking at buying U.K. startup for $ 1 billion",
                "# Microsoft is looking at buying U.K. startup for $ 7.5 billion",
                "# GitHub is looking at buying U.K. startup for $ 1 billion",
                "# Microsoft acquires Apple for $ 7.5 billion",
                "# Apple acquires Microsoft for $ 1 billion",
                "# Microsoft acquires Microsoft for $ 7.5 billion",
                "# GitHub acquires GitHub for $ 1 billion"
            ],
            "author": "David Berenstein",
            "author_links": {
                "github": "davidberenstein1957",
                "website": "https://www.linkedin.com/in/david-berenstein-1bab11105/"
            },
            "category": ["standalone"],
            "tags": ["ner", "few-shot", "augmentation", "datasets", "training"],
            "spacy_version": 3
        },
        {
            "id": "spacysetfit",
            "title": "spaCy-SetFit",
            "slogan": "An easy and intuitive approach to using SetFit in combination with spaCy.",
            "description": "spaCy-SetFit is a Python library that extends spaCy's text categorization capabilities by incorporating SetFit for few-shot classification. It allows you to train a text categorizer using an intuitive dictionary.\n\nThe library integrates with spaCy's pipeline architecture, so the text categorizer component is easy to add and configure. You can provide a training dataset containing inlier and outlier examples, and spaCy-SetFit will use the paraphrase-MiniLM-L3-v2 model for training the text categorizer with SetFit. Once trained, you can use the categorizer to classify new text and obtain category probabilities.",
            "github": "davidberenstein1957/spacy-setfit",
            "pip": "spacy-setfit",
            "thumb": "https://raw.githubusercontent.com/davidberenstein1957/spacy-setfit/main/logo.png",
            "code_example": [
                "import spacy",
                "",
                "# Create some example data",
                "train_dataset = {",
                "    \"inlier\": [",
                "        \"Text about furniture\",",
                "        \"Couches, benches and televisions.\",",
                "        \"I really need to get a new sofa.\"",
                "    ],",
                "    \"outlier\": [",
                "        \"Text about kitchen equipment\",",
                "        \"This text is about politics\",",
                "        \"Comments about AI and stuff.\"",
                "    ]",
                "}",
                "",
                "# Load the spaCy language model:",
                "nlp = spacy.load(\"en_core_web_sm\")",
                "",
                "# Add the \"spacy_setfit\" pipeline component to the spaCy model, and configure it with SetFit parameters:",
                "nlp.add_pipe(\"spacy_setfit\", config={",
                "    \"pretrained_model_name_or_path\": \"paraphrase-MiniLM-L3-v2\",",
                "    \"setfit_trainer_args\": {",
                "        \"train_dataset\": train_dataset",
                "    }",
                "})",
                "doc = nlp(\"I really need to get a new sofa.\")",
                "doc.cats",
                "# {'inlier': 0.902350975129, 'outlier': 0.097649024871}"
            ],
            "author": "David Berenstein",
            "author_links": {
                "github": "davidberenstein1957",
                "website": "https://www.linkedin.com/in/david-berenstein-1bab11105/"
            },
            "category": ["pipeline"],
            "tags": ["few-shot", "SetFit", "training"],
            "spacy_version": 3
        },
        {
            "id": "blackstone",
            "title": "Blackstone",
            "slogan": "A spaCy pipeline and model for NLP on unstructured legal text",
            "description": "Blackstone is a spaCy model and library for processing long-form, unstructured legal text. Blackstone is an experimental research project from the [Incorporated Council of Law Reporting for England and Wales'](https://iclr.co.uk/) research lab, [ICLR&D](https://research.iclr.co.uk/).",
            "github": "ICLRandD/Blackstone",
            "pip": "blackstone",
            "thumb": "https://iclr.s3-eu-west-1.amazonaws.com/assets/iclrand/Blackstone/thumb.png",
            "url": "https://research.iclr.co.uk",
            "author": "ICLR&D",
            "author_links": {
                "github": "ICLRandD",
                "twitter": "ICLRanD",
                "website": "https://research.iclr.co.uk"
            },
            "category": ["scientific", "models", "research"]
        },
        {
            "id": "NGym",
            "title": "NeuralGym",
            "slogan": "A little Windows GUI for training models with spaCy",
            "description": "NeuralGym is a Python application for Windows with a graphical user interface to train models with spaCy. Run the application, select an output folder, a training data file in spaCy's data format, a spaCy model or blank model and press 'Start'.",
            "github": "d5555/NeuralGym",
            "url": "https://github.com/d5555/NeuralGym",
            "image": "https://github.com/d5555/NeuralGym/raw/master/NGym.png",
            "thumb": "https://github.com/d5555/NeuralGym/raw/master/NGym/web.png",
            "author": "d5555",
            "category": ["training"],
            "tags": ["windows"]
        },
        {
            "id": "holmes",
            "title": "Holmes",
            "slogan": "Information extraction from English and German texts based on predicate logic",
            "github": "explosion/holmes-extractor",
            "url": "https://github.com/explosion/holmes-extractor",
            "description": "Holmes is a Python 3 library that supports a number of use cases involving information extraction from English and German texts, including chatbot, structural extraction, topic matching and supervised document classification. There is a [website demonstrating intelligent search based on topic matching](https://holmes-demo.explosion.services).",
            "pip": "holmes-extractor",
            "category": ["pipeline", "standalone"],
            "tags": ["chatbots", "text-processing"],
            "thumb": "https://raw.githubusercontent.com/explosion/holmes-extractor/master/docs/holmes_thumbnail.png",
            "code_example": [
                "import holmes_extractor as holmes",
                "holmes_manager = holmes.Manager(model='en_core_web_lg')",
                "holmes_manager.register_search_phrase('A big dog chases a cat')",
                "holmes_manager.start_chatbot_mode_console()"
            ],
            "author": "Richard Paul Hudson",
            "author_links": {
                "github": "richardpaulhudson"
            }
        },
        {
            "id": "coreferee",
            "title": "Coreferee",
            "slogan": "Coreference resolution for multiple languages",
            "github": "explosion/coreferee",
            "url": "https://github.com/explosion/coreferee",
            "description": "Coreferee is a pipeline plugin that performs coreference resolution for English, French, German and Polish. It is designed so that it is easy to add support for new languages and optimised for limited training data. It uses a mixture of neural networks and programmed rules. Please note you will need to [install models](https://github.com/explosion/coreferee#getting-started) before running the code example.",
            "pip": "coreferee",
            "category": ["pipeline", "models", "standalone"],
            "tags": ["coreference-resolution", "anaphora"],
            "code_example": [
                "import coreferee, spacy",
                "nlp = spacy.load('en_core_web_trf')",
                "nlp.add_pipe('coreferee')",
                "doc = nlp('Although he was very busy with his work, Peter had had enough of it. He and his wife decided they needed a holiday. They travelled to Spain because they loved the country very much.')",
                "doc._.coref_chains.print()",
                "# Output:",
                "#",
                "# 0: he(1), his(6), Peter(9), He(16), his(18)",
                "# 1: work(7), it(14)",
                "# 2: [He(16); wife(19)], they(21), They(26), they(31)",
                "# 3: Spain(29), country(34)",
                "#",
                "print(doc._.coref_chains.resolve(doc[31]))",
                "# Output:",
                "#",
                "# [Peter, wife]"
            ],
            "author": "Richard Paul Hudson",
            "author_links": {
                "github": "richardpaulhudson"
            }
        },
        {
            "id": "spacy-transformers",
            "title": "spacy-transformers",
            "slogan": "spaCy pipelines for pretrained BERT, XLNet and GPT-2",
            "description": "This package provides spaCy model pipelines that wrap [Hugging Face's `transformers`](https://github.com/huggingface/transformers) package, so you can use them in spaCy. The result is convenient access to state-of-the-art transformer architectures, such as BERT, GPT-2, XLNet, etc.",
            "github": "explosion/spacy-transformers",
            "url": "https://explosion.ai/blog/spacy-transformers",
            "pip": "spacy-transformers",
            "category": ["pipeline", "models", "research"],
            "code_example": [
                "import spacy",
                "",
                "nlp = spacy.load(\"en_core_web_trf\")",
                "doc = nlp(\"Apple shares rose on the news. Apple pie is delicious.\")"
            ],
            "author": "Explosion",
            "author_links": {
                "twitter": "explosion_ai",
                "github": "explosion",
                "website": "https://explosion.ai"
            }
        },
        {
            "id": "spacy-huggingface-hub",
            "title": "spacy-huggingface-hub",
            "slogan": "Push your spaCy pipelines to the Hugging Face Hub",
            "description": "This package provides a CLI command for uploading any trained spaCy pipeline packaged with [`spacy package`](https://spacy.io/api/cli#package) to the [Hugging Face Hub](https://huggingface.co). It auto-generates all meta information for you, uploads a pretty README (requires spaCy v3.1+) and handles version control under the hood.",
            "github": "explosion/spacy-huggingface-hub",
            "thumb": "https://i.imgur.com/j6FO9O6.jpg",
            "url": "https://github.com/explosion/spacy-huggingface-hub",
            "pip": "spacy-huggingface-hub",
            "category": ["pipeline", "models"],
            "author": "Explosion",
            "author_links": {
                "twitter": "explosion_ai",
                "github": "explosion",
                "website": "https://explosion.ai"
            }
        },
        {
            "id": "spacy-clausie",
            "title": "spacy-clausie",
            "slogan": "Implementation of the ClausIE information extraction system for Python+spaCy",
            "github": "mmxgn/spacy-clausie",
            "url": "https://github.com/mmxgn/spacy-clausie",
            "description": "An implementation of ClausIE, a novel, clause-based approach to open information extraction, which extracts relations and their arguments from natural language text.",
            "category": ["pipeline", "scientific", "research"],
            "code_example": [
                "import spacy",
                "import claucy",
                "",
                "nlp = spacy.load(\"en_core_web_sm\")",
                "claucy.add_to_pipe(nlp)",
                "",
                "doc = nlp(\"AE died in Princeton in 1955.\")",
                "",
                "print(doc._.clauses)",
                "# Output:",
                "# <SV, AE, died, None, None, None, [in Princeton, in 1955]>",
                "",
                "propositions = doc._.clauses[0].to_propositions(as_text=True)",
                "",
                "print(propositions)",
                "# Output:",
                "# [AE died in Princeton in 1955, AE died in 1955, AE died in Princeton"
            ],
            "author": "Emmanouil Theofanis Chourdakis",
            "author_links": {
                "github": "mmxgn"
            }
        },
        {
            "id": "ipymarkup",
            "slogan": "NER, syntax markup visualizations",
            "description": "Collection of NLP visualizations for NER and syntax tree markup. Similar to [displaCy](https://explosion.ai/demos/displacy) and [displaCy ENT](https://explosion.ai/demos/displacy-ent).",
            "github": "natasha/ipymarkup",
            "image": "https://github.com/natasha/ipymarkup/blob/master/table.png?raw=true",
            "pip": "ipymarkup",
            "code_example": [
                "from ipymarkup import show_span_ascii_markup, show_dep_ascii_markup",
                "",
                "text = 'В мероприятии примут участие не только российские учёные, но и зарубежные исследователи, в том числе, Крис Хелмбрехт - управляющий директор и совладелец креативного агентства Kollektiv (Германия, США), Ннека Угбома - руководитель проекта Mushroom works (Великобритания), Гергей Ковач - политик и лидер субкультурной партии «Dog with two tails» (Венгрия), Георг Жено - немецкий режиссёр, один из создателей экспериментального театра «Театр.doc», Театра им. Йозефа Бойса (Германия).'",
                "spans = [(102, 116, 'PER'), (186, 194, 'LOC'), (196, 199, 'LOC'), (202, 214, 'PER'), (254, 268, 'LOC'), (271, 283, 'PER'), (324, 342, 'ORG'), (345, 352, 'LOC'), (355, 365, 'PER'), (445, 455, 'ORG'), (456, 468, 'PER'), (470, 478, 'LOC')]",
                "show_span_ascii_markup(text, spans)"
            ],
            "author": "Alexander Kukushkin",
            "author_links": {
                "github": "kuk"
            },
            "category": ["visualizers"]
        },
        {
            "id": "negspacy",
            "title": "negspaCy",
            "slogan": "spaCy pipeline object for negating concepts in text based on the NegEx algorithm.",
            "github": "jenojp/negspacy",
            "url": "https://github.com/jenojp/negspacy",
            "description": "negspacy is a spaCy pipeline component that evaluates whether Named Entities are negated in text. It adds an extension to 'Span' objects.",
            "pip": "negspacy",
            "category": ["pipeline", "scientific"],
            "tags": ["negation", "text-processing"],
            "thumb": "https://github.com/jenojp/negspacy/blob/master/docs/thumb.png?raw=true",
            "image": "https://github.com/jenojp/negspacy/blob/master/docs/icon.png?raw=true",
            "code_example": [
                "import spacy",
                "from negspacy.negation import Negex",
                "",
                "nlp = spacy.load(\"en_core_web_sm\")",
                "nlp.add_pipe(\"negex\", config={\"ent_types\":[\"PERSON\",\"ORG\"]})",
                "",
                "doc = nlp(\"She does not like Steve Jobs but likes Apple products.\")",
                "for e in doc.ents:",
                "    print(e.text, e._.negex)"
            ],
            "author": "Jeno Pizarro",
            "author_links": {
                "github": "jenojp",
                "twitter": "jenojp"
            }
        },
        {
            "id": "ronec",
            "title": "RONEC - Romanian Named Entity Corpus",
            "slogan": "Named Entity Recognition corpus for Romanian language.",
            "github": "dumitrescustefan/ronec",
            "url": "https://github.com/dumitrescustefan/ronec",
            "description": "The corpus holds 5127 sentences annotated with 16 classes, with a total of 26376 annotated entities. The corpus comes in two formats: BRAT and CONLLUP.",
            "category": ["standalone", "models"],
            "tags": ["ner", "romanian"],
            "thumb": "https://raw.githubusercontent.com/dumitrescustefan/ronec/master/res/thumb.png",
            "code_example": [
                "# to train a new model on ronec",
                "python3 convert_spacy.py ronec/conllup/ronec.conllup output",
                "python3 -m spacy train ro models output/train_ronec.json output/train_ronec.json -p ent",
                "",
                "# download the Romanian NER model",
                "python -m spacy download ro_ner",
                "",
                "# load the model and print entities for a simple sentence",
                "import spacy",
                "",
                "nlp = spacy.load(\"ro_ner\")",
                "doc = nlp(\"Popescu Ion a fost la Cluj\")",
                "",
                "for ent in doc.ents:",
                "    print(ent.text, ent.start_char, ent.end_char, ent.label_)"
            ],
            "author": "Stefan Daniel Dumitrescu, Andrei-Marius Avram"
        },
        {
            "id": "Healthsea",
            "title": "Healthsea",
            "slogan": "Healthsea: an end-to-end spaCy pipeline for exploring health supplement effects",
            "description": "This spaCy project trains an NER model and a custom Text Classification model with Clause Segmentation and Blinding capabilities to analyze supplement reviews and their potential effects on health.",
            "github": "explosion/healthsea",
            "thumb": "https://github.com/explosion/healthsea/blob/main/img/Jellyfish.png?raw=true",
            "category": ["pipeline", "research"],
            "code_example": [
                "import spacy",
                "",
                "nlp = spacy.load(\"en_healthsea\")",
                "doc = nlp(\"This is great for joint pain.\")",
                "",
                "# Clause Segmentation & Blinding",
                "print(doc._.clauses)",
                "",
                "# Output:",
                "# {",
                "#     \"split_indices\": [0, 7],",
                "#     \"has_ent\": true,",
                "#     \"ent_indices\": [4, 6],",
                "#     \"blinder\": \"_CONDITION_\",",
                "#     \"ent_name\": \"joint pain\",",
                "#     \"cats\": {",
                "#         \"POSITIVE\": 0.9824668169021606,",
                "#         \"NEUTRAL\": 0.017364952713251114,",
                "#         \"NEGATIVE\": 0.00002889777533710003,",
                "#         \"ANAMNESIS\": 0.0001394189748680219",
                "#     },",
                "#     \"prediction_text\": [\"This\", \"is\", \"great\", \"for\", \"_CONDITION_\", \"!\"]",
                "# }",
                "",
                "# Aggregated results",
                "# {",
                "#     \"joint_pain\": {",
                "#         \"effects\": [\"POSITIVE\"],",
                "#         \"effect\": \"POSITIVE\",",
                "#         \"label\": \"CONDITION\",",
                "#         \"text\": \"joint pain\"",
                "#     }",
                "# }"
            ],
            "author": "Edward Schmuhl",
            "author_links": {
                "github": "thomashacker",
                "twitter": "aestheticedwar1",
                "website": "https://explosion.ai/"
            }
        },
        {
            "id": "presidio",
            "title": "Presidio",
            "slogan": "Context aware, pluggable and customizable data protection and PII data anonymization",
            "description": "Presidio *(Origin from Latin praesidium ‘protection, garrison’)* helps to ensure sensitive text is properly managed and governed. It provides fast ***analytics*** and ***anonymization*** for sensitive text such as credit card numbers, names, locations, social security numbers, bitcoin wallets, US phone numbers and financial data. Presidio analyzes the text using predefined or custom recognizers to identify entities, patterns, formats, and checksums with relevant context.",
            "url": "https://aka.ms/presidio",
            "image": "https://raw.githubusercontent.com/microsoft/presidio/master/docs/assets/before-after.png",
            "github": "microsoft/presidio",
            "category": ["standalone"],
            "thumb": "https://avatars0.githubusercontent.com/u/6154722",
            "author": "Microsoft",
            "author_links": {
                "github": "microsoft"
            }
        },
        {
            "id": "presidio-research",
            "title": "Presidio Research",
            "slogan": "Toolbox for developing and evaluating PII detectors, NER models for PII and generating fake PII data",
            "description": "This package features data-science-related tasks for developing new recognizers for Microsoft Presidio. It can be used to evaluate the entire system, as well as specific PII recognizers or PII detection models. Anyone interested in evaluating an existing Microsoft Presidio instance or a specific PII recognizer, or in developing new models or logic for detecting PII, can leverage the existing work in this package. Additionally, anyone interested in generating new data based on previous datasets (e.g. to increase the coverage of entity values) for Named Entity Recognition models can leverage the data generator contained in this package.",
            "url": "https://aka.ms/presidio-research",
            "github": "microsoft/presidio-research",
            "category": ["standalone"],
            "thumb": "https://avatars0.githubusercontent.com/u/6154722",
            "author": "Microsoft",
            "author_links": {
                "github": "microsoft"
            }
        },
        {
            "id": "python-sentence-boundary-disambiguation",
            "title": "pySBD - Python Sentence Boundary Disambiguation",
            "slogan": "Rule-based sentence boundary detection that works out-of-the-box",
            "github": "nipunsadvilkar/pySBD",
            "description": "pySBD is a 'real-world' sentence segmenter that extracts reasonable sentences when the format and domain of the input text are unknown. It is a rule-based algorithm based on [The Golden Rules](https://s3.amazonaws.com/tm-town-nlp-resources/golden_rules.txt), a set of tests developed by the [TM-Town](https://www.tm-town.com/) dev team to check a segmenter's accuracy on edge-case scenarios. pySBD is a Python port of the Ruby gem [Pragmatic Segmenter](https://github.com/diasks2/pragmatic_segmenter).",
            "pip": "pysbd",
            "category": ["scientific"],
            "tags": ["sentence segmentation"],
            "code_example": [
                "import spacy",
                "from pysbd.utils import PySBDFactory",
                "",
                "nlp = spacy.blank('en')",
                "# Caution: works with spaCy<=2.x.x",
                "nlp.add_pipe(PySBDFactory(nlp))",
                "",
                "doc = nlp('My name is Jonas E. Smith. Please turn to p. 55.')",
                "print(list(doc.sents))",
                "# [My name is Jonas E. Smith., Please turn to p. 55.]"
            ],
            "author": "Nipun Sadvilkar",
            "author_links": {
                "twitter": "nipunsadvilkar",
                "github": "nipunsadvilkar",
                "website": "https://nipunsadvilkar.github.io"
            }
        },
        {
            "id": "cookiecutter-spacy-fastapi",
            "title": "cookiecutter-spacy-fastapi",
            "slogan": "Docker-based cookiecutter for easy spaCy APIs using FastAPI",
            "description": "Docker-based cookiecutter for easy spaCy APIs using FastAPI. The default endpoints expect batch requests with a list of Records in the Azure Search Cognitive Skill format, so out of the box, this cookiecutter can be set up as a Custom Cognitive Skill. For more on Azure Search and Cognitive Skills, [see this page](https://docs.microsoft.com/en-us/azure/search/cognitive-search-custom-skill-interface).",
            "url": "https://github.com/microsoft/cookiecutter-spacy-fastapi",
            "image": "https://raw.githubusercontent.com/microsoft/cookiecutter-spacy-fastapi/master/images/cookiecutter-docs.png",
            "github": "microsoft/cookiecutter-spacy-fastapi",
            "category": ["apis"],
            "thumb": "https://avatars0.githubusercontent.com/u/6154722",
            "author": "Microsoft",
            "author_links": {
                "github": "microsoft"
            }
        },
        {
            "id": "dframcy",
            "title": "Dframcy",
            "slogan": "Dataframe Integration with spaCy NLP",
            "github": "yash1994/dframcy",
            "description": "DframCy is a lightweight utility module for integrating Pandas DataFrames with spaCy's linguistic annotation and training tasks.",
            "pip": "dframcy",
            "category": ["pipeline", "training"],
            "tags": ["pandas"],
            "code_example": [
                "import spacy",
                "from dframcy import DframCy",
                "",
                "nlp = spacy.load('en_core_web_sm')",
                "dframcy = DframCy(nlp)",
                "doc = dframcy.nlp('Apple is looking at buying U.K. startup for $1 billion')",
                "annotation_dataframe = dframcy.to_dataframe(doc)"
            ],
            "author": "Yash Patadia",
            "author_links": {
                "twitter": "PatadiaYash",
                "github": "yash1994"
            }
        },
        {
            "id": "spacy-pytextrank",
            "title": "PyTextRank",
            "slogan": "Python implementation of TextRank for lightweight phrase extraction",
            "description": "An implementation of TextRank in Python for use in spaCy pipelines, which provides fast, effective phrase extraction from texts, along with extractive summarization. The graph algorithm works independently of any specific natural language and does not require domain knowledge. See [Mihalcea 2004](https://web.eecs.umich.edu/~mihalcea/papers/mihalcea.emnlp04.pdf).",
            "github": "DerwenAI/pytextrank",
            "pip": "pytextrank",
            "code_example": [
                "import spacy",
                "import pytextrank",
                "",
                "# example text",
                "text = \"\"\"Compatibility of systems of linear constraints over the set of natural numbers.",
                "Criteria of compatibility of a system of linear Diophantine equations, strict inequations,",
                "and nonstrict inequations are considered. Upper bounds for components of a minimal set of",
                "solutions and algorithms of construction of minimal generating sets of solutions for all types",
                "of systems are given. These criteria and the corresponding algorithms for constructing a minimal",
                "supporting set of solutions can be used in solving all the considered types systems and systems of mixed types.\"\"\"",
                "",
                "# load a spaCy model, depending on language, scale, etc.",
                "nlp = spacy.load(\"en_core_web_sm\")",
                "# add PyTextRank to the spaCy pipeline",
                "nlp.add_pipe(\"textrank\")",
                "",
                "doc = nlp(text)",
                "# examine the top-ranked phrases in the document",
                "for phrase in doc._.phrases:",
                "    print(phrase.text)",
                "    print(phrase.rank, phrase.count)",
                "    print(phrase.chunks)"
            ],
            "code_language": "python",
            "url": "https://github.com/DerwenAI/pytextrank/wiki",
            "thumb": "https://memegenerator.net/img/instances/66942896.jpg",
            "image": "https://memegenerator.net/img/instances/66942896.jpg",
            "author": "Paco Nathan",
            "author_links": {
                "twitter": "pacoid",
                "github": "ceteri",
                "website": "https://derwen.ai/paco"
            },
            "category": ["pipeline"],
            "tags": ["phrase extraction", "ner", "summarization", "graph algorithms", "textrank"]
        },
        {
            "id": "spacy_syllables",
            "title": "Spacy Syllables",
            "slogan": "Multilingual syllable annotations",
            "description": "Spacy Syllables is a pipeline component that adds multilingual syllable annotations to Tokens. It uses Pyphen under the hood and has support for a long list of languages.",
            "github": "sloev/spacy-syllables",
            "pip": "spacy_syllables",
            "code_example": [
                "import spacy",
                "from spacy_syllables import SpacySyllables",
                "",
                "nlp = spacy.load(\"en_core_web_sm\")",
                "nlp.add_pipe(\"syllables\", after=\"tagger\")",
                "",
                "assert nlp.pipe_names == [\"tok2vec\", \"tagger\", \"syllables\", \"parser\", \"attribute_ruler\", \"lemmatizer\", \"ner\"]",
                "doc = nlp(\"terribly long\")",
                "data = [(token.text, token._.syllables, token._.syllables_count) for token in doc]",
                "assert data == [(\"terribly\", [\"ter\", \"ri\", \"bly\"], 3), (\"long\", [\"long\"], 1)]"
            ],
            "thumb": "https://raw.githubusercontent.com/sloev/spacy-syllables/master/logo.png",
            "author": "Johannes Valbjørn",
            "author_links": {
                "github": "sloev"
            },
            "category": ["pipeline"],
            "tags": ["syllables", "multilingual"]
        },
        {
            "id": "sentimental-onix",
            "title": "Sentimental Onix",
            "slogan": "Use ONNX for sentiment models",
            "description": "spaCy pipeline component for sentiment analysis using ONNX.",
            "github": "sloev/sentimental-onix",
            "pip": "sentimental-onix",
            "code_example": [
                "# Download model:",
                "#   python -m sentimental_onix download en",
                "import spacy",
                "from sentimental_onix import pipeline",
                "",
                "nlp = spacy.load(\"en_core_web_sm\")",
                "nlp.add_pipe(\"sentencizer\")",
                "nlp.add_pipe(\"sentimental_onix\", after=\"sentencizer\")",
                "",
                "sentences = [",
                "    (sent.text, sent._.sentiment)",
                "    for doc in nlp.pipe(",
                "        [",
                "            \"i hate pasta on tuesdays\",",
                "            \"i like movies on wednesdays\",",
                "            \"i find your argument ridiculous\",",
                "            \"soda with straws are my favorite\",",
                "        ]",
                "    )",
                "    for sent in doc.sents",
                "]",
                "",
                "assert sentences == [",
                "    (\"i hate pasta on tuesdays\", \"Negative\"),",
                "    (\"i like movies on wednesdays\", \"Positive\"),",
                "    (\"i find your argument ridiculous\", \"Negative\"),",
                "    (\"soda with straws are my favorite\", \"Positive\"),",
                "]"
            ],
            "thumb": "https://raw.githubusercontent.com/sloev/sentimental-onix/master/.github/onix.webp",
            "author": "Johannes Valbjørn",
            "author_links": {
                "github": "sloev"
            },
            "category": ["pipeline"],
            "tags": ["sentiment", "english"]
        },
        {
            "id": "gobbli",
            "title": "gobbli",
            "slogan": "Deep learning for text classification doesn't have to be scary",
            "description": "gobbli is a Python library that wraps several modern deep learning models in a uniform interface, making it easy to evaluate feasibility and conduct analyses. It uses Docker to hide nearly all dependency management and functional differences between models from the user. It also contains an interactive app for exploring text data and evaluating classification models. spaCy's base text classification models, as well as models integrated from `spacy-transformers`, are available in the collection of classification models. In addition, spaCy is used for data augmentation and document embeddings.",
            "url": "https://github.com/rtiinternational/gobbli",
            "github": "rtiinternational/gobbli",
            "pip": "gobbli",
            "thumb": "https://i.postimg.cc/NGpzhrdr/gobbli-lg.png",
            "code_example": [
                "from gobbli.io import PredictInput, TrainInput",
                "from gobbli.model.bert import BERT",
                "",
                "train_input = TrainInput(",
                "    X_train=['This is a training document.', 'This is another training document.'],",
                "    y_train=['0', '1'],",
                "    X_valid=['This is a validation sentence.', 'This is another validation sentence.'],",
                "    y_valid=['1', '0'],",
                ")",
                "",
                "clf = BERT()",
                "",
                "# Set up classifier resources -- Docker image, etc.",
                "clf.build()",
                "",
                "# Train model",
                "train_output = clf.train(train_input)",
                "",
                "predict_input = PredictInput(",
                "    X=['Which class is this document?'],",
                "    labels=train_output.labels,",
                "    checkpoint=train_output.checkpoint,",
                ")",
                "",
                "predict_output = clf.predict(predict_input)"
            ],
            "category": ["standalone"]
        },
        {
            "id": "spacy_fastlang",
            "title": "Spacy FastLang",
            "slogan": "Language detection done fast",
            "description": "Fast language detection using fastText and spaCy.",
            "github": "thomasthiebaud/spacy-fastlang",
            "pip": "spacy_fastlang",
            "code_example": [
                "import spacy",
                "import spacy_fastlang",
                "",
                "nlp = spacy.load(\"en_core_web_sm\")",
                "nlp.add_pipe(\"language_detector\")",
                "doc = nlp('Life is like a box of chocolates. You never know what you are gonna get.')",
                "",
                "assert doc._.language == 'en'",
                "assert doc._.language_score >= 0.8"
            ],
            "author": "Thomas Thiebaud",
            "author_links": {
                "github": "thomasthiebaud"
            },
            "category": ["pipeline"]
        },
        {
            "id": "mlflow",
            "title": "MLflow",
            "slogan": "An open source platform for the machine learning lifecycle",
            "description": "MLflow is an open source platform to manage the ML lifecycle, including experimentation, reproducibility, deployment, and a central model registry. MLflow currently offers four components: Tracking, Projects, Models and Registry.",
            "github": "mlflow/mlflow",
            "pip": "mlflow",
            "thumb": "https://www.mlflow.org/docs/latest/_static/MLflow-logo-final-black.png",
            "image": "",
            "url": "https://mlflow.org/",
            "author": "Databricks",
            "author_links": {
                "github": "databricks",
                "twitter": "databricks",
                "website": "https://databricks.com/"
            },
            "category": ["standalone", "apis"],
            "code_example": [
                "import mlflow",
                "import mlflow.spacy",
                "import spacy",
                "",
                "# MLflow Tracking",
                "nlp = spacy.load('my_best_model_path/output/model-best')",
                "with mlflow.start_run(run_name='Spacy'):",
                "    mlflow.set_tag('model_flavor', 'spacy')",
                "    mlflow.spacy.log_model(spacy_model=nlp, artifact_path='model')",
                "    mlflow.log_metric('accuracy', 0.72)",
                "    my_run_id = mlflow.active_run().info.run_id",
                "",
                "",
                "# MLflow Models",
                "model_uri = f'runs:/{my_run_id}/model'",
                "nlp2 = mlflow.spacy.load_model(model_uri=model_uri)"
            ]
        },
        {
            "id": "pyate",
            "title": "PyATE",
            "slogan": "Python Automated Term Extraction",
            "description": "PyATE is a term extraction library written in Python that uses spaCy POS tagging and implements the Basic, Combo Basic, C-Value, TermExtractor, and Weirdness algorithms.",
            "github": "kevinlu1248/pyate",
            "pip": "pyate",
            "code_example": [
                "import spacy",
                "import pyate",
                "",
                "nlp = spacy.load('en_core_web_sm')",
                "nlp.add_pipe(\"combo_basic\") # or any of `basic`, `weirdness`, `term_extractor` or `cvalue`",
                "# source: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1994795/",
                "string = 'Central to the development of cancer are genetic changes that endow these “cancer cells” with many of the hallmarks of cancer, such as self-sufficient growth and resistance to anti-growth and pro-death signals. However, while the genetic changes that occur within cancer cells themselves, such as activated oncogenes or dysfunctional tumor suppressors, are responsible for many aspects of cancer development, they are not sufficient. Tumor promotion and progression are dependent on ancillary processes provided by cells of the tumor environment but that are not necessarily cancerous themselves. Inflammation has long been associated with the development of cancer. This review will discuss the reflexive relationship between cancer and inflammation with particular focus on how considering the role of inflammation in physiologic processes such as the maintenance of tissue homeostasis and repair may provide a logical framework for understanding the connection between the inflammatory response and cancer.'",
                "",
                "doc = nlp(string)",
                "print(doc._.combo_basic.sort_values(ascending=False).head(5))",
                "\"\"\"",
                "dysfunctional tumor                1.443147",
                "tumor suppressors                  1.443147",
                "genetic changes                    1.386294",
                "cancer cells                       1.386294",
                "dysfunctional tumor suppressors    1.298612",
                "\"\"\""
            ],
            "code_language": "python",
            "url": "https://github.com/kevinlu1248/pyate",
            "author": "Kevin Lu",
            "author_links": {
                "twitter": "kevinlu1248",
                "github": "kevinlu1248",
                "website": "https://github.com/kevinlu1248/pyate"
            },
            "category": ["pipeline", "research"],
            "tags": ["term_extraction"]
        },
        {
            "id": "contextualSpellCheck",
            "title": "Contextual Spell Check",
            "slogan": "Contextual spell correction using BERT (bidirectional representations)",
            "description": "This package currently focuses on correcting out-of-vocabulary (OOV) words, or non-word errors (NWE), using a BERT model. BERT is used so that the surrounding context is taken into account when correcting NWEs.",
            "github": "R1j1t/contextualSpellCheck",
            "pip": "contextualSpellCheck",
            "code_example": [
                "import spacy",
                "import contextualSpellCheck",
                "",
                "nlp = spacy.load('en_core_web_sm')",
                "contextualSpellCheck.add_to_pipe(nlp)",
                "doc = nlp('Income was $9.4 milion compared to the prior year of $2.7 milion.')",
                "",
                "print(doc._.performed_spellCheck)  # Should be True",
                "print(doc._.outcome_spellCheck)  # Income was $9.4 million compared to the prior year of $2.7 million."
            ],
            "code_language": "python",
            "url": "https://github.com/R1j1t/contextualSpellCheck",
            "thumb": "https://user-images.githubusercontent.com/22280243/82760949-98e68480-9e14-11ea-952e-4738620fd9e3.png",
            "image": "https://user-images.githubusercontent.com/22280243/82138959-2852cd00-9842-11ea-918a-49b2a7873ef6.png",
            "author": "Rajat Goel",
            "author_links": {
                "github": "r1j1t",
                "website": "https://github.com/R1j1t"
            },
            "category": ["pipeline", "conversational", "research"],
            "tags": ["spell check", "correction", "preprocessing", "translation"]
        },
        {
            "id": "texthero",
            "title": "Texthero",
            "slogan": "Text preprocessing, representation and visualization from zero to hero.",
            "description": "Texthero is a Python package for working with text data efficiently. It gives NLP developers a tool to quickly understand any text-based dataset, and it provides a solid pipeline to clean and represent text data, from zero to hero.",
            "github": "jbesomi/texthero",
            "pip": "texthero",
            "code_example": [
                "import texthero as hero",
                "import pandas as pd",
                "",
                "df = pd.read_csv('https://github.com/jbesomi/texthero/raw/master/dataset/bbcsport.csv')",
                "df['named_entities'] = hero.named_entities(df['text'])",
                "df.head()"
            ],
            "code_language": "python",
            "url": "https://texthero.org",
            "thumb": "https://texthero.org/img/T.png",
            "image": "https://texthero.org/docs/assets/texthero.png",
            "author": "Jonathan Besomi",
            "author_links": {
                "github": "jbesomi",
                "website": "https://besomi.ai"
            },
            "category": ["standalone"]
        },
        {
            "id": "cov-bsv",
            "title": "VA COVID-19 NLP BSV",
            "slogan": "spaCy pipeline for COVID-19 surveillance.",
            "github": "abchapman93/VA_COVID-19_NLP_BSV",
            "description": "A spaCy rule-based pipeline for identifying positive cases of COVID-19 from clinical text. A version of this system was deployed as part of the US Department of Veterans Affairs biosurveillance response to COVID-19.",
            "pip": "cov-bsv",
            "code_example": [
                "import cov_bsv",
                "",
                "nlp = cov_bsv.load()",
                "doc = nlp('Pt tested for COVID-19. His wife was recently diagnosed with novel coronavirus. SARS-COV-2: Detected')",
                "",
                "print(doc.ents)",
                "print(doc._.cov_classification)",
                "cov_bsv.visualize_doc(doc)"
            ],
            "category": ["pipeline", "standalone", "biomedical", "scientific"],
            "tags": ["clinical", "epidemiology", "covid-19", "surveillance"],
            "author": "Alec Chapman",
            "author_links": {
                "github": "abchapman93"
            }
        },
        {
            "id": "medspacy",
            "title": "medspaCy",
            "thumb": "https://raw.githubusercontent.com/medspacy/medspacy/master/images/medspacy_logo.png",
            "slogan": "A toolkit for clinical NLP with spaCy.",
            "github": "medspacy/medspacy",
            "description": "A toolkit for clinical NLP with spaCy. Features include sentence splitting, section detection, and assertion detection for negation, family history, and uncertainty.",
            "pip": "medspacy",
            "code_example": [
                "import medspacy",
                "from medspacy.ner import TargetRule",
                "",
                "nlp = medspacy.load()",
                "print(nlp.pipe_names)",
                "",
                "nlp.get_pipe('target_matcher').add([TargetRule('stroke', 'CONDITION'), TargetRule('diabetes', 'CONDITION'), TargetRule('pna', 'CONDITION')])",
                "doc = nlp('Patient has hx of stroke. Mother diagnosed with diabetes. No evidence of pna.')",
                "",
                "for ent in doc.ents:",
                "    print(ent, ent._.is_negated, ent._.is_family, ent._.is_historical)",
                "medspacy.visualization.visualize_ent(doc)"
            ],
            "category": ["biomedical", "scientific", "research"],
            "tags": ["clinical"],
            "author": "medspacy",
            "author_links": {
                "github": "medspacy"
            }
        },
        {
            "id": "rita-dsl",
            "title": "RITA DSL",
            "slogan": "Domain Specific Language for creating language rules",
            "github": "zaibacu/rita-dsl",
            "description": "A Domain Specific Language (DSL) for building language patterns. These can later be compiled into spaCy patterns, pure regex, or other formats.",
            "pip": "rita-dsl",
            "thumb": "https://raw.githubusercontent.com/zaibacu/rita-dsl/master/docs/assets/logo-100px.png",
            "code_language": "python",
            "code_example": [
                "import spacy",
                "from rita.shortcuts import setup_spacy",
                "",
                "rules = \"\"\"",
                "cuts = {\"fitted\", \"wide-cut\"}",
                "lengths = {\"short\", \"long\", \"calf-length\", \"knee-length\"}",
                "fabric_types = {\"soft\", \"airy\", \"crinkled\"}",
                "fabrics = {\"velour\", \"chiffon\", \"knit\", \"woven\", \"stretch\"}",
                "",
                "{IN_LIST(cuts)?, IN_LIST(lengths), WORD(\"dress\")}->MARK(\"DRESS_TYPE\")",
                "{IN_LIST(lengths), IN_LIST(cuts), WORD(\"dress\")}->MARK(\"DRESS_TYPE\")",
                "{IN_LIST(fabric_types)?, IN_LIST(fabrics)}->MARK(\"DRESS_FABRIC\")",
                "\"\"\"",
                "",
                "nlp = spacy.load(\"en_core_web_sm\")",
                "setup_spacy(nlp, rules_string=rules)",
                "r = nlp(\"She was wearing a short wide-cut dress\")",
                "print([{\"label\": e.label_, \"text\": e.text} for e in r.ents])"
            ],
            "category": ["standalone"],
            "tags": ["dsl", "language-patterns", "language-rules", "nlp"],
            "author": "Šarūnas Navickas",
            "author_links": {
                "github": "zaibacu"
            }
        },
        {
            "id": "PatternOmatic",
            "title": "PatternOmatic",
            "slogan": "Finds linguistic patterns effortlessly",
            "description": "Discovers spaCy linguistic patterns that match a given set of string samples, for use with spaCy's rule-based Matcher.",
            "github": "revuel/PatternOmatic",
            "pip": "PatternOmatic",
            "code_example": [
                "from PatternOmatic.api import find_patterns",
                "",
                "samples = ['I am a cat!', 'You are a dog!', 'She is an owl!']",
                "",
                "patterns_found, _ = find_patterns(samples)",
                "",
                "print(f'Patterns found: {patterns_found}')"
            ],
            "code_language": "python",
            "thumb": "https://svgshare.com/i/R3P.svg",
            "image": "https://svgshare.com/i/R3P.svg",
            "author": "Miguel Revuelta Espinosa",
            "author_links": {
                "github": "revuel"
            },
            "category": ["scientific", "research", "standalone"],
            "tags": ["Evolutionary Computation", "Grammatical Evolution"]
        },
        {
            "id": "SpacyDotNet",
            "title": "spaCy .NET Wrapper",
            "slogan": "SpacyDotNet is a .NET Core compatible wrapper for spaCy, based on Python.NET",
            "description": "This project relies on [Python.NET](http://pythonnet.github.io/) to interoperate with spaCy. It's not meant to be a complete and exhaustive implementation of all spaCy features and [APIs](https://spacy.io/api). Although it should be enough for basic tasks, it's intended as a starting point if you need to build a complex project using spaCy in .NET. Most of the basic features in _Spacy101_ are available. All `Container` classes are present (`Doc`, `Token`, `Span` and `Lexeme`) with their basic properties/methods working, as well as `Vocab` and `StringStore` in a limited form. Developers should be able to add any missing properties or classes in a straightforward manner.",
            "github": "AMArostegui/SpacyDotNet",
            "thumb": "https://raw.githubusercontent.com/AMArostegui/SpacyDotNet/master/cslogo.png",
            "code_example": [
                "var spacy = new Spacy();",
                "",
                "var nlp = spacy.Load(\"en_core_web_sm\");",
                "var doc = nlp.GetDocument(\"Apple is looking at buying U.K. startup for $1 billion\");",
                "",
                "foreach (Token token in doc.Tokens)",
                "    Console.WriteLine($\"{token.Text} {token.Lemma} {token.PoS} {token.Tag} {token.Dep} {token.Shape} {token.IsAlpha} {token.IsStop}\");",
                "",
                "Console.WriteLine(\"\");",
                "foreach (Span ent in doc.Ents)",
                "    Console.WriteLine($\"{ent.Text} {ent.StartChar} {ent.EndChar} {ent.Label}\");",
                "",
                "nlp = spacy.Load(\"en_core_web_md\");",
                "var tokens = nlp.GetDocument(\"dog cat banana afskfsd\");",
                "",
                "Console.WriteLine(\"\");",
                "foreach (Token token in tokens.Tokens)",
                "    Console.WriteLine($\"{token.Text} {token.HasVector} {token.VectorNorm}, {token.IsOov}\");",
                "",
                "tokens = nlp.GetDocument(\"dog cat banana\");",
                "Console.WriteLine(\"\");",
                "foreach (Token token1 in tokens.Tokens)",
                "{",
                "    foreach (Token token2 in tokens.Tokens)",
                "        Console.WriteLine($\"{token1.Text} {token2.Text} {token1.Similarity(token2) }\");",
                "}",
                "",
                "doc = nlp.GetDocument(\"I love coffee\");",
                "Console.WriteLine(\"\");",
                "Console.WriteLine(doc.Vocab.Strings[\"coffee\"]);",
                "Console.WriteLine(doc.Vocab.Strings[3197928453018144401]);",
                "",
                "Console.WriteLine(\"\");",
                "foreach (Token word in doc.Tokens)",
                "{",
                "    var lexeme = doc.Vocab[word.Text];",
                "    Console.WriteLine($@\"{lexeme.Text} {lexeme.Orth} {lexeme.Shape} {lexeme.Prefix} {lexeme.Suffix} {lexeme.IsAlpha} {lexeme.IsDigit} {lexeme.IsTitle} {lexeme.Lang}\");",
                "}"
            ],
            "code_language": "csharp",
            "author": "Antonio Miras",
            "author_links": {
                "github": "AMArostegui"
            },
            "category": ["nonpython"]
        },
        {
            "id": "ruts",
            "title": "ruTS",
            "slogan": "A library for statistics extraction from texts in Russian",
            "description": "The library extracts the following statistics from a text: basic statistics, readability metrics, lexical diversity metrics, and morphological statistics.",
            "github": "SergeyShk/ruTS",
            "pip": "ruts",
            "code_example": [
                "import spacy",
                "import ruts",
                "",
                "nlp = spacy.load('ru_core_news_sm')",
                "nlp.add_pipe('basic', last=True)",
                "doc = nlp('мама мыла раму')",
                "doc._.basic.get_stats()"
            ],
            "code_language": "python",
            "thumb": "https://habrastorage.org/webt/6z/le/fz/6zlefzjavzoqw_wymz7v3pwgfp4.png",
            "image": "https://clipartart.com/images/free-tree-roots-clipart-black-and-white-2.png",
            "author": "Sergey Shkarin",
            "author_links": {
                "twitter": "shk_sergey",
                "github": "SergeyShk"
            },
            "category": ["pipeline", "standalone"],
            "tags": ["Text Analytics", "Russian"]
        },
        {
            "id": "trunajod",
            "title": "TRUNAJOD",
            "slogan": "A text complexity library for text analysis built on spaCy",
            "description": "Building on the basic NLP capabilities provided by spaCy (dependency parsing, POS tagging, tokenization), `TRUNAJOD` focuses on extracting measurements from texts that may be useful for different applications and use cases.",
            "github": "dpalmasan/TRUNAJOD2.0",
            "pip": "trunajod",
            "code_example": [
                "import spacy",
                "from TRUNAJOD.entity_grid import EntityGrid",
                "",
                "nlp = spacy.load('es_core_news_sm', disable=['ner', 'textcat'])",
                "example_text = (",
                "    'El espectáculo del cielo nocturno cautiva la mirada y suscita preguntas '",
                "    'sobre el universo, su origen y su funcionamiento. No es sorprendente que '",
                "    'todas las civilizaciones y culturas hayan formado sus propias '",
                "    'cosmologías. Unas relatan, por ejemplo, que el universo ha '",
                "    'sido siempre tal como es, con ciclos que inmutablemente se repiten; '",
                "    'otras explican que este universo ha tenido un principio, '",
                "    'que ha aparecido por obra creadora de una divinidad.'",
                ")",
                "doc = nlp(example_text)",
                "egrid = EntityGrid(doc)",
                "print(egrid.get_egrid())"
            ],
            "code_language": "python",
            "thumb": "https://raw.githubusercontent.com/dpalmasan/TRUNAJOD2.0/master/imgs/trunajod_thumb.png",
            "image": "https://raw.githubusercontent.com/dpalmasan/TRUNAJOD2.0/master/imgs/trunajod_logo.png",
            "author": "Diego Palma",
            "author_links": {
                "github": "dpalmasan"
            },
            "category": ["research", "standalone", "scientific"],
            "tags": ["Text Analytics", "Coherence", "Cohesion"]
        },
        {
            "id": "lingfeat",
            "title": "LingFeat",
            "slogan": "A Linguistic Feature Extraction (Text Analysis) Tool for Readability Assessment and Text Simplification",
            "description": "LingFeat is a feature extraction library which currently extracts 255 linguistic features from English string input. Categories include syntax, semantics, discourse, and also traditional readability formulas. Published in EMNLP 2021.",
            "github": "brucewlee/lingfeat",
            "pip": "lingfeat",
            "code_example": [
                "from lingfeat import extractor",
                "",
                "text = 'TAEAN, South Chungcheong Province -- Just before sunup, Lee Young-ho, a seasoned fisherman with over 30 years of experience, silently waits for boats carrying blue crabs as the season for the seafood reaches its height. Soon afterward, small and big boats sail into Sinjin Port in Taean County, South Chungcheong Province, the second-largest source of blue crab after Incheon, accounting for 29 percent of total production of the country. A crane lifts 28 boxes filled with blue crabs weighing 40 kilograms each from the boat, worth about 10 million won ($8,500). “It has been a productive fall season for crabbing here. The water temperature is a very important factor affecting crab production. They hate cold water,” Lee said. The temperature of the sea off Taean appeared to have stayed at the level where crabs become active. If the sea temperature suddenly drops, crabs go into their winter dormancy mode, burrowing into the mud and sleeping through the cold months.'",
                "",
                "# Pass text",
                "LingFeat = extractor.pass_text(text)",
                "",
                "# Preprocess text",
                "LingFeat.preprocess()",
                "",
                "# Extract features",
                "# Each method returns a dictionary of the corresponding features",
                "",
                "# Advanced Semantic (AdSem) Features",
                "WoKF = LingFeat.WoKF_()  # Wikipedia Knowledge Features",
                "WBKF = LingFeat.WBKF_()  # WeeBit Corpus Knowledge Features",
                "OSKF = LingFeat.OSKF_()  # OneStopEng Corpus Knowledge Features",
                "",
                "# Discourse (Disco) Features",
                "EnDF = LingFeat.EnDF_()  # Entity Density Features",
                "EnGF = LingFeat.EnGF_()  # Entity Grid Features",
                "",
                "# Syntactic (Synta) Features",
                "PhrF = LingFeat.PhrF_()  # Noun/Verb/Adj/Adv/... Phrasal Features",
                "TrSF = LingFeat.TrSF_()  # (Parse) Tree Structural Features",
                "POSF = LingFeat.POSF_()  # Noun/Verb/Adj/Adv/... Part-of-Speech Features",
                "",
                "# Lexico Semantic (LxSem) Features",
                "TTRF = LingFeat.TTRF_()  # Type Token Ratio Features",
                "VarF = LingFeat.VarF_()  # Noun/Verb/Adj/Adv Variation Features",
                "PsyF = LingFeat.PsyF_()  # Psycholinguistic Difficulty of Words (AoA Kuperman)",
                "WoLF = LingFeat.WorF_()  # Word Familiarity from Frequency Count (SubtlexUS)",
                "",
                "# Shallow Traditional (ShTra) Features",
                "ShaF = LingFeat.ShaF_()  # Shallow Features (e.g. avg number of tokens)",
                "TraF = LingFeat.TraF_()  # Traditional Formulas"
            ],
            "code_language": "python",
            "thumb": "https://raw.githubusercontent.com/brucewlee/lingfeat/master/img/lingfeat_logo2.png",
            "image": "https://raw.githubusercontent.com/brucewlee/lingfeat/master/img/lingfeat_logo.png",
            "author": "Bruce W. Lee (이웅성)",
            "author_links": {
                "github": "brucewlee",
                "website": "https://brucewlee.github.io/"
            },
            "category": ["research", "scientific"],
            "tags": [
                "Readability",
                "Simplification",
                "Feature Extraction",
                "Syntax",
                "Discourse",
                "Semantics",
                "Lexical"
            ]
        },
        {
            "id": "hmrb",
            "title": "Hammurabi",
            "slogan": "Python Rule Processing Engine 🏺",
            "description": "Hammurabi works as a rule engine to parse input using a defined set of rules. It uses a simple and readable syntax to define complex rules to handle phrase matching. The syntax supports nested logical statements, regular expressions, reusable or side-loaded variables, and match-triggered callback functions to modularize your rules. The latest version works with both spaCy 2.X and 3.X. For more information, check the documentation on [ReadTheDocs](https://hmrb.readthedocs.io/en/latest/).",
            "github": "babylonhealth/hmrb",
            "pip": "hmrb",
            "code_example": [
                "import spacy",
                "from hmrb.core import SpacyCore",
                "",
                "nlp = spacy.load(\"en_core_web_sm\")",
                "sentences = \"I love gorillas. Peter loves gorillas. Jane loves Tarzan.\"",
                "",
                "def conj_be(subj: str) -> str:",
                "   if subj == \"I\":",
                "       return \"am\"",
                "   elif subj == \"you\":",
                "       return \"are\"",
                "   else:",
                "       return \"is\"",
                "",
                "@spacy.registry.callbacks(\"gorilla_callback\")",
                "def gorilla_clb(seq: list, span: slice, data: dict) -> None:",
                "   subj = seq[span.start].text",
                "   be = conj_be(subj)",
                "   print(f\"{subj} {be} a gorilla person.\")",
                "",
                "@spacy.registry.callbacks(\"lover_callback\")",
                "def lover_clb(seq: list, span: slice, data: dict) -> None:",
                "   print(f\"{seq[span][-1].text} is a love interest of {seq[span.start].text}.\")",
                "",
                "grammar = \"\"\"",
                "   Law:",
                "   - callback: \"loves_gorilla\"",
                "   (",
                "   ((pos: \"PROPN\") or (pos: \"PRON\"))",
                "   (lemma: \"love\")",
                "   (lemma: \"gorilla\")",
                "   )",
                "   Law:",
                "   - callback: \"loves_someone\"",
                "   (",
                "   (pos: \"PROPN\")",
                "   (lower: \"loves\")",
                "   (pos: \"PROPN\")",
                "   )",
                "\"\"\"",
                "",
                "@spacy.registry.augmenters(\"jsonify_span\")",
                "def jsonify_span(span):",
                "   return [{\"lemma\": token.lemma_, \"pos\": token.pos_, \"lower\": token.lower_} for token in span]",
                "",
                "conf = {",
                "   \"rules\": grammar,",
                "   \"callbacks\": {",
                "       \"loves_gorilla\": \"callbacks.gorilla_callback\",",
                "       \"loves_someone\": \"callbacks.lover_callback\",",
                "   },",
                "   \"map_doc\": \"augmenters.jsonify_span\",",
                "   \"sort_length\": True,",
                "}",
                "",
                "nlp.add_pipe(\"hmrb\", config=conf)",
                "nlp(sentences)"
            ],
            "code_language": "python",
            "thumb": "https://user-images.githubusercontent.com/6807878/118643685-cae6b880-b7d4-11eb-976e-066aec9505da.png",
            "image": "https://user-images.githubusercontent.com/6807878/118643685-cae6b880-b7d4-11eb-976e-066aec9505da.png",
            "author": "Kristian Boda",
            "author_links": {
                "github": "bodak",
                "twitter": "bodak",
                "website": "https://github.com/babylonhealth/"
            },
            "category": ["pipeline", "standalone", "scientific", "biomedical"],
            "tags": ["babylonhealth", "rule-engine", "matcher"]
        },
        {
            "id": "forte",
            "title": "Forte",
            "slogan": "Forte is a toolkit for building Natural Language Processing pipelines, featuring cross-task interaction, adaptable data-model interfaces and composable pipelines.",
            "description": "Forte provides a platform to assemble state-of-the-art NLP and ML technologies in a highly composable fashion, covering a wide spectrum of tasks ranging from Information Retrieval and Natural Language Understanding to Natural Language Generation.",
            "github": "asyml/forte",
            "pip": "forte.spacy stave torch",
            "code_example": [
                "from fortex.spacy import SpacyProcessor",
                "from forte.processors.stave import StaveProcessor",
                "from forte import Pipeline",
                "from forte.data.readers import StringReader",
                "",
                "pipeline = Pipeline()",
                "pipeline.set_reader(StringReader())",
                "pipeline.add(SpacyProcessor())",
                "pipeline.add(StaveProcessor())",
                "pipeline.run('Running SpaCy with Forte!')"
            ],
            "code_language": "python",
            "url": "https://medium.com/casl-project/forte-building-modular-and-re-purposable-nlp-pipelines-cf5b5c5abbe9",
            "thumb": "https://raw.githubusercontent.com/asyml/forte/master/docs/_static/img/forte_graphic.png",
            "image": "https://raw.githubusercontent.com/asyml/forte/master/docs/_static/img/logo_h.png",
            "author": "Petuum",
            "author_links": {
                "twitter": "PetuumInc",
                "github": "asyml",
                "website": "https://petuum.com"
            },
            "category": ["pipeline", "standalone"],
            "tags": ["pipeline"]
        },
        {
            "id": "spacy-api-docker-v3",
            "slogan": "spaCy v3 REST API, wrapped in a Docker container",
            "github": "bbieniek/spacy-api-docker",
            "url": "https://hub.docker.com/r/bbieniek/spacyapi/",
            "thumb": "https://i.imgur.com/NRnDKyj.jpg",
            "code_example": [
                "version: '3'",
                "",
                "services:",
                "  spacyapi:",
                "    image: bbieniek/spacyapi:en_v3",
                "    ports:",
                "      - \"127.0.0.1:8080:80\"",
                "    restart: always"
            ],
            "code_language": "docker",
            "author": "Baltazar Bieniek",
            "author_links": {
                "github": "bbieniek"
            },
            "category": ["apis"]
        },
        {
            "id": "phruzz_matcher",
            "title": "phruzz-matcher",
            "slogan": "Phrase matcher using RapidFuzz",
            "description": "Combines the RapidFuzz library with spaCy's PhraseMatcher. The goal of this component is to find matches between a spaCy doc and a list of phrases when there are no exact matches due to typos or abbreviations.",
            "github": "mjvallone/phruzz-matcher",
            "pip": "phruzz_matcher",
            "code_example": [
                "import spacy",
                "from spacy.language import Language",
                "from phruzz_matcher.phrase_matcher import PhruzzMatcher",
                "",
                "famous_people = [",
                "        \"Brad Pitt\",",
                "        \"Demi Moore\",",
                "        \"Bruce Willis\",",
                "        \"Jim Carrey\",",
                "]",
                "",
                "@Language.factory(\"phrase_matcher\")",
                "def phrase_matcher(nlp: Language, name: str):",
                "    return PhruzzMatcher(nlp, famous_people, \"FAMOUS_PEOPLE\", 85)",
                "",
                "nlp = spacy.blank('es')",
                "nlp.add_pipe(\"phrase_matcher\")",
                "",
                "doc = nlp(\"El otro día fui a un bar donde vi a brad pit y a Demi Moore, estaban tomando unas cervezas mientras charlaban de sus asuntos.\")",
                "print(f\"doc.ents: {doc.ents}\")",
                "",
                "#OUTPUT",
                "#doc.ents: (brad pit, Demi Moore)"
            ],
            "thumb": "https://avatars.githubusercontent.com/u/961296?v=4",
            "image": "",
            "code_language": "python",
            "author": "Martin Vallone",
            "author_links": {
                "github": "mjvallone",
                "twitter": "vallotin",
                "website": "https://fiqus.coop/"
            },
            "category": ["pipeline", "research", "standalone"],
            "tags": ["spacy", "python", "nlp", "ner"]
        },
        {
            "id": "WordDumb",
            "title": "WordDumb",
            "slogan": "A calibre plugin that generates Word Wise and X-Ray files.",
            "description": "A calibre plugin that generates Word Wise and X-Ray files and then sends them to a Kindle. Supports KFX, AZW3 and MOBI eBooks. X-Ray supports 18 languages.",
            "github": "xxyzz/WordDumb",
            "code_language": "python",
            "thumb": "https://raw.githubusercontent.com/xxyzz/WordDumb/master/starfish.svg",
            "image": "https://user-images.githubusercontent.com/21101839/130245435-b874f19a-7785-4093-9975-81596efc42bb.png",
            "author": "xxyzz",
            "author_links": {
                "github": "xxyzz"
            },
            "category": ["standalone"]
        },
        {
            "id": "eng_spacysentiment",
            "title": "eng_spacysentiment",
            "slogan": "Simple sentiment analysis using spaCy pipelines",
            "description": "Sentiment analysis for simple English sentences using pre-trained spaCy pipelines",
            "github": "vishnunkumar/spacysentiment",
            "pip": "eng-spacysentiment",
            "code_example": [
                "import eng_spacysentiment",
                "nlp = eng_spacysentiment.load()",
                "text = \"Welcome to Arsenals official YouTube channel Watch as we take you closer and show you the personality of the club\"",
                "doc = nlp(text)",
                "print(doc.cats)",
                "# {'positive': 0.29878824949264526, 'negative': 0.7012117505073547}"
            ],
            "thumb": "",
            "image": "",
            "code_language": "python",
            "author": "Vishnu Nandakumar",
            "author_links": {
                "github": "Vishnunkumar",
                "twitter": "vishnun_uchiha"
            },
            "category": ["pipeline"],
            "tags": ["pipeline", "nlp", "sentiment"]
        },
        {
            "id": "textnets",
            "slogan": "Text analysis with networks",
            "description": "textnets represents collections of texts as networks of documents and words. This provides novel possibilities for the visualization and analysis of texts.",
            "github": "jboynyc/textnets",
            "image": "https://user-images.githubusercontent.com/2187261/152641425-6c0fb41c-b8e0-44fb-a52a-7c1ba24eba1e.png",
            "code_example": [
                "import textnets as tn",
                "",
                "corpus = tn.Corpus(tn.examples.moon_landing)",
                "t = tn.Textnet(corpus.tokenized(), min_docs=1)",
                "t.plot(label_nodes=True,",
                "       show_clusters=True,",
                "       scale_nodes_by=\"birank\",",
                "       scale_edges_by=\"weight\")"
            ],
            "author": "John Boy",
            "author_links": {
                "github": "jboynyc",
                "twitter": "jboy"
            },
            "category": ["visualizers", "standalone"]
        },
        {
            "id": "tmtoolkit",
            "slogan": "Text mining and topic modeling toolkit",
            "description": "tmtoolkit is a set of tools for text mining and topic modeling with Python, developed especially for use in the social sciences, in journalism or related disciplines. It aims for easy installation, extensive documentation and a clear programming interface while offering good performance on large datasets by means of vectorized operations (via NumPy) and parallel computation (using Python’s multiprocessing module and the loky package).",
            "github": "WZBSocialScienceCenter/tmtoolkit",
            "code_example": [
                "# Note: This requires these setup steps:",
                "#   pip install tmtoolkit[recommended]",
                "#   python -m tmtoolkit setup en",
                "from tmtoolkit.corpus import Corpus, tokens_table, lemmatize, to_lowercase, dtm",
                "from tmtoolkit.bow.bow_stats import tfidf, sorted_terms_table",
                "# load built-in sample dataset and use 4 worker processes",
                "corp = Corpus.from_builtin_corpus('en-News100', max_workers=4)",
                "# investigate corpus as dataframe",
                "toktbl = tokens_table(corp)",
                "print(toktbl)",
                "# apply some text normalization",
                "lemmatize(corp)",
                "to_lowercase(corp)",
                "# build sparse document-token matrix (DTM)",
                "# document labels identify rows, vocabulary tokens identify columns",
                "mat, doc_labels, vocab = dtm(corp, return_doc_labels=True, return_vocab=True)",
                "# apply tf-idf transformation to DTM",
                "# operation is applied on sparse matrix and uses little memory",
                "tfidf_mat = tfidf(mat)",
                "# show top 5 tokens per document ranked by tf-idf",
                "top_tokens = sorted_terms_table(tfidf_mat, vocab, doc_labels, top_n=5)",
                "print(top_tokens)"
            ],
            "author": "Markus Konrad / WZB Social Science Center",
            "author_links": {
                "github": "internaut",
                "twitter": "_knrd"
            },
            "category": ["scientific", "standalone"]
        },
        {
            "id": "edsnlp",
            "title": "EDS-NLP",
            "slogan": "spaCy components to extract information from clinical notes written in French.",
            "description": "EDS-NLP provides a set of rule-based spaCy components to extract information from French clinical notes. It also features _qualifier_ pipelines that detect negations, speculations and family context, among other modalities. Check out the [demo](https://aphp.github.io/edsnlp/demo/)!",
            "github": "aphp/edsnlp",
            "pip": "edsnlp",
            "code_example": [
                "import spacy",
                "",
                "nlp = spacy.blank(\"fr\")",
                "",
                "terms = dict(",
                "    covid=[\"covid\", \"coronavirus\"],",
                ")",
                "",
                "# Sentencizer component, needed for negation detection",
                "nlp.add_pipe(\"eds.sentences\")",
                "# Matcher component",
                "nlp.add_pipe(\"eds.matcher\", config=dict(terms=terms))",
                "# Negation detection",
                "nlp.add_pipe(\"eds.negation\")",
                "",
                "# Process your text in one call!",
                "doc = nlp(\"Le patient est atteint de covid\")",
                "",
                "doc.ents",
                "# Out: (covid,)",
                "",
                "doc.ents[0]._.negation",
                "# Out: False"
            ],
            "code_language": "python",
            "url": "https://aphp.github.io/edsnlp/",
            "author": "AP-HP",
            "author_links": {
                "github": "aphp",
                "website": "https://github.com/aphp"
            },
            "category": ["biomedical", "scientific", "research", "pipeline"],
            "tags": ["clinical"]
        },
        {
            "id": "sent-pattern",
            "title": "English Interpretation Sentence Pattern",
            "slogan": "English interpretation for accurate translation from English to Japanese",
            "description": "This package categorizes English sentences into one of five basic sentence patterns and identifies the subject, verb, object, and other components. The five basic sentence patterns are based on C. T. Onions's Advanced English Syntax and are frequently used when teaching English in Japan.",
            "github": "lll-lll-lll-lll/sent-pattern",
            "pip": "sent-pattern",
            "author": "Shunpei Nakayama",
            "author_links": {
                "twitter": "ExZ79575296",
                "github": "lll-lll-lll-lll"
            },
            "category": ["pipeline"],
            "tags": ["interpretation", "ja"]
        },
        {
            "id": "spacy-partial-tagger",
            "title": "spaCy - Partial Tagger",
            "slogan": "Sequence Tagger for Partially Annotated Dataset in spaCy",
            "description": "This is a library for building a CRF tagger from a partially annotated dataset in spaCy. You can build your own tagger from a dictionary alone.",
            "github": "doccano/spacy-partial-tagger",
            "pip": "spacy-partial-tagger",
            "category": ["pipeline", "training"],
            "author": "Yasufumi Taniguchi",
            "author_links": {
                "github": "yasufumy"
            }
        },
        {
            "id": "spacy-pythainlp",
            "title": "spaCy-PyThaiNLP",
            "slogan": "PyThaiNLP for spaCy",
            "description": "This package wraps the PyThaiNLP library to add support for Thai to spaCy.",
            "github": "PyThaiNLP/spaCy-PyThaiNLP",
            "code_example": [
                "import spacy",
                "import spacy_pythainlp.core",
                "",
                "nlp = spacy.blank('th')",
                "nlp.add_pipe('pythainlp')",
                "doc = nlp('ผมเป็นคนไทย   แต่มะลิอยากไปโรงเรียนส่วนผมจะไปไหน  ผมอยากไปเที่ยว')",
                "",
                "print(list(doc.sents))",
                "# output: [ผมเป็นคนไทย   แต่มะลิอยากไปโรงเรียนส่วนผมจะไปไหน  , ผมอยากไปเที่ยว]"
            ],
            "code_language": "python",
            "author": "Wannaphong Phatthiyaphaibun",
            "author_links": {
                "twitter": "@wannaphong_p",
                "github": "wannaphong",
                "website": "https://iam.wannaphong.com/"
            },
            "category": ["pipeline", "research"],
            "tags": ["Thai"]
        },
        {
            "id": "vetiver",
            "title": "Vetiver",
            "slogan": "Version, share, deploy, and monitor models.",
            "description": "The goal of vetiver is to provide fluent tooling to version, deploy, and monitor a trained model. Functions handle creating model objects, versioning models, predicting from a remote API endpoint, deploying Dockerfiles, and more.",
            "github": "rstudio/vetiver-python",
            "pip": "vetiver",
            "code_example": [
                "import spacy",
                "from vetiver import VetiverModel, VetiverAPI",
                "",
                "# If you use this model, you'll need to download it first:",
                "# python -m spacy download en_core_web_md",
                "nlp = spacy.load('en_core_web_md')",
                "# Create deployable model object with your nlp Language object",
                "v = VetiverModel(nlp, model_name='my_model')",
                "# Try out your API endpoint locally",
                "VetiverAPI(v).run()"
            ],
            "code_language": "python",
            "url": "https://vetiver.rstudio.com/",
            "thumb": "https://raw.githubusercontent.com/rstudio/vetiver-python/main/docs/figures/square-logo.svg",
            "author": "Posit, PBC",
            "author_links": {
                "twitter": "posit_pbc",
                "github": "rstudio",
                "website": "https://posit.co/"
            },
            "category": ["apis", "standalone"],
            "tags": ["apis", "deployment"]
        },
        {
            "id": "span_marker",
            "title": "SpanMarker",
            "slogan": "Effortless state-of-the-art NER in spaCy",
            "description": "The SpanMarker integration with spaCy allows you to seamlessly replace the default spaCy `\"ner\"` pipeline component with any [SpanMarker model available on the Hugging Face Hub](https://huggingface.co/models?library=span-marker). Through this, you can take advantage of the advanced Named Entity Recognition capabilities of SpanMarker within the familiar and powerful spaCy framework.\n\nBy default, the `span_marker` pipeline component uses a [SpanMarker model using RoBERTa-large trained on OntoNotes v5.0](https://huggingface.co/tomaarsen/span-marker-roberta-large-ontonotes5). This model reaches a competitive 91.54 F1, notably higher than the [85.5 and 89.8 F1](https://spacy.io/usage/facts-figures#section-benchmarks) from `en_core_web_lg` and `en_core_web_trf`, respectively. A short head-to-head between this SpanMarker model and the `trf` spaCy model has been posted [here](https://github.com/tomaarsen/SpanMarkerNER/pull/12).\n\nAdditionally, see [here](https://tomaarsen.github.io/SpanMarkerNER/notebooks/spacy_integration.html) for documentation on using SpanMarker with spaCy.",
            "github": "tomaarsen/SpanMarkerNER",
            "pip": "span_marker",
            "code_example": [
                "import spacy",
                "",
                "nlp = spacy.load(\"en_core_web_sm\", exclude=[\"ner\"])",
                "nlp.add_pipe(\"span_marker\", config={\"model\": \"tomaarsen/span-marker-roberta-large-ontonotes5\"})",
                "",
                "text = \"\"\"Cleopatra VII, also known as Cleopatra the Great, was the last active ruler of the \\",
                "Ptolemaic Kingdom of Egypt. She was born in 69 BCE and ruled Egypt from 51 BCE until her \\",
                "death in 30 BCE.\"\"\"",
                "doc = nlp(text)",
                "print([(entity, entity.label_) for entity in doc.ents])",
                "# [(Cleopatra VII, \"PERSON\"), (Cleopatra the Great, \"PERSON\"), (the Ptolemaic Kingdom of Egypt, \"GPE\"),",
                "# (69 BCE, \"DATE\"), (Egypt, \"GPE\"), (51 BCE, \"DATE\"), (30 BCE, \"DATE\")]"
            ],
            "code_language": "python",
            "url": "https://tomaarsen.github.io/SpanMarkerNER",
            "author": "Tom Aarsen",
            "author_links": {
                "github": "tomaarsen",
                "website": "https://www.linkedin.com/in/tomaarsen"
            },
            "category": ["pipeline", "standalone", "scientific"],
            "tags": ["ner"]
        },
        {
            "id": "hobbit-spacy",
            "title": "Hobbit spaCy",
            "slogan": "NLP for Middle Earth",
            "description": "Hobbit spaCy is a custom spaCy pipeline designed specifically for working with Middle Earth and texts from the world of J.R.R. Tolkien.",
            "github": "wjbmattingly/hobbit-spacy",
            "pip": "en-hobbit",
            "code_example": [
                "import spacy",
                "",
                "nlp = spacy.load('en_hobbit')",
                "doc = nlp('Frodo saw Glorfindel and Glóin; and in a corner alone Strider was sitting, clad in his old travel - worn clothes again')"
            ],
            "code_language": "python",
            "thumb": "https://github.com/wjbmattingly/hobbit-spacy/blob/main/images/hobbit-thumbnail.png?raw=true",
            "image": "https://github.com/wjbmattingly/hobbit-spacy/raw/main/images/hobbitspacy.png",
            "author": "W.J.B. Mattingly",
            "author_links": {
                "twitter": "wjb_mattingly",
                "github": "wjbmattingly",
                "website": "https://wjbmattingly.com"
            },
            "category": ["pipeline", "standalone"],
            "tags": ["spans", "rules", "ner"]
        },
        {
            "id": "rolegal",
            "title": "A spaCy Package for Romanian Legal Document Processing",
            "thumb": "https://raw.githubusercontent.com/senisioi/rolegal/main/img/paper200x200.jpeg",
            "slogan": "rolegal: a spaCy Package for Noisy Romanian Legal Document Processing",
            "description": "This is a spaCy language model for the Romanian legal domain, trained with floret 4-gram to 5-gram embeddings and `LEGAL` entity recognition. It is useful for processing noisy legal documents produced by OCR.",
            "github": "senisioi/rolegal",
            "pip": "ro-legal-fl",
            "tags": ["legal", "floret", "ner", "romanian"],
            "code_example": [
                "import spacy",
                "nlp = spacy.load(\"ro_legal_fl\")",
                "",
                "doc = nlp(\"Titlul III din LEGEA nr. 255 din 19 iulie 2013, publicată în MONITORUL OFICIAL\")",
                "# legal entity identification",
                "for entity in doc.ents:",
                "    print('entity: ', entity, '; entity type: ', entity.label_)",
                "",
                "# floret n-gram embeddings robust to typos",
                "print(nlp('achizit1e public@').similarity(nlp('achiziții publice')))",
                "# 0.7393895566928835",
                "print(nlp('achizitii publice').similarity(nlp('achiziții publice')))",
                "# 0.8996480808279399"
            ],
            "author": "Sergiu Nisioi",
            "author_links": {
                "github": "senisioi",
                "website": "https://nlp.unibuc.ro/people/snisioi.html"
            },
            "category": ["pipeline", "training", "models"]
        }
    ],

    "categories": [
        {
            "label": "Projects",
            "items": [
                {
                    "id": "pipeline",
                    "title": "Pipeline",
                    "description": "Custom pipeline components and extensions"
                },
                {
                    "id": "training",
                    "title": "Training",
                    "description": "Helpers and toolkits for training spaCy models"
                },
                {
                    "id": "conversational",
                    "title": "Conversational",
                    "description": "Frameworks and utilities for working with conversational text, e.g. for chat bots"
                },
                {
                    "id": "research",
                    "title": "Research",
                    "description": "Frameworks and utilities for developing better NLP models, especially using neural networks"
                },
                {
                    "id": "scientific",
                    "title": "Scientific",
                    "description": "Frameworks and utilities for scientific text processing"
                },
                {
                    "id": "biomedical",
                    "title": "Biomedical",
                    "description": "Frameworks and utilities for processing biomedical text"
                },
                {
                    "id": "visualizers",
                    "title": "Visualizers",
                    "description": "Demos and tools to visualize NLP annotations or systems"
                },
                {
                    "id": "apis",
                    "title": "Containers & APIs",
                    "description": "Infrastructure tools for managing or deploying spaCy"
                },
                {
                    "id": "nonpython",
                    "title": "Non-Python",
                    "description": "Wrappers, bindings and implementations in other programming languages"
                },
                {
                    "id": "standalone",
                    "title": "Standalone",
                    "description": "Self-contained libraries or tools that use spaCy under the hood"
                },
                {
                    "id": "models",
                    "title": "Models",
                    "description": "Third-party pretrained models for different languages and domains"
                }
            ]
        },
        {
            "label": "Education",
            "items": [
                {
                    "id": "books",
                    "title": "Books",
                    "description": "Books about or featuring spaCy"
                },
                {
                    "id": "courses",
                    "title": "Courses",
                    "description": "Online courses and interactive tutorials"
                },
                {
                    "id": "videos",
                    "title": "Videos",
                    "description": "Talks and tutorials in video format"
                },
                {
                    "id": "podcasts",
                    "title": "Podcasts",
                    "description": "Episodes about spaCy or interviews with the spaCy team"
                }
            ]
        }
    ]
}
+								{
 								    "resources": [
-												Add spaCy VSCode extension materials (#12592)


											
										
										
											2023-05-19 15:38:53 +03:00
        {
            "id": "spacy-vscode",
            "title": "spaCy Visual Studio Code Extension",
            "thumb": "https://raw.githubusercontent.com/explosion/spacy-vscode/main/icon.png",
            "slogan": "Work with spaCy's config files in VS Code",
            "description": "The spaCy VS Code Extension provides additional tooling and features for working with spaCy's config files. Version 1.0.0 includes hover descriptions for registry functions, variables, and section names within the config as an installable extension.",
            "url": "https://marketplace.visualstudio.com/items?itemName=Explosion.spacy-extension",
            "github": "explosion/spacy-vscode",
            "code_language": "python",
            "author": "Explosion",
            "author_links": {
                "twitter": "@explosion_ai",
                "github": "explosion"
            },
            "category": ["extension"],
            "tags": []
        },
        {
            "id": "sayswho",
            "title": "SaysWho",
            "slogan": "Quote identification, attribution and resolution",
            "description": "A Python package for identifying and attributing quotes in text. It uses a combination of spaCy functionality, logic and grammar to find quotes and their speakers, then uses the spaCy coreferencing model to better clarify who is speaking. Currently English only.",
            "github": "afriedman412/sayswho",
            "pip": "sayswho",
            "code_language": "python",
            "author": "Andy Friedman",
            "author_links": {
                "twitter": "@steadynappin",
                "github": "afriedman412"
            },
            "code_example": [
                "from sayswho import SaysWho",
                "text = open(\"path/to/your/text_file.txt\").read()",
                "sw = SaysWho()",
                "sw.attribute(text)",
                "sw.expand_match() # see quote/cluster matches",
                "sw.render_to_html() # output your text, quotes and cluster matches to an html file called \"temp.html\""
            ],
            "category": ["standalone"],
            "tags": ["attribution", "coref", "text-processing"]
        },
        {
            "id": "parsigs",
            "title": "parsigs",
            "slogan": "Structuring prescriptions text made simple using spaCy",
            "description": "Parsigs is an open-source project that aims to extract the relevant dosage information from prescription text without compromising the patient's privacy.\n\nNote that you also need to install the model in order to use the package: `pip install https://huggingface.co/royashcenazi/en_parsigs/resolve/main/en_parsigs-any-py3-none-any.whl`",
            "github": "royashcenazi/parsigs",
            "pip": "parsigs",
            "code_language": "python",
            "author": "Roy Ashcenazi",
            "code_example": [
                "# You'll need to install the trained model, see instructions in the description section",
                "from parsigs.parse_sig_api import StructuredSig, SigParser",
                "sig_parser = SigParser()",
                "",
                "sig = 'Take 1 tablet of ibuprofen 200mg 3 times every day for 3 weeks'",
                "parsed_sig = sig_parser.parse(sig)"
            ],
            "author_links": {
                "github": "royashcenazi"
            },
            "category": ["model", "research", "biomedical"],
            "tags": ["sigs", "prescription", "pharma"]
        },
        {
            "id": "latincy",
            "title": "LatinCy",
            "thumb": "https://raw.githubusercontent.com/diyclassics/la_core_web_lg/main/latincy-logo.png",
            "slogan": "Synthetic trained spaCy pipelines for Latin NLP",
            "description": "Set of trained general purpose Latin-language 'core' pipelines for use with spaCy. The models are trained on a large amount of available Latin data, including all five of the Latin Universal Dependency treebanks, which have been preprocessed to be compatible with each other.",
            "url": "https://huggingface.co/latincy",
            "code_example": [
                "# pip install https://huggingface.co/latincy/la_core_web_lg/resolve/main/la_core_web_lg-any-py3-none-any.whl",
                "import spacy",
                "nlp = spacy.load('la_core_web_lg')",
                "doc = nlp('Haec narrantur a poetis de Perseo')",
                "",
                "print(f'{doc[0].text}, {doc[0].norm_}, {doc[0].lemma_}, {doc[0].pos_}')",
                "",
                "# > Haec, haec, hic, DET"
            ],
            "code_language": "python",
            "author": "Patrick J. Burns",
            "author_links": {
                "twitter": "@diyclassics",
                "github": "diyclassics",
                "website": "https://diyclassics.github.io/"
            },
            "category": ["pipeline", "research"],
            "tags": ["latin"]
        },
        {
            "id": "odycy",
            "title": "OdyCy",
            "slogan": "General-purpose language pipelines for premodern Greek.",
            "description": "Academically validated modular NLP pipelines for premodern Greek. odyCy achieves state-of-the-art performance on multiple tasks on unseen test data from the Universal Dependencies Perseus treebank, and performs second best on the PROIEL treebank’s test set on even more tasks. In addition, its performance seems relatively stable across the two evaluation datasets compared with other NLP pipelines. OdyCy is being used at the Center for Humanities Computing for preprocessing and analyzing Ancient Greek corpora for New Testament research, meaning that you can expect consistent maintenance and improvements.",
            "github": "centre-for-humanities-computing/odyCy",
            "code_example": [
                "# To install the high-accuracy transformer-based pipeline",
                "# pip install https://huggingface.co/chcaa/grc_odycy_joint_trf/resolve/main/grc_odycy_joint_trf-any-py3-none-any.whl",
                "import spacy",
                "",
                "nlp = spacy.load('grc_odycy_joint_trf')",
                "",
                "doc = nlp('τὴν γοῦν Ἀττικὴν ἐκ τοῦ ἐπὶ πλεῖστον διὰ τὸ λεπτόγεων ἀστασίαστον οὖσαν ἄνθρωποι ᾤκουν οἱ αὐτοὶ αἰεί.')"
            ],
            "code_language": "python",
            "url": "https://centre-for-humanities-computing.github.io/odyCy/",
            "thumb": "https://raw.githubusercontent.com/centre-for-humanities-computing/odyCy/7b94fec60679d06272dca88a4dcfe0f329779aea/docs/_static/logo.svg",
            "image": "https://github.com/centre-for-humanities-computing/odyCy/raw/main/docs/_static/logo_with_text_below.svg",
            "author": "Jan Kostkan, Márton Kardos (Center for Humanities Computing, Aarhus University)",
            "author_links": {
                "github": "centre-for-humanities-computing",
                "website": "https://chc.au.dk/"
            },
            "category": ["pipeline", "standalone", "research"],
            "tags": ["ancient Greek"]
        },
        {
            "id": "spacy-wasm",
            "title": "spacy-wasm",
            "slogan": "spaCy in the browser using WebAssembly",
            "description": "Run spaCy directly in the browser with WebAssembly. Using Pyodide, the application loads the spaCy model and renders the text prompt with displaCy.",
            "url": "https://spacy-wasm.vercel.app/",
            "github": "SyedAhkam/spacy-wasm",
            "code_language": "python",
            "author": "Syed Ahkam",
            "author_links": {
                "twitter": "@SyedAhkam1",
                "github": "SyedAhkam"
            },
            "category": ["visualizers"],
            "tags": ["visualization", "deployment"]
        },
        {
            "id": "spacysee",
            "title": "spaCysee",
            "slogan": "Visualize spaCy's dependency parsing, POS tagging, and morphological analysis",
            "description": "A project that helps you visualize your spaCy docs in Jupyter notebooks. All dependency tags, POS tags and morphological features are clickable. Clicking on a tag will bring up the relevant documentation for that tag.",
            "github": "moxley01/spacysee",
            "pip": "spacysee",
            "code_example": [
                "import spacy",
                "from spacysee import render",
                "",
                "nlp = spacy.load('en_core_web_sm')",
                "doc = nlp('This is a neat way to visualize your spaCy docs')",
                "render(doc, width='500', height='500')"
            ],
            "code_language": "python",
            "thumb": "https://www.mattoxley.com/static/images/spacysee_logo.svg",
            "image": "https://www.mattoxley.com/static/images/spacysee_logo.svg",
            "author": "Matt Oxley",
            "author_links": {
                "twitter": "matt0xley",
                "github": "moxley01",
                "website": "https://mattoxley.com"
            },
            "category": ["visualizers"],
            "tags": ["visualization"]
        },
        {
            "id": "grecy",
            "title": "greCy",
            "slogan": "Ancient Greek pipelines for spaCy",
            "description": "greCy offers state-of-the-art pipelines for ancient Greek NLP. It installs language models available in various sizes, some of them containing either word vectors or the aristoBERTo transformer.",
            "github": "jmyerston/greCy",
            "pip": "grecy",
            "code_example": [
                "# python -m grecy install grc_proiel_trf",
                "",
                "# After installing grc_proiel_trf or any other model",
                "import spacy",
                "",
                "nlp = spacy.load('grc_proiel_trf')",
                "doc = nlp('δοκῶ μοι περὶ ὧν πυνθάνεσθε οὐκ ἀμελέτητος εἶναι')",
                "",
                "for token in doc:",
                "    print(f'{token.text}, lemma: {token.lemma_}, pos: {token.pos_}, dep: {token.dep_}')"
            ],
            "code_language": "python",
            "thumb": "https://jacobo-syntax.hf.space/media/03a5317fa660c142e41dd2870b4273ce4e668e6fcdee0a276891f563.png",
            "author": "Jacobo Myerston",
            "author_links": {
                "twitter": "@jcbmyrstn",
                "github": "jmyerston",
                "website": "https://huggingface.co/spaces/Jacobo/syntax"
            },
            "category": ["pipeline", "research", "models"],
            "tags": ["ancient Greek"]
        },
        {
            "id": "spacy-cleaner",
            "title": "spacy-cleaner",
            "slogan": "Easily clean text with spaCy!",
            "description": "**spacy-cleaner** utilises spaCy `Language` models to replace, remove, and mutate spaCy tokens. Cleaning actions available are:\n\n* Remove/replace stopwords.\n* Remove/replace punctuation.\n* Remove/replace numbers.\n* Remove/replace emails.\n* Remove/replace URLs.\n* Perform lemmatisation.\n\nSee our [docs](https://ce11an.github.io/spacy-cleaner/) for more information.",
            "github": "Ce11an/spacy-cleaner",
            "pip": "spacy-cleaner",
            "code_example": [
                "import spacy",
                "import spacy_cleaner",
                "from spacy_cleaner.processing import removers, replacers, mutators",
                "",
                "model = spacy.load(\"en_core_web_sm\")",
                "pipeline = spacy_cleaner.Pipeline(",
                "    model,",
                "    removers.remove_stopword_token,",
                "    replacers.replace_punctuation_token,",
                "    mutators.mutate_lemma_token,",
                ")",
                "",
                "texts = [\"Hello, my name is Cellan! I love to swim!\"]",
                "",
                "pipeline.clean(texts)",
                "# ['hello _IS_PUNCT_ Cellan _IS_PUNCT_ love swim _IS_PUNCT_']"
            ],
            "code_language": "python",
            "url": "https://ce11an.github.io/spacy-cleaner/",
            "image": "https://raw.githubusercontent.com/Ce11an/spacy-cleaner/main/docs/assets/images/spacemen.png",
            "author": "Cellan Hall",
            "author_links": {
                "twitter": "Ce11an",
                "github": "Ce11an",
                "website": "https://www.linkedin.com/in/cellan-hall/"
            },
-												Website migration from Gatsby to Next (#12058)

* Rename all MDX file to `.mdx`

* Lock current node version (#11885)

* Apply Prettier (#11996)

* Minor website fixes (#11974) [ci skip]

* fix table

* Migrate to Next WEB-17 (#12005)

* Initial commit

* Run `npx create-next-app@13 next-blog`

* Install MDX packages

Following: https://github.com/vercel/next.js/blob/77b5f79a4dff453abb62346bf75b14d859539b81/packages/next-mdx/readme.md

* Add MDX to Next

* Allow Next to handle `.md` and `.mdx` files.

* Add VSCode extension recommendation

* Disabled TypeScript strict mode for now

* Add prettier

* Apply Prettier to all files

* Make sure to use correct Node version

* Add basic implementation for `MDXRemote`

* Add experimental Rust MDX parser

* Add `/public`

* Add SASS support

* Remove default pages and styling

* Convert to module

This allows to use `import/export` syntax

* Add import for custom components

* Add ability to load plugins

* Extract function

This will make the next commit easier to read

* Allow to handle directories for page creation

* Refactoring

* Allow to parse subfolders for pages

* Extract logic

* Redirect `index.mdx` to parent directory

* Disabled ESLint during builds

* Disabled typescript during build

* Remove Gatsby from `README.md`

* Rephrase Docker part of `README.md`

* Update project structure in `README.md`

* Move and rename plugins

* Update plugin for wrapping sections

* Add dependencies for  plugin

* Use  plugin

* Rename wrapper type

* Simplify unnessary adding of id to sections

The slugified section ids are useless, because they can not be referenced anywhere anyway. The navigation only works if the section has the same id as the heading.

* Add plugin for custom attributes on Markdown elements

* Add plugin to readd support for tables

* Add plugin to fix problem with wrapped images

For more details see this issue: https://github.com/mdx-js/mdx/issues/1798

* Add necessary meta data to pages

* Install necessary dependencies

* Remove outdated MDX handling

* Remove reliance on `InlineList`

* Use existing Remark components

* Remove unallowed heading

Before `h1` components where not overwritten and would never have worked and they aren't used anywhere either.

* Add missing components to MDX

* Add correct styling

* Fix broken list

* Fix broken CSS classes

* Implement layout

* Fix links

* Fix broken images

* Fix pattern image

* Fix heading attributes

* Rename heading attribute

`new` was causing some weird issue, so renaming it to `version`

* Update comment syntax in MDX

* Merge imports

* Fix markdown rendering inside components

* Add model pages

* Simplify anchors

* Fix default value for theme

* Add Universe index page

* Add Universe categories

* Add Universe projects

* Fix Next problem with copy

Next complains when the server renders something different then the client, therfor we move the differing logic to `useEffect`

* Fix improper component nesting

Next doesn't allow block elements inside a `<p>`

* Replace landing page MDX with page component

* Remove inlined iframe content

* Remove ability to inline HTML content in iFrames

* Remove MDX imports

* Fix problem with image inside link in MDX

* Escape character for MDX

* Fix unescaped characters in MDX

* Fix headings with logo

* Allow to export static HTML pages

* Add prebuild script

This command is automatically run by Next

* Replace `svg-loader` with `react-inlinesvg`

`svg-loader` is no longer maintained

* Fix ESLint `react-hooks/exhaustive-deps`

* Fix dropdowns

* Change code language from `cli` to `bash`

* Remove unnecessary language `none`

* Fix invalid code language

`markdown_` with an underscore was used to effectively turn off syntax highlighting, but using unknown languages now throws an error.

* Enable code blocks plugin

* Re-add `InlineCode` component

MDX2 removed the `inlineCode` component

> The special component name `inlineCode` was removed, we recommend to use `pre` for the block version of code, and code for both the block and inline versions

Source: https://mdxjs.com/migrating/v2/#update-mdx-content

* Remove unused code

* Extract function to own file

* Fix code syntax highlighting

* Update syntax for code block meta data

* Remove unused prop

* Fix internal link recognition

There is a problem with the regex between Node and the browser, and since Next runs the component in both, this creates an error:

``Prop `rel` did not match. Server: "null" Client: "noopener nofollow noreferrer"``

This simplifies the implementation and fixes the above error.

* Replace `react-helmet` with `next/head`

* Fix `className` problem for JSX component

* Fix broken bold markdown

* Convert file to `.mjs` to be used by Node process

* Add plugin to replace strings

* Fix custom table row styling

* Fix problem with `span` inside inline `code`

React doesn't allow a `span` inside an inline `code` element and throws an error in dev mode.

* Add `_document` to be able to customize `<html>` and `<body>`

* Add `lang="en"`

* Store Netlify settings in file

This way we don't need to update via the Netlify UI, which can be tricky when changing build settings.

* Add sitemap

* Add Smartypants

* Add PWA support

* Add `manifest.webmanifest`

* Fix bug with anchor links after reloading

There was no need for the previous implementation, since the browser handles this natively. Additionally, the manual scrolling into view was actually broken, because the heading would disappear behind the menu bar.

* Rename custom event

I was googling for ages to find out what kind of event `inview` is, only to figure out it was a custom event with a name that sounds pretty much like a native one. 🫠

* Fix missing comment syntax highlighting

* Refactor Quickstart component

The previous implementation was hiding the irrelevant lines via data props and dynamically generated CSS. This created problems with Next and was also hard to follow: CSS was used to do what React is supposed to handle.

The new implementation simply filters the list of children (React elements) via their props.

* Fix syntax highlighting for Training Quickstart

* Unify code rendering

* Improve error logging in Juniper

* Fix Juniper component

* Automatically generate "Read Next" link

* Add Plausible

* Use recent DocSearch component and adjust styling

* Fix images

* Turn off image optimization

> Image Optimization using Next.js' default loader is not compatible with `next export`.

We currently deploy to Netlify via `next export`

* Don't build pages starting with `_`

* Remove unused files

* Add Next plugin to Netlify

* Fix button layout

MDX automatically adds `p` tags around text on a new line, and Prettier wants to put the text on a new line. Worked around with a JSX string.

* Add 404 page

* Apply Prettier

* Update Prettier for `package.json`

Next sometimes wants to patch `package-lock.json`. The old Prettier setting indented with 4 spaces, but Next always indents with 2 spaces. Since `npm install` automatically uses the indentation from `package.json` for `package-lock.json`, and to avoid the format switching back and forth, both files are now set to 2 spaces.

* Apply Next patch to `package-lock.json`

When starting the dev server Next would warn `warn  - Found lockfile missing swc dependencies, patching...` and update the `package-lock.json`. These are the patched changes.

* fix link

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* small backslash fixes

* adjust to new style

Co-authored-by: Marcus Blättermann <marcus@essenmitsosse.de>
											
										
										
											2023-01-11 19:30:07 +03:00
+								            "category": ["extension"],
 								            "tags": ["text-processing"]
-												Adding `spacy-cleaner` to the spaCy universe (#11674)

* added spacy-cleaner to the spaCy universe

* Move data to the right section of universe.json

* Cleanup

- fix typo ("replacers")
- spaCy doesn't need to be marked as code
- lemma of "Hello" is lower case

Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>
											
										
										
											2022-10-20 14:38:29 +03:00
+								        },
-												Add Zshot Spacy plugin (#11557)

* Add Zshot Spacy plugin

Add Zshot (Zero and Few shot named entity & relationships recognition) Spacy plugin

* Update website/meta/universe.json

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update website/meta/universe.json

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
											
										
										
											2022-09-29 18:34:44 +03:00
+								        {
 								            "id": "Zshot",
 								            "title": "Zshot",
 								            "slogan": "Zero and Few shot named entity & relationships recognition",
 								            "github": "ibm/zshot",
 								            "pip": "zshot",
 								            "code_example": [
 								                "import spacy",
 								                "from zshot import PipelineConfig, displacy",
 								                "from zshot.linker import LinkerRegen",
 								                "from zshot.mentions_extractor import MentionsExtractorSpacy",
 								                "from zshot.utils.data_models import Entity",
 								                "",
 								                "nlp = spacy.load('en_core_web_sm')",
 								                "# zero shot definition of entities",
 								                "nlp_config = PipelineConfig(",
 								                "    mentions_extractor=MentionsExtractorSpacy(),",
 								                "    linker=LinkerRegen(),",
 								                "    entities=[",
 								                "        Entity(name='Paris',",
 								                "               description='Paris is located in northern central France, in a north-bending arc of the river Seine'),",
 								                "        Entity(name='IBM',",
 								                "               description='International Business Machines Corporation (IBM) is an American multinational technology corporation headquartered in Armonk, New York'),",
 								                "        Entity(name='New York', description='New York is a city in U.S. state'),",
 								                "        Entity(name='Florida', description='southeasternmost U.S. state'),",
 								                "        Entity(name='American',",
 								                "               description='American, something of, from, or related to the United States of America, commonly known as the United States or America'),",
 								                "        Entity(name='Chemical formula',",
 								                "               description='In chemistry, a chemical formula is a way of presenting information about the chemical proportions of atoms that constitute a particular chemical compound or molecule'),",
 								                "        Entity(name='Acetamide',",
 								                "               description='Acetamide (systematic name: ethanamide) is an organic compound with the formula CH3CONH2. It is the simplest amide derived from acetic acid. It finds some use as a plasticizer and as an industrial solvent.'),",
 								                "        Entity(name='Armonk',",
 								                "               description='Armonk is a hamlet and census-designated place (CDP) in the town of North Castle, located in Westchester County, New York, United States.'),",
 								                "        Entity(name='Acetic Acid',",
 								                "               description='Acetic acid, systematically named ethanoic acid, is an acidic, colourless liquid and organic compound with the chemical formula CH3COOH'),",
 								                "        Entity(name='Industrial solvent',",
 								                "               description='Acetamide (systematic name: ethanamide) is an organic compound with the formula CH3CONH2. It is the simplest amide derived from acetic acid. It finds some use as a plasticizer and as an industrial solvent.'),",
 								                "    ]",
 								                ")",
 								                "nlp.add_pipe('zshot', config=nlp_config, last=True)",
 								                "",
 								                "text = 'International Business Machines Corporation (IBM) is an American multinational technology corporation' \\",
 								                "        ' headquartered in Armonk, New York, with operations in over 171 countries.'",
 								                "",
 								                "doc = nlp(text)",
 								                "displacy.serve(doc, style='ent')"
 								            ],
 								            "thumb": "https://ibm.github.io/zshot/img/graph.png",
 								            "url": "https://ibm.github.io/zshot/",
 								            "author": "IBM Research",
 								            "author_links": {
 								                "github": "ibm",
 								                "twitter": "IBMResearch",
 								                "website": "https://research.ibm.com/labs/ireland/"
 								            },
 								            "category": ["scientific", "models", "research"]
 								        },
-												chore: add 'concepCy' to spacy universe (#11255)

* chore: add 'concepCy' to spacy universe

* docs: add 'slogan' to concepCy
											
										
										
											2022-08-04 09:42:38 +03:00
+								        {
 								            "id": "concepcy",
 								            "title": "concepCy",
 								            "slogan": "A multilingual knowledge graph in spaCy",
 								            "description": "A spaCy wrapper for ConceptNet, a freely-available semantic network designed to help computers understand the meaning of words.",
 								            "github": "JulesBelveze/concepcy",
 								            "pip": "concepcy",
 								            "code_example": [
 								                "import spacy",
 								                "import concepcy",
 								                "",
 								                "nlp = spacy.load('en_core_web_sm')",
 								                "# Using default concepCy configuration",
 								                "nlp.add_pipe('concepcy')",
 								                "",
 								                "doc = nlp('WHO is a lovely company')",
 								                "",
 								                "# Access all the 'RelatedTo' relations from the Doc",
 								                "for word, relations in doc._.relatedto.items():",
 								                "    print(f'Word: {word}\\n{relations}')",
 								                "",
 								                "# Access the 'RelatedTo' relations word by word",
 								                "for token in doc:",
 								                "    print(f'Word: {token}\\n{token._.relatedto}')"
 								            ],
 								            "category": ["pipeline"],
 								            "image": "https://github.com/JulesBelveze/concepcy/blob/main/figures/concepcy.png",
 								            "tags": ["semantic", "ConceptNet"],
 								            "author": "Jules Belveze",
 								            "author_links": {
 								                "github": "JulesBelveze",
 								                "website": "https://www.linkedin.com/in/jules-belveze/"
 								            }
 								        },
-												updated spacy universe for spacyfishing

											
										
										
											2022-06-20 15:28:49 +03:00
+								        {
 								            "id": "spacyfishing",
 								            "title": "spaCy fishing",
 								            "slogan": "Named entity disambiguation and linking on Wikidata in spaCy with Entity-Fishing.",
 								            "description": "A spaCy wrapper of Entity-Fishing for named entity disambiguation and linking against a Wikidata knowledge base.",
 								            "github": "Lucaterre/spacyfishing",
 								            "pip": "spacyfishing",
 								            "code_example": [
 								                "import spacy",
 								                "text = 'Victor Hugo and Honoré de Balzac are French writers who lived in Paris.'",
 								                "nlp = spacy.load('en_core_web_sm')",
-												correct typo in universe.json for 'code_example' key : pipe name 'entityfishing'

											
										
										
											2022-06-20 16:26:23 +03:00
+								                "nlp.add_pipe('entityfishing')",
-												updated spacy universe for spacyfishing

											
										
										
											2022-06-20 15:28:49 +03:00
+								                "doc = nlp(text)",
 								                "for span in doc.ents:",
 								                "    print((span.text, span.label_, span._.kb_qid, span._.url_wikidata, span._.nerd_score))",
 								                "# ('Victor Hugo', 'PERSON', 'Q535', 'https://www.wikidata.org/wiki/Q535', 0.972)",
 								                "# ('Honoré de Balzac', 'PERSON', 'Q9711', 'https://www.wikidata.org/wiki/Q9711', 0.9724)",
 								                "# ('French', 'NORP', 'Q121842', 'https://www.wikidata.org/wiki/Q121842', 0.3739)",
 								                "# ('Paris', 'GPE', 'Q90', 'https://www.wikidata.org/wiki/Q90', 0.5652)",
 								                "## Set parameter `extra_info` to `True` and check also span._.description, span._.src_description, span._.normal_term, span._.other_ids"
 								            ],
 								            "category": ["models", "pipeline"],
-												Update meta for spacyfishing in spaCy Universe (#11185)

* add new logo for spacyfishing to update spacy universe

* change logo location
											
										
										
											2022-07-24 11:10:29 +03:00
+								            "image": "https://raw.githubusercontent.com/Lucaterre/spacyfishing/main/docs/spacyfishing-logo-resized.png",
-												updated spacy universe for spacyfishing

											
										
										
											2022-06-20 15:28:49 +03:00
+								            "tags": ["NER", "NEL"],
 								            "author": "Lucas Terriel",
 								            "author_links": {
 								                "twitter": "TerreLuca",
 								                "github": "Lucaterre"
 								            }
 								        },
-												Add "Aim-spaCy" to spaCy Universe (#10943)

* Add Aim-spaCy to spaCy universe

* Update Aim thumbnail

* Fix author links

Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>
											
										
										
											2022-06-10 12:33:17 +03:00
+								        {
 								            "id": "aim-spacy",
 								            "title": "Aim-spaCy",
 								            "slogan": "Aim-spaCy is an Aim-based spaCy experiment tracker.",
 								            "description": "Aim-spaCy helps you easily collect, store and explore training logs for spaCy, including hyperparameters, metrics and displaCy visualizations.",
 								            "github": "aimhubio/aim-spacy",
 								            "pip": "aim-spacy",
-												Website migration from Gatsby to Next (#12058)

											
										
										
											2023-01-11 19:30:07 +03:00
+								            "code_example": ["https://github.com/aimhubio/aim-spacy/tree/master/examples"],
-												Add "Aim-spaCy" to spaCy Universe (#10943)

											
										
										
											2022-06-10 12:33:17 +03:00
+								            "code_language": "python",
 								            "url": "https://aimstack.io/spacy",
 								            "thumb": "https://user-images.githubusercontent.com/13848158/172912427-ee9327ea-3cd8-47fa-8427-6c0d36cd831f.png",
 								            "image": "https://user-images.githubusercontent.com/13848158/136364717-0939222c-55b6-44f0-ad32-d9ab749546e4.png",
 								            "author": "AimStack",
 								            "author_links": {
 								                "twitter": "aimstackio",
 								                "github": "aimhubio",
 								                "website": "https://aimstack.io"
 								            },
 								            "category": ["visualizers"],
 								            "tags": ["experiment-tracking", "visualization"]
 								        },
-												Add spacy-report to universe (#10910)

* Add spacy-report to universe

* Remove extra comma

Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>
											
										
										
											2022-06-05 12:57:58 +03:00
+								        {
 								            "id": "spacy-report",
 								            "title": "spacy-report",
 								            "slogan": "Generates interactive reports for spaCy models.",
 								            "description": "The goal of spacy-report is to offer static reports for spaCy models that help users make better decisions on how the models can be used.",
 								            "github": "koaning/spacy-report",
 								            "pip": "spacy-report",
 								            "thumb": "https://github.com/koaning/spacy-report/raw/main/icon.png",
 								            "image": "https://raw.githubusercontent.com/koaning/spacy-report/main/gif.gif",
 								            "code_example": [
 								                "python -m spacy report textcat training/model-best/ corpus/train.spacy corpus/dev.spacy"
 								            ],
 								            "category": ["visualizers", "research"],
 								            "author": "Vincent D. Warmerdam",
 								            "author_links": {
 								                "twitter": "fishnets88",
 								                "github": "koaning",
 								                "website": "https://koaning.io"
 								            }
 								        },
-												Adding and updating content in the spacy universe (#10493)

* signing contributor agreement

* adding new content to the spaCy universe

* updating outdated example codes

* resolving issues for the PR

* resolve review for klayers

* remove contributor-agreement file from the PR

* Update code example of spaCySentiWS

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update spacy-sentiws code example

Co-authored-by: schaeran <schaeran1994@gmail.com>
Co-authored-by: schaeran <schaeran@explosion.ai>
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
											
										
										
											2022-04-15 16:36:54 +03:00
+								        {
 								            "id": "scrubadub_spacy",
 								            "title": "scrubadub_spacy",
 								            "category": ["pipeline"],
 								            "slogan": "Remove personally identifiable information from text using spaCy.",
 								            "description": "scrubadub removes personally identifiable information from text. scrubadub_spacy is an extension that uses spaCy NLP models to remove personal information from text.",
 								            "github": "LeapBeyond/scrubadub_spacy",
 								            "pip": "scrubadub-spacy",
 								            "url": "https://github.com/LeapBeyond/scrubadub_spacy",
 								            "code_language": "python",
 								            "author": "Leap Beyond",
 								            "author_links": {
-												Fix some of the broken links on universe pages (#11011)

Currently some of the "AUTHOR INFO" links (e.g. here[0]) are broken:

```
https://github.com/https://github.com/explosion
```

[0] https://spacy.io/universe/project/spacy-experimental


Also one remains broken with `https://szegedai.github.io/`.
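The doubled prefix suggests that the page builds each author link by prepending `https://github.com/` to `author_links.github`, so that field should hold a bare username such as `LeapBeyond`. Here is a minimal sketch for catching such entries. It assumes only the `resources` / `author_links` layout used in this file; the helper `find_url_github_links` and the sample entries are hypothetical, for illustration only:

```python
from __future__ import annotations

import json


def find_url_github_links(universe: dict) -> list[tuple[str, str]]:
    """Hypothetical helper: return (resource id, value) pairs whose
    author_links.github is a full URL instead of a bare GitHub username."""
    flagged = []
    for resource in universe.get("resources", []):
        github = resource.get("author_links", {}).get("github", "")
        if github.startswith(("http://", "https://")):
            flagged.append((resource.get("id", "<no id>"), github))
    return flagged


# Hypothetical sample data in the universe.json layout (not the real file):
# the first entry mirrors the broken spacy-experimental case, the second is correct.
sample = json.loads("""
{
  "resources": [
    {"id": "spacy-experimental", "author_links": {"github": "https://github.com/explosion"}},
    {"id": "scrubadub_spacy", "author_links": {"github": "LeapBeyond"}}
  ]
}
""")

for resource_id, value in find_url_github_links(sample):
    # Prepending "https://github.com/" to this value is what would produce
    # a broken link like https://github.com/https://github.com/explosion
    print(f"{resource_id}: {value}")
```

Only the first sample entry is reported. A check like this flags only the `github` key; other link types, such as the `website` values, would need their own rules.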
											
										
										
											2022-06-23 18:53:00 +03:00
+								                "github": "LeapBeyond",
-												Adding and updating content in the spacy universe (#10493)

											
										
										
											2022-04-15 16:36:54 +03:00
+								                "website": "https://leapbeyond.ai"
 								            },
 								            "code_example": [
 								                "import scrubadub, scrubadub_spacy",
 								                "scrubber = scrubadub.Scrubber()",
 								                "scrubber.add_detector(scrubadub_spacy.detectors.SpacyEntityDetector)",
 								                "print(scrubber.clean(\"My name is Alex, I work at LifeGuard in London, and my eMail is alex@lifeguard.com btw. my super secret twitter login is username: alex_2000 password: g-dragon180888\"))",
 								                "# My name is {{NAME}}, I work at {{ORGANIZATION}} in {{LOCATION}}, and my eMail is {{EMAIL}} btw. my super secret twitter login is username: {{USERNAME}} password: {{PASSWORD}}"
 								            ]
 								        },
 								        {
 								            "id": "spacy-setfit-textcat",
 								            "title": "spacy-setfit-textcat",
 								            "category": ["research"],
 								            "tags": ["SetFit", "Few-Shot"],
 								            "slogan": "spaCy Project: Experiments with SetFit & Few-Shot Classification",
 								            "description": "This project is an experiment with spaCy and few-shot text classification using SetFit",
 								            "github": "pmbaumgartner/spacy-setfit-textcat",
 								            "url": "https://github.com/pmbaumgartner/spacy-setfit-textcat",
 								            "code_language": "python",
 								            "author": "Peter Baumgartner",
 								            "author_links": {
-												Website migration from Gatsby to Next (#12058)


* Remove unused code

* Extract function to own file

* Fix code syntax highlighting

* Update syntax for code block meta data

* Remove unused prop

* Fix internal link recognition

There is a problem with regex between Node and browser, and since Next runs the component on both, this create an error.

`Prop `rel` did not match. Server: "null" Client: "noopener nofollow noreferrer"`

This simplifies the implementation and fixes the above error.

* Replace `react-helmet` with `next/head`

* Fix `className` problem for JSX component

* Fix broken bold markdown

* Convert file to `.mjs` to be used by Node process

* Add plugin to replace strings

* Fix custom table row styling

* Fix problem with `span` inside inline `code`

React doesn't allow a `span` inside an inline `code` element and throws an error in dev mode.

* Add `_document` to be able to customize `<html>` and `<body>`

* Add `lang="en"`

* Store Netlify settings in file

This way we don't need to update via Netlify UI, which can be tricky if changing build settings.

* Add sitemap

* Add Smartypants

* Add PWA support

* Add `manifest.webmanifest`

* Fix bug with anchor links after reloading

There was no need for the previous implementation, since the browser handles this nativly. Additional the manual scrolling into view was actually broken, because the heading would disappear behind the menu bar.

* Rename custom event

I was googeling for ages to find out what kind of event `inview` is, only to figure out it was a custom event with a name that sounds pretty much like a native one. 🫠

* Fix missing comment syntax highlighting

* Refactor Quickstart component

The previous implementation was hidding the irrelevant lines via data-props and dynamically generated CSS. This created problems with Next and was also hard to follow. CSS was used to do what React is supposed to handle.

The new implementation simplfy filters the list of children (React elements) via their props.

* Fix syntax highlighting for Training Quickstart

* Unify code rendering

* Improve error logging in Juniper

* Fix Juniper component

* Automatically generate "Read Next" link

* Add Plausible

* Use recent DocSearch component and adjust styling

* Fix images

* Turn of image optimization

> Image Optimization using Next.js' default loader is not compatible with `next export`.

We currently deploy to Netlify via `next export`

* Dont build pages starting with `_`

* Remove unused files

* Add Next plugin to Netlify

* Fix button layout

MDX automatically adds `p` tags around text on a new line and Prettier wants to put the text on a new line. Hacking with JSX string.

* Add 404 page

* Apply Prettier

* Update Prettier for `package.json`

Next sometimes wants to patch `package-lock.json`. The old Prettier setting indended with 4 spaces, but Next always indends with 2 spaces. Since `npm install` automatically uses the indendation from `package.json` for `package-lock.json` and to avoid the format switching back and forth, both files are now set to 2 spaces.

* Apply Next patch to `package-lock.json`

When starting the dev server Next would warn `warn  - Found lockfile missing swc dependencies, patching...` and update the `package-lock.json`. These are the patched changes.

* fix link

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* small backslash fixes

* adjust to new style

Co-authored-by: Marcus Blättermann <marcus@essenmitsosse.de>
											
										
										
											2023-01-11 19:30:07 +03:00
+								                "twitter": "pmbaumgartner",
-												Fix some of the broken links on universe pages (#11011)

Currently some of the "AUTHOR INFO" links (e.g. here[0]) are broken:

```
https://github.com/https://github.com/explosion
```

[0] https://spacy.io/universe/project/spacy-experimental


Also one remains broken with `https://szegedai.github.io/`.
											
										
										
											2022-06-23 18:53:00 +03:00
+								                "github": "pmbaumgartner",
-												Adding and updating content in the spacy universe (#10493)

* signing contributor agreement

* adding new content to the spaCy universe

* updating outdated example codes

* resolving issues for the PR

* resolve review for klayers

* remove contributor-agreement file from the PR

* Update code example of spaCySentiWS

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update spacy-sentiws code example

Co-authored-by: schaeran <schaeran1994@gmail.com>
Co-authored-by: schaeran <schaeran@explosion.ai>
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
											
										
										
											2022-04-15 16:36:54 +03:00
+								                "website": "https://www.peterbaumgartner.com/"
 								            },
 								            "code_example": [
 								                "https://colab.research.google.com/drive/1CvGEZC0I9_v8gWrBxSJQ4Z8JGPJz-HYb?usp=sharing"
 								            ]
 								        },
        {
            "id": "spacy-experimental",
            "title": "spacy-experimental",
            "category": ["extension"],
            "slogan": "Cutting-edge experimental spaCy components and features",
            "description": "This package includes experimental components and features for spaCy v3.x, for example model architectures, pipeline components and utilities.",
            "github": "explosion/spacy-experimental",
            "pip": "spacy-experimental",
            "url": "https://github.com/explosion/spacy-experimental",
            "code_language": "python",
            "author": "Explosion",
            "author_links": {
                "twitter": "explosion_ai",
                "github": "explosion",
                "website": "https://explosion.ai/"
            },
            "code_example": [
                "python -m pip install -U pip setuptools wheel",
                "python -m pip install spacy-experimental"
            ]
        },
        {
            "id": "spacypdfreader",
            "title": "spacypdfreader",
            "category": ["pipeline"],
            "tags": ["PDF"],
            "slogan": "Easy PDF to text to spaCy text extraction in Python.",
            "description": "*spacypdfreader* is a Python library that allows you to convert PDF files directly into *spaCy* `Doc` objects. The library provides several built-in parsers, or you can bring your own parser. `Doc` objects are annotated with several custom attributes including: `token._.page_number`, `doc._.page_range`, `doc._.first_page`, `doc._.last_page`, `doc._.pdf_file_name`, and `doc._.page(int)`.",
            "github": "SamEdwardes/spacypdfreader",
            "pip": "spacypdfreader",
            "url": "https://samedwardes.github.io/spacypdfreader/",
            "code_language": "python",
            "author": "Sam Edwardes",
            "author_links": {
                "twitter": "TheReaLSamlam",
                "github": "SamEdwardes",
                "website": "https://samedwardes.com"
            },
            "code_example": [
                "import spacy",
                "from spacypdfreader.spacypdfreader import pdf_reader",
                "",
                "nlp = spacy.load('en_core_web_sm')",
                "doc = pdf_reader('tests/data/test_pdf_01.pdf', nlp)",
                "",
                "# Get the page number of any token.",
                "print(doc[0]._.page_number)  # 1",
                "print(doc[-1]._.page_number) # 4",
                "",
                "# Get page meta data about the PDF document.",
                "print(doc._.pdf_file_name)   # 'tests/data/test_pdf_01.pdf'",
                "print(doc._.page_range)      # (1, 4)",
                "print(doc._.first_page)      # 1",
                "print(doc._.last_page)       # 4",
                "",
                "# Get all of the text from a specific PDF page.",
                "print(doc._.page(4))         # 'able to display the destination page (unless...'"
            ]
        },
        {
            "id": "nlpcloud",
            "title": "NLPCloud.io",
            "slogan": "Production-ready API for spaCy models in production",
            "description": "A highly available hosted API to easily deploy and use spaCy models in production. Supports NER, POS tagging, dependency parsing, and tokenization.",
            "github": "nlpcloud",
            "pip": "nlpcloud",
            "code_example": [
                "import nlpcloud",
                "",
                "client = nlpcloud.Client('en_core_web_lg', '4eC39HqLyjWDarjtT1zdp7dc')",
                "client.entities('John Doe is a Go Developer at Google')",
                "# [{'end': 8, 'start': 0, 'text': 'John Doe', 'type': 'PERSON'}, {'end': 25, 'start': 13, 'text': 'Go Developer', 'type': 'POSITION'}, {'end': 35, 'start': 30, 'text': 'Google', 'type': 'ORG'}]"
            ],
            "thumb": "https://avatars.githubusercontent.com/u/77671902",
            "image": "https://nlpcloud.io/assets/images/logo.svg",
            "code_language": "python",
            "author": "NLPCloud.io",
            "author_links": {
                "github": "nlpcloud",
                "twitter": "cloud_nlp",
                "website": "https://nlpcloud.io"
            },
            "category": ["apis", "nonpython", "standalone"],
            "tags": ["api", "deploy", "production"]
        },
        {
            "id": "eMFDscore",
            "title": "eMFDscore: Extended Moral Foundation Dictionary Scoring for Python",
            "slogan": "Extended Moral Foundation Dictionary Scoring for Python",
            "description": "eMFDscore is a library for the fast and flexible extraction of various moral information metrics from textual input data. eMFDscore is built on spaCy for faster execution and performs minimal preprocessing consisting of tokenization, syntactic dependency parsing, lower-casing, and stopword/punctuation/whitespace removal. eMFDscore lets users score documents with multiple Moral Foundations Dictionaries, provides various metrics for analyzing moral information, and extracts moral patient, agent, and attribute words related to entities.",
            "github": "medianeuroscience/emfdscore",
            "code_example": [
                "from emfdscore.scoring import score_docs",
                "import pandas as pd",
                "template_input = pd.read_csv('emfdscore/template_input.csv', header=None)",
                "DICT_TYPE = 'emfd'",
                "PROB_MAP = 'single'",
                "SCORE_METHOD = 'bow'",
                "OUT_METRICS = 'vice-virtue'",
                "OUT_CSV_PATH = 'single-vv.csv'",
                "num_docs = len(template_input)",
                "df = score_docs(template_input, DICT_TYPE, PROB_MAP, SCORE_METHOD, OUT_METRICS, num_docs)"
            ],
            "code_language": "python",
            "author": "Media Neuroscience Lab",
            "author_links": {
                "github": "medianeuroscience",
                "twitter": "medianeuro"
            },
            "category": ["research", "teaching"],
            "tags": ["morality", "dictionary", "sentiment"]
        },
        {
            "id": "skweak",
            "title": "skweak",
            "slogan": "Weak supervision for NLP",
            "description": "`skweak` brings the power of weak supervision to NLP tasks, and in particular sequence labelling and text classification. Instead of annotating documents by hand, `skweak` allows you to define *labelling functions* to automatically label your documents, and then aggregate their results using a statistical model that estimates the accuracy and confusions of each labelling function.",
            "github": "NorskRegnesentral/skweak",
            "pip": "skweak",
            "code_example": [
                "import spacy, re",
                "from skweak import heuristics, gazetteers, aggregation, utils",
                "",
                "# LF 1: heuristic to detect occurrences of MONEY entities",
                "def money_detector(doc):",
                "    for tok in doc[1:]:",
                "        if tok.text[0].isdigit() and tok.nbor(-1).is_currency:",
                "            yield tok.i - 1, tok.i + 1, 'MONEY'",
                "lf1 = heuristics.FunctionAnnotator('money', money_detector)",
                "",
                "# LF 2: detection of years with a regex",
                "lf2 = heuristics.TokenConstraintAnnotator('years', lambda tok: re.match(r'(19|20)\\d{2}$', tok.text), 'DATE')",
                "",
                "# LF 3: a gazetteer with a few names",
                "NAMES = [('Barack', 'Obama'), ('Donald', 'Trump'), ('Joe', 'Biden')]",
                "trie = gazetteers.Trie(NAMES)",
                "lf3 = gazetteers.GazetteerAnnotator('presidents', {'PERSON': trie})",
                "",
                "# We create a corpus (here with a single text)",
                "nlp = spacy.load('en_core_web_sm')",
                "doc = nlp('Donald Trump paid $750 in federal income taxes in 2016')",
                "",
                "# apply the labelling functions",
                "doc = lf3(lf2(lf1(doc)))",
                "",
                "# and aggregate them",
                "hmm = aggregation.HMM('hmm', ['PERSON', 'DATE', 'MONEY'])",
                "hmm.fit_and_aggregate([doc])",
                "",
                "# we can then visualise the final result (in Jupyter)",
                "utils.display_entities(doc, 'hmm')"
            ],
            "code_language": "python",
            "url": "https://github.com/NorskRegnesentral/skweak",
            "thumb": "https://raw.githubusercontent.com/NorskRegnesentral/skweak/main/data/skweak_logo_thumbnail.jpg",
            "image": "https://raw.githubusercontent.com/NorskRegnesentral/skweak/main/data/skweak_logo.jpg",
            "author": "Pierre Lison",
            "author_links": {
                "twitter": "plison2",
                "github": "plison",
                "website": "https://www.nr.no/~plison"
            },
            "category": ["pipeline", "standalone", "research", "training"],
            "tags": [],
            "spacy_version": 3
        },
        {
            "id": "numerizer",
            "title": "numerizer",
            "slogan": "Convert natural language numerics into ints and floats.",
            "description": "A spaCy extension for Docs, Spans and Tokens that converts numerical words and quantitative named entities into numeric strings.",
            "github": "jaidevd/numerizer",
            "pip": "numerizer",
            "code_example": [
                "from spacy import load",
                "import numerizer",
                "nlp = load('en_core_web_sm') # or any other model",
                "doc = nlp('The Hogwarts Express is at platform nine and three quarters')",
                "doc._.numerize()",
                "# {nine and three quarters: '9.75'}"
            ],
            "author": "Jaidev Deshpande",
            "author_links": {
                "github": "jaidevd",
                "twitter": "jaidevd"
            },
            "category": ["standalone"]
        },
-												added spacy-dbpedia-spotlight

											
										
										
											2021-02-12 21:06:51 +03:00
+								        {
 								            "id": "spacy-dbpedia-spotlight",
 								            "title": "DBpedia Spotlight for SpaCy",
 								            "slogan": "Use DBpedia Spotlight to link entities inside SpaCy",
 								            "description": "This library links SpaCy with [DBpedia Spotlight](https://www.dbpedia-spotlight.org/). You can easily get the DBpedia entities from your documents, using the public web service or by using your own instance of DBpedia Spotlight. The `doc.ents` are populated with the entities and all their details (URI, type, ...).",
 								            "github": "MartinoMensio/spacy-dbpedia-spotlight",
 								            "pip": "spacy-dbpedia-spotlight",
 								            "code_example": [
 								                "import spacy",
 								                "import spacy_dbpedia_spotlight",
 								                "# load your model as usual",
 								                "nlp = spacy.load('en_core_web_lg')",
 								                "# add the pipeline stage",
 								                "nlp.add_pipe('dbpedia_spotlight')",
 								                "# get the document",
 								                "doc = nlp('The president of USA is calling Boris Johnson to decide what to do about coronavirus')",
 								                "# see the entities",
 								                "print('Entities', [(ent.text, ent.label_, ent.kb_id_) for ent in doc.ents])",
 								                "# inspect the raw data from DBpedia spotlight",
 								                "print(doc.ents[0]._.dbpedia_raw_result)"
 								            ],
 								            "category": ["models", "pipeline"],
 								            "author": "Martino Mensio",
 								            "author_links": {
 								                "twitter": "MartinoMensio",
 								                "github": "MartinoMensio",
 								                "website": "https://martinomensio.github.io"
 								            }
 								        },
 								        {
 								            "id": "spacy-textblob",
 								            "title": "spacytextblob",
 								            "slogan": "A TextBlob sentiment analysis pipeline component for spaCy.",
 								            "thumb": "https://github.com/SamEdwardes/spacytextblob/raw/main/docs/static/img/logo-thumb-square-250x250.png",
 								            "description": "spacytextblob is a pipeline component that enables sentiment analysis using the [TextBlob](https://github.com/sloria/TextBlob) library. It will add the additional extension `._.blob` to `Doc`, `Span`, and `Token` objects.",
 								            "github": "SamEdwardes/spacytextblob",
 								            "pip": "spacytextblob",
 								            "code_example": [
 								                "# the following installations are required",
 								                "# python -m textblob.download_corpora",
 								                "# python -m spacy download en_core_web_sm",
 								                "",
 								                "import spacy",
 								                "from spacytextblob.spacytextblob import SpacyTextBlob",
 								                "",
 								                "nlp = spacy.load('en_core_web_sm')",
 								                "nlp.add_pipe('spacytextblob')",
 								                "text = 'I had a really horrible day. It was the worst day ever! But every now and then I have a really good day that makes me happy.'",
 								                "doc = nlp(text)",
 								                "doc._.blob.polarity                            # Polarity: -0.125",
 								                "doc._.blob.subjectivity                        # Subjectivity: 0.9",
 								                "doc._.blob.sentiment_assessments.assessments   # Assessments: [(['really', 'horrible'], -1.0, 1.0, None), (['worst', '!'], -1.0, 1.0, None), (['really', 'good'], 0.7, 0.6000000000000001, None), (['happy'], 0.8, 1.0, None)]",
 								                "doc._.blob.ngrams()                            # [WordList(['I', 'had', 'a']), WordList(['had', 'a', 'really']), WordList(['a', 'really', 'horrible']), WordList(['really', 'horrible', 'day']), WordList(['horrible', 'day', 'It']), WordList(['day', 'It', 'was']), WordList(['It', 'was', 'the']), WordList(['was', 'the', 'worst']), WordList(['the', 'worst', 'day']), WordList(['worst', 'day', 'ever']), WordList(['day', 'ever', 'But']), WordList(['ever', 'But', 'every']), WordList(['But', 'every', 'now']), WordList(['every', 'now', 'and']), WordList(['now', 'and', 'then']), WordList(['and', 'then', 'I']), WordList(['then', 'I', 'have']), WordList(['I', 'have', 'a']), WordList(['have', 'a', 'really']), WordList(['a', 'really', 'good']), WordList(['really', 'good', 'day']), WordList(['good', 'day', 'that']), WordList(['day', 'that', 'makes']), WordList(['that', 'makes', 'me']), WordList(['makes', 'me', 'happy'])]"
 								            ],
 								            "code_language": "python",
 								            "url": "https://spacytextblob.netlify.app/",
 								            "author": "Sam Edwardes",
 								            "author_links": {
 								                "twitter": "TheReaLSamlam",
 								                "github": "SamEdwardes",
 								                "website": "https://samedwardes.com"
 								            },
 								            "category": ["pipeline"],
 								            "tags": ["sentiment", "textblob"],
 								            "spacy_version": 3
 								        },
 								        {
 								            "id": "spacy-sentence-bert",
 								            "title": "spaCy - sentence-transformers",
 								            "slogan": "Pipelines for pretrained sentence-transformers (BERT, RoBERTa, XLM-RoBERTa & Co.) directly within spaCy",
 								            "description": "This library lets you use embeddings from [sentence-transformers](https://github.com/UKPLab/sentence-transformers) for Docs, Spans and Tokens directly within spaCy. Most models are for English, but three of them are multilingual.",
 								            "github": "MartinoMensio/spacy-sentence-bert",
 								            "pip": "spacy-sentence-bert",
 								            "code_example": [
 								                "import spacy_sentence_bert",
 								                "# load one of the models listed at https://github.com/MartinoMensio/spacy-sentence-bert/",
 								                "nlp = spacy_sentence_bert.load_model('en_roberta_large_nli_stsb_mean_tokens')",
 								                "# get two documents",
 								                "doc_1 = nlp('Hi there, how are you?')",
 								                "doc_2 = nlp('Hello there, how are you doing today?')",
 								                "# use the similarity method that is based on the vectors, on Doc, Span or Token",
 								                "print(doc_1.similarity(doc_2[0:7]))"
 								            ],
 								            "category": ["models", "pipeline"],
 								            "author": "Martino Mensio",
 								            "author_links": {
 								                "twitter": "MartinoMensio",
 								                "github": "MartinoMensio",
 								                "website": "https://martinomensio.github.io"
 								            }
 								        },
 								        {
 								            "id": "spacy-streamlit",
 								            "title": "spacy-streamlit",
 								            "slogan": "spaCy building blocks for Streamlit apps",
 								            "github": "explosion/spacy-streamlit",
 								            "description": "This package contains utilities for visualizing spaCy models and building interactive spaCy-powered apps with [Streamlit](https://streamlit.io). It includes various building blocks you can use in your own Streamlit app, like visualizers for **syntactic dependencies**, **named entities**, **text classification**, **semantic similarity** via word vectors, token attributes, and more.",
 								            "pip": "spacy-streamlit",
 								            "category": ["visualizers"],
 								            "thumb": "https://i.imgur.com/mhEjluE.jpg",
 								            "image": "https://user-images.githubusercontent.com/13643239/85388081-f2da8700-b545-11ea-9bd4-e303d3c5763c.png",
 								            "code_example": [
 								                "import spacy_streamlit",
 								                "",
 								                "models = [\"en_core_web_sm\", \"en_core_web_md\"]",
 								                "default_text = \"Sundar Pichai is the CEO of Google.\"",
 								                "spacy_streamlit.visualize(models, default_text)"
 								            ],
 								            "author": "Ines Montani",
 								            "author_links": {
 								                "twitter": "_inesmontani",
 								                "github": "ines",
 								                "website": "https://ines.io"
 								            }
 								        },
 								        {
 								            "id": "spaczz",
 								            "title": "spaczz",
 								            "slogan": "Fuzzy matching and more for spaCy.",
 								            "description": "Spaczz provides fuzzy matching and multi-token regex matching functionality for spaCy. Spaczz's components have similar APIs to their spaCy counterparts and spaczz pipeline components can integrate into spaCy pipelines where they can be saved/loaded as models.",
 								            "github": "gandersen101/spaczz",
 								            "pip": "spaczz",
 								            "code_example": [
 								                "import spacy",
 								                "from spaczz.matcher import FuzzyMatcher",
 								                "",
 								                "nlp = spacy.blank(\"en\")",
 								                "text = \"\"\"Grint Anderson created spaczz in his home at 555 Fake St,",
 								                "Apt 5 in Nashv1le, TN 55555-1234 in the US.\"\"\"  # Spelling errors intentional.",
 								                "doc = nlp(text)",
 								                "",
 								                "matcher = FuzzyMatcher(nlp.vocab)",
 								                "matcher.add(\"NAME\", [nlp(\"Grant Andersen\")])",
 								                "matcher.add(\"GPE\", [nlp(\"Nashville\")])",
 								                "matches = matcher(doc)",
 								                "",
 								                "for match_id, start, end, ratio in matches:",
 								                "    print(match_id, doc[start:end], ratio)"
 								            ],
 								            "code_language": "python",
 								            "url": "https://spaczz.readthedocs.io/en/latest/",
 								            "author": "Grant Andersen",
 								            "author_links": {
 								                "twitter": "gandersen101",
 								                "github": "gandersen101"
 								            },
 								            "category": ["pipeline"],
 								            "tags": ["fuzzy-matching", "regex"]
 								        },
 								        {
 								            "id": "spacy-universal-sentence-encoder",
 								            "title": "spaCy - Universal Sentence Encoder",
 								            "slogan": "Make use of Google's Universal Sentence Encoder directly within spaCy",
 								            "description": "This library lets you use Universal Sentence Encoder embeddings from TensorFlow Hub for Docs, Spans and Tokens directly within spaCy.",
 								            "github": "MartinoMensio/spacy-universal-sentence-encoder",
 								            "pip": "spacy-universal-sentence-encoder",
 								            "code_example": [
 								                "import spacy_universal_sentence_encoder",
 								                "# load one of the models: ['en_use_md', 'en_use_lg', 'xx_use_md', 'xx_use_lg']",
 								                "nlp = spacy_universal_sentence_encoder.load_model('en_use_lg')",
 								                "# get two documents",
 								                "doc_1 = nlp('Hi there, how are you?')",
 								                "doc_2 = nlp('Hello there, how are you doing today?')",
 								                "# use the similarity method that is based on the vectors, on Doc, Span or Token",
 								                "print(doc_1.similarity(doc_2[0:7]))"
 								            ],
 								            "category": ["models", "pipeline"],
 								            "author": "Martino Mensio",
 								            "author_links": {
 								                "twitter": "MartinoMensio",
 								                "github": "MartinoMensio",
 								                "website": "https://martinomensio.github.io"
 								            }
 								        },
 								        {
 								            "id": "whatlies",
 								            "title": "whatlies",
 								            "slogan": "Make interactive visualisations to figure out 'what lies' in word embeddings.",
 								            "description": "This small library offers tools that make it easier to visualise word embeddings and operations on them. It treats spaCy's prebuilt models as first-class citizens and also supports sense2vec. There's a convenient API for performing linear algebra, as well as support for popular transformations like PCA and UMAP.",
 								            "github": "koaning/whatlies",
 								            "pip": "whatlies",
 								            "thumb": "https://i.imgur.com/rOkOiLv.png",
 								            "image": "https://raw.githubusercontent.com/koaning/whatlies/master/docs/gif-two.gif",
 								            "code_example": [
 								                "from whatlies import EmbeddingSet",
 								                "from whatlies.language import SpacyLanguage",
 								                "",
 								                "lang = SpacyLanguage('en_core_web_md')",
 								                "words = ['cat', 'dog', 'fish', 'kitten', 'man', 'woman', 'king', 'queen', 'doctor', 'nurse']",
 								                "",
 								                "emb = lang[words]",
 								                "emb.plot_interactive(x_axis='man', y_axis='woman')"
 								            ],
 								            "category": ["visualizers", "research"],
 								            "author": "Vincent D. Warmerdam",
 								            "author_links": {
 								                "twitter": "fishnets88",
 								                "github": "koaning",
 								                "website": "https://koaning.io"
 								            }
 								        },
 								        {
 								            "id": "bertopic",
 								            "title": "BERTopic",
 								            "slogan": "Leveraging BERT and c-TF-IDF to create easily interpretable topics.",
 								            "description": "BERTopic is a topic modeling technique that leverages embedding models and c-TF-IDF to create dense clusters allowing for easily interpretable topics whilst keeping important words in the topic descriptions. BERTopic supports guided, (semi-) supervised, hierarchical, and dynamic topic modeling.",
 								            "github": "maartengr/bertopic",
 								            "pip": "bertopic",
 								            "thumb": "https://i.imgur.com/Rx2LfBm.png",
 								            "image": "https://raw.githubusercontent.com/MaartenGr/BERTopic/master/images/topic_visualization.gif",
 								            "code_example": [
 								                "import spacy",
 								                "from bertopic import BERTopic",
 								                "from sklearn.datasets import fetch_20newsgroups",
 								                "",
 								                "docs = fetch_20newsgroups(subset='all', remove=('headers', 'footers', 'quotes'))['data']",
 								                "nlp = spacy.load('en_core_web_md', exclude=['tagger', 'parser', 'ner', 'attribute_ruler', 'lemmatizer'])",
 								                "",
 								                "topic_model = BERTopic(embedding_model=nlp)",
 								                "topics, probs = topic_model.fit_transform(docs)",
 								                "",
 								                "fig = topic_model.visualize_topics()",
 								                "fig.show()"
 								            ],
 								            "category": ["visualizers", "training"],
 								            "author": "Maarten Grootendorst",
 								            "author_links": {
 								                "twitter": "maartengr",
 								                "github": "maartengr",
 								                "website": "https://maartengrootendorst.com"
 								            }
 								        },
 								        {
 								            "id": "tokenwiser",
 								            "title": "tokenwiser",
 								            "slogan": "Connect vowpal-wabbit & scikit-learn models to spaCy to run simple classification benchmarks. Comes with many utility functions for spaCy pipelines.",
 								            "github": "koaning/tokenwiser",
 								            "pip": "tokenwiser",
 								            "thumb": "https://koaning.github.io/tokenwiser/token.png",
 								            "image": "https://koaning.github.io/tokenwiser/logo-tokw.png",
 								            "code_example": [
 								                "import spacy",
 								                "",
 								                "from sklearn.pipeline import make_pipeline",
 								                "from sklearn.feature_extraction.text import CountVectorizer",
 								                "from sklearn.linear_model import LogisticRegression",
 								                "",
 								                "from tokenwiser.component import attach_sklearn_categoriser",
 								                "",
 								                "X = [",
 								                "    'i really like this post',",
 								                "    'thanks for that comment',",
 								                "    'i enjoy this friendly forum',",
 								                "    'this is a bad post',",
 								                "    'i dislike this article',",
 								                "    'this is not well written'",
 								                "]",
 								                "",
 								                "y = ['pos', 'pos', 'pos', 'neg', 'neg', 'neg']",
 								                "",
 								                "# Note that we're training a pipeline here via a single-batch `.fit()` method",
 								                "pipe = make_pipeline(CountVectorizer(), LogisticRegression()).fit(X, y)",
 								                "",
 								                "nlp = spacy.load('en_core_web_sm')",
 								                "# This is where we attach our pre-trained model as a pipeline step.",
 								                "attach_sklearn_categoriser(nlp, pipe_name='silly_sentiment', estimator=pipe)"
 								            ],
 								            "category": ["pipeline", "training"],
 								            "author": "Vincent D. Warmerdam",
 								            "author_links": {
 								                "twitter": "fishnets88",
 								                "github": "koaning",
 								                "website": "https://koaning.io"
 								            }
 								        },
 								        {
 								            "id": "Klayers",
 								            "title": "Klayers",
 								            "category": ["pipeline"],
 								            "tags": ["AWS"],
 								            "slogan": "spaCy as an AWS Lambda Layer",
 								            "description": "A collection of Python packages as AWS Lambda (λ) Layers",
 								            "github": "keithrozario/Klayers",
 								            "pip": "",
 								            "url": "https://github.com/keithrozario/Klayers",
 								            "code_language": "python",
 								            "author": "Keith Rozario",
 								            "author_links": {
-												Website migration from Gatsby to Next (#12058)

* Rename all MDX file to `.mdx`

* Lock current node version (#11885)

* Apply Prettier (#11996)

* Minor website fixes (#11974) [ci skip]

* fix table

* Migrate to Next WEB-17 (#12005)

* Initial commit

* Run `npx create-next-app@13 next-blog`

* Install MDX packages

Following: https://github.com/vercel/next.js/blob/77b5f79a4dff453abb62346bf75b14d859539b81/packages/next-mdx/readme.md

* Add MDX to Next

* Allow Next to handle `.md` and `.mdx` files.

* Add VSCode extension recommendation

* Disabled TypeScript strict mode for now

* Add prettier

* Apply Prettier to all files

* Make sure to use correct Node version

* Add basic implementation for `MDXRemote`

* Add experimental Rust MDX parser

* Add `/public`

* Add SASS support

* Remove default pages and styling

* Convert to module

This allows to use `import/export` syntax

* Add import for custom components

* Add ability to load plugins

* Extract function

This will make the next commit easier to read

* Allow to handle directories for page creation

* Refactoring

* Allow to parse subfolders for pages

* Extract logic

* Redirect `index.mdx` to parent directory

* Disabled ESLint during builds

* Disabled typescript during build

* Remove Gatsby from `README.md`

* Rephrase Docker part of `README.md`

* Update project structure in `README.md`

* Move and rename plugins

* Update plugin for wrapping sections

* Add dependencies for  plugin

* Use  plugin

* Rename wrapper type

* Simplify unnessary adding of id to sections

The slugified section ids are useless, because they can not be referenced anywhere anyway. The navigation only works if the section has the same id as the heading.

* Add plugin for custom attributes on Markdown elements

* Add plugin to readd support for tables

* Add plugin to fix problem with wrapped images

For more details see this issue: https://github.com/mdx-js/mdx/issues/1798

* Add necessary meta data to pages

* Install necessary dependencies

* Remove outdated MDX handling

* Remove reliance on `InlineList`

* Use existing Remark components

* Remove unallowed heading

Before `h1` components where not overwritten and would never have worked and they aren't used anywhere either.

* Add missing components to MDX

* Add correct styling

* Fix broken list

* Fix broken CSS classes

* Implement layout

* Fix links

* Fix broken images

* Fix pattern image

* Fix heading attributes

* Rename heading attribute

`new` was causing some weird issue, so renaming it to `version`

* Update comment syntax in MDX

* Merge imports

* Fix markdown rendering inside components

* Add model pages

* Simplify anchors

* Fix default value for theme

* Add Universe index page

* Add Universe categories

* Add Universe projects

* Fix Next problem with copy

Next complains when the server renders something different from the client, so we move the differing logic to `useEffect`

* Fix improper component nesting

Next doesn't allow block elements inside a `<p>`

* Replace landing page MDX with page component

* Remove inlined iframe content

* Remove ability to inline HTML content in iFrames

* Remove MDX imports

* Fix problem with image inside link in MDX

* Escape character for MDX

* Fix unescaped characters in MDX

* Fix headings with logo

* Allow exporting static HTML pages

* Add prebuild script

This command is automatically run by Next

* Replace `svg-loader` with `react-inlinesvg`

`svg-loader` is no longer maintained

* Fix ESLint `react-hooks/exhaustive-deps`

* Fix dropdowns

* Change code language from `cli` to `bash`

* Remove unnecessary language `none`

* Fix invalid code language

`markdown_` with an underscore was used to basically turn off syntax highlighting, but using unknown languages now throws an error.

* Enable code blocks plugin

* Re-add `InlineCode` component

MDX2 removed the `inlineCode` component

> The special component name `inlineCode` was removed, we recommend to use `pre` for the block version of code, and code for both the block and inline versions

Source: https://mdxjs.com/migrating/v2/#update-mdx-content

* Remove unused code

* Extract function to own file

* Fix code syntax highlighting

* Update syntax for code block meta data

* Remove unused prop

* Fix internal link recognition

There is a regex discrepancy between Node and the browser, and since Next runs the component in both, this creates an error:

``Prop `rel` did not match. Server: "null" Client: "noopener nofollow noreferrer"``

This simplifies the implementation and fixes the above error.

* Replace `react-helmet` with `next/head`

* Fix `className` problem for JSX component

* Fix broken bold markdown

* Convert file to `.mjs` to be used by Node process

* Add plugin to replace strings

* Fix custom table row styling

* Fix problem with `span` inside inline `code`

React doesn't allow a `span` inside an inline `code` element and throws an error in dev mode.

* Add `_document` to be able to customize `<html>` and `<body>`

* Add `lang="en"`

* Store Netlify settings in file

This way we don't need to update settings via the Netlify UI, which can be tricky when changing build settings.

* Add sitemap

* Add Smartypants

* Add PWA support

* Add `manifest.webmanifest`

* Fix bug with anchor links after reloading

There was no need for the previous implementation, since the browser handles this natively. Additionally, the manual scrolling into view was actually broken, because the heading would disappear behind the menu bar.

* Rename custom event

I was googling for ages to find out what kind of event `inview` is, only to figure out it was a custom event with a name that sounds pretty much like a native one. 🫠

* Fix missing comment syntax highlighting

* Refactor Quickstart component

The previous implementation was hiding the irrelevant lines via data-props and dynamically generated CSS. This created problems with Next and was also hard to follow. CSS was used to do what React is supposed to handle.

The new implementation simply filters the list of children (React elements) via their props.

* Fix syntax highlighting for Training Quickstart

* Unify code rendering

* Improve error logging in Juniper

* Fix Juniper component

* Automatically generate "Read Next" link

* Add Plausible

* Use recent DocSearch component and adjust styling

* Fix images

* Turn off image optimization

> Image Optimization using Next.js' default loader is not compatible with `next export`.

We currently deploy to Netlify via `next export`

* Don't build pages starting with `_`

* Remove unused files

* Add Next plugin to Netlify

* Fix button layout

MDX automatically adds `p` tags around text on a new line, and Prettier wants to put the text on a new line. Worked around with a JSX string.

* Add 404 page

* Apply Prettier

* Update Prettier for `package.json`

Next sometimes wants to patch `package-lock.json`. The old Prettier setting indented with 4 spaces, but Next always indents with 2 spaces. Since `npm install` automatically uses the indentation from `package.json` for `package-lock.json`, and to avoid the format switching back and forth, both files are now set to 2 spaces.

* Apply Next patch to `package-lock.json`

When starting the dev server Next would warn `warn  - Found lockfile missing swc dependencies, patching...` and update the `package-lock.json`. These are the patched changes.

* fix link

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* small backslash fixes

* adjust to new style

Co-authored-by: Marcus Blättermann <marcus@essenmitsosse.de>
											
										
										
											2023-01-11 19:30:07 +03:00
+								                "twitter": "keithrozario",
-												Fix some of the broken links on universe pages (#11011)

Currently some of the "AUTHOR INFO" links (e.g. here[0]) are broken:

```
https://github.com/https://github.com/explosion
```

[0] https://spacy.io/universe/project/spacy-experimental


One link also remains broken: `https://szegedai.github.io/`.
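The doubled URL above is what one would expect if the page presumably builds the profile link by prefixing `https://github.com/` to whatever the `github` field contains. Below is a minimal sketch of a defensive normalization under that assumption. `github_profile_url` is a hypothetical helper, not part of the spaCy website code.

```python
GITHUB_PREFIX = "https://github.com/"


def github_profile_url(value: str) -> str:
    """Build a GitHub profile URL from a universe `github` field.

    Hypothetical helper: accepts either a bare handle ("explosion")
    or a full URL, and never prepends the prefix twice.
    """
    value = value.strip()
    if value.startswith(("http://", "https://")):
        # Already a full URL: keep it unchanged instead of prefixing again.
        return value
    # Bare handle, possibly written with a leading "@" or "/": prefix it once.
    return GITHUB_PREFIX + value.lstrip("@/")


print(github_profile_url("explosion"))
print(github_profile_url("https://github.com/explosion"))
```

A value like `https://szegedai.github.io/` passes through unchanged. That avoids the doubled prefix, but the link still does not point to a GitHub profile, so the entry's data needs correcting as well.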
											
										
										
											2022-06-23 18:53:00 +03:00
+								                "github": "keithrozario",
-												Adding and updating content in the spacy universe (#10493)

* signing contributor agreement

* adding new content to the spaCy universe

* updating outdated example codes

* resolving issues for the PR

* resolve review for klayers

* remove contributor-agreement file from the PR

* Update code example of spaCySentiWS

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update spacy-sentiws code example

Co-authored-by: schaeran <schaeran1994@gmail.com>
Co-authored-by: schaeran <schaeran@explosion.ai>
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
											
										
										
											2022-04-15 16:36:54 +03:00
+								                "website": "https://www.keithrozario.com"
 								            },
 								            "code_example": [
 								                "# SAM Template",
 								                "MyLambdaFunction:",
 								                "    Type: AWS::Serverless::Function",
 								                "    Handler: 02_pipeline/spaCy.main",
 								                "    Description: Named Entity Extraction",
 								                "    Runtime: python3.8",
 								                "    Layers:",
 								                "        - arn:aws:lambda:${self:provider.region}:113088814899:layer:Klayers-python37-spacy:18"
 								            ]
 								        },
 								        {
 								            "type": "education",
 								            "id": "video-spacys-ner-model-alt",
 								            "title": "Named Entity Recognition (NER) using spaCy",
 								            "slogan": "",
 								            "description": "In this video, I show you how to do named entity recognition using the spaCy library for Python.",
 								            "youtube": "Gn_PjruUtrc",
 								            "author": "Applied Language Technology",
 								            "author_links": {
 								                "twitter": "HelsinkiNLP",
 								                "github": "Applied-Language-Technology",
 								                "website": "https://applied-language-technology.mooc.fi/"
 								            },
 								            "category": ["videos"]
 								        },
 								        {
 								            "id": "HuSpaCy",
 								            "title": "HuSpaCy",
 								            "category": ["models"],
 								            "tags": ["Hungarian"],
 								            "slogan": "HuSpaCy: industrial-strength Hungarian natural language processing",
 								            "description": "HuSpaCy is a spaCy model and a library providing industrial-strength Hungarian language processing facilities.",
 								            "github": "huspacy/huspacy",
 								            "pip": "huspacy",
 								            "url": "https://github.com/huspacy/huspacy",
 								            "code_language": "python",
 								            "author": "SzegedAI",
 								            "author_links": {
 								                "github": "https://szegedai.github.io/",
 								                "website": "https://u-szeged.hu/english"
 								            },
 								            "code_example": [
 								                "# Load the model using huspacy",
 								                "import huspacy",
 								                "",
 								                "nlp = huspacy.load()",
 								                "",
 								                "# Load the model using spacy.load()",
 								                "import spacy",
 								                "",
 								                "nlp = spacy.load(\"hu_core_news_lg\")",
 								                "",
 								                "# Load the model directly as a module",
 								                "import hu_core_news_lg",
 								                "",
 								                "nlp = hu_core_news_lg.load()",
 								                "",
 								                "# Either way you get the same model and can start processing texts.",
 								                "doc = nlp(\"Csiribiri csiribiri zabszalma - négy csillag közt alszom ma.\")"
 								            ]
 								        },
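Each `code_example` in these entries is a JSON array with one string per source line, and empty strings stand for blank lines. Below is a minimal sketch of turning such an array back into a snippet. It assumes the lines are simply joined with newlines; the site's actual rendering code is not shown here, and `render_code_example` is a hypothetical name.

```python
import json


def render_code_example(entry: dict) -> str:
    """Join a universe entry's `code_example` lines into one snippet."""
    lines = entry.get("code_example", [])
    if not all(isinstance(line, str) for line in lines):
        raise TypeError("code_example must be a list of strings")
    return "\n".join(lines)


entry = json.loads("""
{
    "id": "HuSpaCy",
    "code_example": [
        "import huspacy",
        "",
        "nlp = huspacy.load()"
    ]
}
""")
print(render_code_example(entry))
```

Under this joining scheme, a string that itself ends in `\n` adds an extra blank line. Keeping strictly one string per line keeps the arrays easy to read and diff.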
-												Update universe.json [ci skip]

											
										
										
											2020-03-17 21:53:31 +03:00
+								        {
 								            "id": "spacy-stanza",
 								            "title": "spacy-stanza",
 								            "slogan": "Use the latest Stanza (StanfordNLP) research models directly in spaCy",
 								            "description": "This package wraps the Stanza (formerly StanfordNLP) library, so you can use Stanford's models as a spaCy pipeline. Using this wrapper, you'll be able to use the following annotations, computed by your pretrained `stanza` model:\n\n- Statistical tokenization (reflected in the `Doc` and its tokens)\n - Lemmatization (`token.lemma` and `token.lemma_`)\n - Part-of-speech tagging (`token.tag`, `token.tag_`, `token.pos`, `token.pos_`)\n - Dependency parsing (`token.dep`, `token.dep_`, `token.head`)\n - Named entity recognition (`doc.ents`, `token.ent_type`, `token.ent_type_`, `token.ent_iob`, `token.ent_iob_`)\n - Sentence segmentation (`doc.sents`)",
 								            "github": "explosion/spacy-stanza",
-												Update universe.json [ci skip]

											
										
										
											2020-03-18 00:21:34 +03:00
+								            "pip": "spacy-stanza",
-												Update universe.json [ci skip]

											
										
										
											2020-03-17 21:53:31 +03:00
+								            "thumb": "https://i.imgur.com/myhLjMJ.png",
 								            "code_example": [
 								                "import stanza",
-												Update universe.json code_example

											
										
										
											2021-07-13 11:22:49 +03:00
+								                "import spacy_stanza",
-												Update universe.json [ci skip]

											
										
										
											2020-03-17 21:53:31 +03:00
+								                "",
-												Update universe.json code_example

											
										
										
											2021-07-13 11:22:49 +03:00
+								                "stanza.download(\"en\")",
 								                "nlp = spacy_stanza.load_pipeline(\"en\")",
-												Update universe.json [ci skip]

											
										
										
											2020-03-17 21:53:31 +03:00
+								                "",
 								                "doc = nlp(\"Barack Obama was born in Hawaii. He was elected president in 2008.\")",
 								                "for token in doc:",
 								                "    print(token.text, token.lemma_, token.pos_, token.dep_, token.ent_type_)",
 								                "print(doc.ents)"
 								            ],
 								            "category": ["pipeline", "standalone", "models", "research"],
 								            "author": "Explosion",
 								            "author_links": {
 								                "twitter": "explosion_ai",
 								                "github": "explosion",
 								                "website": "https://explosion.ai"
 								            }
 								        },
-												Add TakeLab/spacy-udpipe to Universe (#8698)

* Add TakeLab/spacy-udpipe to universe

* Add SCA

* Sign SCA
											
										
										
											2021-07-16 12:15:52 +03:00
+								        {
 								            "id": "spacy-udpipe",
 								            "title": "spacy-udpipe",
 								            "slogan": "Use the latest UDPipe models directly in spaCy",
 								            "description": "This package wraps the fast and efficient UDPipe language-agnostic NLP pipeline (via its Python bindings), so you can use UDPipe pre-trained models as a spaCy pipeline for 50+ languages out-of-the-box. Inspired by spacy-stanza, this package offers slightly less accurate models that are in turn much faster.",
 								            "github": "TakeLab/spacy-udpipe",
 								            "pip": "spacy-udpipe",
 								            "code_example": [
 								                "import spacy_udpipe",
 								                "",
 								                "spacy_udpipe.download(\"en\") # download English model",
 								                "",
 								                "text = \"Wikipedia is a free online encyclopedia, created and edited by volunteers around the world.\"",
 								                "nlp = spacy_udpipe.load(\"en\")",
 								                "",
 								                "doc = nlp(text)",
 								                "for token in doc:",
 								                "    print(token.text, token.lemma_, token.pos_, token.dep_)"
 								            ],
 								            "category": ["pipeline", "standalone", "models", "research"],
 								            "author": "TakeLab",
 								            "author_links": {
 								                "github": "TakeLab",
 								                "website": "https://takelab.fer.hr/"
 								            }
 								        },
-												Add "spaCy Server" to spaCy Universe (#4553)

* Add "spaCy Server" to spaCy Universe

* Accept the spaCy Contributor Agreement

											
										
										
											2019-10-30 15:20:46 +03:00
+								        {
 								            "id": "spacy-server",
 								            "title": "spaCy Server",
 								            "slogan": "\uD83E\uDD9C Containerized HTTP API for spaCy NLP",
 								            "description": "For developers who need programming language agnostic NLP, spaCy Server is a containerized HTTP API that provides industrial-strength natural language processing. Unlike other servers, our server is fast, idiomatic, and well documented.",
 								            "github": "neelkamath/spacy-server",
 								            "code_example": [
 								                "docker run --rm -dp 8080:8080 neelkamath/spacy-server",
 								                "curl http://localhost:8080/ner -H 'Content-Type: application/json' -d '{\"sections\": [\"My name is John Doe. I grew up in California.\"]}'"
 								            ],
 								            "code_language": "shell",
 								            "url": "https://hub.docker.com/r/neelkamath/spacy-server",
 								            "author": "Neel Kamath",
 								            "author_links": {
 								                "github": "neelkamath",
 								                "website": "https://neelkamath.com"
 								            },
 								            "category": ["apis"],
 								            "tags": ["docker"]
 								        },
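The `curl` call in this entry can also be issued from Python with only the standard library. Below is a minimal sketch. It assumes the container from the `docker run` line is listening on `localhost:8080` and accepts the same `{"sections": [...]}` payload shown in the `curl` example. The response format is not described here, so the sketch returns the parsed JSON as-is.

```python
import json
import urllib.request


def build_ner_request(sections, base_url="http://localhost:8080"):
    """Build a POST request for the /ner endpoint used in the curl example."""
    payload = json.dumps({"sections": list(sections)}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/ner",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def query_ner(sections, base_url="http://localhost:8080"):
    """Send the request and return the decoded JSON response."""
    request = build_ner_request(sections, base_url)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Requires the container to be running:
# print(query_ner(["My name is John Doe. I grew up in California."]))
```

Building the request separately from sending it makes the payload easy to inspect without a running server.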
-												Add multiple packages to universe.json (#3809) [ci skip]

* Add multiple packages to universe.json

Added the following packages: NLPArchitect, NLPRe, Chatterbot, alibi, NeuroNER

* Auto-format

* Update slogan (probably just copy-paste mistake)

* Adjust formatting

* Update tags / categories

											
										
										
											2019-06-02 13:35:52 +03:00
+								        {
 								            "id": "nlp-architect",
 								            "title": "NLP Architect",
 								            "slogan": "Python lib for exploring Deep NLP & NLU by Intel AI",
 								            "github": "NervanaSystems/nlp-architect",
 								            "pip": "nlp-architect",
-												Tidy up universe [ci skip]

											
										
										
											2019-06-02 13:38:48 +03:00
+								            "thumb": "https://i.imgur.com/vMideRx.png",
-												Add multiple packages to universe.json (#3809) [ci skip]

											
										
										
											2019-06-02 13:35:52 +03:00
+								            "category": ["standalone", "research"],
 								            "tags": ["pytorch"]
 								        },
 								        {
 								            "id": "Chatterbot",
 								            "title": "Chatterbot",
 								            "slogan": "A machine-learning based conversational dialog engine for creating chat bots",
 								            "github": "gunthercox/ChatterBot",
 								            "pip": "chatterbot",
-												Tidy up universe [ci skip]

											
										
										
											2019-06-02 13:38:48 +03:00
+								            "thumb": "https://i.imgur.com/eyAhwXk.jpg",
-												Add multiple packages to universe.json (#3809) [ci skip]

											
										
										
											2019-06-02 13:35:52 +03:00
+								            "code_example": [
 								                "from chatterbot import ChatBot",
 								                "from chatterbot.trainers import ListTrainer",
 								                "# Create a new chat bot named Charlie",
 								                "chatbot = ChatBot('Charlie')",
 								                "trainer = ListTrainer(chatbot)",
 								                "trainer.train([",
 								                "'Hi, can I help you?',",
-												fix broken example in spaCy universe Chatterbot

											
										
										
											2021-07-25 18:53:32 +03:00
+								                "'Sure, I would like to book a flight to Iceland.',",
-												Add multiple packages to universe.json (#3809) [ci skip]

											
										
										
											2019-06-02 13:35:52 +03:00
+								                "'Your flight has been booked.'",
 								                "])",
 								                "",
 								                "response = chatbot.get_response('I would like to book a flight.')"
 								            ],
-												Update universe [ci skip]

											
										
										
											2019-06-02 13:58:12 +03:00
+								            "author": "Gunther Cox",
 								            "author_links": {
 								                "github": "gunthercox"
 								            },
-												Add multiple packages to universe.json (#3809) [ci skip]

											
										
										
											2019-06-02 13:35:52 +03:00
+								            "category": ["conversational", "standalone"],
 								            "tags": ["chatbots"]
 								        },
 								        {
 								            "id": "alibi",
 								            "title": "alibi",
 								            "slogan": "Algorithms for monitoring and explaining machine learning models ",
 								            "github": "SeldonIO/alibi",
 								            "pip": "alibi",
-												Tidy up universe [ci skip]

											
										
										
											2019-06-02 13:38:48 +03:00
+								            "thumb": "https://i.imgur.com/YkzQHRp.png",
-												Add multiple packages to universe.json (#3809) [ci skip]

											
										
										
											2019-06-02 13:35:52 +03:00
+								            "code_example": [
-												Tidy up universe [ci skip]

											
										
										
											2019-06-02 13:38:48 +03:00
+								                "from alibi.explainers import AnchorTabular",
 								                "explainer = AnchorTabular(predict_fn, feature_names)",
 								                "explainer.fit(X_train)",
 								                "explainer.explain(x)"
-												Add multiple packages to universe.json (#3809) [ci skip]

											
										
										
											2019-06-02 13:35:52 +03:00
+								            ],
-												Update universe [ci skip]

											
										
										
											2019-06-02 13:58:12 +03:00
+								            "author": "Seldon",
-												Tidy up universe [ci skip]

											
										
										
											2019-06-02 13:38:48 +03:00
+								            "category": ["standalone", "research"]
-												Add Baderlab/saber to universe.json (#3806)


											
										
										
											2019-06-01 18:36:40 +03:00
+								        },
-												💫 Interactive code examples, spaCy Universe and various docs improvements (#2274)

* Integrate Python kernel via Binder

* Add live model test for languages with examples

* Update docs and code examples

* Adjust margin (if not bootstrapped)

* Add binder version to global config

* Update terminal and executable code mixins

* Pass attributes through infobox and section

* Hide v-cloak

* Fix example

* Take out model comparison for now

* Add meta text for compat

* Remove chart.js dependency

* Tidy up and simplify JS and port big components over to Vue

* Remove chartjs example

* Add Twitter icon

* Add purple stylesheet option

* Add utility for hand cursor (special cases only)

* Add transition classes

* Add small option for section

* Add thumb object for small round thumbnail images

* Allow unset code block language via "none" value

(workaround to still allow unset language to default to DEFAULT_SYNTAX)

* Pass through attributes

* Add syntax highlighting definitions for Julia, R and Docker

* Add website icon

* Remove user survey from navigation

* Don't hide GitHub icon on small screens

* Make top navigation scrollable on small screens

* Remove old resources page and references to it

* Add Universe

* Add helper functions for better page URL and title

* Update site description

* Increment versions

* Update preview images

* Update mentions of resources

* Fix image

* Fix social images

* Fix problem with cover sizing and floats

* Add divider and move badges into heading

* Add docstrings

* Reference converting section

* Add section on converting word vectors

* Move converting section to custom section and fix formatting

* Remove old fastText example

* Move extensions content to own section

Keep weird ID to not break permalinks for now (we don't want to rewrite URLs if not absolutely necessary)

* Use better component example and add factories section

* Add note on larger model

* Use better example for non-vector

* Remove similarity in context section

Only works via small models with tensors so has always been kind of confusing

* Add note on init-model command

* Fix lightning tour examples and make executable if possible

* Add spacy train CLI section to train

* Fix formatting and add video

* Fix formatting

* Fix textcat example description (resolves #2246)

* Add dummy file to try to resolve conflict

* Delete dummy file

* Tidy up [ci skip]

* Ensure sufficient height of loading container

* Add loading animation to universe

* Update Thebelab build and use better startup message

* Fix asset versioning

* Fix typo [ci skip]

* Add note on project idea label

											
										
										
											2018-04-29 03:06:46 +03:00
+								        {
 								            "id": "spacymoji",
 								            "slogan": "Emoji handling and meta data as a spaCy pipeline component",
 								            "github": "ines/spacymoji",
-												removed outdated spacy version for spacymoji

According to the spacymoji documentation (and its requirements.txt), it is not limited to spaCy version 2.
											
										
										
											2021-07-17 16:19:43 +03:00
+								            "description": "spaCy extension and pipeline component for adding emoji meta data to `Doc` objects. Detects emoji consisting of one or more unicode characters, and can optionally merge multi-char emoji (combined pictures, emoji with skin tone modifiers) into one token. Human-readable emoji descriptions are added as a custom attribute, and an optional lookup table can be provided for your own descriptions. The extension sets the custom `Doc`, `Token` and `Span` attributes `._.is_emoji`, `._.emoji_desc`, `._.has_emoji` and `._.emoji`.",
-												💫 Interactive code examples, spaCy Universe and various docs improvements (#2274)

											
										
										
											2018-04-29 03:06:46 +03:00
+								            "pip": "spacymoji",
 								            "category": ["pipeline"],
 								            "tags": ["emoji", "unicode"],
 								            "thumb": "https://i.imgur.com/XOTYIgn.jpg",
 								            "code_example": [
 								                "import spacy",
 								                "from spacymoji import Emoji",
 								                "",
-												Adding and updating content in the spacy universe (#10493)

											
										
										
											2022-04-15 16:36:54 +03:00
+								                "nlp = spacy.load(\"en_core_web_sm\")",
 								                "nlp.add_pipe(\"emoji\", first=True)",
 								                "doc = nlp(\"This is a test 😻 👍🏿\")",
 								                "",
 								                "assert doc._.has_emoji is True",
 								                "assert doc[2:5]._.has_emoji is True",
 								                "assert doc[0]._.is_emoji is False",
 								                "assert doc[4]._.is_emoji is True",
 								                "assert doc[5]._.emoji_desc == \"thumbs up dark skin tone\"",
-												💫 Interactive code examples, spaCy Universe and various docs improvements (#2274)

											
										
										
											2018-04-29 03:06:46 +03:00
+								                "assert len(doc._.emoji) == 2",
                "assert doc._.emoji[1] == (\"👍🏿\", 5, \"thumbs up dark skin tone\")"
            ],
 								            "author": "Ines Montani",
 								            "author_links": {
 								                "twitter": "_inesmontani",
 								                "github": "ines",
 								                "website": "https://ines.io"
 								            }
 								        },
        {
 								            "id": "spacyopentapioca",
 								            "title": "spaCyOpenTapioca",
 								            "slogan": "Named entity linking on Wikidata in spaCy via OpenTapioca",
            "description": "A spaCy wrapper of OpenTapioca for named entity linking on Wikidata.",
 								            "github": "UB-Mannheim/spacyopentapioca",
 								            "pip": "spacyopentapioca",
 								            "code_example": [
 								                "import spacy",
 								                "nlp = spacy.blank('en')",
 								                "nlp.add_pipe('opentapioca')",
 								                "doc = nlp('Christian Drosten works in Germany.')",
 								                "for span in doc.ents:",
 								                "    print((span.text, span.kb_id_, span.label_, span._.description, span._.score))",
 								                "# ('Christian Drosten', 'Q1079331', 'PERSON', 'German virologist and university teacher', 3.6533377082098895)",
 								                "# ('Germany', 'Q183', 'LOC', 'sovereign state in Central Europe', 2.1099332471902863)",
 								                "## Check also span._.types, span._.aliases, span._.rank"
 								            ],
 								            "category": ["models", "pipeline"],
 								            "tags": ["NER", "NEL"],
 								            "author": "Renat Shigapov",
 								            "author_links": {
 								                "twitter": "_shigapov",
 								                "github": "shigapov"
 								            }
 								        },
        {
 								            "id": "spacy_readability",
 								            "slogan": "Add text readability meta data to Doc objects",
            "description": "spaCy v2.0 pipeline component for calculating readability scores of text. Provides scores for Flesch-Kincaid grade level, Flesch-Kincaid reading ease, and Dale-Chall.",
 								            "github": "mholtzscher/spacy_readability",
 								            "pip": "spacy-readability",
 								            "code_example": [
 								                "import spacy",
 								                "from spacy_readability import Readability",
 								                "",
 								                "nlp = spacy.load('en')",
 								                "read = Readability(nlp)",
 								                "nlp.add_pipe(read, last=True)",
 								                "doc = nlp(\"I am some really difficult text to read because I use obnoxiously large words.\")",
 								                "doc._.flesch_kincaid_grade_level",
 								                "doc._.flesch_kincaid_reading_ease",
 								                "doc._.dale_chall"
 								            ],
 								            "author": "Michael Holtzscher",
 								            "author_links": {
 								                "github": "mholtzscher"
 								            },
 								            "category": ["pipeline"]
 								        },
 								        {
 								            "id": "spacy_cld",
 								            "title": "spaCy-CLD",
 								            "slogan": "Add language detection to your spaCy pipeline using CLD2",
            "description": "spaCy-CLD operates on `Doc` and `Span` spaCy objects. When called on a `Doc` or `Span`, the object is given two attributes: `languages` (a list of up to 3 language codes) and `language_scores` (a dictionary mapping language codes to confidence scores between 0 and 1).\n\nspacy-cld is a little extension that wraps the [PYCLD2](https://github.com/aboSamoor/pycld2) Python library, which in turn wraps the [Compact Language Detector 2](https://github.com/CLD2Owners/cld2) C library originally built at Google for the Chromium project. CLD2 uses character n-grams as features and a Naive Bayes classifier to identify 80+ languages from Unicode text strings (or XML/HTML). It can detect up to 3 different languages in a given document and reports a confidence score for each language.",
 								            "github": "nickdavidhaynes/spacy-cld",
 								            "pip": "spacy_cld",
 								            "code_example": [
 								                "import spacy",
 								                "from spacy_cld import LanguageDetector",
 								                "",
 								                "nlp = spacy.load('en')",
 								                "language_detector = LanguageDetector()",
 								                "nlp.add_pipe(language_detector)",
 								                "doc = nlp('This is some English text.')",
 								                "",
 								                "doc._.languages  # ['en']",
 								                "doc._.language_scores['en']  # 0.96"
 								            ],
 								            "author": "Nicholas D Haynes",
 								            "author_links": {
 								                "github": "nickdavidhaynes"
 								            },
 								            "category": ["pipeline"]
 								        },
 								        {
 								            "id": "spacy-iwnlp",
 								            "slogan": "German lemmatization with IWNLP",
            "description": "This package uses the [spaCy 2.0 extensions](https://spacy.io/usage/processing-pipelines#extensions) to add [IWNLP-py](https://github.com/Liebeck/iwnlp-py) as a German lemmatizer directly into your spaCy pipeline.",
 								            "github": "Liebeck/spacy-iwnlp",
 								            "pip": "spacy-iwnlp",
 								            "code_example": [
 								                "import spacy",
 								                "from spacy_iwnlp import spaCyIWNLP",
 								                "",
 								                "nlp = spacy.load('de')",
 								                "iwnlp = spaCyIWNLP(lemmatizer_path='data/IWNLP.Lemmatizer_20170501.json')",
 								                "nlp.add_pipe(iwnlp)",
 								                "doc = nlp('Wir mögen Fußballspiele mit ausgedehnten Verlängerungen.')",
 								                "for token in doc:",
 								                "    print('POS: {}\tIWNLP:{}'.format(token.pos_, token._.iwnlp_lemmas))"
 								            ],
 								            "author": "Matthias Liebeck",
 								            "author_links": {
 								                "github": "Liebeck"
 								            },
 								            "category": ["pipeline"],
 								            "tags": ["lemmatizer", "german"]
 								        },
 								        {
 								            "id": "spacy-sentiws",
 								            "slogan": "German sentiment scores with SentiWS",
            "description": "This package uses the [spaCy 2.0 extensions](https://spacy.io/usage/processing-pipelines#extensions) to add German sentiment scores from [SentiWS](http://wortschatz.uni-leipzig.de/en/download) directly into your spaCy pipeline.",
 								            "github": "Liebeck/spacy-sentiws",
 								            "pip": "spacy-sentiws",
 								            "code_example": [
 								                "import spacy",
 								                "from spacy_sentiws import spaCySentiWS",
 								                "",
                "nlp = spacy.load('de_core_news_sm')",
 								                "nlp.add_pipe('sentiws', config={'sentiws_path': 'data/sentiws'})",
                "doc = nlp('Die Dummheit der Unterwerfung blüht in hübschen Farben.')",
 								                "",
 								                "for token in doc:",
 								                "    print('{}, {}, {}'.format(token.text, token._.sentiws, token.pos_))"
 								            ],
 								            "author": "Matthias Liebeck",
 								            "author_links": {
 								                "github": "Liebeck"
 								            },
 								            "category": ["pipeline"],
 								            "tags": ["sentiment", "german"]
 								        },
 								        {
 								            "id": "spacy-lefff",
            "slogan": "POS and French lemmatization with Lefff",
            "description": "spaCy v2.0 extension and pipeline component for adding a French POS tagger and lemmatizer based on [Lefff](https://hal.inria.fr/inria-00521242/).",
            "github": "sammous/spacy-lefff",
 								            "pip": "spacy-lefff",
 								            "code_example": [
 								                "import spacy",
                "from spacy_lefff import LefffLemmatizer, POSTagger",
                "",
 								                "nlp = spacy.load('fr')",
                "pos = POSTagger()",
 								                "french_lemmatizer = LefffLemmatizer(after_melt=True)",
 								                "nlp.add_pipe(pos, name='pos', after='parser')",
 								                "nlp.add_pipe(french_lemmatizer, name='lefff', after='pos')",
                "doc = nlp(u\"Paris est une ville très chère.\")",
 								                "for d in doc:",
                "    print(d.text, d.pos_, d._.melt_tagger, d._.lefff_lemma, d.tag_, d.lemma_)"
            ],
 								            "author": "Sami Moustachir",
 								            "author_links": {
 								                "github": "sammous"
 								            },
 								            "category": ["pipeline"],
            "tags": ["pos", "lemmatizer", "french"]
        },
        {
            "id": "lemmy",
            "title": "Lemmy",
            "slogan": "A Danish lemmatizer",
            "description": "Lemmy is a lemmatizer for Danish 🇩🇰. It comes already trained on Dansk Sprognævns (DSN) word list (‘fuldformliste’) and the Danish Universal Dependencies and is ready for use. Lemmy also supports training on your own dataset. The model currently included in Lemmy was evaluated on the Danish Universal Dependencies dev dataset and scored an accuracy above 99%.\n\nYou can use Lemmy as a spaCy extension, more specifically a spaCy pipeline component. This is highly recommended and makes the lemmas easily accessible from the spaCy tokens. Lemmy makes use of POS tags to predict the lemmas. When wired up to the spaCy pipeline, Lemmy has the benefit of using spaCy’s built-in POS tagger.",
            "github": "sorenlind/lemmy",
            "pip": "lemmy",
            "code_example": [
                "import da_custom_model as da  # name of your spaCy model",
                "import lemmy.pipe",
                "nlp = da.load()",
                "",
                "# create an instance of Lemmy's pipeline component for spaCy",
                "pipe = lemmy.pipe.load()",
                "",
                "# add the component to the spaCy pipeline",
                "nlp.add_pipe(pipe, after='tagger')",
                "",
                "# lemmas can now be accessed using the `._.lemma` attribute on the tokens",
                "nlp(\"akvariernes\")[0]._.lemma"
            ],
            "thumb": "https://i.imgur.com/RJVFRWm.jpg",
            "author": "Søren Lind Kristiansen",
            "author_links": {
                "github": "sorenlind"
            },
            "category": ["pipeline"],
            "tags": ["lemmatizer", "danish"]
        },
        {
            "id": "augmenty",
            "title": "Augmenty",
            "slogan": "The cherry on top of your NLP pipeline",
            "description": "Augmenty is an augmentation library based on spaCy for augmenting texts. Augmenty differs from other augmentation libraries in that it corrects (as far as possible) the token, sentence and document labels during augmentation.",
            "github": "kennethenevoldsen/augmenty",
            "pip": "augmenty",
            "code_example": [
                "import spacy",
                "import augmenty",
                "",
                "nlp = spacy.load('en_core_web_md')",
                "",
                "docs = nlp.pipe(['Augmenty is a great tool for text augmentation'])",
                "",
                "ent_dict = {'ORG': [['spaCy'], ['spaCy', 'Universe']]}",
                "entity_augmenter = augmenty.load('ents_replace.v1',",
                "                                 ent_dict=ent_dict, level=1)",
                "",
                "for doc in augmenty.docs(docs, augmenter=entity_augmenter, nlp=nlp):",
                "    print(doc)"
            ],
            "thumb": "https://github.com/KennethEnevoldsen/augmenty/blob/master/img/icon.png?raw=true",
            "author": "Kenneth Enevoldsen",
            "author_links": {
                "github": "kennethenevoldsen",
                "website": "https://www.kennethenevoldsen.com"
            },
            "category": ["training", "research"],
            "tags": ["training", "research", "augmentation"]
        },
        {
            "id": "dacy",
            "title": "DaCy",
            "slogan": "An efficient pipeline for Danish NLP",
            "description": "DaCy is a Danish preprocessing pipeline trained in spaCy. It has achieved state-of-the-art performance on named entity recognition, part-of-speech tagging and dependency parsing for Danish. The repository contains material for using DaCy, reproducing the results, and guides on using the package. It also contains a series of behavioural tests for biases and robustness of Danish NLP pipelines.",
            "github": "centre-for-humanities-computing/DaCy",
            "pip": "dacy",
            "code_example": [
                "import dacy",
                "print(dacy.models())  # get a list of DaCy models",
                "nlp = dacy.load('medium')  # load your spaCy pipeline",
                "",
                "# DaCy also includes functionality for adding other Danish models to the pipeline",
                "# For instance, you can add the BertTone model for sentiment polarity classification to the pipeline:",
                "nlp = add_berttone_polarity(nlp)"
            ],
            "thumb": "https://github.com/centre-for-humanities-computing/DaCy/blob/main/img/icon_no_title.png?raw=true",
            "author": "Centre for Humanities Computing Aarhus",
            "author_links": {
                "github": "centre-for-humanities-computing",
                "website": "https://chcaa.io/#/"
            },
            "category": ["pipeline"],
            "tags": ["pipeline", "danish"]
        },
        {
            "id": "spacy-wrap",
            "title": "spaCy-wrap",
            "slogan": "For wrapping fine-tuned transformers in spaCy pipelines",
            "description": "spaCy-wrap is a wrapper library for including fine-tuned transformers from Hugging Face in your spaCy pipeline, allowing existing models to be used within existing workflows.",
            "github": "kennethenevoldsen/spacy-wrap",
            "pip": "spacy_wrap",
            "code_example": [
                "import spacy",
                "import spacy_wrap",
                "",
                "nlp = spacy.blank('en')",
                "config = {",
                "    'doc_extension_trf_data': 'clf_trf_data',  # document extension for the forward pass",
                "    'doc_extension_prediction': 'sentiment',  # document extension for the prediction",
                "    'labels': ['negative', 'neutral', 'positive'],",
                "    'model': {",
                "        'name': 'cardiffnlp/twitter-roberta-base-sentiment',  # the model name or path of a Hugging Face model",
                "    },",
                "}",
                "",
                "transformer = nlp.add_pipe('classification_transformer', config=config)",
                "transformer.model.initialize()",
                "",
                "doc = nlp('spaCy is a wonderful tool')",
                "",
                "print(doc._.clf_trf_data)",
                "# TransformerData(wordpieces=...",
                "print(doc._.sentiment)",
                "# 'positive'",
                "print(doc._.sentiment_prob)",
                "# {'prob': array([0.004, 0.028, 0.969], dtype=float32), 'labels': ['negative', 'neutral', 'positive']}"
            ],
            "thumb": "https://raw.githubusercontent.com/KennethEnevoldsen/spacy-wrap/main/docs/_static/icon.png",
            "author": "Kenneth Enevoldsen",
            "author_links": {
                "github": "KennethEnevoldsen",
                "website": "https://www.kennethenevoldsen.com"
            },
            "category": ["pipeline", "models", "training"],
            "tags": ["pipeline", "models", "transformers"]
        },
        {
            "id": "asent",
            "title": "Asent",
            "slogan": "Fast, flexible and transparent sentiment analysis",
            "description": "Asent is a rule-based sentiment analysis library for Python built using spaCy. It is inspired by VADER, but uses a more modular ruleset that allows the user to change, e.g., the method for finding negations. Furthermore, it includes visualizers for the model predictions, making the model easily interpretable.",
            "github": "kennethenevoldsen/asent",
            "pip": "asent",
            "code_example": [
                "import spacy",
                "import asent",
                "",
                "# load spaCy pipeline",
                "nlp = spacy.blank('en')",
                "nlp.add_pipe('sentencizer')",
                "",
                "# add the rule-based sentiment model",
                "nlp.add_pipe('asent_en_v1')",
                "",
                "# try an example",
                "text = 'I am not very happy, but I am also not especially sad'",
                "doc = nlp(text)",
                "",
                "# print polarity of document, scaled to be between -1 and 1",
                "print(doc._.polarity)",
                "# neg=0.0 neu=0.631 pos=0.369 compound=0.7526",
                "",
                "# A single score can be unsatisfying, so Asent implements a series of visualizers to interpret the results:",
                "asent.visualize(doc, style='prediction')",
                "# or",
                "asent.visualize(doc[:5], style='analysis')"
            ],
            "thumb": "https://github.com/KennethEnevoldsen/asent/raw/main/docs/img/logo_black_font.png?raw=true",
            "author": "Kenneth Enevoldsen",
            "author_links": {
                "github": "KennethEnevoldsen",
                "website": "https://www.kennethenevoldsen.com"
            },
            "category": ["pipeline", "models"],
            "tags": ["pipeline", "models", "sentiment"]
        },
        {
            "id": "textdescriptives",
            "title": "TextDescriptives",
            "slogan": "Extraction of descriptive stats, readability, and syntactic complexity measures",
            "description": "Pipeline component for spaCy v3 that calculates descriptive statistics, readability metrics, and syntactic complexity (dependency distance).",
            "github": "HLasse/TextDescriptives",
            "pip": "textdescriptives",
            "code_example": [
                "import spacy",
                "import textdescriptives as td",
                "nlp = spacy.load('en_core_web_sm')",
                "nlp.add_pipe('textdescriptives')",
                "doc = nlp('This is a short test text')",
                "doc._.readability  # access some of the values",
                "td.extract_df(doc)  # extract all metrics to DataFrame"
            ],
            "author": "Lasse Hansen, Kenneth Enevoldsen, Ludvig Olsen",
            "author_links": {
                "github": "HLasse"
            },
            "category": ["pipeline"],
            "tags": ["pipeline", "readability", "syntactic complexity", "descriptive statistics"]
        },
        {
            "id": "neuralcoref",
            "slogan": "State-of-the-art coreference resolution based on neural nets and spaCy",
            "description": "This coreference resolution module is based on the super-fast [spaCy](https://spacy.io/) parser and uses the neural net scoring model described in [Deep Reinforcement Learning for Mention-Ranking Coreference Models](http://cs.stanford.edu/people/kevclark/resources/clark-manning-emnlp2016-deep.pdf) by Kevin Clark and Christopher D. Manning, EMNLP 2016. Since ✨Neuralcoref v2.0, you can train the coreference resolution system on your own dataset (e.g., for a language other than English), **provided you have an annotated dataset**. Note that to use neuralcoref with spaCy > 2.1.0, you'll have to install neuralcoref from source.",
            "github": "huggingface/neuralcoref",
            "thumb": "https://i.imgur.com/j6FO9O6.jpg",
            "code_example": [
                "import spacy",
                "import neuralcoref",
                "",
                "nlp = spacy.load('en_core_web_sm')",
                "neuralcoref.add_to_pipe(nlp)",
                "doc1 = nlp('My sister has a dog. She loves him.')",
                "print(doc1._.coref_clusters)",
                "",
                "doc2 = nlp('Angela lives in Boston. She is quite happy in that city.')",
                "for ent in doc2.ents:",
                "    print(ent._.coref_cluster)"
            ],
            "author": "Hugging Face",
            "author_links": {
                "github": "huggingface"
            },
            "category": ["standalone", "conversational", "models"],
            "tags": ["coref"]
        },
        {
            "id": "neuralcoref-vizualizer",
            "title": "Neuralcoref Visualizer",
            "slogan": "State-of-the-art coreference resolution based on neural nets and spaCy",
            "description": "In short, coreference is the fact that two or more expressions in a text – like pronouns or nouns – link to the same person or thing. It is a classic natural language processing task that has seen a revival of interest in the past two years, as several research groups applied cutting-edge deep-learning and reinforcement-learning techniques to it. It is also one of the key building blocks of conversational artificial intelligence.",
            "url": "https://huggingface.co/coref/",
            "image": "https://i.imgur.com/3yy4Qyf.png",
            "thumb": "https://i.imgur.com/j6FO9O6.jpg",
            "github": "huggingface/neuralcoref",
            "category": ["visualizers", "conversational"],
            "tags": ["coref", "chatbots"],
            "author": "Hugging Face",
            "author_links": {
                "github": "huggingface"
            }
        },
        {
            "id": "matcher-explorer",
            "title": "Rule-based Matcher Explorer",
            "slogan": "Test spaCy's rule-based Matcher by creating token patterns interactively",
            "description": "Test spaCy's rule-based `Matcher` by creating token patterns interactively and running them over your text. Each token can set multiple attributes like text value, part-of-speech tag or boolean flags. The token-based view lets you explore how spaCy processes your text – and why your pattern matches, or why it doesn't. For more details on rule-based matching, see the [documentation](https://spacy.io/usage/rule-based-matching).",
            "image": "https://explosion.ai/assets/img/demos/matcher.png",
            "thumb": "https://i.imgur.com/rPK4AGt.jpg",
            "url": "https://explosion.ai/demos/matcher",
            "author": "Ines Montani",
            "author_links": {
                "twitter": "_inesmontani",
                "github": "ines",
                "website": "https://ines.io"
            },
            "category": ["visualizers"]
        },
        {
            "id": "displacy",
            "title": "displaCy",
            "slogan": "A modern syntactic dependency visualizer",
            "description": "Visualize spaCy's guess at the syntactic structure of a sentence. Arrows point from heads to their children, and are labelled by their relation type.",
            "url": "https://explosion.ai/demos/displacy",
            "thumb": "https://i.imgur.com/nxDcHaL.jpg",
            "image": "https://explosion.ai/assets/img/demos/displacy.png",
            "author": "Ines Montani",
            "author_links": {
                "twitter": "_inesmontani",
                "github": "ines",
                "website": "https://ines.io"
            },
            "category": ["visualizers"]
        },
        {
            "id": "displacy-ent",
            "title": "displaCy ENT",
            "slogan": "A modern named entity visualizer",
            "description": "Visualize spaCy's guess at the named entities in the document. You can filter the displayed types to show only the annotations you're interested in.",
            "url": "https://explosion.ai/demos/displacy-ent",
            "thumb": "https://i.imgur.com/A77Ecbs.jpg",
            "image": "https://explosion.ai/assets/img/demos/displacy-ent.png",
            "author": "Ines Montani",
            "author_links": {
                "twitter": "_inesmontani",
                "github": "ines",
                "website": "https://ines.io"
            },
            "category": ["visualizers"]
        },
        {
            "id": "explacy",
            "slogan": "A small tool that explains spaCy parse results",
            "github": "tylerneylon/explacy",
            "thumb": "https://i.imgur.com/V1hCWmn.jpg",
            "image": "https://raw.githubusercontent.com/tylerneylon/explacy/master/img/screenshot.png",
            "code_example": [
                "import spacy",
                "import explacy",
                "",
                "nlp = spacy.load('en_core_web_sm')",
                "explacy.print_parse_info(nlp, 'The salad was surprisingly tasty.')"
            ],
            "author": "Tyler Neylon",
            "author_links": {
                "github": "tylerneylon"
            },
            "category": ["visualizers"]
        },
        {
            "id": "deplacy",
            "slogan": "CUI-based Tree Visualizer for Universal Dependencies and Immediate Catena Analysis",
            "description": "Simple dependency visualizer for [spaCy](https://spacy.io/), [UniDic2UD](https://pypi.org/project/unidic2ud), [Stanza](https://stanfordnlp.github.io/stanza/), [NLP-Cube](https://github.com/Adobe/NLP-Cube), [Trankit](https://github.com/nlp-uoregon/trankit), etc.",
            "github": "KoichiYasuoka/deplacy",
            "image": "https://i.imgur.com/6uOI4Op.png",
            "code_example": [
                "import spacy",
                "import deplacy",
                "",
                "nlp = spacy.load('en_core_web_sm')",
                "doc = nlp('I saw a horse yesterday which had no name.')",
                "deplacy.render(doc)"
            ],
            "author": "Koichi Yasuoka",
            "author_links": {
                "github": "KoichiYasuoka"
            },
            "category": ["visualizers"]
        },
        {
            "id": "scattertext",
            "slogan": "Beautiful visualizations of how language differs among document types",
            "description": "A tool for finding distinguishing terms in small-to-medium-sized corpora, and presenting them in a sexy, interactive scatter plot with non-overlapping term labels. Exploratory data analysis just got more fun.",
            "github": "JasonKessler/scattertext",
            "image": "https://jasonkessler.github.io/2012conventions0.0.2.2.png",
            "code_example": [
                "import spacy",
                "",
                "from scattertext import SampleCorpora, produce_scattertext_html",
                "from scattertext.CorpusFromPandas import CorpusFromPandas",
                "",
                "nlp = spacy.load('en_core_web_sm')",
                "convention_df = SampleCorpora.ConventionData2012.get_data()",
                "corpus = CorpusFromPandas(convention_df,",
                "                          category_col='party',",
                "                          text_col='text',",
                "                          nlp=nlp).build()",
                "",
                "html = produce_scattertext_html(corpus,",
                "                                category='democrat',",
                "                                category_name='Democratic',",
                "                                not_category_name='Republican',",
                "                                minimum_term_frequency=5,",
                "                                width_in_pixels=1000)",
                "with open('./simple.html', 'w', encoding='utf-8') as f:",
                "    f.write(html)",
                "print('Open ./simple.html in Chrome or Firefox.')"
            ],
            "author": "Jason Kessler",
            "author_links": {
                "github": "JasonKessler",
                "twitter": "jasonkessler"
            },
            "category": ["visualizers"]
        },
        {
            "id": "rasa",
            "title": "Rasa",
            "slogan": "Turn natural language into structured data",
            "description": "Machine learning tools for developers to build, improve, and deploy contextual chatbots and assistants. Powered by open source.",
            "github": "RasaHQ/rasa",
            "pip": "rasa",
            "thumb": "https://i.imgur.com/TyZnpwL.png",
            "url": "https://rasa.com/",
            "author": "Rasa",
            "author_links": {
                "github": "RasaHQ"
            },
            "category": ["conversational"],
            "tags": ["chatbots"]
        },
        {
            "id": "mindmeld",
            "title": "MindMeld - Conversational AI platform",
            "slogan": "Conversational AI platform for deep-domain voice interfaces and chatbots",
            "description": "The MindMeld Conversational AI platform is among the most advanced AI platforms for building production-quality conversational applications. It is a Python-based machine learning framework which encompasses all of the algorithms and utilities required for this purpose. (https://github.com/cisco/mindmeld)",
            "github": "cisco/mindmeld",
            "pip": "mindmeld",
            "thumb": "https://www.mindmeld.com/img/mindmeld-logo.png",
            "category": ["conversational", "ner"],
            "tags": ["chatbots"],
            "author": "Cisco",
            "author_links": {
                "github": "cisco/mindmeld",
                "website": "https://www.mindmeld.com/"
            }
        },
        {
            "id": "torchtext",
+								            "title": "torchtext",
 								            "slogan": "Data loaders and abstractions for text and NLP",
 								            "github": "pytorch/text",
 								            "pip": "torchtext",
 								            "thumb": "https://i.imgur.com/WFkxuPo.png",
 								            "code_example": [
 								                ">>> from torchtext import data",
 								                ">>> pos = data.TabularDataset(",
 								                "...    path='data/pos/pos_wsj_train.tsv', format='tsv',",
 								                "...    fields=[('text', data.Field()),",
 								                "...            ('labels', data.Field())])",
 								                "...",
 								                ">>> sentiment = data.TabularDataset(",
 								                "...    path='data/sentiment/train.json', format='json',",
 								                "...    fields={'sentence_tokenized': ('text', data.Field(sequential=True)),",
 								                "...            'sentiment_gold': ('labels', data.Field(sequential=False))})"
 								            ],
 								            "category": ["standalone", "research"],
 								            "tags": ["pytorch"]
 								        },
 								        {
 								            "id": "allennlp",
 								            "title": "AllenNLP",
 								            "slogan": "An open-source NLP research library, built on PyTorch and spaCy",
 								            "description": "AllenNLP is a new library designed to accelerate NLP research, by providing a framework that supports modern deep learning workflows for cutting-edge language understanding problems. AllenNLP uses spaCy as a preprocessing component. You can also use AllenNLP to develop spaCy pipeline components, to add annotations to the `Doc` object.",
 								            "github": "allenai/allennlp",
 								            "pip": "allennlp",
 								            "thumb": "https://i.imgur.com/U8opuDN.jpg",
 								            "url": "http://allennlp.org",
 								            "author": "Allen Institute for Artificial Intelligence",
 								            "author_links": {
 								                "github": "allenai",
 								                "twitter": "allenai_org",
 								                "website": "http://allenai.org"
 								            },
 								            "category": ["standalone", "research"]
 								        },
-												Update universe [ci skip]

											
										
										
											2019-03-12 13:13:03 +03:00
+								        {
 								            "id": "scispacy",
 								            "title": "scispaCy",
 								            "slogan": "A full spaCy pipeline and models for scientific/biomedical documents",
 								            "github": "allenai/scispacy",
 								            "pip": "scispacy",
 								            "thumb": "https://i.imgur.com/dJQSclW.png",
 								            "url": "https://allenai.github.io/scispacy/",
 								            "author": "Allen Institute for Artificial Intelligence",
 								            "author_links": {
 								                "github": "allenai",
 								                "twitter": "allenai_org",
 								                "website": "http://allenai.org"
 								            },
-												Add category to spaCy project (#12506)

ScispaCy fits within biomedical domain. Consider adding this category.
											
										
										
											2023-04-07 16:31:04 +03:00
+								            "category": ["scientific", "models", "research", "biomedical"]
-												Update universe [ci skip]

											
										
										
											2019-03-12 13:13:03 +03:00
+								        },
-												💫 Interactive code examples, spaCy Universe and various docs improvements (#2274)

											2018-04-29 03:06:46 +03:00
+								        {
 								            "id": "textacy",
 								            "slogan": "NLP, before and after spaCy",
 								            "description": "`textacy` is a Python library for performing a variety of natural language processing (NLP) tasks, built on the high-performance `spacy` library. With the fundamentals – tokenization, part-of-speech tagging, dependency parsing, etc. – delegated to another library, `textacy` focuses on the tasks that come before and follow after.",
 								            "github": "chartbeat-labs/textacy",
 								            "pip": "textacy",
-												fixed typo and URL (#9560)


											
										
										
											2021-10-29 07:57:44 +03:00
+								            "url": "https://github.com/chartbeat-labs/textacy",
-												💫 Interactive code examples, spaCy Universe and various docs improvements (#2274)

											2018-04-29 03:06:46 +03:00
+								            "author": "Burton DeWilde",
 								            "author_links": {
 								                "github": "bdewilde",
 								                "twitter": "bjdewilde"
 								            },
 								            "category": ["standalone"]
 								        },
-												adds textpipe to universe (#3500) [ci skip]

* Adds textpipe to universe

* signed contributor agreement

* Adjust formatting, code style and use "standalone" category

											
										
										
											2019-03-28 17:13:19 +03:00
+								        {
 								            "id": "textpipe",
 								            "slogan": "clean and extract metadata from text",
 								            "description": "`textpipe` is a Python package for converting raw text into clean, readable text and extracting metadata from that text. Its functionalities include transforming raw text into readable text by removing HTML tags and extracting metadata such as the number of words and named entities from the text.",
 								            "github": "textpipe/textpipe",
 								            "pip": "textpipe",
 								            "author": "Textpipe Contributors",
 								            "author_links": {
 								                "github": "textpipe",
 								                "website": "https://github.com/textpipe/textpipe/blob/master/CONTRIBUTORS.md"
 								            },
 								            "category": ["standalone"],
 								            "tags": ["text-processing", "named-entity-recognition"],
 								            "thumb": "https://avatars0.githubusercontent.com/u/40492530",
 								            "code_example": [
 								                "from textpipe import doc, pipeline",
 								                "sample_text = 'Sample text! <!DOCTYPE>'",
 								                "document = doc.Doc(sample_text)",
 								                "print(document.clean)",
 								                "# 'Sample text!'",
 								                "print(document.language)",
 								                "# 'en'",
 								                "print(document.nwords)",
 								                "# 2",
 								                "",
 								                "pipe = pipeline.Pipeline(['CleanText', 'NWords'])",
 								                "print(pipe(sample_text))",
 								                "# {'CleanText': 'Sample text!', 'NWords': 2}"
 								            ]
 								        },
-												💫 Interactive code examples, spaCy Universe and various docs improvements (#2274)

											2018-04-29 03:06:46 +03:00
+								        {
 								            "id": "mordecai",
 								            "slogan": "Full text geoparsing using spaCy, Geonames and Keras",
 								            "description": "Extract the place names from a piece of text, resolve them to the correct place, and return their coordinates and structured geographic information.",
 								            "github": "openeventdata/mordecai",
 								            "pip": "mordecai",
 								            "thumb": "https://i.imgur.com/gPJ9upa.jpg",
 								            "code_example": [
 								                "from mordecai import Geoparser",
 								                "geo = Geoparser()",
 								                "geo.geoparse(\"I traveled from Oxford to Ottawa.\")"
 								            ],
 								            "author": "Andy Halterman",
 								            "author_links": {
 								                "github": "ahalterman",
 								                "twitter": "ahalterman"
 								            },
-												Update universe [ci skip]

											
										
										
											2019-06-02 13:58:12 +03:00
+								            "category": ["standalone", "scientific"]
-												💫 Interactive code examples, spaCy Universe and various docs improvements (#2274)

											2018-04-29 03:06:46 +03:00
+								        },
 								        {
 								            "id": "kindred",
 								            "title": "Kindred",
 								            "slogan": "Biomedical relation extraction using spaCy",
 								            "description": "Kindred is a package for relation extraction in biomedical texts. Given some training data, it can build a model to identify relations between entities (e.g. drugs and genes) in a sentence.",
 								            "github": "jakelever/kindred",
 								            "pip": "kindred",
 								            "code_example": [
 								                "import kindred",
 								                "",
 								                "trainCorpus = kindred.bionlpst.load('2016-BB3-event-train')",
 								                "devCorpus = kindred.bionlpst.load('2016-BB3-event-dev')",
 								                "predictionCorpus = devCorpus.clone()",
 								                "predictionCorpus.removeRelations()",
 								                "classifier = kindred.RelationClassifier()",
 								                "classifier.train(trainCorpus)",
 								                "classifier.predict(predictionCorpus)",
 								                "f1score = kindred.evaluate(devCorpus, predictionCorpus, metric='f1score')"
 								            ],
 								            "author": "Jake Lever",
 								            "author_links": {
 								                "github": "jakelever"
 								            },
-												Update universe [ci skip]

											
										
										
											2019-06-02 13:58:12 +03:00
+								            "category": ["standalone", "scientific"]
-												💫 Interactive code examples, spaCy Universe and various docs improvements (#2274)

											2018-04-29 03:06:46 +03:00
+								        },
 								        {
 								            "id": "sense2vec",
 								            "slogan": "Use NLP to go beyond vanilla word2vec",
 								            "description": "sense2vec ([Trask et al.](https://arxiv.org/abs/1511.06388), 2015) is a nice twist on [word2vec](https://en.wikipedia.org/wiki/Word2vec) that lets you learn more interesting, detailed and context-sensitive word vectors. For an interactive example of the technology, see our [sense2vec demo](https://explosion.ai/demos/sense2vec) that lets you explore semantic similarities across all Reddit comments of 2015.",
 								            "github": "explosion/sense2vec",
-												update sense2vec version (#4320)


											
										
										
											2019-09-25 13:17:54 +03:00
+								            "pip": "sense2vec==1.0.0a1",
-												💫 Interactive code examples, spaCy Universe and various docs improvements (#2274)

											2018-04-29 03:06:46 +03:00
+								            "thumb": "https://i.imgur.com/awfdhX6.jpg",
 								            "image": "https://explosion.ai/assets/img/demos/sense2vec.png",
 								            "url": "https://explosion.ai/demos/sense2vec",
 								            "code_example": [
 								                "import spacy",
 								                "",
-												Update universe example codes (#9422)

* Update universe plugins

* Adjust azure trigger

* Add init to tests/universe

* deliberatly trying to break the universe to see if the CI catches it

* revert

Co-authored-by: svlandeg <svlandeg@github.com>
											
										
										
											2021-10-13 17:29:19 +03:00
+								                "nlp = spacy.load(\"en_core_web_sm\")",
 								                "s2v = nlp.add_pipe(\"sense2vec\")",
 								                "s2v.from_disk(\"/path/to/s2v_reddit_2015_md\")",
-												💫 Interactive code examples, spaCy Universe and various docs improvements (#2274)

											2018-04-29 03:06:46 +03:00
+								                "",
-												Remove u-strings and fix formatting [ci skip]

											
										
										
											2019-09-12 17:11:15 +03:00
+								                "doc = nlp(\"A sentence about natural language processing.\")",
-												Update universe example codes (#9422)

											2021-10-13 17:29:19 +03:00
+								                "assert doc[3:6].text == \"natural language processing\"",
 								                "freq = doc[3:6]._.s2v_freq",
 								                "vector = doc[3:6]._.s2v_vec",
 								                "most_similar = doc[3:6]._.s2v_most_similar(3)",
 								                "# [(('machine learning', 'NOUN'), 0.8986967),",
 								                "#  (('computer vision', 'NOUN'), 0.8636297),",
 								                "#  (('deep learning', 'NOUN'), 0.8573361)]"
-												💫 Interactive code examples, spaCy Universe and various docs improvements (#2274)

											2018-04-29 03:06:46 +03:00
+								            ],
 								            "category": ["pipeline", "standalone", "visualizers"],
 								            "tags": ["vectors"],
            "author": "Explosion",
            "author_links": {
 								                "twitter": "explosion_ai",
 								                "github": "explosion",
 								                "website": "https://explosion.ai"
 								            }
 								        },
 								        {
 								            "id": "spacyr",
 								            "slogan": "An R wrapper for spaCy",
 								            "github": "quanteda/spacyr",
 								            "cran": "spacyr",
 								            "code_example": [
 								                "library(\"spacyr\")",
 								                "spacy_initialize()",
 								                "",
 								                "txt <- c(d1 = \"spaCy excels at large-scale information extraction tasks.\",",
 								                "         d2 = \"Mr. Smith goes to North Carolina.\")",
 								                "",
 								                "# process documents and obtain a data.table",
 								                "parsedtxt <- spacy_parse(txt)"
 								            ],
 								            "code_language": "r",
 								            "author": "Kenneth Benoit & Aki Matsuo",
 								            "category": ["nonpython"]
 								        },
 								        {
 								            "id": "cleannlp",
 								            "title": "CleanNLP",
 								            "slogan": "A tidy data model for NLP in R",
            "description": "The cleanNLP package is designed to make it as painless as possible to turn raw text into feature-rich data frames. The package offers four backends that can be used for parsing text: `tokenizers`, `udpipe`, `spacy` and `corenlp`.",
 								            "github": "statsmaths/cleanNLP",
 								            "cran": "cleanNLP",
 								            "author": "Taylor B. Arnold",
 								            "author_links": {
 								                "github": "statsmaths"
 								            },
 								            "category": ["nonpython"]
 								        },
 								        {
 								            "id": "spacy-cpp",
 								            "slogan": "C++ wrapper library for spaCy",
 								            "description": "The goal of spacy-cpp is to expose the functionality of spaCy to C++ applications, and to provide an API that is similar to that of spaCy, enabling rapid development in Python and simple porting to C++.",
 								            "github": "d99kris/spacy-cpp",
 								            "code_example": [
 								                "Spacy::Spacy spacy;",
 								                "auto nlp = spacy.load(\"en_core_web_sm\");",
 								                "auto doc = nlp.parse(\"This is a sentence.\");",
 								                "for (auto& token : doc.tokens())",
 								                "    std::cout << token.text() << \" [\" << token.pos_() << \"]\\n\";"
 								            ],
 								            "code_language": "cpp",
 								            "author": "Kristofer Berggren",
 								            "author_links": {
 								                "github": "d99kris"
 								            },
 								            "category": ["nonpython"]
 								        },
        {
 								            "id": "ruby-spacy",
 								            "title": "ruby-spacy",
 								            "slogan": "Wrapper module for using spaCy from Ruby via PyCall",
 								            "description": "ruby-spacy is a wrapper module for using spaCy from the Ruby programming language via PyCall. This module aims to make it easy and natural for Ruby programmers to use spaCy.",
            "github": "yohasebe/ruby-spacy",
            "code_example": [
 								                "require \"ruby-spacy\"",
 								                "require \"terminal-table\"",
 								                "nlp = Spacy::Language.new(\"en_core_web_sm\")",
 								                "doc = nlp.read(\"Apple is looking at buying U.K. startup for $1 billion\")",
 								                "headings = [\"text\", \"lemma\", \"pos\", \"tag\", \"dep\"]",
 								                "rows = []",
 								                "doc.each do |token|",
 								                "  rows << [token.text, token.lemma, token.pos, token.tag, token.dep]",
 								                "end",
 								                "table = Terminal::Table.new rows: rows, headings: headings",
 								                "puts table"
 								            ],
 								            "code_language": "ruby",
 								            "url": "https://rubygems.org/gems/ruby-spacy",
 								            "author": "Yoichiro Hasebe",
 								            "author_links": {
 								                "github": "yohasebe",
 								                "twitter": "yohasebe"
 								            },
 								            "category": ["nonpython"],
 								            "tags": ["ruby"]
        },
        {
 								            "id": "spacy_api",
 								            "slogan": "Server/client to load models in a separate, dedicated process",
 								            "github": "kootenpv/spacy_api",
 								            "pip": "spacy_api",
 								            "code_example": [
 								                "from spacy_api import Client",
 								                "",
 								                "spacy_client = Client() # default args host/port",
 								                "doc = spacy_client.single(\"How are you\")"
 								            ],
 								            "author": "Pascal van Kooten",
 								            "author_links": {
 								                "github": "kootenpv"
 								            },
 								            "category": ["apis"]
 								        },
 								        {
 								            "id": "spacy-api-docker",
 								            "slogan": "spaCy REST API, wrapped in a Docker container",
 								            "github": "jgontrum/spacy-api-docker",
 								            "url": "https://hub.docker.com/r/jgontrum/spacyapi/",
 								            "thumb": "https://i.imgur.com/NRnDKyj.jpg",
 								            "code_example": [
 								                "version: '2'",
 								                "",
 								                "services:",
 								                "  spacyapi:",
 								                "    image: jgontrum/spacyapi:en_v2",
 								                "    ports:",
 								                "      - \"127.0.0.1:8080:80\"",
 								                "    restart: always"
 								            ],
 								            "code_language": "docker",
 								            "author": "Johannes Gontrum",
 								            "author_links": {
 								                "github": "jgontrum"
 								            },
 								            "category": ["apis"]
 								        },
 								        {
 								            "id": "spacy-nlp",
            "slogan": "Expose spaCy NLP text parsing to Node.js (and other languages) via Socket.IO",
 								            "github": "kengz/spacy-nlp",
 								            "thumb": "https://i.imgur.com/w41VSr7.jpg",
 								            "code_example": [
 								                "const spacyNLP = require(\"spacy-nlp\")",
 								                "// default port 6466",
 								                "// start the server with the python client that exposes spacyIO (or use an existing socketIO server at IOPORT)",
 								                "var serverPromise = spacyNLP.server({ port: process.env.IOPORT });",
 								                "// Loading spacy may take up to 15s"
 								            ],
 								            "code_language": "javascript",
 								            "author": "Wah Loon Keng",
 								            "author_links": {
 								                "github": "kengz"
 								            },
 								            "category": ["apis", "nonpython"]
 								        },
 								        {
 								            "id": "prodigy",
 								            "title": "Prodigy",
 								            "slogan": "Radically efficient machine teaching, powered by active learning",
 								            "description": "Prodigy is an annotation tool so efficient that data scientists can do the annotation themselves, enabling a new level of rapid iteration. Whether you're working on entity recognition, intent detection or image classification, Prodigy can help you train and evaluate your models faster. Stream in your own examples or real-world data from live APIs, update your model in real-time and chain models together to build more complex systems.",
 								            "thumb": "https://i.imgur.com/UVRtP6g.jpg",
 								            "image": "https://i.imgur.com/Dt5vrY6.png",
 								            "url": "https://prodi.gy",
 								            "code_example": [
 								                "prodigy dataset ner_product \"Improve PRODUCT on Reddit data\"",
 								                "✨ Created dataset 'ner_product'.",
 								                "",
 								                "prodigy ner.teach ner_product en_core_web_sm ~/data.jsonl --label PRODUCT",
 								                "✨ Starting the web server on port 8080..."
 								            ],
 								            "code_language": "bash",
            "category": ["standalone", "training"],
            "author": "Explosion",
            "author_links": {
 								                "twitter": "explosion_ai",
 								                "github": "explosion",
 								                "website": "https://explosion.ai"
 								            }
 								        },
 								        {
 								            "id": "dragonfire",
 								            "title": "Dragonfire",
 								            "slogan": "An open-source virtual assistant for Ubuntu based Linux distributions",
 								            "github": "DragonComputer/Dragonfire",
 								            "thumb": "https://i.imgur.com/5fqguKS.jpg",
 								            "image": "https://raw.githubusercontent.com/DragonComputer/Dragonfire/master/docs/img/demo.gif",
 								            "author": "Dragon Computer",
 								            "author_links": {
 								                "github": "DragonComputer",
 								                "website": "http://dragon.computer"
 								            },
 								            "category": ["standalone"]
 								        },
        {
 								            "id": "prefect",
 								            "title": "Prefect",
 								            "slogan": "Workflow management system designed for modern infrastructure",
 								            "github": "PrefectHQ/prefect",
 								            "pip": "prefect",
 								            "thumb": "https://i.imgur.com/oLTwr0e.png",
 								            "code_example": [
 								                "from prefect import Flow",
 								                "from prefect.tasks.spacy.spacy_tasks import SpacyNLP",
 								                "import spacy",
 								                "",
 								                "nlp = spacy.load(\"en_core_web_sm\")",
 								                "",
 								                "with Flow(\"Natural Language Processing\") as flow:",
 								                "    doc = SpacyNLP(text=\"This is some text\", nlp=nlp)",
 								                "",
 								                "flow.run()"
 								            ],
 								            "author": "Prefect",
 								            "author_links": {
 								                "website": "https://prefect.io"
 								            },
 								            "category": ["standalone"]
 								        },
 								        {
 								            "id": "graphbrain",
 								            "title": "Graphbrain",
 								            "slogan": "Automated meaning extraction and text understanding",
 								            "description": "Graphbrain is an Artificial Intelligence open-source software library and scientific research tool. Its aim is to facilitate automated meaning extraction and text understanding, as well as the exploration and inference of knowledge.",
 								            "github": "graphbrain/graphbrain",
 								            "pip": "graphbrain",
 								            "thumb": "https://i.imgur.com/cct9W1E.png",
 								            "author": "Graphbrain",
 								            "category": ["standalone"]
 								        },
        {
 								            "type": "education",
 								            "id": "nostarch-nlp-python",
 								            "title": "Natural Language Processing Using Python",
 								            "slogan": "No Starch Press, 2020",
 								            "description": "Natural Language Processing Using Python is an introduction to natural language processing (NLP), the task of converting human language into data that a computer can process. The book uses spaCy, a leading Python library for NLP, to guide readers through common NLP tasks related to generating and understanding human language with code. It addresses problems like understanding a user's intent, continuing a conversation with a human, and maintaining the state of a conversation.",
            "cover": "https://i.imgur.com/w0iycjl.jpg",
            "url": "https://nostarch.com/NLPPython",
 								            "author": "Yuli Vasiliev",
 								            "category": ["books"]
        },
        {
 								            "type": "education",
 								            "id": "oreilly-python-ds",
 								            "title": "Introduction to Machine Learning with Python: A Guide for Data Scientists",
 								            "slogan": "O'Reilly, 2016",
 								            "description": "Machine learning has become an integral part of many commercial applications and research projects, but this field is not exclusive to large companies with extensive research teams. If you use Python, even as a beginner, this book will teach you practical ways to build your own machine learning solutions. With all the data available today, machine learning applications are limited only by your imagination.",
 								            "cover": "https://covers.oreillystatic.com/images/0636920030515/lrg.jpg",
 								            "url": "http://shop.oreilly.com/product/0636920030515.do",
 								            "author": "Andreas Müller, Sarah Guido",
 								            "category": ["books"]
 								        },
 								        {
 								            "type": "education",
 								            "id": "text-analytics-python",
 								            "title": "Text Analytics with Python",
 								            "slogan": "Apress / Springer, 2016",
 								            "description": "*Text Analytics with Python* teaches you the techniques related to natural language processing and text analytics, and you will gain the skills to know which technique is best suited to solve a particular problem. You will look at each technique and algorithm with both a bird's eye view to understand how it can be used as well as with a microscopic view to understand the mathematical concepts and to implement them to solve your own problems.",
 								            "github": "dipanjanS/text-analytics-with-python",
 								            "cover": "https://i.imgur.com/AOmzZu8.png",
 								            "url": "https://www.amazon.com/Text-Analytics-Python-Real-World-Actionable/dp/148422387X",
 								            "author": "Dipanjan Sarkar",
 								            "category": ["books"]
 								        },
 								        {
 								            "type": "education",
 								            "id": "practical-ml-python",
 								            "title": "Practical Machine Learning with Python",
 								            "slogan": "Apress, 2017",
 								            "description": "Master the essential skills needed to recognize and solve complex problems with machine learning and deep learning. Using real-world examples that leverage the popular Python machine learning ecosystem, this book is your perfect companion for learning the art and science of machine learning to become a successful practitioner. The concepts, techniques, tools, frameworks, and methodologies used in this book will teach you how to think, design, build, and execute machine learning systems and projects successfully.",
 								            "github": "dipanjanS/practical-machine-learning-with-python",
 								            "cover": "https://i.imgur.com/5F4mkt7.jpg",
 								            "url": "https://www.amazon.com/Practical-Machine-Learning-Python-Problem-Solvers/dp/1484232062",
 								            "author": "Dipanjan Sarkar, Raghav Bali, Tushar Sharma",
 								            "category": ["books"]
 								        },
        {
 								            "type": "education",
 								            "id": "packt-nlp-computational-linguistics",
 								            "title": "Natural Language Processing and Computational Linguistics",
 								            "slogan": "Packt, 2018",
 								            "description": "This book shows you how to use natural language processing and computational linguistics algorithms to make inferences and gain insights about your data. These algorithms are based on statistical machine learning and artificial intelligence techniques. The tools to work with these algorithms are available to you right now, with Python and tools like Gensim and spaCy.",
 								            "cover": "https://i.imgur.com/aleMf1Y.jpg",
 								            "url": "https://www.amazon.com/Natural-Language-Processing-Computational-Linguistics-ebook/dp/B07BWH779J",
 								            "author": "Bhargav Srinivasa-Desikan",
 								            "category": ["books"]
 								        },
 								        {
 								            "type": "education",
 								            "id": "mastering-spacy",
 								            "title": "Mastering spaCy",
 								            "slogan": "Packt, 2021",
 								            "description": "This is your ultimate spaCy book. Master the crucial skills to use spaCy components effectively to create real-world NLP applications with spaCy. Explaining linguistic concepts such as dependency parsing, POS-tagging and named entity extraction with many examples, this book will help you to conquer computational linguistics with spaCy. The book further focuses on ML topics with Keras and TensorFlow. You'll cover popular topics, including intent recognition, sentiment analysis and context resolution, and use them on popular datasets and interpret the results. A special hands-on section on chatbot design is included.",
 								            "github": "PacktPublishing/Mastering-spaCy",
 								            "cover": "https://tinyimg.io/i/aWEm0dh.jpeg",
 								            "url": "https://www.amazon.com/Mastering-spaCy-end-end-implementing/dp/1800563353",
 								            "author": "Duygu Altinok",
 								            "author_links": {
 								                "github": "DuyguA",
 								                "website": "https://www.linkedin.com/in/duygu-altinok-4021389a"
 								            },
 								            "category": ["books"]
 								        },
 								        {
 								            "type": "education",
 								            "id": "applied-nlp-in-enterprise",
 								            "title": "Applied Natural Language Processing in the Enterprise: Teaching Machines to Read, Write, and Understand",
 								            "slogan": "O'Reilly, 2021",
 								            "description": "Natural language processing (NLP) is one of the hottest topics in AI today. Having lagged behind other deep learning fields such as computer vision for years, NLP only recently gained mainstream popularity. Even though Google, Facebook, and OpenAI have open sourced large pretrained language models to make NLP easier, many organizations today still struggle with developing and productionizing NLP applications. This hands-on guide helps you learn the field quickly.",
 								            "github": "nlpbook/nlpbook",
 								            "cover": "https://i.imgur.com/6RxLBvf.jpg",
 								            "url": "https://www.amazon.com/dp/149206257X",
 								            "author": "Ankur A. Patel",
 								            "author_links": {
 								                "github": "aapatel09",
 								                "website": "https://www.ankurapatel.io"
 								            },
 								            "category": ["books"]
 								        },
 								        {
 								            "type": "education",
 								            "id": "introduction-into-spacy-3",
 								            "title": "Introduction to spaCy 3",
 								            "slogan": "A free course for beginners by Dr. W.J.B. Mattingly",
 								            "url": "http://spacy.pythonhumanities.com/",
 								            "thumb": "https://spacy.pythonhumanities.com/_static/freecodecamp_small.jpg",
 								            "author": "Dr. W.J.B. Mattingly",
 								            "category": ["courses"]
 								        },
 								        {
 								            "type": "education",
 								            "id": "spacy-course",
 								            "title": "Advanced NLP with spaCy",
 								            "slogan": "A free online course",
 								            "description": "In this free interactive course, you'll learn how to use spaCy to build advanced natural language understanding systems, using both rule-based and machine learning approaches.",
 								            "url": "https://course.spacy.io",
 								            "image": "https://i.imgur.com/JC00pHW.jpg",
 								            "thumb": "https://i.imgur.com/5RXLtrr.jpg",
 								            "author": "Ines Montani",
 								            "author_links": {
 								                "twitter": "_inesmontani",
 								                "github": "ines",
 								                "website": "https://ines.io"
 								            },
 								            "category": ["courses"]
 								        },
 								        {
 								            "type": "education",
 								            "id": "applt-course",
 								            "title": "Applied Language Technology",
 								            "slogan": "NLP for newcomers using spaCy and Stanza",
 								            "description": "These learning materials provide an introduction to applied language technology for audiences who are unfamiliar with language technology and programming. The learning materials assume no previous knowledge of the Python programming language.",
 								            "url": "https://applied-language-technology.mooc.fi",
 								            "image": "https://www.mv.helsinki.fi/home/thiippal/images/applt-preview.jpg",
 								            "thumb": "https://www.mv.helsinki.fi/home/thiippal/images/applt-logo.png",
 								            "author": "Tuomo Hiippala",
 								            "author_links": {
 								                "twitter": "tuomo_h",
 								                "github": "thiippal",
 								                "website": "https://www.mv.helsinki.fi/home/thiippal/"
 								            },
 								            "category": ["courses"]
 								        },
 								        {
 								            "type": "education",
 								            "id": "video-spacys-ner-model",
 								            "title": "spaCy's NER model",
 								            "slogan": "Incremental parsing with bloom embeddings and residual CNNs",
 								            "description": "spaCy v2.0's Named Entity Recognition system features a sophisticated word embedding strategy using subword features and \"Bloom\" embeddings, a deep convolutional neural network with residual connections, and a novel transition-based approach to named entity parsing. The system is designed to give a good balance of efficiency, accuracy and adaptability. In this talk, I sketch out the components of the system, explaining the intuition behind the various choices. I also give a brief introduction to the named entity recognition problem, with an overview of what else Explosion AI is working on, and why.",
 								            "youtube": "sqDHBH9IjRU",
 								            "author": "Matthew Honnibal",
 								            "author_links": {
 								                "twitter": "honnibal",
 								                "github": "honnibal",
 								                "website": "https://explosion.ai"
 								            },
 								            "category": ["videos"]
 								        },
 								        {
 								            "type": "education",
 								            "id": "video-new-nlp-solutions",
 								            "title": "Building new NLP solutions with spaCy and Prodigy",
 								            "slogan": "PyData Berlin 2018",
 								            "description": "In this talk, I will discuss how to address some of the most likely causes of failure for new Natural Language Processing (NLP) projects. My main recommendation is to take an iterative approach: don't assume you know what your pipeline should look like, let alone your annotation schemes or model architectures.",
 								            "author": "Matthew Honnibal",
 								            "author_links": {
 								                "twitter": "honnibal",
 								                "github": "honnibal",
 								                "website": "https://explosion.ai"
 								            },
 								            "youtube": "jpWqz85F_4Y",
 								            "category": ["videos"]
 								        },
 								        {
 								            "type": "education",
 								            "id": "video-modern-nlp-in-python",
 								            "title": "Modern NLP in Python",
 								            "slogan": "PyData DC 2016",
 								            "description": "Academic and industry research in Natural Language Processing (NLP) has progressed at an accelerating pace over the last several years. Members of the Python community have been hard at work moving cutting-edge research out of papers and into open source, \"batteries included\" software libraries that can be applied to practical problems. We'll explore some of these tools for modern NLP in Python.",
 								            "author": "Patrick Harrison",
 								            "youtube": "6zm9NC9uRkk",
 								            "category": ["videos"]
 								        },
 								        {
 								            "type": "education",
 								            "id": "video-spacy-course",
 								            "title": "Advanced NLP with spaCy · A free online course",
 								            "description": "spaCy is a modern Python library for industrial-strength Natural Language Processing. In this free and interactive online course, you'll learn how to use spaCy to build advanced natural language understanding systems, using both rule-based and machine learning approaches.",
 								            "url": "https://course.spacy.io/en",
 								            "author": "Ines Montani",
 								            "author_links": {
 								                "twitter": "_inesmontani",
 								                "github": "ines"
 								            },
 								            "youtube": "THduWAnG97k",
 								            "category": ["videos"]
 								        },
 								        {
 								            "type": "education",
 								            "id": "video-spacy-course-de",
 								            "title": "Modernes NLP mit spaCy · Ein Gratis-Onlinekurs",
 								            "description": "spaCy ist eine moderne Python-Bibliothek für industriestarkes Natural Language Processing. In diesem kostenlosen und interaktiven Onlinekurs lernst du, mithilfe von spaCy fortgeschrittene Systeme für die Analyse natürlicher Sprache zu entwickeln und dabei sowohl regelbasierte Verfahren, als auch moderne Machine-Learning-Technologie einzusetzen.",
 								            "url": "https://course.spacy.io/de",
 								            "author": "Ines Montani",
 								            "author_links": {
 								                "twitter": "_inesmontani",
 								                "github": "ines"
 								            },
 								            "youtube": "K1elwpgDdls",
 								            "category": ["videos"]
 								        },
 								        {
 								            "type": "education",
 								            "id": "video-spacy-course-es",
 								            "title": "NLP avanzado con spaCy · Un curso en línea gratis",
 								            "description": "spaCy es un paquete moderno de Python para hacer Procesamiento de Lenguaje Natural de potencia industrial. En este curso en línea, interactivo y gratuito, aprenderás a usar spaCy para construir sistemas avanzados de comprensión de lenguaje natural usando enfoques basados en reglas y en machine learning.",
 								            "url": "https://course.spacy.io/es",
 								            "author": "Camila Gutiérrez",
 								            "author_links": {
 								                "twitter": "Mariacamilagl30"
 								            },
 								            "youtube": "RNiLVCE5d4k",
 								            "category": ["videos"]
 								        },
 								        {
 								            "type": "education",
 								            "id": "video-intro-to-nlp-episode-1",
 								            "title": "Intro to NLP with spaCy (1)",
 								            "slogan": "Episode 1: Data exploration",
 								            "description": "In this new video series, data science instructor Vincent Warmerdam gets started with spaCy, an open-source library for Natural Language Processing in Python. His mission: building a system to automatically detect programming languages in large volumes of text. Follow his process from the first idea to a prototype all the way to data collection and training a statistical named entity recognition model from scratch.",
 								            "author": "Vincent Warmerdam",
 								            "author_links": {
 								                "twitter": "fishnets88",
 								                "github": "koaning"
 								            },
 								            "youtube": "WnGPv6HnBok",
 								            "category": ["videos"]
 								        },
 								        {
 								            "type": "education",
 								            "id": "video-intro-to-nlp-episode-2",
 								            "title": "Intro to NLP with spaCy (2)",
 								            "slogan": "Episode 2: Rule-based Matching",
 								            "description": "In this new video series, data science instructor Vincent Warmerdam gets started with spaCy, an open-source library for Natural Language Processing in Python. His mission: building a system to automatically detect programming languages in large volumes of text. Follow his process from the first idea to a prototype all the way to data collection and training a statistical named entity recognition model from scratch.",
 								            "author": "Vincent Warmerdam",
 								            "author_links": {
 								                "twitter": "fishnets88",
 								                "github": "koaning"
 								            },
 								            "youtube": "KL4-Mpgbahw",
 								            "category": ["videos"]
 								        },
 								        {
 								            "type": "education",
 								            "id": "video-intro-to-nlp-episode-3",
 								            "title": "Intro to NLP with spaCy (3)",
 								            "slogan": "Episode 3: Evaluation",
 								            "description": "In this new video series, data science instructor Vincent Warmerdam gets started with spaCy, an open-source library for Natural Language Processing in Python. His mission: building a system to automatically detect programming languages in large volumes of text. Follow his process from the first idea to a prototype all the way to data collection and training a statistical named entity recognition model from scratch.",
 								            "author": "Vincent Warmerdam",
 								            "author_links": {
 								                "twitter": "fishnets88",
 								                "github": "koaning"
 								            },
 								            "youtube": "4V0JDdohxAk",
 								            "category": ["videos"]
 								        },
 								        {
 								            "type": "education",
 								            "id": "video-intro-to-nlp-episode-4",
 								            "title": "Intro to NLP with spaCy (4)",
 								            "slogan": "Episode 4: Named Entity Recognition",
 								            "description": "In this new video series, data science instructor Vincent Warmerdam gets started with spaCy, an open-source library for Natural Language Processing in Python. His mission: building a system to automatically detect programming languages in large volumes of text. Follow his process from the first idea to a prototype all the way to data collection and training a statistical named entity recognition model from scratch.",
 								            "author": "Vincent Warmerdam",
 								            "author_links": {
 								                "twitter": "fishnets88",
 								                "github": "koaning"
 								            },
 								            "youtube": "IqOJU1-_Fi0",
 								            "category": ["videos"]
 								        },
 								        {
 								            "type": "education",
 								            "id": "video-intro-to-nlp-episode-5",
 								            "title": "Intro to NLP with spaCy (5)",
 								            "slogan": "Episode 5: Rules vs. Machine Learning",
 								            "description": "In this new video series, data science instructor Vincent Warmerdam gets started with spaCy, an open-source library for Natural Language Processing in Python. His mission: building a system to automatically detect programming languages in large volumes of text. Follow his process from the first idea to a prototype all the way to data collection and training a statistical named entity recognition model from scratch.",
 								            "author": "Vincent Warmerdam",
 								            "author_links": {
 								                "twitter": "fishnets88",
 								                "github": "koaning"
 								            },
 								            "youtube": "f4sqeLRzkPg",
 								            "category": ["videos"]
 								        },
 								        {
 								            "type": "education",
 								            "id": "video-intro-to-nlp-episode-6",
 								            "title": "Intro to NLP with spaCy (6)",
 								            "slogan": "Episode 6: Moving to spaCy v3",
 								            "description": "In this new video series, data science instructor Vincent Warmerdam gets started with spaCy, an open-source library for Natural Language Processing in Python. His mission: building a system to automatically detect programming languages in large volumes of text. Follow his process from the first idea to a prototype all the way to data collection and training a statistical named entity recognition model from scratch.",
 								            "author": "Vincent Warmerdam",
 								            "author_links": {
 								                "twitter": "fishnets88",
 								                "github": "koaning"
 								            },
 								            "youtube": "k77RrmMaKEI",
 								            "category": ["videos"]
 								        },
 								        {
 								            "type": "education",
 								            "id": "video-spacy-irl-entity-linking",
 								            "title": "Entity Linking functionality in spaCy",
 								            "slogan": "spaCy IRL 2019",
 								            "url": "https://www.youtube.com/playlist?list=PLBmcuObd5An4UC6jvK_-eSl6jCvP1gwXc",
 								            "author": "Sofie Van Landeghem",
 								            "author_links": {
 								                "twitter": "OxyKodit",
 								                "github": "svlandeg"
 								            },
 								            "youtube": "PW3RJM8tDGo",
 								            "category": ["videos"]
 								        },
 								        {
 								            "type": "education",
 								            "id": "video-spacy-irl-lemmatization",
 								            "title": "Rethinking rule-based lemmatization",
 								            "slogan": "spaCy IRL 2019",
 								            "url": "https://www.youtube.com/playlist?list=PLBmcuObd5An4UC6jvK_-eSl6jCvP1gwXc",
 								            "author": "Guadalupe Romero",
 								            "author_links": {
 								                "twitter": "_guadiromero",
 								                "github": "guadi1994"
 								            },
 								            "youtube": "88zcQODyuko",
 								            "category": ["videos"]
 								        },
 								        {
 								            "type": "education",
 								            "id": "video-spacy-irl-scispacy",
 								            "title": "ScispaCy: A spaCy pipeline & models for scientific & biomedical text",
 								            "slogan": "spaCy IRL 2019",
 								            "url": "https://www.youtube.com/playlist?list=PLBmcuObd5An4UC6jvK_-eSl6jCvP1gwXc",
 								            "author": "Mark Neumann",
 								            "author_links": {
 								                "twitter": "MarkNeumannnn",
 								                "github": "DeNeutoy"
 								            },
 								            "youtube": "2_HSKDALwuw",
 								            "category": ["videos"]
 								        },
 								        {
 								            "type": "education",
 								            "id": "podcast-nlp-highlights",
 								            "title": "NLP Highlights #78: Where do corpora come from?",
 								            "slogan": "January 2019",
 								            "description": "Most NLP projects rely crucially on the quality of annotations used for training and evaluating models. In this episode, Matt and Ines of Explosion AI tell us how Prodigy can improve data annotation and model development workflows. Prodigy is an annotation tool implemented as a Python library, and it comes with a web application and a command line interface. A developer can define input data streams and design simple annotation interfaces. Prodigy can help break down complex annotation decisions into a series of binary decisions, and it provides easy integration with spaCy models. Developers can specify how models should be modified as new annotations come in, in an active learning framework.",
 								            "soundcloud": "559200912",
 								            "thumb": "https://i.imgur.com/hOBQEzc.jpg",
 								            "url": "https://soundcloud.com/nlp-highlights/78-where-do-corpora-come-from-with-matt-honnibal-and-ines-montani",
 								            "author": "Matt Gardner, Waleed Ammar (Allen AI)",
 								            "author_links": {
 								                "website": "https://soundcloud.com/nlp-highlights"
 								            },
 								            "category": ["podcasts"]
 								        },
 								        {
 								            "type": "education",
 								            "id": "podcast-init",
 								            "title": "Podcast.__init__ #87: spaCy with Matthew Honnibal",
 								            "slogan": "December 2017",
 								            "description": "As the amount of text available on the internet and in businesses continues to increase, the need for fast and accurate language analysis becomes more prominent. This week Matthew Honnibal, the creator of spaCy, talks about his experiences researching natural language processing and creating a library to make his findings accessible to industry.",
 								            "iframe": "https://www.pythonpodcast.com/wp-content/plugins/podlove-podcasting-plugin-for-wordpress/lib/modules/podlove_web_player/player_v4/dist/share.html?episode=https://www.pythonpodcast.com/?podlove_player4=176",
 								            "iframe_height": 200,
 								            "thumb": "https://i.imgur.com/rpo6BuY.png",
 								            "url": "https://www.podcastinit.com/episode-87-spacy-with-matthew-honnibal/",
 								            "author": "Tobias Macey",
 								            "author_links": {
 								                "website": "https://www.podcastinit.com"
 								            },
 								            "category": ["podcasts"]
 								        },
 								        {
 								            "type": "education",
 								            "id": "podcast-init2",
 								            "title": "Podcast.__init__ #256: An Open Source Toolchain For NLP From Explosion AI",
 								            "slogan": "March 2020",
 								            "description": "The state of the art in natural language processing is a constantly moving target. With the rise of deep learning, previously cutting-edge techniques have given way to robust language models. Through it all, the team at Explosion AI have built a strong presence with the trifecta of spaCy, Thinc, and Prodigy to support fast and flexible data labeling to feed deep learning models and performant and scalable text processing. In this episode, founder and open source author Matthew Honnibal shares his experience growing a business around cutting-edge open source libraries for the machine learning development process.",
 								            "iframe": "https://cdn.podlove.org/web-player/share.html?episode=https%3A%2F%2Fwww.pythonpodcast.com%2F%3Fpodlove_player4%3D614",
 								            "iframe_height": 200,
 								            "thumb": "https://i.imgur.com/rpo6BuY.png",
 								            "url": "https://www.pythonpodcast.com/explosion-ai-natural-language-processing-episode-256/",
 								            "author": "Tobias Macey",
 								            "author_links": {
 								                "website": "https://www.podcastinit.com"
 								            },
 								            "category": ["podcasts"]
 								        },
 								        {
 								            "type": "education",
 								            "id": "talk-python-podcast",
 								            "title": "Talk Python #202: Building a software business",
 								            "slogan": "March 2019",
 								            "description": "One core question around open source is how do you fund it? Well, there is always that PayPal donate button. But that's been a tremendous failure for many projects. Often the go-to answer is consulting. But what if you don't want to trade time for money? You could take things up a notch and change the equation, exchanging value for money. That's what Ines Montani and her co-founder did when they started Explosion AI with spaCy as the foundation.",
 								            "thumb": "https://i.imgur.com/q1twuK8.png",
 								            "url": "https://talkpython.fm/episodes/show/202/building-a-software-business",
 								            "soundcloud": "588364857",
 								            "author": "Michael Kennedy",
 								            "author_links": {
 								                "website": "https://talkpython.fm/"
 								            },
 								            "category": ["podcasts"]
 								        },
 								        {
 								            "type": "education",
 								            "id": "twimlai-podcast",
 								            "title": "TWiML & AI: Practical NLP with spaCy and Prodigy",
 								            "slogan": "May 2019",
 								            "description": "\"Ines and I caught up to discuss her various projects, including the aforementioned spaCy, an open-source NLP library built with a focus on industry and production use cases. In our conversation, Ines gives us an overview of the spaCy library, a look at some of the use cases that excite her, and the spaCy community and contributors. We also discuss her work with Prodigy, an annotation service tool that uses continuous active learning to train models, and finally, what other exciting projects she is working on.\"",
 								            "thumb": "https://i.imgur.com/ng2F5gK.png",
 								            "url": "https://twimlai.com/twiml-talk-262-practical-natural-language-processing-with-spacy-and-prodigy-w-ines-montani",
 								            "iframe": "https://html5-player.libsyn.com/embed/episode/id/9691514/height/90/theme/custom/thumbnail/no/preload/no/direction/backward/render-playlist/no/custom-color/3e85b1/",
 								            "iframe_height": 90,
 								            "author": "Sam Charrington",
 								            "author_links": {
 								                "website": "https://twimlai.com"
 								            },
 								            "category": ["podcasts"]
 								        },
 								        {
 								            "type": "education",
 								            "id": "analytics-vidhya",
 								            "title": "DataHack Radio #23: The Brains behind spaCy",
 								            "slogan": "June 2019",
 								            "description": "\"What would you do if you had the chance to pick the brains behind one of the most popular Natural Language Processing (NLP) libraries of our era? A library that has helped usher in the current boom in NLP applications and nurtured tons of NLP scientists? Well – you invite the creators on our popular DataHack Radio podcast and let them do the talking! We are delighted to welcome Ines Montani and Matt Honnibal, the developers of spaCy – a powerful and advanced library for NLP.\"",
 								            "thumb": "https://i.imgur.com/3zJKZ1P.jpg",
 								            "url": "https://www.analyticsvidhya.com/blog/2019/06/datahack-radio-ines-montani-matthew-honnibal-brains-behind-spacy/",
 								            "soundcloud": "630741825",
 								            "author": "Analytics Vidhya",
 								            "author_links": {
 								                "website": "https://www.analyticsvidhya.com",
 								                "twitter": "analyticsvidhya"
 								            },
 								            "category": ["podcasts"]
 								        },
 								        {
 								            "type": "education",
 								            "id": "practical-ai-podcast",
 								            "title": "Practical AI: Modern NLP with spaCy",
 								            "slogan": "December 2019",
 								            "description": "\"spaCy is awesome for NLP! It’s easy to use, has widespread adoption, is open source, and integrates the latest language models. Ines Montani and Matthew Honnibal (core developers of spaCy and co-founders of Explosion) join us to discuss the history of the project, its capabilities, and the latest trends in NLP. We also dig into the practicalities of taking NLP workflows to production. You don’t want to miss this episode!\"",
 								            "thumb": "https://i.imgur.com/jn8Bcdw.png",
 								            "url": "https://changelog.com/practicalai/68",
 								            "author": "Daniel Whitenack & Chris Benson",
 								            "author_links": {
 								                "website": "https://changelog.com/practicalai",
 								                "twitter": "PracticalAIFM"
 								            },
 								            "category": ["podcasts"]
 								        },
 								        {
 								            "type": "education",
 								            "id": "video-entity-linking",
 								            "title": "Training a custom entity linking model with spaCy",
 								            "author": "Sofie Van Landeghem",
 								            "author_links": {
 								                "twitter": "OxyKodit",
 								                "github": "svlandeg"
 								            },
 								            "youtube": "8u57WSXVpmw",
 								            "category": ["videos"]
 								        },
 								        {
 								            "id": "self-attentive-parser",
 								            "title": "Berkeley Neural Parser",
 								            "slogan": "Constituency Parsing with a Self-Attentive Encoder (ACL 2018)",
 								            "description": "A Python implementation of the parsers described in *\"Constituency Parsing with a Self-Attentive Encoder\"* from ACL 2018.",
 								            "url": "https://arxiv.org/abs/1805.01052",
 								            "github": "nikitakit/self-attentive-parser",
 								            "pip": "benepar",
 								            "code_example": [
 								                "import benepar, spacy",
 								                "nlp = spacy.load('en_core_web_md')",
 								                "nlp.add_pipe('benepar', config={'model': 'benepar_en3'})",
 								                "doc = nlp('The time for action is now. It is never too late to do something.')",
 								                "sent = list(doc.sents)[0]",
 								                "print(sent._.parse_string)",
 								                "# (S (NP (NP (DT The) (NN time)) (PP (IN for) (NP (NN action)))) (VP (VBZ is) (ADVP (RB now))) (. .))",
 								                "print(sent._.labels)",
 								                "# ('S',)",
 								                "print(list(sent._.children)[0])",
 								                "# The time for action"
 								            ],
 								            "author": "Nikita Kitaev",
 								            "author_links": {
 								                "github": "nikitakit",
 								                "website": "http://kitaev.io"
 								            },
 								            "category": ["research", "pipeline"]
 								        },
 								        {
 								            "id": "spacy-graphql",
 								            "title": "spacy-graphql",
 								            "slogan": "Query spaCy's linguistic annotations using GraphQL",
 								            "github": "ines/spacy-graphql",
 								            "description": "A very simple and experimental app that lets you query spaCy's linguistic annotations using [GraphQL](https://graphql.org/). The API currently supports most token attributes, named entities, sentences and text categories (if available as `doc.cats`, i.e. if you added a text classifier to a model). The `meta` field will return the model meta data. Models are only loaded once and kept in memory.",
 								            "url": "https://explosion.ai/demos/spacy-graphql",
 								            "category": ["apis"],
 								            "tags": ["graphql"],
 								            "thumb": "https://i.imgur.com/xC7zpTO.png",
 								            "code_example": [
 								                "{",
 								                "  nlp(text: \"Zuckerberg is the CEO of Facebook.\", model: \"en_core_web_sm\") {",
 								                "    meta {",
 								                "      lang",
 								                "      description",
 								                "    }",
 								                "    doc {",
 								                "      text",
 								                "      tokens {",
 								                "        text",
 								                "        pos_",
 								                "      }",
 								                "      ents {",
 								                "        text",
 								                "        label_",
 								                "      }",
 								                "    }",
 								                "  }",
 								                "}"
 								            ],
 								            "code_language": "json",
 								            "author": "Ines Montani",
 								            "author_links": {
 								                "twitter": "_inesmontani",
 								                "github": "ines",
 								                "website": "https://ines.io"
 								            }
 								        },
 								        {
 								            "id": "spacy-js",
 								            "title": "spacy-js",
 								            "slogan": "JavaScript API for spaCy with Python REST API",
 								            "github": "ines/spacy-js",
 								            "description": "JavaScript interface for accessing linguistic annotations provided by spaCy. This project is mostly experimental and was developed for fun to play around with different ways of mimicking spaCy's Python API.\n\nThe results will still be computed in Python and made available via a REST API. The JavaScript API resembles spaCy's Python API as closely as possible (with a few exceptions, as the values are all pre-computed and it's tricky to express complex recursive relationships).",
 								            "code_language": "javascript",
 								            "code_example": [
 								                "const spacy = require('spacy');",
 								                "",
 								                "(async function() {",
 								                "    const nlp = spacy.load('en_core_web_sm');",
 								                "    const doc = await nlp('This is a text about Facebook.');",
 								                "    for (let ent of doc.ents) {",
 								                "        console.log(ent.text, ent.label);",
 								                "    }",
 								                "    for (let token of doc) {",
 								                "        console.log(token.text, token.pos, token.head.text);",
 								                "    }",
 								                "})();"
 								            ],
 								            "author": "Ines Montani",
 								            "author_links": {
 								                "twitter": "_inesmontani",
 								                "github": "ines",
 								                "website": "https://ines.io"
 								            },
 								            "category": ["nonpython"],
 								            "tags": ["javascript"]
 								        },
 								        {
 								            "id": "spacy-wordnet",
 								            "title": "spacy-wordnet",
 								            "slogan": "WordNet meets spaCy",
 								            "description": "`spacy-wordnet` creates annotations that easily allow the use of WordNet and [WordNet Domains](http://wndomains.fbk.eu/) by using the [NLTK WordNet interface](http://www.nltk.org/howto/wordnet.html).",
 								            "github": "recognai/spacy-wordnet",
 								            "tags": ["wordnet", "synsets"],
 								            "thumb": "https://i.imgur.com/ud4C7cj.png",
 								            "code_example": [
 								                "import spacy",
 								                "from spacy_wordnet.wordnet_annotator import WordnetAnnotator ",
 								                "",
 								                "# Load a spaCy model (supported languages are \"es\" and \"en\") ",
 								                "nlp = spacy.load('en_core_web_sm')",
 								                "# spaCy 3.x",
 								                "nlp.add_pipe(\"spacy_wordnet\", after='tagger')",
 								                "# spaCy 2.x",
-												update spacy-wordnet code example (#8327)

* update spacy-wordnet code example

- include spaCy 2.x and 3.x init alternatives
- upgrade recognai logo

* fix escape chars
											
										
										
											2021-06-10 22:53:11 +03:00
+								                "# nlp.add_pipe(WordnetAnnotator(nlp.lang), after='tagger')",
-												Include universe spec for spacy-wordnet component (#2919)

* feat: include universe spec for spacy-wordnet component

* chore: include spaCy contributor agreement

											
										
										
											2018-11-14 01:54:46 +03:00
+								                "token = nlp('prices')[0]",
 								                "",
                "# The WordNet object links the spaCy token to NLTK's WordNet interface,",
                "# giving access to synsets and lemmas",
                "token._.wordnet.synsets()",
                "token._.wordnet.lemmas()",
                "",
                "# And automatically add info about WordNet domains",
                "token._.wordnet.wordnet_domains()"
            ],
            "author": "recognai",
            "author_links": {
                "github": "recognai",
                "twitter": "recogn_ai",
                "website": "https://recogn.ai"
            },
            "category": ["pipeline"]
        },
        {
            "id": "spacy-conll",
            "title": "spacy_conll",
            "slogan": "Parsing from and to CoNLL-U format with `spacy`, `spacy-stanza` and `spacy-udpipe`",
            "description": "This module allows you to parse text into CoNLL-U format or read CoNLL-U into a spaCy `Doc`. You can use it as a command-line tool, or embed it in your own scripts by adding it as a custom pipeline component to a `spacy`, `spacy-stanza` or `spacy-udpipe` pipeline. It also provides an easy-to-use function to quickly initialize any spaCy-wrapped parser. CoNLL-related properties are added to `Doc` elements, `Span` sentences, and `Token` objects.",
            "code_example": [
                "from spacy_conll import init_parser",
                "",
                "",
                "# Initialise English parser, already including the ConllFormatter as a pipeline component.",
                "# Indicate that we want to get the CoNLL headers in the string output.",
                "# `use_gpu` and `verbose` are specific to stanza. These keyword arguments are passed on to its Pipeline() initialisation",
                "nlp = init_parser(\"en\",",
                "                  \"stanza\",",
                "                  parser_opts={\"use_gpu\": True, \"verbose\": False},",
                "                  include_headers=True)",
                "# Parse a given string",
                "doc = nlp(\"A cookie is a baked or cooked food that is typically small, flat and sweet. It usually contains flour, sugar and some type of oil or fat.\")",
                "",
                "# Get the CoNLL representation of the whole document, including headers",
                "conll = doc._.conll_str",
                "print(conll)"
            ],
            "code_language": "python",
            "author": "Bram Vanroy",
            "author_links": {
                "github": "BramVanroy",
                "twitter": "BramVanroy",
                "website": "http://bramvanroy.be"
            },
            "github": "BramVanroy/spacy_conll",
            "category": ["standalone", "pipeline"],
            "tags": ["linguistics", "computational linguistics", "conll", "conll-u"]
        },
        {
            "id": "ludwig",
            "title": "Ludwig",
            "slogan": "A code-free deep learning toolbox",
            "description": "Ludwig makes it easy to build deep learning models for many applications, including NLP ones. It uses spaCy for tokenizing text in different languages.",
            "pip": "ludwig",
            "github": "uber/ludwig",
            "thumb": "https://i.imgur.com/j1sORgD.png",
            "url": "http://ludwig.ai",
            "author": "Piero Molino @ Uber AI",
            "author_links": {
                "github": "w4nderlust",
                "twitter": "w4nderlus7",
                "website": "http://w4nderlu.st"
            },
            "category": ["standalone", "research"]
        },
        {
            "id": "pic2phrase_bot",
            "title": "pic2phrase_bot: Photo Description Generator",
            "slogan": "A bot that generates descriptions of submitted photos in a human-like manner.",
            "description": "pic2phrase_bot runs inside Telegram messenger and can be used to generate a phrase describing a submitted photo, employing computer vision, web scraping, and syntactic dependency analysis powered by spaCy.",
            "thumb": "https://i.imgur.com/ggVI02O.jpg",
            "image": "https://i.imgur.com/z1yhWQR.jpg",
            "url": "https://telegram.me/pic2phrase_bot",
            "author": "Yuli Vasiliev",
            "author_links": {
                "twitter": "VasilievYuli"
            },
            "category": ["standalone", "conversational"]
        },
        {
            "id": "pyInflect",
            "slogan": "A Python module for word inflections",
            "description": "This package uses the [spaCy 2.0 extensions](https://spacy.io/usage/processing-pipelines#extensions) to add word inflection to spaCy tokens via `token._.inflect()`.",
            "github": "bjascob/pyInflect",
            "pip": "pyinflect",
            "code_example": [
                "import spacy",
                "import pyinflect",
                "",
                "nlp = spacy.load('en_core_web_sm')",
                "doc = nlp('This is an example.')",
                "doc[3].tag_                # NN",
                "doc[3]._.inflect('NNS')    # examples"
            ],
            "author": "Brad Jascob",
            "author_links": {
                "github": "bjascob"
            },
            "category": ["pipeline"],
            "tags": ["inflection"]
        },
        {
            "id": "lemminflect",
            "slogan": "A Python module for English lemmatization and inflection",
            "description": "LemmInflect uses a dictionary approach to lemmatize English words and inflect them into forms specified by a user-supplied [Universal Dependencies](https://universaldependencies.org/u/pos/) or [Penn Treebank](https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html) tag. The library handles out-of-vocabulary (OOV) words by applying neural network techniques to classify word forms and choose the appropriate morphing rules. It can be used as a standalone module or as a spaCy extension.",
            "github": "bjascob/LemmInflect",
            "pip": "lemminflect",
            "thumb": "https://raw.githubusercontent.com/bjascob/LemmInflect/master/docs/img/icons8-citrus-80.png",
            "code_example": [
                "import spacy",
                "import lemminflect",
                "",
                "nlp = spacy.load('en_core_web_sm')",
                "doc = nlp('I am testing this example.')",
                "doc[2]._.lemma()         # 'test'",
                "doc[4]._.inflect('NNS')  # 'examples'"
            ],
            "author": "Brad Jascob",
            "author_links": {
                "github": "bjascob"
            },
            "category": ["pipeline"],
            "tags": ["inflection", "lemmatizer"]
        },
        {
            "id": "amrlib",
            "slogan": "A Python library that makes AMR parsing, generation and visualization simple.",
            "description": "amrlib is a Python module and spaCy add-in for Abstract Meaning Representation (AMR). The system can parse sentences to AMR graphs or generate text from existing graphs. It includes a GUI for visualization and experimentation.",
            "github": "bjascob/amrlib",
            "pip": "amrlib",
            "code_example": [
                "import spacy",
                "import amrlib",
                "amrlib.setup_spacy_extension()",
                "nlp = spacy.load('en_core_web_sm')",
                "doc = nlp('This is a test of the spaCy extension. The test has multiple sentences.')",
                "graphs = doc._.to_amr()",
                "for graph in graphs:",
                "    print(graph)"
            ],
            "author": "Brad Jascob",
            "author_links": {
                "github": "bjascob"
            },
            "category": ["pipeline"]
        },
        {
            "id": "classyclassification",
            "title": "Classy Classification",
            "slogan": "Have you ever needed a spaCy TextCategorizer but not had the time to train one from scratch? Classy Classification is the way to go!",
            "description": "Have you ever needed a [spaCy TextCategorizer](https://spacy.io/api/textcategorizer) but not had the time to train one from scratch? Classy Classification is the way to go! For few-shot classification using [sentence-transformers](https://github.com/UKPLab/sentence-transformers) or [spaCy models](https://spacy.io/usage/models), provide a dictionary with labels and examples, or just provide a list of labels for zero-shot classification with [Hugging Face zero-shot classifiers](https://huggingface.co/models?pipeline_tag=zero-shot-classification).",
            "github": "davidberenstein1957/classy-classification",
            "pip": "classy-classification",
            "thumb": "https://raw.githubusercontent.com/davidberenstein1957/classy-classification/master/logo.png",
            "code_example": [
                "import spacy",
                "",
                "data = {",
                "    \"furniture\": [\"This text is about chairs.\",",
                "               \"Couches, benches and televisions.\",",
                "               \"I really need to get a new sofa.\"],",
                "    \"kitchen\": [\"There also exist things like fridges.\",",
                "                \"I hope to be getting a new stove today.\",",
                "                \"Do you also have some ovens.\"]",
                "}",
                "",
                "# See the GitHub repo for examples using sentence-transformers and Hugging Face",
                "nlp = spacy.load('en_core_web_md')",
                "nlp.add_pipe(\"classy_classification\",",
-												Updated explenation for for classy classification (#10484)

* Update universe.json

added classy-classification to Spacy universe

* Update universe.json

added classy-classification to the spacy universe resources

* Update universe.json

corrected a small typo in json

* Update website/meta/universe.json

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update website/meta/universe.json

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update website/meta/universe.json

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update universe.json

processed merge feedback

* Update universe.json

* updated information for Classy Classificaiton 

Made a more comprehensible and easy description for Classy Classification based on feedback of Philip Vollet to prepare for sharing.

* added note about examples

* corrected for wrong formatting changes

* Update website/meta/universe.json with small typo correction

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* resolved another typo

* Update website/meta/universe.json

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
											
										
										
											2022-03-15 18:42:33 +03:00
+								                "    config={",
 								                "        \"data\": data,",
 								                "        \"model\": \"spacy\"",
 								                "    }",
 								                ")",
-												added classy-classification package to spacy universe (#10393)

* Update universe.json

added classy-classification to Spacy universe

* Update universe.json

added classy-classification to the spacy universe resources

* Update universe.json

corrected a small typo in json

* Update website/meta/universe.json

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update website/meta/universe.json

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update website/meta/universe.json

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update universe.json

processed merge feedback

* Update universe.json

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
											
										
										
											2022-03-07 14:47:26 +03:00
+								                "",
 								                "print(nlp(\"I am looking for kitchen appliances.\")._.cats)",
-												Updated explenation for for classy classification (#10484)

* Update universe.json

added classy-classification to Spacy universe

* Update universe.json

added classy-classification to the spacy universe resources

* Update universe.json

corrected a small typo in json

* Update website/meta/universe.json

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update website/meta/universe.json

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update website/meta/universe.json

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update universe.json

processed merge feedback

* Update universe.json

* updated information for Classy Classificaiton 

Made a more comprehensible and easy description for Classy Classification based on feedback of Philip Vollet to prepare for sharing.

* added note about examples

* corrected for wrong formatting changes

* Update website/meta/universe.json with small typo correction

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* resolved another typo

* Update website/meta/universe.json

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
											
										
										
											2022-03-15 18:42:33 +03:00
+								                "# Output:",
 								                "#",
 								                "# [{\"label\": \"furniture\", \"score\": 0.21}, {\"label\": \"kitchen\", \"score\": 0.79}]"
            ],
            "author": "David Berenstein",
            "author_links": {
                "github": "davidberenstein1957",
                "website": "https://www.linkedin.com/in/david-berenstein-1bab11105/"
            },
            "category": ["pipeline", "standalone"],
            "tags": [
                "classification",
                "zero-shot",
                "few-shot",
                "sentence-transformers",
                "huggingface"
            ],
            "spacy_version": 3
        },
        {
            "id": "conciseconcepts",
            "title": "Concise Concepts",
            "slogan": "Concise Concepts uses few-shot NER based on word embedding similarity to get you going with ease!",
            "description": "When you want to apply NER to concise concepts, it is easy to come up with examples, but training an entire pipeline takes considerable effort. Concise Concepts uses few-shot NER based on word embedding similarity to get you going with ease!",
            "github": "davidberenstein1957/concise-concepts",
            "pip": "concise-concepts",
            "thumb": "https://raw.githubusercontent.com/davidberenstein1957/concise-concepts/master/img/logo.png",
            "image": "https://raw.githubusercontent.com/davidberenstein1957/concise-concepts/master/img/example.png",
            "code_example": [
                "import spacy",
                "from spacy import displacy",
                "",
                "data = {",
                "    \"fruit\": [\"apple\", \"pear\", \"orange\"],",
                "    \"vegetable\": [\"broccoli\", \"spinach\", \"tomato\"],",
                "    \"meat\": [\"beef\", \"pork\", \"fish\", \"lamb\"]",
                "}",
                "",
                "text = \"\"\"",
                "    Heat the oil in a large pan and add the Onion, celery and carrots.",
                "    Then, cook over a medium–low heat for 10 minutes, or until softened.",
                "    Add the courgette, garlic, red peppers and oregano and cook for 2–3 minutes.",
                "    Later, add some oranges and chickens.\"\"\"",
                "",
                "# use any spaCy model that includes word vectors",
                "nlp = spacy.load('en_core_web_lg')",
                "nlp.add_pipe(\"concise_concepts\",",
                "    config={\"data\": data}",
                ")",
                "doc = nlp(text)",
                "",
                "options = {\"colors\": {\"fruit\": \"darkorange\", \"vegetable\": \"limegreen\", \"meat\": \"salmon\"},",
                "           \"ents\": [\"fruit\", \"vegetable\", \"meat\"]}",
                "",
                "displacy.render(doc, style=\"ent\", options=options)"
            ],
            "author": "David Berenstein",
            "author_links": {
                "github": "davidberenstein1957",
                "website": "https://www.linkedin.com/in/david-berenstein-1bab11105/"
            },
            "category": ["pipeline"],
            "tags": ["ner", "few-shot", "gensim"],
            "spacy_version": 3
        },
        {
            "id": "crosslingualcoreference",
            "title": "Crosslingual Coreference",
            "slogan": "One multilingual coreference model to rule them all!",
            "description": "Coreference is amazing, but the data required to train a model is very scarce. In our case, the available training data for non-English languages also proved to be poorly annotated. Crosslingual Coreference therefore assumes that a model trained on English data with cross-lingual embeddings should work for other languages with a similar sentence structure. It has been verified to work quite well for at least EN, NL, DK, FR and DE.",
            "github": "davidberenstein1957/crosslingual-coreference",
            "pip": "crosslingual-coreference",
            "thumb": "https://raw.githubusercontent.com/davidberenstein1957/crosslingual-coreference/master/img/logo.png",
            "image": "https://raw.githubusercontent.com/davidberenstein1957/crosslingual-coreference/master/img/example_total.png",
            "code_example": [
                "import spacy",
                "",
                "text = \"\"\"",
                "    Do not forget about Momofuku Ando!",
                "    He created instant noodles in Osaka.",
                "    At that location, Nissin was founded.",
                "    Many students survived by eating these noodles, but they don't even know him.\"\"\"",
                "",
                "# use any model that has internal spacy embeddings",
                "nlp = spacy.load('en_core_web_sm')",
                "nlp.add_pipe(",
                "    \"xx_coref\", config={\"chunk_size\": 2500, \"chunk_overlap\": 2, \"device\": 0}",
                ")",
                "",
                "doc = nlp(text)",
                "",
                "print(doc._.coref_clusters)",
                "# Output",
                "#",
                "# [[[4, 5], [7, 7], [27, 27], [36, 36]],",
                "# [[12, 12], [15, 16]],",
                "# [[9, 10], [27, 28]],",
                "# [[22, 23], [31, 31]]]",
                "print(doc._.resolved_text)",
                "# Output",
                "#",
                "# Do not forget about Momofuku Ando!",
                "# Momofuku Ando created instant noodles in Osaka.",
                "# At Osaka, Nissin was founded.",
                "# Many students survived by eating instant noodles,",
                "# but Many students don't even know Momofuku Ando."
            ],
            "author": "David Berenstein",
            "author_links": {
                "github": "davidberenstein1957",
                "website": "https://www.linkedin.com/in/david-berenstein-1bab11105/"
            },
            "category": ["pipeline", "standalone"],
            "tags": ["coreference", "multi-lingual", "cross-lingual", "allennlp"],
            "spacy_version": 3
        },
        {
            "id": "adeptaugmentations",
            "title": "Adept Augmentations",
            "slogan": "A Python library aimed at dissecting and augmenting NER training data for a few-shot scenario.",
            "description": "EntitySwapAugmenter takes either a `datasets.Dataset` or a `spacy.tokens.DocBin`. Optionally, you can also provide a set of labels. It first creates a knowledge base of entities for each label. When running `augmenter.augment()` for N runs, it then creates N new sentences by randomly swapping the original entities with entities of the same label from the knowledge base.\n\nFor example, assume we have a knowledge base for `PERSONS`, `LOCATIONS` and `PRODUCTS`. We can then create additional data for the sentence \"Momofuku Ando created instant noodles in Osaka.\" using `augmenter.augment(N=2)`, resulting in \"David created instant noodles in Madrid.\" or \"Tom created Adept Augmentations in the Netherlands.\"",
            "github": "argilla-io/adept-augmentations",
            "pip": "adept-augmentations",
            "thumb": "https://raw.githubusercontent.com/argilla-io/adept-augmentations/main/logo.png",
            "code_example": [
                "from adept_augmentations import EntitySwapAugmenter",
                "import spacy",
                "from spacy.tokens import Doc, DocBin",
                "nlp = spacy.blank(\"en\")",
                "",
                "# Create some example golden data",
                "example_data = [",
                "    (\"Apple is looking at buying U.K. startup for $1 billion\", [(0, 5, \"ORG\"), (27, 31, \"LOC\"), (44, 54, \"MONEY\")]),",
                "    (\"Microsoft acquires GitHub for $7.5 billion\", [(0, 9, \"ORG\"), (19, 25, \"ORG\"), (30, 42, \"MONEY\")]),",
                "]",
                "",
                "# Create a new DocBin",
                "docs = []",
                "for entry in example_data:",
                "    doc = Doc(nlp.vocab, words=entry[0].split())",
                "    doc.ents = [doc.char_span(ent[0], ent[1], label=ent[2]) for ent in entry[1]]",
                "    docs.append(doc)",
                "golden_dataset = DocBin(docs=docs)",
                "",
                "# Augment Data",
                "augmented_dataset = EntitySwapAugmenter(golden_dataset).augment(4)",
                "for doc in augmented_dataset.get_docs(nlp.vocab):",
                "    print(doc.text)",
                "",
                "# GitHub is looking at buying U.K. startup for $ 7.5 billion",
                "# Microsoft is looking at buying U.K. startup for $ 1 billion",
                "# Microsoft is looking at buying U.K. startup for $ 7.5 billion",
                "# GitHub is looking at buying U.K. startup for $ 1 billion",
                "# Microsoft acquires Apple for $ 7.5 billion",
                "# Apple acquires Microsoft for $ 1 billion",
                "# Microsoft acquires Microsoft for $ 7.5 billion",
                "# GitHub acquires GitHub for $ 1 billion"
            ],
            "author": "David Berenstein",
            "author_links": {
                "github": "davidberenstein1957",
                "website": "https://www.linkedin.com/in/david-berenstein-1bab11105/"
            },
            "category": ["standalone"],
            "tags": ["ner", "few-shot", "augmentation", "datasets", "training"],
            "spacy_version": 3
        },
        {
            "id": "spacysetfit",
            "title": "spaCy-SetFit",
            "slogan": "An easy and intuitive approach to using SetFit in combination with spaCy.",
            "description": "spaCy-SetFit is a Python library that extends spaCy's text categorization capabilities by incorporating SetFit for few-shot classification. It allows you to train a text categorizer using an intuitive dictionary.\n\nThe library plugs into spaCy's pipeline architecture, making the text categorizer component easy to add and configure. You can provide a training dataset containing inlier and outlier examples, and spaCy-SetFit will use the paraphrase-MiniLM-L3-v2 model to train the text categorizer with SetFit. Once trained, you can use the categorizer to classify new text and obtain category probabilities.",
            "github": "davidberenstein1957/spacy-setfit",
            "pip": "spacy-setfit",
            "thumb": "https://raw.githubusercontent.com/davidberenstein1957/spacy-setfit/main/logo.png",
            "code_example": [
                "import spacy",
                "",
                "# Create some example data",
                "train_dataset = {",
                "    \"inlier\": [",
                "        \"Text about furniture\",",
                "        \"Couches, benches and televisions.\",",
                "        \"I really need to get a new sofa.\"",
                "    ],",
                "    \"outlier\": [",
                "        \"Text about kitchen equipment\",",
                "        \"This text is about politics\",",
                "        \"Comments about AI and stuff.\"",
                "    ]",
                "}",
                "",
                "# Load the spaCy language model:",
                "nlp = spacy.load(\"en_core_web_sm\")",
                "",
                "# Add the \"spacy_setfit\" pipeline component to the spaCy model, and configure it with SetFit parameters:",
                "nlp.add_pipe(\"spacy_setfit\", config={",
                "    \"pretrained_model_name_or_path\": \"paraphrase-MiniLM-L3-v2\",",
                "    \"setfit_trainer_args\": {",
                "        \"train_dataset\": train_dataset",
                "    }",
                "})",
                "doc = nlp(\"I really need to get a new sofa.\")",
                "doc.cats",
                "# {'inlier': 0.902350975129, 'outlier': 0.097649024871}"
            ],
            "author": "David Berenstein",
            "author_links": {
                "github": "davidberenstein1957",
                "website": "https://www.linkedin.com/in/david-berenstein-1bab11105/"
            },
            "category": ["pipeline"],
            "tags": ["few-shot", "SetFit", "training"],
            "spacy_version": 3
        },
        {
            "id": "blackstone",
            "title": "Blackstone",
            "slogan": "A spaCy pipeline and model for NLP on unstructured legal text",
            "description": "Blackstone is a spaCy model and library for processing long-form, unstructured legal text. Blackstone is an experimental research project from the [Incorporated Council of Law Reporting for England and Wales'](https://iclr.co.uk/) research lab, [ICLR&D](https://research.iclr.co.uk/).",
            "github": "ICLRandD/Blackstone",
            "pip": "blackstone",
            "thumb": "https://iclr.s3-eu-west-1.amazonaws.com/assets/iclrand/Blackstone/thumb.png",
            "url": "https://research.iclr.co.uk",
            "author": "ICLR&D",
            "author_links": {
                "github": "ICLRandD",
                "twitter": "ICLRanD",
                "website": "https://research.iclr.co.uk"
            },
            "category": ["scientific", "models", "research"]
        },
        {
            "id": "NGym",
            "title": "NeuralGym",
            "slogan": "A little Windows GUI for training models with spaCy",
            "description": "NeuralGym is a Python application for Windows with a graphical user interface to train models with spaCy. Run the application, select an output folder, a training data file in spaCy's data format, a spaCy model or blank model and press 'Start'.",
            "github": "d5555/NeuralGym",
            "url": "https://github.com/d5555/NeuralGym",
            "image": "https://github.com/d5555/NeuralGym/raw/master/NGym.png",
            "thumb": "https://github.com/d5555/NeuralGym/raw/master/NGym/web.png",
            "author": "d5555",
            "category": ["training"],
            "tags": ["windows"]
        },
        {
            "id": "holmes",
            "title": "Holmes",
            "slogan": "Information extraction from English and German texts based on predicate logic",
            "github": "explosion/holmes-extractor",
            "url": "https://github.com/explosion/holmes-extractor",
            "description": "Holmes is a Python 3 library that supports a number of use cases involving information extraction from English and German texts, including chatbot, structural extraction, topic matching and supervised document classification. There is a [website demonstrating intelligent search based on topic matching](https://holmes-demo.explosion.services).",
            "pip": "holmes-extractor",
            "category": ["pipeline", "standalone"],
            "tags": ["chatbots", "text-processing"],
            "thumb": "https://raw.githubusercontent.com/explosion/holmes-extractor/master/docs/holmes_thumbnail.png",
+								            "code_example": [
 								                "import holmes_extractor as holmes",
-												Update to Holmes Universe entry (#4679)

* Updated Universe entry for Holmes

* Correction

* Updated model name

* Updated wording

											
										
										
											2019-11-21 18:23:24 +03:00
+								                "holmes_manager = holmes.Manager(model='en_core_web_lg')",
-												Request to include Holmes in spaCy Universe (#3685)

* Request to add Holmes to spaCy Universe

Dear spaCy team, I would be grateful if you would consider my Python library Holmes for inclusion in the spaCy Universe. Holmes transforms the syntactic structures delivered by spaCy into semantic structures that, together with various other techniques including ontological matching and word embeddings, serve as the basis for information extraction. Holmes supports several use cases including chatbot, structured search, topic matching and supervised document classification. I had the basic idea for Holmes around 15 years ago and now spaCy has made it possible to build an implementation that is stable and fast enough to actually be of use - thank you! At present Holmes supports English and German (I am based in Munich) but could easily be extended to support any other language with a spaCy model.

* Added

											
										
										
											2019-05-08 03:42:03 +03:00
+								                "holmes_manager.register_search_phrase('A big dog chases a cat')",
 								                "holmes_manager.start_chatbot_mode_console()"
 								            ],
 								            "author": "Richard Paul Hudson",
 								            "author_links": {
 								                "github": "richardpaulhudson"
 								            }
        },
        {
            "id": "coreferee",
            "title": "Coreferee",
            "slogan": "Coreference resolution for multiple languages",
            "github": "explosion/coreferee",
            "url": "https://github.com/explosion/coreferee",
            "description": "Coreferee is a pipeline plugin that performs coreference resolution for English, French, German and Polish. It is designed so that it is easy to add support for new languages and optimised for limited training data. It uses a mixture of neural networks and programmed rules. Please note you will need to [install models](https://github.com/explosion/coreferee#getting-started) before running the code example.",
            "pip": "coreferee",
            "category": ["pipeline", "models", "standalone"],
            "tags": ["coreference-resolution", "anaphora"],
            "code_example": [
                "import coreferee, spacy",
                "nlp = spacy.load('en_core_web_trf')",
                "nlp.add_pipe('coreferee')",
                "doc = nlp('Although he was very busy with his work, Peter had had enough of it. He and his wife decided they needed a holiday. They travelled to Spain because they loved the country very much.')",
                "doc._.coref_chains.print()",
                "# Output:",
                "#",
                "# 0: he(1), his(6), Peter(9), He(16), his(18)",
                "# 1: work(7), it(14)",
                "# 2: [He(16); wife(19)], they(21), They(26), they(31)",
                "# 3: Spain(29), country(34)",
                "#",
                "print(doc._.coref_chains.resolve(doc[31]))",
                "# Output:",
                "#",
                "# [Peter, wife]"
            ],
            "author": "Richard Paul Hudson",
            "author_links": {
                "github": "richardpaulhudson"
            }
        },
        {
            "id": "spacy-transformers",
            "title": "spacy-transformers",
            "slogan": "spaCy pipelines for pretrained BERT, XLNet and GPT-2",
            "description": "This package provides spaCy model pipelines that wrap [Hugging Face's `transformers`](https://github.com/huggingface/transformers) package, so you can use them in spaCy. The result is convenient access to state-of-the-art transformer architectures, such as BERT, GPT-2, XLNet, etc.",
            "github": "explosion/spacy-transformers",
            "url": "https://explosion.ai/blog/spacy-transformers",
            "pip": "spacy-transformers",
            "category": ["pipeline", "models", "research"],
            "code_example": [
                "import spacy",
                "",
                "nlp = spacy.load(\"en_core_web_trf\")",
                "doc = nlp(\"Apple shares rose on the news. Apple pie is delicious.\")"
            ],
            "author": "Explosion",
            "author_links": {
                "twitter": "explosion_ai",
                "github": "explosion",
                "website": "https://explosion.ai"
            }
        },
        {
            "id": "spacy-huggingface-hub",
            "title": "spacy-huggingface-hub",
            "slogan": "Push your spaCy pipelines to the Hugging Face Hub",
            "description": "This package provides a CLI command for uploading any trained spaCy pipeline packaged with [`spacy package`](https://spacy.io/api/cli#package) to the [Hugging Face Hub](https://huggingface.co). It auto-generates all meta information for you, uploads a pretty README (requires spaCy v3.1+) and handles version control under the hood.",
            "github": "explosion/spacy-huggingface-hub",
            "thumb": "https://i.imgur.com/j6FO9O6.jpg",
            "url": "https://github.com/explosion/spacy-huggingface-hub",
            "pip": "spacy-huggingface-hub",
            "category": ["pipeline", "models"],
            "author": "Explosion",
            "author_links": {
                "twitter": "explosion_ai",
                "github": "explosion",
                "website": "https://explosion.ai"
            }
        },
        {
            "id": "spacy-clausie",
            "title": "spacy-clausie",
            "slogan": "Implementation of the ClausIE information extraction system for Python+spaCy",
            "github": "mmxgn/spacy-clausie",
            "url": "https://github.com/mmxgn/spacy-clausie",
            "description": "ClausIE is a novel, clause-based approach to open information extraction that extracts relations and their arguments from natural language text.",
            "category": ["pipeline", "scientific", "research"],
            "code_example": [
                "import spacy",
                "import claucy",
                "",
                "nlp = spacy.load(\"en_core_web_sm\")",
                "claucy.add_to_pipe(nlp)",
                "",
                "doc = nlp(\"AE died in Princeton in 1955.\")",
                "",
                "print(doc._.clauses)",
                "# Output:",
                "# <SV, AE, died, None, None, None, [in Princeton, in 1955]>",
                "",
                "propositions = doc._.clauses[0].to_propositions(as_text=True)",
                "",
                "print(propositions)",
                "# Output:",
                "# [AE died in Princeton in 1955, AE died in 1955, AE died in Princeton"
            ],
            "author": "Emmanouil Theofanis Chourdakis",
            "author_links": {
                "github": "mmxgn"
            }
        },
        {
            "id": "ipymarkup",
            "slogan": "NER, syntax markup visualizations",
            "description": "Collection of NLP visualizations for NER and syntax tree markup. Similar to [displaCy](https://explosion.ai/demos/displacy) and [displaCy ENT](https://explosion.ai/demos/displacy-ent).",
            "github": "natasha/ipymarkup",
            "image": "https://github.com/natasha/ipymarkup/blob/master/table.png?raw=true",
            "pip": "ipymarkup",
            "code_example": [
                "from ipymarkup import show_span_ascii_markup, show_dep_ascii_markup",
                "",
                "text = 'В мероприятии примут участие не только российские учёные, но и зарубежные исследователи, в том числе, Крис Хелмбрехт - управляющий директор и совладелец креативного агентства Kollektiv (Германия, США), Ннека Угбома - руководитель проекта Mushroom works (Великобритания), Гергей Ковач - политик и лидер субкультурной партии «Dog with two tails» (Венгрия), Георг Жено - немецкий режиссёр, один из создателей экспериментального театра «Театр.doc», Театра им. Йозефа Бойса (Германия).'",
                "spans = [(102, 116, 'PER'), (186, 194, 'LOC'), (196, 199, 'LOC'), (202, 214, 'PER'), (254, 268, 'LOC'), (271, 283, 'PER'), (324, 342, 'ORG'), (345, 352, 'LOC'), (355, 365, 'PER'), (445, 455, 'ORG'), (456, 468, 'PER'), (470, 478, 'LOC')]",
                "show_span_ascii_markup(text, spans)"
            ],
            "author": "Alexander Kukushkin",
            "author_links": {
                "github": "kuk"
            },
            "category": ["visualizers"]
        },
        {
            "id": "negspacy",
            "title": "negspaCy",
            "slogan": "spaCy pipeline object for negating concepts in text based on the NegEx algorithm.",
            "github": "jenojp/negspacy",
            "url": "https://github.com/jenojp/negspacy",
            "description": "negspacy is a spaCy pipeline component that evaluates whether named entities are negated in text. It adds a `._.negex` extension attribute to `Span` objects.",
            "pip": "negspacy",
            "category": ["pipeline", "scientific"],
            "tags": ["negation", "text-processing"],
            "thumb": "https://github.com/jenojp/negspacy/blob/master/docs/thumb.png?raw=true",
            "image": "https://github.com/jenojp/negspacy/blob/master/docs/icon.png?raw=true",
            "code_example": [
                "import spacy",
                "from negspacy.negation import Negex",
                "",
                "nlp = spacy.load(\"en_core_web_sm\")",
                "nlp.add_pipe(\"negex\", config={\"ent_types\":[\"PERSON\",\"ORG\"]})",
                "",
                "doc = nlp(\"She does not like Steve Jobs but likes Apple products.\")",
                "for e in doc.ents:",
                "    print(e.text, e._.negex)"
            ],
            "author": "Jeno Pizarro",
            "author_links": {
                "github": "jenojp",
                "twitter": "jenojp"
            }
        },
 								        {
 								            "id": "ronec",
 								            "title": "RONEC - Romanian Named Entity Corpus",
 								            "slogan": "Named Entity Recognition corpus for the Romanian language.",
 								            "github": "dumitrescustefan/ronec",
 								            "url": "https://github.com/dumitrescustefan/ronec",
 								            "description": "The corpus holds 5127 sentences annotated with 16 classes, with a total of 26376 annotated entities. The corpus comes in two formats: BRAT and CONLLUP (CoNLL-U Plus).",
 								            "category": ["standalone", "models"],
 								            "tags": ["ner", "romanian"],
 								            "thumb": "https://raw.githubusercontent.com/dumitrescustefan/ronec/master/res/thumb.png",
 								            "code_example": [
 								                "# to train a new model on ronec",
 								                "python3 convert_spacy.py ronec/conllup/ronec.conllup output",
 								                "python3 -m spacy train ro models output/train_ronec.json output/train_ronec.json -p ent",
 								                "",
 								                "# download the Romanian NER model",
 								                "python -m spacy download ro_ner",
 								                "",
 								                "# load the model and print entities for a simple sentence",
 								                "import spacy",
 								                "",
 								                "nlp = spacy.load(\"ro_ner\")",
 								                "doc = nlp(\"Popescu Ion a fost la Cluj\")",
 								                "",
 								                "for ent in doc.ents:",
 								                "    print(ent.text, ent.start_char, ent.end_char, ent.label_)"
 								            ],
 								            "author": "Stefan Daniel Dumitrescu, Andrei-Marius Avram"
 								        },
 								        {
 								            "id": "Healthsea",
 								            "title": "Healthsea",
 								            "slogan": "Healthsea: an end-to-end spaCy pipeline for exploring health supplement effects",
 								            "description": "This spaCy project trains an NER model and a custom Text Classification model with Clause Segmentation and Blinding capabilities to analyze supplement reviews and their potential effects on health.",
 								            "github": "explosion/healthsea",
 								            "thumb": "https://github.com/explosion/healthsea/blob/main/img/Jellyfish.png",
 								            "category": ["pipeline", "research"],
 								            "code_example": [
 								                "import spacy",
 								                "",
 								                "nlp = spacy.load(\"en_healthsea\")",
 								                "doc = nlp(\"This is great for joint pain.\")",
 								                "",
 								                "# Clause Segmentation & Blinding",
 								                "print(doc._.clauses)",
 								                "",
 								                ">    {",
 								                ">    \"split_indices\": [0, 7],",
 								                ">    \"has_ent\": true,",
 								                ">    \"ent_indices\": [4, 6],",
 								                ">    \"blinder\": \"_CONDITION_\",",
 								                ">    \"ent_name\": \"joint pain\",",
 								                ">    \"cats\": {",
 								                ">        \"POSITIVE\": 0.9824668169021606,",
 								                ">        \"NEUTRAL\": 0.017364952713251114,",
 								                ">        \"NEGATIVE\": 0.00002889777533710003,",
 								                ">        \"ANAMNESIS\": 0.0001394189748680219",
 								                ">    },",
 								                ">    \"prediction_text\": [\"This\", \"is\", \"great\", \"for\", \"_CONDITION_\", \"!\"]",
 								                ">    }",
 								                "",
 								                "# Aggregated results",
 								                ">    {",
 								                ">    \"joint_pain\": {",
 								                ">        \"effects\": [\"POSITIVE\"],",
 								                ">        \"effect\": \"POSITIVE\",",
 								                ">        \"label\": \"CONDITION\",",
 								                ">        \"text\": \"joint pain\"",
 								                ">       }",
 								                ">    }"
 								            ],
 								            "author": "Edward Schmuhl",
 								            "author_links": {
 								                "github": "thomashacker",
 								                "twitter": "aestheticedwar1",
 								                "website": "https://explosion.ai/"
 								            }
 								        },
 								        {
 								            "id": "presidio",
 								            "title": "Presidio",
 								            "slogan": "Context aware, pluggable and customizable data protection and PII data anonymization",
 								            "description": "Presidio *(Origin from Latin praesidium ‘protection, garrison’)* helps to ensure sensitive text is properly managed and governed. It provides fast ***analytics*** and ***anonymization*** for sensitive text such as credit card numbers, names, locations, social security numbers, bitcoin wallets, US phone numbers and financial data. Presidio analyzes the text using predefined or custom recognizers to identify entities, patterns, formats, and checksums with relevant context.",
 								            "url": "https://aka.ms/presidio",
 								            "image": "https://raw.githubusercontent.com/microsoft/presidio/master/docs/assets/before-after.png",
 								            "github": "microsoft/presidio",
 								            "category": ["standalone"],
 								            "thumb": "https://avatars0.githubusercontent.com/u/6154722",
 								            "author": "Microsoft",
 								            "author_links": {
 								                "github": "microsoft"
 								            }
 								        },
 								        {
 								            "id": "presidio-research",
 								            "title": "Presidio Research",
 								            "slogan": "Toolbox for developing and evaluating PII detectors, NER models for PII and generating fake PII data",
 								            "description": "This package features data-science related tasks for developing new recognizers for Microsoft Presidio. It is used for the evaluation of the entire system, as well as for evaluating specific PII recognizers or PII detection models. Anyone interested in evaluating an existing Microsoft Presidio instance, a specific PII recognizer or to develop new models or logic for detecting PII could leverage the preexisting work in this package. Additionally, anyone interested in generating new data based on previous datasets (e.g. to increase the coverage of entity values) for Named Entity Recognition models could leverage the data generator contained in this package.",
 								            "url": "https://aka.ms/presidio-research",
 								            "github": "microsoft/presidio-research",
 								            "category": ["standalone"],
 								            "thumb": "https://avatars0.githubusercontent.com/u/6154722",
 								            "author": "Microsoft",
 								            "author_links": {
 								                "github": "microsoft"
 								            }
 								        },
 								        {
 								            "id": "python-sentence-boundary-disambiguation",
 								            "title": "pySBD - Python Sentence Boundary Disambiguation",
 								            "slogan": "Rule-based sentence boundary detection that works out-of-the-box",
 								            "github": "nipunsadvilkar/pySBD",
 								            "description": "pySBD is a 'real-world' sentence segmenter that extracts reasonable sentences when the format and domain of the input text are unknown. It is a rule-based algorithm built on [The Golden Rules](https://s3.amazonaws.com/tm-town-nlp-resources/golden_rules.txt), a set of tests developed by the [TM-Town](https://www.tm-town.com/) dev team to check the accuracy of a segmenter on edge-case scenarios. pySBD is a Python port of the Ruby gem [Pragmatic Segmenter](https://github.com/diasks2/pragmatic_segmenter).",
 								            "pip": "pysbd",
 								            "category": ["scientific"],
 								            "tags": ["sentence segmentation"],
 								            "code_example": [
 								                "import spacy",
 								                "from pysbd.utils import PySBDFactory",
 								                "",
 								                "nlp = spacy.blank('en')",
 								                "# Caution: this example only works with spaCy v2.x or earlier",
 								                "nlp.add_pipe(PySBDFactory(nlp))",
 								                "",
 								                "doc = nlp('My name is Jonas E. Smith. Please turn to p. 55.')",
 								                "print(list(doc.sents))",
 								                "# [My name is Jonas E. Smith., Please turn to p. 55.]"
 								            ],
 								            "author": "Nipun Sadvilkar",
 								            "author_links": {
 								                "twitter": "nipunsadvilkar",
 								                "github": "nipunsadvilkar",
 								                "website": "https://nipunsadvilkar.github.io"
 								            }
 								        },
 								        {
 								            "id": "cookiecutter-spacy-fastapi",
 								            "title": "cookiecutter-spacy-fastapi",
 								            "slogan": "Docker-based cookiecutter for easy spaCy APIs using FastAPI",
 								            "description": "Docker-based cookiecutter for easy spaCy APIs using FastAPI. The default endpoints expect batch requests with a list of Records in the Azure Search Cognitive Skill format, so out of the box this cookiecutter can be set up as a Custom Cognitive Skill. For more on Azure Search and Cognitive Skills, [see this page](https://docs.microsoft.com/en-us/azure/search/cognitive-search-custom-skill-interface).",
 								            "url": "https://github.com/microsoft/cookiecutter-spacy-fastapi",
 								            "image": "https://raw.githubusercontent.com/microsoft/cookiecutter-spacy-fastapi/master/images/cookiecutter-docs.png",
 								            "github": "microsoft/cookiecutter-spacy-fastapi",
 								            "category": ["apis"],
 								            "thumb": "https://avatars0.githubusercontent.com/u/6154722",
 								            "author": "Microsoft",
 								            "author_links": {
 								                "github": "microsoft"
 								            }
 								        },
 								        {
 								            "id": "dframcy",
 								            "title": "Dframcy",
 								            "slogan": "Dataframe Integration with spaCy NLP",
 								            "github": "yash1994/dframcy",
 								            "description": "DframCy is a lightweight utility module for integrating Pandas DataFrames with spaCy's linguistic annotation and training tasks.",
 								            "pip": "dframcy",
 								            "category": ["pipeline", "training"],
 								            "tags": ["pandas"],
 								            "code_example": [
 								                "import spacy",
 								                "from dframcy import DframCy",
 								                "",
 								                "nlp = spacy.load('en_core_web_sm')",
 								                "dframcy = DframCy(nlp)",
 								                "doc = dframcy.nlp('Apple is looking at buying U.K. startup for $1 billion')",
 								                "annotation_dataframe = dframcy.to_dataframe(doc)"
 								            ],
 								            "author": "Yash Patadia",
 								            "author_links": {
 								                "twitter": "PatadiaYash",
 								                "github": "yash1994"
 								            }
 								        },
 								        {
 								            "id": "spacy-pytextrank",
 								            "title": "PyTextRank",
 								            "slogan": "Python implementation of TextRank for lightweight phrase extraction",
 								            "description": "An implementation of TextRank in Python for use in spaCy pipelines, which provides fast, effective phrase extraction from texts, along with extractive summarization. The graph algorithm works independently of any specific natural language and does not require domain knowledge. See (Mihalcea 2004) https://web.eecs.umich.edu/~mihalcea/papers/mihalcea.emnlp04.pdf",
 								            "github": "DerwenAI/pytextrank",
 								            "pip": "pytextrank",
 								            "code_example": [
 								                "import spacy",
 								                "import pytextrank",
 								                "",
 								                "# example text",
 								                "text = \"\"\"Compatibility of systems of linear constraints over the set of natural numbers.",
 								                "Criteria of compatibility of a system of linear Diophantine equations, strict inequations,",
 								                "and nonstrict inequations are considered. Upper bounds for components of a minimal set of",
 								                "solutions and algorithms of construction of minimal generating sets of solutions for all types",
 								                "of systems are given. These criteria and the corresponding algorithms for constructing a minimal",
 								                "supporting set of solutions can be used in solving all the considered types systems and systems of mixed types.\"\"\"",
 								                "",
 								                "# load a spaCy model, depending on language, scale, etc.",
 								                "nlp = spacy.load(\"en_core_web_sm\")",
 								                "# add PyTextRank to the spaCy pipeline",
 								                "nlp.add_pipe(\"textrank\")",
 								                "",
 								                "doc = nlp(text)",
 								                "# examine the top-ranked phrases in the document",
 								                "for phrase in doc._.phrases:",
 								                "    print(phrase.text)",
 								                "    print(phrase.rank, phrase.count)",
 								                "    print(phrase.chunks)"
 								            ],
 								            "code_language": "python",
 								            "url": "https://github.com/DerwenAI/pytextrank/wiki",
 								            "thumb": "https://memegenerator.net/img/instances/66942896.jpg",
 								            "image": "https://memegenerator.net/img/instances/66942896.jpg",
 								            "author": "Paco Nathan",
 								            "author_links": {
 								                "twitter": "pacoid",
 								                "github": "ceteri",
 								                "website": "https://derwen.ai/paco"
 								            },
 								            "category": ["pipeline"],
 								            "tags": ["phrase extraction", "ner", "summarization", "graph algorithms", "textrank"]
 								        },
 								        {
 								            "id": "spacy_syllables",
 								            "title": "Spacy Syllables",
 								            "slogan": "Multilingual syllable annotations",
 								            "description": "Spacy Syllables is a pipeline component that adds multilingual syllable annotations to Tokens. It uses Pyphen under the hood and has support for a long list of languages.",
 								            "github": "sloev/spacy-syllables",
 								            "pip": "spacy_syllables",
 								            "code_example": [
 								                "import spacy",
 								                "from spacy_syllables import SpacySyllables",
 								                "",
 								                "nlp = spacy.load(\"en_core_web_sm\")",
 								                "nlp.add_pipe(\"syllables\", after=\"tagger\")",
 								                "",
 								                "assert nlp.pipe_names == [\"tok2vec\", \"tagger\", \"syllables\", \"parser\",  \"attribute_ruler\", \"lemmatizer\", \"ner\"]",
 								                "doc = nlp(\"terribly long\")",
 								                "data = [(token.text, token._.syllables, token._.syllables_count) for token in doc]",
 								                "assert data == [(\"terribly\", [\"ter\", \"ri\", \"bly\"], 3), (\"long\", [\"long\"], 1)]"
 								            ],
 								            "thumb": "https://raw.githubusercontent.com/sloev/spacy-syllables/master/logo.png",
 								            "author": "Johannes Valbjørn",
 								            "author_links": {
 								                "github": "sloev"
 								            },
 								            "category": ["pipeline"],
 								            "tags": ["syllables", "multilingual"]
 								        },
 								        {
 								            "id": "sentimental-onix",
 								            "title": "Sentimental Onix",
 								            "slogan": "Use ONNX for sentiment models",
 								            "description": "spaCy pipeline component for sentiment analysis using ONNX",
 								            "github": "sloev/sentimental-onix",
 								            "pip": "sentimental-onix",
 								            "code_example": [
 								                "# Download model:",
 								                "#   python -m sentimental_onix download en",
 								                "import spacy",
 								                "from sentimental_onix import pipeline",
 								                "",
 								                "nlp = spacy.load(\"en_core_web_sm\")",
 								                "nlp.add_pipe(\"sentencizer\")",
 								                "nlp.add_pipe(\"sentimental_onix\", after=\"sentencizer\")",
 								                "",
 								                "sentences = [",
 								                "    (sent.text, sent._.sentiment)",
 								                "    for doc in nlp.pipe(",
 								                "        [",
 								                "            \"i hate pasta on tuesdays\",",
 								                "            \"i like movies on wednesdays\",",
 								                "            \"i find your argument ridiculous\",",
 								                "            \"soda with straws are my favorite\",",
 								                "        ]",
 								                "    )",
 								                "    for sent in doc.sents",
 								                "]",
 								                "",
 								                "assert sentences == [",
 								                "    (\"i hate pasta on tuesdays\", \"Negative\"),",
 								                "    (\"i like movies on wednesdays\", \"Positive\"),",
 								                "    (\"i find your argument ridiculous\", \"Negative\"),",
 								                "    (\"soda with straws are my favorite\", \"Positive\"),",
 								                "]"
 								            ],
 								            "thumb": "https://raw.githubusercontent.com/sloev/sentimental-onix/master/.github/onix.webp",
 								            "author": "Johannes Valbjørn",
 								            "author_links": {
 								                "github": "sloev"
 								            },
 								            "category": ["pipeline"],
 								            "tags": ["sentiment", "english"]
 								        },
 								        {
 								            "id": "gobbli",
 								            "title": "gobbli",
 								            "slogan": "Deep learning for text classification doesn't have to be scary",
 								            "description": "gobbli is a Python library that wraps several modern deep learning models in a uniform interface, making it easy to evaluate feasibility and conduct analyses. It uses Docker's abstraction capabilities to hide nearly all dependency management and functional differences between models from the user. It also contains an interactive app for exploring text data and evaluating classification models. spaCy's base text classification models, as well as models integrated from `spacy-transformers`, are available in the collection of classification models. In addition, spaCy is used for data augmentation and document embeddings.",
 								            "url": "https://github.com/rtiinternational/gobbli",
 								            "github": "rtiinternational/gobbli",
 								            "pip": "gobbli",
 								            "thumb": "https://i.postimg.cc/NGpzhrdr/gobbli-lg.png",
 								            "code_example": [
 								                "from gobbli.io import PredictInput, TrainInput",
 								                "from gobbli.model.bert import BERT",
 								                "",
 								                "train_input = TrainInput(",
 								                "    X_train=['This is a training document.', 'This is another training document.'],",
 								                "    y_train=['0', '1'],",
 								                "    X_valid=['This is a validation sentence.', 'This is another validation sentence.'],",
 								                "    y_valid=['1', '0'],",
 								                ")",
 								                "",
 								                "clf = BERT()",
 								                "",
 								                "# Set up classifier resources -- Docker image, etc.",
 								                "clf.build()",
 								                "",
 								                "# Train model",
 								                "train_output = clf.train(train_input)",
 								                "",
 								                "predict_input = PredictInput(",
 								                "    X=['Which class is this document?'],",
 								                "    labels=train_output.labels,",
 								                "    checkpoint=train_output.checkpoint,",
 								                ")",
 								                "",
 								                "predict_output = clf.predict(predict_input)"
 								            ],
 								            "category": ["standalone"]
 								        },
 								        {
 								            "id": "spacy_fastlang",
 								            "title": "Spacy FastLang",
 								            "slogan": "Language detection done fast",
 								            "description": "Fast language detection using fastText and spaCy.",
 								            "github": "thomasthiebaud/spacy-fastlang",
 								            "pip": "spacy_fastlang",
 								            "code_example": [
 								                "import spacy",
 								                "import spacy_fastlang",
 								                "",
 								                "nlp = spacy.load(\"en_core_web_sm\")",
 								                "nlp.add_pipe(\"language_detector\")",
 								                "doc = nlp('Life is like a box of chocolates. You never know what you are gonna get.')",
 								                "",
 								                "assert doc._.language == 'en'",
 								                "assert doc._.language_score >= 0.8"
 								            ],
 								            "author": "Thomas Thiebaud",
 								            "author_links": {
 								                "github": "thomasthiebaud"
 								            },
 								            "category": ["pipeline"]
 								        },
 								        {
 								            "id": "mlflow",
 								            "title": "MLflow",
 								            "slogan": "An open source platform for the machine learning lifecycle",
 								            "description": "MLflow is an open source platform to manage the ML lifecycle, including experimentation, reproducibility, deployment, and a central model registry. MLflow currently offers four components: Tracking, Projects, Models and Registry.",
 								            "github": "mlflow/mlflow",
 								            "pip": "mlflow",
 								            "thumb": "https://www.mlflow.org/docs/latest/_static/MLflow-logo-final-black.png",
 								            "image": "",
 								            "url": "https://mlflow.org/",
 								            "author": "Databricks",
 								            "author_links": {
 								                "github": "databricks",
 								                "twitter": "databricks",
 								                "website": "https://databricks.com/"
 								            },
 								            "category": ["standalone", "apis"],
 								            "code_example": [
 								                "import mlflow",
 								                "import mlflow.spacy",
 								                "import spacy",
 								                "",
 								                "# MLflow Tracking",
 								                "nlp = spacy.load('my_best_model_path/output/model-best')",
 								                "with mlflow.start_run(run_name='Spacy'):",
 								                "    mlflow.set_tag('model_flavor', 'spacy')",
 								                "    mlflow.spacy.log_model(spacy_model=nlp, artifact_path='model')",
 								                "    mlflow.log_metric('accuracy', 0.72)",
 								                "    my_run_id = mlflow.active_run().info.run_id",
 								                "",
 								                "",
 								                "# MLflow Models",
 								                "model_uri = f'runs:/{my_run_id}/model'",
 								                "nlp2 = mlflow.spacy.load_model(model_uri=model_uri)"
 								            ]
 								        },
 								        {
 								            "id": "pyate",
 								            "title": "PyATE",
 								            "slogan": "Python Automated Term Extraction",
 								            "description": "PyATE is a term extraction library written in Python that uses spaCy POS tagging with the Basic, Combo Basic, C-Value, TermExtractor, and Weirdness methods.",
 								            "github": "kevinlu1248/pyate",
 								            "pip": "pyate",
 								            "code_example": [
 								                "import spacy",
 								                "import pyate",
 								                "",
 								                "nlp = spacy.load('en_core_web_sm')",
 								                "nlp.add_pipe(\"combo_basic\") # or any of `basic`, `weirdness`, `term_extractor` or `cvalue`",
 								                "# source: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1994795/",
 								                "string = 'Central to the development of cancer are genetic changes that endow these “cancer cells” with many of the hallmarks of cancer, such as self-sufficient growth and resistance to anti-growth and pro-death signals. However, while the genetic changes that occur within cancer cells themselves, such as activated oncogenes or dysfunctional tumor suppressors, are responsible for many aspects of cancer development, they are not sufficient. Tumor promotion and progression are dependent on ancillary processes provided by cells of the tumor environment but that are not necessarily cancerous themselves. Inflammation has long been associated with the development of cancer. This review will discuss the reflexive relationship between cancer and inflammation with particular focus on how considering the role of inflammation in physiologic processes such as the maintenance of tissue homeostasis and repair may provide a logical framework for understanding the connection between the inflammatory response and cancer.'",
 								                "",
 								                "doc = nlp(string)",
 								                "print(doc._.combo_basic.sort_values(ascending=False).head(5))",
 								                "\"\"\"",
 								                "dysfunctional tumor                1.443147",
 								                "tumor suppressors                  1.443147",
 								                "genetic changes                    1.386294",
 								                "cancer cells                       1.386294",
 								                "dysfunctional tumor suppressors    1.298612",
 								                "\"\"\""
 								            ],
 								            "code_language": "python",
 								            "url": "https://github.com/kevinlu1248/pyate",
 								            "author": "Kevin Lu",
 								            "author_links": {
 								                "twitter": "kevinlu1248",
 								                "github": "kevinlu1248",
 								                "website": "https://github.com/kevinlu1248/pyate"
 								            },
 								            "category": ["pipeline", "research"],
 								            "tags": ["term_extraction"]
 								        },
 								        {
 								            "id": "contextualSpellCheck",
 								            "title": "Contextual Spell Check",
 								            "slogan": "Contextual spell correction using BERT (bidirectional representations)",
 								            "description": "This package currently focuses on out-of-vocabulary (OOV) word or non-word error (NWE) correction using a BERT model. BERT is used so that the surrounding context is taken into account when correcting NWEs.",
 								            "github": "R1j1t/contextualSpellCheck",
 								            "pip": "contextualSpellCheck",
 								            "code_example": [
 								                "import spacy",
 								                "import contextualSpellCheck",
 								                "",
 								                "nlp = spacy.load('en_core_web_sm')",
 								                "contextualSpellCheck.add_to_pipe(nlp)",
 								                "doc = nlp('Income was $9.4 milion compared to the prior year of $2.7 milion.')",
 								                "",
 								                "print(doc._.performed_spellCheck) # Should be True",
 								                "print(doc._.outcome_spellCheck) # Income was $9.4 million compared to the prior year of $2.7 million."
 								            ],
 								            "code_language": "python",
 								            "url": "https://github.com/R1j1t/contextualSpellCheck",
 								            "thumb": "https://user-images.githubusercontent.com/22280243/82760949-98e68480-9e14-11ea-952e-4738620fd9e3.png",
 								            "image": "https://user-images.githubusercontent.com/22280243/82138959-2852cd00-9842-11ea-918a-49b2a7873ef6.png",
 								            "author": "Rajat Goel",
 								            "author_links": {
 								                "github": "r1j1t",
 								                "website": "https://github.com/R1j1t"
 								            },
 								            "category": ["pipeline", "conversational", "research"],
 								            "tags": ["spell check", "correction", "preprocessing", "translation"]
 								        },
 								        {
 								            "id": "texthero",
 								            "title": "Texthero",
 								            "slogan": "Text preprocessing, representation and visualization from zero to hero.",
 								            "description": "Texthero is a Python package for working with text data efficiently. It helps NLP developers quickly understand any text-based dataset and provides a solid pipeline to clean and represent text data, from zero to hero.",
 								            "github": "jbesomi/texthero",
 								            "pip": "texthero",
 								            "code_example": [
 								                "import texthero as hero",
 								                "import pandas as pd",
 								                "",
 								                "df = pd.read_csv('https://github.com/jbesomi/texthero/raw/master/dataset/bbcsport.csv')",
 								                "df['named_entities'] = hero.named_entities(df['text'])",
 								                "df.head()"
 								            ],
 								            "code_language": "python",
 								            "url": "https://texthero.org",
 								            "thumb": "https://texthero.org/img/T.png",
 								            "image": "https://texthero.org/docs/assets/texthero.png",
 								            "author": "Jonathan Besomi",
 								            "author_links": {
 								                "github": "jbesomi",
 								                "website": "https://besomi.ai"
 								            },
 								            "category": ["standalone"]
 								        },
 								        {
 								            "id": "cov-bsv",
 								            "title": "VA COVID-19 NLP BSV",
 								            "slogan": "spaCy pipeline for COVID-19 surveillance.",
 								            "github": "abchapman93/VA_COVID-19_NLP_BSV",
 								            "description": "A spaCy rule-based pipeline for identifying positive cases of COVID-19 from clinical text. A version of this system was deployed as part of the US Department of Veterans Affairs biosurveillance response to COVID-19.",
 								            "pip": "cov-bsv",
 								            "code_example": [
 								                "import cov_bsv",
 								                "",
 								                "nlp = cov_bsv.load()",
 								                "doc = nlp('Pt tested for COVID-19. His wife was recently diagnosed with novel coronavirus. SARS-COV-2: Detected')",
 								                "",
 								                "print(doc.ents)",
 								                "print(doc._.cov_classification)",
 								                "cov_bsv.visualize_doc(doc)"
 								            ],
 								            "category": ["pipeline", "standalone", "biomedical", "scientific"],
 								            "tags": ["clinical", "epidemiology", "covid-19", "surveillance"],
 								            "author": "Alec Chapman",
 								            "author_links": {
 								                "github": "abchapman93"
 								            }
 								        },
 								        {
 								            "id": "medspacy",
 								            "title": "medspaCy",
 								            "thumb": "https://raw.githubusercontent.com/medspacy/medspacy/master/images/medspacy_logo.png",
 								            "slogan": "A toolkit for clinical NLP with spaCy.",
 								            "github": "medspacy/medspacy",
 								            "description": "A toolkit for clinical NLP with spaCy. Features include sentence splitting, section detection, and asserting negation, family history, and uncertainty.",
 								            "pip": "medspacy",
 								            "code_example": [
 								                "import medspacy",
 								                "from medspacy.ner import TargetRule",
 								                "",
 								                "nlp = medspacy.load()",
 								                "print(nlp.pipe_names)",
 								                "",
 								                "nlp.get_pipe('target_matcher').add([TargetRule('stroke', 'CONDITION'), TargetRule('diabetes', 'CONDITION'), TargetRule('pna', 'CONDITION')])",
 								                "doc = nlp('Patient has hx of stroke. Mother diagnosed with diabetes. No evidence of pna.')",
 								                "",
 								                "for ent in doc.ents:",
 								                "    print(ent, ent._.is_negated, ent._.is_family, ent._.is_historical)",
 								                "medspacy.visualization.visualize_ent(doc)"
 								            ],
 								            "category": ["biomedical", "scientific", "research"],
 								            "tags": ["clinical"],
 								            "author": "medspacy",
 								            "author_links": {
 								                "github": "medspacy"
 								            }
 								        },
 								        {
 								            "id": "rita-dsl",
 								            "title": "RITA DSL",
 								            "slogan": "Domain Specific Language for creating language rules",
 								            "github": "zaibacu/rita-dsl",
 								            "description": "A Domain Specific Language (DSL) for building language patterns. These can later be compiled into spaCy patterns, pure regex, or any other format.",
 								            "pip": "rita-dsl",
 								            "thumb": "https://raw.githubusercontent.com/zaibacu/rita-dsl/master/docs/assets/logo-100px.png",
 								            "code_language": "python",
 								            "code_example": [
 								                "import spacy",
 								                "from rita.shortcuts import setup_spacy",
 								                "",
 								                "rules = \"\"\"",
 								                "cuts = {\"fitted\", \"wide-cut\"}",
 								                "lengths = {\"short\", \"long\", \"calf-length\", \"knee-length\"}",
 								                "fabric_types = {\"soft\", \"airy\", \"crinkled\"}",
 								                "fabrics = {\"velour\", \"chiffon\", \"knit\", \"woven\", \"stretch\"}",
 								                "",
 								                "{IN_LIST(cuts)?, IN_LIST(lengths), WORD(\"dress\")}->MARK(\"DRESS_TYPE\")",
 								                "{IN_LIST(lengths), IN_LIST(cuts), WORD(\"dress\")}->MARK(\"DRESS_TYPE\")",
 								                "{IN_LIST(fabric_types)?, IN_LIST(fabrics)}->MARK(\"DRESS_FABRIC\")",
 								                "\"\"\"",
 								                "",
 								                "nlp = spacy.load(\"en_core_web_sm\")",
 								                "setup_spacy(nlp, rules_string=rules)",
 								                "r = nlp(\"She was wearing a short wide-cut dress\")",
 								                "print([{\"label\": e.label_, \"text\": e.text} for e in r.ents])"
 								            ],
 								            "category": ["standalone"],
 								            "tags": ["dsl", "language-patterns", "language-rules", "nlp"],
 								            "author": "Šarūnas Navickas",
 								            "author_links": {
 								                "github": "zaibacu"
 								            }
 								        },
 								        {
 								            "id": "PatternOmatic",
 								            "title": "PatternOmatic",
 								            "slogan": "Finds linguistic patterns effortlessly",
 								            "description": "Discovers spaCy linguistic patterns that match a given set of string samples, for use with spaCy's rule-based Matcher.",
 								            "github": "revuel/PatternOmatic",
 								            "pip": "PatternOmatic",
 								            "code_example": [
 								                "from PatternOmatic.api import find_patterns",
 								                "",
 								                "samples = ['I am a cat!', 'You are a dog!', 'She is an owl!']",
 								                "",
 								                "patterns_found, _ = find_patterns(samples)",
 								                "",
 								                "print(f'Patterns found: {patterns_found}')"
 								            ],
 								            "code_language": "python",
 								            "thumb": "https://svgshare.com/i/R3P.svg",
 								            "image": "https://svgshare.com/i/R3P.svg",
 								            "author": "Miguel Revuelta Espinosa",
 								            "author_links": {
 								                "github": "revuel"
 								            },
 								            "category": ["scientific", "research", "standalone"],
 								            "tags": ["Evolutionary Computation", "Grammatical Evolution"]
 								        },
 								        {
 								            "id": "SpacyDotNet",
 								            "title": "spaCy .NET Wrapper",
 								            "slogan": "SpacyDotNet is a .NET Core compatible wrapper for spaCy, based on Python.NET",
 								            "description": "This project relies on [Python.NET](http://pythonnet.github.io/) to interop with spaCy. It's not meant to be a complete and exhaustive implementation of all spaCy features and [APIs](https://spacy.io/api). Although it should be enough for basic tasks, it's intended as a starting point if you need to build a complex project using spaCy in .NET. Most of the basic features in _Spacy101_ are available. All `Container` classes are present (`Doc`, `Token`, `Span` and `Lexeme`) with their basic properties/methods working, as well as `Vocab` and `StringStore` in a limited form. Developers should be able to add any missing properties or classes in a straightforward manner.",
 								            "github": "AMArostegui/SpacyDotNet",
 								            "thumb": "https://raw.githubusercontent.com/AMArostegui/SpacyDotNet/master/cslogo.png",
 								            "code_example": [
 								                "var spacy = new Spacy();",
 								                "",
 								                "var nlp = spacy.Load(\"en_core_web_sm\");",
 								                "var doc = nlp.GetDocument(\"Apple is looking at buying U.K. startup for $1 billion\");",
 								                "",
 								                "foreach (Token token in doc.Tokens)",
 								                "    Console.WriteLine($\"{token.Text} {token.Lemma} {token.PoS} {token.Tag} {token.Dep} {token.Shape} {token.IsAlpha} {token.IsStop}\");",
 								                "",
 								                "Console.WriteLine(\"\");",
 								                "foreach (Span ent in doc.Ents)",
 								                "    Console.WriteLine($\"{ent.Text} {ent.StartChar} {ent.EndChar} {ent.Label}\");",
 								                "",
 								                "nlp = spacy.Load(\"en_core_web_md\");",
 								                "var tokens = nlp.GetDocument(\"dog cat banana afskfsd\");",
 								                "",
 								                "Console.WriteLine(\"\");",
 								                "foreach (Token token in tokens.Tokens)",
 								                "    Console.WriteLine($\"{token.Text} {token.HasVector} {token.VectorNorm}, {token.IsOov}\");",
 								                "",
 								                "tokens = nlp.GetDocument(\"dog cat banana\");",
 								                "Console.WriteLine(\"\");",
 								                "foreach (Token token1 in tokens.Tokens)",
 								                "{",
 								                "    foreach (Token token2 in tokens.Tokens)",
 								                "        Console.WriteLine($\"{token1.Text} {token2.Text} {token1.Similarity(token2) }\");",
 								                "}",
 								                "",
 								                "doc = nlp.GetDocument(\"I love coffee\");",
 								                "Console.WriteLine(\"\");",
 								                "Console.WriteLine(doc.Vocab.Strings[\"coffee\"]);",
 								                "Console.WriteLine(doc.Vocab.Strings[3197928453018144401]);",
 								                "",
 								                "Console.WriteLine(\"\");",
 								                "foreach (Token word in doc.Tokens)",
 								                "{",
 								                "    var lexeme = doc.Vocab[word.Text];",
 								                "    Console.WriteLine($@\"{lexeme.Text} {lexeme.Orth} {lexeme.Shape} {lexeme.Prefix} {lexeme.Suffix} {lexeme.IsAlpha} {lexeme.IsDigit} {lexeme.IsTitle} {lexeme.Lang}\");",
 								                "}"
 								            ],
 								            "code_language": "csharp",
 								            "author": "Antonio Miras",
 								            "author_links": {
 								                "github": "AMArostegui"
 								            },
 								            "category": ["nonpython"]
 								        },
 								        {
 								            "id": "ruts",
 								            "title": "ruTS",
 								            "slogan": "A library for statistics extraction from texts in Russian",
 								            "description": "The library extracts the following statistics from a text: basic statistics, readability metrics, lexical diversity metrics, and morphological statistics.",
 								            "github": "SergeyShk/ruTS",
 								            "pip": "ruts",
 								            "code_example": [
 								                "import spacy",
 								                "import ruts",
 								                "",
 								                "nlp = spacy.load('ru_core_news_sm')",
 								                "nlp.add_pipe('basic', last=True)",
 								                "doc = nlp('мама мыла раму')",
 								                "doc._.basic.get_stats()"
 								            ],
 								            "code_language": "python",
 								            "thumb": "https://habrastorage.org/webt/6z/le/fz/6zlefzjavzoqw_wymz7v3pwgfp4.png",
 								            "image": "https://clipartart.com/images/free-tree-roots-clipart-black-and-white-2.png",
 								            "author": "Sergey Shkarin",
 								            "author_links": {
 								                "twitter": "shk_sergey",
 								                "github": "SergeyShk"
 								            },
 								            "category": ["pipeline", "standalone"],
 								            "tags": ["Text Analytics", "Russian"]
 								        },
 								        {
 								            "id": "trunajod",
 								            "title": "TRUNAJOD",
 								            "slogan": "A text complexity library for text analysis built on spaCy",
 								            "description": "With all the basic NLP capabilities provided by spaCy (dependency parsing, POS tagging, tokenizing), `TRUNAJOD` focuses on extracting measurements from texts that might be interesting for different applications and use cases.",
 								            "github": "dpalmasan/TRUNAJOD2.0",
 								            "pip": "trunajod",
 								            "code_example": [
 								                "import spacy",
 								                "from TRUNAJOD.entity_grid import EntityGrid",
 								                "",
 								                "nlp = spacy.load('es_core_news_sm', disable=['ner', 'textcat'])",
 								                "example_text = (",
 								                "    'El espectáculo del cielo nocturno cautiva la mirada y suscita preguntas '",
 								                "    'sobre el universo, su origen y su funcionamiento. No es sorprendente que '",
 								                "    'todas las civilizaciones y culturas hayan formado sus propias '",
 								                "    'cosmologías. Unas relatan, por ejemplo, que el universo ha '",
 								                "    'sido siempre tal como es, con ciclos que inmutablemente se repiten; '",
 								                "    'otras explican que este universo ha tenido un principio, '",
 								                "    'que ha aparecido por obra creadora de una divinidad.'",
 								                ")",
 								                "doc = nlp(example_text)",
 								                "egrid = EntityGrid(doc)",
 								                "print(egrid.get_egrid())"
 								            ],
 								            "code_language": "python",
 								            "thumb": "https://raw.githubusercontent.com/dpalmasan/TRUNAJOD2.0/master/imgs/trunajod_thumb.png",
 								            "image": "https://raw.githubusercontent.com/dpalmasan/TRUNAJOD2.0/master/imgs/trunajod_logo.png",
 								            "author": "Diego Palma",
 								            "author_links": {
 								                "github": "dpalmasan"
 								            },
 								            "category": ["research", "standalone", "scientific"],
 								            "tags": ["Text Analytics", "Coherence", "Cohesion"]
        },
        {
 								            "id": "lingfeat",
 								            "title": "LingFeat",
 								            "slogan": "A Linguistic Feature Extraction (Text Analysis) Tool for Readability Assessment and Text Simplification",
 								            "description": "LingFeat is a feature extraction library which currently extracts 255 linguistic features from English string input. Categories include syntax, semantics, discourse, and also traditional readability formulas. Published in EMNLP 2021.",
 								            "github": "brucewlee/lingfeat",
 								            "pip": "lingfeat",
 								            "code_example": [
 								                "from lingfeat import extractor",
 								                "",
 								                "",
 								                "text = 'TAEAN, South Chungcheong Province -- Just before sunup, Lee Young-ho, a seasoned fisherman with over 30 years of experience, silently waits for boats carrying blue crabs as the season for the seafood reaches its height. Soon afterward, small and big boats sail into Sinjin Port in Taean County, South Chungcheong Province, the second-largest source of blue crab after Incheon, accounting for 29 percent of total production of the country. A crane lifts 28 boxes filled with blue crabs weighing 40 kilograms each from the boat, worth about 10 million won ($8,500). “It has been a productive fall season for crabbing here. The water temperature is a very important factor affecting crab production. They hate cold water,” Lee said. The temperature of the sea off Taean appeared to have stayed at the level where crabs become active. If the sea temperature suddenly drops, crabs go into their winter dormancy mode, burrowing into the mud and sleeping through the cold months.'",
 								                "",
 								                "",
 								                "#Pass text",
 								                "LingFeat = extractor.pass_text(text)",
 								                "",
 								                "",
 								                "#Preprocess text",
 								                "LingFeat.preprocess()",
 								                "",
 								                "",
 								                "#Extract features",
 								                "#each method returns a dictionary of the corresponding features",
 								                "#Advanced Semantic (AdSem) Features",
 								                "WoKF = LingFeat.WoKF_() #Wikipedia Knowledge Features",
 								                "WBKF = LingFeat.WBKF_() #WeeBit Corpus Knowledge Features",
 								                "OSKF = LingFeat.OSKF_() #OneStopEng Corpus Knowledge Features",
 								                "",
 								                "#Discourse (Disco) Features",
 								                "EnDF = LingFeat.EnDF_() #Entity Density Features",
 								                "EnGF = LingFeat.EnGF_() #Entity Grid Features",
 								                "",
 								                "#Syntactic (Synta) Features",
 								                "PhrF = LingFeat.PhrF_() #Noun/Verb/Adj/Adv/... Phrasal Features",
 								                "TrSF = LingFeat.TrSF_() #(Parse) Tree Structural Features",
 								                "POSF = LingFeat.POSF_() #Noun/Verb/Adj/Adv/... Part-of-Speech Features",
 								                "",
 								                "#Lexico Semantic (LxSem) Features",
 								                "TTRF = LingFeat.TTRF_() #Type Token Ratio Features",
                "VarF = LingFeat.VarF_() #Noun/Verb/Adj/Adv Variation Features",
                "PsyF = LingFeat.PsyF_() #Psycholinguistic Difficulty of Words (AoA Kuperman)",
 								                "WoLF = LingFeat.WorF_() #Word Familiarity from Frequency Count (SubtlexUS)",
 								                "",
                "#Shallow Traditional (ShTra) Features",
 								                "ShaF = LingFeat.ShaF_() #Shallow Features (e.g. avg number of tokens)",
 								                "TraF = LingFeat.TraF_() #Traditional Formulas"
 								            ],
 								            "code_language": "python",
 								            "thumb": "https://raw.githubusercontent.com/brucewlee/lingfeat/master/img/lingfeat_logo2.png",
 								            "image": "https://raw.githubusercontent.com/brucewlee/lingfeat/master/img/lingfeat_logo.png",
 								            "author": "Bruce W. Lee (이웅성)",
 								            "author_links": {
 								                "github": "brucewlee",
 								                "website": "https://brucewlee.github.io/"
 								            },
 								            "category": ["research", "scientific"],
            "tags": [
 								                "Readability",
 								                "Simplification",
 								                "Feature Extraction",
 								                "Syntax",
 								                "Discourse",
 								                "Semantics",
 								                "Lexical"
 								            ]
        },
        {
 								            "id": "hmrb",
 								            "title": "Hammurabi",
 								            "slogan": "Python Rule Processing Engine 🏺",
 								            "description": "Hammurabi works as a rule engine to parse input using a defined set of rules. It uses a simple and readable syntax to define complex rules to handle phrase matching. The syntax supports nested logical statements, regular expressions, reusable or side-loaded variables and match triggered callback functions to modularize your rules. The latest version works with both spaCy 2.X and 3.X. For more information check the documentation on [ReadTheDocs](https://hmrb.readthedocs.io/en/latest/).",
 								            "github": "babylonhealth/hmrb",
 								            "pip": "hmrb",
 								            "code_example": [
                "import spacy",
                "from hmrb.core import SpacyCore",
 								                "",
                "nlp = spacy.load(\"en_core_web_sm\")",
 								                "sentences = \"I love gorillas. Peter loves gorillas. Jane loves Tarzan.\"",
 								                "",
 								                "def conj_be(subj: str) -> str:",
 								                "   if subj == \"I\":",
 								                "       return \"am\"",
 								                "   elif subj == \"you\":",
 								                "       return \"are\"",
 								                "   else:",
 								                "       return \"is\"",
 								                "",
 								                "@spacy.registry.callbacks(\"gorilla_callback\")",
 								                "def gorilla_clb(seq: list, span: slice, data: dict) -> None:",
 								                "   subj = seq[span.start].text",
 								                "   be = conj_be(subj)",
                "   print(f\"{subj} {be} a gorilla person.\")",
                "",
 								                "@spacy.registry.callbacks(\"lover_callback\")",
 								                "def lover_clb(seq: list, span: slice, data: dict) -> None:",
 								                "   print(f\"{seq[span][-1].text} is a love interest of {seq[span.start].text}.\")",
 								                "",
                "grammar = \"\"\"",
                "   Law:",
 								                "   - callback: \"loves_gorilla\"",
 								                "   (",
 								                "   ((pos: \"PROPN\") or (pos: \"PRON\"))",
 								                "   (lemma: \"love\")",
 								                "   (lemma: \"gorilla\")",
 								                "   )",
 								                "   Law:",
 								                "   - callback: \"loves_someone\"",
 								                "   (",
 								                "   (pos: \"PROPN\")",
 								                "   (lower: \"loves\")",
 								                "   (pos: \"PROPN\")",
 								                "   )",
 								                "\"\"\"",
 								                "",
 								                "@spacy.registry.augmenters(\"jsonify_span\")",
 								                "def jsonify_span(span):",
 								                "   return [{\"lemma\": token.lemma_, \"pos\": token.pos_, \"lower\": token.lower_} for token in span]",
                "",
 								                "conf = {",
                "   \"rules\": grammar,",
                "   \"callbacks\": {",
                "       \"loves_gorilla\": \"callbacks.gorilla_callback\",",
 								                "       \"loves_someone\": \"callbacks.lover_callback\",",
 								                "   },",
                "   \"map_doc\": \"augmenters.jsonify_span\",",
 								                "   \"sort_length\": True,",
 								                "}",
                "",
 								                "nlp.add_pipe(\"hmrb\", config=conf)",
                "nlp(sentences)"
 								            ],
 								            "code_language": "python",
 								            "thumb": "https://user-images.githubusercontent.com/6807878/118643685-cae6b880-b7d4-11eb-976e-066aec9505da.png",
 								            "image": "https://user-images.githubusercontent.com/6807878/118643685-cae6b880-b7d4-11eb-976e-066aec9505da.png",
 								            "author": "Kristian Boda",
 								            "author_links": {
 								                "github": "bodak",
 								                "twitter": "bodak",
 								                "website": "https://github.com/babylonhealth/"
 								            },
 								            "category": ["pipeline", "standalone", "scientific", "biomedical"],
 								            "tags": ["babylonhealth", "rule-engine", "matcher"]
        },
 								        {
 								            "id": "forte",
 								            "title": "Forte",
 								            "slogan": "Forte is a toolkit for building Natural Language Processing pipelines, featuring cross-task interaction, adaptable data-model interfaces and composable pipelines.",
            "description": "Forte provides a platform to assemble state-of-the-art NLP and ML technologies in a highly composable fashion, covering a wide spectrum of tasks ranging from Information Retrieval and Natural Language Understanding to Natural Language Generation.",
 								            "github": "asyml/forte",
            "pip": "forte.spacy stave torch",
            "code_example": [
                "from fortex.spacy import SpacyProcessor",
 								                "from forte.processors.stave import StaveProcessor",
                "from forte import Pipeline",
 								                "from forte.data.readers import StringReader",
 								                "",
                "pipeline = Pipeline()",
                "pipeline.set_reader(StringReader())",
                "pipeline.add(SpacyProcessor())",
                "pipeline.add(StaveProcessor())",
                "pipeline.run('Running SpaCy with Forte!')"
 								            ],
 								            "code_language": "python",
 								            "url": "https://medium.com/casl-project/forte-building-modular-and-re-purposable-nlp-pipelines-cf5b5c5abbe9",
 								            "thumb": "https://raw.githubusercontent.com/asyml/forte/master/docs/_static/img/forte_graphic.png",
 								            "image": "https://raw.githubusercontent.com/asyml/forte/master/docs/_static/img/logo_h.png",
 								            "author": "Petuum",
 								            "author_links": {
 								                "twitter": "PetuumInc",
 								                "github": "asyml",
 								                "website": "https://petuum.com"
 								            },
 								            "category": ["pipeline", "standalone"],
 								            "tags": ["pipeline"]
        },
 								        {
 								            "id": "spacy-api-docker-v3",
 								            "slogan": "spaCy v3 REST API, wrapped in a Docker container",
 								            "github": "bbieniek/spacy-api-docker",
 								            "url": "https://hub.docker.com/r/bbieniek/spacyapi/",
 								            "thumb": "https://i.imgur.com/NRnDKyj.jpg",
 								            "code_example": [
 								                "version: '3'",
 								                "",
 								                "services:",
 								                "  spacyapi:",
 								                "    image: bbieniek/spacyapi:en_v3",
 								                "    ports:",
 								                "      - \"127.0.0.1:8080:80\"",
 								                "    restart: always"
 								            ],
 								            "code_language": "docker",
 								            "author": "Baltazar Bieniek",
 								            "author_links": {
 								                "github": "bbieniek"
 								            },
 								            "category": ["apis"]
        },
 								        {
 								            "id": "phruzz_matcher",
 								            "title": "phruzz-matcher",
 								            "slogan": "Phrase matcher using RapidFuzz",
            "description": "Combines the RapidFuzz library with spaCy's PhraseMatcher. The goal of this component is to find matches between a spaCy Doc and a list of phrases when there are no perfect matches due to typos or abbreviations.",
            "github": "mjvallone/phruzz-matcher",
 								            "pip": "phruzz_matcher",
 								            "code_example": [
 								                "import spacy",
 								                "from spacy.language import Language",
 								                "from phruzz_matcher.phrase_matcher import PhruzzMatcher",
 								                "",
 								                "famous_people = [",
 								                "        \"Brad Pitt\",",
 								                "        \"Demi Moore\",",
 								                "        \"Bruce Willis\",",
 								                "        \"Jim Carrey\",",
 								                "]",
 								                "",
 								                "@Language.factory(\"phrase_matcher\")",
 								                "def phrase_matcher(nlp: Language, name: str):",
 								                "    return PhruzzMatcher(nlp, famous_people, \"FAMOUS_PEOPLE\", 85)",
 								                "",
 								                "nlp = spacy.blank('es')",
 								                "nlp.add_pipe(\"phrase_matcher\")",
 								                "",
 								                "doc = nlp(\"El otro día fui a un bar donde vi a brad pit y a Demi Moore, estaban tomando unas cervezas mientras charlaban de sus asuntos.\")",
 								                "print(f\"doc.ents: {doc.ents}\")",
 								                "",
 								                "#OUTPUT",
                "#doc.ents: (brad pit, Demi Moore)"
            ],
 								            "thumb": "https://avatars.githubusercontent.com/u/961296?v=4",
 								            "image": "",
 								            "code_language": "python",
 								            "author": "Martin Vallone",
 								            "author_links": {
 								                "github": "mjvallone",
 								                "twitter": "vallotin",
 								                "website": "https://fiqus.coop/"
 								            },
 								            "category": ["pipeline", "research", "standalone"],
 								            "tags": ["spacy", "python", "nlp", "ner"]
        },
 								        {
 								            "id": "WordDumb",
 								            "title": "WordDumb",
 								            "slogan": "A calibre plugin that generates Word Wise and X-Ray files.",
            "description": "A calibre plugin that generates Word Wise and X-Ray files and then sends them to a Kindle. Supports KFX, AZW3 and MOBI eBooks. X-Ray supports 18 languages.",
 								            "github": "xxyzz/WordDumb",
 								            "code_language": "python",
 								            "thumb": "https://raw.githubusercontent.com/xxyzz/WordDumb/master/starfish.svg",
 								            "image": "https://user-images.githubusercontent.com/21101839/130245435-b874f19a-7785-4093-9975-81596efc42bb.png",
 								            "author": "xxyzz",
 								            "author_links": {
 								                "github": "xxyzz"
 								            },
 								            "category": ["standalone"]
        },
 								        {
 								            "id": "eng_spacysentiment",
 								            "title": "eng_spacysentiment",
 								            "slogan": "Simple sentiment analysis using spaCy pipelines",
            "description": "Sentiment analysis for simple English sentences using pre-trained spaCy pipelines",
 								            "github": "vishnunkumar/spacysentiment",
 								            "pip": "eng-spacysentiment",
 								            "code_example": [
 								                "import eng_spacysentiment",
 								                "nlp = eng_spacysentiment.load()",
 								                "text = \"Welcome to Arsenals official YouTube channel Watch as we take you closer and show you the personality of the club\"",
 								                "doc = nlp(text)",
 								                "print(doc.cats)",
 								                "# {'positive': 0.29878824949264526, 'negative': 0.7012117505073547}"
 								            ],
 								            "thumb": "",
 								            "image": "",
 								            "code_language": "python",
 								            "author": "Vishnu Nandakumar",
 								            "author_links": {
 								                "github": "Vishnunkumar",
 								                "twitter": "vishnun_uchiha"
 								            },
 								            "category": ["pipeline"],
 								            "tags": ["pipeline", "nlp", "sentiment"]
        },
 								        {
 								            "id": "textnets",
 								            "slogan": "Text analysis with networks",
 								            "description": "textnets represents collections of texts as networks of documents and words. This provides novel possibilities for the visualization and analysis of texts.",
 								            "github": "jboynyc/textnets",
 								            "image": "https://user-images.githubusercontent.com/2187261/152641425-6c0fb41c-b8e0-44fb-a52a-7c1ba24eba1e.png",
 								            "code_example": [
 								                "import textnets as tn",
 								                "",
 								                "corpus = tn.Corpus(tn.examples.moon_landing)",
 								                "t = tn.Textnet(corpus.tokenized(), min_docs=1)",
 								                "t.plot(label_nodes=True,",
 								                "       show_clusters=True,",
 								                "       scale_nodes_by=\"birank\",",
 								                "       scale_edges_by=\"weight\")"
 								            ],
 								            "author": "John Boy",
 								            "author_links": {
 								                "github": "jboynyc",
 								                "twitter": "jboy"
 								            },
 								            "category": ["visualizers", "standalone"]
        },
 								        {
 								            "id": "tmtoolkit",
 								            "slogan": "Text mining and topic modeling toolkit",
            "description": "tmtoolkit is a set of tools for text mining and topic modeling with Python, developed especially for use in the social sciences, journalism and related disciplines. It aims for easy installation, extensive documentation and a clear programming interface, while offering good performance on large datasets by means of vectorized operations (via NumPy) and parallel computation (using Python’s multiprocessing module and the loky package).",
 								            "github": "WZBSocialScienceCenter/tmtoolkit",
 								            "code_example": [
                "# Note: This requires these setup steps:",
 								                "#   pip install tmtoolkit[recommended]",
 								                "#   python -m tmtoolkit setup en",
                "from tmtoolkit.corpus import Corpus, tokens_table, lemmatize, to_lowercase, dtm",
 								                "from tmtoolkit.bow.bow_stats import tfidf, sorted_terms_table",
 								                "# load built-in sample dataset and use 4 worker processes",
 								                "corp = Corpus.from_builtin_corpus('en-News100', max_workers=4)",
 								                "# investigate corpus as dataframe",
 								                "toktbl = tokens_table(corp)",
 								                "print(toktbl)",
 								                "# apply some text normalization",
 								                "lemmatize(corp)",
 								                "to_lowercase(corp)",
 								                "# build sparse document-token matrix (DTM)",
 								                "# document labels identify rows, vocabulary tokens identify columns",
 								                "mat, doc_labels, vocab = dtm(corp, return_doc_labels=True, return_vocab=True)",
 								                "# apply tf-idf transformation to DTM",
                "# the operation is applied to the sparse matrix and uses little memory",
 								                "tfidf_mat = tfidf(mat)",
 								                "# show top 5 tokens per document ranked by tf-idf",
 								                "top_tokens = sorted_terms_table(tfidf_mat, vocab, doc_labels, top_n=5)",
 								                "print(top_tokens)"
 								            ],
 								            "author": "Markus Konrad / WZB Social Science Center",
 								            "author_links": {
 								                "github": "internaut",
 								                "twitter": "_knrd"
 								            },
 								            "category": ["scientific", "standalone"]
        },
 								        {
 								            "id": "edsnlp",
 								            "title": "EDS-NLP",
 								            "slogan": "spaCy components to extract information from clinical notes written in French.",
            "description": "EDS-NLP provides a set of rule-based spaCy components to extract information from French clinical notes. It also features _qualifier_ pipelines that detect negations, speculations and family context, among other modalities. Check out the [demo](https://aphp.github.io/edsnlp/demo/)!",
 								            "github": "aphp/edsnlp",
 								            "pip": "edsnlp",
 								            "code_example": [
 								                "import spacy",
 								                "",
 								                "nlp = spacy.blank(\"fr\")",
 								                "",
 								                "terms = dict(",
 								                "    covid=[\"covid\", \"coronavirus\"],",
 								                ")",
 								                "",
 								                "# Sentencizer component, needed for negation detection",
 								                "nlp.add_pipe(\"eds.sentences\")",
 								                "# Matcher component",
 								                "nlp.add_pipe(\"eds.matcher\", config=dict(terms=terms))",
 								                "# Negation detection",
 								                "nlp.add_pipe(\"eds.negation\")",
 								                "",
                "# Process your text in one call!",
 								                "doc = nlp(\"Le patient est atteint de covid\")",
 								                "",
 								                "doc.ents",
 								                "# Out: (covid,)",
 								                "",
 								                "doc.ents[0]._.negation",
 								                "# Out: False"
 								            ],
 								            "code_language": "python",
 								            "url": "https://aphp.github.io/edsnlp/",
 								            "author": "AP-HP",
 								            "author_links": {
 								                "github": "aphp",
 								                "website": "https://github.com/aphp"
 								            },
 								            "category": ["biomedical", "scientific", "research", "pipeline"],
 								            "tags": ["clinical"]
        },
 								        {
 								            "id": "sent-pattern",
 								            "title": "English Interpretation Sentence Pattern",
 								            "slogan": "English interpretation for accurate translation from English to Japanese",
 								            "description": "This package categorizes English sentences into one of five basic sentence patterns and identifies the subject, verb, object, and other components. The five basic sentence patterns are based on C. T. Onions's Advanced English Syntax and are frequently used when teaching English in Japan.",
 								            "github": "lll-lll-lll-lll/sent-pattern",
 								            "pip": "sent-pattern",
 								            "author": "Shunpei Nakayama",
 								            "author_links": {
 								                "twitter": "ExZ79575296",
 								                "github": "lll-lll-lll-lll"
 								            },
 								            "category": ["pipeline"],
 								            "tags": ["interpretation", "ja"]
        },
 								        {
 								            "id": "spacy-partial-tagger",
 								            "title": "spaCy - Partial Tagger",
            "slogan": "Sequence Tagger for Partially Annotated Datasets in spaCy",
            "description": "This is a library for building a CRF tagger from a partially annotated dataset in spaCy. You can build your own tagger using only a dictionary.",
 								            "github": "doccano/spacy-partial-tagger",
 								            "pip": "spacy-partial-tagger",
 								            "category": ["pipeline", "training"],
 								            "author": "Yasufumi Taniguchi",
 								            "author_links": {
 								                "github": "yasufumy"
 								            }
        },
 								        {
 								            "id": "spacy-pythainlp",
 								            "title": "spaCy-PyThaiNLP",
 								            "slogan": "PyThaiNLP for spaCy",
 								            "description": "This package wraps the PyThaiNLP library to add support for Thai to spaCy.",
 								            "github": "PyThaiNLP/spaCy-PyThaiNLP",
 								            "code_example": [
 								                "import spacy",
 								                "import spacy_pythainlp.core",
 								                "",
 								                "nlp = spacy.blank('th')",
 								                "nlp.add_pipe('pythainlp')",
 								                "doc = nlp('ผมเป็นคนไทย   แต่มะลิอยากไปโรงเรียนส่วนผมจะไปไหน  ผมอยากไปเที่ยว')",
 								                "",
 								                "print(list(doc.sents))",
 								                "# output: [ผมเป็นคนไทย   แต่มะลิอยากไปโรงเรียนส่วนผมจะไปไหน  , ผมอยากไปเที่ยว]"
 								            ],
 								            "code_language": "python",
 								            "author": "Wannaphong Phatthiyaphaibun",
 								            "author_links": {
 								                "twitter": "@wannaphong_p",
 								                "github": "wannaphong",
 								                "website": "https://iam.wannaphong.com/"
 								            },
 								            "category": ["pipeline", "research"],
 								            "tags": ["Thai"]
        },
 								        {
 								            "id": "vetiver",
 								            "title": "Vetiver",
 								            "slogan": "Version, share, deploy, and monitor models.",
 								            "description": "The goal of vetiver is to provide fluent tooling to version, deploy, and monitor a trained model. Functions handle creating model objects, versioning models, predicting from a remote API endpoint, deploying Dockerfiles, and more.",
 								            "github": "rstudio/vetiver-python",
 								            "pip": "vetiver",
 								            "code_example": [
 								                "import spacy",
 								                "from vetiver import VetiverModel, VetiverAPI",
 								                "",
 								                "# If you use this model, you'll need to download it first:",
 								                "# python -m spacy download en_core_web_md",
 								                "nlp = spacy.load('en_core_web_md')",
 								                "# Create deployable model object with your nlp Language object",
                "v = VetiverModel(nlp, model_name='my_model')",
 								                "# Try out your API endpoint locally",
 								                "VetiverAPI(v).run()"
 								            ],
 								            "code_language": "python",
 								            "url": "https://vetiver.rstudio.com/",
 								            "thumb": "https://raw.githubusercontent.com/rstudio/vetiver-python/main/docs/figures/square-logo.svg",
 								            "author": "Posit, PBC",
 								            "author_links": {
 								                "twitter": "posit_pbc",
 								                "github": "rstudio",
 								                "website": "https://posit.co/"
 								            },
 								            "category": ["apis", "standalone"],
 								            "tags": ["apis", "deployment"]
        },
 								        {
 								            "id": "span_marker",
 								            "title": "SpanMarker",
 								            "slogan": "Effortless state-of-the-art NER in spaCy",
 								            "description": "The SpanMarker integration with spaCy allows you to seamlessly replace the default spaCy `\"ner\"` pipeline component with any [SpanMarker model available on the Hugging Face Hub](https://huggingface.co/models?library=span-marker). Through this, you can take advantage of the advanced Named Entity Recognition capabilities of SpanMarker within the familiar and powerful spaCy framework.\n\nBy default, the `span_marker` pipeline component uses a [SpanMarker model using RoBERTa-large trained on OntoNotes v5.0](https://huggingface.co/tomaarsen/span-marker-roberta-large-ontonotes5). This model reaches a competitive 91.54 F1, notably higher than the [85.5 and 89.8 F1](https://spacy.io/usage/facts-figures#section-benchmarks) from `en_core_web_lg` and `en_core_web_trf`, respectively. A short head-to-head between this SpanMarker model and the `trf` spaCy model has been posted [here](https://github.com/tomaarsen/SpanMarkerNER/pull/12).\n\nAdditionally, see [here](https://tomaarsen.github.io/SpanMarkerNER/notebooks/spacy_integration.html) for documentation on using SpanMarker with spaCy.",
 								            "github": "tomaarsen/SpanMarkerNER",
 								            "pip": "span_marker",
 								            "code_example": [
 								                "import spacy",
 								                "",
                "nlp = spacy.load(\"en_core_web_sm\", exclude=[\"ner\"])",
                "nlp.add_pipe(\"span_marker\", config={\"model\": \"tomaarsen/span-marker-roberta-large-ontonotes5\"})",
 								                "",
 								                "text = \"\"\"Cleopatra VII, also known as Cleopatra the Great, was the last active ruler of the \\",
 								                "Ptolemaic Kingdom of Egypt. She was born in 69 BCE and ruled Egypt from 51 BCE until her \\",
 								                "death in 30 BCE.\"\"\"",
 								                "doc = nlp(text)",
 								                "print([(entity, entity.label_) for entity in doc.ents])",
 								                "# [(Cleopatra VII, \"PERSON\"), (Cleopatra the Great, \"PERSON\"), (the Ptolemaic Kingdom of Egypt, \"GPE\"),",
 								                "# (69 BCE, \"DATE\"), (Egypt, \"GPE\"), (51 BCE, \"DATE\"), (30 BCE, \"DATE\")]"
 								            ],
 								            "code_language": "python",
 								            "url": "https://tomaarsen.github.io/SpanMarkerNER",
 								            "author": "Tom Aarsen",
 								            "author_links": {
 								                "github": "tomaarsen",
 								                "website": "https://www.linkedin.com/in/tomaarsen"
 								            },
 								            "category": ["pipeline", "standalone", "scientific"],
 								            "tags": ["ner"]
        },
 								        {
 								            "id": "hobbit-spacy",
 								            "title": "Hobbit spaCy",
 								            "slogan": "NLP for Middle Earth",
            "description": "Hobbit spaCy is a custom spaCy pipeline designed specifically for working with texts about Middle Earth and the world of J.R.R. Tolkien.",
 								            "github": "wjbmattingly/hobbit-spacy",
 								            "pip": "en-hobbit",
 								            "code_example": [
 								                "import spacy",
 								                "",
 								                "nlp = spacy.load('en_hobbit')",
 								                "doc = nlp('Frodo saw Glorfindel and Glóin; and in a corner alone Strider was sitting, clad in his old travel - worn clothes again')"
 								            ],
 								            "code_language": "python",
 								            "thumb": "https://github.com/wjbmattingly/hobbit-spacy/blob/main/images/hobbit-thumbnail.png?raw=true",
 								            "image": "https://github.com/wjbmattingly/hobbit-spacy/raw/main/images/hobbitspacy.png",
 								            "author": "W.J.B. Mattingly",
 								            "author_links": {
 								                "twitter": "wjb_mattingly",
 								                "github": "wjbmattingly",
 								                "website": "https://wjbmattingly.com"
 								            },
 								            "category": ["pipeline", "standalone"],
 								            "tags": ["spans", "rules", "ner"]
        },
 								        {
 								            "id": "rolegal",
 								            "title": "A spaCy Package for Romanian Legal Document Processing",
 								            "thumb": "https://raw.githubusercontent.com/senisioi/rolegal/main/img/paper200x200.jpeg",
 								            "slogan": "rolegal: a spaCy Package for Noisy Romanian Legal Document Processing",
            "description": "This is a spaCy language model for the Romanian legal domain, trained with floret 4-gram to 5-gram embeddings and `LEGAL` entity recognition. It is useful for processing noisy legal documents produced by OCR.",
 								            "github": "senisioi/rolegal",
 								            "pip": "ro-legal-fl",
 								            "tags": ["legal", "floret", "ner", "romanian"],
 								            "code_example": [
 								                "import spacy",
 								                "nlp = spacy.load(\"ro_legal_fl\")",
 								                "",
 								                "doc = nlp(\"Titlul III din LEGEA nr. 255 din 19 iulie 2013, publicată în MONITORUL OFICIAL\")",
 								                "# legal entity identification",
 								                "for entity in doc.ents:",
 								                "    print('entity: ', entity, '; entity type: ', entity.label_)",
 								                "",
 								                "# floret n-gram embeddings robust to typos",
 								                "print(nlp('achizit1e public@').similarity(nlp('achiziții publice')))",
 								                "# 0.7393895566928835",
 								                "print(nlp('achizitii publice').similarity(nlp('achiziții publice')))",
 								                "# 0.8996480808279399"
 								            ],
 								            "author": "Sergiu Nisioi",
 								            "author_links": {
 								                "github": "senisioi",
 								                "website": "https://nlp.unibuc.ro/people/snisioi.html"
 								            },
 								            "category": ["pipeline", "training", "models"]
        }
    ],
    "categories": [
 								        {
 								            "label": "Projects",
 								            "items": [
 								                {
 								                    "id": "pipeline",
 								                    "title": "Pipeline",
 								                    "description": "Custom pipeline components and extensions"
 								                },
 								                {
 								                    "id": "training",
 								                    "title": "Training",
 								                    "description": "Helpers and toolkits for training spaCy models"
 								                },
 								                {
 								                    "id": "conversational",
 								                    "title": "Conversational",
 								                    "description": "Frameworks and utilities for working with conversational text, e.g. for chat bots"
 								                },
 								                {
 								                    "id": "research",
 								                    "title": "Research",
 								                    "description": "Frameworks and utilities for developing better NLP models, especially using neural networks"
 								                },
                {
 								                    "id": "scientific",
 								                    "title": "Scientific",
 								                    "description": "Frameworks and utilities for scientific text processing"
 								                },
                {
 								                    "id": "biomedical",
 								                    "title": "Biomedical",
 								                    "description": "Frameworks and utilities for processing biomedical text"
 								                },
                {
 								                    "id": "visualizers",
 								                    "title": "Visualizers",
 								                    "description": "Demos and tools to visualize NLP annotations or systems"
 								                },
 								                {
 								                    "id": "apis",
 								                    "title": "Containers & APIs",
 								                    "description": "Infrastructure tools for managing or deploying spaCy"
 								                },
 								                {
 								                    "id": "nonpython",
 								                    "title": "Non-Python",
 								                    "description": "Wrappers, bindings and implementations in other programming languages"
 								                },
 								                {
 								                    "id": "standalone",
 								                    "title": "Standalone",
 								                    "description": "Self-contained libraries or tools that use spaCy under the hood"
                },
 								                {
 								                    "id": "models",
 								                    "title": "Models",
                    "description": "Third-party pretrained models for different languages and domains"
                }
            ]
        },
        {
            "label": "Education",
            "items": [
                {
                    "id": "books",
                    "title": "Books",
                    "description": "Books about or featuring spaCy"
                },
                {
                    "id": "courses",
                    "title": "Courses",
                    "description": "Online courses and interactive tutorials"
                },
                {
                    "id": "videos",
                    "title": "Videos",
                    "description": "Talks and tutorials in video format"
                },
                {
                    "id": "podcasts",
                    "title": "Podcasts",
                    "description": "Episodes about spaCy or interviews with the spaCy team"
                }
            ]
        }
    ]
}