Adding LingFeat Software to spaCy Universe. (#9574)

* add lingfeat in universe

* Fix JSON

* Minor cleanup

Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>
Bruce W. Lee (이웅성) 2021-11-01 18:38:14 +09:00 committed by GitHub
parent 5279c7c4ba
commit a4dcb68cf6


@@ -3343,6 +3343,65 @@
"category": ["research", "standalone", "scientific"],
"tags": ["Text Analytics", "Coherence", "Cohesion"]
},
{
"id": "lingfeat",
"title": "LingFeat",
"slogan": "A Linguistic Feature Extraction (Text Analysis) Tool for Readability Assessment and Text Simplification",
"description": "LingFeat is a feature extraction library which currently extracts 255 linguistic features from English string input. Categories include syntax, semantics, discourse, and also traditional readability formulas. Published in EMNLP 2021.",
"github": "brucewlee/lingfeat",
"pip": "lingfeat",
"code_example": [
"from lingfeat import extractor",
"",
"",
"text = 'TAEAN, South Chungcheong Province -- Just before sunup, Lee Young-ho, a seasoned fisherman with over 30 years of experience, silently waits for boats carrying blue crabs as the season for the seafood reaches its height. Soon afterward, small and big boats sail into Sinjin Port in Taean County, South Chungcheong Province, the second-largest source of blue crab after Incheon, accounting for 29 percent of total production of the country. A crane lifts 28 boxes filled with blue crabs weighing 40 kilograms each from the boat, worth about 10 million won ($8,500). “It has been a productive fall season for crabbing here. The water temperature is a very important factor affecting crab production. They hate cold water,” Lee said. The temperature of the sea off Taean appeared to have stayed at the level where crabs become active. If the sea temperature suddenly drops, crabs go into their winter dormancy mode, burrowing into the mud and sleeping through the cold months.'",
"",
"",
"#Pass text",
"LingFeat = extractor.pass_text(text)",
"",
"",
"#Preprocess text",
"LingFeat.preprocess()",
"",
"",
"#Extract features",
"#each method returns a dictionary of the corresponding features",
"#Advanced Semantic (AdSem) Features",
"WoKF = LingFeat.WoKF_() #Wikipedia Knowledge Features",
"WBKF = LingFeat.WBKF_() #WeeBit Corpus Knowledge Features",
"OSKF = LingFeat.OSKF_() #OneStopEng Corpus Knowledge Features",
"",
"#Discourse (Disco) Features",
"EnDF = LingFeat.EnDF_() #Entity Density Features",
"EnGF = LingFeat.EnGF_() #Entity Grid Features",
"",
"#Syntactic (Synta) Features",
"PhrF = LingFeat.PhrF_() #Noun/Verb/Adj/Adv/... Phrasal Features",
"TrSF = LingFeat.TrSF_() #(Parse) Tree Structural Features",
"POSF = LingFeat.POSF_() #Noun/Verb/Adj/Adv/... Part-of-Speech Features",
"",
"#Lexico Semantic (LxSem) Features",
"TTRF = LingFeat.TTRF_() #Type Token Ratio Features",
"VarF = LingFeat.VarF_() #Noun/Verb/Adj/Adv Variation Features",
"PsyF = LingFeat.PsyF_() #Psycholinguistic Difficulty of Words (AoA Kuperman)",
"WoLF = LingFeat.WorF_() #Word Familiarity from Frequency Count (SubtlexUS)",
"",
"Shallow Traditional (ShTra) Features",
"ShaF = LingFeat.ShaF_() #Shallow Features (e.g. avg number of tokens)",
"TraF = LingFeat.TraF_() #Traditional Formulas"
],
"code_language": "python",
"thumb": "https://raw.githubusercontent.com/brucewlee/lingfeat/master/img/lingfeat_logo2.png",
"image": "https://raw.githubusercontent.com/brucewlee/lingfeat/master/img/lingfeat_logo.png",
"author": "Bruce W. Lee (이웅성)",
"author_links": {
"github": "brucewlee",
"website": "https://brucewlee.github.io/"
},
"category": ["research", "scientific"],
"tags": ["Readability", "Simplification", "Feature Extraction", "Syntax", "Discourse", "Semantics", "Lexical"]
},
{
"id": "hmrb",
"title": "Hammurabi",