GeneDetector

Description

Get the names of genes related to a disease of interest from PubMed abstracts.

Packages Used

GoFrame is an application development framework for Go.
html implements an HTML5-compliant tokenizer and parser.
htmlquery supports querying HTML documents.
XPath is a Go package that selects nodes from HTML or other documents using XPath expressions.
regexp2 is a regex engine in pure Go based on the .NET engine.
csv is a package for reading and writing CSV files.

How to run the project

Provide a disease name and the number of abstracts to search. The program writes the disease-related gene names and other paper information to a CSV file. The default disease name is Alzheimer's and the default number of abstracts is 10.

$ go build
$ ./GeneDetector

Or, to set the disease name and the number of abstracts explicitly:

$ go build
$ ./GeneDetector -disease diabetes -n 20
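For readers curious how the two options and their defaults could be handled, here is a minimal sketch using Go's standard `flag` package. The flag names `-disease` and `-n` and the defaults come from this README; the `options` type and the `parseOptions` helper are hypothetical names chosen for illustration, and the project's real `main.go` may be organised differently.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// options holds the two command-line settings described in this README.
// The type name is hypothetical, not taken from the project source.
type options struct {
	disease string
	n       int
}

// parseOptions is a hypothetical helper that reads -disease and -n,
// falling back to the documented defaults (Alzheimer's, 10).
func parseOptions(args []string) (options, error) {
	fs := flag.NewFlagSet("GeneDetector", flag.ContinueOnError)
	var o options
	fs.StringVar(&o.disease, "disease", "Alzheimer's", "disease name to search for")
	fs.IntVar(&o.n, "n", 10, "number of abstracts to fetch")
	if err := fs.Parse(args); err != nil {
		return options{}, err
	}
	// Rejecting non-positive counts is an assumption of this sketch.
	if o.n <= 0 {
		return options{}, fmt.Errorf("-n must be positive, got %d", o.n)
	}
	return o, nil
}

func main() {
	o, err := parseOptions(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Printf("searching %d abstracts for %q\n", o.n, o.disease)
}
```

Using a dedicated `FlagSet` rather than the package-level flags keeps the parsing logic easy to call from a test.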

Expected output

A CSV file containing the related gene symbols. The file also includes other information: paper title, URL, abstract content, gene name, PMID, DOI, and keyword (disease name).
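As an illustration of the output layout, the sketch below writes one row per paper with Go's standard `encoding/csv` package. The column order follows the list above, but the exact header names, the `record` type, and the `renderCSV` helper are assumptions made for this example, and the sample row is made-up placeholder data rather than a real PubMed result.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"os"
	"strings"
)

// record mirrors the columns listed in this README. The struct itself
// is hypothetical and not taken from the project source.
type record struct {
	Title, URL, Abstract, Gene, PMID, DOI, Keyword string
}

// renderCSV is a hypothetical helper that returns the CSV text for recs,
// starting with a header row (header names are assumed).
func renderCSV(recs []record) (string, error) {
	var sb strings.Builder
	w := csv.NewWriter(&sb)
	header := []string{"title", "url", "abstract", "gene", "pmid", "doi", "keyword"}
	if err := w.Write(header); err != nil {
		return "", err
	}
	for _, r := range recs {
		row := []string{r.Title, r.URL, r.Abstract, r.Gene, r.PMID, r.DOI, r.Keyword}
		if err := w.Write(row); err != nil {
			return "", err
		}
	}
	// Flush pushes buffered rows to the builder; Error reports any write failure.
	w.Flush()
	return sb.String(), w.Error()
}

func main() {
	// Placeholder values only; not a real paper.
	recs := []record{{
		Title:    "Example title, with a comma",
		Abstract: "Example abstract text.",
		Gene:     "APOE",
		Keyword:  "Alzheimer's",
	}}
	out, err := renderCSV(recs)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Print(out)
}
```

The writer quotes fields that contain commas or quotes, which matters here because titles and abstracts often do.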

Changes

The main strategy for finding related gene names is to match gene symbols with regular expressions. I did not use text-mining methods because of my limited background in that area, but I have spent time learning more about text mining, and I found that named-entity recognition (NER) would be a good strategy for the next step of this project.
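To make the regular-expression idea concrete, here is a minimal sketch. It uses Go's standard `regexp` package rather than regexp2, and both the pattern and the stopword list are assumptions invented for this example, not the expressions used in the project. It treats short runs of uppercase letters and digits as candidate gene symbols and filters out a few common abbreviations.

```go
package main

import (
	"fmt"
	"regexp"
)

// candidateRe is an assumed pattern, not the project's actual expression:
// one uppercase letter followed by 1 to 5 uppercase letters or digits.
var candidateRe = regexp.MustCompile(`\b[A-Z][A-Z0-9]{1,5}\b`)

// stopwords lists a few uppercase abbreviations that are not gene symbols.
// The list is illustrative and far from complete.
var stopwords = map[string]bool{"DNA": true, "RNA": true, "PCR": true, "AD": true}

// findGeneCandidates is a hypothetical helper that returns the unique
// candidate symbols in abstract, in order of first appearance.
func findGeneCandidates(abstract string) []string {
	seen := make(map[string]bool)
	var out []string
	for _, m := range candidateRe.FindAllString(abstract, -1) {
		if stopwords[m] || seen[m] {
			continue
		}
		seen[m] = true
		out = append(out, m)
	}
	return out
}

func main() {
	text := "Variants in APOE and TREM2 raise AD risk; APOE was confirmed by DNA sequencing."
	fmt.Println(findGeneCandidates(text)) // prints [APOE TREM2]
}
```

A pattern like this cannot tell a gene symbol from any other acronym that is missing from the stopword list, which is the limitation that NER is meant to address.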

Acknowledgements

Thanks to Robin L. for the good discussions about web scraping tips!
Thanks to Professor Carl Kingsford for imparting so much great knowledge about Go!
Thanks to TA Siddharth Reed for giving me lots of help when I found this project really hard to continue!
Thanks to TA Jingjing Tang for grading and giving feedback!
