[LaTeX] Fetch a BibTeX citation from Google Scholar using the paper title on the clipboard

Published on 2023-11-26  283 views



Second update

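Changes to the code:
- Send a browser User-Agent header so the request is not rejected with a 403 error;
- Copy the resulting citation straight to the clipboard, so it can be pasted right after the run without copying it from the console output.

This still needs to be verified over a few more uses.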

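import requests
from bs4 import BeautifulSoup
import pyperclip as pc
import webbrowser  # only used by the optional debug line further down
from urllib.parse import quote_plus

# Read the paper title from the clipboard and URL-encode it for the query string,
# so that spaces, "&" or "#" in the title do not break the search URL.
paper = pc.paste().strip()
base_url = "https://scholar.google.com/scholar?hl=en&as_sdt=0%2C5&q=" + quote_plus(paper) + "&btnG="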

header = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36',
}
googleSearch = requests.request("GET", url=base_url, headers=header)

bs_page = BeautifulSoup(googleSearch.content, "html.parser")
block = bs_page.find("div", {"class": "gs_ri"})
title = block.find("h3")
link = title.find("a")
citation_id = link["id"]

cite_url = "https://scholar.google.de/scholar?hl=de&q=info:" + citation_id + ":scholar.google.com/&output=cite&scirp=0"

# webbrowser.open(cite_url)  # open citation page in browser

findLatex = requests.request("GET", url=cite_url, headers=header)

citation_view = BeautifulSoup(findLatex.content, "html.parser")
latex_link = citation_view.find("div", {"id": "gs_citi"})

latex_mf = latex_link.findChildren("a")[0]["href"]

# Download the BibTeX entry itself and print it.
result = BeautifulSoup(requests.request("GET", url=latex_mf, headers=header).content, "html.parser")
citation = result.text
print(citation)

# copy citation to clipboard
pc.copy(citation)
print("Citation copied to clipboard!")
print("Done!")

The code was modified following "How to Web Scrape using Beautiful Soup in Python without running into HTTP error 403" by Raiyan Quaium (Medium).
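The script assumes the search response contains a result block (`div` with class `gs_ri`). When Google rejects the request or finds no match, `bs_page.find(...)` returns `None` and the next line fails with an `AttributeError`. A minimal sketch of a check that could run before parsing is shown here. The helper name `no_result_block` is hypothetical, and treating status 429 as a block signal (in addition to the 403 mentioned above) is an assumption, not something observed in this post.

```python
def no_result_block(status_code: int, html: str) -> bool:
    """Heuristic: True when the response has nothing the script can parse.

    Assumption: a rejected request shows up as HTTP 403 or 429, and any
    usable result page contains the "gs_ri" class the script searches for.
    A True result can also simply mean that the title had no match.
    """
    if status_code in (403, 429):
        return True
    return "gs_ri" not in html


if __name__ == "__main__":
    sample = '<div class="gs_ri"><h3><a id="abc123">Some title</a></h3></div>'
    print(no_result_block(200, sample))
    print(no_result_block(403, ""))
```

In the script this could be called as `no_result_block(googleSearch.status_code, googleSearch.text)` right after the first request, stopping with a readable message instead of a traceback.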

First update

This is a trap: after a few uses, Google will probably decide the script is a crawler and block your IP:

Do not use it!!!
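One way to send fewer requests is to cache every citation locally and to pause before each real lookup. The sketch below is only an illustration of that idea and is not a guarantee against being blocked. The names `cached_lookup` and `load_cache`, the JSON cache file, and the 30-second default pause are all assumptions of this example; `fetch` stands in for the scraping code.

```python
import json
import time
from pathlib import Path
from typing import Callable, Dict


def load_cache(path: Path) -> Dict[str, str]:
    """Return the saved title -> BibTeX mapping, or an empty dict."""
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return {}


def cached_lookup(title: str, fetch: Callable[[str], str], path: Path,
                  min_interval: float = 30.0) -> str:
    """Return the citation for `title`, calling `fetch` only on a cache miss.

    `min_interval` is an arbitrary pause in seconds before each real request.
    """
    key = " ".join(title.split()).lower()  # normalise whitespace and case
    cache = load_cache(path)
    if key in cache:
        return cache[key]  # no network traffic at all
    time.sleep(min_interval)  # deliberately slow down before hitting the network
    cache[key] = fetch(title)
    path.write_text(json.dumps(cache, ensure_ascii=False, indent=2), encoding="utf-8")
    return cache[key]
```

With this wrapper, looking up the same title twice only triggers one request, and a batch of new titles is spread out over time instead of being sent in a burst.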

Original post

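The code comes from Kildrese/scholarBibTex.
When run, it takes the paper title on the system clipboard and fetches the matching BibTeX citation from Google Scholar. I was going to write this myself, but a quick search turned one up right away. Recording it here, with thanks to the author; we are all lazy people.

How to run:
- Copy the paper title
- Run the program
- The BibTeX citation is printed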

import requests
from bs4 import BeautifulSoup
import pyperclip as pc

paper = pc.paste()
base_url = "https://scholar.google.de/scholar?hl=de&as_sdt=0%2C5&q=" + paper + "&btnG= "

googleSearch = requests.request("GET", url=base_url)

bs_page = BeautifulSoup(googleSearch.content, "html.parser")
block = bs_page.find("div", {"class": "gs_ri"})
title = block.find("h3")
link = title.find("a")
citation_id = link["id"]

cite_url = "https://scholar.google.de/scholar?hl=de&q=info:" + citation_id + ":scholar.google.com/&output=cite&scirp=0"

findLatex = requests.request("GET", url=cite_url)

citation_view = BeautifulSoup(findLatex.content, "html.parser")
latex_link = citation_view.find("div", {"id": "gs_citi"})

latex_mf = latex_link.findChildren("a")[0]["href"]

result = BeautifulSoup(requests.request("GET", url=latex_mf).content, "html.parser")
citation = result.text
pc.copy(citation)
print(citation)
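The first step of the workflow, copying the paper title, is easy to get wrong: the clipboard may be empty or may hold a multi-line selection from a PDF. A small sketch of a clean-up step that could run on the pasted text before the search request is built follows. The helper `clean_title` and its 300-character limit are assumptions of this example, not part of the original script.

```python
def clean_title(raw: str) -> str:
    """Collapse whitespace in clipboard text and reject obviously bad input."""
    title = " ".join(raw.split())  # joins wrapped lines, trims both ends
    if not title:
        raise ValueError("clipboard is empty; copy a paper title first")
    if len(title) > 300:  # arbitrary cut-off chosen for this sketch
        raise ValueError("clipboard text is too long to be a paper title")
    return title
```

In the script this would be used as `paper = clean_title(pc.paste())`.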
Last updated on 2023-11-26