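"""Scrape m.uuks.org for new chapters of followed web novels and push each
new chapter to Instapaper as an unread bookmark.

Chapter URLs seen on earlier runs are cached in url.pkl so that only
genuinely new chapters are added.
"""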
import os.path
import pickle
import time

import requests
from bs4 import BeautifulSoup
from pyinstapaper.instapaper import Instapaper, Bookmark

INSTAPAPER_KEY = '*****************'
INSTAPAPER_SECRET = '**************'
INSTAPAPER_LOGIN = '[email protected]'
INSTAPAPER_PASSWORD = 'password'

base_url = 'http://m.uuks.org'
# Parallel lists: novel_list[i] is the display title for the book id novel_url[i].
novel_list = ['苏厨', '绍宋', '从 1983 开始', '全职艺术家', '大明镇海王', '奶爸学园']
novel_url = ['392_392855', '511_511286', '441_441819', '32802', '519_519302', '33250']

instapaper = Instapaper(INSTAPAPER_KEY, INSTAPAPER_SECRET)
instapaper.login(INSTAPAPER_LOGIN, INSTAPAPER_PASSWORD)

def add_bookmark_instapaper(title, url):
    """Save a single chapter to Instapaper as an unread bookmark."""
    data = {
        'time': time.time(),
        'progress_timestamp': 0,
        'title': title,
        'url': url,
    }
    bookmark = Bookmark(instapaper, **data)
    bookmark.add()
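
# Illustrative call (the chapter URL here is hypothetical; real URLs come
# from the scraped chapter index in fetch_novel below):
# add_bookmark_instapaper('苏厨---第一章', base_url + '/b/392_392855/1234567.html')
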
def fetch_novel(novel_list, novel_url):
    """Check each followed novel for new chapters and push them to Instapaper."""
    print('job running')
    # url.pkl caches the chapter URLs seen on the previous run, one list per novel.
    if os.path.isfile('url.pkl'):
        with open('url.pkl', 'rb') as f:
            last_url = pickle.load(f)
    else:
        last_url = [[] for _ in novel_list]

    url_archive = []
    for j, novel in enumerate(novel_list):
        print(novel)
        # An older cache file may have fewer entries than the current novel list.
        try:
            old_url = last_url[j]
        except IndexError:
            old_url = []
        url = base_url + '/b/' + novel_url[j] + '/'
        head = {
            'User-Agent': ('Mozilla/5.0 (Linux; Android 4.1.1; Nexus 7 Build/JRO03D) '
                           'AppleWebKit/535.19 (KHTML, like Gecko) '
                           'Chrome/18.0.1025.166 Safari/535.19')
        }
        page = requests.get(url, headers=head)
        soup = BeautifulSoup(page.content, 'lxml')
        soup_text = soup.select(
            '#book-sp > div.listbox > div.list_cont > div.book-detial > div.ml-list > ul a'
        )
        latest_url = []
        latest_title = []
        for link in soup_text:
            latest_url.append(base_url + link['href'])
            latest_title.append(novel + '---' + link.string)

        # Walk the scraped links back to front so unseen chapters are bookmarked
        # oldest-first (the index page appears to list the newest chapter first).
        for chapter_title, chapter_url in zip(reversed(latest_title), reversed(latest_url)):
            if chapter_url in old_url:
                continue
            print(chapter_title, chapter_url)
            add_bookmark_instapaper(chapter_title, chapter_url)

        # Remember every URL scraped this run for the next comparison.
        url_archive.append(latest_url)

    with open('url.pkl', 'wb') as f:
        pickle.dump(url_archive, f)
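    # Cache layout (hypothetical values): one URL list per novel, in
    # novel_list order, e.g.
    # [['http://m.uuks.org/b/392_392855/1.html', ...], ...]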

# Before fetching new chapters, clear everything still left in the unread
# queue from the previous run: archive each bookmark, then delete it.
bookmarks = instapaper.get_bookmarks('unread')
for bookmark in bookmarks:
    print(bookmark.title)
    bookmark.archive()
    bookmark.delete()

fetch_novel(novel_list, novel_url)
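
# The 'job running' print suggests this script runs on a schedule. A cron
# entry like the following would work (path and interval are hypothetical):
# 0 * * * * /usr/bin/python3 /path/to/fetch_novel.py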