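PATENTSCOPE can output the results of a query as RSS,
so it should be useful for all kinds of analysis.
I'm still short on practice, though, so with a book I've been reading lately as a reference
I wrote a script that prints the title, number, link, and summary.
I kept hitting encoding errors, so I encoded everything to utf_8, though I'm not sure that's really the right way to do it.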
For now, with PCT & PDE4 @ FP as the query,
I saved the following URL in pde4.txt:
http://patentscope.wipo.int/search/rss.jsf?query=FP%3APDE4+&office=+%28OF%3Awo%29&rss=true&sortOption=Pub+Date+Desc
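The feed URL above is just a percent-encoded query string, so it can also be generated instead of pasted by hand. Here is a minimal Python 3 sketch; the parameter names (`query`, `office`, `rss`, `sortOption`) are copied from the URL above, while `build_rss_url` and its defaults are hypothetical names made up for illustration. Whether PATENTSCOPE accepts other values for these parameters is an assumption I have not checked.

```python
from urllib.parse import urlencode

# Base address taken from the URL saved in pde4.txt.
BASE_URL = "http://patentscope.wipo.int/search/rss.jsf"


def build_rss_url(query, office=" (OF:wo)", sort_option="Pub Date Desc"):
    """Assemble a feed URL with the same parameters as the one in the post.

    The defaults mirror the original URL (including its stray spaces);
    they are not taken from any official documentation.
    """
    params = [
        ("query", query),
        ("office", office),
        ("rss", "true"),
        ("sortOption", sort_option),
    ]
    # urlencode percent-encodes ':' and '()' and turns spaces into '+'.
    return BASE_URL + "?" + urlencode(params)


if __name__ == "__main__":
    # The trailing space reproduces the '+' after PDE4 in the original URL.
    print(build_rss_url("FP:PDE4 "))
```

Writing one such line per query into pde4.txt would let the same parser script handle several feeds in one run.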
Then I wrote patentparser.py:
#! /usr/bin/python
# -*- coding: utf_8 -*-
# Python 2 script (uses the print statement).
import re
import sys

import feedparser

# Each non-empty line of the input file is one RSS feed URL.
with open(sys.argv[1], 'r') as feedlistfile:
    feeds = [feedparser.parse(url.strip()) for url in feedlistfile if url.strip()]

for feed in feeds:
    for entry in feed.entries:
        link = entry["link"]
        # Pull the WO publication number out of the link.
        num = re.search(r"WO\d*", link)
        patent_no = num.group(0)
        title = entry["title"]
        summary = entry["summary"]
        published = entry["published"]
        # feedparser returns unicode strings, so encode before printing.
        print "PATENT_NO\t"+patent_no
        print "LINK\t"+link
        print "PUBLISHED\t"+published.encode('utf_8')
        print "TITLE\t"+title.encode('utf_8')
        print "SUMMARY\t"+summary.encode('utf_8')
        print "="*50
        print " "
It works, for now.
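One fragile spot in the script is the patent number extraction: `re.search` returns `None` when a link has no WO number, and calling `.group(0)` on that raises an `AttributeError`. Below is a guarded variant as a Python 3 sketch. The function name `extract_patent_no` is hypothetical, and anchoring the pattern on `docId=` is an assumption based only on the links this feed returned.

```python
import re

# Assumption: detail links carry the number as docId=WO<digits>,
# as in the links returned by the PDE4 feed.
PATENT_NO_RE = re.compile(r"docId=(WO\d+)")


def extract_patent_no(link):
    """Return the WO publication number in a detail link, or None if absent."""
    match = PATENT_NO_RE.search(link)
    return match.group(1) if match else None


if __name__ == "__main__":
    sample = ("http://patentscope.wipo.int/search/en/detail.jsf"
              "?docId=WO2012149251&recNum=1&docAn=US2012035359")
    print(extract_patent_no(sample))
    print(extract_patent_no("http://example.invalid/no-number"))
```

Entries for which this returns `None` can then be skipped or logged instead of stopping the whole run.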
Redirecting the output to a file:
$ python patentparser.py pde4.txt > out.txt
$ cat out.txt
PATENT_NO	WO2012149251
LINK	http://patentscope.wipo.int/search/en/detail.jsf?docId=WO2012149251&recNum=1&docAn=US2012035359&queryString=FP:PDE4 &maxRec=298
PUBLISHED	Fri, 02 Nov 2012 00:00:00 CET
TITLE	METHODS AND COMPOSITIONS USING PDE4 INHIBITORS FOR THE TREATMENT AND MANAGEMENT OF AUTOIMMUNE AND INFLAMMATORY DISEASES
SUMMARY	Methods of treating, preventing, or managing autoimmune inflammatory diseases and disorders including but not limited to spondylitis, juvenile rheumatoid arthritis, psoriasis, psoriatic arthritis, osteoarthritis, ankylosing spondylitis, and rheumatoid arthritis by the administration of phosphodiesterase 4 (PDE4) inhibitors in combination with other therapeutics are disclosed. Pharmaceutical compositions, dosage forms, and kits suitable for use in methods of the invention are also disclosed.
==================================================

PATENT_NO	WO2012110946
LINK	http://patentscope.wipo.int/search/en/detail.jsf?docId=WO2012110946&recNum=2&docAn=IB2012050657&queryString=FP:PDE4 &maxRec=298
PUBLISHED	Fri, 24 Aug 2012 00:00:00 CEST
TITLE	PHARMACEUTICAL COMPOSITION COMPRISING THE PDE4 ENZYME INHIBITOR REVAMILAST AND A DISEASE MODIFYING AGENT, PREFERABLY METHOTREXATE
SUMMARY	The present patent application relates to a pharmaceutical composition comprising a PDE4 enzyme inhibitor and a disease modifying agent; a process for preparing such composition; and its use in treating an autoimmune disease in a subject.
==================================================

PATENT_NO	WO2012098495
LINK	http://patentscope.wipo.int/search/en/detail.jsf?docId=WO2012098495&recNum=3&docAn=IB2012050215&queryString=FP:PDE4 &maxRec=298
PUBLISHED	Fri, 27 Jul 2012 00:00:00 CEST
TITLE	PHARMACEUTICAL COMPOSITION THAT INCLUDES REVAMILAST AND A BETA-2 AGONIST
SUMMARY	The present patent application relates to a pharmaceutical composition that includes a PDE4 enzyme inhibitor, namely revamilast, and a beta- 2 adrenergic receptor agonist; a process for preparing such a composition; and its use in treating a respiratory disorder in a subject.
==================================================
So I think I'll tidy up the output format a bit more.
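One possible direction for the tidier output, sketched in Python 3: write one patent per row as tab-separated values, so the result loads straight into a spreadsheet or another script. This is only an illustration, not necessarily the format the post ends up with. `FIELDS` and `write_tsv` are hypothetical names, and the sample record uses a placeholder link.

```python
import csv
import io

# Column order mirrors the labels printed by the script.
FIELDS = ["patent_no", "link", "published", "title", "summary"]


def write_tsv(records, out):
    """Write a header row plus one tab-separated row per record dict."""
    writer = csv.DictWriter(out, fieldnames=FIELDS, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    for record in records:
        writer.writerow(record)


if __name__ == "__main__":
    sample = [{
        "patent_no": "WO2012149251",
        "link": "http://example.invalid/detail",  # placeholder link
        "published": "Fri, 02 Nov 2012 00:00:00 CET",
        "title": "METHODS AND COMPOSITIONS USING PDE4 INHIBITORS",
        "summary": "Placeholder summary text.",
    }]
    buffer = io.StringIO()
    write_tsv(sample, buffer)
    print(buffer.getvalue(), end="")
```

Because the `csv` module quotes any field that contains a tab or a newline, a long summary cannot break the row layout.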
I also need to add a proxy handler.
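A rough Python 3 sketch of what the proxy handler could look like. feedparser documents a `handlers` argument to `parse()` that takes urllib handler objects; this sketch assumes the installed version still supports it (it is documented for the 5.x and 6.0 releases). The proxy address and the helper names `make_proxy_handler` and `parse_via_proxy` are hypothetical.

```python
import urllib.request

# Hypothetical proxy address; replace it with the real one.
PROXY_URL = "http://proxy.example.com:8080"


def make_proxy_handler(proxy_url=PROXY_URL):
    """Build a urllib handler that sends HTTP and HTTPS through one proxy."""
    return urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})


def parse_via_proxy(feed_url, proxy_url=PROXY_URL):
    """Fetch and parse a feed through the proxy (needs feedparser installed)."""
    import feedparser  # third-party; imported here so the helper above works without it
    # Assumption: this feedparser version accepts the `handlers` argument.
    return feedparser.parse(feed_url, handlers=[make_proxy_handler(proxy_url)])
```

In the script, this would mean calling `parse_via_proxy(url)` in place of `feedparser.parse(url)` when building the feed list.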