fun-sci @ Wiki

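http://w.atwiki.jp/fun-sci/
ja
2007-05-27T21:15:44+09:00 1180268144

About SAS...
https://w.atwiki.jp/fun-sci/pages/26.html

I suspect many people would like to ask how many SAS users there really are and what the actual situation is. I have a number of doubts about SAS myself. I mostly work on the client side, so SAS, which is presumably designed around server-based, large-scale systems, inevitably feels cramped to me.

・Why is data updating still line-based, after all this time? That is what I end up thinking. It is fine for reading files or for large-scale computation, but forcing absolutely everything through a line-based mechanism is hard to accept. Even if you create an array, it is no match for a properly built-in array system.
・Why can't we write subroutines? Please just let us define functions in the DATA step. At this point, having only control-jump constructs such as macros and LINK (with a nesting limit, no less) is questionable.
・The support pages for the IOM specification are hard to follow, and on top of that, SAS code passed to SAS IOM (apparently) will not run unless it is pure code, which is hard to accept. When you want to write code that opens a file of SAS code and submits it directly, various preprocessing is needed, and I think that makes refactoring difficult.

2007-05-27T21:15:44+09:00 1180268144

Can't call summary on an R object!
https://w.atwiki.jp/fun-sci/pages/25.html

Oh, right. When an Robj is received on the Python side as a list or dictionary, R functions will not run on it correctly unless Rpy converts it back properly. This is especially troublesome when the object belongs to an R-specific class. It is a real headache. For example, the "ret" data produced at http://www21.atwiki.jp/fun-sci/pages/23.html is interpreted as a dictionary full of numbers even when passed to r.summary(ret), and the result is:

[[' 3', '-none-', 'numeric'], [' 11', '-none-', 'list'], [' 1', '-none-', 'numeric'],
 ['100', '-none-', 'numeric'], [' 3', '-none-', 'call'], ['100', '-none-', 'numeric'],
 [' 1', '-none-', 'logical'], [' 3', 'formula', 'call'], [' 1', '-none-', 'logical'],
 [' 1', '-none-', 'character'], [' 3', 'terms', 'call'], [' 0', '-none-', 'NULL'],
 [' 3', '-none-', 'list'], [' 4', '-none-', 'numeric'], [' 0', '-none-', 'NULL'],
 [' 0', '-none-', 'NULL'], [' 4', '-none-', 'list'], [' 1', '-none-', 'numeric'],
 ['100', '-none-', 'numeric'], [' 1', '-none-', 'numeric'], [' 5', '-none-', 'list'],
 [' 3', '-none-', 'numeric'], [' 1', '-none-', 'numeric'], [' 1', '-none-', 'numeric'],
 [' 1', '-none-', 'numeric'], ['100', '-none-', 'numeric'], ['100', '-none-',

2007-05-27T20:45:35+09:00 1180266335

Getting the data from 「S-Plusによる統計解析」 into Python
https://w.atwiki.jp/fun-sci/pages/24.html

＊Getting the data from 「S-Plusによる統計解析」 into Python

R is a clone of S-Plus, so I imagine many people use the book 「S-Plusによる統計解析」 as a reference. The data sets in the book are provided as a library, which makes them convenient for study. If you would like to use them from Python as well, you can extract the data and write it to a file like this.

from rpy import *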
r("library(MASS)")
dat = []
# load the data into Python
dat = r("Aids2")
# write it out to a file
myf = open("Aids2.txt","w")
myf.write(str(dat))
myf.close()

2007-05-24T00:16:48+09:00 1179933408

rpy glm notes
https://w.atwiki.jp/fun-sci/pages/23.html

I am learning how to use glm from scipy and rpy with some test data. The source is Chapter 19 of the Harvard University lecture text 「生物統計学入門」 (Introduction to Biostatistics). The data is publicly available, so I assume it is OK to use here. The code looks like this.

from scipy import *
from rpy import *
import csv
# low_infants.txt is the text file shown under "Data" below
rd1 = csv.reader(open("low_infants.txt"))
low_inf = []
for line in rd1:
    low_inf.append(line)
# at this point it is an array of strings
low_infa = array(low_inf)
# convert the strings to 64-bit floats
low_infa = low_infa.astype(float64)
x1=low_infa[:,4].tolist()
x2=low_infa[:,2].tolist()
# slice the converted array (low_infa), not the raw list (low_inf)
y=low_infa[:,1].tolist()
# family is an argument of glm, not a column of the data frame
ret = r.glm(r("y~x1+x2"),data=r.data_frame(x1=x1,x2=x2,y=y),family='gaussian')
ret['coefficients']

The coefficients came out as listed in the book. So it seems to be working, more or less?

＊＊ Data

27,41,29,1360,37,0
29,40,31,1490,34,0
30,38,33,1490,32,0
28,38,31,1180,37,0
29,38,30,1200,29,1
23,32,25,680,19,0
22,33,27,620,20,1
26,38,29,1060,25,0
27,30,28,1320,27,0
25,34,29,830,32,1
23,32,26,880,26,0
26,39,30,1130,29,0
27,38,29,1140,24,0
27,39,29,1350,26,0
26,37,29,950,25,0
27,39,29,1220,25,0

2007-05-24T23:48:20+09:00 1180018100

R-related
https://w.atwiki.jp/fun-sci/pages/22.html

＊＊R-related (though it is nearly all rpy)
-[[rpy and gnuplot notes>http://www21.atwiki.jp/fun-sci/pages/21.html]]
-[[rpy glm notes>http://www21.atwiki.jp/fun-sci/pages/23.html]]
-[[Getting the data from 「S-Plusによる統計解析」 into Python>http://www21.atwiki.jp/fun-sci/pages/24.html]]
-[[Can't call summary on an R object!>http://www21.atwiki.jp/fun-sci/pages/25.html]]

2007-05-27T20:48:47+09:00 1180266527

rpy and gnuplot notes
https://w.atwiki.jp/fun-sci/pages/21.html

The page http://www2.warwick.ac.uk/fac/sci/moac/currentstudents/peter_cock/python/lin_reg/ explains how to have R do a linear regression through Rpy (just a simple straight-line fit). Here I try redoing its plot part in Gnuplot, using Gnuplot.py.

from rpy import r
import Gnuplot
my_x = [5.05, 6.75, 3.21, 2.66]
my_y = [1.65, 26.5, -5.93, 7.96]
ls_fit = r.lsfit(my_x,my_y)
gradient = ls_fit['coefficients']['X']
yintercept= ls_fit['coefficients']['Intercept']
# the Gnuplot part starts here
g1 = Gnuplot.Gnuplot()
g1.plot(zip(my_x,my_y))
# build a Template for the equation of the line in s
from string import Template
s = Template('$grad * x + ($itspt)')
form1 = s.substitute(grad=gradient,itspt=yintercept)
# form1 is now '5.3935773612 * x + (-16.2811279931)'
g1.plot(form1)

2007-05-21T22:26:17+09:00 1179753977

It's been updated
https://w.atwiki.jp/fun-sci/pages/20.html

When I need to watch a folder, I usually run ls -la on a timer with time(?) or something like it and grep the output to decide, but that only works on UNIX-like systems. Looking up how to do it on Windows seemed like a hassle, so I put something together in Python. It is a simple thing that checks whether the set of files has changed or a last-modified time has changed.

import os,sys,time,pprint
from stat import *

class touched:
    def watching(self,dirpath):
        dir1 = self.dirstat(dirpath)
        while 1:
            time.sleep(5)
            try:
                dir2 = self.dirstat(dirpath)
                self.checkdirf(dir1,dir2)
                self.checkdirs(dir1,dir2)
                print("next ... ")
            except Exception as e:
                print(e)
            dir1 = dir2[:]

    def checkdirf(self,dir1,dir2):
        # the file lists differ: something was added or removed
        if dir1[0] != dir2[0]:
            raise Exception("FILE_DIFF")

    def checkdirs(self,dir1,dir2):
        if len(dir1[0]) == 0 or len(dir2[0]) == 0:
            raise Exception("NO FILE")
        pp = pprint.PrettyPrinter(indent=4)
        for i in range(0,len(dir1)):
            # pp.pprint(dir1[1])
            if dir1[1][i][ST_MTIME] != dir2[

2007-01-16T00:32:14+09:00 1168875134

Searching files and folders, and their timestamps
https://w.atwiki.jp/fun-sci/pages/19.html

This walks the whole hierarchy with the os.walk generator and picks up the folder information with os.stat. Admittedly, running a command such as dir /s | grep -E "2006/12/17" is basically easier, and faster too; that can't be helped.

import time,os,sys,fnmatch
from stat import *

def watch_dir(dflt='.',patterns='*',yield_folders=False,single_level=False):
    # patterns is a ';'-separated list of fnmatch patterns
    patterns = patterns.split(';')
    for path,subdir,files in os.walk(dflt):
        if yield_folders:
            files.extend(subdir)
        files.sort()
        for name in files:
            for pattern in patterns:
                if fnmatch.fnmatch(name,pattern):
                    yield os.path.join(path,name),os.stat(os.path.join(path,name))
                    break
        if single_level:
            break

if __name__=='__main__':
    for mypath,mystat in watch_dir(dflt="/",patterns="tex",yield_folders=True):
        print(mypath + " : " + time.ctime(mystat.st_atime))

2006-12-17T09:45:45+09:00 1166316345

pylab-related
https://w.atwiki.jp/fun-sci/pages/18.html

-[[How well matplotlib gets along with threads>http://www21.atwiki.jp/fun-sci/pages/17.html]]

2006-12-12T00:21:38+09:00 1165850498

How well matplotlib gets along with threads
https://w.atwiki.jp/fun-sci/pages/17.html

On a whim I wanted to show matplotlib in several windows, so my plan was to build a ThreadPool and display the same graph several times over. It seems, however, that matplotlib is not thread-safe. For the program I took the thread pool sample from the Python Cookbook and modified it roughly as below. (I can't really post the whole source...)

try:
    if command == 'process':
        #result = 'new' + item
        # subplot() is called inside mythread()
        result = mythread() + item
    else:
        raise ValueError('Unknown command %r' % command)
except:
    print('error raised!')
    #report_error()
    traceback.print_exc()

Well... it hung spectacularly. Oh well, I thought, so I changed it to run only one at a time and ran it again.

Fatal Python error: PyEval_RestoreThread: NULL tstate
This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.

Mukyaaaaaa! (Nodame style) So my recommendation is to use matplotlib only one at a time, within a single process. (That said, I only tried this casually, so it might well be doable with some proper reworking.)

2006-12-12T00:19:01+09:00 1165850341
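One common way around this kind of problem is to let the worker threads do only the computation and keep every drawing call in the thread that started the pool. The following is just a minimal sketch of that idea in Python 3 using the standard library alone. It is not the Cookbook thread pool from the post and it has not been tried against the crash above. The names compute, worker, run_pool and draw are all made up for illustration.

```python
import queue
import threading


def compute(item):
    """Hypothetical stand-in for the per-item work done off the main thread."""
    return [item * x for x in range(5)]


def worker(tasks, results):
    """Process items until a None sentinel arrives; no plotting happens here."""
    while True:
        item = tasks.get()
        if item is None:
            break
        results.put((item, compute(item)))


def run_pool(items, draw, n_workers=3):
    """Compute in worker threads, then call draw() only from the calling thread."""
    tasks, results = queue.Queue(), queue.Queue()
    threads = [threading.Thread(target=worker, args=(tasks, results))
               for _ in range(n_workers)]
    for t in threads:
        t.start()
    for item in items:
        tasks.put(item)
    for _ in threads:
        tasks.put(None)  # one sentinel per worker so each of them exits
    for t in threads:
        t.join()
    drawn = []
    while not results.empty():
        item, data = results.get()
        draw(item, data)  # executed in the caller's thread, not in a worker
        drawn.append(item)
    return sorted(drawn)
```

Here draw is whatever function does the actual plotting, so under this arrangement the matplotlib calls (the subplot() call inside mythread(), for instance) would move into draw and run in one thread only. Whether that is enough to avoid the hang is an assumption, not something verified here.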