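A while ago I scraped some sales figures for down jackets, hoping to tie a post to the recent record cold wave that even brought snow to Guangzhou. In the end the data turned out not to be very convincing. Because of Taobao's anti-crawler measures, I used a browser together with the macro tool 按键精灵 to take a screenshot every hour, and ended up with a large pile of bitmaps. That raised the question of how to extract the numbers from them. Since the digits are almost free of noise, this is a very basic OCR (Optical Character Recognition) problem.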
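I am most familiar with image processing in MATLAB, so I stuck with it. After a quick look at some references, the idea turns out to be simple: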
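1. Preprocessing: crop the region containing the number, convert it to grayscale, binarize it, and invert it so that digit pixels are 1 and the background is 0.
2. (Key step) Character segmentation. This is actually simple: project the image onto the x axis and the y axis separately, and find where the projection switches between 0 and a positive value; those are the boundaries between characters and gaps.
3. Feature recording: resize each reference digit 0 to 9 to a standard size and record its x and y projection profiles as its features.
4. For a new sample, repeat steps 1 and 2, compute the deviation of its features from each recorded set (the code below uses the sum of squared differences), and choose a reasonable threshold: if the deviation is below it, the digit is considered recognized.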
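For readers without MATLAB, here is a minimal Python sketch of the projection-based segmentation in step 2. It assumes the image has already been binarized into a list of rows with 1 for ink and 0 for background; the function names are illustrative and not taken from the original code.

```python
# Sketch of step 2: find character boundaries from projection profiles.
# Assumes a binarized image given as a list of rows (1 = ink, 0 = background).

def column_profile(img):
    """Project onto the x axis: number of ink pixels in each column."""
    return [sum(col) for col in zip(*img)]


def row_profile(img):
    """Project onto the y axis: number of ink pixels in each row."""
    return [sum(row) for row in img]


def find_runs(profile):
    """Return inclusive (start, end) index pairs of consecutive nonzero entries."""
    runs = []
    start = None
    for i, v in enumerate(profile):
        if v and start is None:
            start = i                      # gap -> character transition
        elif not v and start is not None:
            runs.append((start, i - 1))    # character -> gap transition
            start = None
    if start is not None:                  # character touching the right/bottom edge
        runs.append((start, len(profile) - 1))
    return runs


if __name__ == "__main__":
    img = [
        [0, 1, 1, 0, 0, 1, 0],
        [0, 1, 0, 0, 0, 1, 0],
        [0, 1, 1, 0, 0, 1, 1],
    ]
    print(find_runs(column_profile(img)))   # [(1, 2), (5, 6)]
    print(find_runs(row_profile(img)))      # [(0, 2)]
```

Handling a run that reaches the image edge explicitly matters: a character that touches the border has no 0-to-positive transition on that side.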
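For standard on-screen digits (or letters), this process can be expected to reach a very high recognition rate. Handwriting recognition is presumably much harder and calls for methods such as machine learning; I haven't worked with those, so I'll leave them aside.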
The recognition code is as follows:
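function img_number()
% Recognize the digits in a fixed region of one screenshot by comparing the
% x/y projection profiles of each digit with stored templates.
close all;
%---------------Read features---------------
% Row jj of x_mat/y_mat holds the profile of digit jj-1 (column 1 is not used here)
x_mat = load('../feature/x_project.txt');
y_mat = load('../feature/y_project.txt');
%---------------Preprocess---------------
I = imread('../img/2016011920.bmp');
I = I(612:625, 900:946, :);              % crop the region holding the number
I_int = rgb2gray(I);
I_int = ~im2bw(I_int, 0.5);              % binarize and invert: digit pixels = 1
imshow(I_int)
%---------------Segment---------------
% Pad with zeros so that digits touching the border are still detected
x_on = [0, sum(I_int, 1) ~= 0, 0];       % project to the x axis
num_x_start = strfind(x_on, [0 1]);      % first column of each digit
num_x_end = strfind(x_on, [1 0]) - 1;    % last column of each digit
y_on = [0, sum(I_int, 2)' ~= 0, 0];      % project to the y axis
num_y_start = strfind(y_on, [0 1]);
num_y_end = strfind(y_on, [1 0]) - 1;
row_top = num_y_start(1);                % top of the text line
row_bottom = num_y_end(end);             % bottom of the text line
%---------------Match---------------
fig_num = numel(num_x_start);            % how many digits in the image
fig_stdwidth = 6;                        % must equal the template width
fig_stdheight = row_bottom - row_top + 1; % must equal the template height
for ii = 1:fig_num
    % Normalize the digit to the standard size, then take its projections
    I_std_ele_fig = imresize(I_int(row_top:row_bottom, num_x_start(ii):num_x_end(ii)), ...
        [fig_stdheight fig_stdwidth]);
    std_x_pro = sum(I_std_ele_fig, 1);
    std_y_pro = sum(I_std_ele_fig, 2)';
    for jj = 1:10
        % Accept template jj if both sums of squared differences are below 10
        if sum((std_x_pro - x_mat(jj, 2:end)).^2) < 10 && ...
                sum((std_y_pro - y_mat(jj, 2:end)).^2) < 10
            fprintf('%d', jj - 1);       % print the recognized digit
        end
    end
end
fprintf('\n');
end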
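For comparison, here is a hedged Python sketch of steps 3 and 4 under a few assumptions: glyphs are small 0/1 lists of rows, resizing uses nearest-neighbour sampling (MATLAB's imresize defaults to bicubic interpolation), and the threshold of 10 is simply copied from the MATLAB code. One deliberate difference: instead of printing every template that passes the threshold, it returns the single closest one, because two similar digits can both pass.

```python
# Sketch of steps 3-4: fixed-size projection features + sum-of-squared-differences
# matching. Nearest-neighbour resizing is an assumption, not what imresize does.

def resize_nearest(img, height, width):
    """Nearest-neighbour resize of a list-of-rows binary image."""
    h, w = len(img), len(img[0])
    return [[img[r * h // height][c * w // width] for c in range(width)]
            for r in range(height)]


def features(img, height, width):
    """x and y projection profiles of the glyph after resizing."""
    std = resize_nearest(img, height, width)
    x_pro = [sum(col) for col in zip(*std)]
    y_pro = [sum(row) for row in std]
    return x_pro, y_pro


def ssd(a, b):
    """Sum of squared differences between two equal-length profiles."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def recognize(glyph, templates, height, width, threshold=10):
    """Label of the closest template passing the threshold, or None.

    templates maps a label (e.g. 0-9) to its (x_pro, y_pro) features (step 3).
    """
    x_pro, y_pro = features(glyph, height, width)
    best_label, best_score = None, None
    for label, (tx, ty) in templates.items():
        dx, dy = ssd(x_pro, tx), ssd(y_pro, ty)
        if dx < threshold and dy < threshold:
            score = dx + dy
            if best_score is None or score < best_score:
                best_label, best_score = label, score
    return best_label


if __name__ == "__main__":
    one = [[0, 1, 0], [1, 1, 0], [0, 1, 0], [0, 1, 0], [1, 1, 1]]
    seven = [[1, 1, 1], [0, 0, 1], [0, 1, 0], [0, 1, 0], [0, 1, 0]]
    templates = {1: features(one, 5, 3), 7: features(seven, 5, 3)}
    # A "1" drawn at twice the size: every pixel doubled in both directions.
    big_one = [[v for v in row for _ in range(2)] for row in one for _ in range(2)]
    print(recognize(big_one, templates, 5, 3))   # 1
```

In this toy example the "7" template also passes the threshold for the enlarged "1" (x and y distances of 3 and 9), which is exactly the case where picking the best match avoids reporting two digits.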
Here is a Quora answer about the slab ocean:
Slab ocean models consider this whole upper layer as one single body of water, or a 'slab' with one single velocity vector (u,v), and density. The Navier-Stokes equation is modified accordingly to compute the resultant velocity of that slab based on wind forcing and Coriolis force. The mixed-layer depth in such models is usually kept fixed.
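To make that description concrete, a minimal form of the slab momentum balance could be written as below. The symbols are introduced here as assumptions, not taken from the answer: u and v are the slab velocity components, f is the Coriolis parameter, τ_x and τ_y are the wind-stress components, ρ is the water density and h the fixed mixed-layer depth. Advection, horizontal pressure gradients and friction at the base of the slab are neglected.

$$
\frac{\partial u}{\partial t} - f\,v = \frac{\tau_x}{\rho h}, \qquad
\frac{\partial v}{\partial t} + f\,u = \frac{\tau_y}{\rho h}
$$

Setting the time derivatives to zero gives the steady state u = τ_y / (ρ h f), v = -τ_x / (ρ h f): the slab moves at right angles to the wind, to its right in the Northern Hemisphere (f > 0). Without friction, the full solution oscillates around this state at the inertial frequency f.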
Here is an FAQ about the slab ocean model (SOM). Two things to note:
1. SOM is a mode of the data ocean model, so an F compset can be switched to it through a variable in env_run.xml.
2. Some standard SOM forcing files are available in the inputdata repository.
Now we can start testing the slab ocean, using the compset E_1850_CAM5.
The following error comes up:
svn: URL 'https://svn-ccsm-inputdata.cgd.ucar.edu/trunk/inputdata/ocn/docn7/SOM/UNSET' non-existent in that revision
Something must be wrong with the ocean forcing-file setting: the URL ends in UNSET, meaning no SOM forcing file has been specified. Browsing to the SOM directory itself works without problems.
We need to set the specific forcing file in env_run.xml (see the FAQ).
For this test we chose pop_frc.b.c40.B1850CN.f19_g16.100105.nc.
In env_run.xml, giving just the file name is enough to continue.
Once the file has been downloaded, the case can be built and submitted. It still runs without problems even after changing the physics to cam5 -chem none.
Perfect!
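Out of the blue I wondered whether there is any way to know if an email I've sent has been read. After a quick search, I found people saying it can be done by embedding an image whose URL carries a parameter. Remembering my earlier IP-checking and PHP auto-mailing scripts, it struck me that I could build a whole system around this: at a fixed time every day, count how many of the emails sent that day have been opened, then generate a report and mail it to myself. When I cast a wide net contacting potential supervisors in the future, this will be a killer tool. LOL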
No sooner said than done:
First, test having PHP return an image.
Following this post, bingo: it turns out all it takes is changing the HTTP header.
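The original script is PHP. As a language-neutral sketch of the same idea, the Python standard-library server below answers any GET request with a 1x1 transparent GIF and sets the Content-Type header to image/gif, which is what makes a browser or mail client treat the response as an image. The handler name and the default port are illustrative assumptions.

```python
# Sketch: serve a tracking image by sending image bytes with an image Content-Type.
from http.server import BaseHTTPRequestHandler, HTTPServer

# A 43-byte transparent 1x1 GIF (GIF89a, one transparent pixel).
PIXEL_GIF = (b"GIF89a\x01\x00\x01\x00\x80\x00\x00\x00\x00\x00\xff\xff\xff"
             b"!\xf9\x04\x01\x00\x00\x00\x00,\x00\x00\x00\x00\x01\x00\x01\x00"
             b"\x00\x02\x02D\x01\x00;")


def pixel_headers():
    """Headers that make the client treat the body as a GIF image."""
    return [
        ("Content-Type", "image/gif"),
        ("Content-Length", str(len(PIXEL_GIF))),
        # Discourage caching so that each open triggers a fresh request.
        ("Cache-Control", "no-store, no-cache, must-revalidate"),
    ]


class PixelHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        for name, value in pixel_headers():
            self.send_header(name, value)
        self.end_headers()
        self.wfile.write(PIXEL_GIF)


def serve(port=8000):
    """Run the pixel server until interrupted (port is an arbitrary choice)."""
    HTTPServer(("", port), PixelHandler).serve_forever()
```

Calling serve() and opening http://localhost:8000/ in a browser should show a blank page containing one transparent pixel. Keep in mind that many mail clients block remote images by default and some webmail services fetch them through their own proxy, so a missing hit does not prove an email went unread.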
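Next, add detection of the visitor's IP. I simply copied the program I wrote in 2013 to detect IPs accessing the server of the coastal automatic weather stations, which looks each IP up on IP138 and records the result.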
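Surprisingly, it just worked! Then attach a parameter to the PHP file's URL to mark which email each request belongs to, and that's it.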
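A sketch of the per-email parameter and IP logging follows, again in Python rather than the original PHP. The parameter name mail, the log file name and the tab-separated format are hypothetical choices, and the IP138 location lookup from the original script is not reproduced.

```python
# Sketch: read a hypothetical "mail" query parameter and log timestamp, id and IP.
import datetime
from urllib.parse import parse_qs, urlparse


def mail_id_from_path(path):
    """Value of the (hypothetical) 'mail' query parameter, or None if absent."""
    values = parse_qs(urlparse(path).query).get("mail")
    return values[0] if values else None


def log_record(mail_id, client_ip, when):
    """One tab-separated log line: ISO timestamp, email id, client IP."""
    return f"{when.isoformat(timespec='seconds')}\t{mail_id}\t{client_ip}\n"


def record_open(path, client_ip, log_file="opens.log"):
    """Append a log line if the request carries an email id; return the id."""
    mail_id = mail_id_from_path(path)
    if mail_id is not None:
        with open(log_file, "a", encoding="utf-8") as f:
            f.write(log_record(mail_id, client_ip, datetime.datetime.now()))
    return mail_id
```

Inside a BaseHTTPRequestHandler.do_GET, the arguments would be self.path and self.client_address[0]. Each outgoing email would then embed an image such as <img src="http://example.com/pixel.gif?mail=prof-a-0124">, where the host and the id are placeholders.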
Finally, set up crontab to mail the log to myself at a fixed time every day. Bingo! Then write a viewing page that can be checked at any time. Done!
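As a final sketch, the daily summary could look like this, assuming a tab-separated log of timestamp, email id and IP (a hypothetical format, not necessarily what the original PHP writes). It counts the emails whose tracking image was fetched on a given day and builds the report as an email.message.EmailMessage; actually sending it, for example with smtplib.SMTP(...).send_message(msg), depends on the server's mail setup and is left out.

```python
# Sketch: summarize one day's opens from the log and build a plain-text report.
import datetime
from collections import Counter, defaultdict
from email.message import EmailMessage


def summarize(lines, day):
    """Map email id -> (open count, sorted distinct IPs) for entries on `day`."""
    counts = Counter()
    ips = defaultdict(set)
    prefix = day.isoformat()               # e.g. "2016-01-24"
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 3 or not parts[0].startswith(prefix):
            continue                        # skip malformed lines and other days
        _, mail_id, ip = parts
        counts[mail_id] += 1
        ips[mail_id].add(ip)
    return {m: (counts[m], sorted(ips[m])) for m in counts}


def build_report(summary, day, sender, recipient):
    """Plain-text report as an EmailMessage, ready for SMTP.send_message."""
    msg = EmailMessage()
    msg["Subject"] = f"Email opens on {day.isoformat()}: {len(summary)} opened"
    msg["From"] = sender
    msg["To"] = recipient
    body = [f"{m}\t{n} open(s)\t{', '.join(addrs)}"
            for m, (n, addrs) in sorted(summary.items())]
    msg.set_content("\n".join(body) or "No emails were opened.")
    return msg


if __name__ == "__main__":
    sample = ["2016-01-24T09:30:00\tm1\t1.2.3.4\n",
              "2016-01-24T10:00:00\tm1\t5.6.7.8\n"]
    report = build_report(summarize(sample, datetime.date(2016, 1, 24)),
                          datetime.date(2016, 1, 24), "me@example.com", "me@example.com")
    print(report["Subject"])   # Email opens on 2016-01-24: 1 opened
```

Wrapped in a small script that reads the log file and sends the message, a crontab entry such as 0 22 * * * python3 /path/to/daily_report.py would run it every day at 22:00; the script path is a placeholder.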