Blog

For the blog index, please go here.

For the 2020 blog, please go here.

Subscribe to the blog in RSS here.

The Weekly Bitcoin Dump

[linkstandalone]

There's a long-standing belief that crypto dumps over the weekend, one reinforced by prices over the last few days. If systematic highs and lows do exist, the market should adapt, causing the timing of the weekly dump to drift over time. Fortunately, it is easy to check this belief against historical price data. Hourly closing Bitcoin prices on Binance between August 2017 and February 2021 are available from cryptodatadownload.com. Using this data, I expressed each hourly price as a percentage of that week's maximum. Averaging over all the weeks in a year then gives an indication of when during the week Bitcoin is at its lowest.
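
The post doesn't show the code for this step, so here is a minimal pandas sketch of the calculation described above; the file and column names (Binance_BTCUSDT_1h.csv, date, close) are assumptions and may differ from the actual cryptodatadownload.com export.

import pandas as pd

# load hourly Binance closes; the file name and column names are assumptions
# and may differ from the actual cryptodatadownload.com export
df = pd.read_csv("Binance_BTCUSDT_1h.csv", parse_dates=["date"])
df = df.sort_values("date").set_index("date")

# label each hour by its ISO week and by its position within the week
iso = df.index.isocalendar()
df["week"] = iso["year"].astype(str) + "-W" + iso["week"].astype(str)
df["hour_of_week"] = df.index.dayofweek * 24 + df.index.hour  # Monday 00:00 = 0

# express each hourly close as a percentage of that week's maximum
df["pct_of_weekly_max"] = 100 * df["close"] / df.groupby("week")["close"].transform("max")

# average over all weeks in a year to see when Bitcoin is typically lowest
profile = df.groupby([df.index.year, "hour_of_week"])["pct_of_weekly_max"].mean()
print(profile.groupby(level=0).idxmin())  # lowest average hour of the week, per year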

bitcoin prices
Average hourly Bitcoin prices during the week, as a percentage of the weekly maximum.

As seen in the above figure, Bitcoin is generally at its lowest and highest during the weekend. Between 2017 and 2021, the weekly low moved from Sunday to Friday and back to Sunday. However, over the first months of 2021 the weekly dump has occurred on Monday morning. Conversely, the best time to sell has drifted from Saturday to Sunday.

bitcoin price trends
A: A 5th degree polynomial fit to Bitcoin prices. B: Detrended Bitcoin prices.

However, it could be argued that this data only reflects the average linear trend in Bitcoin prices over the week. In years where the price is steadily increasing, one should expect the Saturday price to exceed that of the preceding Sunday. To correct for this bias, I detrended the price by fitting a 5th-degree polynomial. As seen in the above figure, this trend was subtracted from the actual price, so that the detrended price fluctuates around an average of $9900.
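
Again, a minimal sketch of the detrending step, continuing from the df data frame above; numpy's polyfit and polyval stand in here, since the post doesn't say which fitting routine was actually used.

import numpy as np

# fit a 5th-degree polynomial to the full hourly price series; t is scaled to
# [0, 1] to keep the polynomial fit numerically well conditioned
t = np.linspace(0, 1, len(df))
trend = np.polyval(np.polyfit(t, df["close"].to_numpy(), 5), t)

# subtract the trend and add back the overall mean, so the detrended series
# fluctuates around the average price rather than around zero
df["detrended"] = df["close"] - trend + df["close"].mean()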

detrended bitcoin prices
Detrended average hourly Bitcoin prices during the week, as a percentage of the weekly maximum.

The same hourly percentages of the weekly maximum were calculated using this detrended data. As seen above, prices have still tended to pump on Saturday since 2017. However, this is usually preceded by a dump on Friday morning. More recently, Bitcoin prices have tended to dump on Monday, rather than over the weekend. Therefore, the belief that crypto dumps over the weekend does not hold true every year.

Wed, 2021/02/24 17:51:32 +0000

Pulling User Data from Fosstodon

[linkstandalone]

I stopped using Facebook a while ago. Aside from the issues with privacy and free expression, it was a time sink, and the news feed only served as a source of unnecessary anger. However, I have been using the Mastodon instance Fosstodon lately. It's largely devoid of international news, I enjoy the community, and it has a set of rules that suits my preferences.

I'm interested in how Fosstodon operates, when its members are the most active, and how often they interact with one another. As such, I've been playing with the Mastodon API, in order to pull statuses and user information from the server. While I'm not much for Python, the Mastodon.py library is an easy tool for interacting with the API. It only requires registering an application and logging in to your server.


from mastodon import Mastodon

# create an application
Mastodon.create_app(
    'myapp',
    api_base_url = 'https://fosstodon.org',
    to_file = 'myapp_clientcred.secret'
)

# log in to our server
mastodon = Mastodon(
    client_id = 'myapp_clientcred.secret',
    api_base_url = 'https://fosstodon.org'
)
mastodon.log_in(
    'parker@pbanks.net',
    'aReallyGreatPassword',
    to_file = 'pytooter_usercred.secret'
)

After connecting to your instance, the mastodon.timeline() function pulls statuses from the home, local, or public timelines, or statuses with a given tag, starting from max_id. For me, each call returned the last 40 statuses, so I used the following loop to pull the last 50,000 toots from the local timeline, covering May 28, 2020 to February 7, 2021. Keep in mind there is a request limit of 300 per 5 minutes, so the time.sleep() function can be used to space out requests.


import time

# request statuses from the local timeline, starting at max_id
myFile = mastodon.timeline(timeline='local', max_id=105687600003001040, limit=40)
output = list(myFile)

# use the last status id as the starting point for the next request
myId = myFile[39]["id"]
for x in range(1249):
    time.sleep(1)  # space out requests to stay under the rate limit
    myFile = mastodon.timeline(timeline='local', max_id=myId, limit=40)
    myId = myFile[39]["id"]
    output.extend(myFile)

The end result is a list of dictionaries that contain the time and content of each status, tags, number of replies, boosts, etc. Also nested within is a dictionary detailing the associated account, creation date, total followers, etc., which I pulled into a separate list. Since I am only interested in the frequency and timing of toots, I also removed status tags, content, and other unnecessary information. Finally, I put each list into a pandas data frame and exported these to Excel files.


# put account information into a separate list
accounts = [d['account'] for d in output]

# remove unwanted fields
for d in output:
    for e in ['in_reply_to_account_id', 'spoiler_text', 'uri', 'favourited', 'reblogged',
              'muted', 'bookmarked', 'content', 'account', 'reblog', 'media_attachments',
              'mentions', 'emojis', 'card', 'poll', 'tags']:
        d.pop(e, None)
for d in accounts:
    for e in ['username', 'display_name', 'note', 'url', 'avatar', 'avatar_static',
              'header', 'header_static', 'last_status_at', 'emojis', 'fields']:
        d.pop(e, None)

# convert lists to data frames
import pandas
dfPosts = pandas.DataFrame(output)
dfAccounts = pandas.DataFrame(accounts)

# delete time zone from status and account creation dates
dfPosts['created_at'] = dfPosts['created_at'].astype(str).str[:-6]
dfAccounts['created_at'] = dfAccounts['created_at'].astype(str).str[:-6]

# cast status id to string to prevent rounding errors when exporting
dfPosts['id'] = dfPosts['id'].astype(str)
dfPosts.to_excel("foss50k.xlsx")
dfAccounts.to_excel("account50k.xlsx")
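
This wasn't part of the original script, but as a quick example of what the exported data can be used for, a few lines of pandas read the statuses back in and count toots per hour of day:

import pandas as pd

# read the exported statuses back in and count toots per hour of day,
# a first look at when Fosstodon members are most active
dfPosts = pd.read_excel("foss50k.xlsx")
dfPosts['created_at'] = pd.to_datetime(dfPosts['created_at'])
print(dfPosts['created_at'].dt.hour.value_counts().sort_index())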

CSV files containing the Fosstodon statuses from May 28, 2020 to February 7, 2021 and associated accounts are available here.

Sat, 2021/02/13 00:48:25 +0000

Scanning Negatives the Hard Way

[linkstandalone]

Lately I've been getting back into film photography. Some years ago I took a fair number of medium format pictures using a Holga pinhole camera. I never digitized any of my negatives, mostly due to the lack of a decent scanner. I still lack a proper scanner, having only a sub-par Brother model meant for documents. However, a few mathematical tricks may let me push it well past its intended potential.

scan of film negative
A poor-quality film scan.

As seen in my first scan, the main problem is that I can't control the exposure time on my scanner, so negatives end up much too dark. While some information can be recovered by correcting the contrast and gamma of each image, it is overwhelmed by the noise produced by the scanner. Moreover, I was lazy, so the film was obscured by dust, hairs, and fingerprints. Fortunately, the random noise can be removed by averaging multiple scans. However, I also wanted to try removing various sources of systematic noise (i.e., prints, hairs, film imperfections). Ultimately, I took 125 scans of the negative in four different positions on the scanner bed, each with its own imperfections. After reading each scan into Matlab, I corrected the contrast and gamma, then registered the images to align differences in position and rotation.


img = zeros(2692,5738,125);
for i = 1:125
    % read in scans and correct gamma and contrast limits
    img(:,:,i) = 1-mat2gray(double(imread(['Image_',num2str(i),'.bmp'])));
    img(:,:,i) = imadjust(img(:,:,i),[0.5, 1],[],10);
end

% register images, allowing for rotation and translation
mkdir Registered % directory for registered images
regImg = zeros(2692,5738,125);
[optimizer, metric] = imregconfig('monomodal'); % assume similar brightness and contrast
optimizer.MaximumIterations = 1000; % increase iterations
regImg(:,:,27)=img(:,:,27); % reference image

idx = [1:26, 28:125]; % register every scan except the reference
for i = idx
    regImg(:,:,i) = imregister(img(:,:,i), img(:,:,27), 'rigid', optimizer, metric);
end

meanImg = mean(regImg,3);
figure; % display mean image
imshow(meanImg,[]);

gamma and contrast corrected film scans
Noisy, gamma and contrast corrected scans.

The end result is pretty rough. However, we can remove the noise produced by the scanner by taking the average of each pixel across all scans.

averaged scan
The average of all 125 negative scans.

However, two sources of noise remain. First, the various hairs and fingerprints present in the different scans have to be removed. Second, regular vertical lines are present throughout the negative. While these would not be visible in a proper scan, the increased contrast has made them more apparent. The fingerprints will be difficult to address, but the lines are regular and can be removed by moving the image into the frequency domain.


ft=fftshift(fft2(meanImg));
amp=abs(ft);
figure;
imagesc(log(amp));
axis image;

amplitude spectrum
The amplitude spectrum of the average image.

As seen above, a strong horizontal line runs through the frequency spectrum of our image, representing the regular pattern of vertical lines. By removing these frequencies, we should remove the lines without affecting the rest of the image. So, we need a function that filters out frequencies within ±2° of vertical. As the lines are closely spaced, we don't need to filter out low frequencies, only those above 200 cycles/image.


function thefilter = ofilter(oLow,oHigh,dim);
    % a bandpass filter of size dim that passes orientations between oLow and oHigh
    % horizontal and vertical are 0 and 90 degrees
    t1 = ((0:dim-1)-floor(dim/2))*(2/(dim));
    [t1,t2] = meshgrid(t1,t1);
    a=atan2(t1,t2)*180/pi;
    t1=t1.*t1+t2.*t2;
    clear t2
    a=mod(a,180);
    oLow=mod(oLow,180);
    oHigh=mod(oHigh,180);
    thefilter=ones(size(t1));
    if oHigh>=oLow
        d=find((a<oLow | a>oHigh) & t1>0); % find out-of-band frequencies
    else
        d=find((a<oLow & a>oHigh) & t1>0); % find out-of-band frequencies
    end
    if ~isempty(d)
        thefilter(d)=zeros(size(d));
    end
end

function thefilter = bpfilter(lowf,highf,dim);
    % a bandpass filter of size dim that passes frequencies between lowf and highf
    dc=[round(dim/2)+1,round(dim/2)+1];
    thefilter=zeros(dim,dim);
    dx=1:dim;dx=dx-dc(1);
    for yk=1:dim
        dy=dc(2)-yk;
        d=sqrt(dx.*dx + dy*dy);
        val1=find((d > lowf)&(d < highf));
        thefilter(yk,val1)=ones(1,length(val1));
    end
end

function thefilter=filter2d(lowf,highf,orient1,orient2,dim);
    % combines ofilter and bpfilter to pass frequencies between lowf and highf
    % and orientations between orient1 and orient2
    thefilter=bpfilter(lowf,highf,dim);
    temp=ofilter(orient1,orient2,dim);
    thefilter=thefilter.*temp;
    return;
end

Given the function filter2d, we can apply it to each image to remove all frequencies above 200 cycles/image that are within 2° of vertical.


thefilter = filter2d(200,5738,88,92,5738);
thefilter = 1-thefilter(1524:4215,:);

for i=1:125
    ft=fftshift(fft2(regImg(:,:,i)));
    amp=abs(ft);
    phase=angle(ft);
    r=size(amp,1);c=size(amp,2);
    dc=[round(r/2)+1,round(c/2)+1];
    dcAmp=amp(dc(1),dc(2));
    amp = amp.*thefilter;
    ft = amp.*exp(sqrt(-1)*phase);
    filtImg(:,:,i)=real(ifft2(fftshift(ft)));
end

filtered image
Frequency filtered image.

Finally! We have removed the lines and are one step closer to having a clean picture. All that remains is to remove the various hairs and fingerprints present in each image. Fortunately, the position of these flaws varied between scans. I wiped down the negative, removing some prints and causing others to be deposited. Since the presence or absence of these flaws is independent of the image content, we can use independent component analysis (ICA) to isolate them and subtract them from our average image. The FastICA package is one of the best implementations of ICA.


filtImg=filtImg(59:2643,50:5678,:); % to avoid exceeding RAM capacity
[rows, columns, scans] = size(filtImg);
pcaArray = reshape(filtImg, rows * columns, scans)'; % reshape array for ICA
mu = mean(pcaArray,1); % the average image, ICA automatically subtracts this
[icasig, A] = fastica(pcaArray, 'lastEig', 8, 'numOfIC', 8); % take first 8 components
baseCase = mean(A(76:125,:),1); % scores for fingerprint-free images

% positive and negative parts of the first three components
ICAp1 = icasig(1,:); ICAp1(ICAp1 < 0) = 0;
ICAm1 = -icasig(1,:); ICAm1(ICAm1 < 0) = 0;
ICAp2 = icasig(2,:); ICAp2(ICAp2 < 0) = 0;
ICAm2 = -icasig(2,:); ICAm2(ICAm2 < 0) = 0;
ICAp3 = icasig(3,:); ICAp3(ICAp3 < 0) = 0;
ICAm3 = -icasig(3,:); ICAm3(ICAm3 < 0) = 0;

independent components
The first four independent components.

As seen above, ICA has isolated the fingerprints and hairs that obscured our image within the first three independent components. Since all scans had defects, the absence of a flaw in one image implies the presence of another elsewhere. Therefore, each component captures both the presence (positive values and lighter pixels) and absence (negative values and darker pixels) of various features. As we want an image that contains none of these flaws, we separate the positive and negative values of each component.

All that remains is to find the combination of component weights that removes all blemishes from the average image (mu). My approach was to generate two random sets of component weights, add each to the average image, and pick whichever result looked better. It is a bit hacky, but a few iterations resulted in an image that looked half decent.


WeightArr = [rand(1,2).* 0.026 - 0.013; ...
    rand(1,2).* 0.016 - 0.008; ...
    rand(1,2).* 0.0608 - 0.0304];
for i = 1:100
    tmpArr = [rand(1,2).* 0.026 - 0.013; ...
        rand(1,2).* 0.016 - 0.008; ...
        rand(1,2).* 0.0608 - 0.0304];
    figure('units','normalized','outerposition',[0 0 1 1]);
    subplot(2,1,1)
    imshow(reshape((mu+ICAp1.*WeightArr(1,1)+ICAp2.*WeightArr(2,1)+ICAp3 ...
    .*WeightArr(3,1)+ICAm1.*WeightArr(1,2)+ICAm2.*WeightArr(2,2)+ICAm3 ...
    .*WeightArr(3,2))',rows,columns),[]);
    subplot(2,1,2)
    imshow(reshape((mu+ICAp1.*tmpArr(1,1)+ICAp2.*tmpArr(2,1)+ICAp3 ...
    .*tmpArr(3,1)+ICAm1.*tmpArr(1,2)+ICAm2.*tmpArr(2,2)+ICAm3 ...
    .*tmpArr(3,2))',rows,columns),[]);
    drawnow;
    w = waitforbuttonpress;
    if w
        WeightArr = tmpArr;
    end
end

imwrite(uint16(mat2gray(reshape((mu+ICAp1.*WeightArr(1,1)+ICAp2.*WeightArr(2,1) ... 
+ICAp3.*WeightArr(3,1)+ICAm1.*WeightArr(1,2)+ICAm2.*WeightArr(2,2)+ICAm3 ...
.*WeightArr(3,2) )', rows, columns) ).*65535),'FinalImg.tif');

the final image
The final image, after filtering, ICA, and averaging.

And here it is, the final image. While it still could use a lot of work, the end result is a far cry from what my scanner initially produced. However, the main lesson is that I should probably buy a better scanner.

Sun, 2021/01/24 05:15:35 +0000

Clean Drinking Water

[linkstandalone]

The importance of clean groundwater is pretty universal; about half of the world's population uses it for drinking water. However, ensuring that groundwater is clean and salt-free usually requires installing a permanent monitoring well, a costly and often impractical process. Installation requires drilling equipment, the wells aren't cheap, and they often get destroyed.

Conversely, measuring the salt content of a soil sample is a fairly simple process. It's temporary, doesn't require much equipment, and you don't have to worry about landowners or animals destroying your handiwork. A while ago I published a paper on how to estimate the salt content in groundwater from soil samples. I've been learning JavaScript lately, so I made a tool for estimating groundwater quality from soil samples based on my findings. If you work in the environmental industry, hopefully you find it useful.

Thu, 2021/01/07 17:58:14 +0000