Welcome

Apologia, decisions, & consequences

Data Science Time Warp Machine

February 23rd, 2025 by L'ecrivain

Fedora 38 sometimes freezes and crashes when running GNOME on bare metal, possibly because of GNOME reliability issues. In a previous article I described creating a massive local repository of Fedora 38, and I still have it. I will not delete the 238 GB repository, because Fedora 40 is the last release with Python 2.7 in its repositories; Fedora 41 and later removed it completely. I wrote some software in Python 2.7 that may never make it to Python 3, because with the time I have available today, I would be an old man before I finished the conversion.

I had migrated from bare metal to WSL with Fedora 36 a few years ago. I created my own WSL instance from the Fedora 36 cloud-init image, upgraded it over the years to Fedora 38, and then stopped updating it. However, WSL crashes and cannot be relied upon for tasks that require many hours of continuous processing.

WSL really was wonderful for development and for running Linux applications that depend on underlying Linux features. I used it for development with PyCharm. The problem is that I would often return after 12 hours to a message saying the terminal could be closed with CTRL + D, which indicated that the service had stopped for some reason. I suspect this happened when available RAM conflicted with the /dev/shm shared-memory features of Linux, but troubleshooting it would take too long. I don’t trust the WSL releases from the Microsoft Store, because forced Windows updates can take features away or cause unexpected problems. I upgraded my desktop from Windows 11 Home to Windows 11 Pro specifically so I could disable automatic updates through group policies, service disablement, and registry modifications, methods that fail to stop automatic updates on Windows 11 Home.

To create a long-lived time capsule of sorts, I decided to switch from Fedora 38 to AlmaLinux 8. AlmaLinux 9 follows RHEL 9 in removing easy support for Python 2.

I set up AlmaLinux 8.10 (Cerulean Leopard) from the KDE live DVD and installed RStudio Server so I could access it through a web browser.

Edit /etc/dnf/dnf.conf and add keepcache=True to the [main] section so that downloaded packages stay in the local cache.

dnf install epel-release
dnf config-manager --set-enabled powertools
dnf install R
dnf install python2

Installing python2 also installs pip for Python 2.7, which is invoked with the pip2.7 command.

As a regular user, run the following for one of my scripts: parsedatetime changed after version 2.5, and later releases are not compatible with the script.

pip2.7 install parsedatetime==2.5 --user

• Install rstudio-2024.12.0+467-1.rpm from a direct download

• Install rstudio-server-rhel-2024.12.0-467.rpm from a direct download

systemctl enable rstudio-server

Configure the firewall to allow TCP port 8787, RStudio Server's default port.

usermod -a -G rstudio-server <username> 
setenforce 0

The last command, setenforce 0, switches SELinux to permissive mode until the next reboot. It is a stopgap until I can determine which specific rules need modification to let RStudio Server work. With SELinux enforcing under the initial configuration, the server cannot be reached remotely from a web browser.
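Because the symptom is that the server cannot be reached remotely, it can help to check whether TCP port 8787 accepts connections from the client machine at all. This is a minimal Python 3 sketch using only the standard socket module; `port_open` and the host name in the comment are hypothetical. A successful connection suggests the port is reachable through the firewall, but it does not prove the RStudio Server page will load, so the browser remains the final test.

```python
import socket


def port_open(host, port=8787, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, or unresolvable host: treat all as "not reachable".
        return False


# Usage from the client machine (host name is hypothetical):
# print(port_open("alma-desktop"))
```

Running it once with SELinux enforcing and once after setenforce 0 shows whether the change affects TCP reachability or only what happens after the connection is made.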

Posted in Computing Notes

T+n time series analysis

December 10th, 2021 by L'ecrivain

This isn’t stock advice; it’s a data and math project. AAPL is interesting from a data perspective (again, not investment advice!). Part of my big data project was building a tool that regressed every symbol on every other symbol at numerous lags. Besides revealing that the vendor mangles data for unknown purposes, that lag analysis produced at least one independent variable on which the regression of future AAPL yielded an R-squared of .3. That is a weak or low effect size, but it was a genuine discovery that passed hypothesis testing across time horizons. The variable was Direxion Daily S&P Oil & Gas Exp. & Prod. Bear 2X Shares (DRIP), the first independent variable I discovered through the lag analysis. The problem is that the lag analysis is very computationally intensive, and ferreting out the red herrings caused by bogus data afterward takes another large amount of time. My data provider appears to insert bogus data that produces R-squared values of .9 between different vectors. One can, of course, mitigate these problems with better code, but that, too, takes time. This isn’t a full write-up of that first statistically significant predictor for AAPL at t+3.
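The lag scan described above can be sketched in pure Python. This is a minimal illustration, not the project's tool: the names `r_squared` and `lag_scan` are hypothetical, and it assumes every series is already aligned on the same dates with no gaps. For each ordered pair of symbols and each lag, it regresses the target at t+lag on the predictor at t and records R-squared, so N symbols and L lags mean N(N-1)L regressions (for a hypothetical 1,000 symbols and 10 lags, 9,990,000 fits), which is why the full run is so expensive.

```python
from itertools import permutations


def r_squared(x, y):
    """R-squared of an ordinary least-squares fit of y on x (one predictor plus intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # R-squared is undefined for a constant series; report 0.0
    # With a single predictor, R-squared equals the squared Pearson correlation.
    return (sxy * sxy) / (sxx * syy)


def lag_scan(series, max_lag, min_obs=3):
    """Regress every symbol at t+lag on every other symbol at t.

    series: dict mapping symbol -> list of daily values, all aligned on the same dates.
    Returns (target, predictor, lag, r2) tuples, highest R-squared first.
    """
    results = []
    for target, predictor in permutations(series, 2):
        y_all, x_all = series[target], series[predictor]
        for lag in range(1, max_lag + 1):
            x = x_all[:-lag]  # predictor at t
            y = y_all[lag:]   # target at t+lag
            if len(x) < min_obs:
                continue
            results.append((target, predictor, lag, r_squared(x, y)))
    results.sort(key=lambda row: row[3], reverse=True)
    return results


# Toy data: B at t+2 is exactly twice A at t, so (B, A, 2) should rank first.
A = [1, 3, 2, 5, 4, 6, 2, 7]
B = [0, 0, 2, 6, 4, 10, 8, 12]
for row in lag_scan({"A": A, "B": B}, max_lag=3)[:3]:
    print(row)
```

Sorting by R-squared puts implausibly strong relationships at the top, which is also where vendor artifacts like the .9 values would surface first; a vectorized library would be much faster, but plain loops keep the pairing and lag alignment easy to read.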

Keeping t+ and t- straight causes me some difficulty. T+2 means today plus 2: if today is January 5, t+2 is January 7, and t+3 is January 8. Programming in zero-indexed languages has produced an impulse to count from 0 instead of 1. In the list [5, 6, 7, 8], with today at index 0, t+3 is the element at index 3 (8), while the third element, at index 2, is 7. This is a heavily ingrained impulse that I must both use and mitigate. This project used Python, R, and shell scripting.
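The off-by-one trap can be made concrete with a short Python sketch. The helper names are hypothetical and the year 2021 is arbitrary (the post only gives January dates). It counts t+n in calendar days, as the January example does, and also shows a caveat that is an assumption beyond the post: if a series holds trading sessions rather than calendar days, t+n means n rows ahead, which can differ from n calendar days across a weekend.

```python
from datetime import date, timedelta


def t_plus_calendar(today, n):
    """t+n counted in calendar days, matching the January 5 -> January 7 example."""
    return today + timedelta(days=n)


def t_plus_rows(sessions, i, n):
    """t+n counted in rows of a date-ordered series: row i is t, so row i + n is t+n."""
    return sessions[i + n]


today = date(2021, 1, 5)  # the year is arbitrary
print(t_plus_calendar(today, 2))  # 2021-01-07
print(t_plus_calendar(today, 3))  # 2021-01-08

# Zero-indexed days of the month; index 0 is today (t).
days = [5, 6, 7, 8]
print(days[3])  # 8: index 3 is t+3
print(days[2])  # 7: the third element is t+2

# Hypothetical trading sessions; January 9 and 10, 2021 fall on a weekend.
sessions = [date(2021, 1, 5), date(2021, 1, 6), date(2021, 1, 7),
            date(2021, 1, 8), date(2021, 1, 11)]
print(t_plus_rows(sessions, 0, 4))  # 2021-01-11
print(t_plus_calendar(today, 4))    # 2021-01-09, a Saturday with no data
```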

The first one-factor model devised using DRIP was as follows:

$$-2.065\,(\text{DRIP Typical \%change}) - 0.0040 = \text{AAPL Typical \%change}_{t+3}$$
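To make the formula concrete, here is a small Python sketch that evaluates it. The post does not define "Typical", so this ASSUMES the common typical-price convention (high + low + close) / 3, ASSUMES changes are fractions (0.01 = 1%), and ASSUMES DRIP is measured at t; the function names and the price bars are hypothetical. It only evaluates the fitted equation and does not reproduce the Stata testing.

```python
def typical_price(high, low, close):
    """Typical price as (high + low + close) / 3; ASSUMED meaning of "Typical"."""
    return (high + low + close) / 3


def pct_change(previous, current):
    """Fractional change, e.g. 0.01 for a 1% rise (units ASSUMED, not stated in the post)."""
    return current / previous - 1


def predict_aapl_typical_change_t3(drip_typical_change):
    """The one-factor model: -2.065 * DRIP change - 0.0040, read as AAPL at t+3."""
    return -2.065 * drip_typical_change - 0.0040


# Hypothetical DRIP bars for yesterday and today (made-up values, not real prices).
drip_prev = typical_price(11.0, 9.0, 10.0)    # 10.0
drip_today = typical_price(13.0, 10.0, 10.0)  # 11.0
change = pct_change(drip_prev, drip_today)    # about 0.10
print(predict_aapl_typical_change_t3(change))
```

With these made-up bars the DRIP change is 0.10, so the model gives -2.065 × 0.10 - 0.0040 = -0.2105 for AAPL at t+3 (printed with minor floating-point noise). A DRIP change of zero leaves only the intercept, -0.0040.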

This model was tested in Stata using data manipulated in Python and R. The formula may be of no use in the future, or even now. I might revisit it later when discussing the lags portion of the project.

Posted in Computing Notes