Minecraft Tweaks

March 16th, 2023 by L'ecrivain

To launch the server with Aikar's tuned JVM flags (a 2 GB heap and the G1 garbage collector):

java -Xms2048M -Xmx2048M --add-modules=jdk.incubator.vector \
  -XX:+UseG1GC -XX:+ParallelRefProcEnabled -XX:MaxGCPauseMillis=200 \
  -XX:+UnlockExperimentalVMOptions -XX:+DisableExplicitGC \
  -XX:+AlwaysPreTouch -XX:G1HeapWastePercent=5 \
  -XX:G1MixedGCCountTarget=4 -XX:InitiatingHeapOccupancyPercent=15 \
  -XX:G1MixedGCLiveThresholdPercent=90 \
  -XX:G1RSetUpdatingPauseTimePercent=5 -XX:SurvivorRatio=32 \
  -XX:+PerfDisableSharedMem -XX:MaxTenuringThreshold=1 \
  -Dusing.aikars.flags=https://mcflags.emc.gs -Daikars.new.flags=true \
  -XX:G1NewSizePercent=30 -XX:G1MaxNewSizePercent=40 \
  -XX:G1HeapRegionSize=8M -XX:G1ReservePercent=20 \
  -jar server-1.21.1.jar nogui
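Since Java 9 the launcher can read options from an argument file, which keeps a start script short. A sketch, assuming the flags above are saved to a file named server.flags and the server lives in /opt/minecraft (both names hypothetical):

#!/bin/sh
# Hypothetical start.sh: the long flag list lives in server.flags,
# one or more options per line, exactly as written above.
cd /opt/minecraft || exit 1
exec java @server.flags -jar server-1.21.1.jar nogui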


T+n time series analysis

December 10th, 2021 by L'ecrivain

This isn't stock advice; it's a data and math project. AAPL is interesting from a data perspective. Part of my big data project was the creation of a tool to analyze every symbol regressed on every other symbol for numerous lags. In addition to revealing the vendor's mangling of data for unknown purposes, that analysis of lags produced at least one independent variable upon which the regression of future AAPL produced a 0.3 R-squared value. That is a weak or low effect size, but it was a genuine discovery which passed hypothesis testing across time horizons. That variable was Direxion Daily S&P Oil & Gas Exp. & Prod. Bear 2X Shares (DRIP), the first independent variable I discovered through the lag analysis. The problem is that the lag analysis is very computationally intensive. Then, after doing that, ferreting out the red herrings caused by bogus data takes another large amount of time. It seems that my data provider inserts bogus data producing 0.9 R-squared values between different vectors. These are of course problems that one can mitigate with better code, but that too takes time. This isn't a document about that first statistically significant predictor for AAPL at t+3.

Keeping the t+ and t- notation in mind causes some difficulty for me. T+2 means today plus 2: if today is January 5, t+2 is January 7, and if we are analyzing t+3, that is January 8. Years of programming in zero-indexed languages produce an inner impulse to count from 0 instead of 1, and reaching for "the third element" of the zero-indexed list [5, 6, 7, 8] lands on index 2, which is 7, while t+3 really lives at index 3, which is 8. This is a heavily ingrained impulse that I must both use and mitigate. This project used Python, R, and shell scripting.
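A quick Python illustration of the counting trap (a minimal sketch with hypothetical dates):

from datetime import date, timedelta

today = date(2021, 1, 5)
print(today + timedelta(days=3))   # 2021-01-08: t+3 is January 8

days = [5, 6, 7, 8]                # January days, with today at index 0
print(days[3])                     # 8 -- correct: t+3 lives at index 3
print(days[2])                     # 7 -- the off-by-one "third element" impulse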

The first one-factor model devised using DRIP was as follows:

-2.065(DRIP Typical %change) - 0.0040 = AAPL Typical %change(t+3)

This model was tested in Stata using data manipulated in Python and R. The formula may be of no use in the future, or even now. I might revisit this when discussing the lags portion of the project later.
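For reference, a minimal sketch of that regression in Python with statsmodels (the file and column names are hypothetical; the original testing was done in Stata):

import pandas as pd
import statsmodels.api as sm

# Hypothetical CSV with daily typical %change columns for AAPL and DRIP.
df = pd.read_csv("typical_pct_change.csv")
df["aapl_t3"] = df["aapl"].shift(-3)   # lead AAPL by three trading days
df = df.dropna()

model = sm.OLS(df["aapl_t3"], sm.add_constant(df["drip"])).fit()
print(model.params)      # cf. the -0.0040 intercept and -2.065 slope
print(model.rsquared)    # cf. the roughly 0.3 R-squared reported above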


Stata reference material

October 11th, 2021 by L'ecrivain

"Linear regression analysis using Stata" contains a look at Stata output for regression. There is also an excellent document on Regression Analysis | Stata annotated output at this link, and an excellent document on Basic Data Analysis and Manipulation at this link. Y(t-1) is the first lag of Y(t) (Chapter 10, p. 1).
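As a reminder of how that notation appears in practice, Stata's time-series operators write lags directly (a minimal sketch, assuming the data have been tsset on a date variable):

tsset date
generate y_lag1 = L1.y    // Y(t-1), the first lag of Y(t)
regress y L1.x            // regress Y(t) on the first lag of X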


Rudimentary versioning system

October 19th, 2020 by L'ecrivain

Some time ago, Python 2 was the default language for use with Linux and Gnome 3. A set of extensions for Gnome called Nautilus Python allowed one to create customized right-click menus. One of these was called "Historical Copy," and it created a lovely copy of the file with a timestamp inserted into the file name. The timestamp was constructed so that the files appear neatly sorted when perusing directories. Software rot affects all software, especially software that requires Python 2 libraries that maintainers no longer ship on new versions of Linux. To counter this problem, we have a rudimentary versioning system which adheres to the Keep It Sweetly Simple (KISS) principle.

The following code is appended to the .bashrc file in the home directory.

### Historical copy rudimentary versioning system
### Saves a historical copy and a note about that copy

historicalcopy() {
    mkdir -p local-historical-versions &&
    timestamp=$(date +"%y.%m.%d.%H%M") &&
    read -p "Enter filename: " n &&
    read -p "Enter note: " note &&
    filename="$(pwd)/$n" &&
    file_name=$(basename "$filename") &&
    left_side="${file_name%.*}" &&
    extension="${file_name##*.}" &&
    cp "$filename" \
        "local-historical-versions/${left_side}-${timestamp}.${extension}" &&
    cp "$filename" \
        "/var/www/html/hv/archives/${left_side}-${timestamp}.${extension}" &&
    sed -i "11i${left_side}-${timestamp}.${extension} ${note}" \
        /var/www/html/hv/index.php
}

From the directory with the file to be versioned, the command historicalcopy is typed. This creates a directory in the current directory called local-historical-versions and copies the historical version into it. It then copies the file to a complete historical-versions archive and inserts a line into a PHP file in the web-server directory with both a link and the note. The reason periods are used instead of dashes is that experience demonstrated my Python software had difficulty with filenames incorporating dashes. This naming style is similar to the RPM convention of name-version-release; this rudimentary versioning system uses a timestamp as the version number.

There are plenty of more advanced systems such as Git, but sometimes we can work more efficiently with a simple, direct historical list of what file did what way back when. This could easily be changed so that an HTML file is updated in a local directory instead of a PHP file on a web server. The PHP file is for future use incorporating user authentication and a long-term code repository. The code can then be used on OS X and Windows Subsystem for Linux by pulling the PHP file to the local machine, inserting the necessary line, and then transferring it back to the web server via SSH. In a way that resembles Git, but this is for the use case where one wants a simple, easy-to-use list.

A typical workflow goes something like this. Open a terminal and navigate to the directory containing a heavily evolved R script. Use the historicalcopy command on that script with a note such as "prior to adding the new data frame for time series data on Kentucky unemployment". Then open the file in the editor of choice to work on it. The command is very useful for Python programs where huge changes can take place that require significant removal of existing code. This is the case for one of my ongoing projects involving thousands and thousands of lines of code that has evolved over four years. This simple scheme lets me remember which files had the important code that I still want to use in the future. The flat-file format and easy naming convention allow easy migration and backups, and reduce the learning curve.
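A hypothetical session (directory, file name, and note invented for illustration):

$ cd ~/projects/unemployment
$ historicalcopy
Enter filename: model.R
Enter note: prior to adding the new data frame for Kentucky unemployment
$ ls local-historical-versions
model-24.03.16.0915.R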



Persistent Notification in Gnome 3

August 15th, 2020 by L'ecrivain

This is a GTK notification in Linux that remains on screen until it is clicked.

#!/usr/bin/python3

import gi
gi.require_version('Notify', '0.7')
from gi.repository import Notify

Notify.init("Hello world")
Hello = Notify.Notification.new("Hello world - the heading",
                                "This is an example notification.",
                                "dialog-information")

# https://www.devdungeon.com/content/desktop-notifications-linux-python
# Critical urgency keeps the notification on screen until dismissed.
Hello.set_urgency(Notify.Urgency.CRITICAL)
Hello.show()

Last updated and tested 12 December 2020 on CentOS 8 (not Stream).


Labeling variables in R

February 2nd, 2020 by L'ecrivain

This procedure makes it easy to remember what variables in R relate to. One of the troubles with exploratory data analysis is that when one has many variables, it can become confusing what a variable was originally created for. Code comments can help, but they make files larger and unwieldy in some cases. One solution is to add comment fields to the objects themselves so that we can query an object and see a description. For example, we could create a time series called sales_ts, then create a window of it called sales_ts_window_a, another called sales_ts_window_b, and so on for several unique spans of time. As we move through the project, we create numerous other variables and subsets of those variables. We can inspect them with head() or tail(), but that may not be a useful or clear measure of what they are for.

To that end, these code segments allow applying a descriptive comment to an item and then querying that comment later via a describe command.

example_object <- "I appreciate r-cran."
# This adds a describe attribute/field to objects that can be queried.
# Could also change to some other attribute/Field other than help.
describe <- function(obj) attr(obj, "help")
# to use it, take the object and modify the "help" attribute/field.  
attr(example_object, "help") <- "This is an example comment field."
describe(example_object)

The above example refers to an example object, which could easily be the sales_ts_window_a mentioned above. So we would use the attr command to apply our description to sales_ts_window_a.

attr(sales_ts_window_a, "help") <- "Sales for the three quarters Jan was manager"
attr(sales_ts_window_b, "help") <- "Sales for the five quarters Bob was manager"

After hours or days have passed and there are many more variables under investigation, a simple query reveals the comment.

describe(sales_ts_window_a)
[1] "Sales for the three quarters Jan was manager"

This might seem burdensome, but RStudio makes it easy via code snippets. We can create two snippets. The first goes at the top of the file and defines the describe function used to read the field to which the comment is applied. Open RStudio Settings > Code > Code Snippets and add the following code. RStudio requires tabs to indent these snippets.

snippet lblMaker
        #
        # Code and Example for Providing Descriptive Comments about Objects
        # 
        example_object <- "I appreciate r-cran."
        # This adds a describe attribute/field to objects that can be queried.
        # Could also change to some other attribute/Field other than help.
        describe <- function(obj) attr(obj, "help")
        # to use it, take the object and modify the "help" attribute/field.  
        attr(example_object, "help") <- "This is an example comment field."
        describe(example_object)

snippet lblThis
        attr(ObjectName, "help") <- "Replace this text with comment"

Now one can use code completion to add the label maker to the top of the script: simply start typing lblMak and hit the Tab key to complete the snippet. When you want to label an object for future examination, start typing lblTh, hit Tab to complete it, then replace ObjectName with the variable name and the string on the right with the comment. These code snippets provide a valuable way to store descriptive information about variables as they are created and set aside for potential future use.

This functionality overlaps with R's built-in comment() function, with a bit of a twist. The description added via this method appears at the end of the print output when typing the variable name, whereas the description set by comment() does not print. comment() is also less intuitive than calling describe() and receiving a description.

A describe command is available in R through packages, but it often is not useful. summary() is the one I use most often; for a good description, I import the psych package and use psych::describe(data). Because of that, repurposing describe as in this article costs nothing and is very useful. The printout appears like below with the [1]:

example_object
[1] "I appreciate r-cran."
attr(,"help")
[1] "This is an example comment field."

Adding attributes other than "help" could easily be accomplished: describeAuthor, describeLocation, and other functions could be added. When using a console to program, a conversational style makes the work flow better.
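A minimal sketch of those variants, following the same pattern as describe above (the attribute names are illustrative):

describeAuthor   <- function(obj) attr(obj, "author")
describeLocation <- function(obj) attr(obj, "location")

attr(example_object, "author") <- "L'ecrivain"
describeAuthor(example_object)
[1] "L'ecrivain"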


My Favorite Function

May 25th, 2019 by L'ecrivain

My favorite function of all time is varsoc in Stata. That's saying a lot, because I have been working with computers for decades and have written software in several languages, used many different types of administrative software tool sets, and owned a lot of books with code in them. Varsoc regresses one variable, y, upon another variable, x, and then regresses each lag of y on x to produce output that lets one know the best-fit lag for a regression model. It allows someone analyzing time series data to immediately know that data from several periods prior is a better predictor of today's reality than more recent data. I adore Stata for scientific analysis. In order to use this for my big data project, I needed to automate it, so I wrote an R vignette that analyzes 45 lags and produces the relevant test statistics. My vignette produces R-squared values[1], parameter estimates, and F-statistics for 45 lags of y regressed on x. The p-values are then written to a CSV file. The decision rule for a p-value is that we reject the null hypothesis if the p-value is less than or equal to α/2.[2] The data comes from 5 GB of CSV files that were created via Python.

Running the lags shows us the relationships between the historical prices of two securities. When we regress y on x in this case, we are regressing the price of security 2 on security 1, then doing so on a lag: the L1 of security 2 regressed on security 1's L0, then L2 of security 2 on security 1's L0, and so on for 45 iterations. For example, we might find that the price of a gold ETF 44 days ago has a stronger relationship with the price of Apple stock today than the price of that same gold ETF 12 days ago, or even today. That is an example only and not anything substantiated in the data. There will certainly be some spurious relationships; an ETF buying shares of Apple and then the same ETF's fee going up the next month, for example. To mitigate this, the vignette uses the first difference of the logarithm so that the data is stationary, and the CSVs are already produced so that unit roots are accounted for. This is a research project to identify what actually bodes well in other sectors. It runs on every listed security on the American exchanges: every symbol is regressed on Apple, every symbol is regressed on Microsoft, and so on.
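A minimal sketch of such a lag scan in R (not the actual vignette; y and x stand for the log-differenced price series of security 2 and security 1):

results <- data.frame(lag = 1:45, r2 = NA, p = NA)
n <- length(y)
for (k in 1:45) {
  yk <- y[(1 + k):n]   # security 2 shifted forward k periods
  xk <- x[1:(n - k)]   # security 1 at its original time
  fit <- summary(lm(yk ~ xk))
  results$r2[k] <- fit$r.squared
  results$p[k]  <- coef(fit)["xk", "Pr(>|t|)"]
}
write.csv(results, "lag_scan.csv", row.names = FALSE)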

I initially began this project some time ago, and at that time I stopped because it was going to take a solid month of continuous 12-core processing to accomplish the entire series. In retrospect, I should have let it proceed, but there would have been a great tradeoff: I couldn't have played Roblox, The Isle, and Ark Survival Evolved with my daughter. Finally, I've got the research running on a new machine dedicated to that purpose, with an AMD Ryzen 5 3500 and an NVMe SSD. The program runs on 6 cores in parallel. Previously, with the one-month estimate, it was running concurrently on 12 cores of Westmere Xeon CPUs and storing the output in RAM instead of on an SSD. This will serve as an interesting test for the Ryzen, since all six cores will be running at 100% for months on end. The operating system is OpenSuse Leap 15.2, the R version is 4.0.5, and the Python version is 2.7.18.

One of the reasons to write these articles is for my own memory. It gets harder to remember as one gets older. These blog posts are essentially a public notebook to aid myself and others.


[1] R-squared is the coefficient of determination, which is the square of the Pearson correlation coefficient, r, the formula for which is ρ = β1(σx/σy), where β1 is the parameter estimate. Plain ASCII text does not have a circumflex, ^, on top of the β. For this documentation the objective is multiplatform long-term readability, so an equation editor with specialized support for circumflexes is out of the question.

[2] There is also the rejection region method: we reject the null hypothesis if the test statistic's absolute value is greater than the critical value, which we can express as reject if |t| > t(α/2, n-1).


Synchronize time on CentOS 6

December 27th, 2011 by L'ecrivain

This will start the time service and synchronize the clocks on CentOS 6.

yum install ntp 
chkconfig ntpd on 
ntpdate pool.ntp.org 
/etc/init.d/ntpd start
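A quick check afterward (not in the original note): ntpq lists the peers the daemon is using, along with the current offset.

ntpq -p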

