Special use case with time related data entries

20 views
Skip to first unread message

Thomas Tempelmann

unread,
Feb 16, 2022, 5:08:29 PM2/16/22
to Wizard User Group
Is this list, support and development still active? Activity seems to have stopped a few months ago.

I am a software developer – not a statistics guy at all. The terms used in the tutorial are unfamiliar to me (also, English is not my first language).

I will therefore try to explain what I try to achieve in layman's terms and hope someone can shed some light on whether Wizard can be of use to me.

My own programs "phone home" to let me collect anonymous usage data over time. The basic entries for each recorded call are:

id - that's a random generated ID for each user
when - the time the call was made
os_version - which OS was the user using
app_version - which app version was the user using
language - which system language the user was using

Now, I may have many (repeated) entries from the same user ID. When I want to know the current distribution of OS versions, I'd have to pick the _latest_ call for each ID, i.e. for each ID I'd have to select the "when" with the highest value. Can I achieve such a selection with Wizard, or would I have to extract the latest IDs myself from my database, and feed only the latest ones into Wizard?

OTOH, if I wanted to know which versions were used last year, and how their distribution was, it gets even more complicated: First, I'd have to select all entries in the time range. That may well be possible in Wizard, I think. But then, since some IDs have many more entries than others, I cannot simply call all os_version records. Instead, I'd again have to group them by their ID first.
Consider I have only two callers: One has called 100 times over the year, and always used version 10.12. The other called 10 times, with the 8 calls in January-Feb using 10.11, and the other 2 calls in April and October with 10.13.
That should tell me that one user used 10.12 whereas one user used 10.11 in ~2.5/12th of the year and 10.13 in ~9.5/12 of the year (assuming that the gap between Feb-Apr gets average to around mid-March). Can Wizard do that, too?

Evan Miller

unread,
Feb 18, 2022, 9:31:53 AM2/18/22
to wizard...@googlegroups.com
Hi Thomas,

Thanks for the mail! It's an interesting selection problem you describe here – I think ideally you'd have some kind of indicator variable that is true when the call is the latest call from a user ID, and false otherwise. Wizard isn't really equipped for an advanced query like this, but it's helpful for me to know about these kinds of use-cases.

To your second question – you might be able to make some progress on it with the Pivot view. Basically you could use a "Category Percentages" aggregation function, and then pivot by the User ID. Then copy the Pivot table to a new table for further analysis. This may get a bit hairy with a large number of User IDs (i.e. ranging in the tens / hundreds of thousands).

Overall Wizard is designed to analyze tables that are prepared with independent observations (in your case calls rather than users), so you'll likely need a preparatory step to perform the kinds of analyses that you describe.

Hope that helps!

Evan

--
You received this message because you are subscribed to the Google Groups "Wizard User Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to wizard-users...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/wizard-users/53027de5-a4a0-48c5-97da-eb0423c7edc8n%40googlegroups.com.


Thomas Tempelmann

unread,
Feb 18, 2022, 11:14:03 AM2/18/22
to Wizard User Group
Thanks, Evan.

If you are still thinking of adding new features to Wizard - I wonder if a "programmable" SQL query would be possible that would enable my need, e.g. that I can provide customized SQL query commands that have parameters, which I can then set in the Wizard GUI. May be quite the power feature that hardly anyone needs, though. As a programmer, I love such options, though (and many of my own apps have scripting options) :)

I would imagine that I could "hook up" my SQL database (as a TCP connection or by selecting a local sqlite) permanently, allong with choosing either a single Table or a View, so that you'd, by default, pull the data out of it instead of copying them into your own storage, if your model allows that. Then, I could add custom queries that create a subset of the database's records, like a filter.

Or, go a different way by allowing scriptable filters that operate on your locally stored data. Have you had a look at Apple's JavaScriptCore, especially the newer ObjC API? I just used that to add very flexible customization to my FindAnyFile (http://findanyfile.app/beta.php), see https://findanyfile.app/scripting.php.

Since the API allows you to pass even NSArray and NSDictionary objects to JS and back, you can have a script filter the data set quite efficiently, I believe.

For me, it's been fun just adding that capability. I don't expect many users to write scripts on their own, but I can, whenever a user has a special request that FAF doesn't currently cover, simply send them a script I made for them now. I did a similar thing a few years ago for iClip, where I added full Applescriptability, and since then I have often been able to resolve app that do not play well with the Pasteboard by sending them custom scripts that handle that problem.
That way, I don't have to make special releases for my users all the time any more and can also help them much more quickly.

Huh, I should blog about that :)

Reply all
Reply to author
Forward
0 new messages