Fitted CDF way off from empirical data

Ryan Lehmkuhl

Mar 13, 2024, 10:55:20 PM
to powerlaw-general
Hi,

First off, thanks for maintaining such an excellent package! I work outside this area, and having a tool like this has saved me a ton of time :)

I'm trying to model some empirical data using a power-law distribution. Visually, the fit seems fairly tight. However, when I inspected the individual PDF values of the fitted distribution, they were very far off near the head of the distribution.

To make sure I wasn't going crazy, I used the IPython notebook from the original paper (just FYI, the link on the GitHub page is broken). I looked at the word-frequency data, which the paper claimed was a good fit, and noticed that for x=200 the empirical CDF value is ~0.60, while the fitted distribution's CDF (I used lognormal, which seemed tightest) was ~0.99.

Is this difference simply because the fit doesn't work well for the head of the distribution? This seems like a fairly large discrepancy. Any intuition you can offer would be helpful, thanks!
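
To make the comparison concrete, here is a minimal sketch on synthetic data. It does not use the powerlaw package or the notebook's word-frequency data. The parameters `mu` and `sigma`, the cutoff `xmin`, and the helper names are all hypothetical, and the true parameters stand in for a fitted model. It assumes the fitted distribution is only defined for x >= xmin, in which case its CDF would need to be compared against the empirical CDF of the tail alone, not of the full data set. Whether that accounts for the gap described above is only a guess.

```python
import math
import random


def empirical_cdf(data, x):
    """Fraction of observations less than or equal to x."""
    return sum(1 for v in data if v <= x) / len(data)


def lognormal_cdf(x, mu, sigma):
    """CDF of a lognormal distribution defined on (0, inf)."""
    z = (math.log(x) - mu) / (sigma * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))


def tail_lognormal_cdf(x, mu, sigma, xmin):
    """Lognormal CDF conditioned on x >= xmin (renormalized over the tail)."""
    base = lognormal_cdf(xmin, mu, sigma)
    return (lognormal_cdf(x, mu, sigma) - base) / (1.0 - base)


random.seed(0)
mu, sigma = 3.0, 1.5   # hypothetical parameters, not fitted values
xmin = 50.0            # hypothetical tail cutoff
x = 200.0              # point at which the CDFs are compared

data = [random.lognormvariate(mu, sigma) for _ in range(20000)]
tail = [v for v in data if v >= xmin]

full_ecdf = empirical_cdf(data, x)   # empirical CDF over all observations
tail_ecdf = empirical_cdf(tail, x)   # empirical CDF over x >= xmin only
model_full = lognormal_cdf(x, mu, sigma)
model_tail = tail_lognormal_cdf(x, mu, sigma, xmin)

print(f"full data: empirical={full_ecdf:.3f}  model={model_full:.3f}")
print(f"tail only: empirical={tail_ecdf:.3f}  model={model_tail:.3f}")
```

In this sketch each empirical CDF agrees with the model CDF computed over the same range, while mixing the two ranges (full-data empirical against tail-only model) gives a visibly different number at the same x.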

-Ryan