Sensitive data in constants; language feature dilemma.


Rob Meijer

Dec 27, 2025, 11:18:42 AM
to cap-...@googlegroups.com
Working on a DSL that is closure oriented, I am currently implicitly capturing constants and explicitly capturing mutables (variables).

I think it's commonly considered bad practice to hard code sensitive data into a program, but people do it all the time.

I am now torn on the subject. It is easy enough to add a modifier on constant definition to mark the constant for explicit rather than implicit capture. Something like:

sensitive string myApiKey = "somesecret";

But by adding the modifier I feel that maybe I'm incentivizing the hard-coding of sensitive data.

I'm interested to learn what others think of this. Should I accept that people hardcode secrets and make the practice a bit safer? Or does providing this extra safety make the problem worse, because it makes users think the language has their back and it's OK to hard-code sensitive data into constants now?
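For what it's worth, the rule under discussion can be sketched in a toy Python model (this is purely illustrative: `Env`, `close_over`, and the kind names are invented, not the DSL's actual machinery):

```python
# Toy model of the capture rule: plain constants are captured
# implicitly, while anything declared "sensitive" (or mutable) must be
# named explicitly when the closure is built.

class CaptureError(Exception):
    pass

class Env:
    def __init__(self):
        self.bindings = {}  # name -> (value, kind); kind in {"const", "sensitive", "mutable"}

    def define(self, name, value, kind="const"):
        self.bindings[name] = (value, kind)

    def close_over(self, body_names, explicit=()):
        """Build the closure environment for `body_names`.

        Constants are pulled in silently; sensitive/mutable bindings
        must appear in `explicit` or we refuse to build the closure.
        """
        captured = {}
        for name in body_names:
            value, kind = self.bindings[name]
            if kind != "const" and name not in explicit:
                raise CaptureError(f"{name} ({kind}) needs explicit capture")
            captured[name] = value
        return captured

env = Env()
env.define("greeting", "hello")                        # plain constant
env.define("myApiKey", "somesecret", kind="sensitive")

env.close_over(["greeting"])                           # fine: implicit
try:
    env.close_over(["greeting", "myApiKey"])           # refused
except CaptureError as e:
    print(e)
env.close_over(["myApiKey"], explicit=("myApiKey",))   # allowed when explicit
```

The open question above is exactly whether the last line should exist at all.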

Rob Meijer

Dec 27, 2025, 1:04:59 PM
to cap-...@googlegroups.com
I wrote some more concrete stuff on this in a short blog post:

https://hive.blog/hive-169321/@pibara/looking-for-feedback-on-a-programming-language-design-issue

All input is welcome.

David Nicol

Dec 28, 2025, 1:05:00 AM
to cap-...@googlegroups.com
I haven't got a HIVE login so I couldn't post this as a reply:

Keep all sensitive stuff in some kind of configuration file or database outside of the program. This approach also helps make testing safer. So my suggestion is: have the language support reading r-values from somewhere else. I don't want to suggest how exactly to do that.
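At the application level, "reading r-values from somewhere else" might look roughly like this (a minimal sketch; the function name, environment variable convention, and config file format are all placeholders, not a proposal for the DSL's syntax):

```python
import json
import os

def load_secret(name, config_path="secrets.json"):
    """Resolve a secret at startup instead of hard-coding it."""
    # 1. Prefer the process environment.
    if name in os.environ:
        return os.environ[name]
    # 2. Fall back to a local config file that is never checked in.
    try:
        with open(config_path) as f:
            return json.load(f)[name]
    except (FileNotFoundError, KeyError):
        raise RuntimeError(f"secret {name!r} is not configured")

# api_key = load_secret("MY_API_KEY")  # the source never contains the value
```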

--
You received this message because you are subscribed to the Google Groups "cap-talk" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cap-talk+u...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/cap-talk/CAMpet1Xo19roNPCaC74YeW-7-g850hC40T1F%3DbDEhu4HAKA4fg%40mail.gmail.com.


--
Nothing but net

Rob Meijer

Dec 28, 2025, 5:49:12 PM
to cap-...@googlegroups.com
Thanks, that idea aligns with one I had been playing with for another project a while ago, but your suggestion makes it click that I probably should spend some time on it. A little under two decades ago (I'm bad at dates) I made a set of two FUSE file-systems that, amongst other things, basically bridged E and AppArmor so E's VATs were safe from attacks from below. I was thinking about something like that, but much simpler: a kind of local vault for sensitive strings. I probably won't need AppArmor, just FUSE. While it wouldn't settle the question of whether I should add a way to *not* implicitly capture non-mutables, if I make access to that vault part of the runtime, it could create a (potentially balancing) incentive not to put secrets in non-mutables.

I think that if I decide to write the simple FUSE vault filesystem, then however much I conceptually like the "hazardous" modifier on the rvalue, it's probably going to boil down to "sensitive" or "inert". Or maybe it is worth the little bit of extra verbosity to do both, so any scalar declaration is either mutable, inert or sensitive, where mutable and sensitive require explicit capture and inert is captured implicitly.
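The vault idea, assuming one file per secret under a dedicated mount point, might be sketched like this (the path, function name, and API are made up; the real thing would be a FUSE filesystem enforcing its own access policy):

```python
from pathlib import Path

# Hypothetical mount point of the FUSE vault filesystem.
VAULT = Path("/run/vault")

def vault_get(name: str, vault: Path = VAULT) -> str:
    """Fetch a secret by name; source code only ever names it."""
    path = vault / name
    if not path.is_file():
        raise KeyError(f"no secret named {name!r} in the vault")
    return path.read_text().strip()
```

If the runtime exposes only this lookup, programs gain a natural alternative to baking secrets into non-mutables.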

 


John Carlson

Dec 28, 2025, 11:33:00 PM
to cap-...@googlegroups.com
While I somewhat agree with not putting configuration data in your program, you must also remember to protect your configuration data if it contains sensitive data.  Don’t just put it on a website without proper protection, and checking it into GitHub is a non-starter.

If you put it in a configuration PHP file, the PHP code is not downloaded or put in a repository, and there’s no way to download arbitrary files from your website through a request, that’s a good start.

Secondarily, one has to consider whether using a database with external credentials is a good idea or not.  The credentials should definitely be capabilities if at all possible, but I’ve not seen a SQL database that supports this, at least not Oracle or MySQL.  My data may be dated.  I know that Microsoft used to use DSNs, but I’m not up on current Microsoft database technology, and I don’t know whether DSNs are stored securely or not.

Probably you guys have many more ideas than the non-capability and security technology that I normally use.

Note that an API key can be considered configuration data. What I’ve seen in Node.js is that keys are stored in .env and .env.local, not checked into a repository, and protected from being downloaded from websites.  I think that keeping your .env or .env.local encrypted just moves the attacker back one step, and I doubt its effectiveness, having once decrypted a user/password stored in a configuration file that turned out to be admin/admin.  Putting your decryption code, even compiled, or your configuration file in a Java .jar file or even in binary code doesn’t stop people, either.
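For readers unfamiliar with the convention, the .env mechanism is roughly the following (real projects use a library such as python-dotenv; this is just the shape of it, with all names illustrative):

```python
import os

def load_dotenv(path=".env"):
    """Load KEY=value lines from an untracked file into the environment."""
    try:
        lines = open(path).read().splitlines()
    except FileNotFoundError:
        return  # no .env present; rely on the real environment
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        key, _, value = line.partition("=")
        # Real environment variables win over .env entries.
        os.environ.setdefault(key.strip(), value.strip())
```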

Putting secrets in .ssh files just means the attacker has to break into your account, which might be difficult.   When will .ssh folders be stored in the TPM?

That’s probably the sum total of my wisdom, and it’s not great.  C-lists sound great for storing capabilities, but do macOS, Linux or Windows have them?

Do ordinary programmers even know how to use TPMs? Do modern compilers use them?   Are binaries that  rely on TPMs portable off their original hardware?

I’ve seen people store their data on a USB drive that’s not plugged in.  Then you have to worry about physical security of that.  Fences, guards, vaults, etc.  And USB drives are a non-starter, because people just walk away with them, even unwittingly.

Maybe make your vault built into the foundation of your building?  Ultimately, one has to trust people, even the people in your military. Avoiding confused deputy is great.  The only alternative is God.

Again, I am not a capability or security expert, I just work with ordinary technology.   I don’t know TPMs at all, but I know hardware has security flaws.

Sigh!

John Carlson

Matt Rice

Dec 29, 2025, 3:59:10 AM
to cap-...@googlegroups.com
My thinking is that just tracking capture isn't really sufficient to avoid escape of the secret data. If you look at a language like Jeeves, consider, say, a GPS coordinate and a function that converts it into a place-name string, or the nearest address. The function doesn't need to capture the coordinates for the data to escape. To really avoid the escape you need to do something like refer to secrets indirectly and not derive from them, or avoid computations containing data derived from secrets by doing some kind of taint tracking.

https://github.com/jeanqasaur/jeeves was one language which tracked computations derived from secret data.
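The GPS example can be made concrete with a very small taint-tracking sketch in the spirit of Jeeves (class and method names invented for illustration): anything computed from a secret stays marked secret until someone explicitly declassifies it.

```python
class Tainted:
    """A value carrying a secrecy label through computations."""
    def __init__(self, value, secret=True):
        self.value = value
        self.secret = secret

    def map(self, fn):
        # Any computation over a secret yields another secret.
        return Tainted(fn(self.value), secret=self.secret)

    def declassify(self):
        # The single explicit escape hatch across the label.
        return self.value

coords = Tainted((52.37, 4.90))
place = coords.map(lambda c: f"near {c[0]:.1f}N,{c[1]:.1f}E")
print(place.secret)  # True: the derived place name is still tainted
```

The place name never captured the coordinates, yet without the label it would leak them; the taint is what keeps the derivation honest.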

I've also seen cases which were very similar to timing attacks, where compiler optimizations enabled you to figure out secrets without the ability to read them directly: e.g. if you had a reference + interning, then you could create a new reference that the attacker controls and compare them.
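That leak can be illustrated as follows; `sys.intern` stands in for a language-level interning table, and the point is that the attacker confirms the secret via reference identity without ever reading its bytes (the secret value here is of course made up):

```python
import sys

secret = sys.intern("hunter2")  # the runtime interns the secret string

def attacker_guess(guess: str) -> bool:
    # No read of `secret` occurs: interning the guess and comparing
    # references is enough to confirm equality.
    return sys.intern(guess) is secret

print(attacker_guess("password"))  # False
print(attacker_guess("hunter2"))   # True
```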

All that said, I think capture is far more important than the treatment it is typically given in programming language literature, which can typically be summarized by a section on free and bound variables. I thought this was an interesting paper: https://dl.acm.org/doi/10.1145/3618003#Bib0033

Lastly, I really don't like when implementations and signatures/headers are specified in the same file. Languages like ML don't actually have *any* visibility modifiers: if something is specified in a signature/header it is visible; if it is omitted from the signature it is private. Whenever you have "public", "private", "sensitive", "hazardous" visibility qualifiers, there is always a tension: the qualifier has a perspective which is not signaled by the qualifier itself. Treating visibility/capturability in this kind of boolean way has always, in my experience, seemed an oversimplification that ends up confusing. Public or private within which layer of the onion? The qualifier itself does not say. It implies that there is a global perspective from which you can view your program, and every value falls to one side or the other.

In ML, where you have say `signature FOO = sig end;` for something with no fields, and `signature BAR = sig val x: u8 end;` for a value with a single field, you are given the perspective of where a field is visible from when viewed through either the FOO or the BAR signature, which promotes local reasoning...
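A rough Python rendering of that perspective shift (names mirror the SML sketch and are otherwise invented): visibility comes from the signature you view an implementation through, not from qualifiers on the fields themselves.

```python
class View:
    """Expose an implementation only through an explicit signature."""
    def __init__(self, impl, signature):
        self._impl = impl
        self._signature = frozenset(signature)

    def __getattr__(self, name):
        # Only names listed in the signature are visible through this view.
        if name not in self._signature:
            raise AttributeError(f"{name!r} is not in this signature")
        return getattr(self._impl, name)

class Impl:
    x = 7
    hidden = "internal detail"

FOO = View(Impl, signature=())      # like: signature FOO = sig end
BAR = View(Impl, signature=("x",))  # like: signature BAR = sig val x: u8 end

print(BAR.x)  # 7; FOO.x and BAR.hidden both raise AttributeError
```

The same `Impl` looks entirely different through FOO and BAR, which is the local-reasoning property being described.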

Sorry, if I rant.

John Carlson

Dec 29, 2025, 4:23:19 AM
to cap-...@googlegroups.com
Update:

According to Google AI, you can use TPM2 to store .ssh private keys in your TPM (good to know).

I've only had Windows 11 for a year and a half, and I hadn't looked this up before.

WSL2 also looks like it might be an option.

Good news, now to unpack it.

John

Rob Meijer

Dec 29, 2025, 9:15:05 AM
to cap-...@googlegroups.com
Great input, especially about the computation. That was a missing link in my reasoning, and it ties well into my tiny type system.
I've tried to use your input and David's and I think I'm getting close to something I can implement:

https://peakd.com/@pibara/version-03-of-the-merg-e-language-specification--sensitive-data-in-non-mutables-and-future-vault-support

Not sure yet whether the choice for dataframes, even if practical given my limited spare time, is fully defensible. But I guess I can always add more granular tainting later if I must.

Before, my language spec drafts had a simple rule: mutable data holds authority and should be "explicitly" captured from the surrounding closure, while immutable data (constants) is perfectly safe to conveniently capture implicitly, as long as it happens within a single file (imported function definitions need explicit capture either way). I just hadn't considered sensitive data at all, outside of that managed by the runtime under my "ambient" DAG. I'm hoping my current draft, as in the linked blog post, gives a sane middle ground between ignoring that constants can be sensitive and verbosely making all constant captures explicit (including those of the language itself, which lives in its own sub-DAG).

Anyway, thank you for your input, the calculation bit at least helped me take a few more steps.


Rob Meijer

Dec 29, 2025, 9:21:35 AM
to cap-...@googlegroups.com
My primary development platform right now is Linux, so Windows wasn't really on my radar for this project, at least not as a serious target platform for now. TPM might be worth looking into, though, even if I'm not sure whether Linux uses it, or gives user space access to it if it does. I'm guessing it could help with creating a vault solution, so maybe I should look into it as part of the larger stack where I want my DSL to live.


Kevin Reid

Dec 29, 2025, 10:15:29 PM
to cap-...@googlegroups.com
On Mon, Dec 29, 2025 at 12:59 AM Matt Rice <rat...@gmail.com> wrote:
> My thinking is that just tracking capture isn't really sufficient to avoid escape of the secret data,
> … To really avoid the escape you need to do something like refer to secrets indirectly and not derive them, or avoid computations containing data derived from secrets and do some kind of taint tracking.

I think that from a capability perspective, the right thing to do here is to treat the secret labeling as a sort of membrane. Either explicitly or by default, the basic thing you do with a sensitive/secret value is compute another sensitive/secret value in the same membraned-cell (which does not reveal any of the bits of the value), and if you want to extract the data or re-label it into a different cell, that's a separate operation that you have to do explicitly to say “I know what I’m doing”. (If the re-labeling is done by an object *inside* the membrane, then you have a TPM-like system (that can implement properties like "keys don't leave but signatures do"), which could perhaps be implemented on, or API-translated to, an actual TPM.)
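A sketch of the membrane as described (all names invented for illustration): ordinary operations stay inside the cell and yield new secrets, extraction is a separate explicit act, and a signer object living *inside* the membrane gives the "keys don't leave but signatures do" property.

```python
import hashlib
import hmac

class Secret:
    """A membraned cell: computing stays inside, extraction is explicit."""
    def __init__(self, value):
        self._value = value

    def map(self, fn):
        # Default operation: derive another secret, reveal nothing.
        return Secret(fn(self._value))

    def extract(self):
        # The explicit "I know what I'm doing" crossing of the membrane.
        return self._value

class Signer:
    """Lives inside the membrane: the key never leaves, MACs do."""
    def __init__(self, key: Secret):
        self._key = key

    def sign(self, message: bytes) -> str:
        return hmac.new(self._key.extract(), message, hashlib.sha256).hexdigest()

key = Secret(b"key-material")
tag = Signer(key).sign(b"hello")  # a public value derived from the secret
```

Re-labelling into a different cell would be a third operation alongside `map` and `extract`; only `Signer` here exercises the inside-the-membrane privilege.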

John Carlson

Dec 30, 2025, 3:46:57 AM
to cap-...@googlegroups.com
I am starting to realize serious issues with TPMs, like what do I do if the system with my private key goes down, and the private key was used to access a remote system through .ssh, and I don’t have a second way to access the remote system?

This means that all remote systems should have at least two public keys, from two hosts, or a password fallback/reset.

Hmm.   At least I thought before I leaped.

John


John Carlson

Dec 30, 2025, 3:49:57 AM
to cap-...@googlegroups.com
If I can’t really depend on TPMs to store my private keys, is their only purpose to identify me?

John