Efficiently reading a null-terminated String from process address space


Daniel B. Widdis

Jul 12, 2021, 1:29:21 PM
to Java Native Access
On Solaris and AIX (and probably other *nix variants using procfs), the strings for process arguments and environment variables are stored in the process address space, made accessible to users via the /proc/<pid>/as pseudofile.  Linux has equivalent access in /proc/<pid>/mem, although it also exposes those strings more conveniently elsewhere (/proc/<pid>/cmdline and /proc/<pid>/environ).

Still, this question is more general for reading any null-terminated String from process address space given the starting pointer address.

Obtaining the pointers to the beginning of these strings is easy enough.  But there is no information for how long they are.

I have successfully read these (C) Strings using the following approach:
 - Used libc.open() to open the "/proc/pid/as" file, obtaining a file descriptor
 - Allocated memory for a single byte (new Memory(1))
 - Created an empty StringBuilder.
 - In a loop, starting at the pointer address of the start of the string:
   - Used libc.pread() on the file descriptor to copy the (single) byte at the specified address into my memory allocation
   - Checked for 0 value to exit the loop, otherwise
   - Cast the byte to char and appended it to the StringBuilder.
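
The loop described above can be sketched roughly as follows. This is an illustration, not the original code: `ByteSource` is a hypothetical stand-in for a one-byte libc.pread() on the /proc/pid/as file descriptor, so the sketch runs without JNA or procfs. The call counter is also an addition; it makes the cost visible (one call per character, plus one for the terminator).

```java
import java.nio.charset.StandardCharsets;

class ByteByByteReader {
    /** Hypothetical stand-in for a one-byte pread(fd, buf, 1, offset) call. */
    interface ByteSource {
        /** Returns the unsigned byte at the given address, or -1 if it cannot be read. */
        int readByte(long address);
    }

    /** Number of simulated native calls made so far. */
    static int calls = 0;

    static String readCString(ByteSource source, long address) {
        StringBuilder sb = new StringBuilder();
        while (true) {
            calls++;
            int b = source.readByte(address++);
            if (b <= 0) { // 0 = NUL terminator, -1 = read failure
                break;
            }
            // Casting a byte to char is only correct for ASCII/Latin-1 data.
            sb.append((char) b);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] fake = "PATH=/usr/bin\0HOME=/root\0".getBytes(StandardCharsets.US_ASCII);
        long base = 0x1000L; // pretend the array is mapped at this address
        ByteSource source = addr -> {
            long i = addr - base;
            return (i >= 0 && i < fake.length) ? (fake[(int) i] & 0xFF) : -1;
        };
        System.out.println(readCString(source, base) + " (" + calls + " calls)");
    }
}
```

With a real native call behind `readByte`, every iteration would cross the Java/native boundary, which is where the inefficiency comes from.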

While my implementation works, the byte-by-byte approach seems incredibly inefficient.  Plus, libc.pread() takes an off_t offset argument whose size I can't reliably know from Java. I'm mapping it as a NativeLong, but that's technically wrong.

A few questions:

1. Given that I already have the pointer value, do I need to actually do the file descriptor/pread steps at all, or can I just create a new Pointer() at the address and call getString(0) safely?

2. How inefficient is a byte-by-byte call to libc.pread()?  Might this be a good use case for direct mapping to improve that performance without code complexity?

3. If the existing approach is valid, I'd like to read bigger chunks of memory (ideally the whole string at once).  The memory assigned to a process is noncontiguous, but it looks like I can read /proc/<pid>/map to find the boundaries.  That introduces a lot more complexity, though, and risks reading too much memory.  Might it be safe to assume I own the whole page a pointer resides in, and use the pointer address modulo PAGE_SIZE to calculate that page's boundaries?

4. Are there smaller allocation units than PAGE_SIZE that are always safe to assume (e.g., 32 bytes or 64 bytes on processes with the same-size data model)?
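
For question 3, the page arithmetic itself is simple. Below is a minimal sketch, assuming a power-of-two page size (4096 here is an assumption; the real value would come from sysconf(_SC_PAGESIZE)) and a made-up pointer value. The class and method names are hypothetical. It only computes boundaries; whether every byte up to the boundary is actually readable is the open question above.

```java
class PageMath {
    /** Start address of the page containing address. pageSize must be a power of two. */
    static long pageStart(long address, long pageSize) {
        return address & ~(pageSize - 1);
    }

    /** Number of bytes from address up to (not including) the next page boundary. */
    static long bytesToPageEnd(long address, long pageSize) {
        // Masking instead of % also behaves correctly for addresses with the high bit set.
        return pageSize - (address & (pageSize - 1));
    }

    public static void main(String[] args) {
        long pageSize = 4096;           // assumed page size
        long address = 0x7fff5a3c1f80L; // hypothetical pointer value
        System.out.printf("page start: 0x%x%n", pageStart(address, pageSize));
        System.out.printf("bytes before the page boundary: %d%n", bytesToPageEnd(address, pageSize));
    }
}
```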


--
Dan Widdis

Durchholz, Joachim

Jul 12, 2021, 2:02:08 PM
to jna-...@googlegroups.com

Suggestion 1:

  • Read the string and count bytes until you hit the NUL byte.
  • Create a StringBuilder of the right size and re-read the content.

 

Suggestion 2:

  • Open the file normally, like you’d open any binary file.
  • Use the standard Java APIs to create a buffered input stream, and read that.
    Accessing files is supported by the JDK, you don’t need JNA for this.
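
A rough sketch of Suggestion 2 using only JDK classes: seek to the offset with RandomAccessFile, then read through a buffered stream until a NUL byte. The class name and the temporary demo file are made up for illustration, and the demo uses a regular file; whether a procfs pseudofile accepts this kind of access is not something this sketch establishes.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.nio.channels.Channels;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class JdkOffsetReader {
    /** Reads bytes starting at offset until a NUL byte or end of file. */
    static String readCString(Path file, long offset) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            raf.seek(offset);
            // The channel shares the file position that seek() just set.
            InputStream in = new BufferedInputStream(Channels.newInputStream(raf.getChannel()));
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            int b;
            while ((b = in.read()) > 0) { // stops at NUL (0) or end of file (-1)
                out.write(b);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        Path demo = Files.createTempFile("cstring-demo", ".bin");
        try {
            Files.write(demo, "junk\0first\0second\0".getBytes(StandardCharsets.US_ASCII));
            System.out.println(readCString(demo, 5)); // offset of the byte after "junk\0"
        } finally {
            Files.deleteIfExists(demo);
        }
    }
}
```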

 

Suggestion 3 (I don’t see the use case but that doesn’t mean it doesn’t exist, of course):

  • Use JNA to open the file.
  • Use JNA to do normal C-level buffered reads.
  • Use JNA to copy each block that you read to your StringBuilder (or StringBuffer if you need it to be synchronized, but usually that’s not a good idea – you use StringBuilder and convert it into a String, which cannot have race conditions as it’s immutable).

 

Regards,

Jo


Daniel B. Widdis

Jul 12, 2021, 4:51:52 PM
to Java Native Access
  • Read the string and count bytes until you hit the NUL byte.

If I could read the string I'd just use it.  My issue is being paranoid about reading past memory I have access to.  How are you suggesting I read the string here?

  • Open the file normally, like you’d open any binary file.

I will try this using RandomAccessFile and BufferedInputStream.  I'm not sure whether it will work, but it seems a reasonable approach if it does.  I think it would still run into the same issues (not being certain I have read access at a given pointer offset, and not knowing how many bytes to read), but it would certainly get around the whole off_t mapping issue, and I suspect the error handling is better.

  • Use JNA to do normal C-level buffered reads.
  • Use JNA to copy each block

Could you clarify what you mean by C-level buffered reads and blocks?

Tres Finocchiaro

Jul 12, 2021, 5:18:45 PM
to jna-...@googlegroups.com
Open the file normally, like you’d open any binary file.

This was going to be my recommendation as well. I use it for some small reads too, but I don't know how it would behave, in terms of volatility or performance, for frequently changing items.

Daniel B. Widdis

Jul 12, 2021, 5:41:03 PM
to Java Native Access
I just found this Perl implementation of what I'm doing, and it reads sysconf(_SC_ARG_MAX) bytes at a time from the pointer.  That's typically 2 MiB.


Maybe I'm being too careful with my memory access.

I also read enough documentation to convince myself all memory in the same page has the same access, so a modulo by page size could create a sanity check for that max value.

It seems that approach (reading up to the end of the page) for the number of bytes to read, coupled with a RandomAccessFile/ByteBuffer, may work.
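
A sketch of the "read up to the end of the page" idea, under stated assumptions: `PositionalReader` is a hypothetical stand-in for whatever performs the positional read (pread, or a file API if one works), the page size is passed in, and `maxLength` plays the role of the ARG_MAX cap. Each request stops at the current page boundary, and the loop moves on to the next page only if no NUL byte has been found yet.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

class PageChunkReader {
    /** Hypothetical stand-in for a positional read such as pread(fd, buf, count, offset). */
    interface PositionalReader {
        /** Copies up to count bytes from address into buf; returns bytes read, or <= 0 on failure. */
        int read(byte[] buf, int count, long address);
    }

    /** pageSize must be a power of two; maxLength caps the total bytes collected. */
    static String readCString(PositionalReader reader, long address, int pageSize, int maxLength) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[pageSize];
        while (out.size() < maxLength) {
            // Never ask for bytes beyond the end of the current page.
            int toPageEnd = pageSize - (int) (address & (pageSize - 1));
            int want = Math.min(toPageEnd, maxLength - out.size());
            int got = reader.read(buf, want, address);
            if (got <= 0) {
                break; // unreadable page or end of the address space
            }
            for (int i = 0; i < got; i++) {
                if (buf[i] == 0) { // terminator found: keep only the bytes before it
                    out.write(buf, 0, i);
                    return new String(out.toByteArray(), StandardCharsets.UTF_8);
                }
            }
            out.write(buf, 0, got);
            address += got; // a short read simply shortens this step
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] memory = new byte[64]; // fake address space with 16-byte "pages"
        byte[] text = "spans two pages".getBytes(StandardCharsets.US_ASCII);
        System.arraycopy(text, 0, memory, 10, text.length);
        PositionalReader reader = (buf, count, addr) -> {
            if (addr < 0 || addr >= memory.length) return -1;
            int n = Math.min(count, memory.length - (int) addr);
            System.arraycopy(memory, (int) addr, buf, 0, n);
            return n;
        };
        System.out.println(readCString(reader, 10, 16, 1024));
    }
}
```

If no terminator turns up before a read fails or `maxLength` is reached, the method returns what it has collected so far; a caller that needs to distinguish that case would have to signal it differently.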




--
Dan Widdis

Daniel B. Widdis

Jul 13, 2021, 11:31:20 AM
to Java Native Access
To close the loop on this: RandomAccessFile did not work for /proc/pid/as.  However, I found through docs and experimentation that pread may return fewer bytes than requested; in my use cases it always returned exactly one page of data, even when I asked for two starting at the start of a page.  So I allocated a two-page buffer in a Memory object, read once from the beginning of the page, and used the Pointer getters to fetch the data at the pointer offsets.
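
The key detail in that final approach is honoring the byte count the read actually returned. Here is a minimal sketch of that step, using a plain byte[] in place of the JNA Memory object so it runs on its own; the class and method names are hypothetical, and the 16-byte "page" is only for the demo.

```java
import java.nio.charset.StandardCharsets;

class ShortReadBuffer {
    /**
     * Extracts the NUL-terminated string starting at offset, looking only at the
     * first `valid` bytes of buf (the count the read call returned).
     * Returns null if offset lies outside the valid region.
     */
    static String cStringAt(byte[] buf, int valid, int offset) {
        if (offset < 0 || offset >= valid) {
            return null;
        }
        int end = offset;
        while (end < valid && buf[end] != 0) {
            end++;
        }
        // If end == valid, no terminator was seen and the string may continue past the data read.
        return new String(buf, offset, end - offset, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        int pageSize = 16;                  // demo value, not a real page size
        byte[] buf = new byte[2 * pageSize]; // two-page buffer, as in the post
        byte[] page = "a=1\0bb=22\0".getBytes(StandardCharsets.US_ASCII);
        System.arraycopy(page, 0, buf, 0, page.length);
        int valid = pageSize;               // simulate a short read: one page returned
        System.out.println(cStringAt(buf, valid, 0));
        System.out.println(cStringAt(buf, valid, 4));
    }
}
```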
--
Dan Widdis