The impossible crash.


From the crash dump.

 

0xFFF0 mov xmm0, dword ptr [edi + 24h]

0xFFF3 mov xmm1, dword ptr [edi + 28h]

0xFFF6 mov xmm2, dword ptr [edi + 2ch]

 

Unhandled exception at 0xFFF3. Access violation reading 0x28.

 

This should sound alarms already, but it can look like a bad jump until I tell you the following.

 

a) edi is not zero at the moment the dump was collected.

b) edi + 0x28 points to the correct value, based on the code this was compiled from and variables in memory.

c) Call stack is in good shape. Everything below this call looks consistent with this call.

d) This isn't a one-off. This crash happened multiple times at exactly the same instruction in exactly the same way, despite the code being loaded at different virtual and physical addresses.


SSE alignment. That, or someone messed with page protection, and 0x28 somehow is at a different page than 0x24.

Edited by NTAuthority

These are unaligned moves, or this code would crash every single time. Pages are 4k long and 4k aligned, and the page starting at 0x0 is always reserved for the OS. So if edi was zero, all three instructions would cause an access violation. And edi does hold the correct address when the crash dump is collected.
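The page argument can be checked with a little arithmetic. This is only a sketch assuming 4 KiB pages and 32-bit addresses; `page_of` and `loads_share_page` are names made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12u /* 4 KiB pages, as assumed in the post */

/* Index of the 4 KiB page that contains a 32-bit virtual address. */
static uint32_t page_of(uint32_t addr)
{
    return addr >> PAGE_SHIFT;
}

/* True if the loads at base+0x24, base+0x28 and base+0x2C all start
   in the same page. */
static int loads_share_page(uint32_t base)
{
    uint32_t p = page_of(base + 0x24u);
    return page_of(base + 0x28u) == p && page_of(base + 0x2Cu) == p;
}
```

With a base of zero, all three loads land in page 0, so they would either all fault or all succeed. Only a base within the last few dozen bytes of a page could put 0x24 and 0x28 on different pages.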

 

If it was that easy, I wouldn't have bothered to post this.

 

P.S. Just for some context, these 3 moves are first 3 instructions of non-SSE vector addition code that got compiled with SSE optimizations. The code looks something like this.

 

static __inline void addVectors(float* v1, float* v2, float* r)
{
    r[0] = v1[0] + v2[0];
    r[1] = v1[1] + v2[1];
    r[2] = v1[2] + v2[2];
}
Compiler chooses to use xmm registers for intermediate storage, but makes no assumptions on alignment. Effectively, it just uses the SSE registers in place of normal FP registers in order to first load all of the values from v2, then all of the values from v1. Which is a good idea when you are optimizing for cache.

 

Naturally, edi is just a base address of a struct with a float[3] member.
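For illustration, a layout consistent with those three offsets could look like the sketch below. Only the float[3] member at +0x24 is implied by the disassembly; the struct name, the member names and the padding in front are placeholders, not the real definition.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout: the bytes before the vector are a placeholder. */
struct Entity {
    unsigned char unknown[0x24]; /* stand-in for whatever precedes the vector */
    float pos[3];                /* read via [edi+24h], [edi+28h], [edi+2Ch] */
};

static_assert(sizeof(float) == 4, "sketch assumes 4-byte floats");
static_assert(offsetof(struct Entity, pos) == 0x24, "pos[0] sits at +0x24");

/* Byte offset of pos[i] from the struct base (the value held in edi). */
static size_t pos_offset(size_t i)
{
    return offsetof(struct Entity, pos) + i * sizeof(float);
}
```

Here `pos_offset(1)` gives 0x28, which matches the faulting address only if the base was zero when the second load ran.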

These are unaligned moves, or this code would crash every single time.

Wasn't all too sure; exception parameter 2 was mentioning otherwise, but seeing obfuscated pointers it wasn't too clear.

 

Pages are 4k long and 4k aligned, and the page starting at 0x0 is always reserved for the OS. So if edi was zero, all three instructions would cause an access violation. And edi does hold the correct address when the crash dump is collected.

0x28 in this case was referring to your 'obfuscated' output (i.e. '+0x28'). Also, 0x0 can be mapped from user mode using NtAllocateVirtualMemory prior to NT 6.2 (i.e. Windows 8; on those systems ntvdm.exe, in 32-bit x86 builds, still has an exception for this purpose since it needs the null page mapped for the low-memory IVT and related structures), and IIRC even through VirtualAlloc if passing a value between 0 and [platform page size], though I can't find my source on that.

 

Again, why would I assume page boundary straddling when these, if being absolute addresses, wouldn't go across one anyway?

 

P.S. Just for some context, these 3 moves are first 3 instructions of non-SSE vector addition code that got compiled with SSE optimizations. The code looks something like this.

static __inline void addVectors(float* v1, float* v2, float* r)
{
    r[0] = v1[0] + v2[0];
    r[1] = v1[1] + v2[1];
    r[2] = v1[2] + v2[2];
}
Compiler chooses to use xmm registers for intermediate storage, but makes no assumptions on alignment. Effectively, it just uses the SSE registers in place of normal FP registers in order to first load all of the values from v2, then all of the values from v1. Which is a good idea when you are optimizing for cache.

 

Naturally, edi is just a base address of a struct with a float[3] member.

 

... so it's actually code from your module at that location? An evil module being loaded into your address space doing whatever nonsense to your code? "overclocking" (since that post, a long-running oldnewthing 'meme')?

 

Also, mentioning physical memory in the first post - is this an embedded platform, or did you somehow end up with a kernel memory dump from a UM process crash? I was assuming Win32 seeing the 'retyped' message looking a lot like MSVS' exception messages, and the usage of 32-bit pointer registers.

Edited by NTAuthority

The actual crash happens at an actual 0x28. That's the address the CPU was trying to translate for a read when it faulted.

 

Let me just say it. There is zero chance that this could have happened without an interrupt being called. Instructions at 0xFFF0 and 0xFFF3 were evaluated under different machine states. The questions are: why did the machine state get messed up? Why does the interrupt happen at exactly the same place? And why is the machine state correct in the crash dump?
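The mismatch can be put as a small check. This is a sketch only: the function names are invented, and any edi value fed to it stands in for the real one, which was never posted.

```c
#include <assert.h>
#include <stdint.h>

/* Base register value implied by a faulting access at base + disp. */
static uint32_t implied_base(uint32_t fault_addr, uint32_t disp)
{
    return fault_addr - disp;
}

/* True if the base implied by the fault differs from the base recorded
   in the dump, i.e. the register changed between fault and capture. */
static int state_changed(uint32_t fault_addr, uint32_t disp,
                         uint32_t dumped_base)
{
    return implied_base(fault_addr, disp) != dumped_base;
}
```

A read fault at 0x28 with displacement 0x28 implies a base of zero, so any non-zero edi in the dump means the register was rewritten after the faulting access.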

 

And yes, someone was almost certainly messing with that system. The code is unmodified, though. It matches a local dump and the sources.

 

P.S. I was considering a hardware fault, due to overclocking or any other problem, but the crash was reproduced several times. On exactly the same instruction.

Why does the interrupt happen at exactly the same place? And why is the machine state correct in the crash dump?

Page faults are an interrupt, or you might've ended up single-stepping, and the dumping implementation may be incorrect, or may in fact have been run for a prior instruction.

 

 

Why does the interrupt happen at exactly the same place?

Repetition of circumstances in complex systems can occur a lot. However, again, I have no idea how complex your embedded(?) platform ends up being. For instance, if you use a context-type structure to restore CPU state, and it includes debug registers, it may very well be that you're single-stepping into this and triggering something else. Again, it can be anything.

Page faults are an interrupt, or you might've ended up single-stepping, and the dumping implementation may be incorrect, or may in fact have been run for a prior instruction.

Something had to have triggered a page fault to begin with, and edi register state was clean until the interrupt happened.

 

Internal fault management is also done via Vectored Handlers, which would not allow such a state to take place.

 

For what it's worth, the BeingDebugged flag in the PEB was not set, but I do suspect that someone was attached to this process, just not through a debugger API.

 

 

Repetition of circumstances in complex systems can occur a lot. However, again, I have no idea how complex your embedded(?) platform ends up being. For instance, if you use a context-type structure to restore CPU state, and it includes debug registers, it may very well be that you're single-stepping into this and triggering something else. Again, it can be anything.

We're talking a few dozen threads doing their thing in 1GB+ memory pool. I can see thread switching occurring on that same instruction every time, but someone still needs to ride on that thread switch to try and do something they aren't supposed to. Any machine errors associated with thread switching aren't going to be that consistent.

 

But if we are already going with hooks as a possibility, it's more likely that the address is consistent because they are trying to hook the same address, say, by dropping an int 3 into the code. They're obviously trying to restore the state as the machine crashes; they just don't restore the registers in time. So the original byte under the int 3 could have been restored earlier, which would explain why the code in the dump is unmodified.
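One way to look for such a patch is to compare the code bytes in memory against a known-good copy and flag any position where a byte has become 0xCC, the one-byte int 3 opcode. This is a sketch of the idea only; `find_int3_patch` is a made-up helper, and it finds nothing if the original byte has already been put back.

```c
#include <assert.h>
#include <stddef.h>

#define INT3_OPCODE 0xCC /* one-byte breakpoint instruction on x86 */

/* Returns the offset of the first byte that differs from the reference
   copy and is an int 3 in the live copy, or -1 if there is none. */
static long find_int3_patch(const unsigned char *live,
                            const unsigned char *reference, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (live[i] != reference[i] && live[i] == INT3_OPCODE)
            return (long)i;
    }
    return -1;
}
```

The comparison against a reference matters because 0xCC can also appear legitimately, for example as padding between functions or inside an immediate operand.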

 

I should, probably, check the single-step flag. But I doubt I'll find it, since other registers have been restored on the dump.
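For reference, the single-step (trap) flag is bit 8 of EFLAGS, so checking a saved flags value is a one-line mask. A minimal sketch, assuming the flags have already been read out of the dump's thread context (on Windows that would be the EFlags field of CONTEXT); the function name is made up.

```c
#include <assert.h>
#include <stdint.h>

#define EFLAGS_TF (1u << 8) /* trap flag: bit 8 of EFLAGS */

/* True if the saved EFLAGS value has single-stepping enabled. */
static int single_step_enabled(uint32_t eflags)
{
    return (eflags & EFLAGS_TF) != 0;
}
```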

 

Edit: I should also take a look at the SEH block. It's tricky to hook Vectored Handlers from an external process, and I'm not seeing any DLLs that are not supposed to be there or that look suspicious in size. SEH, on the other hand, is easy to hook from anywhere. Maybe I'll find an int 3 handler in there.

 

Edit2: That was a good hunch. Right at the top of the SEH stack, an offset to an injected exception handler. No idea what it was meant to do, but there is no longer a doubt that this crash wasn't from "natural causes".
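The check itself is simple enough to sketch: walk the chain of registration records and flag the first handler whose address falls outside every module you expect to be loaded. The code below is a portable simulation, not what was actually run. The struct and function names are invented, the chain ends at NULL here, and a real 32-bit Windows chain starts at fs:[0] and ends at 0xFFFFFFFF.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a 32-bit SEH registration record. */
struct SehRecord {
    const struct SehRecord *next; /* NULL ends this simulated chain */
    uint32_t handler;             /* address of the handler routine */
};

/* Hypothetical address range of a module known to be loaded. */
struct ModuleRange {
    uint32_t base;
    uint32_t size;
};

/* True if addr lies inside any of the n known module ranges. */
static int in_any_module(uint32_t addr, const struct ModuleRange *mods,
                         size_t n)
{
    for (size_t i = 0; i < n; i++) {
        /* Unsigned wrap-around makes this false when addr < base. */
        if (addr - mods[i].base < mods[i].size)
            return 1;
    }
    return 0;
}

/* Returns the position in the chain of the first handler that lies
   outside every known module, or -1 if all handlers are accounted for. */
static int first_foreign_handler(const struct SehRecord *head,
                                 const struct ModuleRange *mods, size_t n)
{
    int depth = 0;
    for (const struct SehRecord *r = head; r != NULL; r = r->next, depth++) {
        if (!in_any_module(r->handler, mods, n))
            return depth;
    }
    return -1;
}
```

A result of 0 corresponds to the case found here, a foreign handler sitting at the very top of the chain.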
