Friday, 8 August 2014

When explorer.exe, D:\oesn't play along

Yet another “case of the unexplained” type issue has found its way to my doorstep.
Unlike in previous instances, this time the client was able to point me
in a general direction by providing a description of the issue.
He claimed that it was impossible to access the drive's secondary partition labelled "D:\".
For the sake of disclosure it was an XP machine that couldn't be upgraded
due to compatibility issues with the particular type of machinery it was operating.

 I set sail.

Sure enough, whenever I tried to access the secondary partition (D:\), explorer.exe
(the shell) crashed, no matter whether I went directly to the root of "D:\" or to a folder
inside it, using 'Run' for example.
As long as explorer.exe was used in conjunction with "D:\" it would fail and restart
(the 'dir' command, for example, which doesn't involve the shell, had no negative effect).
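
As a rough illustration of what "not involving the shell" means, here is a minimal Python sketch that lists a directory through plain file-system calls, much like 'dir' does. It is not something that was run on the machine at the time, and the function name list_without_shell is made up for this example.

```python
import os


def list_without_shell(path):
    """Return the sorted entry names in `path` using plain file-system calls.

    Nothing here goes through explorer.exe or any shell extension.
    """
    return sorted(os.listdir(path))


# Hypothetical usage on the machine from this case:
#     for name in list_without_shell("D:\\"):
#         print(name)
```

If a listing like this works while explorer.exe keeps crashing on the same path, the file system itself is probably fine and the problem sits in something the shell loads.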

Naturally you would think that malware might be involved in some manner,
but the system was actively protected by well-renowned engines
(which will remain anonymous), so that was of low probability
(later scans, after the fact, confirmed that the system was clean).

I (obviously) turned to the Sysinternals tools.
My first choice was Process Explorer (procexp.exe), to get a better view of the overall system activity at a glance and possibly pick up a clue of some sort.
Nothing popped up or looked even remotely suspicious. Okay, leaving procexp open and monitoring, I turned to trusty Process Monitor (procmon.exe).
With procmon up and tracing, I crashed explorer.exe on demand,
hoping to catch the error or the offending process in action.

After a couple of induced faults and a decent amount of time spent analysing
procmon's traces, both individually and side by side
(a very effective and popular technique), the “normal procedure”
of filtering for plausible causes had left me with nothing.

I stopped, closed procexp / procmon and thought.

That is when it hit me: I have nothing! That's it!

No error message, no log, no fault indicator of any kind!
That is why I turned to Procdump, a tool which I have grown to cherish very quickly.

I opened a command prompt window (Win+R > cmd.exe)
and changed directory (cd\) to where I had placed procdump for convenience,
the root of the primary partition: “C:\procdump.exe”.
I set procdump to monitor explorer.exe, capturing a full memory dump
(the -ma switch) when the process terminates, which here means when it crashes (the -t switch),
and placing the dump file in a folder I had manually created in advance, “C:\dumps\”.

 The syntax was as follows:
C:\>procdump.exe -ma -t explorer.exe C:\dumps\

Okay, great, procdump is all set and on guard.
I pressed “Win+E”, launching explorer, which defaults to “My Computer”,
double clicked "D:\", halt, and success!
Procdump spat out an error code: 0xc0000005 (Access Violation).
Perfect. With that in combination with the full memory dump,
I was positive I was on the right track.
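
For anyone curious how a code like 0xc0000005 is put together, here is a small Python sketch that splits an NTSTATUS value into its bit fields (severity, customer flag, facility, code). The function names are made up for this example, and the lookup table deliberately contains only the one value seen in this case.

```python
def decode_ntstatus(code):
    """Split a 32-bit NTSTATUS value into its bit fields."""
    severity_names = {0: "success", 1: "informational", 2: "warning", 3: "error"}
    return {
        "severity": severity_names[(code >> 30) & 0x3],  # bits 31-30
        "customer": bool((code >> 29) & 0x1),            # bit 29
        "facility": (code >> 16) & 0xFFF,                # bits 27-16
        "code": code & 0xFFFF,                           # bits 15-0
    }


# Only the status seen in this case; this is not a complete table.
KNOWN_STATUS = {0xC0000005: "STATUS_ACCESS_VIOLATION"}


def describe(code):
    """Return a one-line, human-readable description of an NTSTATUS value."""
    fields = decode_ntstatus(code)
    name = KNOWN_STATUS.get(code, "unknown")
    return "0x%08X %s (severity=%s, facility=%d, code=%d)" % (
        code, name, fields["severity"], fields["facility"], fields["code"])


print(describe(0xC0000005))
# prints: 0xC0000005 STATUS_ACCESS_VIOLATION (severity=error, facility=0, code=5)
```

The leading "C" is simply the two severity bits set, marking the status as an error.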

I carried the post-mortem dump file over to my main debugging PC,
which still has Debugging Tools for Windows 7 (which still supports XP)
for cases such as these, with the good ol' Windows Debugger, or Windbg.
I double clicked the .dmp file (I am able to simply double click .dmp files
because I have set the correct file associations),
which can be set like so:

1. Open an elevated cmd.

2. cd into the path where Windbg.exe is located on your system,
for example mine is “C:\debuggers”.

3. Run this command (no quotes) “windbg -IA”.
Windbg will then inform you that the file associations were set successfully;
accept the prompt and exit the debugger.

Otherwise, you have to open Windbg, click on File and choose
“Open Crash Dump...”,
or use the keyboard shortcut “Ctrl+D”.

 // End interlude.

Opening explorer's dump file generated by Procdump
revealed a very unusual cause.
In line with the initial Procdump revelation
of an access violation, the dump file, naturally, had more information to share.

After calling upon the debugger's internal heuristics engine by using “!analyze -v”,
the results pointed to none other than a codec;
the particular codec was “FFDShow” (MPEG-4 Video Decoder).
It showed that it was a “null_ptr_ref”, a null pointer reference.
In short, what happens is that the codec, for some reason,
tried to access an address in memory that it didn't own or that didn't exist.
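
To make the idea of a null pointer reference a little more concrete, here is a small Python sketch using ctypes. It is only an illustration and has nothing to do with the codec's actual code, which I never saw; the helper names are made up. Native code that reads through a NULL pointer gets an access violation from the OS, whereas ctypes checks the pointer first and raises a ValueError.

```python
import ctypes


def make_null_pointer():
    """Return a pointer-to-int that points nowhere (NULL)."""
    return ctypes.POINTER(ctypes.c_int)()


def safe_read(pointer):
    """Read the int behind `pointer`, or return None if the pointer is NULL."""
    try:
        return pointer.contents.value
    except ValueError:
        # ctypes refuses to follow a NULL pointer and raises ValueError
        # instead of letting the process crash.
        return None


value = ctypes.c_int(7)
print(safe_read(ctypes.pointer(value)))  # a valid pointer: prints 7
print(safe_read(make_null_pointer()))    # a NULL pointer: prints None
```

Native code without such a check simply crashes, which is consistent with what the dump showed.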

To maintain a fair playing field I didn't uninstall the codec
right off the bat, in spite of the damning evidence.
Instead I used another Sysinternals tool called Autoruns,
which (as the name implies) allows you to disable the automatic execution
of many components on your system, with codecs being among them.

Launched Autoruns.
Disabled the entry for “ffdshow” and restarted for the change to take effect.


*Windows XP logon sound takes me back*

Feeling confident, I navigated to "D:\", not a hiccup.

SOLVED.

// I ended up uninstalling the codec entirely as it wasn't necessary
for that particular machine's usage model.

Sunday, 13 July 2014

Case of The Runaway Working Set

On an undisclosed date in the past I had a PC come in that was acting sluggish, according to the owner's "very detailed" description of the problem; he also said it had started very recently.
I place it on the workbench and set out investigating.

My first step is, you guessed it, Process Explorer!
At first everything looks to be in order.
No suspicious processes, no abnormal CPU utilization, temps look very good and the PC is normally responsive in general.
Hmm, what might it be?
I cold boot the PC, then my very first step is again procexp, but this time I'm keeping a close eye on the working set / private bytes history graphs.
They look normal as well, to begin with that is.
I notice that slowly but very steadily (too orderly!) the working set usage grows by a few tens of MB at a time, accumulating over the course of 30~45 minutes.
At that point the OS is starving for memory and the usability is practically none.
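
The "too orderly" growth is the telltale sign here. As a rough sketch, assuming you jot down the working set (in MB) at regular intervals from procexp, a few lines of Python can flag that pattern. The function name, the threshold and the sample numbers are all made up for illustration; they are not readings from this machine.

```python
def looks_like_steady_leak(samples_mb, min_step_mb=10):
    """Return True if every sample exceeds the previous one by at least min_step_mb.

    `samples_mb` is a list of working set readings taken at regular intervals.
    """
    if len(samples_mb) < 3:
        return False  # too few readings to call it a trend
    steps = [later - earlier for earlier, later in zip(samples_mb, samples_mb[1:])]
    return all(step >= min_step_mb for step in steps)


# Hypothetical readings, one every few minutes:
print(looks_like_steady_leak([300, 340, 375, 420, 455]))  # prints True
print(looks_like_steady_leak([300, 320, 290, 310, 305]))  # prints False
```

A normal workload goes up and down; a steady climb with no dips is what made this one stand out.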

OK! Now I need to figure out why this is happening.
Reboot.
Procexp.
Digging deeper.
I see that the process that is hogging all of the resources is svchost, which is no help.
I take a look inside at the threads tab, but there are no strange DLLs or the like that might be pointing to other software / services
(spool was on the stack of one of the threads,
but I didn't think anything of it as it's pretty normal).
Dead end. At least with procexp.


Now I turn to my equally favourite tool, Process Monitor!
Procmon, launched.
Going through the massive list it generates trying to get a hint of some sort, something!

I start to notice a pretty consistent set of paths being
queried by NOD32. This is much more helpful, so I set out looking.
The files NOD32 is querying end with .spl & .shd. I think to myself, what the heck are those?!

A quick Google search reveals that they are temporary files that the OS uses to queue up printing jobs.
Great I'm getting somewhere!
I navigate to C:\Windows\system32\spool\PRINTERS and try to open the folder: double click, wait a couple of seconds, no response.
The PC freezes, a hard hang!
The occurrence is reproducible (regardless of how much memory seemed to have leaked at the time).
Alright, I'm onto something here.
I restart the PC and this time boot into a Linux distribution to peek inside the folder;
Windows obviously can't help me there.

I navigate to the same folder and this time I am able to gaze upon the horrors of a mistake in code!

Just to put things in perspective:
the .spl & .shd files range from about 3 to 200 KB each in that particular folder.

The total size of the folder was scratching the 700MB range;
you can do the maths and figure out how many tiny files were there.
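
For those who would rather not do the maths by hand, here is a quick Python sketch of the estimate. It only uses the rough numbers quoted above (about 700 MB in total, files of roughly 3 to 200 KB), so the result is a range rather than an actual count; the function name is made up for this example.

```python
FOLDER_BYTES = 700 * 1024 * 1024  # roughly 700 MB in total


def estimated_file_count(folder_bytes, file_size_kb):
    """Estimate how many files of `file_size_kb` KB fit into `folder_bytes`."""
    return folder_bytes // (file_size_kb * 1024)


# If every file were at the large end (200 KB):
print(estimated_file_count(FOLDER_BYTES, 200))  # prints 3584
# If every file were at the small end (3 KB):
print(estimated_file_count(FOLDER_BYTES, 3))    # prints 238933
```

So the folder held somewhere between a few thousand and a couple of hundred thousand files, depending on the mix of sizes.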

Those files need to be deleted shortly after, if not immediately after, the printing job has been carried out. They are supposed to be temporary!
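
A check like the one done by eye here can also be scripted. The following is a minimal Python sketch that counts the .spl and .shd files in a folder and adds up their sizes; the function name is made up, and on the affected machine merely opening the folder hung Windows, so there is no guarantee a script would have fared any better there.

```python
import os

SPOOL_EXTENSIONS = (".spl", ".shd")


def summarize_spool(folder):
    """Return (file_count, total_bytes) for the spool files in `folder`."""
    count = 0
    total_bytes = 0
    for name in os.listdir(folder):
        full_path = os.path.join(folder, name)
        # Only regular files with a spool extension are counted.
        if os.path.isfile(full_path) and name.lower().endswith(SPOOL_EXTENSIONS):
            count += 1
            total_bytes += os.path.getsize(full_path)
    return count, total_bytes


# Hypothetical usage:
#     count, size = summarize_spool(r"C:\Windows\system32\spool\PRINTERS")
```

On a healthy machine with no pending print jobs, both numbers should be at or near zero.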

I manually deleted the files, restarted the machine and logged back in to Windows.
Lo and behold all is well!
No memory was being "leaked", great.
I also double checked whether I could now access the spool/PRINTERS folder and, not surprisingly, Windows had no issues with that request; the folder was empty, as I had left it.
Now, on to why and how it got to be this way in the first place, so we'll know to avoid it.

I question the owner of the PC about his working habits (it was an office PC), what tools he uses in his day-to-day workflow and, mainly, what printers are directly connected to the PC and being shared to and from it.
He states that his main usage is not even local and he gets most of his work done via an RDP connection.

Armed with that information I go about looking at the printers installed and their drivers, as well as at the RDP version. They are all very recent, if not the latest.
I connect to the RDP server via a premade RDP shortcut placed on the desktop.

Nothing seems to pop out, at least at first.
I minimize the RDP connection window and have a look at the system with the RDP connection still active. I open procexp and note the usage of the working set (it was about 800MB, give or take, from a 2GB total) for future reference.
I once again navigate to the spool/PRINTERS folder to check up on it.
I'm seeing files being generated "out of thin air" ...
I spring into action.
Launching procmon and setting a filter for .spl & .shd files, I notice they are coming from the RDP.
Looking through the printer jobs on the RDP I see
a huge queue list (and probably growing) on one of the printers.
I reboot the PC and once again boot into Linux to empty the PRINTERS folder again.
Reboot.
Windows.
RDP.
Printers and devices.
I decide to disable the printer that is generating all of the jobs while keeping an eye on the local PRINTERS folder, which I opened in advance, prior to connecting to the RDP.
By the time I disabled the printer some more files had been generated, but once I did, the files stopped.
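
Watching a folder by eye works, but the same before-and-after comparison can be sketched in a few lines of Python: take a snapshot of the file names, wait, take another, and look at the difference. The function names are made up for this example, and this is not something that was run on the machine at the time.

```python
import os


def snapshot(folder):
    """Return the set of entry names currently in `folder`."""
    return set(os.listdir(folder))


def new_files(before, after):
    """Return the sorted names present in `after` but not in `before`."""
    return sorted(after - before)


# Hypothetical usage while the RDP session is active:
#     before = snapshot(r"C:\Windows\system32\spool\PRINTERS")
#     ... wait a minute or two ...
#     after = snapshot(r"C:\Windows\system32\spool\PRINTERS")
#     print(new_files(before, after))
```

An empty result after disabling the printer would match what was seen here: no new spool files once the offending printer was out of the picture.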
Jackpot!
I asked the owner if this printer was even physically present any more; the answer was no.
I uninstalled the printer and the machine has been free of problems (well, this problem) ever since.

I wish I could dig inside the RDP code or possibly the printer installation and figure out what went wrong, but that is what they pay people at Microsoft to do.