Sunday 13 July 2014

Case of The Runaway Working Set

On an undisclosed date in the past I had a PC came in that was acting sluggish, as to the owner's "very detailed" description of the problem , he also said it started very recently.
I place on the working bench and set out investigating.

My first step is, you guessed it, Process Explorer!
At first everything looks to be in order.
No suspicious processies, no abnormal CPU utilization, temps look very good and the PC is being normally responsive in general.
Hmm, what might it be?
I cold boot the PC, then my very first step is again procexp, but this time I'm keeping a close eye on the working set / private bytes hit graphs.
They look normal as well, to begin with that is.
I notice that slowly but very steadily (too orderly!) the working set usage grow with a few tens of MB's at a time, accumulating over the course of 30~45 minutes

at that point the OS is starving for memory and the usability is practically none.

OK! now I need to figure out why this is happening.
Reboot.
Procexp.
Digging dipper.
I see that the service that is hogging all of the resources is svchost, that is no help.
I take a look inside at the threads tab, but no strange dll's or the like that might be

pointing to other software / service.
 

(spool was on the stack of one of the threads
but I didn't think anything of it as it's pretty normal).
Dead end. At lease with procexp.


Now I turn to my equally favourite tool, Process Monitor !
Procmon, launched.
Going through the massive list it generates trying to get a hint of some sort, something!

I start to notice a pretty consistent regkey being
queried by NOD32, this is much more helpful, I set out looking.
That regkey(s) files NOD32 is querying end with a .spl & .shd, I think to myself what the heck are those?!

A quick Google search reveals that they are temporary files that the OS uses to queue up printing jobs.
Great I'm getting somewhere!
I navigate to C:\Windows\system32\spool\PRINTERS and I try to open the folder, double click, wait a couple of seconds, no response.
The PC froze, hard-hang!
the occurrence was reproducible (regardless of the amount of memory leak it seemed to be at the time).
Alright I'm onto something here.
I restarted the PC, this time I booted into a Linux distribution to peek inside the folder,
Windows obviously can't help me there.

I navigate to the same folder and this time I am able to gaze upon the horrors of mistake in code!

Just to get things in perspective.
the .spl & .shd together average at about 3~200kb each in that particular folder.

The total size of the folder was scratching the 700MB range,
you can do the maths and figure out how many many tiny files were there.

Those files needs to be deleted shortly after if not immediately after the printing job had been carried out, they are spouse to be temporary!.

I manually deleted the files, restarted the machine and logged back in to Windows.
Lo and behold all is well!
No memory was being "leaked", great .
I also double checked if can now access the spool/PRINTERS folder and not surprisingly Windows is having no issues with that request, the folder is empty as I left if.
Now as to why and how it got to be this way in the first place so we'll know to avoid it.

I question the owner of the PC about his working habits (it was an office pc) and what are the tools he's using in he's day to day workflow and mainly what printers are directly connected to the PC and being shared to and from it.
He states that his main usage is not even local and he gets most of his work done via an RDP connection.

Armed with that information I go about looking at the printers installed and their drivers as well as on the RDP version. they are all very recent of not the latest.
I connect to the RDP server via a premade RDP shortcut placed on the desktop.

Nothing seems to pop out, at least at first.
I minimize the RDP connection window and had a look at the system with the RDP connection still active. I open procexp and note the usage of the working set (it was about 800MB give or take from a 2GB total) for future reference.
I once again navigate to the spool/PRINTERS folder to check up on it.
I'm seeing files being generated "out of thin air" ...
I sprung into action.
Launching procmon setting a filter to .spl & .shd files I notice they are coming from the RDP.
Looking through the printer jobs on the RDP is see
a huge queue list (and probably growing) on one of the printers.
I rebooted the PC and once again booted into Linux to empty the PRINTERS folder again.
Reboot.
Windows.
RDP.
Printers and devices.
I decide to disable the printer that is generating all of the jobs while keeping an eye on the local PRINTERS folder I opened in advanced prior to connecting to the RDP.
By the time I disabled the printer some files were generated but once I did the files stopped.
Jackpot!
I asked the owner if this printer is event physically present any more, the answer was no.
I uninstalled the printer and the machine was free of problems (well this problem) ever since.

I wish I could dig inside the RDP code or possibly the printer installation and figure out what went wrong, but that is what they pay people at Microsoft to do.