Saturday 31 May 2014

Drives filling up quickly because of Usage Data: THE TIMER

I recently faced a problem in my production farm where the D: drive, on which we store log files from the various technologies running on the SharePoint server, kept filling up too quickly. We have a directory called D:\LogFiles with subfolders for the log files created by SharePoint, IIS and log4net (custom logs), and also for the usage data generated by the health and usage feature of SharePoint. The drive is about 50 GB in size, and at any point in time the complete LogFiles folder is generally around 20 GB in total. The majority of those logs are ULS logs because we have enabled verbose logging.

When we got a low disk space alert from our monitoring solution for the D: drive, we initially thought the culprit was the IIS logs, since the SharePoint logs and the log4net logs are configured for rotation. We cleaned up the majority of those logs and even removed old, unused setup files and the like from the D: drive. However, the problem soon came back, and there was nothing else left to delete from that drive.

We dug deeper and found that D:\LogFiles\UsageData contained a large number of files that were never getting rotated. We knew we had enabled health and usage data collection, so the data should be collected, but it should not stay on the drive. Also, the other servers in the farm (7 servers in total) were not affected. That gave us a clue that something was wrong with THE TIMER service on this server.
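If you want to find out quickly which subfolder is eating the space, a small script can total the size of each one. Below is a minimal Python sketch of that idea, not the exact method we used. The function names subfolder_sizes and report are made up for illustration, and the only path assumed is the D:\LogFiles directory described above.

```python
import os
from pathlib import Path


def subfolder_sizes(root):
    """Return {subfolder name: total bytes of all files beneath it}.

    Hypothetical helper for illustration; files directly in `root` are ignored.
    """
    sizes = {}
    for entry in Path(root).iterdir():
        if not entry.is_dir():
            continue
        total = 0
        for dirpath, _dirnames, filenames in os.walk(entry):
            for name in filenames:
                try:
                    total += os.path.getsize(os.path.join(dirpath, name))
                except OSError:
                    # The file was rotated or locked between listing and stat; skip it.
                    pass
        sizes[entry.name] = total
    return sizes


def report(root):
    """Print each subfolder with its size in MB, largest first."""
    ranked = sorted(subfolder_sizes(root).items(), key=lambda kv: kv[1], reverse=True)
    for name, size in ranked:
        print(f"{name:<20} {size / 1024 ** 2:10.1f} MB")
```

Calling report(r"D:\LogFiles") on the affected server would list the subfolders from largest to smallest, which should make an overgrown UsageData folder stand out.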

We know the timer is the most important service in the SharePoint architecture. It makes sure all tasks within the SharePoint farm are carried out at their designated times. I really think of it as the heart of the SharePoint body, since it drives the rest of the SharePoint tasks and activities in the form of timer jobs. If the timer is not working, multiple things start going wrong on the server and in the farm, like the problem described above. There is a timer job called Microsoft SharePoint Foundation Usage Data Import, which imports all the usage data into the database and then deletes those temporary files from the server. I checked the timer service in the Services console, and it was shown as running. However, it was not doing its job properly. So we restarted the timer service, and the usage logs were cleaned up as easily as eating a piece of cake. Once again, the importance of the timer service was made clear.
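Since the import job is supposed to empty the usage folder, files that sit there much longer than the job's schedule are a hint that the timer is not processing them, even when the service shows as running. Here is a rough Python sketch of such a check. The function name stale_files and the age threshold are my own assumptions for illustration; the post does not state the job's schedule, so pick a threshold that fits your farm.

```python
import time
from pathlib import Path


def stale_files(folder, max_age_hours, now=None):
    """Return files under `folder` last modified more than `max_age_hours` ago.

    Hypothetical helper: `now` (seconds since the epoch) can be passed in
    for testing and defaults to the current time.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_hours * 3600
    return sorted(
        p for p in Path(folder).rglob("*")
        if p.is_file() and p.stat().st_mtime < cutoff
    )
```

For example, a non-empty result from stale_files(r"D:\LogFiles\UsageData", 6) would mean some usage files have been waiting more than six hours, which is worth a look at the timer service on that server.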

Hope this helps in your troubleshooting and thanks for reading!!