
Friday, February 10, 2012

The Cost of Complexity

I find it astounding that a single application can bring a reasonably modern quad-core i7 with 4 GB of RAM to its knees. The app in question is the custom product I'm helping build. It stores data locally on the client in SQL Server via Entity Framework, syncs with a server database via Sync Framework, and renders itself with WPF. It's simply a beast. Solution Explorer in Visual Studio shows 25-30 projects in the solution, each with anywhere from 20 to 200 classes, and quite a few of those classes weigh in at over 1,000 lines of code. The whole suite easily gobbles up 2 GB of memory on its own, while the remaining 2 GB are consumed entirely by Windows 7, Visual Studio 2010, and this browser.

Now, I mostly blame Entity Framework and only slightly WPF. Entity Framework is a discordant combination: pleasantly easy to use, yet extremely difficult to master. On the one hand, you can create the schema and start using the entities in code the very same day (and, by the same token, just as easily fall into lazy practices that will do a number on your performance); on the other hand, it's very difficult to rein in the consequences of that ease of use, and the results can get out of hand. We have had several teams analyze performance at different levels, from the overall architectural choices made early on down to the details of how views are written, how entities are materialized, and so forth. We've certainly made quite a few improvements, but the biggest came not from making the data layer more efficient, but simply from wrapping the data operations in asynchronous calls.

WPF shares a portion of the blame here as well. The queries, while slow, typically account for no more than half to three quarters of the time a user spends wishing he were at the dentist instead. No, the rest of the time is spent rendering huge data grids. Why am I letting WPF off the hook so easily, you ask? Well, the problem here is largely attributable to design choices, such as caving in when the customer asked for data grids showing hundreds of rows with an equal number of columns. Another portion of the blame can be leveled at Telerik, whose controls, while beautiful and functional, are another dirty pig in this mud bath the user has to slog through. Nevertheless, I still can't help but blame WPF for some of the performance problems when I watch scroll bars chug along and occasionally see the entire UI lock up for seconds at a time. But I shouldn't be too hard on this application, since it really has sped up a lot over the last year; it used to be entirely normal to wait upwards of 30 seconds for a single tab to open and display nothing more complex than a simple message inbox. I am aware that tuning the display of grids takes care, but that's exactly the problem, in my opinion. The whole technology stack we're using is extremely simple to use and yet infuriatingly, devilishly difficult to tune and tweak.

When I say this app dwarfs iTunes and makes Visual Studio feel like Notepad, I'm not kidding, folks. I'll admit the app deals with huge amounts of data due to its "enterprise" nature. Verbose logging is done in triplicate, to the hard drive, the database, and a logging service, which in a manner of speaking could be considered necessary, since each log records a different variation of the same events for a different audience. No data is ever deleted, and even our sparse test data set has bloated the client database to over 9 GB, though most of this is transactional logging (exporting the same data sans transactional data yields a file that weighs only 300 MB). Myriad small things have crept into the design and architecture of the application to meet very specific and sometimes narrow-minded business requirements assembled by different business units within the same client organization. These requirements are often impossible, costly, or just painful to implement in light of the frequently conflicting or even duplicate requirements originating from other business units. Lastly, the testing teams' rigid, literal interpretation of design documents, combined with unrealistically short turnaround times for bug fixes, has resulted in many suboptimal solutions to problems that never existed. I'm going to skip over the half dozen or so integrations with third-party systems that are, at the best of times, brittle and non-functional, but I hope I've drawn a clear picture of the problem.

Tuesday, September 13, 2011

A (Huge) Needle In A Bigger Haystack

Microsoft's Sync Framework is a fragile creature, and as such doesn't like being abused with data. Today I ran into a problem with our app trying to sync a record containing a field of 8,266 characters, which is over the (apparent) 8,192-character limit. Faced with finding the culprit record among millions scattered across hundreds of tables with dozens of columns each, I first hit Google like any lazy coder.

This thread got me started by giving me a base query for finding the longest value in a single table's columns:
http://www.dbforums.com/microsoft-sql-server/1069468-possible-get-max-length-text-field.html
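The gist of that approach, for a single known table and column (borrowing the WorkOrders/History names from the example at the end of this post), boils down to one line:

-- Longest value in one column of one table
SELECT MAX(LEN(History)) AS MaxLength FROM WorkOrders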

...but I needed more.

So I wrapped that thing in another cursor and came up with this:

DECLARE @table SYSNAME, @field SYSNAME

-- Create the results table once, up front, so every table's results land in one place
CREATE TABLE #Temp (TableName SYSNAME, ColumnName SYSNAME, MaxLength INT)

DECLARE b_cursor CURSOR STATIC
FOR
SELECT name FROM sys.tables WHERE name NOT LIKE '%tracking'  -- customize table SELECTion here
OPEN b_cursor
FETCH NEXT FROM b_cursor INTO @table
WHILE @@FETCH_STATUS = 0
BEGIN
    DECLARE a_cursor CURSOR STATIC
    FOR
    SELECT name FROM sys.columns WHERE object_id = OBJECT_ID(@table) AND name LIKE '%history%' -- customize column SELECTion here
    OPEN a_cursor

    FETCH NEXT FROM a_cursor INTO @field
    WHILE @@FETCH_STATUS = 0
    BEGIN
        -- Dynamic SQL: record the longest value found in this column of this table
        INSERT INTO #Temp
        EXEC ('SELECT ''' + @table + ''', ''' + @field + ''', MAX(LEN(' + @field + ')) FROM ' + @table)
        FETCH NEXT FROM a_cursor INTO @field
    END

    CLOSE a_cursor
    DEALLOCATE a_cursor

    FETCH NEXT FROM b_cursor INTO @table
END
CLOSE b_cursor
DEALLOCATE b_cursor

-- One result set for everything, worst offenders first
SELECT * FROM #Temp ORDER BY MaxLength DESC

DROP TABLE #Temp

-- Then select from the offending table to see the record with the largest field
SELECT * FROM (SELECT Id, LEN(History) AS len FROM WorkOrders) AS a ORDER BY len DESC

The guts of it are the EXEC statement, which queries each column of each table via dynamic SQL. Note the comments marking where you can customize the queries to filter tables and columns by name. Everything comes back in a single result set with the worst offenders at the top, and the query at the bottom shows how to pull the actual records from the offending table, sorted by field size. Enjoy!

** Note: If a previous run aborted partway through and the script complains that #Temp already exists, just drop it by running the DROP TABLE #Temp statement near the end on its own.
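For completeness, here's a cursor-free sketch of the same idea, assuming SQL Server 2008 or later: build one big UNION ALL query from the catalog views and run it in a single shot. I haven't beaten on this one the way I have the cursor version, so treat it as a starting point:

-- Build one SELECT per matching column, glued together with UNION ALL
DECLARE @sql NVARCHAR(MAX) = N''
SELECT @sql = @sql
    + CASE WHEN @sql = N'' THEN N'' ELSE N' UNION ALL ' END
    + N'SELECT ''' + t.name + N''' AS TableName, ''' + c.name + N''' AS ColumnName, '
    + N'MAX(LEN(' + QUOTENAME(c.name) + N')) AS MaxLength FROM ' + QUOTENAME(t.name)
FROM sys.tables t
JOIN sys.columns c ON c.object_id = t.object_id
WHERE t.name NOT LIKE '%tracking' AND c.name LIKE '%history%'  -- same filters as above

EXEC sp_executesql @sql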