Thursday, May 31, 2007
CRN Australia today published an article, "Vista, XP users equally at peril", detailing the Test Center engineers' validation of my statements.
Reading through their tests, they seem to systematically confirm each of the points I made in my post, with a couple of exceptions. They made no mention of patches, but when performing a comparative test of 2 operating systems, I don't think there are any practical tests to be performed (and it's too early to start doing statistics on average number of patches per month/year/whatever).
Of their 6 tests, I only missed one - Test 6: Signatures & Phishing Filter.
Yep, Vista has a slight edge here (not all of IE7's functionality is back-ported to XP), and this is where I missed the boat a little bit in my recommendations. I should have put something into my list to make sure you're running a newer browser than IE6 (either IE7 or Firefox). Personally, I'd recommend both - the reality is that there are still fewer exploits for Firefox than IE, so it makes a solid primary browser, but many sites still need IE (and lots of business apps too).
I've needed IE on government web sites like the Ontario Ministry of Transportation for credit card payments - for some reason I couldn't get it to work with Firefox. People still (and always will) prioritize which browsers they code for based on install base - and the largest base is still IE.
I would also point out that IE's Phishing Filter is not your only option here. NetCraft toolbar has been doing this for a much longer time.
Their 5th test leaves something to be desired, though. They tested flaws with image files, spoofing & scripting. Most of this has to do with malformed data formats. I'm not aware of any products that test for valid formatting as of yet (largely because most file formats rely on relatively "loose" standards), so everyone still needs to rely on AV, anti-spyware/malware, and buffer overflow prevention to prevent or detect exploitation through these paths.
rG0d
Monday, April 9, 2007
Credit Agencies - The Ultimate Scam
I'd be perfectly willing to forgo personal notification of the theft of credit card numbers. I just don't think it's that important, and the liability lies with the banks and the merchants. In contrast, the outcome of my SSN being abused falls back to me, in credit reports, false arrests, etc.
I don't particularly agree with this statement, but what really caught me - as it has in the past - is the ridiculous amount of power the credit reporting agencies have over our lives. Credit agencies have been performing data-mining since before data-mining had a name - and the financial data available to them is provided, typically without your clear consent, by your bank, credit card company, mortgage company, car loan company, etc.
What's worse is that, if you ever want a loan (or even an apartment), you're required to provide access to this data as a reference that your credit is good.
Even worse than that, there are virtually no controls over what people (malicious or otherwise) can report about you to the credit companies. There are "appeal processes" in place with these companies to have invalid entries "expunged" from your credit rating - but if you've ever gone through this process, it is ridiculously arduous, and the invalid information never truly gets removed from your rating - it still appears as a reported item but, at least in theory, isn't used to calculate your "risk rating".
Now, if you go to the web pages of the credit reporting agencies, the first thing you see is the ability to watch your own credit rating for a fee (this is at least true for Equifax=$12.95/month and Transunion=$9.95/month).
In my opinion, these companies bear a large share of the responsibility for the lack of meaningful controls on personal information. As far as I know, any controls that are in place are largely "self-regulated", and since there has been no significant backlash from the average proletariat who has been largely abused by their "services", the government hasn't bothered.
To my mind, there are several controls that should be federally enforced for these types of companies:
1. Requirement to have a method to contest an item on your statement such that the company that provided the originating information is required to substantiate their claims - right now, you are on the hook to prove to them that the statement is inaccurate.
2. Requirement to have invalid information permanently expunged from all of their reports.
3. Requirement for you to sign a separate agreement with your loan company (or whoever) stating that you agree to allow them to share information with CreditRatingCompanyX - after all, for them to access this information you need to sign a separate agreement (one of the few government-enforced regulations).
4. Requirement to share this information with the owner for free - it is, after all, information about you. (As an aside, I wonder if they've ever been sued for slander for sharing inaccurate information??)
I really think this would have a positive impact on identity theft because it would make people aware that this information exists about them, and it would give people a proactive method to prevent abuse of their credit.
Friday, March 30, 2007
Blog Redirects
I've never been a big fan of people who post blogs that just point to another blog posting, essentially reiterating the original point. I've always thought the motivations behind these can too often be lame attempts at name recognition, at increasing hit-counters (ad-counters), or just the "I want to be part of something bigger" urge that I feel permeates the blogosphere.
I do feel there is some merit in linking to a blog if you're refuting someone else's blog entry, or referring to them to reinforce a point of view, but all too often I feel these "blog redirects" are just an attempt to increase hits, with little to no additional substance involved.
This is one of the reasons I haven't put ad-banners on my page. I decided to blog as an outlet, not as a money maker or for industry recognition - hell, I don't even use my real name. The anonymity is something I enjoy, not because I can say stupid shit without people being able to hold it against me (those of you who know me socially know this is the last thing I worry about when my lips move). I write these little tidbits so that I can share some thoughts that would otherwise need to be censored for various reasons, or because I believe I have something valuable to contribute.
I will admit to having a bit of a soft-spot for humour though (shameless redirect to something that had me laughing http://www.vitalsecurity.org/2007/03/browser-condom-opinion-split.html). I think sharing someone else's great (and especially funny) idea is good, and it gives credit to the person who came up with the original content. I am NOT going to provide links to the type of content I'm talking about - that would be somewhat hypocritical - but if you're reading this site (which gets very little traffic) then you read enough blog entries to know the type I mean.
My point is this:
If you blog, and you are talking about someone else's article, please make sure you have something MEANINGFUL to add - don't just rehash (sometimes badly) what the original author wrote, then link to it. Alternatively, if you really liked an article, a nice succinct intro to a "good article about..." is usually enough - don't try to make the idea yours.
rG0d
Tuesday, March 27, 2007
New Toronto Security Conference
As it turns out, Toronto's lack of a dedicated security conference is about to change with a new conference this fall: SecTor, which stands for "Security Education Conference - Toronto".
Thought I'd share this with everyone out there since I'm involved with helping set this up. Before the "self-serving" idea pops into anyone's head, I'm not getting any money out of it - I'm spending my own cycles on helping out with the conference because I think it's a GOOD IDEA, and much needed in the GTA technology space. (Okay, I'll probably get a free pass out of the idea, but mostly I'll be on-site lending a hand with setup, etc.)
The conference will be held on Nov 20-21, 2007 at the Toronto Convention Centre. There are already a few good speakers lined up, including Mark Russinovich, Joanna Rutkowska, Johnny Long, Dan Kaminsky, Mark Fabro, and Ira Winkler.
Currently, there's a call for papers on the site, and registration should be opening in the next couple of days. (If you're a vendor, SecTor is also looking for sponsorship to help cover costs - you can either hit the link on the website or drop me a comment.)
http://www.sector.ca
If you're in the Toronto area (or plan to be around that time), check out the site. If there's any other info you'd like that isn't on the site, please let us know - again, there's contact info on the site, or you can drop me a comment.
Friday, March 23, 2007
Whitehouse Directive 2
UPDATE: FLASH REPORT ON THE WHITE HOUSE SECURE CONFIGURATION MANDATE
The White House posted a second memo last night, confirming its mandate that all federal agencies must use secure configurations if they choose to deploy systems that run Windows Vista or XP. The latest memo was signed by the top executive in US government management, Deputy Director of OMB, Clay Johnson and is posted at the White House site, http://www.whitehouse.gov/omb/memoranda/fy2007/m07-11.pdf . The original (March 20) memo from Karen Evans to Federal CIOs is now posted at http://cio.gov/documents/Windows_Common_Security_Configurations.doc .
This initiative matters because it provides the incentive ($65 billion in US government IT purchasing each year) and confidence (agreed-upon configurations) to allow every software vendor to ensure and affirm the software they sell works on the secure configurations. That takes the pain out of secure configuration and rapid patching.
On April 11, federal CIOs and their senior staff will be briefed by the Air Force and OMB and NSA seniors on how to take advantage of the new mandate, and the lessons learned in the Air Force pilot implementation involving 575,000 computers. We will ask permission to make the essence of those briefings available to the entire security community, because this initiative will affect every medium and large buyer of computers running Windows software.
Alan
Also, the "SSLF" configuration standards referred to in the original SANS posting are for the "Specialist Security - Low Functionality" security templates produced by Microsoft for both XP and Vista.
Links to both the "Windows XP Security Guide" and "Windows Vista Security Guide" can be found here: http://www.microsoft.com/technet/security/guidance/default.mspx
Wednesday, March 21, 2007
Whitehouse Directive: All systems acquisitions must run on Hardened Configurations
FLASH ANNOUNCEMENT: The White House just released (at 9 AM Tuesday, March 20) a directive to all Federal CIOs, requiring that all new IT system acquisitions, beginning June 30, 2007, use a common secure configuration and, even more importantly, requiring information technology providers (integrators and software vendors) to certify that the products they deliver operate effectively using these secure configurations. This initiative builds on the pioneering "comply or don't connect" program of the US Air Force; it applies to both XP and Vista, and comes just in time to impact application developers building applications for Windows Vista, but impacts XP applications as well. No VISTA application will be able to be sold to federal agencies if the application does not run on the secure version (SSLF) of Vista. XP application vendors will also be required to certify that their applications run on the secure configuration of Windows XP. The benefits of this move are enormous: common, secure configurations can help slow bot-net spreading, can radically reduce delays in patching, can stop many attacks directly, and organizations that have made the move report that it actually saves money rather than costs money.
The initiative leverages the $65 billion in federal IT spending to make systems safer for every user inside government but will quickly be adopted by organizations outside government. It makes security patching much more effective and IT user support much less expensive. It reflects heroic leadership in starting to fight back against cyber crime. Clay Johnson and Karen Evans in the White House both deserve kudos from everyone who cares about improving cyber security now.
Alan
PS: SANS hasn't issued a FLASH announcement in more than two years. IOW, this White House action matters.
This is hugely significant to the security industry. This means that any vendor that wants a hope in hell of selling their product to US Federal Agencies of any sort must certify that their software will run under the US Gov't's secured platform configurations. While I believe June 30, 2007 is too soon for many existing projects to possibly accommodate (some acquisitions will occur that don't fulfill this directive), this is a MASSIVE step in the right direction - sorry, I just can't emphasize enough how strongly I feel on this topic :)
This will have a direct impact outside the US Gov't as well, especially on enterprises, who typically use many of the same tools as government, and it might (gasp!) finally allow Microsoft to ship their default installations in a truly hardened mode - currently they still "tone down" some settings for compatibility or end-user usability issues.
I wholeheartedly agree with Alan Paller: "Clay Johnson and Karen Evans...deserve kudos from everyone who cares about improving cyber security now."
...Let the vendor-scramble begin... :)
Tuesday, January 9, 2007
Dealing with Logs (Part 1): What Vendors don't tell you about Log Management
So this blog will be dedicated to reviewing the path I followed in my learning - I hope there are a few "gold nuggets of wisdom" that can be gleaned along the way.
Frustration & Misinformation
After looking at a few different products, their capabilities and extensibility, I became frustrated with the lack of information the vendor "technical leads" really had on how the products worked, what their limitations were, and the fact that they all said their products were "infinitely scalable". Does anyone actually believe that a product is "infinitely scalable"? I hope it comes with a silver bullet.
If the "technical leads" can't explain, in terms I can understand, what differentiates their product from another, how can I ever make an educated decision? After all, they're the ones who understand this log management space, right?
Lack of Understanding
I took this on more as a frustrated challenge than actual "fun". I wanted to prove, to myself at least, that I could probably do as much (or nearly so) as any of these vendors could with the $1 million price-tag software they were pushing. After all, Syslog has been in use for a couple of decades, right? How tough could this be?
As it turns out, you CAN do a lot with free tools - but not everything - or at least not yet. Even if I can't do everything, I've learned a lot (and am still learning). With all the work I've put into this so far, I hope this info is worth sharing - so if you are buying a $1 million piece of software, hopefully this will help you know what to look for, determine what you need, and be able to tell the vendor when you think they're full of crap.
As I said, this posting is going to delve into the complexities of Log Management. I'll explain, briefly, what log correlation is, but I won't get into that just yet - that'll be another post. There's a lot of info to share already.
Correlation vs Consolidation
Log correlation is the act of taking logs from disparate sources and combining the information together in some meaningful way to help determine the underlying problem. For example, a simple failure event on a firewall may be commonplace, and something you'd typically ignore as background noise; but if it's coming from a UNIX box that just had someone successfully log on remotely, it may be more meaningful - perhaps even a compromised box. That's rather simplified, but I hope you get the idea.
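To make that a bit more concrete, here's a toy sketch of the kind of rule a correlation engine might apply - purely illustrative, not any vendor's engine, and the event fields are made up:

```python
from datetime import datetime, timedelta

# Toy correlation rule: a firewall deny is just noise on its own, but if the
# source host had a successful remote login shortly beforehand, raise an alert.
WINDOW = timedelta(minutes=10)
recent_logins = {}  # host -> time of last successful remote login

def handle_event(event):
    """event is a dict like {'time': datetime, 'type': str, 'host': str}."""
    if event["type"] == "ssh_login_success":
        recent_logins[event["host"]] = event["time"]
    elif event["type"] == "firewall_deny":
        last_login = recent_logins.get(event["host"])
        if last_login and event["time"] - last_login <= WINDOW:
            print(f"ALERT: {event['host']} generated denies right after a "
                  f"remote login at {last_login} - possible compromise")

# A login followed by a deny from the same box trips the rule.
handle_event({"time": datetime(2006, 6, 10, 15, 0), "type": "ssh_login_success", "host": "unixbox1"})
handle_event({"time": datetime(2006, 6, 10, 15, 2), "type": "firewall_deny", "host": "unixbox1"})
```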
Log consolidation is simply that - getting all logs into a common repository.
They are complementary technologies, but their primary goals are vastly different. With correlation, not all of the information is necessarily maintained, just the correlated result. Most IDS/IPS systems work in a similar manner - entire packets aren't maintained, just the fact that the packet matched some signature.
A common misconception is that log correlation is a superset of log consolidation. After all, if correlating logs, the system doing the correlation has to get data from all the sources it's correlating, so it's very similar to a repository of data. Unfortunately, it's not that simple - but I'll have to address that in my future correlation blog. I will say that I believe correlation to be much more complex than collection and leave it at that for now.
Consolidating the Logs
My journey along the log management path started with the relatively simple (though non-trivial) task of log consolidation. In its simplest terms, all that's necessary for log consolidation is to get the data from a remote source and write it to disk.
From a performance standpoint, a ridiculous amount of data can be handled by pushing all of this data to a flat file. I can easily handle tens of gigs per day, even on a relatively slow server, using this simplified architecture. The primary bottleneck in this scenario is disk I/O - assuming you're not using big-$$ disk arrays. Typical server RAID arrays can handle megs of data per second. From a performance standpoint, no big problems so far.
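For the curious, here's roughly what that simplified architecture boils down to - a bare-bones UDP syslog listener that appends everything to a daily flat file. This is just a sketch (no buffering, no TCP, no error handling), not what I actually run:

```python
import socket
from datetime import date

# Bare-bones log consolidation: receive syslog over UDP/514 and append every
# message to a daily flat file. Disk I/O is the only real bottleneck here.
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("0.0.0.0", 514))  # binding to port 514 usually requires root/admin

logfile = open(f"syslog-{date.today().isoformat()}.log", "a", encoding="utf-8")
while True:
    data, (src_ip, _port) = sock.recvfrom(8192)
    logfile.write(f"{src_ip}\t{data.decode('utf-8', errors='replace').rstrip()}\n")
    logfile.flush()
```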
It's when trying to READ this data, however, that things start to go downhill; and, let's face it, the data is being collected for a reason. With this volume of data in a flat file, it's like looking for a needle in a haystack if you want to answer something as simple as "who caused this account to get locked out", or "are all these SSH sessions coming from one (misbehaving) host, or is someone trying to break in", etc.
So, being the rational sort, I ask "Why not just push all the data into a database to allow for easy querying and reporting?". This should resolve most of my READ issues by limiting the amount of data I need to pore through.
Now, with databases, a few more issues came up:
- transaction times
- indexes
- parsing
- data archival
- maintenance
As I already mentioned, relative to the rest of the system, disk is slow - databases are even slower. Why? Well, this gets even more convoluted to understand, so please bear with me as I explain some technical details while I go through the description of my trials.
Transactions
Because of the transactional nature of databases, all events involved in a single "transaction" against the database either all succeed or all fail. So, to keep it simple, if I'm INSERTing a new record into the database, the database needs to first "stage" the data to a temporary location - called the transaction log - in case the insert fails (cancelled, out of disk space, database not available, etc.). Then, the database software will periodically say "all the transactions that have successfully completed in the transaction log will now be written to the actual database and expired from the transaction log". That's a lot of disk management and data movement, especially since most event records will be less than 1024 bytes; at least 3 disk writes take place, one to the transaction log, one to the actual database, and one to expire the transaction log data once written to the database. The nature of the beast means that this is SLOW, so databases reduce the number of events per second the system can handle.
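One practical consequence (and this is a general database technique, not something any log vendor told me): committing every event individually pays that transaction-log overhead per row, so batching a pile of INSERTs into a single transaction helps a lot. Here's a rough sketch over ODBC - the DSN, table, and column names are placeholders I've made up:

```python
import pyodbc

# Batch many INSERTs into one transaction so the transaction-log overhead is
# paid once per batch instead of once per event.
# "LogDSN" and dbo.Events (with its columns) are hypothetical placeholders.
conn = pyodbc.connect("DSN=LogDSN", autocommit=False)
cur = conn.cursor()

def insert_batch(events):
    """events: list of (event_time, event_id, source_host, message) tuples."""
    cur.executemany(
        "INSERT INTO dbo.Events (EventTime, EventID, SourceHost, Message) "
        "VALUES (?, ?, ?, ?)",
        events,
    )
    conn.commit()  # one commit = one transaction for the whole batch
```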
Indexes
The whole point of writing the data to a database was so that we could search through it quickly rather than scanning Gigs of log files for individual events and then trying to pick out the relevant ones. That means indexes are needed for fast search, which translates to analyzing the data for sort-order, and writing the index. Index updates are also transactional, and all of this takes up CPU time and disk I/O.
Parsing
Most events found in logs are full-text items and look similar to the following:
"June 10 2006 3:02pm Event ID 32: User XXX logged on from Y.Y.Y.Y IP address" or
"July 11 2006 4:04am Event ID 108: Firewall allowed TCP port 80 to destination A.A.A.A from source B.B.B.B".
Short of performing substring searches for every query run, or creating a full-text index (not recommended when dealing with millions or billions of records), the relevant pieces of information need to be parsed out and stored in the database.
So I'd want to parse out the following values for the first fictitious event above:
June 10 2006, 3:02pm, 32, XXX, Y.Y.Y.Y
For the second event, I'd want to parse out:
July 11 2006, 4:04am, 108, TCP, 80, A.A.A.A, B.B.B.B
This takes up CPU time for each and every event, and also requires architecting a database format that allows for multiple strings associated with a single event - not a flat database table. And I'd have to parse these out FAST, because there are more events coming in all the time.
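As a rough illustration, the parsing step looks something like this - the regexes below are keyed to my two fictitious events, not to any real log format:

```python
import re

# One pattern per source/format is where the CPU time goes in real deployments.
LOGON_RE = re.compile(
    r"(?P<date>\w+ \d+ \d{4}) (?P<time>[\d:]+[ap]m) Event ID (?P<event_id>\d+): "
    r"User (?P<user>\S+) logged on from (?P<src_ip>[\d.]+) IP address"
)
FIREWALL_RE = re.compile(
    r"(?P<date>\w+ \d+ \d{4}) (?P<time>[\d:]+[ap]m) Event ID (?P<event_id>\d+): "
    r"Firewall allowed (?P<proto>\w+) port (?P<port>\d+) to destination "
    r"(?P<dst_ip>[\d.]+) from source (?P<src_ip>[\d.]+)"
)

def parse(line):
    for pattern in (LOGON_RE, FIREWALL_RE):
        m = pattern.match(line)
        if m:
            return m.groupdict()
    return None  # unrecognized format - store the raw line only

print(parse("June 10 2006 3:02pm Event ID 32: User XXX logged on from 1.2.3.4 IP address"))
print(parse("July 11 2006 4:04am Event ID 108: Firewall allowed TCP port 80 to destination 5.6.7.8 from source 9.9.9.9"))
```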
There are some trade-offs that can be made, like parsing out just the Event ID number, and date/time stamps, and indexing just those. Then, when I need to look something up, I can at least narrow the criteria, and then do substring searches. In fact, I do exactly this all the time - it works very well, but does have limits when I'm creating a report making use of a very large number of data records.
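In SQL terms (with hypothetical table and column names), that trade-off looks roughly like this - index only the cheap fields, use them to narrow the range, and then do the expensive substring match on what's left:

```python
import pyodbc

# Index only the cheap-to-parse fields (time + event ID), then narrow by those
# before falling back to a LIKE on the raw message text.
# The DSN, table, columns, event ID, and username are placeholders.
conn = pyodbc.connect("DSN=LogDSN", autocommit=True)
cur = conn.cursor()

cur.execute("CREATE INDEX IX_Events_TimeID ON dbo.Events (EventTime, EventID)")

cur.execute(
    "SELECT EventTime, SourceHost, Message FROM dbo.Events "
    "WHERE EventTime >= ? AND EventTime < ? "   # index seek narrows the range
    "  AND EventID = 644 "                      # e.g. Windows 'account locked out'
    "  AND Message LIKE ?",                     # substring scan on the remainder
    ("2006-06-10", "2006-06-11", "%jsmith%"),
)
for row in cur.fetchall():
    print(row)
```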
Data Archival
Archival caused me quite a few problems with database management - mostly because I was using MS SQL Server 2000. I didn't understand, at the time, how important a function called "partitioning" could really be - something many database systems have, but not introduced into MSSQL until SQL 2005.
Obviously, I couldn't just let my database grow forever. I had to delete old data after a while, and I wanted to be able to put it to backup tape so I could restore it later if needed. Restoring database data without restoring an entire database is a whole new art-form that DBAs and backup software companies have been dealing with for a long time, but as it turns out, that's not something I needed to get involved with. I simply wrote the data to a flat file (remember, there's very little overhead to doing this) as I wrote it to the database. No more need for archival since I'm creating the archive as I insert data into the database.
I still had the problem of expiring/deleting the old data, though. Remember, databases are transactional - more disk writes, more CPU, more memory used - even for delete activities. What's worse, I'm trying to delete a LOT of data all at once, not just simple transactions like writing individual events to the database - I'm talking 20 million records per day.
Whenever this task ran, all the events coming in from my remote devices would get queued up (because the database is busy with a huge delete event) and I'd eventually run out of physical memory due to all the queued events and the collection process would crash. There was also a disk space issue since my transaction log was storing millions of records during this delete process. So, breaking this down into smaller chunks (let's say, one hour at a time) helped, but this could still take several minutes - causing more event queueing. I couldn't guarantee memory wouldn't run low especially if a system was really under attack - and I really didn't want to lose the data during those instances.
Here's the advantage of "partitioning". Partitioning allows chunks of data to be stored in separate database files rather than one large file like traditional databases. Then, an entire partition file can be dropped rather than trying to delete chunks of data from a single file. So, by creating daily partitions, it's pretty easy to drop any single day of data - there are still some performance considerations, but not nearly as bad.
SQL 2005 wasn't available at the time, and I didn't have the luxury of moving to Oracle. So, I created a batch job that was scheduled to run every hour, which deleted all the records older than 30 days in chunks of 100,000 records. Even though it specifies data older than 30 days, since it runs hourly, it's really only deleting about an hour's worth of data during any typical run. Further, since I'm deleting it in 100,000-event chunks, then pausing for a few seconds, it gives other processes time to access the database (less queueing). There are a few other optimizations to this technique, but I've already documented most of that here on the Adiscon site (I was using WinSyslog at the time I developed the T-SQL script).
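Here's the general shape of that hourly purge - a simplified sketch of the approach rather than the exact script I published (SQL 2000 vintage, hence SET ROWCOUNT instead of DELETE TOP; the DSN and table names are placeholders):

```python
import time
import pyodbc

# Hourly purge: delete anything older than 30 days, 100,000 rows at a time,
# pausing between chunks so incoming events and queries aren't starved.
conn = pyodbc.connect("DSN=LogDSN", autocommit=True)
cur = conn.cursor()

while True:
    # SQL 2000 idiom: SET ROWCOUNT caps how many rows the DELETE touches.
    cur.execute("SET ROWCOUNT 100000")
    cur.execute("DELETE FROM dbo.Events WHERE EventTime < DATEADD(day, -30, GETDATE())")
    deleted = cur.rowcount
    cur.execute("SET ROWCOUNT 0")  # reset to unlimited
    if deleted == 0:
        break          # nothing older than 30 days is left
    time.sleep(5)      # give collection and queries a chance to run
```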
So far, so good - but every now and again, maintenance still needs to be run (update indexes, shrink files, make sure there's no corruption, etc).
Maintenance
Maintenance is a brutal time consumer on SQL databases. MSSQL 2000 promises being able to run maintenance on a live database, which is true, but the same maintenance run against a live database takes 18 hours vs. 2.5 hours if the database is off-line. These times were what I experienced on my hardware - your times will vary depending on how many and which maintenance tasks are being run. I'll be honest that I haven't revisited this aspect of MSSQL 2005 yet to see if there are any significant improvements, but they would have to be very significant to improve by 15.5 hours.
The best I could do
The best I could ever get was a sustained rate of about 1,800 events/second with everything else happening. I would get periodic spikes which would cause backlog, but these would finish up during low-usage times and the system would continue running.
But this wasn't enough for me. Trying to insert data, delete data, and run maintenance with this volume of events, all in real time, did me in. I simply couldn't live with just 1,800 events/second on average, since I would get hours at a time when I'd see 2,000+ events/second coming in.
Time to Re-evaluate my Priorities
Big question: Do I need to have all of this data in the database in real time?
For me, the answer was "no". I was doing next-day reporting and forensic analysis with the data, and I knew I could easily pull all of it into a flat file. So, for me, the solution was to do a nightly batch INSERT of all the events in my flat file. As long as the maintenance and delete scripts weren't running, this worked really well, and allowed me to run all the reports I wanted to. I even created a modified version of my import script which allows me to do a bulk import of files restored from tape during an investigation. This method also allowed me to focus on database optimization so queries would run faster, rather than just keeping the database running at all.
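The nightly load itself can be as simple as a BULK INSERT of the previous day's flat file. Again, just a sketch - the DSN, path, table, and file format are assumptions, and the real script also checks that the purge and maintenance jobs aren't running:

```python
import pyodbc
from datetime import date, timedelta

# Nightly batch load: pull yesterday's flat file into the database in one shot.
# The DSN, file path, table name, and tab-delimited layout are all assumptions;
# the file path is as seen by the SQL Server itself.
yesterday = (date.today() - timedelta(days=1)).isoformat()
conn = pyodbc.connect("DSN=LogDSN", autocommit=True)
cur = conn.cursor()

cur.execute(
    f"BULK INSERT dbo.Events "
    f"FROM 'D:\\logs\\syslog-{yesterday}.log' "
    f"WITH (FIELDTERMINATOR = '\\t', ROWTERMINATOR = '\\n', TABLOCK)"
)
print(f"Finished loading syslog-{yesterday}.log")
```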
So, I had 4 scheduled tasks on my server:
1. Hourly data delete (impact of doing this throughout the day is low, and prevents any nighttime bottlenecks). This would check to ensure none of the other 3 tasks are running first, so sometimes it would actually delete 2-3 hours worth during some cycles.
2. Nightly import script - at the end of which I update the datetime index used regularly by Step 1 - I found this makes Step 1 run better each day. The script also makes sure that the next 2 scripts aren't running before it starts.
3. Saturday mornings I do a full reindex (not just the datetime index).
4. Sunday mornings I perform database defrag & shrink operations. This has a significant impact on improving performance. (There's a sketch of these maintenance commands below.)
There was no need for full recoverability since the database could always be rebuilt from the flat files, so I also set the database to the "simple" recovery model rather than the "full" recovery model I'd used when I was doing only direct-to-database logging.
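For reference, the weekend maintenance and the recovery-model change boil down to a handful of statements (SQL 2000-era commands; the DSN, database, table, and index names below are placeholders):

```python
import pyodbc

# Weekend maintenance, roughly as described above.
conn = pyodbc.connect("DSN=LogDSN", autocommit=True)
cur = conn.cursor()

# Saturday: rebuild all indexes on the events table, not just the datetime one.
cur.execute("DBCC DBREINDEX ('dbo.Events')")

# Sunday: defragment indexes and reclaim free space in the data files.
cur.execute("DBCC INDEXDEFRAG (LogDB, 'dbo.Events', IX_Events_TimeID)")
cur.execute("DBCC SHRINKDATABASE (LogDB)")

# Since the flat files are the real archive, skip full transaction-log recovery.
cur.execute("ALTER DATABASE LogDB SET RECOVERY SIMPLE")
```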
But I still had a really big question...
How do the vendors manage to resolve all of these problems?
I started this article off by highlighting 3 problems I was experiencing: Misinformation, Frustration, and Lack of Understanding.
They say "forewarned is forearmed" - and after all my work - I was ready to talk to vendors again.
At this point, I knew exactly what to ask to ensure I received clear answers; I knew what I needed the product to be capable of (including data volumes, which is important); and I knew how to get an answer that made sense to me rather than the "smoke and mirrors" answers I'd received before. My belief, at this point, was that any database-driven solution wouldn't handle my 10+ GB of data per day (and growing) without some VERY serious hardware on the back end, which would make the solution cost-prohibitive.
But some vendors claim they can collect tens of thousands of events per second - and still allow for fast querying & reporting - even though they're just an appliance. How could that be?
They maintain flat files, with database indexes.
This makes perfect sense to me. The speed of flat files for storage, plus database-like indexes for fast searches. This also requires very little maintenance since, in a worst-case scenario, the index can simply be rebuilt. I'm sure, though, that there would need to be some special handling routines for detecting corrupted files, etc., but overall it's a sound solution. This also makes restoring archived data very fast, since it means just re-creating an index once the flat files are restored.
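To make the idea concrete, here's a toy version of the technique as I understand it (my guess at the general approach, not any vendor's actual implementation): append events to a flat file, and keep a small side index mapping an indexed field to byte offsets so a search can seek straight to the matching lines.

```python
# Toy "flat file + index" store: events are appended to a plain log file, and a
# side index maps event ID -> byte offsets into that file. If the index is ever
# lost or corrupted, it can simply be rebuilt by re-reading the flat file.

def append_event(log_path, index, event_id, line):
    data = (line.rstrip("\n") + "\n").encode("utf-8")
    with open(log_path, "ab") as f:
        f.seek(0, 2)               # end of file = where this record will start
        offset = f.tell()
        f.write(data)
    index.setdefault(event_id, []).append(offset)

def lookup(log_path, index, event_id):
    with open(log_path, "rb") as f:
        for offset in index.get(event_id, []):
            f.seek(offset)
            yield f.readline().decode("utf-8").rstrip("\n")

index = {}  # in practice, persist this next to the flat file (e.g. as JSON)
append_event("events.log", index, 32, "June 10 2006 3:02pm Event ID 32: User XXX logged on ...")
append_event("events.log", index, 108, "July 11 2006 4:04am Event ID 108: Firewall allowed TCP ...")
print(list(lookup("events.log", index, 32)))
```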
Some vendors - probably all of them handling large volumes - have been down the same path I have, made the same mistakes I have, and have developed this hybrid approach. To my knowledge, there is no open-source solution that works this way, but unless very high volume collection is required, one of the above approaches may already work for you.
I really hope my trip down memory lane helps others avoid some of the same pitfalls that I've experienced - and I hope you've learned something that will help you. Keep in mind, I've only talked about log collection this time around.
Don't think, though, that this is the only thing that differentiates the vendors from home-grown solutions, or even from each other. Very few logs are standardized, and there's definitely no standardization between competing vendor products, so event parsing can be a lot of work. There's also correlation, reporting & analysis, which are all hefty topics.
But, there's plenty of internet whitespace left to discuss these topics in the future...