Tuesday, December 19, 2006

GA - Making Use of the CS-MARS Archive

As promised, here is the first of the Cisco MARS Blog Guest Articles.

You may have visited the Networking Professionals Connection discussion forums on the Cisco website, and asked or looked at the MARS questions.

I can guarantee that the author below, Matthew Hellman, has either provided the answer or provided some input into the solution to virtually every question! Matthew works for a Fortune 250 financial services company in the US, and has been using the MARS product since it was purchased by Cisco in late 2004.

In his article, he explains more about the archiving process in a busy network, and how you can create a 3rd party application to make use of the data that has been archived to an NFS server.

Making Use of the CS-MARS Archive

The following information is based on my own personal experiences with CSMARS. Many assumptions have been made based on observed behavior that may be dependent on my particular environment. YMMV.

Key Benefits of Archiving

Even in the most modest of CSMARS deployments, a considerable amount of effort is required to initialize, tune and maintain the configuration of the appliance. The primary reason for enabling the archive functionality in CSMARS is so that it can be used to recover configuration data in the event of a disaster. This alone should provide enough incentive to deploy an archive server and enable the functionality. There is however a secondary and perhaps equally important reason for enabling the archive functionality; it is currently the only way to guarantee that raw events will be saved for a specific length of time.

Many new users don’t initially realize that raw events, events, sessions, and particularly incidents in CSMARS don’t necessarily stick around very long. They are considered “dynamic” data and their lifespan is dependent on the MARS model and how busy the network is. When the database starts to fill up, the oldest dynamic data is purged. For example, dynamic data in one of our MARS 100e appliances, operating at near capacity, has a lifespan of less than 3 weeks. There are many variables that impact this lifespan, so as usual YMMV.

The only way to make dynamic data “permanent” is to assign it to a case (technically this copies the information, leaving the original dynamic data to be purged). However, this still does not save the raw events.

If you attempt to click the raw event link in a case [that is older than the lifespan of dynamic data], you will get the dreaded PSOD (pink screen of death):

Quick & Dirty Raw Message Search

With archiving enabled, you will have all the data required to recover an appliance in the event of a disaster. In addition, raw messages will be archived for a known length of time. That’s great, but how does one go about querying the raw messages? Unfortunately, CSMARS doesn’t provide an interface for this purpose. There is an interface for retrieving raw events from the archive, but you can only enter a date and time range…you can’t really search. What if you want to see login messages for a particular user going back a whole year? Or maybe you want to see all the raw messages received from a particular reporting device for a 1 week period. You can’t do that using existing CSMARS functionality. Luckily though, this can be accomplished with as little as a single command on the archive server itself (provided you run Linux).

I have a UNIX background and we already have many RedHat servers, so it was natural to implement archiving on a RedHat box. Cisco also supports archiving to a Windows server running NFS. This decision drives the selection of the tools one might use to search the archive. The information and examples below were developed on a RedHat system, but most of the information and certainly the concepts are applicable to either.

In order to more intelligently search the archive, we must first understand the layout of the archive directories and format of the relevant files. Here is a diagram of the directory layout:

As you can see, each day has its own directory in the archive. There are multiple directories within each day directory. The only one we care about though is the ES directory (which one might assume stands for (E)vents and (S)essions). This is the directory that contains the raw events. There are only two types of files in the ES directory. The “rm-“ files are the raw message files, so they’re the files we want to work with. The “es-“ files are presumed to have something to do with normalized events and sessions and appear to contain a bunch of foreign key values…nothing really useful outside the database environment. Both types of files are compressed with the Ziv-Lempel algorithm, which means they can be uncompressed with just about any compression utility.
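To make the layout concrete, here is a quick sketch of poking at one day's ES directory. The /tmp/archive path and the file name are mocked up purely so the commands can be demonstrated; substitute your real NFS mount point and date.

```shell
# Mock up one day directory with a single compressed raw-message file
mkdir -p /tmp/archive/2006-12-14/ES
echo "sample raw message" | gzip > /tmp/archive/2006-12-14/ES/rm-demo.gz

# The raw-message files all start with "rm-"; ignore the "es-" files
ls /tmp/archive/2006-12-14/ES/ | grep '^rm-'

# Ziv-Lempel compressed means readable with the standard gzip tools
zcat /tmp/archive/2006-12-14/ES/rm-demo.gz
```

On a real archive server you would simply point `ls` and `zcat` at the mounted archive directory instead.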

The format of the file names gives us important information about the data within:

At this point, we have all the details required to determine which files are relevant given a particular date and time range. Hopefully you can envision how you might build a regular expression to match on a range of days. For help with regular expressions, try this link.
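As an illustration of matching a range of days, here is a sketch using invented file names that follow the observed pattern of a start date/time embedded between underscores (verify the exact format against your own archive):

```shell
# Four made-up raw-message file names
files='rm-1001_2006-12-11-23-00-00_a.gz
rm-1002_2006-12-12-01-00-00_b.gz
rm-1003_2006-12-13-14-00-00_c.gz
rm-1004_2006-12-15-02-00-00_d.gz'

# A character class handles a simple run of days within one month;
# only the Dec 12 and Dec 13 files should match
echo "$files" | grep '_2006-12-1[234]-'
```

Ranges that span a month or year boundary need an alternation (e.g. `\(2006-12-3[01]\|2007-01-0[12]\)`) rather than a single character class.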

Now, onto the format of the data within the files. Below is a copy of a single row from a file:

Notice that the field delimiter is hex character 0xBB. Also remember that raw messages are currently truncated to 512 bytes. Don't assume that rows within the file are sorted by date/time, because they aren't (i.e. the first row will not always have the start date and start hour from the filename).
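To show what splitting on that delimiter looks like, here is a small sketch. The row below is invented (real rows end with the truncated raw message); only the 0xBB separator is the point of the example:

```shell
# 0xBB is octal 273; capture the single byte in a variable so it can
# be passed to awk portably as the field separator
BB=$(printf '\273')

# A made-up three-field row: timestamp, reporting device, raw message
printf 'Dec 14 01:05:12\27310.1.1.1\273MSWinEventLog sample\n' \
  | awk -F "$BB" '{ print "fields=" NF; print "device=" $2 }'
```

The same `-F "$BB"` technique works when post-processing real rm- file contents piped out of `zcat`.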

Certainly one of the advantages to deploying the archive on Linux is that the basic tools necessary to search through compressed data are already available. For example, zgrep is a wrapper script found in most Linux distros that allows the user to perform regular expression searches through compressed data. Let’s say you want to find all instances where the Windows Administrator account was used in a security context on December 12-14, 2006 during the 1am hour.

The following command is all you need:

# zgrep -i -P "MSWinEventLog.*Security.*[ ]administrator[ ]" /archive/2006-12-1[234]/ES/rm*_2006-12-1[234]-01-??-??_*

That’s it! The zgrep command alone may satisfy basic search requirements. In an environment that processes many millions of events per day and/or keeps archived data for years however, something a little more sophisticated is needed.

Building a more sophisticated solution

As previously mentioned, we chose to use RedHat Linux as our NFS archive server. It’s a relatively beefy Intel box with dual 64-bit processors, 4GB ram, and >2TB storage.

Our CSMARS 100e processes in the neighborhood of 80 million events per day. This translates into as many as 400 raw message files written to the archive per day, and each file can contain over 70MB of uncompressed data. That's a huge amount of data that must be searched through.

Our high level requirements for the solution were that it must be fast and it must be easy to use. We recognized early on that in order to search through this much data efficiently, we needed to spawn multiple processes. We also knew that running multiple searches concurrently was not feasible. To support these requirements, the following individual components were developed (in Perl, our language of choice):

  1. user facing CGI to create/read/update/delete search jobs
  2. a job manager that maintains the queue, starts search jobs, and notifies user of job status
  3. a query program that actually performs the search
The key features/characteristics of our solution include:

  • written in Perl
  • allows PCRE searches
  • includes recipes for common searches (checkpoint control transactions, windows login, etc)
  • uses configuration files for most options
  • supports any number of child search processes, as determined by configuration file
  • limits maximum size of search results, as determined by configuration file
  • auto detects when regular expression search is required
  • performs case insensitive searches, as determined by configuration file
  • compresses search results > n bytes, as determined by configuration file
  • notifies user when job status changes
I spent only about 60 hours building a solution written in Perl, thanks in no small part to the judicious use of other people's work via existing modules. Most of the harder stuff I had never done before (forking processes, managing a job queue), so I know this can be done by anyone with the ability to find and combine the right tools. This solution is the only method we have for viewing raw messages that have been purged from CSMARS, and it is serving that purpose exceedingly well. Not surprisingly, it has also proven to be orders of magnitude faster at searching through events that have not yet been purged from CSMARS: a keyword query for a specific username during the last 2 weeks takes over 11 hours when done in CSMARS itself; the same search takes 13 minutes using the custom search tool. This has made the solution an extremely useful tool for troubleshooting general operational issues with reporting devices.

My work does not permit me to share the code developed as part of my job, so I can’t provide it. I’m a terrible hack anyway, so you wouldn’t want it;-) My basic pseudo-code is simple enough though:

  1. create a list of files that must be searched based on provided time range
  2. for each file, spawn a child process to perform the search, writing results to unique file based on pid
  3. wait for all children to exit, then clean up the output
Here is an example of the submission form used to submit jobs:

I will share a snippet of the code that encompasses the main logic of the search program. This is just an example of how one might fork processes in Perl.

foreach my $file ( @dayfilelist ) {

    ## for each file that must be searched through, we fork a child process.
    ## The number of concurrent children is limited using Proc::Queue (see $maxkids)
    if ( $pid = fork ) {
        # parent process: nothing to do here, Proc::Queue throttles the forks
    }
    elsif ( defined $pid ) {
        # child process
        # if we've already written > max results size, we're done
        my ($dirsize) = split( /\t/, `du -sb "$jobdir/$jobid"` );
        exit(0) if $dirsize > $maxresultsize;

        print STDERR "Kid: started processing " . substr( $file, -56 ) . " at " . scalar localtime() . "\n" if $statusfl;
        # perform our search on this file
        my $results = `$zgrep $caseopt $pcreopt "$searchStr" $file`;
        exit(0) unless $results;    # if we don't have any results, exit
        # create individual output files for each child, keyed by pid
        open( OFILE, ">$ofile.$$" ) or die "Unable to open $ofile.$$ for writing: $!";
        print OFILE $results or die "Can't write to output file: $!";
        close OFILE or die "Can't close output file: $!";
        print STDERR "Kid: done processing " . substr( $file, -56 ) . " at " . scalar localtime() . "\n" if $statusfl;
        exit(0);
    }    ## end child process
    else {
        # fork failed
        die "Can't fork: $!\n";
    }
}    # end of loop through files

# wait for all children to finish
1 while wait() != -1;

Matthew Hellman

I'd like to thank Matthew for sharing his knowledge and experience of creating a 3rd party application with us, and I must admit I didn't even think this was possible! Thanks again Matthew.

CS-MARS and Archiving - Part 3

The CS-MARS archive process runs daily at 2AM, and it creates a dated directory on the NFS share for its data. You cannot specify a different time to archive the data.

What does the NFS Share directory structure look like?

The pnos directory is where the operating system backups are stored.

A dated directory is also created for each day. Within these daily directories, subdirectories are created for each data type, as shown below.

The primary use of an archive is to restore the appliance in response to a catastrophic failure. But the archived data can also be retrieved manually and viewed.

Now before I go any further into restoring the appliance, I have a guest article on CS-MARS and Making Use of the Archived Data, which I will publish later.

Wednesday, December 13, 2006

CS-MARS and Archiving - Part 2

As mentioned in Part 1 we can use a NFS Server to Archive our MARS data.

Now NFS is supported natively in Linux, which is what Cisco recommends for your NFS storage, and it is quite simple to set up....

NFS Setup for Linux

1) As root, create a directory for the archive data, eg...

mkdir -p /archive/MARSBox
chown -R nobody.nobody /archive
chmod -R 777 /archive

2) In the /etc/exports file, add the following...

/archive/MARSBox MARS_IP_Address(rw)

(Note there must be no space between the IP address and the options; with a space, the (rw) would apply to all hosts.)

3) Restart the NFS Service...

/etc/init.d/nfs restart

But what if we want to do this with Windows? Well, we can use Windows Services for UNIX (SFU). This allows an NFS share to be created on a Windows file server.

Windows Services for UNIX Setup

This can be downloaded from HERE; the current version is 3.5, filename SFU35SEL_EN.exe

1) Extract this to a folder on your Windows box, and run SfuSetup.msi

2) Click NEXT to Continue

3) Enter values for the Username and Organization fields, and click NEXT

4) ACCEPT the agreement option, and Click NEXT

5) Select the CUSTOM Installation option, and NEXT

6) As a minimum, and for this example, you must select "Entire feature (including any subfeatures if any) will be installed on local hard drive" for the following components...

  • NFS (see note below)
  • Authentication tools for NFS
But make sure you deselect Gateway for NFS, or else you will get the error below...

7) Verify that the Change the default behaviour to case sensitive check box is NOT TICKED, then click NEXT.
8) The Username Mapping Panel appears. Verify that...
  • Local User Name Mapping Server
  • Network Information Service (NIS)

are selected, then click NEXT.

9) Enter the Windows Domain name, the NIS Domain and Server are optional, click NEXT.

10) Enter the desired location for the Windows Service for UNIX install, (Not MARS Archive Share), and click NEXT.

11) Click FINISH, and REBOOT.

Now that you have successfully installed the NFS Windows component, we are ready to define a share to be used by the MARS appliance.

Define a NFS Share on Windows

1) In Windows Explorer, create a folder for MARS to Archive to.

2) Right-Click the folder, and select the NFS Sharing Tab.

3) Select Share this folder, enter a share name, and leave the encoding as ANSI.

4) Select Allow Anonymous Access

5) Click on Permissions, and select ALL Machines, and No Access

6) Now click ADD, and type the IP Address of your MARS Box, with Read-Write, and ANSI Encoding.

7) Click OK, and Apply.

8) From a DOS window, type the following to modify the shared folder permissions so that Everyone has local filesystem access to the folder.

cd share
cacls share /E /G everyone:F

9) Under Administrative Tools / Local Security Policy / Security Options

Double click.. Network Access: Let Everyone permissions apply to anonymous users, and select Enabled. Then click OK.

You have now completed the NFS configuration for the Windows Server.

Before we configure MARS to archive to the NFS share, there is one last recommended step. This is to add the NFS client, which is the MARS box, to the hosts file:
  • /etc/hosts on Linux
  • \Windows\system32\drivers\etc\hosts on Windows
And also add the MARS appliance to your internal DNS.
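As an illustration, the hosts file entry might look like this (the IP address and hostname here are invented; use your MARS appliance's real details):

```
192.168.10.50    csmars01    csmars01.example.local
```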

CS-MARS Archive Configuration

1) Select Admin / System Maintenance / Data Archiving

2) In the Remote Host IP Field, enter the IP Address of your NFS Server

3) In the Remote Path Field, enter the export path on the NFS Server

(In Windows a forward slash is required to resolve the UNC Share name)

4) Archiving Protocol, NFS is the only option at the moment!

5) Remote Storage Capacity in Days - enter one of the following...
  • The maximum number of days for which you want the NFS server to retain data. The NFS server keeps your data for that many days back from the current date.
  • The maximum number of days the NFS server is able to retain, ie the total capacity of the archive server.
6) Click START, to enable your MARS Archiving

If everything has gone to plan, you should receive a message back of DONE Status OK.

If you see an error message such as "Invalid remote IP or path", your NFS server may not be correctly configured. I'd check that the remote path is correct in the first instance.

In a Windows environment, we can also enable logging of NFS events...

1) Programs/ Windows Services for UNIX/ Services for UNIX Administration

2) Select Server for NFS

3) Under Logging, tick all the boxes, and specify a folder where you want to place the log file.

4) Click Apply to save changes.

That's the archiving all set up. In Part 3 of this series, I'll go into what different folders and files are created on the NFS share, and how we can go about restoring this data.

Wednesday, December 06, 2006

CS-MARS and Archiving - Part 1

The MARS appliance stores data on-box in raw format, ie, uncompressed. It breaks event data into ten different partitions.

Remember from an earlier post, the command "pndbusage"

[pnadmin]$ pndbusage
Current partition started on Tue Aug 8 00:41:54 BST 2006 and uses 30.7% of its available capacity.
Switching to next partition is estimated for Thu Mar 22 18:23:29 GMT 2007.
9 empty partitions are available for storage

So in the above example we have plenty of space available, but what if this was a busy MARS box? Planning your Archiving and Space Requirements becomes very important.

The MARS appliance uses a FIFO approach, and when the internal storage reaches its capacity, it automatically purges the data in the oldest of the ten partitions, roughly 10% of the stored event and session data. Ouch!

Some rough estimations (1st example via Cisco Press Book)....

1) Assume we are receiving 100 events per second, and an average of 200 bytes/event. This gives, for 1 day, (100x200x86400 secs) = 1,728,000,000 bytes of data, or approx 1.73GB of storage required per day.

Remember from the "earlier post" that the M20 has 77GB of on-box storage available; minus 20% of this for tables/indexing etc, that leaves 61.6GB. Now 61.6/1.73 gives approx 36 days' worth of storage.

2) Assume we are receiving 50 events per second, and an average of 200 bytes/event. This gives, for 1 day, (50x200x86400 secs) = 864,000,000 bytes of data, or 864MB of storage required per day.

With the same 61.6GB of usable on-box storage, 61.6/0.864 gives approx 71 days' worth of storage.
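The arithmetic above can be turned into a quick calculator. This is just a sketch mirroring the numbers in the examples; adjust the events per second, bytes per event, and usable storage figures for your own box:

```shell
eps=100            # events per second
bytes=200          # average bytes per event
usable=61.6        # usable on-box storage in GB (77GB minus ~20% overhead)

awk -v e="$eps" -v b="$bytes" -v u="$usable" 'BEGIN {
  perday = e * b * 86400 / 1e9          # GB of raw data generated per day
  printf "GB/day: %.2f  days of storage: %.0f\n", perday, u / perday
}'
```

With the values shown this prints roughly 1.73 GB/day and 36 days, matching example 1.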

It is apparent that we need some way of archiving the data, especially if you have a busy network and are following compliance regulations.

The good news, is that we can archive data from a MARS appliance, and use this data to restore the OS, system configuration settings, dynamic data (event data) or the complete system.

This is done via NFS, but you cannot schedule when the data backup occurs. The MARS appliance performs a configuration backup every morning at 2am, and event data is archived every hour.

The not so good news is that the archived dynamic data (event data) includes only the data that is received or generated after you enable archiving.

Therefore, it is recommended that you enable archiving and plan your storage requirements before you configure your MARS appliance to receive events from reporting devices!

Once you have enabled archiving via NFS, dynamic data is then written by MARS once to its local database, and once to the NFS Archive (and this is compressed with the Ziv-Lempel algorithm, also used in GZIP).

It is recommended that a reliable network connection of at least 10MB/s exists between the NFS server and the MARS appliance. You should consider using the eth1 interface on the MARS appliance, to ensure that the backup operation is not competing with other operations on the MARS box.

Another thing to note is that you can use the same NFS server for archiving multiple MARS appliances, BUT you must specify unique directories in the NFS path for each appliance, or you will corrupt the archives.

Now when we configure the MARS box to archive, we need to set a "Remote Storage Capacity" in days, ie to keep a year's worth of data on the NFS server, we need to specify 365 days!
In planning the NFS space requirements, the same formula used for on-box storage is used, but the result is divided by the compression ratio.

Using the previous example of receiving 100 events per second at an average of 200 bytes/event: for 1 day this gives (100x200x86400 secs) = 1,728,000,000 bytes of data uncompressed, or (100x200x86400/10) = 172,800,000 bytes compressed = approx 173MB per day.

Or for a year (172.8MB x 365) = 63,072,000,000 bytes of data = approx 63GB.
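The same division-by-compression-ratio formula can be sketched as a small calculator. The 10:1 ratio is the rule of thumb used above; measure your own archive to confirm it before sizing storage:

```shell
eps=100; bytes=200; ratio=10; days=365

awk -v e="$eps" -v b="$bytes" -v r="$ratio" -v d="$days" 'BEGIN {
  mbday = e * b * 86400 / r / 1e6       # MB written to the archive per day
  printf "MB/day: %.1f  GB/year: %.0f\n", mbday, mbday * d / 1000
}'
```

With the values shown this prints roughly 172.8 MB/day and 63GB for the year, matching the figures above.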

In Part 2 of this document, I'll go into setting up an NFS server in Windows, the format of the archived data, and how we can restore via the pnrestore command.

Friday, December 01, 2006

Search the Cisco MARS Blog

Sorry for no new posts this week, but I've been knocked out with the flu!

It's been a great month for the ciscomars blog, as I've had just short of 800 visitors in November. Thanks!

It has occurred to me that many people may not have read some of the older posts, and I have nearly 30 now.

Hence, the Blogger SEARCH

It works fantastically; as an example, type Netflow and you will receive a list of all my posts related to just that.

To also let you know what's coming on the Cisco MARS blog....

I have arranged some great "guest" articles from other people in the security industry, who know MARS inside out. Watch out for these.

Also I have some articles coming on the Custom Parser functionality of MARS.