The New York Chapter of PRMIA hosted "Regulatory, Compliance, and Risk Data Technology Challenges" at Credit Suisse's offices in New York,
last Thursday, 10th April. Abraham Thomas introduced the panelists, and Don Wesnofske started off by setting the scene for the evening's event.
Don outlined how in reaction to the 2008 Crisis the regulators now require data retention for up to 10 years or more. Don cited one particular example where data must be reconstructed within 24 to 48 hours for any date up to 7 years back, and said that this
kind of "forensic" investigation capability was an important consideration for many financial institutions. He took us through a good presentation slide of his view on data management/risk architecture, and outlined how operational risk comprises people,
process, technology and events. Don ended his presentation by taking us through Wikipedia's definition of "Big Data", and in particular talked about how data
has a life cycle going through:
Don then handed over to Luigi Mercone, a Director of Engineering Strategy & Architecture at Credit Suisse. Luigi started by saying that to the business at CS he is technical support, which involves asking "What is on fire today?
And what's going to be on fire tomorrow?" Luigi described how some time back CS had a regulatory enquiry around their equities business which required them to reconstruct data from 2 years back.
The project to do this took around 4-5 months of database administrators' time to reconstruct the world as at that point in time (I guess because tape storage was being used, and this needed restoring to disk/database). This was for an equity order management
system that had doubled in size every year for the past 17 years, and at that point CS was only retaining data going back 2 years. Luigi said that it was then thought that new regulations requiring the ability to produce forensic evidence at any point
in time would potentially swamp CS's resources unless the issue was addressed head on and strategically.
Luigi described the original architecture that they were using as being based on an in-memory database for intraday workloads, then standard Sybase (probably ASE I guess) and then Sybase IQ for longer term archiving, taking advantage of the column-store capabilities
of Sybase IQ and the resulting data compression possible. He added that the data storage requirements of the system had grown from 150TB to 1.2PB in
Luigi then offered a comparison of this original architecture with what he found by implementing RainStor. In the original architecture the Sybase IQ database compressed data down
into 160TB, whereas this was improved by a further factor of 10 down to 14TB using RainStor. He said that RainStor was self-service, providing a standard SQL interface, eliminated the need for tape storage, reduced the system "footprint" by 90% at CS, was
1/5 of the cost and the performance was good. (I guess here I would like to caveat that I know nothing of the original architecture other than the summary Luigi provided, and as such it is hard to judge whether the original architecture was optimal for the
data growth experienced, and hence whether this was overall an objective comparison of Sybase IQ's capabilities with RainStor.) Luigi closed by saying that whilst RainStor was a great archive database, its origins were in in-memory databases and he
would encourage RainStor to re-enter that market too, given his experience so far.
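As a quick sanity check on the figures Luigi quoted, here is a small Python sketch of the arithmetic. It only uses the 160TB and 14TB numbers above; the function name is my own, and I am assuming that the "footprint" reduction refers to the same storage measure, which Luigi did not spell out.

```python
def compression_summary(before_tb: float, after_tb: float) -> tuple[float, float]:
    """Return (compression ratio, percentage reduction) for two storage sizes."""
    ratio = before_tb / after_tb                       # how many times smaller
    reduction_pct = (1 - after_tb / before_tb) * 100   # share of storage saved
    return ratio, reduction_pct


# Figures as quoted in the talk: Sybase IQ at 160TB, RainStor at 14TB.
ratio, reduction_pct = compression_summary(160, 14)
print(f"{ratio:.1f}x smaller, about {reduction_pct:.0f}% less storage")
```

So the quoted sizes work out at a little over 11x and roughly 91%, which is in line with both the "factor of 10" and the "90%" that Luigi mentioned.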
John Bantleman, CEO of RainStor, took over and described how RainStor had been designed specifically for the needs of data archiving (I guess talking more about what it does now rather than its origins outlined by Luigi above). He said that RainStor offers
a 20-40x storage footprint reduction over traditional database technology and operates efficiently even at the petabyte (PB) scale, based around RainStor's
proprietary database technology, which makes use of columnar storage and is capable of storing data both in relational-style tabular format and in more "document" style using XML and JSON formats
with key-value access. John mentioned that not only could RainStor retrieve
data as at a point in time, but it could also retrieve the schema being used at that point in time, for a more complete view of the state of the world at that point. This echoes a couple of past articles that I have penned, one for
IRD and one for Wilmott Magazine on bitemporal regulatory
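To make the point-in-time idea above a little more concrete, here is a minimal Python sketch of a bitemporal "as of" lookup. This is my own toy illustration and not RainStor's API or storage model: the `Version` class, the `as_of` function, the field names and the sample order data are all hypothetical. Each record carries the date the fact became true, the date the system recorded it, and the schema version in force when it was written.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Sequence


@dataclass(frozen=True)
class Version:
    key: str
    value: dict
    valid_from: date     # when the fact became true in the business world
    recorded_on: date    # when the system learned about it
    schema_version: int  # schema in force when this row was written


def as_of(history: Sequence[Version], key: str,
          valid_date: date, known_date: date) -> Optional[Version]:
    """Return what was believed on `known_date` to hold for `key` on `valid_date`."""
    candidates = [
        v for v in history
        if v.key == key
        and v.valid_from <= valid_date
        and v.recorded_on <= known_date
    ]
    if not candidates:
        return None
    # Prefer the latest applicable fact, then the latest recording of it.
    return max(candidates, key=lambda v: (v.valid_from, v.recorded_on))


# Hypothetical history: an order, later corrected under a newer schema.
history = [
    Version("ORD-1", {"qty": 100}, date(2012, 3, 1), date(2012, 3, 1), 1),
    Version("ORD-1", {"qty": 100, "venue": "XNYS"},
            date(2012, 3, 1), date(2012, 6, 15), 2),
]

before_fix = as_of(history, "ORD-1", date(2012, 4, 1), date(2012, 5, 1))
after_fix = as_of(history, "ORD-1", date(2012, 4, 1), date(2014, 4, 10))
```

Asking the same question with two different "known" dates returns the original record with schema version 1 and the corrected record with schema version 2 respectively, which is the sort of "state of the world at that point" that a forensic enquiry needs.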
John said that regulation was driving the need for data archiving capabilities, with 1400 regulations added since 2008 (not sure of source, but believable) and the comment from a Chief Data Officer (CDO) at one financial markets client that if a project
isn't driven by regulatory compliance then it isn't going to get done (certainly sounds like regulatory overload). John's opening remarks were really around how regulatory cost, complexity and compliance were driving forces behind the growth of RainStor
in financial services technology, and whilst regulation is the driver, firms should look at archiving of data as an opportunity too, in order to create value from corporate memory, and to be proactive in addressing future reporting and analysis needs.
John illustrated the regulatory need for data archiving through the Consolidated Audit Trail (CAT) regulation, under which data retention over 7 years will generate 100PB of data. He also
mentioned SEC Rule 17a-4 for broker-dealers as another example of "data retention" regulation, with particular
reference to storage of records in non-rewriteable, non-erasable format. John termed this WORM storage, meaning Write Once, Read Many.
John seemed to imply that both the software (RainStor) and the hardware it runs on (e.g. EMC or Teradata etc) need to be WORM compliant. One of the audience members asked John about BCBS
239, to which John said that he didn't know that particular regulation (fair enough in my opinion: RainStor's tech is about "data" in general and is applicable across many industries, whereas BCBS 239 is obviously about banks specifically
and is more about data aggregation and reporting than data retention/archiving to my understanding, and this seems to be confirmed by a quick scan of the document for "archive" or "retention").
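Going back to the WORM idea John mentioned, the "write once, read many" behaviour can be sketched in a few lines of Python. This is purely a toy of my own (the `WriteOnceStore` class and its methods are hypothetical) showing the semantics at application level. It says nothing about how RainStor or any storage hardware enforces it, and an in-memory dictionary is obviously not compliant storage.

```python
class WriteOnceStore:
    """Toy in-memory store: each key can be written once, then only read."""

    def __init__(self) -> None:
        self._records: dict[str, bytes] = {}

    def write(self, key: str, payload: bytes) -> None:
        # Non-rewriteable: a second write to the same key is refused.
        if key in self._records:
            raise PermissionError(f"record {key!r} already written")
        self._records[key] = bytes(payload)  # keep an immutable copy

    def read(self, key: str) -> bytes:
        # Reads are unrestricted and can be repeated as often as needed.
        return self._records[key]

    def delete(self, key: str) -> None:
        # Non-erasable: deletion is never permitted.
        raise PermissionError("records are non-erasable")


store = WriteOnceStore()
store.write("trade-42", b"BUY 100 XYZ")
first_read = store.read("trade-42")
```

A second `store.write("trade-42", ...)` or any `store.delete(...)` raises `PermissionError`, while reads keep returning the original bytes.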
To finish off the main part of the event (before the drinks and food began) there was a panel discussion. Luigi said that it was best to "prepare for all time, not just specifics" with respect to data retention and that there were dangers in rolling up data
(effectively aggregating and losing granularity to reduce storage needs). John added that his definition of "Big Data" was "All information, for ever". Luigi added that implementing RainStor had allowed CS to spend more time on interesting questions rather
than on database restoration. John proposed that version 1 of Big Data involved the retention of web data, and as such losing a data point here and there didn't matter. Version 2 of Big Data is concerned more with enterprise data where all data has value
and needs to be retained, i.e. lots of high-value data. He added that this was an opportunity for risk and compliance to become an asset.
Overall it was a good event which I found very interesting (but I have to admit to a certain geeky interest in this kind of tech). The event would have benefitted from having another competitive or complementary technology vendor involved, plus maybe
an academic to give a different slant on data retention and on what the regulators hope to gain from this kind of mandated data retention. Not that the regulators have been that good at managing data themselves recently.