Role of data integration, logging and machine learning in the realm of cyber security

This news back in October last year triggered a train of thought about the critical role that logging, data integration and machine learning can play in building next-generation proactive safeguards to enhance cyber security.


Let’s consider GitHub as a test case. GitHub is a web-based Git repository hosting service that offers the distributed revision control and source code management (SCM) functionality of Git. It also provides access control and collaboration features such as wikis, task management, bug tracking and feature requests for public as well as private repositories. GitHub faces several cybersecurity challenges, such as: a) denial-of-service attacks on the hosted web application, b) code injection, c) cross-site scripting, d) security breaches due to cookie or session hijacking, and e) man-in-the-middle attacks.

Critical team skills

Given the extensive and ever-evolving security challenges companies face today, an information security team needs skills and technical expertise that enable it to respond to incidents, perform analysis tasks, and communicate effectively with its constituency and other external contacts. Team members should have experience with and understanding of multiple security platforms, such as automated and manual testing tools, firewalls, proxy servers, intrusion prevention systems, log correlation and management, operating systems, protocols, risk assessments and web application firewalls. A real-time analytics platform for unstructured log files (terabytes to petabytes of logs) will be helpful in the log correlation and management aspect of security. Machine learning algorithms can be used to develop predictive models that find patterns and detect anomalies in logs. This enables security teams to take preventive and corrective actions.


Sources of telemetry such as IPFIX can be enabled on infrastructure devices. An IPFIX-enabled device caches and generates records about network traffic flows and their characteristics, and it can report traffic details at several OSI layers. For example, it can report traffic by source and destination IP address or by transport-layer source and destination port number, or it can extract parts of the TCP header. After the information is exported from the network device, it can be stored and used to perform correlation and analysis.
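As a minimal sketch of that last step, the snippet below aggregates flow records per source address. It assumes the records have already been exported and decoded into plain dictionaries; the field names (`src_ip`, `dst_ip`, `dst_port`, `bytes`) and the sample values are illustrative, not official IPFIX information element names.

```python
from collections import defaultdict

# Hypothetical, already-decoded flow records; field names are illustrative.
flows = [
    {"src_ip": "10.0.0.5", "dst_ip": "10.0.1.9", "dst_port": 443, "bytes": 1200},
    {"src_ip": "10.0.0.5", "dst_ip": "10.0.1.9", "dst_port": 22, "bytes": 300},
    {"src_ip": "10.0.0.7", "dst_ip": "10.0.1.9", "dst_port": 443, "bytes": 800},
    {"src_ip": "10.0.0.5", "dst_ip": "10.0.1.10", "dst_port": 443, "bytes": 500},
]


def summarize_by_source(records):
    """Aggregate total bytes and distinct destination ports per source address."""
    summary = defaultdict(lambda: {"bytes": 0, "ports": set()})
    for rec in records:
        entry = summary[rec["src_ip"]]
        entry["bytes"] += rec["bytes"]
        entry["ports"].add(rec["dst_port"])
    return dict(summary)


summary = summarize_by_source(flows)
```

A per-source summary like this is the kind of stored view that later correlation steps can join against alerts from other sensors.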

Network device logs are also useful in certain situations. For example, attempts to compromise an infrastructure device’s management credentials may generate log messages that reveal the suspicious activity.

Network taps or packet captures are a further source. Deep packet inspection from taps in the network is useful when investigating end-host compromises. Indicators of compromise can be investigated from historical packet captures, assuming the captures are stored for a long enough time and the analysis tools offer the necessary analytical functionality.

Signature vs anomaly based security monitoring

A signature-based IDS searches for a known identity, or signature, for each specific intrusion event. Signature monitoring is very efficient at sniffing out known attack signatures, but it depends on receiving regular updates to its signature database to remain in sync with variations in attacker techniques. It becomes inefficient as the signature database grows in size and complexity: checking for every signature requires more CPU cycles and increases the possibility of false positives.
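The idea can be sketched in a few lines. The signature names and regular expressions below are made-up illustrations, not rules from any real IDS; the point is only that every payload is checked against every signature, so the cost grows with the size of the database.

```python
import re

# Hypothetical signature database: name -> compiled pattern.
SIGNATURES = {
    "sql_injection": re.compile(r"(?i)union\s+select"),
    "xss_script_tag": re.compile(r"(?i)<script\b"),
    "path_traversal": re.compile(r"\.\./"),
}


def match_signatures(payload):
    """Return the sorted names of every signature that matches the payload."""
    return sorted(name for name, pattern in SIGNATURES.items() if pattern.search(payload))
```

A request such as `GET /?q=1 UNION SELECT password FROM users` would be reported under `sql_injection`, while an attack that matches no stored pattern passes unnoticed, which is the gap anomaly detection tries to close.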

An anomaly-detection-based IDS captures the headers of all IP packets heading towards the network. From these, it filters out all known and legitimate traffic, including web traffic to the organization’s web server, mail traffic to and from its mail server, outgoing web traffic from company employees and DNS traffic to and from its DNS server. This helps detect any traffic that is new or unusual. It is particularly good at identifying the sweeps and probes towards network hardware that precede an attack, so it gives early warning of potential intrusions. Anomaly detection requires continuous, uninterrupted and timely collection, management and analysis of large volumes of sensor data with full data integrity. It also needs supervised and unsupervised statistical algorithms to be deployed so that models can be trained and anomalies in network traffic detected in real time.
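A minimal sketch of the filtering step, under simplifying assumptions: the known traffic is a hand-written allowlist of (destination IP, destination port) pairs, and a "sweep" is any source that touches several distinct unknown targets. The addresses, field names and the `min_targets` threshold are all hypothetical, and a production system would learn the baseline statistically rather than hard-code it.

```python
from collections import defaultdict

# Hypothetical allowlist of expected services: (destination IP, destination port).
KNOWN_SERVICES = {
    ("192.0.2.10", 80),   # web server
    ("192.0.2.10", 443),  # web server (TLS)
    ("192.0.2.20", 25),   # mail server
    ("192.0.2.30", 53),   # DNS server
}


def unusual_headers(headers):
    """Drop headers that match a known service; keep everything else."""
    return [h for h in headers if (h["dst_ip"], h["dst_port"]) not in KNOWN_SERVICES]


def find_sweeps(headers, min_targets=3):
    """Return sources that hit at least `min_targets` distinct unknown targets."""
    targets = defaultdict(set)
    for h in unusual_headers(headers):
        targets[h["src_ip"]].add((h["dst_ip"], h["dst_port"]))
    return {src: len(seen) for src, seen in targets.items() if len(seen) >= min_targets}


headers = [
    {"src_ip": "198.51.100.7", "dst_ip": "192.0.2.10", "dst_port": 443},
    {"src_ip": "203.0.113.9", "dst_ip": "192.0.2.10", "dst_port": 22},
    {"src_ip": "203.0.113.9", "dst_ip": "192.0.2.20", "dst_port": 22},
    {"src_ip": "203.0.113.9", "dst_ip": "192.0.2.30", "dst_port": 22},
    {"src_ip": "198.51.100.7", "dst_ip": "192.0.2.30", "dst_port": 53},
]
sweeps = find_sweeps(headers)
```

Here the source probing port 22 on three hosts is flagged, while the ordinary web and DNS traffic is filtered out before any counting happens.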

Data distribution service

A data distribution service (DDS) or a message broker such as Kafka is deployed to keep telemetry flows directed and in sync, and to ensure timely delivery and message integrity. It implements a publish/subscribe model for sending and receiving data, events, and commands among the nodes. Nodes that produce messages (publishers) create and publish “topics.” DDS delivers the messages from each topic to the subscribers that have declared an interest in that topic.

DDS handles transfer chores: message addressing, data marshalling and unmarshalling (so subscribers can be on different platforms from the publisher), delivery, flow control, retries, etc.
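The publish/subscribe model itself can be illustrated with a toy in-process broker. This is a sketch of the pattern only: the `Broker` class is hypothetical, it is not the DDS or Kafka API, and it deliberately omits marshalling, flow control and retries.

```python
from collections import defaultdict


class Broker:
    """Toy in-process publish/subscribe broker (illustration only)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        """Declare interest in a topic; callback is invoked for each message."""
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        """Deliver a message to every subscriber of the topic; return the count."""
        callbacks = self._subscribers.get(topic, [])
        for callback in callbacks:
            callback(message)
        return len(callbacks)


broker = Broker()
received = []
broker.subscribe("netflow", received.append)
delivered = broker.publish("netflow", {"src_ip": "10.0.0.5"})
undelivered = broker.publish("syslog", {"msg": "login failed"})
```

The publisher never names its consumers; it only names a topic, which is what lets sensors and analysis jobs be added or removed independently.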

Data enrichment

Once we have access to that data, some forms of data enrichment can be beneficial in adding context to a security event. Low false alarm rates are critical in anomaly detection and desirable in data cleaning. False alarms are generated in anomaly detection systems because not all anomalies are representative of attacks. Purging such anomalies (program faults and system crashes, among others) is hence justifiable, but within reasonable limits. For example, we can use motif extraction and translation to flag system calls, and use a translation table to associate each motif occurrence with a probability of attack.
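One possible reading of the motif idea is sketched below, assuming a motif is a fixed-length run of consecutive system calls. The translation table, its probabilities and the `default` score for unseen motifs are made-up placeholders, not measured values.

```python
def extract_motifs(syscalls, length=3):
    """Return every contiguous run of `length` system calls as a tuple."""
    return [tuple(syscalls[i:i + length]) for i in range(len(syscalls) - length + 1)]


# Hypothetical translation table; the probabilities are invented placeholders.
TRANSLATION_TABLE = {
    ("open", "read", "close"): 0.01,
    ("socket", "connect", "execve"): 0.90,
}


def score_trace(syscalls, default=0.10):
    """Return the highest attack probability of any motif found in the trace."""
    motifs = extract_motifs(syscalls)
    if not motifs:
        return 0.0
    return max(TRANSLATION_TABLE.get(motif, default) for motif in motifs)
```

Traces whose score stays near the benign end can be purged as likely program faults, while high-scoring traces are kept and enriched for the correlation stage.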

Data correlation

Monitoring gains value as alerts are correlated from multiple sources of telemetry. But how do we approach this kind of correlation, in both thought process and technology, when handling terabytes of log data per day?

Real-time collection, aggregation, correlation, detection and communication is the ideal solution. Kafka brokers can send multiple types of sensor messages to a Storm topology that performs correlation and aggregation. Algorithm implementations managed by Spark can then generate accurate and reliable threat information events for an end user’s dashboard, for further action to mitigate the security risk.

Thought process behind alert correlation from multiple sources:

Association – Associating multiple event types and sources across multiple nodes. Frequently, event data from multiple sources and nodes is necessary to identify a problem. The correlation engine needs to be able to process data regardless of its origin.

Event sequence – The current course of action may be influenced by past events. For example, a single port scan by a particular source or network may not be interesting, but comparing that event to short- and long-term histories may unveil a pattern of behavior that requires immediate action.

Event persistence – For example, short bursts of high-load network traffic may be normal, but sustained bursts could indicate that a denial-of-service attack is underway. The ability to link event persistence with periods of time is a critical need of a correlation engine.
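The persistence check can be sketched as a small stateful detector. The class name, the threshold and the 30-second duration are hypothetical; the sketch only shows how linking "load is high" to "for how long" separates a burst from a sustained condition.

```python
class PersistenceDetector:
    """Flag load that stays at or above a threshold for a sustained period."""

    def __init__(self, threshold, min_duration):
        self.threshold = threshold
        self.min_duration = min_duration
        self._high_since = None  # timestamp when the current high period began

    def observe(self, timestamp, load):
        """Record one sample; return True once high load has persisted long enough."""
        if load < self.threshold:
            self._high_since = None  # the burst ended, so reset the clock
            return False
        if self._high_since is None:
            self._high_since = timestamp
        return timestamp - self._high_since >= self.min_duration


detector = PersistenceDetector(threshold=1000, min_duration=30)
samples = [(0, 1500), (10, 1200), (20, 200), (30, 1800), (40, 1900), (50, 2000), (60, 2100)]
results = [detector.observe(t, load) for t, load in samples]
```

In this sample the first burst ends after 10 seconds and is ignored; only the second run, which stays high for a full 30 seconds, raises the flag.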

Event-directed data collection

As part of correlation, various conditions may require interactions with other systems to complete the process. For example, asset database, customer databases, network device or other agent data may be required. The best correlation solutions go beyond simple security data at run time in order to help diagnose, distinguish and deliver meaningful high priority alerts.

Finally, true correlation is the ability to analyze, compare and match escalated sensor events from multiple sensors in multiple timeframes. Aggregation is an essential pre-requisite for effective, cross-platform, real-time correlation.
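A minimal sketch of such cross-sensor matching, assuming events have already been aggregated into one list of dictionaries. The field names (`time`, `sensor`, `src_ip`), the 60-second window and the two-sensor rule are illustrative assumptions rather than a recommended policy.

```python
from collections import defaultdict


def correlate(events, window=60, min_sensors=2):
    """Return sources reported by at least `min_sensors` distinct sensors
    within `window` seconds of each other, mapped to those sensor names."""
    by_source = defaultdict(list)
    for event in events:
        by_source[event["src_ip"]].append(event)

    correlated = {}
    for src, src_events in by_source.items():
        src_events.sort(key=lambda e: e["time"])
        for i, first in enumerate(src_events):
            # Sensors that reported this source within the window after `first`.
            sensors = {e["sensor"] for e in src_events[i:]
                       if e["time"] - first["time"] <= window}
            if len(sensors) >= min_sensors:
                correlated[src] = sorted(sensors)
                break
    return correlated


events = [
    {"time": 0, "sensor": "ids", "src_ip": "203.0.113.9"},
    {"time": 20, "sensor": "firewall", "src_ip": "203.0.113.9"},
    {"time": 5, "sensor": "ids", "src_ip": "198.51.100.7"},
    {"time": 500, "sensor": "firewall", "src_ip": "198.51.100.7"},
]
correlated = correlate(events)
```

The first source is escalated because two sensors agree within the window; the second is not, because its two reports are several minutes apart.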

Identification & alerting

In the end, given high-signal alerts, how do we decide whether a correlated alert needs immediate attention (i.e. going to the security pager) or can wait for longer-term analysis (like a weekly email wrap-up)? This requires threat modeling and the design of a threat or attack tree to decide where the correlated alert needs to be sent. Examine a network environment from an attacker’s perspective to determine what targets would be most tempting to a person attempting to gain access to a network and what conditions must be met for an attack on such targets to succeed. When vulnerable targets of opportunity have been identified, the environment can be examined to determine how existing safeguards affect the attack conditions. This process reveals relevant threats, which can then be ranked according to the level of risk they present, which remediation activities can deliver the most valuable solution to each threat, and whether mitigation may affect other areas in beneficial or detrimental ways that change the value of that remediation.
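One way to sketch this routing decision is to encode an attack tree as nested AND/OR nodes and check whether the conditions observed in a correlated alert are enough for the attack to succeed. The tree below (a session-hijacking goal), its condition names and the two destinations are hypothetical examples, not a real threat model.

```python
# Hypothetical attack tree: the goal succeeds if the cookie is stolen AND
# at least one of the two safeguards is missing.
SESSION_HIJACK_TREE = {
    "op": "AND",
    "children": [
        {"condition": "session_cookie_stolen"},
        {"op": "OR", "children": [
            {"condition": "no_ip_binding"},
            {"condition": "no_reauth_on_sensitive_action"},
        ]},
    ],
}


def attack_possible(node, conditions):
    """Evaluate an AND/OR attack tree against the set of conditions that hold."""
    if "condition" in node:
        return node["condition"] in conditions
    results = [attack_possible(child, conditions) for child in node["children"]]
    return all(results) if node["op"] == "AND" else any(results)


def route_alert(tree, conditions):
    """Send the alert to the pager only if the attack goal is reachable."""
    return "pager" if attack_possible(tree, conditions) else "weekly_digest"
```

With this tree, a stolen cookie on its own goes to the weekly digest, while a stolen cookie combined with a missing safeguard pages the on-call engineer. A fuller model would attach risk rankings to each node rather than a simple yes/no.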


© Copyright 2017 Topmist, inc. All rights reserved.

The life of your company is at stake if your CEO cannot manage disruption

We are entering a new phase of technology where disruption is the primary model over just empowerment.  This disruption is now a new type of catalyst that is no longer taking down small products or companies but is destroying the business models of Fortune 500 companies.  The tide has changed to one where all types of emerging technologies are forcing every company and person on the planet to embrace change or die.

The CEO’s responsibility used to be to manage risk and set the strategy for the organization.  In today’s world, the CEO is required to manage both risk and disruption.  If the CEO cannot effectively manage disruption, they will find their company losing massive amounts of customers and revenue, like some well-known companies: Kodak, Blockbuster, Borders, Compaq, Tower Records, etc.  Babson College cited a scary statistic: “Over 40% of the companies that were at the top of the Fortune 500 in 2000 were no longer there in 2010.”

So if you are not willing to face the risks of change you will fall behind and be replaced by the new.  As harsh as it sounds, you can embrace disruption and come out ahead in the process.  You do this by having the right advisors and technologists helping you learn and apply the correct constructive deconstruction strategies to your business and IT organization.  Disruption creates a compression in response time that can be overcome with a mindset and methodology that emphasizes flexibility.  You need to work with a company that has the appropriate agile business mentality to help you develop IT services in a Cloud framework, as well as be a Cloud broker for future growth.

Information Technology is used in every business model and across every aspect of society.  It is now as important as food and water in terms of everyday life.  When IT first started its integration into society, it was a force of change.  Its runway was long, and there was no problem with having projects whose implementation timelines ran for 1, 2, or even 3 years.  It was completely acceptable and the norm.  As more and more companies started to consume IT, its model and behavior slowly changed.  Now, we find that it has evolved into a force of disruption with a very short runway.  So, if you do not have project and implementation timelines that match the runway, you will crash and burn.

This now-critical role of IT in everyday life is why the advent of Cloud technology is becoming so important to your business.  A company no longer has the freedom and luxury to slowly develop IT solutions for its business, because if it goes too slowly, it will find itself falling behind competitors.  By using the Cloud you are able to consume a variety of IT services without the wait, and allow your business to keep up with the demands it faces.  You must have significant business agility and maturity in your IT organization because your company’s life relies on it.  The Cloud is your antidote to this force of disruption that you are seeing every time you sit down with your business leaders.

It is time that every CEO accepts the new demands of their role and learns how to manage disruption.  They need to bring in a set of IT professionals who understand the Cloud in the right context and can help align it with business demands, so that the forces of disruption can be harnessed and used to propel company revenue growth and customer acquisition.  By leveraging the methodologies and practices of those who have successfully implemented Cloud solutions, a CEO can develop the foundation of an IT organization that can help them manage both risk and disruption for years to come.

© Copyright 2014 Cloudify, inc. All rights reserved. Image: NASA and the NSSDC