In another life I worked for a SaaS vendor. For the display tier there were logs from firewalls, web servers, database engines, and application servers. We consciously built a log-handling service (using Splunk) to consume these inputs and analyse them as fast as the professional services and support staff needed to do their jobs (i.e. a few minutes of latency was acceptable). The benefit was that we could enforce access control, cleanse the data (PCI/HIPAA/etc.), and be more responsive to customer inquiries.

Too often, log-file storage infrastructure at organizations is an afterthought. Though Linux geeks and sysadmins can grep log files, it is so much better to give fast access to the data to the people who need it. I think Splunk is a great product, but it's quite expensive, and in the past couple of years competitors like LogRhythm and Logstash have come along.
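To make the "cleanse the data" step concrete, here's a minimal sketch of the kind of masking such a pipeline applies before logs ever reach the indexer. This is an illustration rather than our actual implementation: the patterns and the stdin-to-stdout wiring are hypothetical, and in practice you'd typically use the tool's own masking features (Splunk props/transforms, Logstash filters) rather than a standalone script.

```
import re
import sys

# Hypothetical PCI/HIPAA-style patterns; real deployments tune these
# carefully and usually mask at the shipper or indexer instead.
PATTERNS = [
    # 13-16 digit card numbers, optionally separated by spaces or dashes
    (re.compile(r"\b(?:\d[ -]?){12,15}\d\b"), "[CARD REDACTED]"),
    # US social security numbers
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN REDACTED]"),
    # email addresses
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"), "[EMAIL REDACTED]"),
]

def cleanse(line):
    """Mask sensitive values in a single log line before indexing."""
    for pattern, replacement in PATTERNS:
        line = pattern.sub(replacement, line)
    return line

if __name__ == "__main__":
    # Read raw log lines on stdin, emit cleansed lines on stdout,
    # e.g. as a stage between the log shipper and the indexer.
    for raw in sys.stdin:
        sys.stdout.write(cleanse(raw))
```

Scrubbing the sensitive values at ingest time is what makes it safe to hand broad search access to support and professional services staff.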
When talking about 'big data' we should qualify it: one person's 'big' is another person's 'small'. Where I work, a year's worth of logs for the network infrastructure is probably well over 20 TB, but this pales in comparison with what researchers here consider 'big': the data analysis systems for the UW eScience Institute support several PB of storage, and I think they're up to over 1,000 nodes. (They also offer the free Coursera course on data science, https://www.coursera.org/course/datasci, as well as a Ph.D. in Big Data.)