Re: [tsc-devel] Retaining old IRC logs...

Quintus | Tue, 12 Jan 2016 17:34:52 UTC

Hi everyone,

apologies for this somewhat long email, but I think it’s easiest if I
make my point of view clear with more detail than I gave in IRC.

I. The facts
============

datahead <…r@f…> writes:

> Post via forum by datahead <…9@x…>:
> Regarding the discussion about purging old IRC logs, I do not think we
> should just delete all logs older than 6 months.

For those who weren’t in IRC today, here are the facts: furbot has logged
our IRC channel permanently and reliably for almost two years now, with
only one major interruption. The first available log file is from
2014-06-12. Logfiles are available both in HTML format with nice
colourisation, and in plaintext format suitable for reading on text
consoles.

Regarding disk usage, here are the stats:

    % du -h logs
    7,6M logs/plainlogs
    26M  logs/htmllogs
    34M  logs

As for backups, duplicity currently keeps 2 full backups on the disk
and makes a new full backup every 4 months (with weekly incremental
backups in between). That is, our oldest backup is currently 8 months
old.

The backups are large:

    % du -h backup
    18M backup/database
    45G backup/filesystem
    45G backup

Since we have long crossed the 8 month mark, our backup storage is going
to stay at around this size (until we add more large files to
alexandria, like TSC releases).

II. My point of view
====================

> If we want to move them off the server, that is fine, but I think they
> should be kept somewhere.  Major meeting and some important
> discussions are in the logs.

I will explain why I made this suggestion. While sydney has correctly
stated that the log files occupy a certain amount of disk space on our
server, I do think this is negligible. Just see the du(1) outputs
above. If the logs ever became a problem, we could compress them.
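As a rough illustration of the compression option, here is a minimal
Python sketch. It assumes the plaintext logs are files ending in ".log"
inside one directory; that suffix and the function name compress_logs
are assumptions made for the example, not the actual layout on the
server.

```python
import gzip
import shutil
from pathlib import Path


def compress_logs(log_dir: Path) -> list[Path]:
    """Gzip every *.log file in log_dir and remove the uncompressed original.

    The ".log" suffix is an assumption; adjust it to the real file names.
    """
    compressed = []
    for src in sorted(log_dir.glob("*.log")):
        dst = src.with_name(src.name + ".gz")
        # Stream the file through gzip so large logs are not loaded into memory.
        with src.open("rb") as fin, gzip.open(dst, "wb") as fout:
            shutil.copyfileobj(fin, fout)
        src.unlink()  # drop the original only after the copy succeeded
        compressed.append(dst)
    return compressed
```

Plain IRC text usually compresses well, so the already small plaintext
directory would shrink further; the HTML logs could be handled the same
way with a different glob pattern.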

I have other reasons why I advocate deletion of old logs. Here are my
main points:

1) No use.

It is quite unlikely that we will ever need to read through an IRC
chatlog from ten years ago. Keeping useless data around just because
someone might, at some unknown point in a rather distant future, have
some unknown use for it is something that makes me shiver.

2) Abuse.

Building on this, nobody can foresee what the data may be used for by
people other than us (or maybe even by one of us? Nobody knows). You
cannot be sure that someone who wants to harm you in some way in the
future will not find something compromising in the data collected about
you five years ago.

3) Against “But it was public!”

The IRC channel is public, and the fact that we are logging it is even
publicly announced via ChanServ on joining. However, compared to
forums or a mailing list, chats are a different means of
communication. The need to reply instantly leads to answers that have
not been thought through well before they are posted. That is, the risk
of something unwanted being logged (which you might not even have
thought of while writing) is higher than on an asynchronous medium such
as a forum or a mailing list.

That being said, I would also sympathise with forum and mailing list
entries not being publicly available forever. In this regard it is
important to point out that our mailing list has supported the
“X-No-Archive” email header ever since it was first established. You as
the mail sender can set this header in your email to “yes”. It instructs
the mailing list software to exclude your email from the mail archive;
the mail will be delivered, but it will make it neither to the official
list archive nor to the forum gateway. You can participate without
being logged publicly. For a similar discussion around disabling
channel logging by default, see my explanations under III.2) further
below.
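For readers who compose mail programmatically, setting the header
described above could look like the following Python sketch. The
function name and the addresses in the usage note are placeholders
invented for the example; only the header name “X-No-Archive” and the
value “yes” come from the text above.

```python
from email.message import EmailMessage


def build_unarchived_mail(sender: str, list_address: str,
                          subject: str, body: str) -> EmailMessage:
    """Build a mail that asks the list software not to archive it."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = list_address
    msg["Subject"] = subject
    # The header that tells the list software to skip the archive.
    msg["X-No-Archive"] = "yes"
    msg.set_content(body)
    return msg
```

Calling build_unarchived_mail("me@example.org", "list@example.org",
"Hello", "Just a test.") returns a message object that can be handed to
any mail submission code. Most mail clients also allow adding custom
headers by hand.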

4) Important events can still be preserved.

Every now and then the TSC team holds the “General Discussion”, a
meeting of the entire (well, ideally) team in IRC that discusses and
resolves a large number of problems that could not be resolved for a
longer time and require more intense discussion than what the
tracker provides. By the nature of these meetings, lots of important
decisions are made, probably even including some votes. Therefore
there is a legitimate interest in having these discussions archived (as
they contain the reasons why TSC developed in one direction or
another).

It is however easily possible to preserve the IRC logs of a General
Discussion (or other important events) separately. They can easily be
copied to some other location on the server before they are
deleted. The vast majority of the talk in #secretchronicles however is
informal and serves the (very important) purpose of keeping the social
aspects in our team intact. These social interactions are interesting to
read when participating, but serve no use later on. The bad jokes I made
three years ago are nothing that needs to be preserved.
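The "copy the important logs, then delete the old ones" idea can be
sketched in Python as follows. This is only an illustration under
assumptions: it supposes one log file per day named YYYY-MM-DD.log, and
the function name, the archive directory and the set of dates to keep
are all hypothetical.

```python
import shutil
from datetime import date, timedelta
from pathlib import Path


def prune_logs(log_dir: Path, archive_dir: Path, keep_dates: set[date],
               today: date, max_age_days: int = 183) -> list[Path]:
    """Delete logs older than max_age_days.

    Logs whose date is in keep_dates (e.g. General Discussion days) are
    copied to archive_dir before they are removed from log_dir.
    """
    cutoff = today - timedelta(days=max_age_days)
    removed = []
    for path in sorted(log_dir.glob("*.log")):
        try:
            log_date = date.fromisoformat(path.stem)
        except ValueError:
            continue  # not a dated log file, leave it alone
        if log_date >= cutoff:
            continue  # still within the retention period
        if log_date in keep_dates:
            archive_dir.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, archive_dir / path.name)
        path.unlink()
        removed.append(path)
    return removed
```

Passing today explicitly keeps the function easy to test; a cron job
would simply call it with date.today() and the agreed retention period.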

5) Project decisions are not made in IRC (alone) outside a GD.

It has always been our practice (and I see no reason to change this)
that decisions that are not resolvable offhand are taken to the forum,
the mailing list, or the tracker so everybody has time to think about his
arguments. Therefore, outside the well-prepared circumstances of a
General Discussion, project-relevant decisions are not made in IRC, at
least not in IRC alone. Deleting the chat logs of days without a GD
thus doesn’t mean we lose anything.

6) Log files in backups.

The chatlog files are not going to be unavailable immediately after
deletion. As outlined above, backups of alexandria are made at regular
intervals. This effectively means that deleting them only makes them
inaccessible to the public. They are really “deleted”, in the sense
that there is no possibility to restore them, only once they have cycled
out of the backups. With the current configuration this would be 8
months.
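The arithmetic behind that figure can be written down as a small Python
sketch. It assumes the schedule described in section I (a new full
backup every 4 months, 2 full backups kept) and further assumes that
the oldest backup chain is dropped as soon as a new full backup has
been made; the function name is made up for the example.

```python
def months_until_unrecoverable(months_since_last_full: int,
                               full_interval: int = 4,
                               fulls_kept: int = 2) -> int:
    """Months a file deleted now remains restorable from the backups.

    The file is contained in the current backup chain, which started
    months_since_last_full months ago. That chain is dropped once
    fulls_kept newer full backups exist.
    """
    if not 0 <= months_since_last_full < full_interval:
        raise ValueError("must lie within the current full-backup interval")
    return fulls_kept * full_interval - months_since_last_full
```

Under these assumptions, a file deleted right after a full backup stays
restorable for 8 months, and one deleted 3 months into the interval for
5 months, so the 8 months named above is the upper end of the window.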

7) Conclusion.

When I weigh the importance of the data in question (mostly informal
social talk) against the impact of keeping it (possible abuse of the
data, which might be used against the person who said what was logged),
I conclude that it is better to delete logs whose use is unclear at
best.

Note I don’t suggest deleting all logs older than a day. I find it
okay if the logs are kept around for six months in public, maybe a
little longer in the backups for restoring in case it’s needed. Just
having them sitting around there forever or for a really long time
bothers me. Feel free to discuss the length of the retention period
with me.

III. Other aspects
==================

1) Search function.

> Having a search function, however, would make the old logs much more
> useful.

With this I agree. It can easily be implemented as a simple CGI script
crawling the plaintext logs.
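The core of such a search could look like the Python sketch below. It
is not a finished CGI script, only the part that crawls the logs; the
".log" suffix, the UTF-8 encoding and the function name search_logs are
assumptions for the example.

```python
from pathlib import Path


def search_logs(log_dir: Path, term: str) -> list[tuple[str, int, str]]:
    """Return (file name, line number, line) for each case-insensitive match."""
    needle = term.lower()
    hits = []
    for path in sorted(log_dir.glob("*.log")):
        # errors="replace" keeps the search going if a log has odd bytes in it.
        with path.open(encoding="utf-8", errors="replace") as f:
            for lineno, line in enumerate(f, start=1):
                if needle in line.lower():
                    hits.append((path.name, lineno, line.rstrip("\n")))
    return hits
```

A web front end would read the search term from the query string, call
this function and print the hits; with only a few megabytes of
plaintext logs, a linear scan like this should be fast enough without
an index.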

2) Disabling logging?

I have an alternative suggestion. We could disable logging in
#secretchronicles by default and instead only turn it on when we
determine it’s required. furbot already supports the commands !stoplog
and !startlog, which disable and enable the logging functionality. These
commands can only be issued by channel ops, so there’s no risk of random
people turning channel logging on or off. It would enable us to decide
“on the fly” when logging a conversation is useful, and it would make
logging even more explicit than it is now. People who don’t want to be
logged, but still want to participate in chats around TSC, would then be
able to join our channel as well (this is currently not possible,
because everything is logged). Currently, the only way to participate
without being logged is to subscribe to the mailing list and set the
“X-No-Archive” header. IMO this is enough, but naturally I wouldn’t be
against expanding our privacy handling.
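To make the "off by default, ops can switch it on" behaviour concrete,
here is a small Python model of it. This is not furbot’s code; the
class ChannelLogger and its methods are hypothetical, and only the
command names !startlog and !stoplog and the ops-only rule are taken
from the description above.

```python
class ChannelLogger:
    """Hypothetical stand-in for a bot's logging switch; not furbot's code."""

    def __init__(self, ops: set[str], enabled: bool = False):
        self.ops = ops          # nicks allowed to toggle logging
        self.enabled = enabled  # logging is off unless switched on
        self.lines: list[str] = []

    def handle(self, nick: str, message: str) -> None:
        command = message.strip()
        if command in ("!startlog", "!stoplog"):
            if nick in self.ops:
                self.enabled = command == "!startlog"
            return  # commands themselves are never logged in this sketch
        if self.enabled:
            self.lines.append(f"<{nick}> {message}")
```

In this model a message is only recorded while an op has logging
switched on, and toggle attempts by non-ops are silently ignored.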

3) No action taken yet.

I have not deleted anything yet. This is just for your information.

Valete,
Quintus

-- 
#!/sbin/quintus
Blog: http://www.guelkerdev.de

GnuPG key: F1D8799FBCC8BC4F
