I feel like I may be missing something, as the only details I can find are the ones in the "Description" section. If that's all there is at this stage, that's fine; just making sure I'm not missing a document or something.
As for the plan, it seems reasonable, though I have a concern with the term "not external facing". Please, please make sure that this service is developed with the assumption that it will be attacked with mild perseverance, even though it's not external-facing. Denial-of-service attacks aren't really a concern, but attacks that attempt to subvert control flow or access restrictions are likely to be encountered. This includes things like SQL injection, CLI injection, and API fuzzing. The end goal of the attacker may be (but is not limited to) access to the collected records; tampering with the collected records; usurping any machines in the processing pipeline; or discrediting the work of XSEDE, its users, or the NSF.
We cannot assume that all users on all machines that can interact with this service are friendly. Even if there's an intermediate collector host with admin-only logins, the data flowing through it may contain user-supplied content (such as commands run, job names, etc.), all of which may contain hostile payloads.
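To illustrate treating user-supplied fields as untrusted data, here is a minimal sketch using Python's standard sqlite3 module. The table name, column names, and the store_usage_record helper are hypothetical, since the actual design has not been written yet. The point is only that values such as job names are bound as parameters and never spliced into the SQL text.

```python
import sqlite3


def store_usage_record(conn, username, tool, job_name):
    """Insert one record, binding every value as a query parameter.

    Hypothetical helper: the schema is an assumption made for illustration.
    """
    conn.execute(
        "INSERT INTO usage_records (username, tool, job_name) VALUES (?, ?, ?)",
        (username, tool, job_name),
    )


# In-memory database standing in for the real (not yet designed) store.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE usage_records (username TEXT, tool TEXT, job_name TEXT)"
)

# A job name carrying a classic injection payload.
hostile = "x'); DROP TABLE usage_records; --"
store_usage_record(conn, "scott", "myproxy", hostile)

# The payload is stored as an ordinary string and the table still exists.
rows = conn.execute("SELECT job_name FROM usage_records").fetchall()
```

The same idea applies to CLI injection: pass collected values as separate argument-list entries rather than building a shell command string from them.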
Based on the description, I interpret this project as collecting some level of detail about the activity of all users of XSEDE resources, for all time going forward. Please keep in mind that while this is, at some level, public information (such as in aggregate by award), the specific information collected has the potential to be (ab)used in surprising, misleading, and generally unpleasant ways. If feasible, collecting aggregates in lieu of specifics may be preferable. It's less trouble than collecting specifics and trying to protect them from unauthorized access.
Hi Scott,
We're at the Launch Review phase, which makes stakeholders aware of the work description, who will work on it and how much time they will put into it, the schedule, and the deliverables. The detailed design of this work is the first deliverable and will be reviewed later.
You are totally on point about the "not external facing" statement. I'll clarify that this service will only be accessed directly by a handful of staff (what we meant by not public facing), but that it will be secured like all network-connected XSEDE services. Two-factor authentication may be appropriate, but that is a design detail.
Initially we're only tracking usage for XSEDE-provided command-line tools and network services. Stuff like: Scott used MyProxy from location X on date Y. The usage analysis will answer questions like "how many distinct users used MyProxy in a period" and "how many times was MyProxy used in a period". But again, those are design details that will be part of the design review.
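As a rough sketch of what those two questions could look like as a query (again with Python's sqlite3 module), assuming a hypothetical usage_events table with username, tool, location, and used_on columns. None of these names come from an actual design, and the sample rows are made up.

```python
import sqlite3


def usage_summary(conn, tool, start, end):
    """Count distinct users and total uses of a tool in [start, end).

    Dates are ISO 8601 strings (YYYY-MM-DD), which compare correctly as text.
    Hypothetical helper: the schema is an assumption made for illustration.
    """
    row = conn.execute(
        "SELECT COUNT(DISTINCT username), COUNT(*) "
        "FROM usage_events "
        "WHERE tool = ? AND used_on >= ? AND used_on < ?",
        (tool, start, end),
    ).fetchone()
    return {"distinct_users": row[0], "total_uses": row[1]}


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE usage_events "
    "(username TEXT, tool TEXT, location TEXT, used_on TEXT)"
)
# Made-up sample events, for illustration only.
conn.executemany(
    "INSERT INTO usage_events VALUES (?, ?, ?, ?)",
    [
        ("scott", "myproxy", "X", "2015-01-05"),
        ("scott", "myproxy", "X", "2015-01-20"),
        ("jp", "myproxy", "Y", "2015-01-21"),
        ("jp", "gsissh", "Y", "2015-01-22"),
        ("scott", "myproxy", "X", "2015-02-01"),
    ],
)

summary = usage_summary(conn, "myproxy", "2015-01-01", "2015-02-01")
```

With the sample rows above, the summary for January covers two distinct MyProxy users and three uses. The February event and the non-MyProxy event are excluded.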
Thanks for the input; we'll take note of it for the design review.
JP