Author Barberis, D. et al; Fernandez Casani, A.; Garcia Montoro, C.; Gonzalez de la Hoz, S.; Salt, J.; Sanchez, J.; Villaplana Perez, M.
Title The ATLAS EventIndex: A BigData Catalogue for All ATLAS Experiment Events Type Journal Article
Year 2023 Publication Computing and Software for Big Science Abbreviated Journal Comput. Softw. Big Sci.
Volume 7 Issue Pages 2 - 21pp
Keywords
Abstract The ATLAS EventIndex system comprises the catalogue of all events collected, processed or generated by the ATLAS experiment at the CERN LHC accelerator, and all associated software tools to collect, store and query this information. ATLAS records several billion particle interactions every year of operation, processes them for analysis and generates even larger simulated data samples; a global catalogue is needed to keep track of the location of each event record and be able to search and retrieve specific events for in-depth investigations. Each EventIndex record includes summary information on the event itself and the pointers to the files containing the full event. Most components of the EventIndex system are implemented using BigData free and open-source software. This paper describes the architectural choices and their evolution in time, as well as the past, current and foreseen future implementations of all EventIndex components.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes Approved no
Is ISI yes International Collaboration yes
Call Number IFIC @ pastor @ Serial 6079
 

 
Author Fernandez Casani, A.; Garcia Montoro, C.; Gonzalez de la Hoz, S.; Salt, J.; Sanchez, J.; Villaplana Perez, M.
Title Big Data Analytics for the ATLAS EventIndex Project with Apache Spark Type Journal Article
Year 2023 Publication Computational and Mathematical Methods Abbreviated Journal Comput. Math. Methods
Volume 2023 Issue Pages 6900908 - 19pp
Keywords
Abstract The ATLAS EventIndex was designed to provide a global event catalogue and limited event-level metadata for the ATLAS experiment at the Large Hadron Collider (LHC) and its analysis groups and users during Run 2 (2015-2018), and it has been running in production since. LHC Run 3, which started in 2022, has seen increased data-taking and simulation production rates, with which the current infrastructure would still cope but may be stretched to its limits by the end of Run 3. A new core storage service is being developed in HBase/Phoenix, and there is work in progress to provide at least the same functionality as the current one for increased data ingestion and search rates and with increasing volumes of stored data. In addition, new tools are being developed to cover the required access use cases within the new storage. This paper describes a new tool, using Spark and implemented in Scala, for accessing the large data volumes of the EventIndex project stored in HBase/Phoenix. With this tool, we can offer data discovery capabilities at different granularities, providing Spark DataFrames that can be used or refined within the same framework. Data analytics use cases of the EventIndex project are implemented, such as the search for duplicate events within the same dataset or across different datasets. An algorithm and implementation for the calculation of overlap matrices of events across different datasets are presented. Our approach can be used by other higher-level tools and users to ease access to the data in a performant and standard way using Spark abstractions. The provided tools decouple data access from the actual data schema, which makes it convenient to hide complexity and possible changes in the backend storage.
Address [Casani, Alvaro Fernandez; Montoro, Carlos Garcia; de la Hoz, Santiago Gonzalez; Salt, Jose; Sanchez, Javier; Perez, Miguel Villaplana] CSIC UV, Inst Corpuscular Phys IFIC, E-46980 Paterna, Spain, Email: alvaro.fernandez@ific.uv.es;
Corporate Author Thesis
Publisher Wiley-Hindawi Place of Publication Editor
Language English Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes WOS:001079548500001 Approved no
Is ISI yes International Collaboration no
Call Number IFIC @ pastor @ Serial 5706
 

 
Author Fernandez Casani, A.; Orduña, J.M.; Sanchez, J.; Gonzalez de la Hoz, S.
Title A Reliable Large Distributed Object Store Based Platform for Collecting Event Metadata Type Journal Article
Year 2021 Publication Journal of Grid Computing Abbreviated Journal J. Grid Comput.
Volume 19 Issue 3 Pages 39 - 19pp
Keywords Grid computing; Hadoop file system; Object-Based storage
Abstract The Large Hadron Collider (LHC) is about to enter its third run at unprecedented energies. The experiments at the LHC face computational challenges with enormous data volumes that need to be analysed by thousands of physics users. The ATLAS EventIndex project, currently running in production, builds a complete catalogue of particle collisions, or events, for the ATLAS experiment at the LHC. The distributed nature of the experiment data model is exploited by running jobs at over one hundred Grid data centers worldwide. Millions of files with petabytes of data are indexed, extracting a small quantity of metadata per event, which is conveyed by a data collection system in real time to a central Hadoop instance at CERN. After a successful first implementation based on a messaging system, some issues suggested performance bottlenecks at the challenging higher rates expected in the next runs of the experiment. In this work we characterize the weaknesses of the previous messaging system regarding complexity, scalability, performance and resource consumption. A new approach based on an object-based storage method was designed and implemented, taking into account the lessons learned and leveraging the ATLAS experience with this kind of system. We present the experiment that we ran over three months in the real worldwide production scenario in order to evaluate the messaging and object store approaches. The results of the experiment show that the new object-based storage method can efficiently support large-scale data collection for big data environments such as the next runs of the ATLAS experiment at the LHC.
Address [Fernandez Casani, Alvaro; Sanchez, Javier; Gonzalez de la Hoz, Santiago] Univ Valencia, Inst Fis Corpuscular IFIC, Burjassot, Spain, Email: alvaro.fernandez@ific.uv.es;
Corporate Author Thesis
Publisher Springer Place of Publication Editor
Language English Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN 1570-7873 ISBN Medium
Area Expedition Conference
Notes WOS:000692413100001 Approved no
Is ISI yes International Collaboration no
Call Number IFIC @ pastor @ Serial 4953