A table is a logical collection of rows. AWS Glue can create such a non-materialized entity from many kinds of data sources, one of which is old-school XML files. Let’s try to import and read 27 files with a total size of 35 GB.
To create a table we need to define a new database, a classifier, and a crawler. The classifier defines what the data looks like; the crawler inspects the data, classifies it, and updates the table metadata.
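The three definitions can also be scripted rather than clicked together in the console. The following is a minimal sketch, not the exact setup used here: it assumes `glue` is a client obtained with `boto3.client("glue")`, and every name, the IAM role ARN, the S3 path, and the row tag are hypothetical placeholders to be replaced with your own values.

```python
def create_xml_table_resources(glue, database, classifier, crawler,
                               role_arn, s3_path, row_tag):
    """Create a Glue database, an XML classifier and a crawler.

    `glue` is expected to behave like boto3.client("glue").
    All argument values are supplied by the caller (placeholders below).
    """
    # 1. The database that will hold the table metadata.
    glue.create_database(DatabaseInput={"Name": database})

    # 2. The classifier: RowTag names the XML element treated as one row.
    glue.create_classifier(
        XMLClassifier={
            "Name": classifier,
            "Classification": "xml",
            "RowTag": row_tag,
        }
    )

    # 3. The crawler: scans the S3 path using the custom classifier
    #    and writes the resulting table into the database.
    glue.create_crawler(
        Name=crawler,
        Role=role_arn,
        DatabaseName=database,
        Classifiers=[classifier],
        Targets={"S3Targets": [{"Path": s3_path}]},
    )


# Example call (hypothetical values, requires boto3 and AWS credentials):
#
#   import boto3
#   create_xml_table_resources(
#       boto3.client("glue"),
#       database="xml_db",
#       classifier="xml_rows",
#       crawler="xml_crawler",
#       role_arn="arn:aws:iam::123456789012:role/GlueCrawlerRole",
#       s3_path="s3://my-bucket/xml/",
#       row_tag="row",
#   )
```

The client is passed in as an argument so the function can be exercised with a stub before it touches a real account.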
The crawler ran for just 1 minute and correctly recognized the files, which contain 36M rows. Unfortunately, querying the data directly with Athena does not work.
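For reference, starting the crawler and checking what it found could look roughly like the sketch below. It again assumes `glue` is a `boto3.client("glue")` client and that the crawler and database names are your own. Reading the row count from a `recordCount` table parameter is an assumption about what the crawler stores, so the code falls back to `None` when the key is missing.

```python
import time


def run_crawler_and_wait(glue, crawler, poll_seconds=15, timeout_seconds=1800):
    """Start the crawler and poll until it returns to the READY state.

    Returns the status of the last crawl (e.g. "SUCCEEDED") if reported.
    """
    glue.start_crawler(Name=crawler)
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        info = glue.get_crawler(Name=crawler)["Crawler"]
        if info["State"] == "READY":  # crawler is idle again
            return info.get("LastCrawl", {}).get("Status")
        time.sleep(poll_seconds)
    raise TimeoutError(f"crawler {crawler!r} did not finish in time")


def table_row_counts(glue, database):
    """Map each table name in the database to its reported row count.

    Assumption: the crawler stores the count under the "recordCount"
    table parameter; the value is None if that key is absent.
    """
    tables = glue.get_tables(DatabaseName=database)["TableList"]
    return {t["Name"]: t.get("Parameters", {}).get("recordCount") for t in tables}
```

This only inspects the catalog metadata; it does not read the XML data itself.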
I first tried to crawl a single 35 GB file, but that did not work.