David. Posted June 1, 2013
Hey guys, let's say I'm trying to parse data from a website that offers no JSON or RSS feeds. This forum, for instance. If I were to make an app for this website, would it be wise to parse the raw HTML, or is there a better way? Thanks!
OzzySM12 Posted June 3, 2013
It will depend on what you can get from the site. Does the site have a print version or a basic/mobile template you can request? If you can get those, the response will contain more of the data you need and less of the added sh*t. Also try asking the site's staff to set up an RSS feed. I'd imagine they would if it benefits them.
fastman92 Posted June 3, 2013
If you want to parse an HTML document and read the necessary information, there are a few decent HTML parsers. For C# I have used Html Agility Pack. Using regular expressions is not a good way to extract information from an HTML document, because the HTML may be poorly written. Writing a complete HTML parser yourself is not realistic unless you have solid programming skills.
eggburt Posted June 3, 2013
What language are you using? The best ones I know of are:
Python: Beautiful Soup
PHP: PHP Dom
JavaScript: jQuery
If you clarify what language you're after, I'd be happy to provide a few more hints :-)
David. Posted June 3, 2013
@OzzySM12 Well, for GTAforums.com there is no mobile version, and I don't think we'll have an RSS feed set up anytime soon. @fastman92 Since I'm coding in Java, I've found that TagSoup and HTML Cleaner are pretty good. I guess what I'm asking is: is this really a good idea? Just parsing the raw HTML documents to make a mobile app?
fastman92 Posted June 5, 2013
If the server doesn't provide any data other than the pages themselves, then you have to parse the HTML pages to get the information. You can do that by parsing the HTML with a library and selecting the elements you need by ID, class, name, tag, or some other criterion.
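To make the "select by class or tag" idea concrete, here is a minimal sketch in Java. It is only an illustration under several assumptions: it uses the lenient HTML parser that ships with desktop Java (`javax.swing.text.html`), which is not available on Android, where a library such as the ones David mentioned would take its place. The class name `PostExtractor`, the method `textByClass`, and the `post` class value are hypothetical and are not taken from GTAForums' real markup.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: collects the text of every <div> whose class
// attribute is exactly the given value.
final class PostExtractor {

    static List<String> textByClass(String html, String cssClass) {
        List<String> results = new ArrayList<>();

        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            private int depth = 0;          // > 0 while inside a matching <div>
            private StringBuilder current;  // text collected for that <div>

            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (depth > 0) {
                    // Track nested <div>s so we stop at the right closing tag.
                    if (tag == HTML.Tag.DIV) {
                        depth++;
                    }
                } else if (tag == HTML.Tag.DIV
                        && cssClass.equals(attrs.getAttribute(HTML.Attribute.CLASS))) {
                    depth = 1;
                    current = new StringBuilder();
                }
            }

            @Override
            public void handleEndTag(HTML.Tag tag, int pos) {
                if (depth > 0 && tag == HTML.Tag.DIV && --depth == 0) {
                    results.add(current.toString().trim());
                }
            }

            @Override
            public void handleText(char[] data, int pos) {
                if (depth > 0) {
                    current.append(data).append(' ');
                }
            }
        };

        try {
            // "true" tells the parser to ignore any charset declared in the page.
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return results;
    }
}
```

Calling `PostExtractor.textByClass(pageHtml, "post")` would return one string per matching block. The match is an exact comparison on the whole class attribute, so an element with several classes would need extra handling. That kind of dependence on the page's markup is the main reason this approach is brittle.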
eggburt Posted June 6, 2013
David. wrote: "I guess what I'm asking is this a really good idea? Just parsing the raw HTML documents to make a mobile app?"
Purely from a support point of view, I'd try to find another option. Generally speaking, if the site changes its layout in any way, your app will break. What site is it you're looking to app-ify?
If it is indeed GTAF, your best bet would be the printable version of each page. For example, this topic is
http://www.gtaforums.com/index.php?showtopic=558931
You'd take the showtopic value and use it as the t value here:
http://www.gtaforums.com/index.php?act=Pri...rinter&t=558931
That page is more likely to keep the same layout for a long time.
Something else to watch out for, however, is that the printable page shows the entire topic. If your app becomes popular, it may not last long because of the load it could put on the site and, depending on the site's hosting plan, the bandwidth it chomps through.
I'd definitely recommend seeking permission from the site first. They may even have a better option for you that you don't know about.
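The URL rewrite eggburt describes could be sketched roughly as follows in Java. This is an illustration, not a tested client for the forum: `TopicLinks`, `topicId`, and `printableUrl` are hypothetical names, and because the printable URL above is shortened in the post, the sketch takes the printable-page address as a caller-supplied template with a made-up `{t}` placeholder instead of guessing the full path.

```java
import java.net.URI;

// Hypothetical helper for turning a topic URL into its printable counterpart.
final class TopicLinks {

    // Returns the value of the "showtopic" query parameter, or null if absent.
    static String topicId(String topicUrl) {
        String query = URI.create(topicUrl).getQuery();
        if (query == null) {
            return null;
        }
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0 && pair.substring(0, eq).equals("showtopic")) {
                return pair.substring(eq + 1);
            }
        }
        return null;
    }

    // Fills the topic id into a template such as "http://example.com/print?t={t}".
    // The "{t}" placeholder is an assumption of this sketch, not a forum convention.
    static String printableUrl(String template, String topicId) {
        return template.replace("{t}", topicId);
    }
}
```

For the topic above, `TopicLinks.topicId("http://www.gtaforums.com/index.php?showtopic=558931")` gives `"558931"`, which is then substituted into whatever the real printable-page address turns out to be.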
David. Posted June 7, 2013
Thanks for all the help, guys! I guess I'll try asking Mr. Tank by mail or something whether there are any additional options. Otherwise I'll just go ahead with the printable version, like eggburt suggested. I'm rather new to Android development, and I thought it would be a fun project to try out: making an app for GTAF.